Martin Mares [Fri, 12 Jul 2002 02:19:23 +0000 (02:19 +0000)]
WORD_TYPES_HIDDEN shouldn't be considered META by default.
WT_LINK shouldn't be considered accent-less. This might cause sherlockd
to fail to find matches in link texts from non-accented documents to
accented ones, but I think that it's more acceptable than producing
false matches. Unfortunately, we have no way to describe the accentedness
of a part of the document text.
Martin Mares [Sat, 6 Jul 2002 03:29:41 +0000 (03:29 +0000)]
Increase line buffer sizes to 4096 bytes. The current gatherd really can
produce such long lines under some circumstances; we need to examine
how that is possible.
Martin Mares [Fri, 5 Jul 2002 03:23:13 +0000 (03:23 +0000)]
When an inconsistency is encountered while shaking down the bucket
file, recover all data prior to the inconsistency by marking the
space between the read and write pointers as deleted buckets (more
than one is needed if the space is too large for a single bucket).
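A minimal sketch of the gap-filling step, in C. It assumes a hypothetical bucket layout in which every bucket starts with a fixed 16-byte header and occupies at most 4096 bytes including that header; the real bucket file format, its header size and its maximum bucket length are not given here, and `fill_gap_with_deleted()` and `mark_deleted_fn` are illustrative names only. It shows why several deleted buckets may be needed, and that no piece of the gap may be left too short to hold a header of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: every bucket begins with a fixed header and spans at
 * most MAX_BUCKET_LEN bytes including it. Requires MAX >= 2 * HDR. */
#define BUCKET_HDR_LEN  16u
#define MAX_BUCKET_LEN  4096u

/* Hypothetical callback: write a "deleted" bucket header at `pos`
 * claiming `len` bytes (header included). */
typedef void (*mark_deleted_fn)(size_t pos, size_t len, void *ctx);

/* Cover [write_pos, read_pos) with deleted buckets.
 * Returns the number of buckets used, or -1 if the gap is nonempty
 * but too short to hold even one header. */
long fill_gap_with_deleted(size_t write_pos, size_t read_pos,
                           mark_deleted_fn mark, void *ctx)
{
  assert(write_pos <= read_pos);
  size_t gap = read_pos - write_pos;
  long count = 0;

  if (gap > 0 && gap < BUCKET_HDR_LEN)
    return -1;
  while (gap > 0) {
    size_t len = gap < MAX_BUCKET_LEN ? gap : MAX_BUCKET_LEN;
    /* Never leave a tail that could not hold a header of its own. */
    if (gap - len > 0 && gap - len < BUCKET_HDR_LEN)
      len -= BUCKET_HDR_LEN;
    if (mark)
      mark(write_pos, len, ctx);
    write_pos += len;
    gap -= len;
    count++;
  }
  return count;
}
```

With these assumed limits, a 4097-byte gap becomes two deleted buckets of 4080 and 17 bytes rather than 4096 + 1, because a 1-byte tail could not hold a header.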
Martin Mares [Sun, 23 Jun 2002 20:32:19 +0000 (20:32 +0000)]
Implemented merging of catalog attributes into the index. Just place the
catalog dump in db/catalog.gz (e.g., by running utils/fetch-cat.sh)
and run the indexer.
Unfortunately, we've just filled up all the available word types :-(
Martin Mares [Sat, 8 Jun 2002 13:17:39 +0000 (13:17 +0000)]
The universal hash table generator now uses prime table sizes instead of
powers of two. This slows down all operations a little as we now need
to perform division instead of just AND-ing with a mask, but it allows
us to use the new hash functions in hashfunc.h which are significantly
faster than the original ones (at the expense of having bad distribution
modulo non-primes).
Also changed the limit logic to avoid rehashing when the table is already
too small or too large.
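The trade-off can be illustrated with two slot functions, in C. `slot_pow2()`, `slot_prime()` and `next_prime()` are illustrative names, not the generator's actual interface, and the hash functions from hashfunc.h are not reproduced. With a power-of-two size only the low bits of the hash select the slot, so a fast hash whose low bits are poorly mixed collides heavily; reducing modulo a prime lets the high bits matter too, at the cost of a division.

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two table: only the low log2(size) bits of the hash are used. */
uint32_t slot_pow2(uint32_t hash, uint32_t size_pow2)
{
  return hash & (size_pow2 - 1);
}

/* Prime-sized table: slower (division), but high bits affect the slot too. */
uint32_t slot_prime(uint32_t hash, uint32_t prime_size)
{
  return hash % prime_size;
}

static int is_prime(uint32_t n)
{
  if (n < 2)
    return 0;
  for (uint32_t d = 2; (uint64_t)d * d <= n; d++)
    if (n % d == 0)
      return 0;
  return 1;
}

/* Smallest prime >= n, e.g. for choosing a new table size when resizing.
 * Assumes n is well below UINT32_MAX (no overflow handling). */
uint32_t next_prime(uint32_t n)
{
  while (!is_prime(n))
    n++;
  return n;
}
```

With 256 slots, hashes 0x100 and 0x200 both land in slot 0 under the mask, while modulo the prime 257 they go to slots 256 and 255.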
Robert Spalek [Mon, 3 Jun 2002 16:02:00 +0000 (16:02 +0000)]
- str_hash.[ch] renamed to hashfunc.[ch], and its functions renamed
- deleted hash-{block,istring,string}.c, their functionality merged into
hashfunc.[ch]
- str-test.c rewritten to use the new naming style and byte instead of char;
more tests added
Martin Mares [Sun, 21 Apr 2002 08:30:06 +0000 (08:30 +0000)]
Finally I realized why we were using secondary sorting on site_id
by default :-) I was originally searching for some magic inside the
search server which needed that to work and completely missed the
simple fact that the front-end wants the results this way :-)
So I'm resurrecting it, but now as an ordinary instance of the secondary
sorting code I introduced yesterday. The CUSTOM_SORTING switch is gone;
sorting by site ID and page age always works.
Also, I've simplified reverse sorting by introducing a separate XOR mask.
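A sketch of how tie-breaking and an XOR mask might fit together, in C. The struct below is a hypothetical two-field subset of `struct result_note` (only `sec_sort_key` is named in these notes; the field `q` and the functions `make_sec_key()` and `cmp_result()` are assumptions). Storing `attr ^ mask` lets a single ascending comparison serve both directions: a mask of 0 keeps ascending order, while a mask of all ones flips every bit of an unsigned key and thus reverses its order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subset of a result record. */
struct result_note {
  int q;                  /* primary score Q, higher is better */
  uint32_t sec_sort_key;  /* secondary key, already XOR-ed with the mask */
};

/* mask == 0 keeps ascending order; mask == 0xffffffff reverses it,
 * since ~a < ~b exactly when a > b for unsigned values. */
uint32_t make_sec_key(uint32_t attr, uint32_t xor_mask)
{
  return attr ^ xor_mask;
}

/* qsort comparator: higher Q first; equal Q broken by ascending key. */
int cmp_result(const void *a, const void *b)
{
  const struct result_note *x = a, *y = b;
  if (x->q != y->q)
    return x->q > y->q ? -1 : 1;
  if (x->sec_sort_key != y->sec_sort_key)
    return x->sec_sort_key < y->sec_sort_key ? -1 : 1;
  return 0;
}
```

A site ID would be stored with mask 0, while an attribute to be sorted in descending order would use `0xffffffffu`; the comparator itself never needs to know the direction.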
Martin Mares [Sat, 20 Apr 2002 15:09:41 +0000 (15:09 +0000)]
Added secondary sorting (i.e., breaking ties when two documents have the same Q)
on any of the custom attributes. Just define CUSTOM_SORTING in lib/custom.h.
I've also removed secondary sorting of the result heap by site ID inside refs.c;
to the best of my knowledge it wasn't required anywhere.
Maybe we can remove the CUSTOM_SORTING switch and just leave the sec_sort_key
in struct result_note initialized to zero, but it would cost us 4 bytes per
result_note, which I wanted to avoid.
Martin Mares [Sat, 6 Apr 2002 18:44:18 +0000 (18:44 +0000)]
All configuration options (except for custom attributes which still dwell
in lib/custom.h) are now stored in config.mk to make them available to both
makefiles (conditional linking etc.) and C programs (lib/autoconf.h is
generated from config.mk by a simple shell script).
This gives an easy way to create special-purpose modules (like the
SQL gatherer) which need extra libraries: just make them a compile-time
option ;)
Martin Mares [Sat, 15 Dec 2001 22:55:42 +0000 (22:55 +0000)]
db-rebuild replaced by db-tool which allows not only database
reconstruction, but also dumping and undumping (useful for
conversion from SDBMv1 to v2).
Martin Mares [Fri, 2 Nov 2001 21:34:08 +0000 (21:34 +0000)]
HEAP_DELETE: Copy the `pos' parameter to a temporary variable as
soon as possible to avoid problems with callers supplying us with expressions
which could change during heap operations.
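A minimal sketch of the hazard and the fix, in C, on a hypothetical 1-based min-heap of ints. `HEAP_DELETE_SKETCH` and `HEAP_SWAP_SKETCH` are illustrative and do not reproduce the real HEAP_DELETE's parameter list. Because a macro substitutes its argument text, writing `pos` several times would re-evaluate the caller's expression after the heap had already been modified; copying it into `_pos` on the first line evaluates it exactly once.

```c
#include <assert.h>

/* Hypothetical helper: swap two heap slots. */
#define HEAP_SWAP_SKETCH(h, a, b) do {                              \
    int _t = (h)[a]; (h)[a] = (h)[b]; (h)[b] = _t;                  \
  } while (0)

/* Delete the element at index `pos` from the min-heap h[1..num].
 * `pos` is evaluated once, before the heap is touched; `num` must be
 * a plain lvalue since it is read and decremented. */
#define HEAP_DELETE_SKETCH(h, num, pos) do {                        \
    int _pos = (pos);                                               \
    (h)[_pos] = (h)[(num)--];                                       \
    if (_pos <= (num)) {                                            \
      /* sift up while smaller than the parent */                   \
      while (_pos > 1 && (h)[_pos] < (h)[_pos / 2]) {               \
        HEAP_SWAP_SKETCH(h, _pos, _pos / 2);                        \
        _pos /= 2;                                                  \
      }                                                             \
      /* sift down while larger than the smaller child */           \
      for (;;) {                                                    \
        int _m = _pos, _l = 2 * _pos, _r = 2 * _pos + 1;            \
        if (_l <= (num) && (h)[_l] < (h)[_m]) _m = _l;              \
        if (_r <= (num) && (h)[_r] < (h)[_m]) _m = _r;              \
        if (_m == _pos) break;                                      \
        HEAP_SWAP_SKETCH(h, _pos, _m);                              \
        _pos = _m;                                                  \
      }                                                             \
    }                                                               \
  } while (0)
```

The side-effecting argument in a call such as `HEAP_DELETE_SKETCH(h, num, (calls++, 2))` runs only once, however many times the macro body refers to the position.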
Martin Mares [Fri, 5 Oct 2001 16:33:24 +0000 (16:33 +0000)]
Moved all customizable parts of configuration and index format
(i.e., those depending on user attributes or word types, not on
our compilation environment) to a new file.
Custom configurations (indexing of objects generated from a database
and similar cases) should from now on require only modifications of
cf/sherlock and lib/custom.h.