mj.ucw.cz Git - libucw.git/log

Martin Mares [Wed, 22 Jan 2003 18:30:13 +0000 (18:30 +0000)]

Objects always live in somebody else's pool.

obj_free and odes->local_pool are gone.

Martin Mares [Wed, 22 Jan 2003 18:17:38 +0000 (18:17 +0000)]

The changes were worth updating copyright :)

Martin Mares [Wed, 22 Jan 2003 18:07:24 +0000 (18:07 +0000)]

Replaced various attempts to speed up obj_add_attr() with simple
internal caching: odes->cached_attr points to the last attribute added,
which is guaranteed to be the last in its chain.

Removed oattr->last_same; the gain isn't worth the extra complexity
involved.
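
A minimal sketch of this caching scheme, using a hypothetical toy model (toy_object, toy_attr and toy_add_attr are illustrative names, not libucw's structures or obj_add_attr() itself). The cached pointer lets consecutive values of the same attribute be appended without walking any list; the invariant that it always points to the last value in its chain is what makes the fast path safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model: all values of one attribute name form a
   singly linked chain via `same'; distinct names are linked via `next'. */
struct toy_attr {
  struct toy_attr *next;   /* first value of the next attribute name */
  struct toy_attr *same;   /* next value with the same name */
  char name;
  const char *val;
};

struct toy_object {
  struct toy_attr *attrs;
  struct toy_attr *cached_attr;  /* last value added, always last in its chain */
};

struct toy_attr *toy_add_attr(struct toy_object *o, char name, const char *val)
{
  struct toy_attr *a = calloc(1, sizeof(*a));
  assert(a);
  a->name = name;
  a->val = val;

  /* Fast path: another value for the attribute we appended to last time */
  if (o->cached_attr && o->cached_attr->name == name) {
    o->cached_attr->same = a;
    o->cached_attr = a;
    return a;
  }

  /* Slow path: find the chain for this name and walk to its end */
  struct toy_attr **pp = &o->attrs;
  while (*pp && (*pp)->name != name)
    pp = &(*pp)->next;
  if (!*pp)
    *pp = a;                    /* first value of a new attribute name */
  else {
    struct toy_attr *t = *pp;
    while (t->same)
      t = t->same;
    t->same = a;
  }
  o->cached_attr = a;
  return a;
}
```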

Martin Mares [Wed, 22 Jan 2003 15:51:03 +0000 (15:51 +0000)]

The $(LIBxxx) mechanism proved useful, so I'm switching to it for all other
libraries to simplify the Makefiles a bit. Unfortunately, this introduces
ugly ordering constraints on includes in the top-level Makefile, but we can
live with them.

Martin Mares [Wed, 22 Jan 2003 11:34:43 +0000 (11:34 +0000)]

The get_func comment was outdated.

Martin Mares [Wed, 22 Jan 2003 11:23:19 +0000 (11:23 +0000)]

More configuration enhancements:

o  The gatherer, indexer and search server can be left out. This is useful
   when using Sherlock for indexing databases, because an unusual custom.h
   lacking the standard word types makes many gatherer modules uncompilable.
o  Searching by document age is optional; you can switch it off to save
   index space.
o  Indexing of file types is now partially supported by the default configuration,
   because I'm going to use the bottom 5 bits of the file_type (which were
   used only for images) for storing the language code of text documents,
   which certainly isn't a Centrum-specific thing. On the other hand, I'd like
   to keep the exact meaning of file type codes application-specific, so the
   actual matching of file types is left in the customization header. Again,
   you can switch this off to save index space.

Martin Mares [Wed, 22 Jan 2003 10:31:13 +0000 (10:31 +0000)]

Moved indexer/oook.c (the catalogue & keyword processor) to where it
belongs: among the other Centrum-specific modules.

Cleaned up custom rules in makefiles: instead of defining lots of variables
for custom modules, config.mk can now specify a custom submakefile.

Martin Mares [Tue, 14 Jan 2003 20:26:44 +0000 (20:26 +0000)]

Oops, a bug in the profiler caused time travel :)

Martin Mares [Mon, 13 Jan 2003 21:28:04 +0000 (21:28 +0000)]

Added functions for manipulating bit arrays. One day, an optimized
version for i386 using the bts instruction et al. will appear.
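
A portable sketch of such helpers, assuming bits are packed into unsigned long words; the names bits_alloc, bits_set, bits_clear and bits_test are hypothetical, not necessarily those used in the library. An i386 version could replace the shift-and-mask arithmetic with the bts, btr and bt instructions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers: bits are stored in machine words */
typedef unsigned long bitword;
#define BITS_PER_WORD (sizeof(bitword) * CHAR_BIT)

/* Number of words needed to hold n bits */
static size_t bits_words(size_t n)
{
  return (n + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Allocate a zeroed array of n bits */
static bitword *bits_alloc(size_t n)
{
  bitword *a = calloc(bits_words(n), sizeof(bitword));
  assert(a || !n);
  return a;
}

static inline void bits_set(bitword *a, size_t i)
{
  a[i / BITS_PER_WORD] |= (bitword)1 << (i % BITS_PER_WORD);
}

static inline void bits_clear(bitword *a, size_t i)
{
  a[i / BITS_PER_WORD] &= ~((bitword)1 << (i % BITS_PER_WORD));
}

static inline int bits_test(const bitword *a, size_t i)
{
  return (a[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1;
}
```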

Robert Spalek [Mon, 13 Jan 2003 10:14:15 +0000 (10:14 +0000)]

MJ forgot to add ASORT_EXTRA_ARGS at the end of the sort procedure
declaration.

Martin Mares [Sun, 12 Jan 2003 17:36:53 +0000 (17:36 +0000)]

Added a function for measuring the bucket file size (as an oid) and used it
for a better progress indicator in the scanner.

Martin Mares [Sun, 12 Jan 2003 14:11:04 +0000 (14:11 +0000)]

Improved array sorter according to Robert's suggestions.

Martin Mares [Sun, 5 Jan 2003 11:32:02 +0000 (11:32 +0000)]

When killing dots at the end of a host name, remove _all_ of them, not just
the last one. Without this, url_canonicalize applied to names already believed
to be canonical didn't leave them unchanged, which caused havoc in gatherd.
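
Why removing only one dot breaks things: canonicalization must be idempotent, so canonicalizing an already canonical name has to leave it unchanged. A sketch with a hypothetical helper strip_trailing_dots (not the library's url_canonicalize):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove every trailing dot from a host name in place.
   Removing just one would turn "xyz.cz.." into "xyz.cz.", which a second
   pass would change again, so canonicalization would not be idempotent. */
static void strip_trailing_dots(char *host)
{
  assert(host);
  size_t len = strlen(host);
  while (len > 0 && host[len - 1] == '.')
    host[--len] = 0;
}
```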

Martin Mares [Sat, 4 Jan 2003 15:25:06 +0000 (15:25 +0000)]

Added a generic array sorter.

Benchmark results on my K6/400MHz:
mj@albireo:~/src/sherlock/run$ bin/asort-test
qsort: 19209 ms
asort: 7544 ms
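
The commit doesn't say where the speed-up comes from; a plausible factor is that a sorter generated for one specific type can inline the comparison, while qsort() has to call it through a function pointer for every pair. The sketch below illustrates that technique with a hypothetical DEFINE_ARRAY_SORT macro; it is not the library's asort interface and does not reproduce the benchmark above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: stamps out a quicksort (Hoare partitioning, middle
   pivot) for one element type and one comparison, so `less' is expanded
   inline instead of being called through a pointer as in qsort(). */
#define DEFINE_ARRAY_SORT(name, type, less)                            \
  static void name##_range(type *a, long lo, long hi)                 \
  {                                                                    \
    while (lo < hi) {                                                  \
      type pivot = a[lo + (hi - lo) / 2];                              \
      long i = lo - 1, j = hi + 1;                                     \
      for (;;) {                                                       \
        do i++; while (less(a[i], pivot));                             \
        do j--; while (less(pivot, a[j]));                             \
        if (i >= j)                                                    \
          break;                                                       \
        type t = a[i]; a[i] = a[j]; a[j] = t;                          \
      }                                                                \
      /* Recurse on the smaller part, iterate on the larger one */     \
      if (j - lo < hi - j) {                                           \
        name##_range(a, lo, j);                                        \
        lo = j + 1;                                                    \
      } else {                                                         \
        name##_range(a, j + 1, hi);                                    \
        hi = j;                                                        \
      }                                                                \
    }                                                                  \
  }                                                                    \
  static void name(type *a, size_t n)                                  \
  {                                                                    \
    if (n > 1)                                                         \
      name##_range(a, 0, (long)n - 1);                                 \
  }

/* Example instantiation for plain ints */
#define INT_LESS(x, y) ((x) < (y))
DEFINE_ARRAY_SORT(sort_ints, int, INT_LESS)
```

Recursing only into the smaller partition keeps the stack depth logarithmic even on unlucky inputs.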

Martin Mares [Sat, 4 Jan 2003 13:56:46 +0000 (13:56 +0000)]

Line buffers are back to their original sizes. Closes bug #251.

Martin Mares [Mon, 18 Nov 2002 17:56:16 +0000 (17:56 +0000)]

In some cases, nextprime(x) could have been equal to x (reported by Milan).
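
The intended contract, as assumed here, is that the result is strictly greater than x; starting the search at x itself is exactly what lets x come back. A straightforward trial-division sketch (next_prime_above and is_prime are hypothetical names, not the library's nextprime()):

```c
#include <assert.h>

/* Trial division; d <= n / d avoids overflow in d * d */
static int is_prime(unsigned long n)
{
  if (n < 2)
    return 0;
  if (n % 2 == 0)
    return n == 2;
  for (unsigned long d = 3; d <= n / d; d += 2)
    if (n % d == 0)
      return 0;
  return 1;
}

/* Smallest prime strictly greater than x: the search starts at x + 1 */
static unsigned long next_prime_above(unsigned long x)
{
  unsigned long p = x + 1;
  while (!is_prime(p))
    p++;
  return p;
}
```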

Martin Mares [Mon, 11 Nov 2002 16:54:22 +0000 (16:54 +0000)]

Fixed an off-by-one error in vlog().

Robert Spalek [Tue, 5 Nov 2002 11:54:40 +0000 (11:54 +0000)]

Added WT_LINK to WORD_TYPES_META.

Martin Mares [Sun, 27 Oct 2002 20:06:47 +0000 (20:06 +0000)]

Worked around problems with "www.xyz.cz" and "xyz.cz" being considered identical
in the gatherer but different in the indexer by adding a hack to the calculation
of fingerprints (we cannot afford calling filters for each fingerprint: one
reason is speed, another is that filters are unavailable in the search
server). Closes bug #302.

Martin Mares [Sun, 27 Oct 2002 13:22:34 +0000 (13:22 +0000)]

Removed xprintf() -- it was very ugly and its only raison d'etre was
the lack of bprintf().

Martin Mares [Sun, 27 Oct 2002 13:16:03 +0000 (13:16 +0000)]

Added printf on fastbuf streams. The current implementation is
optimized for nothing but simplicity.
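
A sketch of such a simple approach, assuming a hypothetical growable in-memory stream (struct membuf, membuf_write and membuf_printf are illustrative names, not the fastbuf API): format with vsnprintf() into a small stack buffer, and only when the output doesn't fit, format again into a heap buffer of the exact size reported by the first call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stream standing in for a fastbuf */
struct membuf {
  char *data;
  size_t len, size;
};

static void membuf_write(struct membuf *b, const char *s, size_t n)
{
  if (b->len + n + 1 > b->size) {
    b->size = 2 * (b->len + n + 1);
    b->data = realloc(b->data, b->size);
    assert(b->data);
  }
  memcpy(b->data + b->len, s, n);
  b->len += n;
  b->data[b->len] = 0;       /* keep the contents NUL-terminated */
}

static int membuf_printf(struct membuf *b, const char *fmt, ...)
{
  char small[256];
  va_list args, args2;
  va_start(args, fmt);
  va_copy(args2, args);      /* a va_list can be consumed only once */
  int n = vsnprintf(small, sizeof(small), fmt, args);
  va_end(args);
  if (n >= 0) {
    if ((size_t)n < sizeof(small))
      membuf_write(b, small, n);
    else {
      /* Second pass with a buffer of exactly the required size */
      char *big = malloc(n + 1);
      assert(big);
      vsnprintf(big, n + 1, fmt, args2);
      membuf_write(b, big, n);
      free(big);
    }
  }
  va_end(args2);
  return n;
}
```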

Martin Mares [Sun, 27 Oct 2002 13:05:14 +0000 (13:05 +0000)]

Several bug fixes in the logger:

o  No more hard limits on log name length.
o  If an error occurs during log switching, don't try it again.
o  Writes to log files are really atomic; we no longer rely on the stdio
   buffer being large enough (which it isn't).
o  Log entries are scanned for control characters, which are then mapped to 0x7f.
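
The last two points can be sketched as follows; log_sanitize and log_line are hypothetical names, and the fixed 1024-byte line buffer is a simplification of this sketch only. The whole line, including the newline, is assembled first and emitted with a single write(), so with O_APPEND each line goes out as one piece instead of in stdio-buffer-sized chunks.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Replace control characters (below 0x20) by 0x7f */
static void log_sanitize(char *msg, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char)msg[i] < 0x20)
      msg[i] = 0x7f;
}

/* Build the complete line, then emit it with one write() call */
static int log_line(int fd, const char *msg)
{
  char buf[1024];
  size_t len = strlen(msg);
  if (len > sizeof(buf) - 1)
    len = sizeof(buf) - 1;   /* sketch only: truncate overlong messages */
  memcpy(buf, msg, len);
  log_sanitize(buf, len);
  buf[len++] = '\n';
  return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}
```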

Martin Mares [Sat, 19 Oct 2002 10:31:16 +0000 (10:31 +0000)]

Documented the limit on the number of word types.

Martin Mares [Fri, 11 Oct 2002 11:32:36 +0000 (11:32 +0000)]

Added support for shared libraries (CONFIG_SHARED switch in config.mk).

The makefiles are now able to build both static and shared libraries;
objects for shared libraries get the suffix `.oo'. Use $(LS) in all references
to libraries; it expands to `.so' if CONFIG_SHARED is set, and to `.a' otherwise.

Turned on by default.

Martin Mares [Fri, 11 Oct 2002 11:29:02 +0000 (11:29 +0000)]

When installing, dereference all symlinks.

Martin Mares [Thu, 10 Oct 2002 20:47:01 +0000 (20:47 +0000)]

Fixed a bug in parse_tree() causing the parser to loop forever on some inputs.
It's an interesting exercise to find out why not on all inputs :-)

Martin Mares [Thu, 10 Oct 2002 20:36:45 +0000 (20:36 +0000)]

Audited usage of MAX_WORD_LEN, fixed several bugs and documented
what this constant really means :)

Martin Mares [Thu, 10 Oct 2002 19:57:12 +0000 (19:57 +0000)]

"Call me 2.3, please."

Martin Mares [Thu, 10 Oct 2002 19:52:40 +0000 (19:52 +0000)]

Install libraries as well. Closes bug #318.

Martin Mares [Sun, 6 Oct 2002 15:50:02 +0000 (15:50 +0000)]

Introduced obuck_slurp_pool() to make reading the whole bucket
pool faster, possibly using mmap.

Use it in `buckettool -c'.

Martin Mares [Sun, 6 Oct 2002 15:45:45 +0000 (15:45 +0000)]

bdirect_read_prepare() now returns 0 instead of EOF.

Martin Mares [Sun, 6 Oct 2002 15:45:07 +0000 (15:45 +0000)]

When bopen() is called with buffer size 0, it switches to bopen_mm().
The plan is to make use of mmapping configurable by the buffer sizes
in cf/sherlock.

Martin Mares [Sun, 6 Oct 2002 15:43:40 +0000 (15:43 +0000)]

Finished the fb-mmap module. Basic parameters are now configurable in cf/sherlock.

Martin Mares [Thu, 3 Oct 2002 21:56:54 +0000 (21:56 +0000)]

Squash warning with 32-bit sh_off_t.

Martin Mares [Thu, 3 Oct 2002 21:03:15 +0000 (21:03 +0000)]

Older versions of glibc don't have madvise.

Martin Mares [Thu, 3 Oct 2002 21:02:58 +0000 (21:02 +0000)]

Added a temporary hack for testing of fb-mmap. Please don't turn it on
unless you're willing to help me with debugging fb-mmap :-)

Martin Mares [Mon, 30 Sep 2002 15:10:43 +0000 (15:10 +0000)]

The is_temp_file variable was originally a good idea, but it made it
almost impossible for different fb back-ends to provide the same interface.

I've replaced it with bconfig() which is a universal interface for
altering various fb settings. It's somewhat ioctl()-ish, but I hope
it won't hurt.
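
A minimal sketch of such an ioctl-like hook, assuming each back-end supplies its own handler that returns the previous value, or -1 for items it doesn't support. struct toy_fb, the TOY_* items and file_backend_config are hypothetical, not the library's bconfig() definitions.

```c
#include <assert.h>

/* Hypothetical configuration items */
enum toy_config_item { TOY_IS_TEMP_FILE, TOY_MMAP_WINDOW };

struct toy_fb {
  int (*config)(struct toy_fb *f, unsigned item, int value);
  int is_temp_file;          /* back-end specific state */
};

/* Generic entry point: dispatch to the back-end, if it has a handler */
static int toy_config(struct toy_fb *f, unsigned item, int value)
{
  return f->config ? f->config(f, item, value) : -1;
}

/* A file back-end that understands only the temp-file flag */
static int file_backend_config(struct toy_fb *f, unsigned item, int value)
{
  switch (item) {
  case TOY_IS_TEMP_FILE: {
    int old = f->is_temp_file;
    f->is_temp_file = value;
    return old;
  }
  default:
    return -1;               /* not supported by this back-end */
  }
}
```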

Martin Mares [Fri, 27 Sep 2002 21:46:28 +0000 (21:46 +0000)]

The fb-mmap module, now read/write. Still needs a lot of benchmarking before we decide
to switch the indexer to use it.

Martin Mares [Fri, 27 Sep 2002 21:45:40 +0000 (21:45 +0000)]

When writing, the data needn't start at the beginning of the buffer.
(We need this for fb-mmap since the buffer is always page aligned.)
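
Background for the page alignment: mmap() requires the file offset to be a multiple of the page size, so a mapping that covers file position pos must start at the page boundary below it, and the interesting data then begins some bytes into the mapped buffer. A sketch (struct map_window and window_for are hypothetical names):

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Where to start the mapping and how far into it the data begins */
struct map_window {
  off_t map_start;           /* page-aligned offset to pass to mmap() */
  size_t delta;              /* offset of pos within the mapped buffer */
};

static struct map_window window_for(off_t pos)
{
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  struct map_window w;
  w.map_start = pos - pos % page;
  w.delta = (size_t)(pos - w.map_start);
  return w;
}
```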

Martin Mares [Thu, 26 Sep 2002 18:27:19 +0000 (18:27 +0000)]

Recognize not only user names, but also passwords.

Robert Spalek [Thu, 26 Sep 2002 13:08:48 +0000 (13:08 +0000)]

Added a base64 module; it is imported from somewhere :))

MJ, if you dislike the code too much, please let me know.

Martin Mares [Tue, 24 Sep 2002 21:38:21 +0000 (21:38 +0000)]

After a lot of benchmarking, replaced the old super-smart bbcopy()
with a much simpler solution based on the bdirect interface and inlined
the fast path. Surprisingly, the new version is faster under real load
(the explanation is very simple: we use very large buffers for the
indexer, so the bbcopy optimizations triggered rarely), and it also
works on all fastbuf streams, not only file-based ones.

Also, made bdirect_* inline.

Martin Mares [Mon, 23 Sep 2002 21:20:02 +0000 (21:20 +0000)]

Introduced bfdopen_shared() which behaves like bfdopen(), but on
bclose() the fd is left open. Especially useful for buffering stdin/out.

Martin Mares [Mon, 23 Sep 2002 18:03:42 +0000 (18:03 +0000)]

Damned automatic typecasts!

Martin Mares [Mon, 23 Sep 2002 12:41:32 +0000 (12:41 +0000)]

Better avoid the brain-dead encoding of " " as "+" when generating URLs.

Martin Mares [Mon, 23 Sep 2002 12:37:32 +0000 (12:37 +0000)]

Fix escaping of "+" characters in outgoing parameters. (BTW: when Galeon
displays link URLs in the status line, it unquotes them improperly!)

Use ":" as a separator instead of "&" when constructing self-ref URLs.
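
Background: in form-encoded query strings a decoder turns "+" into a space, so a literal plus has to be sent as %2B, and encoding a space as %20 instead of "+" avoids the ambiguity altogether. A sketch of a hypothetical url_escape_param() that percent-encodes everything except the RFC 3986 unreserved characters (letters, digits and -._~):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: percent-encode all but unreserved characters.
   Output that would not fit into outsize bytes is truncated. */
static void url_escape_param(char *out, size_t outsize, const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;
  assert(outsize > 0);
  for (; *in; in++) {
    unsigned char c = *in;
    if (isalnum(c) || strchr("-._~", c)) {
      if (o + 1 >= outsize)
        break;
      out[o++] = c;
    } else {
      if (o + 3 >= outsize)
        break;
      out[o++] = '%';
      out[o++] = hex[c >> 4];
      out[o++] = hex[c & 15];
    }
  }
  out[o] = 0;
}
```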

Martin Mares [Mon, 23 Sep 2002 12:14:45 +0000 (12:14 +0000)]

Oops, the card array was reversed!

Martin Mares [Mon, 23 Sep 2002 12:12:31 +0000 (12:12 +0000)]

Forgot to commit with the rest of fastbuf changes.

Martin Mares [Mon, 23 Sep 2002 12:11:35 +0000 (12:11 +0000)]

Adapted to new fastbufs.

Martin Mares [Mon, 23 Sep 2002 12:10:25 +0000 (12:10 +0000)]

Adapted the bucket code to new fastbufs. Stream positions are now
always relative to the bucket start (originally they were relative to the
file start, which was completely unusable, and hence unused :-) ).

Martin Mares [Mon, 23 Sep 2002 12:07:15 +0000 (12:07 +0000)]

Major cleanup of fastbufs:

  o  Split generic fastbuf from low-level routines. `struct fastbuf'
     no longer contains low-level data like `fd' or `is_temp_file'.
  o  Introduced safe type casting macros to avoid programming errors.
  o  `struct fastbuf' is no longer freed by the high-level code.
  o  Documented behaviour of bflush() between reads and writes.
  o  Redefined semantics of fastbuf->pos: it now corresponds to `bstop'
     instead of `buffer', so it always coincides with the real file
     position, making `fdpos' unnecessary.
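
To illustrate the new meaning of pos: since pos is the file position corresponding to bstop, the logical stream position follows from the buffer pointers alone, and no separate fdpos is needed. A sketch on a hypothetical struct toybuf; the write-side layout (unflushed data between bstop and bptr) is an assumption of this sketch.

```c
#include <assert.h>

/* Toy model; the field names follow the commit message.  pos is the
   file position corresponding to bstop.  When reading, [bptr, bstop)
   holds data not yet consumed; when writing (assumed here),
   [bstop, bptr) holds data not yet flushed. */
struct toybuf {
  unsigned char *buffer, *bptr, *bstop;
  long pos;
};

/* Logical stream position, valid in both directions */
static long toy_tell(const struct toybuf *f)
{
  return f->pos + (f->bptr - f->bstop);
}
```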

Martin Mares [Thu, 19 Sep 2002 18:26:37 +0000 (18:26 +0000)]

Robert> ugh, at first glance I don't understand a thing, it's probably too clever :)

So I decided to learn how to use POD and write POD documentation
for the module (in the usual Perl fashion, it's part of the module).
Use perldoc or pod2${format} to view or convert it.

Martin Mares [Wed, 18 Sep 2002 10:08:32 +0000 (10:08 +0000)]

Debug output now calls a given subroutine instead of print.

Martin Mares [Tue, 17 Sep 2002 22:51:55 +0000 (22:51 +0000)]

Added functions for automatic processing of script arguments.

Martin Mares [Fri, 6 Sep 2002 17:00:13 +0000 (17:00 +0000)]

Added first two functions of the Poor Man's CGI module.

Martin Mares [Mon, 2 Sep 2002 20:34:52 +0000 (20:34 +0000)]

More improvements of the Query module.

Martin Mares [Mon, 2 Sep 2002 19:38:09 +0000 (19:38 +0000)]

Added a simple Perl module for connecting to the search server and parsing
its results into Perl data structures, converting nested structures and
multiple-valued attributes to arrays.

Also includes the print_tree function, which was originally written
as a simple debugging dumper for the parsed query results, but in fact
it's able to dump any complex Perl data structure as long as it's
acyclic.

More to come, including an example (a very simple front-end for the
free version and maybe some more debugging tools).

Martin Mares [Mon, 26 Aug 2002 15:54:20 +0000 (15:54 +0000)]

Export functions for explicit locking.

Martin Mares [Mon, 26 Aug 2002 14:41:54 +0000 (14:41 +0000)]

The shell config helper now knows how to parse multiple-valued entries
and stores them as shell arrays.

Even normal entries are now output as they are seen for the first time,
leaving all the overriding to the shell.

Martin Mares [Mon, 26 Aug 2002 13:47:03 +0000 (13:47 +0000)]

Added a quick-check mode which performs the bucket file checks done during
a normal open (i.e., the trailer check).

Martin Mares [Mon, 26 Aug 2002 13:32:10 +0000 (13:32 +0000)]

As usual, stuff in lib/* is LGPL'ed.

Martin Mares [Mon, 26 Aug 2002 13:06:19 +0000 (13:06 +0000)]

Allow setting of default configuration file.

Martin Mares [Fri, 23 Aug 2002 08:30:45 +0000 (08:30 +0000)]

Moved shell script support commands to lib/shell.

Martin Mares [Fri, 23 Aug 2002 08:29:59 +0000 (08:29 +0000)]

Added a Perl module for parsing options and configuration.

Martin Mares [Fri, 23 Aug 2002 08:26:26 +0000 (08:26 +0000)]

Allow SEEK_END. (I plan to rewrite seek in fbmem streams; the current
version is awfully slow.)

Martin Mares [Wed, 21 Aug 2002 09:22:20 +0000 (09:22 +0000)]

Deleted the .SECONDARY hack -- (1) it only patched the consequences, not
the real cause, (2) it broke building of scripts.

Work-around: after make distclean, run make runtree first. Make seems
to ignore rules for files in not-yet-existing directories.

Martin Mares [Tue, 20 Aug 2002 21:55:59 +0000 (21:55 +0000)]

Split bdirect_read into prepare and commit parts (similarly to how
bdirect_write already works). This allows partial processing of
read data.
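
A sketch of why the split helps, using a hypothetical in-memory struct toystream rather than real fastbufs (toy_read_prepare, toy_read_commit and toy_skip_line are illustrative names): prepare exposes the buffered bytes without consuming them, and commit consumes only the part the caller actually processed, here up to and including the first newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy stream: [bptr, bstop) is the buffered, unread data */
struct toystream {
  const unsigned char *bptr, *bstop;
};

/* Expose the buffered bytes without consuming them; returns their count */
static size_t toy_read_prepare(struct toystream *s, const unsigned char **data)
{
  *data = s->bptr;
  return (size_t)(s->bstop - s->bptr);
}

/* Consume the bytes up to `end', which must lie within the prepared range */
static void toy_read_commit(struct toystream *s, const unsigned char *end)
{
  assert(end >= s->bptr && end <= s->bstop);
  s->bptr = end;
}

/* Example consumer: skip one complete line, leave the rest buffered */
static int toy_skip_line(struct toystream *s)
{
  const unsigned char *data;
  size_t n = toy_read_prepare(s, &data);
  const unsigned char *nl = memchr(data, '\n', n);
  if (!nl)
    return 0;                /* no complete line, nothing consumed */
  toy_read_commit(s, nl + 1);
  return 1;
}
```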

Martin Mares [Tue, 20 Aug 2002 21:55:15 +0000 (21:55 +0000)]

Tabs are legal in attribute values.

Martin Mares [Tue, 20 Aug 2002 20:33:01 +0000 (20:33 +0000)]

Forgot to decrease URL_PROTO_MAX when deleting sql protocol.

Martin Mares [Tue, 20 Aug 2002 19:04:54 +0000 (19:04 +0000)]

If not debugging, compile at least ASSERT(0) as a call to an unreachable
function (to avoid warnings about unassigned variables).

Martin Mares [Tue, 20 Aug 2002 19:03:45 +0000 (19:03 +0000)]

Oops, forgot WT_FILE there.

Martin Mares [Tue, 20 Aug 2002 19:03:32 +0000 (19:03 +0000)]

Need a cast if sh_off_t is short.

Martin Mares [Tue, 20 Aug 2002 18:36:37 +0000 (18:36 +0000)]

Forgot WT_LINK.

Martin Mares [Tue, 20 Aug 2002 18:30:54 +0000 (18:30 +0000)]

Added license notices to all library files which are not specific
to Sherlock (and are often shared with other projects) -- they will
be distributed according to the LGPL.

Martin Mares [Tue, 20 Aug 2002 18:14:50 +0000 (18:14 +0000)]

Finally found the cause of make remaking unnecessary files after a complete
build from scratch: when it compiles anything by combining several
pattern rules (which is exactly what we use), it deletes some of the intermediate
files. The fix is to specify all of these files as ".SECONDARY", but beware,
these special targets don't understand patterns, so we have to list the
intermediates explicitly. Uff.

Martin Mares [Mon, 19 Aug 2002 22:06:59 +0000 (22:06 +0000)]

Deleted too much.

Martin Mares [Mon, 19 Aug 2002 21:16:13 +0000 (21:16 +0000)]

Site compression turned into a configurable feature.

Martin Mares [Sun, 18 Aug 2002 09:48:38 +0000 (09:48 +0000)]

Moved customizations for Centrum to centrum/custom.

lib/custom.h now contains only generic definitions which will appear
in the freely distributable version.

Martin Mares [Sun, 18 Aug 2002 09:33:38 +0000 (09:33 +0000)]

After discussing the future of the SQL gatherer module with Milan,
we've decided to remove it (but it still remains in the CVS history,
of course).

Martin Mares [Sat, 17 Aug 2002 12:01:53 +0000 (12:01 +0000)]

Cut-and-paste comments :-)

Martin Mares [Wed, 17 Jul 2002 19:14:09 +0000 (19:14 +0000)]

Failed ASSERT now dumps core.

Martin Mares [Wed, 17 Jul 2002 02:34:43 +0000 (02:34 +0000)]

Oops, a memory leak in the presorter.

Martin Mares [Tue, 16 Jul 2002 22:04:04 +0000 (22:04 +0000)]

Expand shell metacharacters (most importantly "~") in destination name.

Martin Mares [Tue, 16 Jul 2002 21:55:15 +0000 (21:55 +0000)]

No longer a development version.

Martin Mares [Tue, 16 Jul 2002 21:54:51 +0000 (21:54 +0000)]

Added an "install" target.

Martin Mares [Fri, 12 Jul 2002 02:35:49 +0000 (02:35 +0000)]

Oops, forgot a ')'.

Martin Mares [Fri, 12 Jul 2002 02:19:23 +0000 (02:19 +0000)]

WORD_TYPES_HIDDEN shouldn't be considered META by default.

WT_LINK shouldn't be considered accent-less. This might cause sherlockd
to fail to find matches in link texts from non-accented documents to
accented ones, but I think that it's more acceptable than producing
false matches. Unfortunately, we have no way to describe the accentedness
of a part of the document text.

Robert Spalek [Wed, 10 Jul 2002 14:47:26 +0000 (14:47 +0000)]

Added v?xprintf() functions; they will be used in the filter dumper.

They can be used, for example, for printf()'ing to anything (like fastbufs).

Robert Spalek [Wed, 10 Jul 2002 13:56:06 +0000 (13:56 +0000)]

Fixed HASH_WANT_FIND_NEXT and changed its declaration.

Robert Spalek [Wed, 10 Jul 2002 12:58:36 +0000 (12:58 +0000)]

Added the capability of hashing/finding multiple records with the same
key value.
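
A sketch of the idea on a hypothetical chained hash table (struct hrec, h_insert, h_find and h_find_next are illustrative names, not the generated HASH_* interface): records with equal keys land in the same chain, so finding the next match simply continues scanning that chain after the previous one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HSIZE 64

struct hrec {
  struct hrec *next;         /* next record in the same hash chain */
  const char *key;
  int value;
};

static struct hrec *table[HSIZE];

/* A simple multiplicative string hash */
static unsigned hash_str(const char *s)
{
  unsigned h = 5381;
  while (*s)
    h = h * 33 + (unsigned char)*s++;
  return h % HSIZE;
}

/* Duplicates are allowed: just prepend to the chain */
static struct hrec *h_insert(const char *key, int value)
{
  struct hrec *r = malloc(sizeof(*r));
  assert(r);
  unsigned i = hash_str(key);
  r->key = key;
  r->value = value;
  r->next = table[i];
  table[i] = r;
  return r;
}

/* First record with this key at or after r in its chain */
static struct hrec *scan(struct hrec *r, const char *key)
{
  while (r && strcmp(r->key, key))
    r = r->next;
  return r;
}

static struct hrec *h_find(const char *key)
{
  return scan(table[hash_str(key)], key);
}

static struct hrec *h_find_next(struct hrec *prev)
{
  return scan(prev->next, prev->key);
}
```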

Martin Mares [Sat, 6 Jul 2002 03:29:41 +0000 (03:29 +0000)]

Increase line buffer sizes to 4096 bytes. The current gatherd really can
produce such long lines under some circumstances; need to examine
how that is possible.

Martin Mares [Fri, 5 Jul 2002 03:24:25 +0000 (03:24 +0000)]

Due to a bug, the "fsck" mode was unable to fix broken trailers.

Martin Mares [Fri, 5 Jul 2002 03:23:13 +0000 (03:23 +0000)]

When an inconsistency is encountered while shaking down the bucket
file, recover all data prior to the inconsistency by marking the
space between read and write pointer as deleted buckets (need to
use more of them if the space is too large).

Martin Mares [Thu, 27 Jun 2002 19:42:48 +0000 (19:42 +0000)]

When moving attributes, don't break the chain.

Martin Mares [Sun, 23 Jun 2002 20:32:19 +0000 (20:32 +0000)]

Implemented merging of catalog attributes to the index. Just place the
catalog dump to db/catalog.gz (e.g., by running utils/fetch-cat.sh)
and run the indexer.

Unfortunately, we've just filled up all the available word types :-(

Martin Mares [Sun, 23 Jun 2002 16:21:10 +0000 (16:21 +0000)]

Removed obsolete examples of custom attributes. The image search attributes
themselves are a good enough example.

Martin Mares [Sun, 23 Jun 2002 16:01:16 +0000 (16:01 +0000)]

When O_APPEND is given to bopen(), don't forget to set fb->pos and fb->fdpos.

Martin Mares [Sat, 22 Jun 2002 16:42:45 +0000 (16:42 +0000)]

dmalloc and efence work again (ported from rel-2.1 branch).

Martin Mares [Wed, 19 Jun 2002 14:10:55 +0000 (14:10 +0000)]

malloc -> xmalloc.

Martin Mares [Tue, 18 Jun 2002 17:37:53 +0000 (17:37 +0000)]

Introduced SKIP_TAGGED_CHAR.

UCW libraries