mj.ucw.cz Git - libucw.git/log

Rewritten shake down of bucket file.

  o  Replaced read and write buffers by a single shared buffer.
     This should be somewhat faster (with the same size of memory invested
     to buffers).
  o  If ShakeSecurity is set to 2, shaking down should be reliable under
     all circumstances, including server reboots and broken bucket files.
     Buckettool -F still needs to be run after a failed shakedown and
     oid's need to be synchronized with the outside world, but no buckets
     will be lost (only some of them may be duplicated).
  o  The callback function (`the kibitz') is now allowed not only to decide
     which buckets will be kept, but also to alter contents of the buckets
     provided that it won't enlarge the bucket.

I tried to be very careful and tested the new routine thoroughly, but since
it's a pretty critical place, I would be very happy if somebody checks it
independently.
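
A minimal in-memory sketch of the callback contract described above: the kibitz decides whether a bucket is kept and may shrink it, but must never enlarge it. The names `struct rec`, `shake_down` and `keep_even_oids` are hypothetical, and the real routine works on a bucket file rather than on an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size stand-in for a bucket. */
struct rec { unsigned oid; unsigned len; char data[16]; };

/* The callback returns nonzero to keep the record and may reduce r->len. */
typedef int (*kibitz_fn)(struct rec *r);

/* Compact the array in place and return the number of records kept. */
size_t shake_down(struct rec *recs, size_t n, kibitz_fn kibitz)
{
  size_t out = 0;
  for (size_t in = 0; in < n; in++) {
    unsigned old_len = recs[in].len;
    if (!kibitz(&recs[in]))
      continue;                         /* dropped: its slot gets reused */
    assert(recs[in].len <= old_len);    /* the callback must not enlarge */
    if (out != in)
      recs[out] = recs[in];
    out++;
  }
  return out;
}

/* Example callback: keep even oids only and truncate them to 4 bytes. */
int keep_even_oids(struct rec *r)
{
  if (r->oid % 2)
    return 0;
  if (r->len > 4)
    r->len = 4;
  return 1;
}
```

Because a kept record never grows, the write position can never overtake the read position, which is what makes compaction in place possible.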

Martin Mares [Sun, 11 Jan 2004 00:07:54 +0000 (00:07 +0000)]

Don't create large bit arrays on stack. (The default stack limit on Linux is 2MB.)
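
A minimal sketch of the idea, assuming hypothetical helper names (`bits_alloc`, `bits_set`, `bits_test`) that are not the library's own bit array interface: a large bit array is taken from the heap, so its size is not bounded by the stack limit.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef unsigned int bitword_t;
#define BITS_PER_WORD (sizeof(bitword_t) * CHAR_BIT)

/* Allocate a zeroed array of n bits on the heap; returns NULL on failure. */
bitword_t *bits_alloc(size_t n)
{
  size_t words = (n + BITS_PER_WORD - 1) / BITS_PER_WORD;
  return calloc(words ? words : 1, sizeof(bitword_t));
}

void bits_set(bitword_t *a, size_t i)
{
  assert(a != NULL);
  a[i / BITS_PER_WORD] |= (bitword_t)1 << (i % BITS_PER_WORD);
}

int bits_test(const bitword_t *a, size_t i)
{
  assert(a != NULL);
  return (int)((a[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1);
}
```

The caller releases the array with `free()` once it is no longer needed.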

Martin Mares [Sat, 10 Jan 2004 13:46:43 +0000 (13:46 +0000)]

GLUE_ again.

Martin Mares [Sat, 10 Jan 2004 13:44:38 +0000 (13:44 +0000)]

Use GLUE_ instead of HASH_GLUE.

Martin Mares [Sat, 10 Jan 2004 13:44:14 +0000 (13:44 +0000)]

Added GLUE and GLUE_ macros.

I originally wanted to use them in the new pre-sorter and didn't need them
afterwards, but they are useful anyway.
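
A sketch of what such macros typically look like, assuming plain token pasting; the definitions below and the helper `GLUE_EXPANDED_` are illustrative and may differ from the ones committed.

```c
#include <assert.h>

/* Assumed definitions: paste two tokens, without and with an underscore. */
#define GLUE(x, y) x##y
#define GLUE_(x, y) x##_##y

/* Operands of ## are not macro-expanded, so a prefix that is itself
 * a macro has to pass through one more level before pasting. */
#define PREFIX my
#define GLUE_EXPANDED_(x, y) GLUE_(x, y)

int GLUE(foo, bar) = 1;                  /* declares foobar */
int GLUE_(foo, bar) = 2;                 /* declares foo_bar */
int GLUE_EXPANDED_(PREFIX, count) = 3;   /* declares my_count */

void glue_demo(void)
{
  assert(foobar == 1);
  assert(foo_bar == 2);
  assert(my_count == 3);
}
```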

Martin Mares [Sat, 10 Jan 2004 13:41:09 +0000 (13:41 +0000)]

When pre-sorting a regular file, use lib/arraysort.h on an array of items
instead of the default merge-sort type algorithm working with linked lists.

This is much faster -- e.g., the sorting in shep-export on the current
Sherlock3 database now takes 54 sec instead of 669 :-)

However, to accomplish this I had to change two invariants:

  (1) SORT_REGULAR now means not only that the input has regular structure,
      but also that each item is reasonably small (i.e., we can use
      sorting by exchanging in place).

  (2) If SORT_PRESORT is enabled, the comparison function can be called
      with both keys equal. This trips ASSERT's in various places which
      originally helped a lot during debugging, so I decided to add
      a SORT_UNIQUE switch which in DEBUG mode causes the sorter to
      ensure that all keys are distinct, so we can remove the ASSERT's.
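
The two invariants can be illustrated with a small sketch. It uses the standard `qsort()` as a stand-in for lib/arraysort.h, and `struct item` and `presort_items` are hypothetical names: small fixed-size items are sorted in place, and a separate pass reports whether all keys are distinct, which is roughly the check SORT_UNIQUE is described as doing in DEBUG mode.

```c
#include <assert.h>
#include <stdlib.h>

/* A small fixed-size item, cheap to exchange in place. */
struct item { unsigned key; unsigned value; };

/* The comparison may be called with equal keys and must return 0 then. */
int item_cmp(const void *a, const void *b)
{
  const struct item *x = a, *y = b;
  return (x->key > y->key) - (x->key < y->key);
}

/* Sort the array in place; return 1 if all keys are distinct, else 0. */
int presort_items(struct item *items, size_t n)
{
  assert(items != NULL || n == 0);
  if (n)
    qsort(items, n, sizeof(*items), item_cmp);
  for (size_t i = 1; i < n; i++)
    if (items[i - 1].key == items[i].key)
      return 0;
  return 1;
}
```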

As both the Shepherd and the Indexer now rely heavily on sorting, it might
be worth a try to optimize the sorter even further, maybe by utilizing
polyphase sorting or something like that, since the run sizes often seem to
be distributed unevenly.

Martin Mares [Sat, 10 Jan 2004 12:43:54 +0000 (12:43 +0000)]

Use HASH_USE_POOL for configuration space allocations.

Martin Mares [Sat, 10 Jan 2004 12:41:52 +0000 (12:41 +0000)]

Added HASH_AUTO_POOL option.

Tomas Valla [Tue, 23 Dec 2003 18:41:22 +0000 (18:41 +0000)]

Do not print "[]".

Tomas Valla [Tue, 23 Dec 2003 00:18:53 +0000 (00:18 +0000)]

Allow modules to change the log title, second attempt.

Tomas Valla [Mon, 22 Dec 2003 19:29:39 +0000 (19:29 +0000)]

Other modules should be able to modify the log title.

Martin Mares [Mon, 15 Dec 2003 19:20:47 +0000 (19:20 +0000)]

Another debugging switch: dump core on fatal errors.

Martin Mares [Mon, 15 Dec 2003 19:20:18 +0000 (19:20 +0000)]

The debugging memory allocator is now enabled by DEBUG_DMALLOC instead
of just "DMALLOC".

Robert Spalek [Thu, 11 Dec 2003 11:55:45 +0000 (11:55 +0000)]

deleted comment about fprecog

Martin Mares [Sun, 7 Dec 2003 14:23:58 +0000 (14:23 +0000)]

Improved and cleaned up the bucket library. The original "single operation
pending per process" invariant was no longer feasible (and it caused several
problems in Shepherd).

Reading and writing of buckets now uses dynamically allocated fastbufs and
there can be any number of readers at any time, but only a single writer
(otherwise a deadlock would occur). Read streams are seekable, write streams
at least btell()-able.

Also removed the omnipresent global variables for start of current bucket
etc.; each part (Find, Slurp, Create, Shakedown, ...) now has its own state
variables.

Added some more sanity checks.

Robert Spalek [Wed, 3 Dec 2003 13:04:36 +0000 (13:04 +0000)]

index version reverted to v2.6 subversion 2, because it is compatible now

Robert Spalek [Tue, 2 Dec 2003 14:08:30 +0000 (14:08 +0000)]

index version incremented due to lexmap.h change
anyway, we wanted to change 26 -> 30 some day

Martin Mares [Sat, 29 Nov 2003 11:47:02 +0000 (11:47 +0000)]

One more item type: u64.

Martin Mares [Sat, 29 Nov 2003 11:25:09 +0000 (11:25 +0000)]

Two improvements to the configuration language:

o Floating point item type introduced.
o Both integer and floating point numbers can be suffixed with a unit.

Also, I've exported parsing of integers and doubles for the convenience
of CT_FUNCTION callbacks.
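
A minimal sketch of parsing an integer with a unit suffix. The function name `parse_int_with_unit` and the set of suffixes (k = 1000, K = 1024, M = 1048576) are assumptions made for illustration, not the exported parser or the units the configuration language actually accepts.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse "<integer>[unit]" into *out. Returns 0 on success, -1 on error. */
int parse_int_with_unit(const char *s, long *out)
{
  assert(s != NULL && out != NULL);
  char *end;
  errno = 0;
  long v = strtol(s, &end, 10);
  if (end == s || errno == ERANGE)
    return -1;                          /* no digits or out of range */

  long mult = 1;
  switch (*end) {
    case '\0': break;                   /* plain number, no unit */
    case 'k': mult = 1000; end++; break;
    case 'K': mult = 1024; end++; break;
    case 'M': mult = 1024L * 1024; end++; break;
    default: return -1;                 /* unknown unit */
  }
  if (*end != '\0')
    return -1;                          /* trailing garbage after the unit */
  if (v > LONG_MAX / mult || v < LONG_MIN / mult)
    return -1;                          /* the multiplication would overflow */
  *out = v * mult;
  return 0;
}
```

A callback of the CT_FUNCTION kind could call such a helper on its argument string and report an error when it returns -1.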

Robert Spalek [Wed, 26 Nov 2003 17:30:58 +0000 (17:30 +0000)]

no need to cut www-prefix twice

Robert Spalek [Tue, 25 Nov 2003 16:11:57 +0000 (16:11 +0000)]

do not replace target url-equiv

Martin Mares [Sat, 22 Nov 2003 18:22:34 +0000 (18:22 +0000)]

Replaced obuck_fetch_end() by bclose() (which is a nop as obuck_fetch_end was :) ).

Martin Mares [Sat, 22 Nov 2003 18:21:22 +0000 (18:21 +0000)]

Added very simple functions for emulating a fastbuf stream over a static
buffer. The struct fastbuf is allocated statically as well to make everything
as simple and as fast as possible.
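
To illustrate the general idea of a read stream over a fixed buffer, here is a sketch with a hypothetical `struct membuf`; it is not the fastbuf interface and only mimics the principle that no allocation and no I/O takes place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read cursor over caller-owned memory. */
struct membuf { const unsigned char *pos, *end; };

void membuf_init(struct membuf *b, const void *data, size_t len)
{
  assert(b != NULL && data != NULL);
  b->pos = data;
  b->end = b->pos + len;
}

/* Return the next byte, or -1 once the buffer is exhausted. */
int membuf_getc(struct membuf *b)
{
  return b->pos < b->end ? *b->pos++ : -1;
}

/* Copy up to len bytes to dst; return the number of bytes copied. */
size_t membuf_read(struct membuf *b, void *dst, size_t len)
{
  size_t avail = (size_t)(b->end - b->pos);
  if (len > avail)
    len = avail;
  memcpy(dst, b->pos, len);
  b->pos += len;
  return len;
}
```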

Robert Spalek [Mon, 17 Nov 2003 13:09:44 +0000 (13:09 +0000)]

1. db/catalog.gz ---> db/catalog
+ it is not sent to oook and feedback-cat via pipes, but it is read by them as a file
+ it is read in 2 passes and the URL's are identified in the 1st phase (catalog.c)

2. URL fingerprinting always uses cf/url-equiv, even in the indexer

Martin Mares [Sat, 15 Nov 2003 10:41:41 +0000 (10:41 +0000)]

A better function for hashing integers (the old multiplier was completely
bogus as it didn't fit in a 32-bit integer) and also a new function
for hashing pointers.
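
A sketch of multiplicative hashing with a constant that does fit in 32 bits. The multiplier 2654435761 is a widely used choice for this technique and is an assumption here, not necessarily the constant this commit introduced; the function names are hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative hash: the product wraps modulo 2^32. */
uint32_t hash_u32_sketch(uint32_t x)
{
  return x * UINT32_C(2654435761);
}

/* Hash a pointer by folding its upper half into the lower 32 bits.
 * The two 16-bit shifts stay well defined even for 32-bit pointers. */
uint32_t hash_ptr_sketch(const void *p)
{
  uintptr_t v = (uintptr_t)p;
  return hash_u32_sketch((uint32_t)(v ^ (v >> 16 >> 16)));
}

void hash_demo(void)
{
  int x;
  assert(hash_u32_sketch(1) == UINT32_C(2654435761));
  assert(hash_ptr_sketch(&x) == hash_ptr_sketch(&x));
}
```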

Robert Spalek [Thu, 13 Nov 2003 10:43:07 +0000 (10:43 +0000)]

I decided to turn off cf/url-equiv for indexation. however, after the indexer
is run on regular sherlock5, we cannot manually delete this file for indexer
and restore for gatherd. so I am creating a new parameter that controls
loading this prefix table.

Tomas Valla [Thu, 6 Nov 2003 16:53:58 +0000 (16:53 +0000)]

Added some headers to avoid confusion of our own developers ;)

Martin Mares [Wed, 5 Nov 2003 22:00:18 +0000 (22:00 +0000)]

Added special mode for sorting of regular files.

Martin Mares [Wed, 5 Nov 2003 20:43:27 +0000 (20:43 +0000)]

bbcopy() can be asked to copy the rest of the input file by specifying
a length of ~0U.
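
The sentinel convention can be sketched with standard stdio streams in place of fastbufs. `copy_stream` and `COPY_ALL` are hypothetical names, and how the real bbcopy() reacts to short input is not shown here.

```c
#include <assert.h>
#include <stdio.h>

#define COPY_ALL (~0U)   /* sentinel length: copy until end of input */

/* Copy up to len bytes from in to out, or everything for COPY_ALL.
 * Returns the number of bytes actually copied. */
unsigned long copy_stream(FILE *in, FILE *out, unsigned len)
{
  assert(in != NULL && out != NULL);
  char buf[4096];
  unsigned long total = 0;
  while (len) {
    size_t want = sizeof(buf);
    if (len != COPY_ALL && len < want)
      want = len;
    size_t got = fread(buf, 1, want, in);
    if (!got)
      break;                            /* end of input or read error */
    if (fwrite(buf, 1, got, out) != got)
      break;                            /* write error */
    total += got;
    if (len != COPY_ALL)
      len -= (unsigned)got;             /* the sentinel never counts down */
  }
  return total;
}
```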

Martin Mares [Wed, 5 Nov 2003 20:42:20 +0000 (20:42 +0000)]

Undefine all the parameter macros at the end. (The hash tables already do it
and it turned out to be very useful.)

Robert Spalek [Mon, 3 Nov 2003 17:13:06 +0000 (17:13 +0000)]

page_size -> PAGE_SIZE

Tomas Valla [Mon, 3 Nov 2003 15:26:14 +0000 (15:26 +0000)]

And do not forget .cvsignore, of course.

Robert Spalek [Mon, 3 Nov 2003 14:35:50 +0000 (14:35 +0000)]

- giant class flag moved from attributes to card-notes
- merger only marks documents by giant flag and
  the penalization is done in chewer
- added new weight attribute to cards: Wp means weight after penalization
  added penalization notes in the form .Pg-50 (giant class penalized by -50)
- chewer.c: card_write_start() does NOT write struct card_attr and it needs
  to be done manually later
- chewer.c: weight records are sorted chronologically, I like it more :-)

Robert Spalek [Mon, 3 Nov 2003 14:15:11 +0000 (14:15 +0000)]

this could never have worked

Tomas Valla [Mon, 3 Nov 2003 12:09:04 +0000 (12:09 +0000)]

CVS should ignore files created by compiling Ulimit

Tomas Valla [Mon, 3 Nov 2003 12:04:45 +0000 (12:04 +0000)]

to suppress annoying warning messages during make clean

Robert Spalek [Sun, 2 Nov 2003 18:24:18 +0000 (18:24 +0000)]

INDEX_VERSION fixed

Robert Spalek [Fri, 31 Oct 2003 10:23:21 +0000 (10:23 +0000)]

indexer rewritten to generate redirect brackets
+ code written, debugged, and polished

in particular, labels contain new attribute redir_id and attribute priority
has been deleted

we need to update search/cards.c

Tomas Valla [Sat, 25 Oct 2003 19:22:57 +0000 (19:22 +0000)]

Perl module for setting ulimits.
Should solve bug #538.
[warning - compiling perlXS is ugly ;) ]

Robert Spalek [Sat, 25 Oct 2003 14:03:52 +0000 (14:03 +0000)]

introduced type bitarray_t

Martin Mares [Sat, 25 Oct 2003 09:51:00 +0000 (09:51 +0000)]

We don't need this in v3.0.

Martin Mares [Sun, 19 Oct 2003 18:19:59 +0000 (18:19 +0000)]

Forgot to add fb-limfd.

Martin Mares [Sun, 19 Oct 2003 16:47:55 +0000 (16:47 +0000)]

Replaced the "orig_len" field in bucket headers which was never used
for anything useful by bucket type code.

Martin Mares [Sun, 19 Oct 2003 16:47:06 +0000 (16:47 +0000)]

Added fastbuf backend for reading from file descriptors with a given limit.
(Very useful for communication over sockets.)
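
A sketch of the underlying idea using plain POSIX `read()`: the reader never consumes more than a given number of bytes from the descriptor, so whatever follows on the socket stays untouched. `struct limfd` and `limfd_read` are hypothetical names, not the backend's interface.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state: a descriptor and the bytes still allowed to be read. */
struct limfd { int fd; size_t left; };

/* Read at most len bytes without exceeding the remaining limit.
 * Returns 0 once the limit is used up, otherwise the result of read(). */
ssize_t limfd_read(struct limfd *l, void *buf, size_t len)
{
  assert(l != NULL && buf != NULL);
  if (len > l->left)
    len = l->left;
  if (!len)
    return 0;
  ssize_t n = read(l->fd, buf, len);
  if (n > 0)
    l->left -= (size_t)n;
  return n;
}
```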

Martin Mares [Sun, 19 Oct 2003 16:46:06 +0000 (16:46 +0000)]

Too much copying and pasting :-)

Martin Mares [Wed, 15 Oct 2003 18:32:18 +0000 (18:32 +0000)]

Good luck, v3.0!

Martin Mares [Wed, 15 Oct 2003 16:53:18 +0000 (16:53 +0000)]

Updated version numbers.

Martin Mares [Sat, 11 Oct 2003 20:14:23 +0000 (20:14 +0000)]

Added UTF8_SKIP_BWD.
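
Skipping backwards over one UTF-8 character amounts to stepping back past continuation bytes (those of the form 10xxxxxx) until a lead byte is found. The function below is a sketch of that technique with a hypothetical name and an explicit start bound; the actual macro may look different.

```c
#include <assert.h>

/* Move p back to the first byte of the previous UTF-8 character.
 * The caller guarantees that at least one byte precedes p. */
const unsigned char *utf8_skip_bwd(const unsigned char *start,
                                   const unsigned char *p)
{
  assert(p > start);
  do
    p--;
  while (p > start && (*p & 0xC0) == 0x80);   /* skip continuation bytes */
  return p;
}
```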

Martin Mares [Sat, 11 Oct 2003 11:54:34 +0000 (11:54 +0000)]

Oops, forgot the values.

Martin Mares [Sat, 11 Oct 2003 10:20:09 +0000 (10:20 +0000)]

New tables generated from UnicodeData 4.0.1 using the new scripts.

Martin Mares [Sat, 11 Oct 2003 10:19:40 +0000 (10:19 +0000)]

Several improvements to the unicode library:

  o  All tables are now const.
  o  Redefined the categories:
       - now using _U_* instead of _C_*
       - introduced _U_LETTER modified with either _U_UPPER or _U_LOWER
         or none (titlecase letters, letter modifiers etc.)
  o  Added the ligature expansions and _U_LIGATURE.
  o  Minor cleanups.

Martin Mares [Sat, 11 Oct 2003 10:17:09 +0000 (10:17 +0000)]

Don't forget the ligtable.

Martin Mares [Sat, 11 Oct 2003 10:16:55 +0000 (10:16 +0000)]

Constified.

Martin Mares [Sat, 11 Oct 2003 10:16:31 +0000 (10:16 +0000)]

Added a table of compatibility ligature expansions.

Martin Mares [Sat, 11 Oct 2003 10:13:20 +0000 (10:13 +0000)]

Added const to chartype tables. Also removed _c_collate and _c_order
which haven't existed since the last glacial era.

Martin Mares [Sat, 11 Oct 2003 09:06:22 +0000 (09:06 +0000)]

One more :)

Martin Mares [Sat, 11 Oct 2003 09:05:47 +0000 (09:05 +0000)]

Expect the unicode data directory to be linked to by "unidata".

Martin Mares [Sat, 11 Oct 2003 09:04:09 +0000 (09:04 +0000)]

Updated to new names of scripts.

Martin Mares [Sat, 11 Oct 2003 09:00:20 +0000 (09:00 +0000)]

Renamed unisplit to gen-basic.

Martin Mares [Sat, 11 Oct 2003 08:58:22 +0000 (08:58 +0000)]

Renamed tabgen to gen-charconv.

Martin Mares [Sat, 11 Oct 2003 08:58:12 +0000 (08:58 +0000)]

Renamed mkunacc to gen-unacc.

Martin Mares [Sat, 11 Oct 2003 08:55:38 +0000 (08:55 +0000)]

Renamed charset import scripts.

Martin Mares [Sat, 11 Oct 2003 08:53:40 +0000 (08:53 +0000)]

Renamed mkuni to add-charnames and changed the path to the UnicodeData file.

Martin Mares [Sat, 11 Oct 2003 08:52:30 +0000 (08:52 +0000)]

Obsolete and also some of the Slovak characters were missing.

Martin Mares [Sat, 11 Oct 2003 08:48:07 +0000 (08:48 +0000)]

This was testing functions which didn't exist :-)

Martin Mares [Sat, 11 Oct 2003 08:46:58 +0000 (08:46 +0000)]

The signature charset hasn't been used for ages.

Martin Mares [Fri, 10 Oct 2003 18:01:39 +0000 (18:01 +0000)]

Export cfpool -- sometimes it's much more convenient to pass just a pool than
a pointer to an allocation function.

Martin Mares [Fri, 3 Oct 2003 16:41:42 +0000 (16:41 +0000)]

Added a simple utility for generating changelogs.

Martin Mares [Fri, 3 Oct 2003 09:33:49 +0000 (09:33 +0000)]

These files have been obsoleted by the new customization system.

Martin Mares [Fri, 3 Oct 2003 09:29:58 +0000 (09:29 +0000)]

Search for custom.h at the right place.

Martin Mares [Thu, 2 Oct 2003 11:24:38 +0000 (11:24 +0000)]

Added a hook for indexing custom string types.

Martin Mares [Sat, 27 Sep 2003 19:43:36 +0000 (19:43 +0000)]

Added a lot of missing #include <alloca.h>'s.

Martin Mares [Fri, 26 Sep 2003 14:11:55 +0000 (14:11 +0000)]

Added charconv wrapper around fastbuf (currently output only).

Martin Mares [Fri, 26 Sep 2003 11:16:26 +0000 (11:16 +0000)]

EXTRA_RUNDIRS needn't form a strict hierarchy, so add a -p.

Martin Mares [Tue, 23 Sep 2003 16:20:10 +0000 (16:20 +0000)]

Added a set of functions for sliding window mmapping of large files.
Will be used by the indexer to access the card notes array.

Martin Mares [Wed, 17 Sep 2003 12:36:44 +0000 (12:36 +0000)]

Replaced enums by #define's in definitions of word, meta and string types.
It's less elegant, but it gives a chance to detect whether a specific type
exists or not.
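
The reason is that enumerators are invisible to the preprocessor, while macros can be tested with #ifdef. A small sketch with hypothetical type names (`WT_TEXT`, `WT_EMPH`, `WT_SMALL`, `MT_ENUM_TITLE`):

```c
#include <assert.h>

/* Enumerators exist only for the compiler proper. */
enum meta_type_enum { MT_ENUM_TITLE, MT_ENUM_KEYWORD };

/* Macros exist for the preprocessor, so #ifdef can see them. */
#define WT_TEXT 1
#define WT_EMPH 2

/* Count the types that conditional compilation is able to detect. */
int count_detected_types(void)
{
  int n = 0;
#ifdef WT_TEXT
  n++;
#endif
#ifdef WT_EMPH
  n++;
#endif
#ifdef WT_SMALL          /* not defined above, so this branch is skipped */
  n++;
#endif
#ifdef MT_ENUM_TITLE     /* an enumerator, so this is skipped as well */
  n += 100;
#endif
  return n;
}

void detect_demo(void)
{
  assert(count_detected_types() == 2);
}
```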

Martin Mares [Mon, 15 Sep 2003 07:45:47 +0000 (07:45 +0000)]

Allow submakefiles to add their own installation directories and to override
the run/bin directory. Propagate the directories to the installer.

Martin Mares [Fri, 29 Aug 2003 17:39:34 +0000 (17:39 +0000)]

Updated the installation script to always check for missing directories.

Tomas Valla [Sun, 10 Aug 2003 01:30:50 +0000 (01:30 +0000)]

Recognition of variable types in parse_args is now automatic.

Tomas Valla [Sun, 20 Jul 2003 19:17:22 +0000 (19:17 +0000)]

Added 'array' feature to handle multiple variable occurrences.

Tomas Valla [Thu, 10 Jul 2003 18:12:57 +0000 (18:12 +0000)]

Just to make it more comfortable.

Tomas Valla [Wed, 9 Jul 2003 01:29:16 +0000 (01:29 +0000)]

Patch to allow processing of multiple occurrences of the same argument.
Now it returns a string of values separated by "&".

Robert Spalek [Fri, 4 Jul 2003 13:17:24 +0000 (13:17 +0000)]

fixed headers

Robert Spalek [Fri, 4 Jul 2003 13:14:25 +0000 (13:14 +0000)]

fixed generated header comment

Robert Spalek [Fri, 4 Jul 2003 12:53:45 +0000 (12:53 +0000)]

regenerated by misc/generate from updated charset tables

Robert Spalek [Fri, 4 Jul 2003 12:52:02 +0000 (12:52 +0000)]

added (and renamed) all iso-8859-* charsets

Robert Spalek [Fri, 4 Jul 2003 12:49:19 +0000 (12:49 +0000)]

added (and renamed) all iso-8859-{1,2,...,16} charsets

Robert Spalek [Fri, 4 Jul 2003 12:48:39 +0000 (12:48 +0000)]

adapted to UNDEFINED characters

Robert Spalek [Fri, 4 Jul 2003 12:47:44 +0000 (12:47 +0000)]

upgraded from ftp.unicode.org and also renamed

Robert Spalek [Fri, 4 Jul 2003 12:46:59 +0000 (12:46 +0000)]

updated according to the newest tables downloaded from ftp.unicode.org

Robert Spalek [Fri, 4 Jul 2003 12:27:32 +0000 (12:27 +0000)]

imported by `trunicode` from ftp.unicode.org

Robert Spalek [Fri, 4 Jul 2003 12:26:26 +0000 (12:26 +0000)]

added a tool for importing mappings from ftp.unicode.org
I would rather use this source than `recode`

Martin Mares [Mon, 30 Jun 2003 11:18:57 +0000 (11:18 +0000)]

Several changes mixed into one commit (sorry, the CVS didn't work for a long time):

o  Changed index format ID.
o  MAX_COMPLEX_LEN went with the rest of complexes.
o  Introduced data types and handling macros for context bucket ID's.
o  Returned fp_hash() to its original definition -- the previous "fix" was
   deadly wrong: I confused indexing of bytes with indexing of words.
   Also, the fp_hash() has to be monotonic wrt. fpsort's order which the
   new one wasn't.
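
The monotonicity requirement can be illustrated with a sketch. It assumes that fingerprints are byte arrays and that the sort order is a bytewise comparison; the 12-byte size and all names below are assumptions. Taking the leading bytes in big-endian order then yields a hash that never decreases along the sort order, whereas reading the same bytes as a little-endian word would not have this property.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct fingerprint { unsigned char hash[12]; };   /* size is an assumption */

/* Assumed sort order: bytewise, most significant byte first. */
int fp_cmp(const struct fingerprint *a, const struct fingerprint *b)
{
  return memcmp(a->hash, b->hash, sizeof(a->hash));
}

/* Hash built from the first four bytes, read as a big-endian number. */
uint32_t fp_hash_sketch(const struct fingerprint *fp)
{
  return (uint32_t)fp->hash[0] << 24 | (uint32_t)fp->hash[1] << 16 |
         (uint32_t)fp->hash[2] << 8 | (uint32_t)fp->hash[3];
}

/* Debug helper: abort if the hash order contradicts the sort order. */
void fp_check_monotonic(const struct fingerprint *a,
                        const struct fingerprint *b)
{
  if (fp_cmp(a, b) <= 0)
    assert(fp_hash_sketch(a) <= fp_hash_sketch(b));
}
```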

Martin Mares [Mon, 30 Jun 2003 11:17:00 +0000 (11:17 +0000)]

Define CONFIG_CONTEXTS whenever we use contexts.
