mj.ucw.cz Git - libucw.git/log
libucw.git
Martin Mares [Sat, 10 Jan 2004 13:46:43 +0000 (13:46 +0000)]
GLUE_ again.

Martin Mares [Sat, 10 Jan 2004 13:44:38 +0000 (13:44 +0000)]
Use GLUE_ instead of HASH_GLUE.

Martin Mares [Sat, 10 Jan 2004 13:44:14 +0000 (13:44 +0000)]
Added GLUE and GLUE_ macros.

I originally wanted to use them in the new pre-sorter and didn't need them
afterwards, but they are useful anyway.
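
A minimal sketch of what such token-pasting macros typically look like. The
definitions below are assumed from the macro names (the log does not show
them), and P(), demo_counter and demo_next are hypothetical names used only
for illustration:

```c
/* Assumed definitions; the real libucw macros may differ in detail. */
#define GLUE(x, y) x##y
#define GLUE_(x, y) x##_##y

/* Hypothetical use: build prefixed identifiers from a short name. */
#define P(name) GLUE_(demo, name)

static int P(counter) = 41;     /* expands to: static int demo_counter = 41; */

static int P(next)(void)        /* expands to: static int demo_next(void) */
{
  return ++P(counter);
}
```

The underscore variant is convenient for generic headers that derive all
their identifiers from one user-supplied prefix.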

Martin Mares [Sat, 10 Jan 2004 13:41:09 +0000 (13:41 +0000)]
When pre-sorting a regular file, use lib/arraysort.h on an array of items
instead of the default merge-sort type algorithm working with linked lists.

This is much faster -- e.g., the sorting in shep-export on the current
Sherlock3 database now takes 54 sec instead of 669 :-)

However, to accomplish this I had to change two invariants:

  (1) SORT_REGULAR now means not only that the input has regular structure,
      but also that each item is reasonably small (i.e., we can use
      sorting by exchanging in place).

  (2) If SORT_PRESORT is enabled, the comparison function can be called
      with both keys equal. This trips ASSERT's in various places which
      originally helped a lot during debugging, so I decided to add
      a SORT_UNIQUE switch which in DEBUG mode causes the sorter to
      ensure that all keys are distinct, so we can remove the ASSERT's.

As both the Shepherd and the Indexer now rely heavily on sorting, it might
be worth a try to optimize the sorter even further, maybe by utilizing
polyphase sorting or something like that; the run sizes often seem to be
distributed quite unevenly.
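
The idea of the pre-sorter can be sketched as follows, assuming small
fixed-size items that are cheap to exchange in place. The standard qsort()
stands in for lib/arraysort.h here, and struct item, item_cmp() and
presort_items() are hypothetical names, not the sorter's real interface:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical small, regular record. */
struct item { unsigned key; unsigned payload; };

/* The comparison must tolerate equal keys: an array sort may compare
 * an item with an equal one (or even with itself). */
static int item_cmp(const void *a, const void *b)
{
  const struct item *x = a, *y = b;
  return (x->key > y->key) - (x->key < y->key);
}

/* Sort the items in place. With `unique` set, verify afterwards that all
 * keys are distinct, in the spirit of SORT_UNIQUE in DEBUG mode. */
static void presort_items(struct item *items, size_t n, int unique)
{
  qsort(items, n, sizeof(*items), item_cmp);
  if (unique)
    for (size_t i = 1; i < n; i++)
      assert(items[i-1].key < items[i].key);
}
```

Moving the uniqueness check out of the comparison function is what allows
the comparison itself to stay free of assertions.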

Martin Mares [Sat, 10 Jan 2004 12:43:54 +0000 (12:43 +0000)]
Use HASH_USE_POOL for configuration space allocations.

Martin Mares [Sat, 10 Jan 2004 12:41:52 +0000 (12:41 +0000)]
Added HASH_AUTO_POOL option.

Tomas Valla [Tue, 23 Dec 2003 18:41:22 +0000 (18:41 +0000)]
Do not print "[]".

Tomas Valla [Tue, 23 Dec 2003 00:18:53 +0000 (00:18 +0000)]
Allow modules to change the log title, second attempt.

Tomas Valla [Mon, 22 Dec 2003 19:29:39 +0000 (19:29 +0000)]
Other modules should be able to modify the log title.

Martin Mares [Mon, 15 Dec 2003 19:20:47 +0000 (19:20 +0000)]
Another debugging switch: dump core on fatal errors.

Martin Mares [Mon, 15 Dec 2003 19:20:18 +0000 (19:20 +0000)]
The debugging memory allocator is now enabled by DEBUG_DMALLOC instead
of just "DMALLOC".

Robert Spalek [Thu, 11 Dec 2003 11:55:45 +0000 (11:55 +0000)]
deleted comment about fprecog

Martin Mares [Sun, 7 Dec 2003 14:23:58 +0000 (14:23 +0000)]
Improved and cleaned up the bucket library. The original "single operation
pending per process" invariant was no longer feasible (and it caused several
problems in Shepherd).

Reading and writing of buckets now uses dynamically allocated fastbufs and
there can be any number of readers at any time, but only a single writer
(otherwise a deadlock would occur). Read streams are seekable, write streams
at least btell()-able.

Also removed the omnipresent global variables for start of current bucket
etc., each part (Find, Slurp, Create, Shakedown, ...) has its own state
variables.

Added some more sanity checks.

Robert Spalek [Wed, 3 Dec 2003 13:04:36 +0000 (13:04 +0000)]
index version reverted to v2.6 subversion 2, because it is compatible now

Robert Spalek [Tue, 2 Dec 2003 14:08:30 +0000 (14:08 +0000)]
index version incremented due to lexmap.h change
anyway, we wanted to change 26 -> 30 some day

Martin Mares [Sat, 29 Nov 2003 11:47:02 +0000 (11:47 +0000)]
One more item type: u64.

Martin Mares [Sat, 29 Nov 2003 11:25:09 +0000 (11:25 +0000)]
Two improvements to the configuration language:

o  Floating point item type introduced.
o  Both integer and floating point numbers can be suffixed with a unit.

Also, I've exported parsing of integers and doubles for the convenience
of CT_FUNCTION callbacks.
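
A minimal sketch of parsing an integer followed by an optional unit. The
log does not list the accepted suffixes, so the unit table below (k, K, M)
and the function name parse_int_with_unit() are assumptions made for
illustration; the floating point case would follow the same pattern with
strtod():

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer with an optional one-letter unit suffix.
 * Returns 0 and stores the scaled value on success, -1 on a syntax or
 * range error. The suffix set is hypothetical. */
static int parse_int_with_unit(const char *s, long long *out)
{
  char *end;
  long long mult = 1;

  errno = 0;
  long long v = strtoll(s, &end, 10);
  if (end == s || errno)
    return -1;                          /* no digits or out of range */

  switch (*end) {
    case 0:   break;                    /* no unit */
    case 'k': mult = 1000; end++; break;
    case 'K': mult = 1024; end++; break;
    case 'M': mult = 1024 * 1024; end++; break;
    default:  return -1;                /* unknown unit */
  }
  if (*end)
    return -1;                          /* trailing garbage after the unit */

  *out = v * mult;                      /* overflow of the product is not checked */
  return 0;
}
```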

Robert Spalek [Wed, 26 Nov 2003 17:30:58 +0000 (17:30 +0000)]
no need to cut www-prefix twice

Robert Spalek [Tue, 25 Nov 2003 16:11:57 +0000 (16:11 +0000)]
do not replace target url-equiv

Martin Mares [Sat, 22 Nov 2003 18:22:34 +0000 (18:22 +0000)]
Replaced obuck_fetch_end() by bclose() (which is a nop as obuck_fetch_end was :) ).

Martin Mares [Sat, 22 Nov 2003 18:21:22 +0000 (18:21 +0000)]
Added very simple functions for emulating a fastbuf stream over a static
buffer. The struct fastbuf is allocated statically as well to make everything
as simple and as fast as possible.
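
The general shape of a byte stream over a static buffer can be sketched as
below. This is not the fastbuf API: struct static_reader, sr_init() and
sr_getc() are hypothetical names, and the sketch only shows why such a
stream needs no allocation and no refill logic:

```c
#include <stddef.h>

/* Hypothetical reader state: just two pointers into the caller's buffer. */
struct static_reader {
  const unsigned char *pos;     /* next byte to return */
  const unsigned char *end;     /* one past the last byte */
};

/* Attach the reader to an existing buffer; nothing is allocated or copied. */
static void sr_init(struct static_reader *r, const void *buf, size_t len)
{
  r->pos = buf;
  r->end = r->pos + len;
}

/* Return the next byte, or -1 once the buffer is exhausted. */
static int sr_getc(struct static_reader *r)
{
  return (r->pos < r->end) ? *r->pos++ : -1;
}
```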

Robert Spalek [Mon, 17 Nov 2003 13:09:44 +0000 (13:09 +0000)]
1. db/catalog.gz ---> db/catalog
+ it is not sent to oook and feedback-cat via pipes, but it is read by them as a file
+ it is read in 2 passes and the URL's are identified in the 1st phase (catalog.c)

2. URL fingerprinting always uses cf/url-equiv, even in the indexer

Martin Mares [Sat, 15 Nov 2003 10:41:41 +0000 (10:41 +0000)]
A better function for hashing integers (the old multiplier was completely
bogus as it didn't fit in a 32-bit integer) and also a new function
for hashing pointers.
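
A minimal sketch of multiplicative hashing with a multiplier that does fit
in 32 bits. The constant below is one commonly used for this purpose; the
log does not say which value was chosen, and hash_ptr() with its shift is
likewise only an assumption about how a pointer hash might be built:

```c
#include <stdint.h>

/* Assumed multiplier: fits in 32 bits, so it is used exactly as written
 * instead of being silently truncated by the compiler. */
#define HASH_MULT 2654435761u

/* Multiplicative hash of a 32-bit integer (arithmetic wraps modulo 2^32). */
static uint32_t hash_u32(uint32_t x)
{
  return (uint32_t)(x * HASH_MULT);
}

/* Hash a pointer: drop the low bits, which are usually zero because of
 * alignment, then hash the rest. On 64-bit systems the upper half of the
 * address is discarded by the cast. */
static uint32_t hash_ptr(const void *p)
{
  uintptr_t v = (uintptr_t)p;
  return hash_u32((uint32_t)(v >> 3));
}
```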

Robert Spalek [Thu, 13 Nov 2003 10:43:07 +0000 (10:43 +0000)]
I decided to turn off cf/url-equiv for indexation.  However, after the indexer
is run on regular sherlock5, we cannot manually delete this file for indexer
and restore for gatherd.  so I am creating a new parameter that controls
loading this prefix table.

Tomas Valla [Thu, 6 Nov 2003 16:53:58 +0000 (16:53 +0000)]
Added some headers to avoid confusion of our own developers ;)

Martin Mares [Wed, 5 Nov 2003 22:00:18 +0000 (22:00 +0000)]
Added special mode for sorting of regular files.

Martin Mares [Wed, 5 Nov 2003 20:43:27 +0000 (20:43 +0000)]
bbcopy() can be asked to copy the rest of the input file by specifying
a length of ~0U.

Martin Mares [Wed, 5 Nov 2003 20:42:20 +0000 (20:42 +0000)]
Undefine all the parameter macros at the end. (The hash tables already do it
and it turned out to be very useful.)

Robert Spalek [Mon, 3 Nov 2003 17:13:06 +0000 (17:13 +0000)]
page_size -> PAGE_SIZE

Tomas Valla [Mon, 3 Nov 2003 15:26:14 +0000 (15:26 +0000)]
And do not forget .cvsignore, of course.

Robert Spalek [Mon, 3 Nov 2003 14:35:50 +0000 (14:35 +0000)]
- giant class flag moved from attributes to card-notes
- merger only marks documents by giant flag and
  the penalization is done in chewer
- added new weight attribute to cards: Wp means weight after penalization
  added penalization notes in the form .Pg-50 (giant class penalized by -50)
- chewer.c: card_write_start() does NOT write struct card_attr and it needs
  to be done manually later
- chewer.c: weight records are sorted chronologically, I like it more :-)

Robert Spalek [Mon, 3 Nov 2003 14:15:11 +0000 (14:15 +0000)]
this could never have worked

Tomas Valla [Mon, 3 Nov 2003 12:09:04 +0000 (12:09 +0000)]
CVS should ignore files created by compiling Ulimit

Tomas Valla [Mon, 3 Nov 2003 12:04:45 +0000 (12:04 +0000)]
to suppress annoying warning messages during make clean

Robert Spalek [Sun, 2 Nov 2003 18:24:18 +0000 (18:24 +0000)]
INDEX_VERSION fixed

Robert Spalek [Fri, 31 Oct 2003 10:23:21 +0000 (10:23 +0000)]
indexer rewritten to generate redirect brackets
+ code written, debugged, and polished

in particular, labels contain new attribute redir_id and attribute priority
has been deleted

we need to update search/cards.c

Tomas Valla [Sat, 25 Oct 2003 19:22:57 +0000 (19:22 +0000)]
Perl module for setting ulimits.
Should solve bug #538.
[warning - compiling perlXS is ugly ;) ]

Robert Spalek [Sat, 25 Oct 2003 14:03:52 +0000 (14:03 +0000)]
introduced type bitarray_t

Martin Mares [Sat, 25 Oct 2003 09:51:00 +0000 (09:51 +0000)]
We don't need this in v3.0.

Martin Mares [Sun, 19 Oct 2003 18:19:59 +0000 (18:19 +0000)]
Forgot to add fb-limfd.

Martin Mares [Sun, 19 Oct 2003 16:47:55 +0000 (16:47 +0000)]
Replaced the "orig_len" field in bucket headers which was never used
for anything useful by bucket type code.

Martin Mares [Sun, 19 Oct 2003 16:47:06 +0000 (16:47 +0000)]
Added fastbuf backend for reading from file descriptors with a given limit.
(Very useful for communication over sockets.)

Martin Mares [Sun, 19 Oct 2003 16:46:06 +0000 (16:46 +0000)]
Too much copying and pasting :-)

Martin Mares [Wed, 15 Oct 2003 18:32:18 +0000 (18:32 +0000)]
Good luck, v3.0!

Martin Mares [Wed, 15 Oct 2003 16:53:18 +0000 (16:53 +0000)]
Updated version numbers.

Martin Mares [Sat, 11 Oct 2003 20:14:23 +0000 (20:14 +0000)]
Added UTF8_SKIP_BWD.
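
Skipping backwards over one UTF-8 character amounts to stepping back until
a byte that is not a continuation byte (10xxxxxx) is reached. A minimal
sketch, written as a function with the hypothetical name utf8_skip_bwd()
rather than as the macro itself, and assuming valid UTF-8 and a pointer
that is not at the start of the buffer:

```c
/* Return a pointer to the first byte of the character that ends just
 * before p. Continuation bytes have the bit pattern 10xxxxxx. */
static const unsigned char *utf8_skip_bwd(const unsigned char *p)
{
  do
    p--;
  while ((*p & 0xc0) == 0x80);
  return p;
}
```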

Martin Mares [Sat, 11 Oct 2003 11:54:34 +0000 (11:54 +0000)]
Oops, forgot the values.

Martin Mares [Sat, 11 Oct 2003 10:20:09 +0000 (10:20 +0000)]
New tables generated from UnicodeData 4.0.1 using the new scripts.

Martin Mares [Sat, 11 Oct 2003 10:19:40 +0000 (10:19 +0000)]
Several improvements to the unicode library:

  o  All tables are now const.
  o  Redefined the categories:
       - now using _U_* instead of _C_*
       - introduced _U_LETTER modified with either _U_UPPER or _U_LOWER
         or none (titlecase letters, letter modifiers etc.)
  o  Added the ligature expansions and _U_LIGATURE.
  o  Minor cleanups.

Martin Mares [Sat, 11 Oct 2003 10:17:09 +0000 (10:17 +0000)]
Don't forget the ligtable.

Martin Mares [Sat, 11 Oct 2003 10:16:55 +0000 (10:16 +0000)]
Constified.

Martin Mares [Sat, 11 Oct 2003 10:16:31 +0000 (10:16 +0000)]
Added a table of compatibility ligature expansions.

Martin Mares [Sat, 11 Oct 2003 10:13:20 +0000 (10:13 +0000)]
Added const to chartype tables. Also removed _c_collate and _c_order
which haven't existed since the last glacial era.

Martin Mares [Sat, 11 Oct 2003 09:06:22 +0000 (09:06 +0000)]
One more :)

Martin Mares [Sat, 11 Oct 2003 09:05:47 +0000 (09:05 +0000)]
Expect the unicode data directory to be linked to by "unidata".

Martin Mares [Sat, 11 Oct 2003 09:04:09 +0000 (09:04 +0000)]
Updated to new names of scripts.

Martin Mares [Sat, 11 Oct 2003 09:00:20 +0000 (09:00 +0000)]
Renamed unisplit to gen-basic.

Martin Mares [Sat, 11 Oct 2003 08:58:22 +0000 (08:58 +0000)]
Renamed tabgen to gen-charconv.

Martin Mares [Sat, 11 Oct 2003 08:58:12 +0000 (08:58 +0000)]
Renamed mkunacc to gen-unacc.

Martin Mares [Sat, 11 Oct 2003 08:55:38 +0000 (08:55 +0000)]
Renamed charset import scripts.

Martin Mares [Sat, 11 Oct 2003 08:53:40 +0000 (08:53 +0000)]
Renamed mkuni to add-charnames and changed the path to the UnicodeData file.

Martin Mares [Sat, 11 Oct 2003 08:52:30 +0000 (08:52 +0000)]
Obsolete and also some of the Slovak characters were missing.

Martin Mares [Sat, 11 Oct 2003 08:48:07 +0000 (08:48 +0000)]
This was testing functions which didn't exist :-)

Martin Mares [Sat, 11 Oct 2003 08:46:58 +0000 (08:46 +0000)]
The signature charset hasn't been used for ages.

Martin Mares [Fri, 10 Oct 2003 18:01:39 +0000 (18:01 +0000)]
Export cfpool -- sometimes it's much more convenient to pass just a pool than
a pointer to an allocation function.

Martin Mares [Fri, 3 Oct 2003 16:41:42 +0000 (16:41 +0000)]
Added a simple utility for generating changelogs.

Martin Mares [Fri, 3 Oct 2003 09:33:49 +0000 (09:33 +0000)]
These files have been obsoleted by the new customization system.

Martin Mares [Fri, 3 Oct 2003 09:29:58 +0000 (09:29 +0000)]
Search for custom.h at the right place.

Martin Mares [Thu, 2 Oct 2003 11:24:38 +0000 (11:24 +0000)]
Added a hook for indexing custom string types.

Martin Mares [Sat, 27 Sep 2003 19:43:36 +0000 (19:43 +0000)]
Added a lot of missing #include <alloca.h>'s.

Martin Mares [Fri, 26 Sep 2003 14:11:55 +0000 (14:11 +0000)]
Added charconv wrapper around fastbuf (currently output only).

Martin Mares [Fri, 26 Sep 2003 11:16:26 +0000 (11:16 +0000)]
EXTRA_RUNDIRS needn't form a strict hierarchy, so add a -p.

Martin Mares [Tue, 23 Sep 2003 16:20:10 +0000 (16:20 +0000)]
Added a set of functions for sliding window mmapping of large files.
Will be used by the indexer to access the card notes array.

Martin Mares [Wed, 17 Sep 2003 12:36:44 +0000 (12:36 +0000)]
Replaced enums by #define's in definitions of word, meta and string types.
It's less elegant, but it gives a chance to detect whether a specific type
exists or not.
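
The trade-off can be sketched as follows. Enumerators are invisible to the
preprocessor, whereas #define'd names can be tested with #ifdef. The type
names WT_TEXT, WT_EMPH and WT_CUSTOM and the function has_custom_type() are
hypothetical, chosen only for illustration:

```c
/* Hypothetical word types defined as macros instead of enumerators. */
#define WT_TEXT 1
#define WT_EMPH 2
/* WT_CUSTOM is deliberately left undefined in this configuration. */

/* Code can adapt at compile time to whether a type exists. With an enum,
 * this #ifdef would always take the #else branch, because the
 * preprocessor knows nothing about enumerators. */
static int has_custom_type(void)
{
#ifdef WT_CUSTOM
  return 1;
#else
  return 0;
#endif
}
```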

Martin Mares [Mon, 15 Sep 2003 07:45:47 +0000 (07:45 +0000)]
Allow submakefiles to add their own installation directories and to override
the run/bin directory. Propagate the directories to the installer.

Martin Mares [Fri, 29 Aug 2003 17:39:34 +0000 (17:39 +0000)]
Updated the installation script to always check for missing directories.

Tomas Valla [Sun, 10 Aug 2003 01:30:50 +0000 (01:30 +0000)]
Recognition of variable types in parse_args is now automatic.

Tomas Valla [Sun, 20 Jul 2003 19:17:22 +0000 (19:17 +0000)]
Added 'array' feature to handle multiple variable occurrences.

Tomas Valla [Thu, 10 Jul 2003 18:12:57 +0000 (18:12 +0000)]
Just to make it more comfortable.

Tomas Valla [Wed, 9 Jul 2003 01:29:16 +0000 (01:29 +0000)]
Patch to allow processing of multiple occurrences of the same argument.
Now it returns a string of values separated by "&".
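
The joining rule can be sketched as follows. The log shows neither the
patched code nor its language, so this C fragment only illustrates the
behaviour described; append_value() is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Append one more occurrence of an argument to dst, separating values
 * with "&". dst must hold a NUL-terminated string shorter than size.
 * Returns 0 on success, -1 if the result did not fit (dst is then left
 * truncated but still NUL-terminated). */
static int append_value(char *dst, size_t size, const char *val)
{
  size_t used = strlen(dst);
  int n = snprintf(dst + used, size - used, "%s%s", used ? "&" : "", val);
  return (n >= 0 && (size_t)n < size - used) ? 0 : -1;
}
```

A caller then has to split the string on "&" again, so this scheme assumes
the values themselves contain no unescaped "&".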

Robert Spalek [Fri, 4 Jul 2003 13:17:24 +0000 (13:17 +0000)]
fixed headers

Robert Spalek [Fri, 4 Jul 2003 13:14:25 +0000 (13:14 +0000)]
fixed generated header comment

Robert Spalek [Fri, 4 Jul 2003 12:53:45 +0000 (12:53 +0000)]
regenerated by misc/generate from updated charset tables

Robert Spalek [Fri, 4 Jul 2003 12:52:02 +0000 (12:52 +0000)]
added (and renamed) all iso-8859-* charsets

Robert Spalek [Fri, 4 Jul 2003 12:49:19 +0000 (12:49 +0000)]
added (and renamed) all iso-8859-{1,2,...,16} charsets

Robert Spalek [Fri, 4 Jul 2003 12:48:39 +0000 (12:48 +0000)]
adapted to UNDEFINED characters

Robert Spalek [Fri, 4 Jul 2003 12:47:44 +0000 (12:47 +0000)]
upgraded from ftp.unicode.org and also renamed

Robert Spalek [Fri, 4 Jul 2003 12:46:59 +0000 (12:46 +0000)]
updated according to the newest tables downloaded from ftp.unicode.org

Robert Spalek [Fri, 4 Jul 2003 12:27:32 +0000 (12:27 +0000)]
imported by `trunicode` from ftp.unicode.org

Robert Spalek [Fri, 4 Jul 2003 12:26:26 +0000 (12:26 +0000)]
added a tool for importing mappings from ftp.unicode.org
I would rather use this source than `recode`

Martin Mares [Mon, 30 Jun 2003 11:18:57 +0000 (11:18 +0000)]
Several changes mixed into one commit (sorry, the CVS didn't work for a long time):

o  Changed index format ID.
o  MAX_COMPLEX_LEN went with the rest of complexes.
o  Introduced data types and handling macros for context bucket ID's.
o  Returned fp_hash() to its original definition -- the previous "fix" was
   deadly wrong: I confused indexing of bytes with indexing of words.
   Also, the fp_hash() has to be monotonic wrt. fpsort's order which the
   new one wasn't.
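
The monotonicity requirement can be illustrated with a sketch: if the hash
is taken from the most significant bits of the fingerprint, then sorting
fingerprints bytewise also sorts them by hash. This is not the real
fp_hash(); FP_SIZE, fp_bucket() and the byte layout are assumptions made
only to show the property:

```c
#include <stdint.h>

#define FP_SIZE 8       /* hypothetical fingerprint size in bytes */

/* Map a fingerprint to a `bits`-bit bucket number (1 <= bits <= 32) using
 * its leading bytes in big-endian order. If a sorts before b bytewise,
 * then fp_bucket(a, bits) <= fp_bucket(b, bits). */
static uint32_t fp_bucket(const unsigned char fp[FP_SIZE], unsigned bits)
{
  uint32_t top = ((uint32_t)fp[0] << 24) | ((uint32_t)fp[1] << 16)
               | ((uint32_t)fp[2] << 8)  |  (uint32_t)fp[3];
  return top >> (32 - bits);
}
```

Reading the leading bytes as a native-endian word instead would break this
ordering on little-endian machines, which is the kind of byte-versus-word
mix-up that makes such a hash non-monotonic.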

Martin Mares [Mon, 30 Jun 2003 11:17:00 +0000 (11:17 +0000)]
Define CONFIG_CONTEXTS whenever we use contexts.

Martin Mares [Mon, 30 Jun 2003 10:57:25 +0000 (10:57 +0000)]
Add GET_U8 and PUT_U8 for completeness.
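
Byte accessors are trivial, but having them lets code that handles wider
fields treat all widths uniformly. A minimal sketch with assumed
definitions (the log does not show the real ones); GET_U16_BE is a
hypothetical wider accessor added only to show the pattern:

```c
/* Assumed definitions: read or write one byte at an arbitrary address. */
#define GET_U8(p)     (*(const unsigned char *)(p))
#define PUT_U8(p, x)  (*(unsigned char *)(p) = (unsigned char)(x))

/* Hypothetical 16-bit big-endian read built from the byte accessor. */
#define GET_U16_BE(p) \
  ((unsigned)(GET_U8(p) << 8) | GET_U8((const unsigned char *)(p) + 1))
```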

Robert Spalek [Fri, 27 Jun 2003 12:39:50 +0000 (12:39 +0000)]
fixed handling of characters lost by recoding

Robert Spalek [Fri, 27 Jun 2003 12:27:49 +0000 (12:27 +0000)]
added tools for stealing translation tables from recode

sanity checks:
- after extraction, the iso-8859-{1,2} tables are identical to the tables
  imported by MJ
- the cp1250 table is quite different from the existing win-1250 table, but
  I do not know which one is right

Robert Spalek [Fri, 20 Jun 2003 08:33:46 +0000 (08:33 +0000)]
a little bugfix of the test-tool

Robert Spalek [Fri, 20 Jun 2003 08:19:42 +0000 (08:19 +0000)]
one test has been meanwhile adjusted

Robert Spalek [Fri, 20 Jun 2003 08:18:56 +0000 (08:18 +0000)]
I have meanwhile fiddled a little with #include's after I sent it to MJ

Martin Mares [Wed, 18 Jun 2003 18:28:55 +0000 (18:28 +0000)]
Fix bug in traversing of empty heap.

Martin Mares [Wed, 18 Jun 2003 16:40:03 +0000 (16:40 +0000)]
Tweak the binomial heaps a bit to make them easier to use.