mj.ucw.cz Git - libucw.git/log
libucw.git
20 years agoAdded very simple functions for emulating a fastbuf stream over a static
Martin Mares [Sat, 22 Nov 2003 18:21:22 +0000 (18:21 +0000)]
Added very simple functions for emulating a fastbuf stream over a static
buffer. The struct fastbuf is allocated statically as well to make everything
as simple and as fast as possible.

20 years ago1. db/catalog.gz ---> db/catalog
Robert Spalek [Mon, 17 Nov 2003 13:09:44 +0000 (13:09 +0000)]
1. db/catalog.gz ---> db/catalog
+ it is not sent to oook and feedback-cat via pipes, but it is read by them as a file
+ it is read in 2 passes and the URLs are identified in the 1st phase (catalog.c)

2. URL fingerprinting always uses cf/url-equiv, even in the indexer

20 years agoA better function for hashing integers (the old multiplier was completely
Martin Mares [Sat, 15 Nov 2003 10:41:41 +0000 (10:41 +0000)]
A better function for hashing integers (the old multiplier was completely
bogus as it didn't fit in a 32-bit integer) and also a new function
for hashing pointers.
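
For illustration only, a minimal sketch of what a multiplicative integer hash with a multiplier that does fit in 32 bits can look like, plus a pointer hash built on top of it. The constant (an odd number close to 2^32 divided by the golden ratio) and all ex_* names are assumptions made for this example; they are not taken from the libucw sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed multiplier: an odd constant close to 2^32 / golden ratio.
 * It fits in 32 bits; it is NOT taken from the libucw sources. */
#define EX_HASH_MULT UINT32_C(0x9E3779B1)

/* Multiplicative hash of a 32-bit integer (wraps modulo 2^32). */
static inline uint32_t ex_hash_u32(uint32_t x)
{
    return x * EX_HASH_MULT;
}

/* Hash a pointer by folding its integer value down to 32 bits first. */
static inline uint32_t ex_hash_pointer(const void *p)
{
    uint64_t wide = (uint64_t) (uintptr_t) p;
    return ex_hash_u32((uint32_t) (wide ^ (wide >> 32)));
}

/* Reduce a hash to a table index using its top bits, which are the
 * best-mixed ones for a multiplicative hash; table_bits must be 1..32. */
static inline uint32_t ex_hash_index(uint32_t h, unsigned table_bits)
{
    assert(table_bits >= 1 && table_bits <= 32);
    return h >> (32 - table_bits);
}
```

A multiplier wider than 32 bits would be silently truncated when stored in a 32-bit integer, which is presumably the kind of problem the commit message refers to.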

20 years agoI decided to turn off cf/url-equiv for indexation. however, after the indexer
Robert Spalek [Thu, 13 Nov 2003 10:43:07 +0000 (10:43 +0000)]
I decided to turn off cf/url-equiv for indexation.  however, after the indexer
is run on regular sherlock5, we cannot manually delete this file for indexer
and restore for gatherd.  so I am creating a new parameter that controls
loading this prefix table.

20 years agoAdded some headers to avoid confusion of our own developers ;)
Tomas Valla [Thu, 6 Nov 2003 16:53:58 +0000 (16:53 +0000)]
Added some headers to avoid confusion of our own developers ;)

20 years agoAdded special mode for sorting of regular files.
Martin Mares [Wed, 5 Nov 2003 22:00:18 +0000 (22:00 +0000)]
Added special mode for sorting of regular files.

20 years agobbcopy() can be asked to copy the rest of the input file by specifying
Martin Mares [Wed, 5 Nov 2003 20:43:27 +0000 (20:43 +0000)]
bbcopy() can be asked to copy the rest of the input file by specifying
a length of ~0U.
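
The "length of ~0U means copy everything" convention can be sketched as follows. This is a hypothetical helper over stdio streams, not the real bbcopy(), whose signature and fastbuf-based implementation are not shown in this log.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the libucw bbcopy(): copy `len` bytes from
 * `in` to `out`; a length of ~0U means "copy until end of input".
 * Returns the number of bytes actually copied. */
static unsigned long ex_copy(FILE *in, FILE *out, unsigned len)
{
    char buf[4096];
    unsigned long total = 0;
    int to_eof = (len == ~0U);

    assert(in != NULL && out != NULL);
    while (to_eof || len > 0) {
        size_t want = sizeof(buf);
        if (!to_eof && len < want)
            want = len;
        size_t got = fread(buf, 1, want, in);
        if (got == 0)
            break;              /* end of input or read error */
        if (fwrite(buf, 1, got, out) != got)
            break;              /* write error */
        total += got;
        if (!to_eof)
            len -= (unsigned) got;
    }
    return total;
}
```

Using the all-ones value as a sentinel avoids a second entry point, at the cost of making a literal copy of exactly ~0U bytes impossible to request.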

20 years agoUndefine all the parameter macros at the end. (The hash tables already do it
Martin Mares [Wed, 5 Nov 2003 20:42:20 +0000 (20:42 +0000)]
Undefine all the parameter macros at the end. (The hash tables already do it
and it turned out to be very useful.)

20 years agopage_size -> PAGE_SIZE
Robert Spalek [Mon, 3 Nov 2003 17:13:06 +0000 (17:13 +0000)]
page_size -> PAGE_SIZE

20 years agoAnd do not forget .cvsignore, of course.
Tomas Valla [Mon, 3 Nov 2003 15:26:14 +0000 (15:26 +0000)]
And do not forget .cvsignore, of course.

20 years ago- giant class flag moved from attributes to card-notes
Robert Spalek [Mon, 3 Nov 2003 14:35:50 +0000 (14:35 +0000)]
- giant class flag moved from attributes to card-notes
- merger only marks documents by giant flag and
  the penalization is done in chewer
- added new weight attribute to cards: Wp means weight after penalization
  added penalization notes in the form .Pg-50 (giant class penalized by -50)
- chewer.c: card_write_start() does NOT write struct card_attr and it needs
  to be done manually later
- chewer.c: weight records are sorted chronologically, I like it more :-)

20 years agothis could never have worked
Robert Spalek [Mon, 3 Nov 2003 14:15:11 +0000 (14:15 +0000)]
this could never have worked

20 years agoCVS should ignore files created by compiling Ulimit
Tomas Valla [Mon, 3 Nov 2003 12:09:04 +0000 (12:09 +0000)]
CVS should ignore files created by compiling Ulimit

20 years agoto suppress annoying warning messages during make clean
Tomas Valla [Mon, 3 Nov 2003 12:04:45 +0000 (12:04 +0000)]
to suppress annoying warning messages during make clean

20 years agoINDEX_VERSION fixed
Robert Spalek [Sun, 2 Nov 2003 18:24:18 +0000 (18:24 +0000)]
INDEX_VERSION fixed

20 years agoindexer rewritten to generate redirect brackets
Robert Spalek [Fri, 31 Oct 2003 10:23:21 +0000 (10:23 +0000)]
indexer rewritten to generate redirect brackets
+ code written, debugged, and polished

in particular, labels contain new attribute redir_id and attribute priority
has been deleted

we need to update search/cards.c

20 years agoPerl module for setting ulimits.
Tomas Valla [Sat, 25 Oct 2003 19:22:57 +0000 (19:22 +0000)]
Perl module for setting ulimits.
Should solve bug #538.
[warning - compiling perlXS is ugly ;) ]

20 years agointroduced type bitarray_t
Robert Spalek [Sat, 25 Oct 2003 14:03:52 +0000 (14:03 +0000)]
introduced type bitarray_t

20 years agoWe don't need this in v3.0.
Martin Mares [Sat, 25 Oct 2003 09:51:00 +0000 (09:51 +0000)]
We don't need this in v3.0.

20 years agoForgot to add fb-limfd.
Martin Mares [Sun, 19 Oct 2003 18:19:59 +0000 (18:19 +0000)]
Forgot to add fb-limfd.

20 years agoReplaced the "orig_len" field in bucket headers which was never used
Martin Mares [Sun, 19 Oct 2003 16:47:55 +0000 (16:47 +0000)]
Replaced the "orig_len" field in bucket headers which was never used
for anything useful by bucket type code.

20 years agoAdded fastbuf backend for reading from file descriptors with a given limit.
Martin Mares [Sun, 19 Oct 2003 16:47:06 +0000 (16:47 +0000)]
Added fastbuf backend for reading from file descriptors with a given limit.
(Very useful for communication over sockets.)

20 years agoToo much copying and pasting :-)
Martin Mares [Sun, 19 Oct 2003 16:46:06 +0000 (16:46 +0000)]
Too much copying and pasting :-)

20 years agoGood luck, v3.0!
Martin Mares [Wed, 15 Oct 2003 18:32:18 +0000 (18:32 +0000)]
Good luck, v3.0!

20 years agoUpdated version numbers.
Martin Mares [Wed, 15 Oct 2003 16:53:18 +0000 (16:53 +0000)]
Updated version numbers.

20 years agoAdded UTF8_SKIP_BWD.
Martin Mares [Sat, 11 Oct 2003 20:14:23 +0000 (20:14 +0000)]
Added UTF8_SKIP_BWD.
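
As a rough illustration of what skipping backward over one UTF-8 character involves, here is a function-style sketch. The real UTF8_SKIP_BWD is a macro whose exact arguments and behaviour are not shown in this log; the name ex_utf8_skip_bwd and the explicit lower bound are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function form of a "skip backward" step; the real
 * UTF8_SKIP_BWD macro in libucw may differ in name and semantics.
 * Moves `p` back to the start of the previous UTF-8 character, never
 * going before `start`. Continuation bytes have the form 10xxxxxx. */
static const unsigned char *ex_utf8_skip_bwd(const unsigned char *start,
                                             const unsigned char *p)
{
    assert(start != NULL && p != NULL && p >= start);
    if (p == start)
        return p;
    p--;
    while (p > start && (*p & 0xC0) == 0x80)
        p--;
    return p;
}
```

The loop relies only on the fact that every non-initial byte of a UTF-8 sequence matches 10xxxxxx, so no decoding is needed to find the character boundary.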

20 years agoOops, forgot the values.
Martin Mares [Sat, 11 Oct 2003 11:54:34 +0000 (11:54 +0000)]
Oops, forgot the values.

20 years agoNew tables generated from UnicodeData 4.0.1 using the new scripts.
Martin Mares [Sat, 11 Oct 2003 10:20:09 +0000 (10:20 +0000)]
New tables generated from UnicodeData 4.0.1 using the new scripts.

20 years agoSeveral improvements to the unicode library:
Martin Mares [Sat, 11 Oct 2003 10:19:40 +0000 (10:19 +0000)]
Several improvements to the unicode library:

  o  All tables are now const.
  o  Redefined the categories:
       - now using _U_* instead of _C_*
       - introduced _U_LETTER modified with either _U_UPPER or _U_LOWER
         or none (titlecase letters, letter modifiers etc.)
  o  Added the ligature expansions and _U_LIGATURE.
  o  Minor cleanups.

20 years agoDon't forget the ligtable.
Martin Mares [Sat, 11 Oct 2003 10:17:09 +0000 (10:17 +0000)]
Don't forget the ligtable.

20 years agoConstified.
Martin Mares [Sat, 11 Oct 2003 10:16:55 +0000 (10:16 +0000)]
Constified.

20 years agoAdded a table of compatibility ligature expansions.
Martin Mares [Sat, 11 Oct 2003 10:16:31 +0000 (10:16 +0000)]
Added a table of compatibility ligature expansions.

20 years agoAdded const to chartype tables. Also removed _c_collate and _c_order
Martin Mares [Sat, 11 Oct 2003 10:13:20 +0000 (10:13 +0000)]
Added const to chartype tables. Also removed _c_collate and _c_order
which didn't exist since the last glacial era.

20 years agoOne more :)
Martin Mares [Sat, 11 Oct 2003 09:06:22 +0000 (09:06 +0000)]
One more :)

20 years agoExpect the unicode data directory to be linked to by "unidata".
Martin Mares [Sat, 11 Oct 2003 09:05:47 +0000 (09:05 +0000)]
Expect the unicode data directory to be linked to by "unidata".

20 years agoUpdated to new names of scripts.
Martin Mares [Sat, 11 Oct 2003 09:04:09 +0000 (09:04 +0000)]
Updated to new names of scripts.

20 years agoRenamed unisplit to gen-basic.
Martin Mares [Sat, 11 Oct 2003 09:00:20 +0000 (09:00 +0000)]
Renamed unisplit to gen-basic.

20 years agoRenamed tabgen to gen-charconv.
Martin Mares [Sat, 11 Oct 2003 08:58:22 +0000 (08:58 +0000)]
Renamed tabgen to gen-charconv.

20 years agoRenamed mkunacc to gen-unacc.
Martin Mares [Sat, 11 Oct 2003 08:58:12 +0000 (08:58 +0000)]
Renamed mkunacc to gen-unacc.

20 years agoRenamed charset import scripts.
Martin Mares [Sat, 11 Oct 2003 08:55:38 +0000 (08:55 +0000)]
Renamed charset import scripts.

20 years agoRenamed mkuni to add-charnames and changed the path to the UnicodeData file.
Martin Mares [Sat, 11 Oct 2003 08:53:40 +0000 (08:53 +0000)]
Renamed mkuni to add-charnames and changed the path to the UnicodeData file.

20 years agoObsolete and also some of the Slovak characters were missing.
Martin Mares [Sat, 11 Oct 2003 08:52:30 +0000 (08:52 +0000)]
Obsolete and also some of the Slovak characters were missing.

20 years agoThis was testing functions which didn't exist :-)
Martin Mares [Sat, 11 Oct 2003 08:48:07 +0000 (08:48 +0000)]
This was testing functions which didn't exist :-)

20 years agoThe signature charset hasn't been used for ages.
Martin Mares [Sat, 11 Oct 2003 08:46:58 +0000 (08:46 +0000)]
The signature charset hasn't been used for ages.

20 years agoExport cfpool -- sometimes it's much more convenient to pass just a pool than
Martin Mares [Fri, 10 Oct 2003 18:01:39 +0000 (18:01 +0000)]
Export cfpool -- sometimes it's much more convenient to pass just a pool than
a pointer to an allocation function.

20 years agoAdded a simple utility for generating changelogs.
Martin Mares [Fri, 3 Oct 2003 16:41:42 +0000 (16:41 +0000)]
Added a simple utility for generating changelogs.

20 years agoThese files have been obsoleted by the new customization system.
Martin Mares [Fri, 3 Oct 2003 09:33:49 +0000 (09:33 +0000)]
These files have been obsoleted by the new customization system.

20 years agoSearch for custom.h at the right place.
Martin Mares [Fri, 3 Oct 2003 09:29:58 +0000 (09:29 +0000)]
Search for custom.h at the right place.

20 years agoAdded a hook for indexing custom string types.
Martin Mares [Thu, 2 Oct 2003 11:24:38 +0000 (11:24 +0000)]
Added a hook for indexing custom string types.

20 years agoAdded a lot of missing #include <alloca.h>'s.
Martin Mares [Sat, 27 Sep 2003 19:43:36 +0000 (19:43 +0000)]
Added a lot of missing #include <alloca.h>'s.

20 years agoAdded charconv wrapper around fastbuf (currently output only).
Martin Mares [Fri, 26 Sep 2003 14:11:55 +0000 (14:11 +0000)]
Added charconv wrapper around fastbuf (currently output only).

20 years agoEXTRA_RUNDIRS needn't form a strict hierarchy, so add a -p.
Martin Mares [Fri, 26 Sep 2003 11:16:26 +0000 (11:16 +0000)]
EXTRA_RUNDIRS needn't form a strict hierarchy, so add a -p.

20 years agoAdded a set of functions for sliding window mmapping of large files.
Martin Mares [Tue, 23 Sep 2003 16:20:10 +0000 (16:20 +0000)]
Added a set of functions for sliding window mmapping of large files.
Will be used by the indexer to access the card notes array.

20 years agoReplaced enums by #define's in definitions of word, meta and string types.
Martin Mares [Wed, 17 Sep 2003 12:36:44 +0000 (12:36 +0000)]
Replaced enums by #define's in definitions of word, meta and string types.
It's less elegant, but it gives a chance to detect whether a specific type
exists or not.
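
The trade-off mentioned above can be shown in a few lines: the preprocessor can test for a #define, but an enum constant is invisible to #ifdef. The type names and weights below are invented for illustration and do not come from the Sherlock headers.

```c
#include <assert.h>

/* Hypothetical type codes; the names are invented for illustration. */
#define EX_WT_TEXT   1
#define EX_WT_EMPH   2
/* EX_WT_SMALL is deliberately not defined in this configuration. */

enum ex_enum_types { EX_ENUM_TEXT = 1 };  /* invisible to #ifdef */

/* Only the types that exist in this configuration get a case label. */
static int ex_weight_of(int type)
{
    switch (type) {
#ifdef EX_WT_TEXT
    case EX_WT_TEXT: return 1;
#endif
#ifdef EX_WT_EMPH
    case EX_WT_EMPH: return 3;
#endif
#ifdef EX_WT_SMALL
    case EX_WT_SMALL: return 0;
#endif
    default: return -1;
    }
}

/* The enum constant exists for the compiler, yet #ifdef cannot see it. */
#ifdef EX_ENUM_TEXT
#define EX_ENUM_VISIBLE 1
#else
#define EX_ENUM_VISIBLE 0
#endif
```

With enums, code depending on an optional type would fail to compile when the type is absent; with #define's it can be compiled out conditionally.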

20 years agoAllow submakefiles to add their own installation directories and to override
Martin Mares [Mon, 15 Sep 2003 07:45:47 +0000 (07:45 +0000)]
Allow submakefiles to add their own installation directories and to override
the run/bin directory. Propagate the directories to the installer.

20 years agoUpdated the installation script to always check for missing directories.
Martin Mares [Fri, 29 Aug 2003 17:39:34 +0000 (17:39 +0000)]
Updated the installation script to always check for missing directories.

20 years agoRecognition of variable types in parse_args is now automatic.
Tomas Valla [Sun, 10 Aug 2003 01:30:50 +0000 (01:30 +0000)]
Recognition of variable types in parse_args is now automatic.

20 years agoAdded 'array' feature to handle multiple variable occurrences.
Tomas Valla [Sun, 20 Jul 2003 19:17:22 +0000 (19:17 +0000)]
Added 'array' feature to handle multiple variable occurrences.

20 years agoJust to make it more comfortable.
Tomas Valla [Thu, 10 Jul 2003 18:12:57 +0000 (18:12 +0000)]
Just to make it more comfortable.

21 years agoPatch to allow processing of multiple occurrences of the same argument.
Tomas Valla [Wed, 9 Jul 2003 01:29:16 +0000 (01:29 +0000)]
Patch to allow processing of multiple occurrences of the same argument.
Now it returns a string of values separated by "&".

21 years agofixed headers
Robert Spalek [Fri, 4 Jul 2003 13:17:24 +0000 (13:17 +0000)]
fixed headers

21 years agofixed generated header comment
Robert Spalek [Fri, 4 Jul 2003 13:14:25 +0000 (13:14 +0000)]
fixed generated header comment

21 years agoregenerated by misc/generate from updated charset tables
Robert Spalek [Fri, 4 Jul 2003 12:53:45 +0000 (12:53 +0000)]
regenerated by misc/generate from updated charset tables

21 years agoadded (and renamed) all iso-8859-* charsets
Robert Spalek [Fri, 4 Jul 2003 12:52:02 +0000 (12:52 +0000)]
added (and renamed) all iso-8859-* charsets

21 years agoadded (and renamed) all iso-8859-{1,2,...,16} charsets
Robert Spalek [Fri, 4 Jul 2003 12:49:19 +0000 (12:49 +0000)]
added (and renamed) all iso-8859-{1,2,...,16} charsets

21 years agoadapted to UNDEFINED characters
Robert Spalek [Fri, 4 Jul 2003 12:48:39 +0000 (12:48 +0000)]
adapted to UNDEFINED characters

21 years agoupgraded from ftp.unicode.org and also renamed
Robert Spalek [Fri, 4 Jul 2003 12:47:44 +0000 (12:47 +0000)]
upgraded from ftp.unicode.org and also renamed

21 years agoupdated according to the newest tables downloaded from ftp.unicode.org
Robert Spalek [Fri, 4 Jul 2003 12:46:59 +0000 (12:46 +0000)]
updated according to the newest tables downloaded from ftp.unicode.org

21 years agoimported by `trunicode` from ftp.unicode.org
Robert Spalek [Fri, 4 Jul 2003 12:27:32 +0000 (12:27 +0000)]
imported by `trunicode` from ftp.unicode.org

21 years agoadded a tool for importing mappings from ftp.unicode.org
Robert Spalek [Fri, 4 Jul 2003 12:26:26 +0000 (12:26 +0000)]
added a tool for importing mappings from ftp.unicode.org
I would rather use this source than `recode`

21 years agoSeveral changes mixed into one commit (sorry, the CVS didn't work for a long time):
Martin Mares [Mon, 30 Jun 2003 11:18:57 +0000 (11:18 +0000)]
Several changes mixed into one commit (sorry, the CVS didn't work for a long time):

o  Changed index format ID.
o  MAX_COMPLEX_LEN went with the rest of complexes.
o  Introduced data types and handling macros for context bucket IDs.
o  Returned fp_hash() to its original definition -- the previous "fix" was
   dead wrong: I confused indexing of bytes with indexing of words.
   Also, the fp_hash() has to be monotonic wrt. fpsort's order which the
   new one wasn't.

21 years agoDefine CONFIG_CONTEXTS whenever we use contexts.
Martin Mares [Mon, 30 Jun 2003 11:17:00 +0000 (11:17 +0000)]
Define CONFIG_CONTEXTS whenever we use contexts.

21 years agoAdd GET_U8 and PUT_U8 for completeness.
Martin Mares [Mon, 30 Jun 2003 10:57:25 +0000 (10:57 +0000)]
Add GET_U8 and PUT_U8 for completeness.

21 years agofixed handling of characters lost by recoding
Robert Spalek [Fri, 27 Jun 2003 12:39:50 +0000 (12:39 +0000)]
fixed handling of characters lost by recoding

21 years agoadded tools for stealing translation tables from recode
Robert Spalek [Fri, 27 Jun 2003 12:27:49 +0000 (12:27 +0000)]
added tools for stealing translation tables from recode

sanity checks:
- after extraction, the iso-8859-{1,2} tables are identical to the tables
  imported by MJ
- the cp1250 table is quite different from the existing win-1250 table, but I do
  not know which one is right

21 years agoa little bugfix of the test-tool
Robert Spalek [Fri, 20 Jun 2003 08:33:46 +0000 (08:33 +0000)]
a little bugfix of the test-tool

21 years agoone test has been meanwhile adjusted
Robert Spalek [Fri, 20 Jun 2003 08:19:42 +0000 (08:19 +0000)]
one test has been meanwhile adjusted

21 years agoI have meanwhile fiddled a little with #include's after I sent it to MJ
Robert Spalek [Fri, 20 Jun 2003 08:18:56 +0000 (08:18 +0000)]
I have meanwhile fiddled a little with #include's after I sent it to MJ

21 years agoFix bug in traversing of empty heap.
Martin Mares [Wed, 18 Jun 2003 18:28:55 +0000 (18:28 +0000)]
Fix bug in traversing of empty heap.

21 years agoTweak the binomial heaps a bit to make them easier to use.
Martin Mares [Wed, 18 Jun 2003 16:40:03 +0000 (16:40 +0000)]
Tweak the binomial heaps a bit to make them easier to use.

21 years agoAdded a very simple generic implementation of binomial heaps. Their main
Martin Mares [Wed, 18 Jun 2003 13:07:10 +0000 (13:07 +0000)]
Added a very simple generic implementation of binomial heaps. Their main
virtue is that they are fully dynamic, needing no upper bounds on the
number of items nor frequent reallocations. Their main disadvantage is
the need of 13 bytes per node.

I implemented only those heap operations I'll use in the gatherer;
I'll add more later.

21 years ago<...> => "...".
Martin Mares [Wed, 18 Jun 2003 13:04:51 +0000 (13:04 +0000)]
<...> => "...".

21 years agoCasting addresses to longs is not portable, use addr_int_t instead.
Martin Mares [Wed, 18 Jun 2003 10:19:46 +0000 (10:19 +0000)]
Casting addresses to longs is not portable, use addr_int_t instead.

21 years agoCorrected a couple of comments.
Martin Mares [Wed, 18 Jun 2003 10:17:14 +0000 (10:17 +0000)]
Corrected a couple of comments.

21 years agoRemoved duplicate definition of DEBUG and a couple of signed/unsigned
Martin Mares [Wed, 18 Jun 2003 10:11:46 +0000 (10:11 +0000)]
Removed duplicate definition of DEBUG and a couple of signed/unsigned
comparison warnings.

21 years agoMinor changes to the RB-tree code:
Martin Mares [Wed, 18 Jun 2003 10:11:12 +0000 (10:11 +0000)]
Minor changes to the RB-tree code:

o  Static declarations are much more common, so replace TREE_STATIC
   by TREE_GLOBAL (which does just the opposite).
o  <xxx> is reserved for system includes, use "xxx" instead.
o  Use TREE_TRACE instead of TRACE to avoid collisions with other tracing macros.

21 years agoAdded generic red-black trees Robert sent me some months ago.
Martin Mares [Wed, 18 Jun 2003 10:03:57 +0000 (10:03 +0000)]
Added generic red-black trees Robert sent me some months ago.

They probably won't get used just now, but this is a good place to keep
them in.

21 years agoReplaced clist_insert() by clist_insert_{before,after}().
Martin Mares [Sun, 15 Jun 2003 20:45:00 +0000 (20:45 +0000)]
Replaced clist_insert() by clist_insert_{before,after}().

Added clist_empty() and CLIST_WALK_DELSAFE.

21 years agoAdded a straightforward implementation of circular linked lists. They are
Martin Mares [Sun, 15 Jun 2003 20:20:51 +0000 (20:20 +0000)]
Added a straightforward implementation of circular linked lists. They are
a small bit less efficient than our lists.h lists (testing against zero is
faster than testing against list head), but they are nicer and they save
one pointer per list head which makes them better for hash tables etc.
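
A minimal sketch of the circular-list idea described above, assuming a doubly linked node embedded in the list head. All names carry an ex_ prefix to make clear that this is not the actual libucw clist interface, whose signatures are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: names are prefixed ex_ to avoid implying that this
 * matches the actual libucw clist interface. */
struct ex_cnode {
    struct ex_cnode *next, *prev;
};

/* The list head is just a node that links to itself when empty,
 * so in this sketch it costs two pointers and needs no tail field. */
static void ex_clist_init(struct ex_cnode *head)
{
    head->next = head->prev = head;
}

static int ex_clist_empty(const struct ex_cnode *head)
{
    return head->next == head;
}

static void ex_clist_insert_after(struct ex_cnode *what, struct ex_cnode *after)
{
    what->next = after->next;
    what->prev = after;
    after->next->prev = what;
    after->next = what;
}

static void ex_clist_insert_before(struct ex_cnode *what, struct ex_cnode *before)
{
    ex_clist_insert_after(what, before->prev);
}

static void ex_clist_remove(struct ex_cnode *n)
{
    assert(n->next != n);       /* refuse to unlink a lone head */
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Walking stops when we are back at the head, not at a NULL pointer;
 * this is the "testing against list head" cost mentioned above. */
static size_t ex_clist_length(const struct ex_cnode *head)
{
    size_t count = 0;
    for (const struct ex_cnode *n = head->next; n != head; n = n->next)
        count++;
    return count;
}
```

Because the head is an ordinary node, insertion and removal need no special cases for the first or last element.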

21 years agoAdded prefetch functions.
Martin Mares [Fri, 13 Jun 2003 21:13:51 +0000 (21:13 +0000)]
Added prefetch functions.

21 years agoASSERT is unlikely.
Martin Mares [Thu, 12 Jun 2003 21:37:13 +0000 (21:37 +0000)]
ASSERT is unlikely.

21 years agoAdded macros for hinting branch predictor.
Martin Mares [Thu, 12 Jun 2003 21:36:57 +0000 (21:36 +0000)]
Added macros for hinting branch predictor.
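
Branch-hinting macros of this kind are commonly built on the GCC/Clang extension __builtin_expect. The sketch below assumes that approach and uses invented ex_* names; the log does not show how the libucw macros are actually spelled or defined.

```c
#include <assert.h>

/* Assumed spelling; __builtin_expect is a GCC/Clang extension, so other
 * compilers fall back to the plain condition. */
#if defined(__GNUC__)
#define ex_likely(x)   __builtin_expect(!!(x), 1)
#define ex_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define ex_likely(x)   (!!(x))
#define ex_unlikely(x) (!!(x))
#endif

/* Sum an array, treating a null input as the rare error path. */
static long ex_sum(const int *v, int n)
{
    assert(n >= 0);
    if (ex_unlikely(v == 0))
        return -1;
    long total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

The hint changes only code layout and static prediction, never the result of the condition, so marking assertion failures as unlikely is a natural use.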

21 years agoAdded mapping of URL keys according to the prefix equivalence table.
Martin Mares [Wed, 11 Jun 2003 16:11:38 +0000 (16:11 +0000)]
Added mapping of URL keys according to the prefix equivalence table.

21 years agoFunctions working with tagged characters moved from index.h to a new
Martin Mares [Wed, 11 Jun 2003 13:50:09 +0000 (13:50 +0000)]
Functions working with tagged characters moved from index.h to a new
header file tagged-text.h. This also revealed a couple of unintentional
indirect includes.

21 years agoSplit URL fingerprinting inside indexer from the other fingerprints.
Martin Mares [Wed, 11 Jun 2003 13:26:04 +0000 (13:26 +0000)]
Split URL fingerprinting inside indexer from the other fingerprints.
URL fingerprints will include server equivalence mappings and other
such hacks (for now the "www." hack), the other fingerprints (used
e.g. for hashing of strings in the index) won't.

21 years agoOops, the hash function for fingerprints was terribly biased. There should
Martin Mares [Wed, 11 Jun 2003 13:03:30 +0000 (13:03 +0000)]
Oops, the hash function for fingerprints was terribly biased. There should
be XOR, not OR. Also, the shifts are meaningless, because the fingerprint
hash is believed to be very well distributed.

Beware, this means that the current mainline is incompatible with string
indices generated by v2.4!  For now, I'm not increasing the index version,
because word matching still works with old indices and I want to profile it.
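
To see why OR biases such a hash while XOR does not, consider folding a fingerprint into one 32-bit word. The three-word layout and the ex_* names are assumptions for this example; the real fingerprint size and hash function are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a fingerprint seen as three 32-bit words.
 * The real libucw fingerprint size and hash may differ. */
struct ex_fingerprint {
    uint32_t w[3];
};

/* Biased: OR can only turn bits on, so for uniformly random words each
 * output bit is set with probability 7/8 and results drift to all-ones. */
static uint32_t ex_fold_or(const struct ex_fingerprint *f)
{
    assert(f != 0);
    return f->w[0] | f->w[1] | f->w[2];
}

/* Unbiased for uniformly random words: each output bit stays 50/50.
 * No shifts are applied, since the input is assumed well distributed. */
static uint32_t ex_fold_xor(const struct ex_fingerprint *f)
{
    assert(f != 0);
    return f->w[0] ^ f->w[1] ^ f->w[2];
}

/* Count set bits, to make the bias visible. */
static unsigned ex_popcount(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;             /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

A hash whose bits are mostly ones fills only a small part of the table, which matches the "terribly biased" behaviour described in the commit message.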

21 years agobugfix found by gcc-3.3
Robert Spalek [Sun, 8 Jun 2003 18:03:34 +0000 (18:03 +0000)]
bugfix found by gcc-3.3

21 years agoMake sherlockd calculate per-filetype number of matched documents, including
Martin Mares [Wed, 4 Jun 2003 19:31:10 +0000 (19:31 +0000)]
Make sherlockd calculate per-filetype number of matched documents, including
those failing the FILETYPE filter. This breaks the nice abstraction of hiding
all filtering under EXTENDED_ATTRS, but it will allow us to get rid of lots
of STATS queries.

21 years agoStarted v2.5.
Martin Mares [Fri, 30 May 2003 18:57:55 +0000 (18:57 +0000)]
Started v2.5.

21 years agoAdded a new card flag for cards in giant classes. Don't index selected meta
Martin Mares [Sun, 13 Apr 2003 18:12:39 +0000 (18:12 +0000)]
Added a new card flag for cards in giant classes. Don't index selected meta
types in such cards.