mj.ucw.cz Git - libucw.git/log
libucw.git
21 years agoadded v?xprintf() functions, they will be used in the filter dumper
Robert Spalek [Wed, 10 Jul 2002 14:47:26 +0000 (14:47 +0000)]
added v?xprintf() functions, they will be used in the filter dumper

it is usable, for example, for printf()'ing to anything (like fastbufs)
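
A minimal sketch of the idea of printf()'ing "to anything", assuming a callback-style sink. The names `sink_fn`, `vsink_printf`, `sink_printf` and `strbuf_sink` are hypothetical and are not the v?xprintf() interface added by this commit.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink type: receives the formatted bytes. */
typedef void (*sink_fn)(void *ctx, const char *data, size_t len);

/* Format into a heap buffer of exactly the right size, then hand it to the sink. */
int vsink_printf(sink_fn sink, void *ctx, const char *fmt, va_list args)
{
  va_list copy;
  va_copy(copy, args);
  int len = vsnprintf(NULL, 0, fmt, copy);	/* first pass: measure only */
  va_end(copy);
  if (len < 0)
    return -1;
  char *buf = malloc((size_t) len + 1);
  if (!buf)
    return -1;
  vsnprintf(buf, (size_t) len + 1, fmt, args);	/* second pass: format */
  sink(ctx, buf, (size_t) len);
  free(buf);
  return len;
}

int sink_printf(sink_fn sink, void *ctx, const char *fmt, ...)
{
  va_list args;
  va_start(args, fmt);
  int len = vsink_printf(sink, ctx, fmt, args);
  va_end(args);
  return len;
}

/* Example sink (an assumption for illustration): append to a fixed-size string. */
struct strbuf { char data[64]; size_t used; };

void strbuf_sink(void *ctx, const char *data, size_t len)
{
  struct strbuf *sb = ctx;
  assert(sb->used + len < sizeof(sb->data));
  memcpy(sb->data + sb->used, data, len);
  sb->used += len;
  sb->data[sb->used] = 0;
}
```

Any output target can be plugged in by supplying a different sink function; a buffered stream would simply write the bytes instead of copying them.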

21 years agoHASH_WANT_FIND_NEXT fixed and its declaration changed
Robert Spalek [Wed, 10 Jul 2002 13:56:06 +0000 (13:56 +0000)]
HASH_WANT_FIND_NEXT fixed and its declaration changed

21 years agoadded the capability of hashing/finding more records with equal value
Robert Spalek [Wed, 10 Jul 2002 12:58:36 +0000 (12:58 +0000)]
added the capability of hashing/finding more records with equal value
of the key
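
Finding several records with an equal key can be sketched on a single bucket chain as follows. This is only an illustration with hypothetical names (`struct node`, `chain_find`, `chain_find_next`); it does not reproduce the HASH_WANT_FIND_NEXT interface.

```c
#include <assert.h>
#include <stddef.h>

/* One record in a bucket chain; several records may share a key. */
struct node {
  int key;
  int value;
  struct node *next;
};

/* Find the first record with the given key, starting at n. */
struct node *chain_find(struct node *n, int key)
{
  while (n && n->key != key)
    n = n->next;
  return n;
}

/* Continue the search after a previous hit; returns NULL when no more match. */
struct node *chain_find_next(struct node *hit)
{
  assert(hit);
  return chain_find(hit->next, hit->key);
}
```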

22 years agoIncrease line buffer sizes to 4096 bytes. Current gatherd really can
Martin Mares [Sat, 6 Jul 2002 03:29:41 +0000 (03:29 +0000)]
Increase line buffer sizes to 4096 bytes. Current gatherd really can
produce such long lines under several circumstances; we need to examine
how that is possible.

22 years agoDue to a bug, the "fsck" mode was unable to fix broken trailers.
Martin Mares [Fri, 5 Jul 2002 03:24:25 +0000 (03:24 +0000)]
Due to a bug, the "fsck" mode was unable to fix broken trailers.

22 years agoWhen an inconsistency is encountered while shaking down the bucket
Martin Mares [Fri, 5 Jul 2002 03:23:13 +0000 (03:23 +0000)]
When an inconsistency is encountered while shaking down the bucket
file, recover all data prior to the inconsistency by marking the
space between read and write pointer as deleted buckets (need to
use more of them if the space is too large).

22 years agoWhen moving attributes, don't break the chain.
Martin Mares [Thu, 27 Jun 2002 19:42:48 +0000 (19:42 +0000)]
When moving attributes, don't break the chain.

22 years agoImplemented merging of catalog attributes to the index. Just place the
Martin Mares [Sun, 23 Jun 2002 20:32:19 +0000 (20:32 +0000)]
Implemented merging of catalog attributes to the index. Just place the
catalog dump to db/catalog.gz (e.g., by running utils/fetch-cat.sh)
and run the indexer.

Unfortunately, we've just filled up all the available word types :-(

22 years agoRemoved obsolete examples of custom attributes. The image search attributes
Martin Mares [Sun, 23 Jun 2002 16:21:10 +0000 (16:21 +0000)]
Removed obsolete examples of custom attributes. The image search attributes
themselves are a good enough example.

22 years agoWhen O_APPEND is given to bopen(), don't forget to set fb->pos and fb->fdpos.
Martin Mares [Sun, 23 Jun 2002 16:01:16 +0000 (16:01 +0000)]
When O_APPEND is given to bopen(), don't forget to set fb->pos and fb->fdpos.

22 years agodmalloc and efence work again (ported from rel-2.1 branch).
Martin Mares [Sat, 22 Jun 2002 16:42:45 +0000 (16:42 +0000)]
dmalloc and efence work again (ported from rel-2.1 branch).

22 years agomalloc -> xmalloc.
Martin Mares [Wed, 19 Jun 2002 14:10:55 +0000 (14:10 +0000)]
malloc -> xmalloc.

22 years agoIntroduced SKIP_TAGGED_CHAR.
Martin Mares [Tue, 18 Jun 2002 17:37:53 +0000 (17:37 +0000)]
Introduced SKIP_TAGGED_CHAR.

22 years agoUTF8_SKIP now recognizes the real end of the UTF-8 character
Martin Mares [Tue, 18 Jun 2002 17:37:18 +0000 (17:37 +0000)]
UTF8_SKIP now recognizes the real end of the UTF-8 character
and doesn't get confused by garbage after it.
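
A sketch of skipping one UTF-8 character without decoding it, written as a function rather than a macro. It assumes the skip is bounded by the length implied by the lead byte and stops early at the first byte that is not a continuation byte; `utf8_skip` is a hypothetical name and this is not the UTF8_SKIP macro itself.

```c
#include <assert.h>
#include <stddef.h>

/* Skip one UTF-8 character starting at p and return a pointer past it. */
const unsigned char *utf8_skip(const unsigned char *p)
{
  unsigned int c = *p++;
  int follow;			/* continuation bytes announced by the lead byte */

  if (c < 0xc0)
    follow = 0;			/* ASCII, or a stray continuation byte */
  else if (c < 0xe0)
    follow = 1;
  else if (c < 0xf0)
    follow = 2;
  else
    follow = 3;
  /* Consume at most `follow` bytes, and only real continuation bytes (10xxxxxx). */
  while (follow-- > 0 && (*p & 0xc0) == 0x80)
    p++;
  return p;
}
```

Because the loop checks each byte, a truncated sequence ends at the first non-continuation byte (including a terminating zero), and extra continuation bytes after a complete character are not swallowed.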

22 years agoAdded a macro UTF8_SKIP for skipping a UTF-8 character without decoding it.
Martin Mares [Fri, 14 Jun 2002 23:52:33 +0000 (23:52 +0000)]
Added a macro UTF8_SKIP for skipping a UTF-8 character without decoding it.

22 years agoNew functions for manipulating attribute lists: obj_prepend_attr()
Martin Mares [Sat, 8 Jun 2002 14:00:33 +0000 (14:00 +0000)]
New functions for manipulating attribute lists: obj_prepend_attr()
and obj_insert_attr().

22 years agoMerging IS branch: New customization code and its use for images.
Martin Mares [Sat, 8 Jun 2002 13:27:56 +0000 (13:27 +0000)]
Merging IS branch: New customization code and its use for images.

22 years agoThe universal hash table generator now uses prime table sizes instead of
Martin Mares [Sat, 8 Jun 2002 13:17:39 +0000 (13:17 +0000)]
The universal hash table generator now uses prime table sizes instead of
powers of two. This slows down all operations a little as we now need
to perform division instead of just AND-ing with a mask, but it allows
us to use the new hash functions in hashfunc.h which are significantly
faster than the original ones (at the expense of having bad distribution
modulo non-primes).

Also changed the limit logic to avoid rehashing when the table is already
too small or too large.
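
The trade-off between masking and division described above can be illustrated with a deliberately weak hash function. All names here are hypothetical and the hash is not one of those in hashfunc.h.

```c
#include <assert.h>

/* Power-of-two table: the bucket is the low bits of the hash (a single AND). */
unsigned int bucket_mask(unsigned int hash, unsigned int size_pow2)
{
  assert(size_pow2 && !(size_pow2 & (size_pow2 - 1)));
  return hash & (size_pow2 - 1);
}

/* Prime-sized table: the bucket is the remainder (needs a division). */
unsigned int bucket_mod(unsigned int hash, unsigned int size_prime)
{
  assert(size_prime);
  return hash % size_prime;
}

/* Deliberately weak hash: every value is a multiple of 16. */
unsigned int weak_hash(unsigned int key)
{
  return key << 4;
}

/* Count how many distinct buckets the keys 0..n-1 fall into (size <= 64). */
unsigned int distinct_buckets(unsigned int n, unsigned int size, int use_mask)
{
  unsigned char seen[64] = { 0 };
  unsigned int count = 0;

  assert(size <= 64);
  for (unsigned int k = 0; k < n; k++) {
    unsigned int h = weak_hash(k);
    unsigned int b = use_mask ? bucket_mask(h, size) : bucket_mod(h, size);
    if (!seen[b]) {
      seen[b] = 1;
      count++;
    }
  }
  return count;
}
```

With 16 buckets and masking, all keys collide in bucket 0; with 17 buckets and division, the same hash spreads 17 keys over 17 buckets.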

22 years agoSHIFT_BITS changed from 5 to 7 to fit the UCS-2 strings better
Robert Spalek [Thu, 6 Jun 2002 15:18:53 +0000 (15:18 +0000)]
SHIFT_BITS changed from 5 to 7 to fit the UCS-2 strings better

22 years agoloop unrolling turned on for hashfunc.o
Robert Spalek [Wed, 5 Jun 2002 20:28:20 +0000 (20:28 +0000)]
loop unrolling turned on for hashfunc.o

22 years agowow! i have optimized str_len_uns() yet more :-) now it is not slowed
Robert Spalek [Wed, 5 Jun 2002 20:22:17 +0000 (20:22 +0000)]
wow!  i have optimized str_len_uns() yet more :-)  now it is not slowed
down when a 0x80 byte is present.  a slight simple change.
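
The commit does not say how str_len_uns() works. One common word-at-a-time technique for string length, shown here purely as an assumption, tests four bytes at once for a zero byte; the cheap variant can give false positives for bytes with the high bit set, while the exact variant cannot. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cheap test: nonzero if some byte of x MAY be zero (false positives possible). */
int maybe_zero_byte(uint32_t x)
{
  return ((x - 0x01010101u) & 0x80808080u) != 0;
}

/* Exact test: nonzero if and only if some byte of x is zero. */
int has_zero_byte(uint32_t x)
{
  return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}
```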

22 years agoCONST attribute of functions noted in a better place
Robert Spalek [Tue, 4 Jun 2002 08:51:37 +0000 (08:51 +0000)]
CONST attribute of functions noted in a better place

22 years agoMoved MAX_COMPLEX_LEN to index.h.
Martin Mares [Mon, 3 Jun 2002 17:02:09 +0000 (17:02 +0000)]
Moved MAX_COMPLEX_LEN to index.h.

22 years ago- str_hash.[ch] renamed to hashfunc.[ch], the functions renamed
Robert Spalek [Mon, 3 Jun 2002 16:02:00 +0000 (16:02 +0000)]
- str_hash.[ch] renamed to hashfunc.[ch], the functions renamed
- deleted hash-{block,istring,string}.c, their functionality merged into
  hashfunc.[ch]
- str-test.c rewritten to use the new name-style, char->byte, more tests
  added

22 years ago__attribute__((const)) replaced by CONST
Robert Spalek [Mon, 3 Jun 2002 14:49:53 +0000 (14:49 +0000)]
__attribute__((const)) replaced by CONST

22 years agoIf we have GET_O and GET_P, we should have PUT_O and PUT_P as well.
Martin Mares [Mon, 3 Jun 2002 14:01:25 +0000 (14:01 +0000)]
If we have GET_O and GET_P, we should have PUT_O and PUT_P as well.

22 years agoPrevent multiple inclusion.
Martin Mares [Sun, 2 Jun 2002 12:53:03 +0000 (12:53 +0000)]
Prevent multiple inclusion.

22 years agoST_BACKREF is gone. Frame backlinks are not indexed at all (it makes no sense
Martin Mares [Sun, 2 Jun 2002 11:10:03 +0000 (11:10 +0000)]
ST_BACKREF is gone. Frame backlinks are not indexed at all (it makes no sense
to search by them), redirect backlinks are indexed as ST_URL.

22 years agoafter a long time of experimenting, added support for unaligned parameters
Robert Spalek [Sat, 1 Jun 2002 09:57:07 +0000 (09:57 +0000)]
after a long time of experimenting, added support for unaligned parameters

22 years agowhen testing benchmarks of string operations, a user specified alignment
Robert Spalek [Sat, 1 Jun 2002 09:49:43 +0000 (09:49 +0000)]
when testing benchmarks of string operations, a user specified alignment
is taken into account

22 years agoadded macro UNALIGNED_PART()
Robert Spalek [Sat, 1 Jun 2002 09:48:20 +0000 (09:48 +0000)]
added macro UNALIGNED_PART()

22 years agoWarning fixes.
Martin Mares [Fri, 31 May 2002 18:10:35 +0000 (18:10 +0000)]
Warning fixes.

22 years agoAdded a macro for __attribute__((const)).
Martin Mares [Fri, 31 May 2002 13:57:00 +0000 (13:57 +0000)]
Added a macro for __attribute__((const)).

Robert, please update your hash functions to use this.

22 years agoMake PROF_STR really work.
Martin Mares [Wed, 29 May 2002 18:57:18 +0000 (18:57 +0000)]
Make PROF_STR really work.

22 years agoIndex reftexts, but don't search in them by default.
Martin Mares [Sun, 26 May 2002 18:23:26 +0000 (18:23 +0000)]
Index reftexts, but don't search in them by default.

22 years agoAdded block hash function.
Martin Mares [Sun, 26 May 2002 16:08:47 +0000 (16:08 +0000)]
Added block hash function.

22 years agoAdded word types for file name keywords and link texts.
Martin Mares [Sun, 26 May 2002 13:11:01 +0000 (13:11 +0000)]
Added word types for file name keywords and link texts.

22 years agoDon't forget to define SHERLOCK_HAVE_PREAD.
Martin Mares [Sun, 26 May 2002 13:10:44 +0000 (13:10 +0000)]
Don't forget to define SHERLOCK_HAVE_PREAD.

22 years agoShut up signed/unsigned warnings.
Martin Mares [Sun, 26 May 2002 13:10:28 +0000 (13:10 +0000)]
Shut up signed/unsigned warnings.

22 years agoAdded bopen_tmp() for opening of temporary files.
Martin Mares [Sun, 26 May 2002 10:40:52 +0000 (10:40 +0000)]
Added bopen_tmp() for opening of temporary files.

Replaced sorter_open_tmp() by bopen_tmp().

22 years ago- added str_hash.[ch] for fast evaluation of str_len() and str_hash()
Robert Spalek [Sat, 25 May 2002 13:59:34 +0000 (13:59 +0000)]
- added str_hash.[ch] for fast evaluation of str_len() and str_hash()
- added a tester/benchmark str-test.c, it is not compiled by default

22 years agommap_file() calls die() instead of returning failure.
Martin Mares [Fri, 24 May 2002 21:15:01 +0000 (21:15 +0000)]
mmap_file() calls die() instead of returning failure.

22 years agoImage objects are now marked with a special flag and the MD5 hash is calculated
Martin Mares [Fri, 24 May 2002 17:12:44 +0000 (17:12 +0000)]
Image objects are now marked with a special flag and the MD5 hash is calculated
from both text and the thumbnail.

22 years agoAdded bget_tagged_char().
Martin Mares [Wed, 22 May 2002 16:33:47 +0000 (16:33 +0000)]
Added bget_tagged_char().

22 years agoNeed to include unicode.h for GET_UTF8.
Martin Mares [Wed, 22 May 2002 16:32:46 +0000 (16:32 +0000)]
Need to include unicode.h for GET_UTF8.

22 years agoUse one-parameter bungetc() everywhere.
Martin Mares [Wed, 22 May 2002 15:43:59 +0000 (15:43 +0000)]
Use one-parameter bungetc() everywhere.

22 years agobungetc() is no longer passed the character to unget -- it always ungets
Martin Mares [Wed, 22 May 2002 15:43:47 +0000 (15:43 +0000)]
bungetc() is no longer passed the character to unget -- it always ungets
the last character read.

bputc() and bputw() are now passed unsigned int instead of byte/word.
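
Why an unget operation needs no character argument can be seen in a minimal in-memory reader: the last character read is still in the buffer, so stepping the position back is enough. `struct membuf`, `mb_getc` and `mb_ungetc` are hypothetical names, not the fastbuf API.

```c
#include <assert.h>
#include <stddef.h>

/* A read-only buffer with a current position. */
struct membuf {
  const unsigned char *start, *pos, *end;
};

/* Return the next byte, or -1 at the end of the buffer. */
int mb_getc(struct membuf *b)
{
  return (b->pos < b->end) ? *b->pos++ : -1;
}

/* Unget the last byte read: it is still in the buffer, so just step back. */
void mb_ungetc(struct membuf *b)
{
  assert(b->pos > b->start);
  b->pos--;
}
```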

22 years agoChanged null version of prof_format(), so that we don't need string.h.
Martin Mares [Tue, 21 May 2002 15:14:55 +0000 (15:14 +0000)]
Changed null version of prof_format(), so that we don't need string.h.

22 years agostring newline fixed
Robert Spalek [Thu, 16 May 2002 08:51:24 +0000 (08:51 +0000)]
string newline fixed

22 years agosign mismatch fixed
Robert Spalek [Thu, 16 May 2002 08:50:42 +0000 (08:50 +0000)]
sign mismatch fixed

22 years agofixed missing includes
Robert Spalek [Thu, 16 May 2002 08:47:12 +0000 (08:47 +0000)]
fixed missing includes

22 years agoRemoved partial support for LFS on Linuxes with pre-2.1 glibc.
Martin Mares [Sun, 28 Apr 2002 15:59:02 +0000 (15:59 +0000)]
Removed partial support for LFS on Linuxes with pre-2.1 glibc.

22 years agoAdded bitsig_free().
Martin Mares [Sun, 28 Apr 2002 15:46:04 +0000 (15:46 +0000)]
Added bitsig_free().

22 years agoImplemented base-224 encoder and decoder.
Martin Mares [Thu, 25 Apr 2002 17:37:11 +0000 (17:37 +0000)]
Implemented base-224 encoder and decoder.

22 years agoFinally I realized why we were using secondary sorting on site_id
Martin Mares [Sun, 21 Apr 2002 08:30:06 +0000 (08:30 +0000)]
Finally I realized why we were using secondary sorting on site_id
by default :-)  I was originally searching for some magic inside the
search server which needed that to work and completely missed the
simple fact that the front-end wants the results this way :-)

So I'm resurrecting it, but now as an ordinary instance of the secondary
sorting code I introduced yesterday. The CUSTOM_SORTING switch is gone;
sorting by site ID and page age always works.

Also, I've simplified reverse sorting by introducing a separate XOR mask.
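
A sketch of reverse sorting with an XOR mask, assuming unsigned 32-bit keys: XOR-ing with all ones complements the key, which reverses its unsigned ordering, so one comparison function serves both directions. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Ascending comparison of unsigned 32-bit keys. */
int cmp_u32(const void *a, const void *b)
{
  uint32_t x = *(const uint32_t *) a, y = *(const uint32_t *) b;
  return (x > y) - (x < y);
}

/* Sort ascending, or descending when `reverse` is set, using one comparator. */
void sort_values(uint32_t *v, size_t n, int reverse)
{
  uint32_t mask = reverse ? UINT32_MAX : 0;

  for (size_t i = 0; i < n; i++)
    v[i] ^= mask;		/* complemented keys sort in the opposite order */
  qsort(v, n, sizeof(*v), cmp_u32);
  for (size_t i = 0; i < n; i++)
    v[i] ^= mask;		/* restore the original values */
}
```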

22 years agoForgot to commit this one during the "search by age" changes.
Martin Mares [Sat, 20 Apr 2002 18:20:04 +0000 (18:20 +0000)]
Forgot to commit this one during the "search by age" changes.

22 years agoAdded secondary sorting (i.e., breaking ties when two documents have the same Q)
Martin Mares [Sat, 20 Apr 2002 15:09:41 +0000 (15:09 +0000)]
Added secondary sorting (i.e., breaking ties when two documents have the same Q)
on any of the custom attributes. Just define CUSTOM_SORTING in lib/custom.h.

I've also removed secondary sorting of result heap by site ID inside refs.c
-- according to my best knowledge it wasn't required anywhere.

Maybe we can remove the CUSTOM_SORTING switch and just leave the sec_sort_key
in struct result_note initialized to zero, but it would cost us 4 bytes per
result_note which I wanted to avoid.

22 years agoAdded support for indexing/searching by custom attributes.
Martin Mares [Sat, 20 Apr 2002 14:37:39 +0000 (14:37 +0000)]
Added support for indexing/searching by custom attributes.

See the CUSTOM_ATTRS macro in lib/custom.h for an explanation.

22 years agoAdded .cvsignore files for all pieces of source which are machine-generated.
Martin Mares [Sat, 6 Apr 2002 18:50:35 +0000 (18:50 +0000)]
Added .cvsignore files for all pieces of source which are machine-generated.

22 years agoAll configuration options (except for custom attributes which still dwell
Martin Mares [Sat, 6 Apr 2002 18:44:18 +0000 (18:44 +0000)]
All configuration options (except for custom attributes which still dwell
in lib/custom.h) are now stored in config.mk to make them available to both
makefiles (conditional linking etc.) and C programs (lib/autoconf.h is
generated from config.mk by a simple shell script).

This gives an easy way to create special-purpose modules (like the
SQL gatherer) which need extra libraries -- just make them a compile-time
option ;)

22 years agoAdded a generic universal multi-purpose magical hash table module.
Martin Mares [Sat, 6 Apr 2002 17:57:02 +0000 (17:57 +0000)]
Added a generic universal multi-purpose magical hash table module.
Look at introductory comments in lib/hashtable.h to see all the features.

Generic programming in C is a real adventure, but an afternoon spent
with CPP quirks is a holiday when compared with C++ templates :-)

22 years agoAdded a library module for generation of cryptographically secure
Martin Mares [Fri, 29 Mar 2002 16:34:20 +0000 (16:34 +0000)]
Added a library module for generation of cryptographically secure
random numbers.

22 years agoNo longer need to handle undefined MAP_FAILED.
Martin Mares [Fri, 29 Mar 2002 16:33:26 +0000 (16:33 +0000)]
No longer need to handle undefined MAP_FAILED.

22 years agoAdded CT_INCOMPLETE_SECTION which is equivalent to CT_SECTION except
Martin Mares [Fri, 29 Mar 2002 11:06:19 +0000 (11:06 +0000)]
Added CT_INCOMPLETE_SECTION which is equivalent to CT_SECTION except
that unknown variables in such sections are not reported as errors.

22 years agoAdded a reference to the original article.
Martin Mares [Thu, 31 Jan 2002 15:12:08 +0000 (15:12 +0000)]
Added a reference to the original article.

Improved the random generator as suggested by Robert.

22 years agoAdded a data structure for very efficient probabilistic representation
Martin Mares [Thu, 31 Jan 2002 11:36:45 +0000 (11:36 +0000)]
Added a data structure for very efficient probabilistic representation
of sets. For more info, consult comments at the start of bitsig.c.
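
One well-known probabilistic set representation is the Bloom filter. The sketch below shows that technique only as an illustration; whether bitsig.c works this way is not stated here, and the names, the table size and the hash function are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u

struct bloom {
  unsigned char bits[BLOOM_BITS / 8];
};

/* Seeded FNV-1a style hash, reduced to a bit index. */
uint32_t bloom_hash(const char *s, uint32_t seed)
{
  uint32_t h = 2166136261u ^ seed;
  while (*s) {
    h ^= (unsigned char) *s++;
    h *= 16777619u;
  }
  return h % BLOOM_BITS;
}

void bloom_init(struct bloom *b)
{
  memset(b->bits, 0, sizeof(b->bits));
}

/* Set three bits chosen by three differently seeded hashes. */
void bloom_add(struct bloom *b, const char *s)
{
  for (uint32_t seed = 0; seed < 3; seed++) {
    uint32_t i = bloom_hash(s, seed);
    b->bits[i / 8] |= (unsigned char) (1u << (i % 8));
  }
}

/* Returns 0 if s is definitely absent, 1 if it is probably present. */
int bloom_maybe_contains(const struct bloom *b, const char *s)
{
  for (uint32_t seed = 0; seed < 3; seed++) {
    uint32_t i = bloom_hash(s, seed);
    if (!(b->bits[i / 8] & (1u << (i % 8))))
      return 0;
  }
  return 1;
}
```

A negative answer is always correct; a positive answer can be a false positive, which is the price paid for the compact representation.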

22 years agoDevelopment branch is now called 2.2a.
Martin Mares [Mon, 21 Jan 2002 10:00:20 +0000 (10:00 +0000)]
Development branch is now called 2.2a.

22 years agoempty section of configuration item forbidden
Robert Spalek [Mon, 14 Jan 2002 19:38:41 +0000 (19:38 +0000)]
empty section of configuration item forbidden

22 years agoDon't call the callback function twice when deleting a bucket.
Martin Mares [Sun, 13 Jan 2002 14:34:00 +0000 (14:34 +0000)]
Don't call the callback function twice when deleting a bucket.

22 years agoAdded "sql" to the list of protocol names.
Martin Mares [Fri, 11 Jan 2002 21:00:04 +0000 (21:00 +0000)]
Added "sql" to the list of protocol names.

22 years agoInitial version of SQL gathering utility gsql added.
Milan Vancura [Wed, 9 Jan 2002 09:24:54 +0000 (09:24 +0000)]
Initial version of SQL gathering utility gsql added.
Milan

22 years agoClarified comments.
Martin Mares [Sun, 16 Dec 2001 19:25:03 +0000 (19:25 +0000)]
Clarified comments.

22 years agoAdded url_auto_canonicalize().
Martin Mares [Sun, 16 Dec 2001 19:24:44 +0000 (19:24 +0000)]
Added url_auto_canonicalize().

22 years agodb-rebuild replaced by db-tool which allows not only database
Martin Mares [Sat, 15 Dec 2001 22:55:42 +0000 (22:55 +0000)]
db-rebuild replaced by db-tool which allows not only database
reconstruction, but also dumping and undumping (useful for
conversion from SDBMv1 to v2).

22 years agoNew version of the SDBM library.
Martin Mares [Sat, 15 Dec 2001 22:54:41 +0000 (22:54 +0000)]
New version of the SDBM library.

Now supports databases larger than 4G (the internal structure is the
same, but page pointers are in pages instead of bytes).

Warning! The database files are _not_ compatible with the previous
version. Use db-tool to convert your databases.

Also exported sdbm_hash() to allow presorting.
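
How counting in pages instead of bytes lifts the 4G limit can be sketched as follows: a 32-bit page number multiplied by the page size yields a 64-bit file offset. The page size of 4096 bytes and the function names are assumptions for illustration, not the SDBM format.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12		/* hypothetical page size of 4096 bytes */

/* Convert a 32-bit page number to a 64-bit byte offset in the file. */
uint64_t page_to_offset(uint32_t page)
{
  return (uint64_t) page << PAGE_SHIFT;
}

/* Convert a page-aligned byte offset back to a page number. */
uint32_t offset_to_page(uint64_t offset)
{
  assert((offset & ((1u << PAGE_SHIFT) - 1)) == 0);
  assert((offset >> PAGE_SHIFT) <= UINT32_MAX);
  return (uint32_t) (offset >> PAGE_SHIFT);
}
```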

22 years agobgetw() returns int instead of word, so it's possible
Martin Mares [Sat, 15 Dec 2001 22:51:05 +0000 (22:51 +0000)]
bgetw() returns int instead of word, so it's possible
to detect EOF.
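
The reason for returning int can be sketched with a small reader: every 16-bit value, including 0xffff, is a valid word, so the end of input needs a value outside that range. `struct reader` and `read_word` are hypothetical names, and the big-endian byte order is an assumption.

```c
#include <assert.h>
#include <stddef.h>

struct reader {
  const unsigned char *pos, *end;
};

/* Read a 16-bit word (big-endian here); return -1 when fewer than 2 bytes remain. */
int read_word(struct reader *r)
{
  if (r->end - r->pos < 2)
    return -1;
  int w = (r->pos[0] << 8) | r->pos[1];
  r->pos += 2;
  return w;
}
```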

22 years agoRewrote the profiler. Each module can now choose its own profiling method
Martin Mares [Sun, 2 Dec 2001 12:03:39 +0000 (12:03 +0000)]
Rewrote the profiler.  Each module can now choose its own profiling method
without recompiling the library.

22 years agoAdded CPU and OS type defines.
Martin Mares [Sun, 2 Dec 2001 12:02:51 +0000 (12:02 +0000)]
Added CPU and OS type defines.

22 years agoAdded a Poor Man's Profiler :-)
Martin Mares [Sat, 1 Dec 2001 19:19:40 +0000 (19:19 +0000)]
Added a Poor Man's Profiler :-)

22 years agoHEAP_DELETE: Copy the `pos' parameter to a temporary variable as soon
Martin Mares [Fri, 2 Nov 2001 21:34:08 +0000 (21:34 +0000)]
HEAP_DELETE: Copy the `pos' parameter to a temporary variable as
soon as possible to avoid problems with callers supplying us with expressions
which could change during heap operations.
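
The hazard can be sketched with a toy macro that removes an array element and maintains a back-pointer table. This is not a real heap (there is no sifting) and `DELETE_AT` is a hypothetical name. The caller passes `pos_of[x]` as the position, and the macro itself modifies `pos_of[x]`, so the argument has to be copied before anything else happens.

```c
#include <assert.h>

/*
 * Remove the element at `pos` by moving the last element into its place.
 * pos_of[v] holds the index of value v in the array, or -1 if v is absent.
 */
#define DELETE_AT(heap, pos_of, n, pos) do {				\
    int _pos = (pos);		/* evaluate the caller's expression once */ \
    (pos_of)[(heap)[_pos]] = -1;	/* may change what (pos) would yield */	\
    (n)--;								\
    if (_pos < (n)) {							\
      (heap)[_pos] = (heap)[n];						\
      (pos_of)[(heap)[_pos]] = _pos;					\
    }									\
  } while (0)
```

Had the macro used `pos` directly after the first assignment, a call such as `DELETE_AT(heap, pos_of, n, pos_of[2])` would have indexed the array with -1.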

22 years agoLog file names are now allowed to contain strftime() conversion specifiers.
Martin Mares [Fri, 12 Oct 2001 10:08:32 +0000 (10:08 +0000)]
Log file names are now allowed to contain strftime() conversion specifiers.
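
A sketch of expanding such a log file name with the standard strftime() function. `expand_log_name` and the pattern used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Expand strftime() conversion specifiers in a log file name pattern.
 * Returns 0 on success, -1 if the result is empty or does not fit.
 */
int expand_log_name(char *out, size_t size, const char *pattern, const struct tm *when)
{
  assert(size > 0);
  size_t len = strftime(out, size, pattern, when);
  if (len == 0) {
    out[0] = 0;
    return -1;
  }
  return 0;
}
```

For example, a pattern like `log/sherlock-%Y%m%d` would produce one file per day.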

22 years agoMoved all customizable parts of configuration and index format
Martin Mares [Fri, 5 Oct 2001 16:33:24 +0000 (16:33 +0000)]
Moved all customizable parts of configuration and index format
(i.e., those depending on user attributes or word types, not on
our compilation environment) to a new file.

Custom configurations (indexing of objects generated from a database
and similar cases) should require only modifications of cf/sherlock
and lib/custom.h from now on.

22 years agoInsert is now capable of inserting a sequence of blank line separated
Martin Mares [Fri, 5 Oct 2001 16:12:19 +0000 (16:12 +0000)]
Insert is now capable of inserting a sequence of blank line separated
objects.

22 years agourl_component_separators has a default value "" to accelerate
Robert Spalek [Thu, 27 Sep 2001 09:54:22 +0000 (09:54 +0000)]
url_component_separators has a default value "" to accelerate
url_has_repeated_component() if not reconfigured

22 years agourl_has_repeated_component() fully implemented and tested
Robert Spalek [Thu, 27 Sep 2001 09:42:08 +0000 (09:42 +0000)]
url_has_repeated_component() fully implemented and tested

22 years agoadded skeleton of not yet implemented function url_has_repeated_component()
Robert Spalek [Wed, 26 Sep 2001 16:46:41 +0000 (16:46 +0000)]
added skeleton of not yet implemented function url_has_repeated_component()
and its configuration items

22 years agodie() called with string containing newlines replaced by fputs(stderr) and exit()
Robert Spalek [Wed, 26 Sep 2001 16:13:34 +0000 (16:13 +0000)]
die() called with string containing newlines replaced by fputs(stderr) and exit()

22 years agoCF_USAGE printed (description of -S and -C parameters)
Robert Spalek [Wed, 26 Sep 2001 13:56:29 +0000 (13:56 +0000)]
CF_USAGE printed (description of -S and -C parameters)

22 years agoadded CF_USAGE
Robert Spalek [Wed, 26 Sep 2001 12:53:49 +0000 (12:53 +0000)]
added CF_USAGE

22 years agotypo fixed
Robert Spalek [Thu, 6 Sep 2001 15:20:46 +0000 (15:20 +0000)]
typo fixed

22 years agoAdded I/O functions on addr_int_t.
Martin Mares [Sun, 2 Sep 2001 10:23:45 +0000 (10:23 +0000)]
Added I/O functions on addr_int_t.

22 years agoAdded CPU_64BIT_POINTERS.
Martin Mares [Sun, 2 Sep 2001 10:23:27 +0000 (10:23 +0000)]
Added CPU_64BIT_POINTERS.

22 years agoAdded shakedown, but don't use it on real gatherer bucket files
Martin Mares [Sat, 1 Sep 2001 21:42:55 +0000 (21:42 +0000)]
Added shakedown, but don't use it on real gatherer bucket files
since it buckettool doesn't update any other gatherer structures.
The expirer is the right place to go.

22 years agoAdded function for shaking down the bucket file.
Martin Mares [Sat, 1 Sep 2001 21:41:39 +0000 (21:41 +0000)]
Added function for shaking down the bucket file.

22 years agoAdded new charsets: windows-1250 and x-cork.
Martin Mares [Thu, 30 Aug 2001 08:39:51 +0000 (08:39 +0000)]
Added new charsets: windows-1250 and x-cork.

22 years agoBetter encapsulation of the ipaccess filter.
Martin Mares [Wed, 29 Aug 2001 10:57:19 +0000 (10:57 +0000)]
Better encapsulation of the ipaccess filter.

22 years agoAdded generic functions for IP address access lists.
Martin Mares [Wed, 29 Aug 2001 10:40:59 +0000 (10:40 +0000)]
Added generic functions for IP address access lists.

22 years agobugfix
Robert Spalek [Tue, 14 Aug 2001 09:11:03 +0000 (09:11 +0000)]
bugfix

23 years agoMinor optimization of GET_TAGGED_CHAR.
Martin Mares [Sun, 13 May 2001 15:35:24 +0000 (15:35 +0000)]
Minor optimization of GET_TAGGED_CHAR.

23 years agoAudited TODO list and bumped version number to 2.0.
Martin Mares [Tue, 10 Apr 2001 21:36:22 +0000 (21:36 +0000)]
Audited TODO list and bumped version number to 2.0.