mj.ucw.cz Git - libucw.git/log
Robert Spalek [Sat, 1 Jun 2002 09:57:07 +0000 (09:57 +0000)]
after a long time of experimenting, added support for unaligned parameters
Robert Spalek [Sat, 1 Jun 2002 09:49:43 +0000 (09:49 +0000)]
when testing benchmarks of string operations, a user-specified alignment
is taken into account
Robert Spalek [Sat, 1 Jun 2002 09:48:20 +0000 (09:48 +0000)]
added macro UNALIGNED_PART()
Martin Mares [Fri, 31 May 2002 18:10:35 +0000 (18:10 +0000)]
Warning fixes.
Martin Mares [Fri, 31 May 2002 13:57:00 +0000 (13:57 +0000)]
Added a macro for __attribute__((const)).
Robert, please update your hash functions to use this.
Martin Mares [Wed, 29 May 2002 18:57:18 +0000 (18:57 +0000)]
Make PROF_STR really work.
Martin Mares [Sun, 26 May 2002 18:23:26 +0000 (18:23 +0000)]
Index reftexts, but don't search in them by default.
Martin Mares [Sun, 26 May 2002 16:08:47 +0000 (16:08 +0000)]
Added block hash function.
Martin Mares [Sun, 26 May 2002 13:11:01 +0000 (13:11 +0000)]
Added word types for file name keywords and link texts.
Martin Mares [Sun, 26 May 2002 13:10:44 +0000 (13:10 +0000)]
Don't forget to define SHERLOCK_HAVE_PREAD.
Martin Mares [Sun, 26 May 2002 13:10:28 +0000 (13:10 +0000)]
Shut up signed/unsigned warnings.
Martin Mares [Sun, 26 May 2002 10:40:52 +0000 (10:40 +0000)]
Added bopen_tmp() for opening of temporary files.
Replaced sorter_open_tmp() by bopen_tmp().
Robert Spalek [Sat, 25 May 2002 13:59:34 +0000 (13:59 +0000)]
- added str_hash.[ch] for fast evaluation of str_len() and str_hash()
- added a tester/benchmark str-test.c, it is not compiled by default
Martin Mares [Fri, 24 May 2002 21:15:01 +0000 (21:15 +0000)]
mmap_file() calls die() instead of returning failure.
Martin Mares [Fri, 24 May 2002 17:12:44 +0000 (17:12 +0000)]
Image objects are now marked with a special flag and the MD5 hash is calculated
from both text and the thumbnail.
Martin Mares [Wed, 22 May 2002 16:33:47 +0000 (16:33 +0000)]
Added bget_tagged_char().
Martin Mares [Wed, 22 May 2002 16:32:46 +0000 (16:32 +0000)]
Need to include unicode.h for GET_UTF8.
Martin Mares [Wed, 22 May 2002 15:43:59 +0000 (15:43 +0000)]
Use one-parameter bungetc() everywhere.
Martin Mares [Wed, 22 May 2002 15:43:47 +0000 (15:43 +0000)]
bungetc() is no longer passed the character to unget -- it always ungets
the last character read.
bputc() and bputw() are now passed unsigned int instead of byte/word.
Martin Mares [Tue, 21 May 2002 15:14:55 +0000 (15:14 +0000)]
Changed null version of prof_format(), so that we don't need string.h.
Robert Spalek [Thu, 16 May 2002 08:51:24 +0000 (08:51 +0000)]
string newline fixed
Robert Spalek [Thu, 16 May 2002 08:50:42 +0000 (08:50 +0000)]
sign mismatch fixed
Robert Spalek [Thu, 16 May 2002 08:47:12 +0000 (08:47 +0000)]
fixed missing includes
Martin Mares [Sun, 28 Apr 2002 15:59:02 +0000 (15:59 +0000)]
Removed partial support for LFS on Linuxes with pre-2.1 glibc.
Martin Mares [Sun, 28 Apr 2002 15:46:04 +0000 (15:46 +0000)]
Added bitsig_free().
Martin Mares [Thu, 25 Apr 2002 17:37:11 +0000 (17:37 +0000)]
Implemented base-224 encoder and decoder.
Martin Mares [Sun, 21 Apr 2002 08:30:06 +0000 (08:30 +0000)]
Finally I realized why we were using secondary sorting on site_id
by default :-) I was originally searching for some magic inside the
search server which needed that to work and completely missed the
simple fact that the front-end wants the results this way :-)
So I'm resurrecting it, but now as an ordinary instance of the secondary
sorting code I introduced yesterday. The CUSTOM_SORTING switch is gone,
sorting by site ID and page age works always.
Also, I've simplified reverse sorting by introducing a separate XOR mask.
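
One plausible reading of the XOR mask idea, as a minimal sketch: XORing an unsigned sort key with all ones reverses its ordering, so a single comparison routine can serve both directions. The names below are hypothetical and are not taken from the search server sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask values: no change, or flip every bit of the key. */
#define SORT_ASCENDING  UINT32_C(0)
#define SORT_DESCENDING UINT32_MAX

/* Returns nonzero if key a sorts before key b under the given mask. */
static int sort_key_before(uint32_t a, uint32_t b, uint32_t mask)
{
    assert(mask == SORT_ASCENDING || mask == SORT_DESCENDING);
    /* XOR with all ones maps k to UINT32_MAX - k, which reverses the order. */
    return (a ^ mask) < (b ^ mask);
}
```

With this approach the comparison code stays the same for both directions and only the mask changes.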
Martin Mares [Sat, 20 Apr 2002 18:20:04 +0000 (18:20 +0000)]
Forgot to commit this one during the "search by age" changes.
Martin Mares [Sat, 20 Apr 2002 15:09:41 +0000 (15:09 +0000)]
Added secondary sorting (i.e., breaking ties when two documents have the same Q)
on any of the custom attributes. Just define CUSTOM_SORTING in lib/custom.h.
I've also removed secondary sorting of result heap by site ID inside refs.c
-- according to my best knowledge it wasn't required anywhere.
Maybe we can remove the CUSTOM_SORTING switch and just leave the sec_sort_key
in struct result_note initialized to zero, but it would cost us 4 bytes per
result_note which I wanted to avoid.
Martin Mares [Sat, 20 Apr 2002 14:37:39 +0000 (14:37 +0000)]
Added support for indexing/searching by custom attributes.
See the CUSTOM_ATTRS macro in lib/custom.h for an explanation.
Martin Mares [Sat, 6 Apr 2002 18:50:35 +0000 (18:50 +0000)]
Added .cvsignore files for all pieces of source which are machine-generated.
Martin Mares [Sat, 6 Apr 2002 18:44:18 +0000 (18:44 +0000)]
All configuration options (except for custom attributes which still dwell
in lib/custom.h) are now stored in config.mk to make them available to both
makefiles (conditional linking etc.) and C programs (lib/autoconf.h is
generated from config.mk by a simple shell script).
This gives an easy way to create special-purpose modules (like the
SQL gatherer) which need extra libraries -- just make them a compile-time
option ;)
Martin Mares [Sat, 6 Apr 2002 17:57:02 +0000 (17:57 +0000)]
Added a generic universal multi-purpose magical hash table module.
Look at introductory comments in lib/hashtable.h to see all the features.
Generic programming in C is a real adventure, but an afternoon spent
with CPP quirks is a holiday when compared with C++ templates :-)
Martin Mares [Fri, 29 Mar 2002 16:34:20 +0000 (16:34 +0000)]
Added a library module for generation of cryptographically secure
random numbers.
Martin Mares [Fri, 29 Mar 2002 16:33:26 +0000 (16:33 +0000)]
No longer need to handle undefined MAP_FAILED.
Martin Mares [Fri, 29 Mar 2002 11:06:19 +0000 (11:06 +0000)]
Added CT_INCOMPLETE_SECTION which is equivalent to CT_SECTION except
that unknown variables in such sections are not reported as errors.
Martin Mares [Thu, 31 Jan 2002 15:12:08 +0000 (15:12 +0000)]
Added a reference to the original article.
Improved the random generator as suggested by Robert.
Martin Mares [Thu, 31 Jan 2002 11:36:45 +0000 (11:36 +0000)]
Added a data structure for very efficient probabilistic representation
of sets. For more info, consult comments at the start of bitsig.c.
Martin Mares [Mon, 21 Jan 2002 10:00:20 +0000 (10:00 +0000)]
Development branch is now called 2.2a.
Robert Spalek [Mon, 14 Jan 2002 19:38:41 +0000 (19:38 +0000)]
empty section of configuration item forbidden
Martin Mares [Sun, 13 Jan 2002 14:34:00 +0000 (14:34 +0000)]
Don't call the callback function twice when deleting a bucket.
Martin Mares [Fri, 11 Jan 2002 21:00:04 +0000 (21:00 +0000)]
Added "sql" to the list of protocol names.
Milan Vancura [Wed, 9 Jan 2002 09:24:54 +0000 (09:24 +0000)]
Initial version of SQL gathering utility gsql added.
Milan
Martin Mares [Sun, 16 Dec 2001 19:25:03 +0000 (19:25 +0000)]
Clarified comments.
Martin Mares [Sun, 16 Dec 2001 19:24:44 +0000 (19:24 +0000)]
Added url_auto_canonicalize().
Martin Mares [Sat, 15 Dec 2001 22:55:42 +0000 (22:55 +0000)]
db-rebuild replaced by db-tool which allows not only database
reconstruction, but also dumping and undumping (useful for
conversion from SDBMv1 to v2).
Martin Mares [Sat, 15 Dec 2001 22:54:41 +0000 (22:54 +0000)]
New version of the SDBM library.
Now supports databases larger than 4G (the internal structure is the
same, but page pointers are in pages instead of bytes).
Warning! The database files are _not_ compatible with the previous
version. Use db-tool to convert your databases.
Also exported sdbm_hash() to allow presorting.
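
To illustrate why counting in pages lifts the 4G limit, here is a minimal sketch. It assumes 32-bit page pointers and takes the page size as a parameter; the function name is hypothetical and the 4096-byte page size used in the example is an assumption, not a value from the SDBM sources.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 32-bit page number to a 64-bit byte offset in the file. */
static uint64_t page_to_offset(uint32_t page, uint32_t page_size)
{
    assert(page_size > 0);
    /* Widen before multiplying so the product cannot wrap at 32 bits. */
    return (uint64_t)page * page_size;
}
```

For example, with an assumed page size of 4096 bytes, page number 2^20 already lies at byte offset 2^32, which a 32-bit byte pointer could not represent.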
Martin Mares [Sat, 15 Dec 2001 22:51:05 +0000 (22:51 +0000)]
bgetw() returns int instead of word, so it's possible
to detect EOF.
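
The reason for the wider return type can be sketched with plain stdio, in the same way getc() returns int. This is not the fastbuf implementation: the function name is hypothetical and the little-endian byte order is an assumption made only for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Read a 16-bit word; returns 0..0xffff on success or -1 on EOF.
 * If the return type were a 16-bit word, 0xffff and EOF would collide. */
static int read_word16(FILE *f)
{
    int lo, hi;

    assert(f != NULL);
    lo = getc(f);
    if (lo == EOF)
        return -1;
    hi = getc(f);
    if (hi == EOF)
        return -1;          /* a truncated word is treated as EOF here */
    return lo | (hi << 8);  /* assumed little-endian layout */
}
```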
Martin Mares [Sun, 2 Dec 2001 12:03:39 +0000 (12:03 +0000)]
Rewrote the profiler. Each module can now choose its own profiling method
without recompiling the library.
Martin Mares [Sun, 2 Dec 2001 12:02:51 +0000 (12:02 +0000)]
Added CPU and OS type defines.
Martin Mares [Sat, 1 Dec 2001 19:19:40 +0000 (19:19 +0000)]
Added a Poor Man's Profiler :-)
Martin Mares [Fri, 2 Nov 2001 21:34:08 +0000 (21:34 +0000)]
HEAP_DELETE: Copy the `pos' parameter to a temporary variable as soon
as possible to avoid problems with callers supplying us with expressions
which could change during heap operations.
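
The hazard can be sketched with a simplified macro. This is not the library's HEAP_DELETE: it assumes a 1-based min-heap of ints, and all names are hypothetical. The point is that `pos` is evaluated exactly once, before the heap or its element count changes.

```c
#include <assert.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Move heap[i] up while it is smaller than its parent (1-based heap). */
static void sift_up(int *heap, int i)
{
    while (i > 1 && heap[i] < heap[i / 2]) {
        swap_int(&heap[i], &heap[i / 2]);
        i /= 2;
    }
}

/* Move heap[i] down while it is larger than its smaller child. */
static void sift_down(int *heap, int n, int i)
{
    for (;;) {
        int c = 2 * i;
        if (c > n)
            break;
        if (c + 1 <= n && heap[c + 1] < heap[c])
            c++;
        if (heap[i] <= heap[c])
            break;
        swap_int(&heap[i], &heap[c]);
        i = c;
    }
}

/* Delete the element at position pos from heap[1..n]; n is decremented. */
#define HEAP_DELETE_SKETCH(heap, n, pos) do {                          \
    int _pos = (pos);   /* evaluated once, before anything changes */  \
    assert(_pos >= 1 && _pos <= (n));                                  \
    (heap)[_pos] = (heap)[(n)];                                        \
    (n)--;                                                             \
    if (_pos <= (n)) {                                                 \
        sift_up((heap), _pos);                                         \
        sift_down((heap), (n), _pos);                                  \
    }                                                                  \
} while (0)
```

A caller passing an expression such as `n - 5` is safe here, because the macro never re-reads `pos` after decrementing `n`.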
Martin Mares [Fri, 12 Oct 2001 10:08:32 +0000 (10:08 +0000)]
Log file names are now allowed to contain strftime() conversion specifiers.
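
A minimal sketch of how such a name pattern could be expanded with the standard strftime() call. The function name and the example pattern are hypothetical; how the logger itself obtains the time and handles errors is not shown in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Expand strftime() conversion specifiers in a log file name pattern.
 * Returns the number of characters written, or 0 if the result did not
 * fit into buf (strftime() also returns 0 for an empty result). */
static size_t expand_log_name(char *buf, size_t size,
                              const char *pattern, const struct tm *when)
{
    assert(buf != NULL && size > 0 && pattern != NULL && when != NULL);
    return strftime(buf, size, pattern, when);
}
```

A pattern such as `log-%Y%m%d` would then yield one file name per day.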
Martin Mares [Fri, 5 Oct 2001 16:33:24 +0000 (16:33 +0000)]
Moved all customizable parts of configuration and index format
(i.e., those depending on user attributes or word types, not on
our compilation environment) to a new file.
Custom configurations (indexing of objects generated from a database
and similar cases) should require only modifications of cf/sherlock
and lib/custom.h from now on.
Martin Mares [Fri, 5 Oct 2001 16:12:19 +0000 (16:12 +0000)]
Insert is now capable of inserting a sequence of blank line separated
objects.
Robert Spalek [Thu, 27 Sep 2001 09:54:22 +0000 (09:54 +0000)]
url_component_separators has a default value "" to accelerate
url_has_repeated_component() if not reconfigured
Robert Spalek [Thu, 27 Sep 2001 09:42:08 +0000 (09:42 +0000)]
url_has_repeated_component() fully implemented and tested
Robert Spalek [Wed, 26 Sep 2001 16:46:41 +0000 (16:46 +0000)]
added skeleton of not yet implemented function url_has_repeated_component()
and its configuration items
Robert Spalek [Wed, 26 Sep 2001 16:13:34 +0000 (16:13 +0000)]
die() called with string containing newlines replaced by fputs(stderr) and exit()
Robert Spalek [Wed, 26 Sep 2001 13:56:29 +0000 (13:56 +0000)]
CF_USAGE printed (description of -S and -C parameters)
Robert Spalek [Wed, 26 Sep 2001 12:53:49 +0000 (12:53 +0000)]
added CF_USAGE
Robert Spalek [Thu, 6 Sep 2001 15:20:46 +0000 (15:20 +0000)]
typo fixed
Martin Mares [Sun, 2 Sep 2001 10:23:45 +0000 (10:23 +0000)]
Added I/O functions on addr_int_t.
Martin Mares [Sun, 2 Sep 2001 10:23:27 +0000 (10:23 +0000)]
Added CPU_64BIT_POINTERS.
Martin Mares [Sat, 1 Sep 2001 21:42:55 +0000 (21:42 +0000)]
Added shakedown, but don't use it on real gatherer bucket files
since buckettool doesn't update any other gatherer structures.
The expirer is the right place to go.
Martin Mares [Sat, 1 Sep 2001 21:41:39 +0000 (21:41 +0000)]
Added function for shaking down the bucket file.
Martin Mares [Thu, 30 Aug 2001 08:39:51 +0000 (08:39 +0000)]
Added new charsets: windows-1250 and x-cork.
Martin Mares [Wed, 29 Aug 2001 10:57:19 +0000 (10:57 +0000)]
Better encapsulation of the ipaccess filter.
Martin Mares [Wed, 29 Aug 2001 10:40:59 +0000 (10:40 +0000)]
Added generic functions for IP address access lists.
Robert Spalek [Tue, 14 Aug 2001 09:11:03 +0000 (09:11 +0000)]
bugfix
Martin Mares [Sun, 13 May 2001 15:35:24 +0000 (15:35 +0000)]
Minor optimization of GET_TAGGED_CHAR.
Martin Mares [Tue, 10 Apr 2001 21:36:22 +0000 (21:36 +0000)]
Audited TODO list and bumped version number to 2.0.
Martin Mares [Tue, 10 Apr 2001 20:51:59 +0000 (20:51 +0000)]
Relax the accent match rules of "auto" accent mode: if some _outer_ word
matches only without accents in an accented document and the match is
in URL keywords, accept it (we know it will be real match as the word
is outer). Bleeeech, it's ugly.
Martin Mares [Tue, 10 Apr 2001 20:34:00 +0000 (20:34 +0000)]
URL words split to two categories with different weights.
Martin Mares [Sun, 8 Apr 2001 16:26:12 +0000 (16:26 +0000)]
Added URLWORD search specifier.
Martin Mares [Fri, 30 Mar 2001 19:38:45 +0000 (19:38 +0000)]
Added indexing of URL words (partially ported from our old alter ego).
Robert, please ignore word types present in WORD_TYPES_HIDDEN when
searching for contexts -- URL's and other tricky stuff shouldn't show up.
Martin Mares [Fri, 30 Mar 2001 18:59:41 +0000 (18:59 +0000)]
Cleanup of word type name macros.
Martin Mares [Fri, 30 Mar 2001 18:44:35 +0000 (18:44 +0000)]
Cured memory leak.
Martin Mares [Fri, 30 Mar 2001 18:42:56 +0000 (18:42 +0000)]
Cupcase() works even for non-letters, so there is no need to call Cupper().
Robert Spalek [Fri, 30 Mar 2001 18:10:18 +0000 (18:10 +0000)]
<ctype.h> dependency deleted
Robert Spalek [Fri, 30 Mar 2001 13:20:14 +0000 (13:20 +0000)]
test audited
Robert Spalek [Fri, 30 Mar 2001 13:15:15 +0000 (13:15 +0000)]
syntax of regular expressions changed to extended
regex-test extended to test this
Robert Spalek [Fri, 30 Mar 2001 13:07:05 +0000 (13:07 +0000)]
rx_compile() can now compile with IGNORING CASE enabled too
regex-test added
Robert Spalek [Fri, 30 Mar 2001 09:18:14 +0000 (09:18 +0000)]
cards.c printed tags converted tolower
Robert Spalek [Fri, 30 Mar 2001 08:43:38 +0000 (08:43 +0000)]
added WT_NAMES from WORD_TYPE_NAMES temporarily, MJ: please check it
Martin Mares [Tue, 27 Mar 2001 17:41:01 +0000 (17:41 +0000)]
Mapping of zero-length files returns just a random non-zero address.
With this fix, empty indices are generated correctly.
Martin Mares [Tue, 27 Mar 2001 16:46:31 +0000 (16:46 +0000)]
Added optional work-arounds for path underflows and leading/trailing
spaces in URL's.
Martin Mares [Tue, 27 Mar 2001 16:29:07 +0000 (16:29 +0000)]
Added ASSERT checks for tag byte syntax. Didn't find any errors yet.
Martin Mares [Tue, 27 Mar 2001 11:02:42 +0000 (11:02 +0000)]
Load the default config file on first non-config option (several options
require config to be loaded).
Martin Mares [Tue, 27 Mar 2001 10:57:33 +0000 (10:57 +0000)]
Removed tempfile functions (nobody uses them and they probably belong
to fastbuf.c anyway).
Martin Mares [Tue, 27 Mar 2001 10:52:48 +0000 (10:52 +0000)]
Added ABS macro.
Martin Mares [Tue, 27 Mar 2001 10:29:51 +0000 (10:29 +0000)]
Removed FIXME.
Martin Mares [Tue, 27 Mar 2001 10:28:31 +0000 (10:28 +0000)]
Slow case of b(get|put)_utf8 no longer inline.
Robert Spalek [Thu, 22 Mar 2001 15:56:47 +0000 (15:56 +0000)]
CVS repository cleaned up a bit:
gather/{objdump.c,dumpconfig.[ch]} and indexer/idxdump.c --> utils
utils/lfstest.c deleted
filter/ftest is not compiled by default
rule for making lib/lfs-test.c added into Makefile
Martin Mares [Mon, 19 Mar 2001 19:55:49 +0000 (19:55 +0000)]
Oops, endianness problem in reference files.
Martin Mares [Sat, 17 Mar 2001 15:03:47 +0000 (15:03 +0000)]
Better setproctitle() inspired by sendmail's one.
Martin Mares [Sat, 17 Mar 2001 14:42:16 +0000 (14:42 +0000)]
Define setproctitle() and use it for gatherer thread status reporting.
Martin Mares [Fri, 16 Mar 2001 22:04:59 +0000 (22:04 +0000)]
Moved generic heap macros to heap.h.
Martin Mares [Thu, 15 Mar 2001 22:23:43 +0000 (22:23 +0000)]
Changed locking mechanism of the bucket library to fcntl() instead
of flock() as the flock locks have totally broken semantics -- they
happily permit multiple locks on a shared fd!
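
For reference, whole-file locking with POSIX fcntl() record locks can be sketched as follows. The helper name is hypothetical and this is not the bucket library's code; it only shows the standard `struct flock` fields and the `F_SETLKW` command.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Apply a lock of the given type (F_RDLCK, F_WRLCK or F_UNLCK) to the
 * whole file. Blocks until the lock is granted; returns 0 on success
 * and -1 on error, as fcntl() does. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,     /* zero length means "up to the end of file" */
    };

    assert(fd >= 0);
    return fcntl(fd, F_SETLKW, &fl);
}
```

Note that fcntl() locks belong to the process rather than to the file descriptor, so they are released when the process closes any descriptor referring to the file.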
Martin Mares [Thu, 15 Mar 2001 22:22:34 +0000 (22:22 +0000)]
Use sh_ftruncate() instead of ftruncate().