mj.ucw.cz Git - libucw.git/log
libucw.git
22 years agoAdded a Poor Man's Profiler :-)
Martin Mares [Sat, 1 Dec 2001 19:19:40 +0000 (19:19 +0000)]
Added a Poor Man's Profiler :-)

22 years agoHEAP_DELETE: Copy the `pos' parameter to a temporary variable as soon
Martin Mares [Fri, 2 Nov 2001 21:34:08 +0000 (21:34 +0000)]
HEAP_DELETE: Copy the `pos' parameter to a temporary variable as soon
as possible to avoid problems with callers supplying us with expressions
which could change during heap operations.
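
A minimal sketch of the hazard described above, using a hypothetical DEMO_HEAP_DELETE macro on a 1-based min-heap of ints (the names and the simplified interface are assumptions, not the real HEAP_DELETE). The point is only that `pos` is read once into a temporary before the heap or its element count is touched:

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a generic heap macro:
 * a 1-based min-heap of ints, with `n` the current element count. */
#define DEMO_SWAP(a, b) do { int t_ = (a); (a) = (b); (b) = t_; } while (0)

#define DEMO_HEAP_DELETE(heap, n, pos)                                  \
    do {                                                                \
        unsigned i_ = (pos);   /* snapshot: `pos` is read only here */  \
        unsigned c_;                                                    \
        assert(i_ >= 1 && i_ <= (n));                                   \
        (heap)[i_] = (heap)[(n)--];  /* last element fills the hole */  \
        if (i_ > (n))                                                   \
            break;            /* the last element itself was deleted */ \
        while (i_ > 1 && (heap)[i_] < (heap)[i_ / 2]) {  /* sift up */  \
            DEMO_SWAP((heap)[i_], (heap)[i_ / 2]);                      \
            i_ /= 2;                                                    \
        }                                                               \
        while ((c_ = 2 * i_) <= (n)) {                 /* sift down */  \
            if (c_ < (n) && (heap)[c_ + 1] < (heap)[c_])                \
                c_++;                                                   \
            if (!((heap)[c_] < (heap)[i_]))                             \
                break;                                                  \
            DEMO_SWAP((heap)[i_], (heap)[c_]);                          \
            i_ = c_;                                                    \
        }                                                               \
    } while (0)
```

With this shape a call such as DEMO_HEAP_DELETE(heap, n, n) is safe even though the macro decrements n, because the position was captured before the decrement.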

22 years agoLog file names are now allowed to contain strftime() conversion specifiers.
Martin Mares [Fri, 12 Oct 2001 10:08:32 +0000 (10:08 +0000)]
Log file names are now allowed to contain strftime() conversion specifiers.
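
A minimal sketch of how such a name could be expanded, assuming a hypothetical helper expand_log_name() (not the library's own function). It uses UTC via gmtime() to keep the result independent of the local time zone; the real code may well use local time:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Expand strftime() conversion specifiers in a log file name pattern.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or 0 if the time could not be converted or the result did not fit. */
static size_t expand_log_name(char *out, size_t out_size,
                              const char *pattern, time_t when)
{
    const struct tm *tm;

    assert(out != NULL && out_size > 0 && pattern != NULL);
    tm = gmtime(&when);            /* assumption: UTC, not local time */
    if (tm == NULL)
        return 0;
    return strftime(out, out_size, pattern, tm);
}
```

A pattern such as "log-%Y%m%d" then yields one file name per day.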

22 years agoMoved all customizable parts of configuration and index format
Martin Mares [Fri, 5 Oct 2001 16:33:24 +0000 (16:33 +0000)]
Moved all customizable parts of configuration and index format
(i.e., those depending on user attributes or word types, not on
our compilation environment) to a new file.

Custom configurations (indexing of objects generated from a database
and similar cases) should require only modifications of cf/sherlock
and lib/custom.h from now on.

22 years agoInsert is now capable of inserting a sequence of blank line separated
Martin Mares [Fri, 5 Oct 2001 16:12:19 +0000 (16:12 +0000)]
Insert is now capable of inserting a sequence of blank line separated
objects.

22 years agourl_component_separators has a default value "" to accelerate
Robert Spalek [Thu, 27 Sep 2001 09:54:22 +0000 (09:54 +0000)]
url_component_separators has a default value "" to accelerate
url_has_repeated_component() if not reconfigured

22 years agourl_has_repeated_component() fully implemented and tested
Robert Spalek [Thu, 27 Sep 2001 09:42:08 +0000 (09:42 +0000)]
url_has_repeated_component() fully implemented and tested

22 years agoadded skeleton of not yet implemented function url_has_repeated_component()
Robert Spalek [Wed, 26 Sep 2001 16:46:41 +0000 (16:46 +0000)]
added skeleton of not yet implemented function url_has_repeated_component()
and its configuration items

22 years agodie() called with string containing newlines replaced by fputs(stderr) and exit()
Robert Spalek [Wed, 26 Sep 2001 16:13:34 +0000 (16:13 +0000)]
die() called with string containing newlines replaced by fputs(stderr) and exit()

22 years agoCF_USAGE printed (description of -S and -C parameters)
Robert Spalek [Wed, 26 Sep 2001 13:56:29 +0000 (13:56 +0000)]
CF_USAGE printed (description of -S and -C parameters)

22 years agoadded CF_USAGE
Robert Spalek [Wed, 26 Sep 2001 12:53:49 +0000 (12:53 +0000)]
added CF_USAGE

22 years agotypo fixed
Robert Spalek [Thu, 6 Sep 2001 15:20:46 +0000 (15:20 +0000)]
typo fixed

22 years agoAdded I/O functions on addr_int_t.
Martin Mares [Sun, 2 Sep 2001 10:23:45 +0000 (10:23 +0000)]
Added I/O functions on addr_int_t.

22 years agoAdded CPU_64BIT_POINTERS.
Martin Mares [Sun, 2 Sep 2001 10:23:27 +0000 (10:23 +0000)]
Added CPU_64BIT_POINTERS.

22 years agoAdded shakedown, but don't use it on real gatherer bucket files
Martin Mares [Sat, 1 Sep 2001 21:42:55 +0000 (21:42 +0000)]
Added shakedown, but don't use it on real gatherer bucket files
since buckettool doesn't update any other gatherer structures.
The expirer is the right place to go.

22 years agoAdded function for shaking down the bucket file.
Martin Mares [Sat, 1 Sep 2001 21:41:39 +0000 (21:41 +0000)]
Added function for shaking down the bucket file.

22 years agoAdded new charsets: windows-1250 and x-cork.
Martin Mares [Thu, 30 Aug 2001 08:39:51 +0000 (08:39 +0000)]
Added new charsets: windows-1250 and x-cork.

22 years agoBetter encapsulation of the ipaccess filter.
Martin Mares [Wed, 29 Aug 2001 10:57:19 +0000 (10:57 +0000)]
Better encapsulation of the ipaccess filter.

22 years agoAdded generic functions for IP address access lists.
Martin Mares [Wed, 29 Aug 2001 10:40:59 +0000 (10:40 +0000)]
Added generic functions for IP address access lists.

22 years agobugfix
Robert Spalek [Tue, 14 Aug 2001 09:11:03 +0000 (09:11 +0000)]
bugfix

23 years agoMinor optimization of GET_TAGGED_CHAR.
Martin Mares [Sun, 13 May 2001 15:35:24 +0000 (15:35 +0000)]
Minor optimization of GET_TAGGED_CHAR.

23 years agoAudited TODO list and bumped version number to 2.0.
Martin Mares [Tue, 10 Apr 2001 21:36:22 +0000 (21:36 +0000)]
Audited TODO list and bumped version number to 2.0.

23 years agoRelax the accent match rules of "auto" accent mode: if some _outer_ word
Martin Mares [Tue, 10 Apr 2001 20:51:59 +0000 (20:51 +0000)]
Relax the accent match rules of "auto" accent mode: if some _outer_ word
matches only without accents in an accented document and the match is
in URL keywords, accept it (we know it will be a real match as the word
is outer). Bleeeech, it's ugly.

23 years agoURL words split to two categories with different weights.
Martin Mares [Tue, 10 Apr 2001 20:34:00 +0000 (20:34 +0000)]
URL words split to two categories with different weights.

23 years agoAdded URLWORD search specifier.
Martin Mares [Sun, 8 Apr 2001 16:26:12 +0000 (16:26 +0000)]
Added URLWORD search specifier.

23 years agoAdded indexing of URL words (partially ported from our old alter ego).
Martin Mares [Fri, 30 Mar 2001 19:38:45 +0000 (19:38 +0000)]
Added indexing of URL words (partially ported from our old alter ego).

Robert, please ignore word types present in WORD_TYPES_HIDDEN when
searching for contexts -- URL's and other tricky stuff shouldn't show up.

23 years agoCleanup of word type name macros.
Martin Mares [Fri, 30 Mar 2001 18:59:41 +0000 (18:59 +0000)]
Cleanup of word type name macros.

23 years agoCured memory leak.
Martin Mares [Fri, 30 Mar 2001 18:44:35 +0000 (18:44 +0000)]
Cured memory leak.

23 years agoCupcase() works even for non-letters, so there is no need to call Cupper().
Martin Mares [Fri, 30 Mar 2001 18:42:56 +0000 (18:42 +0000)]
Cupcase() works even for non-letters, so there is no need to call Cupper().

23 years ago<ctype.h> dependency deleted
Robert Spalek [Fri, 30 Mar 2001 18:10:18 +0000 (18:10 +0000)]
<ctype.h> dependency deleted

23 years agotest audited
Robert Spalek [Fri, 30 Mar 2001 13:20:14 +0000 (13:20 +0000)]
test audited

23 years agosyntax of regular expressions changed to extended
Robert Spalek [Fri, 30 Mar 2001 13:15:15 +0000 (13:15 +0000)]
syntax of regular expressions changed to extended
regex-test extended to test this

23 years agorx_compile() can now compile with IGNORING CASE enabled too
Robert Spalek [Fri, 30 Mar 2001 13:07:05 +0000 (13:07 +0000)]
rx_compile() can now compile with IGNORING CASE enabled too
regex-test added

23 years agocards.c printed tags converted tolower
Robert Spalek [Fri, 30 Mar 2001 09:18:14 +0000 (09:18 +0000)]
cards.c printed tags converted tolower

23 years agoadded WT_NAMES from WORD_TYPE_NAMES temporarily, MJ: please check it
Robert Spalek [Fri, 30 Mar 2001 08:43:38 +0000 (08:43 +0000)]
added WT_NAMES from WORD_TYPE_NAMES temporarily, MJ: please check it

23 years agoMapping of zero-length files returns just a random non-zero address.
Martin Mares [Tue, 27 Mar 2001 17:41:01 +0000 (17:41 +0000)]
Mapping of zero-length files returns just a random non-zero address.

With this fix, empty indices are generated correctly.
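
A rough sketch of the idea, assuming a hypothetical wrapper map_readonly() (not the library's interface). POSIX mmap() rejects a zero length, so the wrapper hands back the address of a private marker object instead, which callers must never dereference:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of `fd` read-only. For an empty file, return a dummy
 * non-NULL address so that callers can still tell success from failure.
 * Returns NULL if the mapping fails. */
static void *map_readonly(int fd, size_t len)
{
    static char empty_file_marker;   /* only its address is used */
    void *p;

    if (len == 0)
        return &empty_file_marker;
    assert(fd >= 0);
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A matching unmap routine would have to skip munmap() for the zero-length case.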

23 years agoAdded optional work-arounds for path underflows and leading/trailing
Martin Mares [Tue, 27 Mar 2001 16:46:31 +0000 (16:46 +0000)]
Added optional work-arounds for path underflows and leading/trailing
spaces in URL's.

23 years agoAdded ASSERT checks for tag byte syntax. Didn't find any errors yet.
Martin Mares [Tue, 27 Mar 2001 16:29:07 +0000 (16:29 +0000)]
Added ASSERT checks for tag byte syntax. Didn't find any errors yet.

23 years agoLoad the default config file on first non-config option (several options
Martin Mares [Tue, 27 Mar 2001 11:02:42 +0000 (11:02 +0000)]
Load the default config file on first non-config option (several options
require config to be loaded).

23 years agoRemoved tempfile functions (nobody uses them and they probably belong
Martin Mares [Tue, 27 Mar 2001 10:57:33 +0000 (10:57 +0000)]
Removed tempfile functions (nobody uses them and they probably belong
to fastbuf.c anyway).

23 years agoAdded ABS macro.
Martin Mares [Tue, 27 Mar 2001 10:52:48 +0000 (10:52 +0000)]
Added ABS macro.

23 years agoRemoved FIXME.
Martin Mares [Tue, 27 Mar 2001 10:29:51 +0000 (10:29 +0000)]
Removed FIXME.

23 years agoSlow case of b(get|put)_utf8 no longer inline.
Martin Mares [Tue, 27 Mar 2001 10:28:31 +0000 (10:28 +0000)]
Slow case of b(get|put)_utf8 no longer inline.

23 years agoCVS repository cleaned up a bit:
Robert Spalek [Thu, 22 Mar 2001 15:56:47 +0000 (15:56 +0000)]
CVS repository cleaned up a bit:
gather/{objdump.c,dumpconfig.[ch]} and indexer/idxdump.c --> utils
utils/lfstest.c deleted
filter/ftest is not compiled by default
rule for making lib/lfs-test.c added into Makefile

23 years agoOops, endianness problem in reference files.
Martin Mares [Mon, 19 Mar 2001 19:55:49 +0000 (19:55 +0000)]
Oops, endianness problem in reference files.

23 years agoBetter setproctitle() inspired by sendmail's one.
Martin Mares [Sat, 17 Mar 2001 15:03:47 +0000 (15:03 +0000)]
Better setproctitle() inspired by sendmail's one.

23 years agoDefine setproctitle() and use it for gatherer thread status reporting.
Martin Mares [Sat, 17 Mar 2001 14:42:16 +0000 (14:42 +0000)]
Define setproctitle() and use it for gatherer thread status reporting.

23 years agoMoved generic heap macros to heap.h.
Martin Mares [Fri, 16 Mar 2001 22:04:59 +0000 (22:04 +0000)]
Moved generic heap macros to heap.h.

23 years agoChanged locking mechanism of the bucket library to fcntl() instead
Martin Mares [Thu, 15 Mar 2001 22:23:43 +0000 (22:23 +0000)]
Changed locking mechanism of the bucket library to fcntl() instead
of flock() as the flock locks have totally broken semantics -- they
happily permit multiple locks on a shared fd!
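
For illustration, a whole-file lock taken with fcntl() might look like the sketch below. The helper name lock_whole_file() is an assumption and this is not the bucket library's code. fcntl() record locks belong to the process, whereas flock() locks belong to the open file description and are therefore shared by every holder of the same descriptor:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Acquire (F_RDLCK or F_WRLCK) or release (F_UNLCK) an advisory lock
 * covering the whole file; blocks until the lock can be granted.
 * Returns 0 on success, -1 on error with errno set. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};

    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;      /* 0 means "to end of file", including growth */
    return fcntl(fd, F_SETLKW, &fl);
}
```

Note that fcntl() locks are dropped when the process closes any descriptor for the file, so they need care of their own.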

23 years agoUse sh_ftruncate() instead of ftruncate().
Martin Mares [Thu, 15 Mar 2001 22:22:34 +0000 (22:22 +0000)]
Use sh_ftruncate() instead of ftruncate().

23 years agoBuckettool converted to standard command line parsing,
Martin Mares [Thu, 15 Mar 2001 20:48:33 +0000 (20:48 +0000)]
Buckettool converted to standard command line parsing,
so that config files work there as well.

23 years agoBucket file name and I/O buffer size are no longer hard-wired.
Martin Mares [Thu, 15 Mar 2001 20:11:34 +0000 (20:11 +0000)]
Bucket file name and I/O buffer size are no longer hard-wired.

Audited locking, but found no bugs.

23 years agoAdded macro for fetching of u32's aligned on 2-byte boundary.
Martin Mares [Sat, 10 Mar 2001 16:09:13 +0000 (16:09 +0000)]
Added macro for fetching of u32's aligned on 2-byte boundary.
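
One portable way to express such a fetch is sketched below with a hypothetical function get_u32_align16(). It uses memcpy(), which compilers typically turn into the best load sequence for the target; the macro in the commit may be implemented differently, for instance with two 16-bit loads:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a native-endian 32-bit value from an address that is guaranteed
 * to be 2-byte aligned but not necessarily 4-byte aligned. */
static uint32_t get_u32_align16(const void *p)
{
    uint32_t v;

    assert(((uintptr_t)p & 1) == 0);   /* precondition: 2-byte aligned */
    memcpy(&v, p, sizeof v);           /* no misaligned dereference */
    return v;
}
```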

23 years agoSome typecasts due to LFS.
Martin Mares [Fri, 9 Mar 2001 12:43:20 +0000 (12:43 +0000)]
Some typecasts due to LFS.

23 years agoCreated a new include for efficient access to unaligned data.
Martin Mares [Wed, 7 Mar 2001 13:38:19 +0000 (13:38 +0000)]
Created a new include for efficient access to unaligned data.
Updated fastbuf to use it.
Removed sh_foff_t -- all modules should use sh_off_t instead (the only
  case where it would differ is the 2G--4G range of file sizes you get
  when LFS is turned on and LARGE_DB turned off and it's not interesting
  anyway)
bgeto/bgetp selection is done in config.h.
LFS is turned on by default.

23 years agoRemoved old incarnation of CLAMP.
Martin Mares [Wed, 7 Mar 2001 13:35:35 +0000 (13:35 +0000)]
Removed old incarnation of CLAMP.

23 years agoAdded a couple of useful macros: MIN, MAX, CLAMP, ARRAY_SIZE.
Martin Mares [Wed, 7 Mar 2001 13:34:58 +0000 (13:34 +0000)]
Added a couple of useful macros: MIN, MAX, CLAMP, ARRAY_SIZE.

Removed NULL as it's already defined in config.h.
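
The commit does not show the definitions, so the following are only the conventional forms of such macros. The CLAMP argument order (value, low, high) and the helper clamp_percent() are assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Conventional definitions; every argument may be evaluated more than
 * once, so arguments with side effects should be avoided. */
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))
#define CLAMP(x, lo, hi) (((x) < (lo)) ? (lo) : (((x) > (hi)) ? (hi) : (x)))
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: force a value into the range 0..100. */
static int clamp_percent(int x)
{
    int r = CLAMP(x, 0, 100);

    assert(r >= 0 && r <= 100);
    return r;
}
```

ARRAY_SIZE is only meaningful for true arrays; applied to a pointer it silently gives the wrong answer.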

23 years agoAdded sh_mmap().
Martin Mares [Wed, 7 Mar 2001 13:33:37 +0000 (13:33 +0000)]
Added sh_mmap().

For now, only the "use glibc" interface works, the others miss several
essential functions (namely sh_ftruncate and sh_mmap).

23 years agoopen -> sh_open
Martin Mares [Wed, 7 Mar 2001 13:27:17 +0000 (13:27 +0000)]
open -> sh_open

23 years agoUse sh_open instead of open.
Martin Mares [Wed, 7 Mar 2001 13:24:00 +0000 (13:24 +0000)]
Use sh_open instead of open.

23 years agoDon't crash when the last fragment is of zero length.
Martin Mares [Wed, 7 Mar 2001 12:34:14 +0000 (12:34 +0000)]
Don't crash when the last fragment is of zero length.

No more SEGV's on truncated files.

23 years agoThe StringMap file contains an end marker pointing past the last
Martin Mares [Tue, 6 Mar 2001 17:42:17 +0000 (17:42 +0000)]
The StringMap file contains an end marker pointing past the last
string reference.

23 years agoIntroduced obuck_get_pos(), converted gatherd limits to use it.
Martin Mares [Mon, 5 Mar 2001 12:35:58 +0000 (12:35 +0000)]
Introduced obuck_get_pos(), converted gatherd limits to use it.

23 years agobugfix
Robert Spalek [Mon, 5 Mar 2001 12:26:54 +0000 (12:26 +0000)]
bugfix

23 years agoRemoved cf_read() and cf_default_*().
Martin Mares [Mon, 5 Mar 2001 11:52:14 +0000 (11:52 +0000)]
Removed cf_read() and cf_default_*().

23 years agoobuck_size() deleted
Robert Spalek [Mon, 5 Mar 2001 09:37:32 +0000 (09:37 +0000)]
obuck_size() deleted

23 years agocf_default_init() replaced by direct access to cfdeffile, the default value
Robert Spalek [Mon, 5 Mar 2001 09:02:11 +0000 (09:02 +0000)]
cf_default_init() replaced by direct access to cfdeffile, the default value
is DEFAULT_CONFIG
cf_default_done() replaced by an automatic call when getopt() returns -1

23 years agocf_default_{init,done} interface used instead of cf_read
Robert Spalek [Sun, 4 Mar 2001 15:21:50 +0000 (15:21 +0000)]
cf_default_{init,done} interface used instead of cf_read

23 years agoadded cf_default_{init,done} for setting the default config filename, that
Robert Spalek [Sun, 4 Mar 2001 15:19:26 +0000 (15:19 +0000)]
added cf_default_{init,done} for setting the default config filename, that
will be automatically read if not overridden by a command-line option

23 years agoadded obuck_size() returning the size of bucket file (for gatherd.c stopping
Robert Spalek [Sun, 4 Mar 2001 15:18:08 +0000 (15:18 +0000)]
added obuck_size() returning the size of bucket file (for gatherd.c stopping
after maximal size is reached)

23 years agoMoved fp_hash() to index.h.
Martin Mares [Sat, 3 Mar 2001 22:50:29 +0000 (22:50 +0000)]
Moved fp_hash() to index.h.

23 years agoDefine FASTBUF_BYTES_PER_(O|P).
Martin Mares [Sat, 3 Mar 2001 17:22:16 +0000 (17:22 +0000)]
Define FASTBUF_BYTES_PER_(O|P).

23 years agoAdded indexer names for word and string type classes.
Martin Mares [Sat, 3 Mar 2001 13:39:36 +0000 (13:39 +0000)]
Added indexer names for word and string type classes.

23 years agoDefined which string classes contain URL's and which ones case insensitive
Martin Mares [Sat, 3 Mar 2001 12:12:35 +0000 (12:12 +0000)]
Defined which string classes contain URL's and which ones case insensitive
strings.

23 years agoadded
Robert Spalek [Fri, 2 Mar 2001 13:36:47 +0000 (13:36 +0000)]
added

23 years agoUpdated the charset conversion library to Unicode 3.0.
Martin Mares [Fri, 2 Mar 2001 11:30:11 +0000 (11:30 +0000)]
Updated the charset conversion library to Unicode 3.0.

Removed ice age relics.

Removed signature tables as they are not used anyway.

23 years agoReplaced <sys/time.h> by <time.h> where appropriate.
Martin Mares [Fri, 2 Mar 2001 11:00:33 +0000 (11:00 +0000)]
Replaced <sys/time.h> by <time.h> where appropriate.

23 years agoFixed bug in generating UTF-8 for codes >= 0x800.
Martin Mares [Thu, 1 Mar 2001 17:31:47 +0000 (17:31 +0000)]
Fixed bug in generating UTF-8 for codes >= 0x800.
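
The commit does not say what the bug was. For reference, a sketch of standard UTF-8 generation for code points below 0x10000 follows, where the three-byte branch is the one covering codes >= 0x800. The function name put_utf8_demo() is hypothetical, and surrogate code points are not rejected:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point `c` (below 0x10000) as UTF-8 into `out`.
 * Returns the number of bytes written: 1, 2 or 3. */
static size_t put_utf8_demo(unsigned char *out, uint32_t c)
{
    assert(out != NULL && c < 0x10000);
    if (c < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```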

23 years agoDefined a GET_TAGGED_CHAR macro to read our internal representation
Martin Mares [Thu, 1 Mar 2001 16:57:58 +0000 (16:57 +0000)]
Defined a GET_TAGGED_CHAR macro to read our internal representation
of tagged text, mapping the tags to character codes >= 0x80000000.

23 years agoGenerate index cards.
Martin Mares [Fri, 23 Feb 2001 14:02:46 +0000 (14:02 +0000)]
Generate index cards.

The chewer is complete, but it will probably need a bit of optimization
and fine-tuning when we get some real data. The searching for words present
in the cache doesn't look good.

23 years agoIndexing of strings.
Martin Mares [Fri, 23 Feb 2001 10:53:32 +0000 (10:53 +0000)]
Indexing of strings.

23 years agoSome more chewer work...
Martin Mares [Thu, 22 Feb 2001 16:01:49 +0000 (16:01 +0000)]
Some more chewer work...

23 years agoAdded bgets0().
Martin Mares [Tue, 20 Feb 2001 22:32:06 +0000 (22:32 +0000)]
Added bgets0().

23 years agoAdded a useful macro for value clamping.
Martin Mares [Tue, 20 Feb 2001 17:51:24 +0000 (17:51 +0000)]
Added a useful macro for value clamping.

23 years agoOops, breadb() was wrong.
Martin Mares [Mon, 19 Feb 2001 19:08:56 +0000 (19:08 +0000)]
Oops, breadb() was wrong.

23 years agoAdded breadb() which acts just like bread(), but die()s if a partial
Martin Mares [Mon, 19 Feb 2001 18:53:46 +0000 (18:53 +0000)]
Added breadb() which acts just like bread(), but die()s if a partial
record is read. This is mainly to avoid consistency checks in main
code path.
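
The same pattern can be sketched on top of stdio, with fread() standing in for bread() and a hypothetical read_record_or_die() standing in for breadb(). The return convention (1 for a full record, 0 for a clean end of file) is an assumption of this example:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read exactly `len` bytes into `buf`.
 * Returns 1 on a full record and 0 on a clean end of file; a partial
 * record or an I/O error terminates the program, so callers need no
 * consistency checks of their own. */
static int read_record_or_die(FILE *f, void *buf, size_t len)
{
    size_t got;

    assert(f != NULL && buf != NULL);
    got = fread(buf, 1, len, f);
    if (got == len)
        return 1;
    if (got == 0 && !ferror(f))
        return 0;
    fprintf(stderr, "Short read: wanted %zu bytes, got %zu\n", len, got);
    exit(1);
}
```

A reading loop then reduces to while (read_record_or_die(f, &rec, sizeof rec)) with no error branch in the main code path.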

23 years agoSORT_DELETE_INPUT works even with SORT_INPUT_FB.
Martin Mares [Fri, 16 Feb 2001 20:16:51 +0000 (20:16 +0000)]
SORT_DELETE_INPUT works even with SORT_INPUT_FB.

23 years agoDeclare fingerprints as 12 bytes, not 3 u32's.
Martin Mares [Fri, 16 Feb 2001 20:16:26 +0000 (20:16 +0000)]
Declare fingerprints as 12 bytes, not 3 u32's.

23 years agoAdded #define PACKED __attribute__((packed)).
Martin Mares [Fri, 16 Feb 2001 20:16:00 +0000 (20:16 +0000)]
Added #define PACKED __attribute__((packed)).
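
A small sketch of what the attribute buys, assuming GCC or Clang. The struct demo_record and the helper fingerprint_equal() are made up for the example; the 12-byte fingerprint follows the neighbouring commit that declares fingerprints as bytes rather than three u32's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PACKED __attribute__((packed))   /* GCC/Clang extension */

/* A byte array has alignment 1, so it can sit at any offset. */
struct fingerprint {
    unsigned char hash[12];
};

/* Hypothetical on-disk record: 1 + 12 + 4 = 17 bytes, no padding. */
struct demo_record {
    uint8_t type;
    struct fingerprint fp;
    uint32_t offset;
} PACKED;

/* Compare two fingerprints byte by byte. */
static int fingerprint_equal(const struct fingerprint *a,
                             const struct fingerprint *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->hash, b->hash, sizeof a->hash) == 0;
}
```

Without PACKED the compiler would normally pad the record to 20 bytes to align the 32-bit member.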

23 years agoAdded merger.
Martin Mares [Fri, 16 Feb 2001 18:54:31 +0000 (18:54 +0000)]
Added merger.

23 years agoAdded unmapping and writeable mappings.
Martin Mares [Fri, 16 Feb 2001 17:56:06 +0000 (17:56 +0000)]
Added unmapping and writeable mappings.

23 years agoScanner improvements: create redirect backlinks, detect empty documents,
Martin Mares [Fri, 16 Feb 2001 16:16:25 +0000 (16:16 +0000)]
Scanner improvements: create redirect backlinks, detect empty documents,
mark accented documents.

23 years agoTesting programs are not built by default.
Martin Mares [Thu, 15 Feb 2001 19:17:47 +0000 (19:17 +0000)]
Testing programs are not built by default.

23 years agoAdded URL fingerprints.
Martin Mares [Thu, 15 Feb 2001 19:05:39 +0000 (19:05 +0000)]
Added URL fingerprints.

23 years agoAdded bputs0() -- put a null-terminated string.
Martin Mares [Thu, 15 Feb 2001 19:04:57 +0000 (19:04 +0000)]
Added bputs0() -- put a null-terminated string.

23 years agoShut up warnings.
Martin Mares [Sat, 10 Feb 2001 12:28:22 +0000 (12:28 +0000)]
Shut up warnings.

23 years agoadded cf_item_count()
Robert Spalek [Fri, 9 Feb 2001 10:48:55 +0000 (10:48 +0000)]
added cf_item_count()

23 years agodeleted unused variable prog
Robert Spalek [Fri, 9 Feb 2001 10:48:17 +0000 (10:48 +0000)]
deleted unused variable prog

23 years agoadded sort-test
Robert Spalek [Fri, 9 Feb 2001 10:48:02 +0000 (10:48 +0000)]
added sort-test

23 years agoNext version of the sorter -- both presorting and unifying works.
Martin Mares [Sun, 4 Feb 2001 20:08:14 +0000 (20:08 +0000)]
Next version of the sorter -- both presorting and unifying works.

sort-test now does just `sort -u' and it's about 30% slower than its
GNU counterpart, probably due to extra copies of sorting keys by our
buffered I/O layer. Fortunately, a typical case will be long data with
short keys where we should be efficient as we can use bbcopy().