Robert Spalek [Fri, 25 Jun 2004 11:28:40 +0000 (11:28 +0000)]
modifications done at home; MJ's objections not yet incorporated:
- added buck2obj_{alloc,free}()
- extract_odes() renamed to buck2obj_convert() and its interface simplified:
it now flushes the memory pool and calls obj_new(), as used e.g. in scanner.c
- attribute lengths are stored incremented by one, so that zero lengths are allowed
- the length of the compressed part is stored as a U32 instead of UTF8, to allow
zero-copy compression
- the temporary use of object.c's oa_allocate removed; we now call
obj_add_attr_ref() instead of obj_add_attr()
- renamed constants BUCKET_TYPE_*
- defined internal macro RET_ERR()
Robert Spalek [Thu, 24 Jun 2004 12:29:44 +0000 (12:29 +0000)]
sighandler.c:
- used sigaction() instead of signal()
- no need to re-register the signal handler now :-)
- renamed my_sighandler_t to sh_sighandler_t and changed the interface
Robert Spalek [Thu, 24 Jun 2004 11:57:48 +0000 (11:57 +0000)]
- lizard_alloc() turns on the wrapper for SIGSEGV and lizard_free() restores
its original value
- lizard_decompress_safe() quickly registers the SIGSEGV handler using the
wrapper, saving 2 syscalls
- allocate 3 more bytes for unaligned memory access
Robert Spalek [Wed, 23 Jun 2004 16:48:25 +0000 (16:48 +0000)]
MJ's idea:
- only lock the memory by mprotect() once
- decompress into the middle of the buffer, so that the barrier stays at the
same distance as before
(needs one more pointer in the structure)
Robert Spalek [Wed, 23 Jun 2004 14:41:42 +0000 (14:41 +0000)]
incorporated MJ's suggestions:
- flush_copy_command() exploits fast unaligned memory access and memcpy()
lizard_compress():
- the test in_start==copy_start replaced by flag bof, in_start deleted
- if (copy_len > 0) replaced by if (copy_len)
- pos_bit |= 1<<4
- deleted the test for cropping at BOF; it is obsolete now
lizard_decompress():
- at label perform_copy_command, we set expect_copy_command=2
- exploit fast unaligned memory access
Robert Spalek [Tue, 15 Jun 2004 09:21:39 +0000 (09:21 +0000)]
crash tests changed:
- since the memory protection is removed after decompress_safe(), it makes
no sense to check the read/write protection on the corresponding page
===> removed
- now, if the returned value is < 0, print errno
- try allocating a buffer that is too small,
OR setting the expected length too low
Robert Spalek [Tue, 15 Jun 2004 09:16:13 +0000 (09:16 +0000)]
- low-level safe version of lizard_decompress() put into an extra source file
- use M_PRIVATE instead of M_SHARED
- use PROT_NONE instead of PROT_READ, and only set/clear it for one page
before/after the operation instead of doing it for the whole array
- errno is set instead of returning different negative values
- use longjmp in the signal handler instead of die() and return -1
- use macro ALIGN()
Robert Spalek [Mon, 14 Jun 2004 10:12:19 +0000 (10:12 +0000)]
sped up approximately 6 times:
- the whole idea of 2 hash-tables (for 3- and 4-byte matches) was bad
- the collision link-lists containing erroneous entries were also bad
===> greatly simplified: only one hash-table/hash-function/link-list/... for
3-byte matches; a double-linked list that can be maintained in constant time
while preserving correctness; links to strings made implicit (hence the data
structure is half the size and fits better into the CPU cache); no arithmetic
when computing the hash function; tuned the constants determining the
compression level; commented out the code for 2-byte matches; ...
Robert Spalek [Mon, 14 Jun 2004 09:58:37 +0000 (09:58 +0000)]
debugged, now it is fully functional:
- fixed a lot of typos (especially C operator-precedence mistakes and
variable-name mismatches)
- fixed bit-format errors (forgotten additive constants or negations)
- do not use hash_rec[0]
- wrong entries in the collision link-lists must NOT appear at the beginning
==> saved time when verifying and resolved some strange cases
- changed constants determining the maximum prolong-factor
- added a simple test-tool
Robert Spalek [Mon, 19 Apr 2004 16:41:45 +0000 (16:41 +0000)]
- 0x08 (BACKSPACE) is now treated as a blank character and accepted as an ASCII character
- 0x7f is also accepted as an ASCII character
- both gather/content.c and gather/charset.c now use the same function
Cblank() to test it
Martin Mares [Sun, 18 Apr 2004 13:39:32 +0000 (13:39 +0000)]
Changed locking rules. Scans and appends can peacefully co-exist now.
Should solve the problems with shep-reap waiting for bucket file transmission
to finish.
Martin Mares [Sat, 10 Apr 2004 20:36:01 +0000 (20:36 +0000)]
Multi-part objects (with header and body separated by an empty line and terminated
either by EOF or by a NUL byte) are very common, so let's introduce a special
function for reading them.
Martin Mares [Thu, 8 Apr 2004 22:18:19 +0000 (22:18 +0000)]
More enhancement to the main loop library: Export all lists for easy inspection
(reading only) by the callers. When a process exits, construct a nice tombstone
string for it.
Martin Mares [Wed, 7 Apr 2004 22:03:30 +0000 (22:03 +0000)]
Added a universal main loop with timers, file descriptor polling and process
watching. Inspired by the glib main loop, but this one has a much nicer
interface.
It will be used in the Shepherd master and if it turns out to be useful,
I'll convert the other programs to use it some day.
Martin Mares [Sun, 14 Mar 2004 12:58:40 +0000 (12:58 +0000)]
Our regex functions are now able to interface to old-style BSD re_match(),
to POSIX regexec() and to libpcre. Currently it's switched to the BSD mode
as before, I'll look at it more in the evening.
Martin Mares [Tue, 2 Mar 2004 15:38:20 +0000 (15:38 +0000)]
When we try to create a temporary file and it already exists (which can happen
if a program with the same PID crashed at some time in the past), don't
panic; just overwrite the file. This should be safe, since we're using our own
tmp directory, which nobody else can access.
Martin Mares [Sat, 28 Feb 2004 10:49:48 +0000 (10:49 +0000)]
Hopefully finally sorted out the "http://www.xyz.cz?param" mess. The true
semantics turned out to be "http://www.xyz.cz/?param" and most web servers
really require "GET /?param".
I've changed the normalization rules to add the leading slash if needed
which also solves the relative URL problem I mentioned in the comments.
However, this means that the SEMANTICS OF NORMALIZED URLs HAVE CHANGED
and gatherer databases with URLs in the "http://www.xyz.cz?param" form
are now INVALID. I'm going to delete all such URLs from our gatherer now.
Martin Mares [Tue, 24 Feb 2004 18:36:23 +0000 (18:36 +0000)]
Blank lines are considered separators, not terminators of buckets.
Hence extraneous blank lines between buckets and trailing blank lines
after the last bucket are all ignored.