sped up approximately 6 times:
- the whole idea of 2 hash tables (for 3- and 4-matches) was bad
- also, the collision linked lists with errors were bad
===> greatly simplified: only one hash table/hash function/linked list/... for
3-matches, a doubly linked list that can be maintained in constant time
while preserving correctness, links to strings made implicit (hence the data
structure is half the size and fits better into the CPU cache), no arithmetic
when computing the hash function, tuned the constants determining the
compression level, commented out the code for 2-matches, ...
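
A minimal sketch of how such a structure could look, not the actual code:
one hash table for 3-byte matches, doubly linked collision lists with
constant-time removal, and implicit links to strings (the node index is the
position modulo the window size, so no pointer is stored). All names and
constants (dict_t, hash3, dict_insert, dict_find, WINDOW_BITS, HASH_BITS) are
hypothetical, and the shift/XOR-only hash is just one possible reading of
"no arithmetic". It assumes positions are inserted in order 0, 1, 2, ...

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_BITS 12                 /* assumed window: 4096 positions */
#define WINDOW_SIZE (1u << WINDOW_BITS)
#define HASH_BITS   12                 /* assumed table size: 4096 buckets */
#define HASH_SIZE   (1u << HASH_BITS)
#define NIL         0xFFFFu            /* "no node" marker */

typedef struct {
    uint16_t head[HASH_SIZE];    /* newest node of each bucket */
    uint16_t next[WINDOW_SIZE];  /* older node in the same bucket */
    uint16_t prev[WINDOW_SIZE];  /* newer node, NIL if the node is the head */
} dict_t;

/* Hash of 3 bytes using only shifts, XOR and a mask. */
static unsigned hash3(const uint8_t *p)
{
    return (((unsigned)p[0] << 8) ^ ((unsigned)p[1] << 4) ^ p[2])
           & (HASH_SIZE - 1);
}

static void dict_init(dict_t *d)
{
    memset(d, 0xFF, sizeof *d);  /* every 16-bit entry becomes NIL */
}

/* Insert position pos (needs 3 readable bytes at buf + pos). */
static void dict_insert(dict_t *d, const uint8_t *buf, size_t pos)
{
    uint16_t slot = (uint16_t)(pos & (WINDOW_SIZE - 1));
    assert(d != NULL && buf != NULL);

    if (pos >= WINDOW_SIZE) {
        /* The slot still holds position pos - WINDOW_SIZE: unlink it in
           constant time, which is what the prev links are for. */
        uint16_t n = d->next[slot], p = d->prev[slot];
        if (p != NIL)
            d->next[p] = n;
        else
            d->head[hash3(buf + pos - WINDOW_SIZE)] = n;
        if (n != NIL)
            d->prev[n] = p;
    }

    unsigned h = hash3(buf + pos);
    uint16_t old = d->head[h];
    d->next[slot] = old;
    d->prev[slot] = NIL;
    if (old != NIL)
        d->prev[old] = slot;
    d->head[h] = slot;
}

/* Longest match (>= 3 bytes) for buf + pos among the inserted positions.
   Call before inserting pos. Returns the length, 0 if there is none. */
static size_t dict_find(const dict_t *d, const uint8_t *buf, size_t len,
                        size_t pos, size_t *match_pos)
{
    size_t best = 0;
    if (pos + 3 > len)
        return 0;
    for (uint16_t s = d->head[hash3(buf + pos)]; s != NIL; s = d->next[s]) {
        /* Implicit link: recover the absolute position from the slot. */
        size_t dist = ((pos - s - 1) & (WINDOW_SIZE - 1)) + 1;
        size_t cand = pos - dist;
        size_t n = 0;
        while (pos + n < len && buf[cand + n] == buf[pos + n])
            n++;
        if (n >= 3 && n > best) {
            best = n;
            *match_pos = cand;
        }
    }
    return best;
}
```

Since every slot is unlinked before it is reused, a bucket never contains a
stale position, so the search needs no extra validity checks beyond the
byte comparison that hash collisions require anyway.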