2 * Sherlock Library -- String Fingerprints
4 * (c) 2001 Martin Mares <mj@ucw.cz>
8 * We use a hashing function to map all the URL's and other
9 * hairy strings we work with to a much simpler universe
10 * of constant length bit strings (currently 96-bit ones).
11 * With a random hashing function (which is equivalent to
12 * having a fixed function and random input), the probability
13 * of at least one collision happening is at most c*n^2/m
14 * where n is the number of strings we hash, m is the size
15 * of our bit string universe (2^96) and c is a small constant.
16 * We set m sufficiently large and expect no collisions
17 * to occur. On the other hand, the worst thing which could
18 * be cause by a collision is mixing up two strings or labels
19 * of two documents which is relatively harmless.
23 #include "lib/index.h"
27 fingerprint(byte *string, struct fingerprint *fp)
33 MD5Update(&c, string, strlen(string));
35 memcpy(fp->hash, digest, 12);