2 * UCW Library -- Universal Sorter
4 * (c) 2001--2007 Martin Mares <mj@ucw.cz>
5 * (c) 2004 Robert Spalek <robert@ucw.cz>
7 * This software may be freely distributed and used according to the terms
8 * of the GNU Lesser General Public License.
12 * This is not a normal header file, but a generator of sorting
13 * routines. Each time you include it with parameters set in the
14 * corresponding preprocessor macros, it generates a file sorter
15 * with the parameters given.
17 * FIXME: explain the basics (keys and data, callbacks etc.)
19 * Basic parameters and callbacks:
21 * SORT_PREFIX(x) add a name prefix (used on all global names
22 * defined by the sorter)
24 * SORT_KEY data type capable of storing a single key.
25 * SORT_KEY_REGULAR data type holding a single key both in memory and on disk;
26 * in this case, bread() and bwrite() is used to read/write keys
27 * and it's also assumed that the keys are not very long.
28 * int PREFIX_compare(SORT_KEY *a, SORT_KEY *b)
29 * compares two keys, result like strcmp(). Mandatory.
30 * int PREFIX_read_key(struct fastbuf *f, SORT_KEY *k)
31 * reads a key from a fastbuf, returns nonzero=ok, 0=EOF.
32 * Mandatory unless SORT_KEY_REGULAR is defined.
33 * void PREFIX_write_key(struct fastbuf *f, SORT_KEY *k)
34 * writes a key to a fastbuf. Mandatory unless SORT_KEY_REGULAR.
36 * SORT_KEY_SIZE(key) returns the real size of a key (a SORT_KEY type in memory
37 * can be truncated to this number of bytes without any harm;
38 * used to save memory when the keys have variable sizes).
39 * Default: always store the whole SORT_KEY.
40 * SORT_DATA_SIZE(key) gets a key and returns the amount of data following it.
41 * Default: records consist of keys only.
45 * SORT_INT(key) We are sorting by an integer value. In this mode, PREFIX_compare
46 * is supplied automatically and the sorting function gets an extra
47 * parameter specifying a range of the integers. The better the range
48 * fits, the faster we sort. Sets up SORT_HASH_xxx automatically.
50 * Hashing (optional, but it can speed sorting up):
52 * SORT_HASH_FN(key) returns a monotone hash of a given key. Monotone hash is a function f
53 * such that f(x) < f(y) implies x < y and which is approximately uniformly
55 * SORT_HASH_BITS how many bits do the hashes have.
59 * SORT_MERGE merge items with identical keys, needs the following functions:
60 * void PREFIX_write_merged(struct fastbuf *f, SORT_KEY **keys, uns n, byte *buf)
61 * takes n records in memory with keys which compare equal and writes
62 * a single record to the given fastbuf. Data for each key can
63 * be accessed by the SORT_GET_DATA(*key) macro. `buf' points
64 * to a buffer which is guaranteed to hold all given records.
65 * void PREFIX_copy_merged(SORT_KEY **keys, struct fastbuf **data, uns n, struct fastbuf *dest)
66 * takes n records with keys in memory and data in fastbufs and writes
69 * Input (choose one of these):
71 * SORT_INPUT_FILE file of a given name
72 * SORT_INPUT_FB fastbuf stream
73 * SORT_INPUT_PRESORT custom presorter: call function PREFIX_presorter (see below)
74 * to get successive batches of pre-sorted data as temporary
75 * fastbuf streams or NULL if no more data is available.
76 * The function is passed a page-aligned presorting buffer.
78 * Output (chose one of these):
80 * SORT_OUTPUT_FILE file of a given name
81 * SORT_OUTPUT_FB temporary fastbuf stream
82 * SORT_OUTPUT_THIS_FB a given fastbuf stream which can already contain a header
84 * FIXME: Maybe implement these:
85 * ??? SORT_UNIQUE all items have distinct keys (checked in debug mode)
86 * ??? SORT_DELETE_INPUT a C expression, if true, the input files are
87 * deleted as soon as possible
90 * The function generated:
92 * <outfb> PREFIX_SORT(<in>, <out>, <range>), where:
93 * <in> = input file name/fastbuf
94 * <out> = output file name/fastbuf
95 * <range> = maximum integer value for the SORT_INT mode
96 * <outfb> = output fastbuf (in SORT_OUTPUT_FB mode)
97 * (any parameter can be missing if it is not applicable).
99 * void PREFIX_merge_data(struct fastbuf *src1, *src2, *dest, SORT_KEY *k1, *k2)
100 * [used only in case SORT_UNIFY is defined]
101 * write just fetched key k to dest and merge data from
102 * two records with the same key (k1 and k2 are key occurences
103 * in the corresponding streams).
104 * SORT_KEY * PREFIX_merge_items(SORT_KEY *a, SORT_KEY *b)
105 * [used only with SORT_PRESORT && SORT_UNIFY]
106 * merge two items with the same key, returns pointer
107 * to at most one of the items, the rest will be removed
108 * from the list of items, but not deallocated, so
109 * the remaining item can freely reference data of the
112 * After including this file, all parameter macros are automatically
116 #define P(x) SORT_PREFIX(x)
119 /* FIXME: Check that we undef everything we should. */