mj.ucw.cz Git - libucw.git/blob - lib/sorter/TODO
Testing:
o  Giant runs.
o  Records of odd lengths.
o  Empty files.

Cleanups:
o  Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
o  Log messages should show both the original and the new size of the data. The speed
   should probably be calculated from the former.
o  Automatically tune ASORT_MIN_RADIX, ASORT_MIN_SHIFT and especially ASORT_RADIX_BITS.
o  Check undefs in sorter.h and array.h.

Improvements:
o  Switching between direct and normal I/O. Should use normal I/O if the input is small enough.
o  How does the speed of radix splitting decrease with an increasing number of hash bits?
   Does it help to use more bits than we need, so that we sort less data in memory?
o  Add automatic joining to the custom presorter interface?

Users of lib/arraysort.h on big arrays (consider conversion to lib/sorter/array.h):

indexer/chewer.c                        fixed           hash + others
indexer/chewer.c                        u32             id
indexer/imagesigs.c                     fixed           s32
indexer/lexfreq.c                       ptr             indirect int
indexer/lexorder.c                      ptr             complex
indexer/lexorder.c                      ptr             complex
indexer/lexsort.c                       ptr             complex
indexer/mergeimages.c                   fixed           s32
indexer/mkgraph.c                       u32             indirect int
indexer/mkgraph.c                       2*u32           complex, but have hash
indexer/mkgraph.c                       2*u32           complex, but have hash
indexer/reftexts.c                      fixed           indirect int
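
A possible starting point for the data-type cleanup above is a pair of checked
helpers that refuse to truncate or wrap. This is only a sketch: the function
names are hypothetical and not part of the library, and uint64_t stands in for
u64 so that the example compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow a 64-bit count to size_t. Returns false
 * instead of silently truncating (relevant where size_t is 32 bits wide). */
static bool checked_u64_to_size(uint64_t in, size_t *out)
{
  if (in > SIZE_MAX)
    return false;
  *out = (size_t) in;
  return true;
}

/* Hypothetical helper: multiply a record count by a record size.
 * Returns false if the product would not fit into size_t. */
static bool checked_mul_size(size_t count, size_t elt_size, size_t *out)
{
  if (elt_size && count > SIZE_MAX / elt_size)
    return false;
  *out = count * elt_size;
  return true;
}
```

Callers would test the return value before using the result, so an oversized
input becomes an explicit error path rather than a wrapped buffer size.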
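
To make the question about hash bits concrete, here is a minimal sketch of
radix splitting on the top bits of a hash. It assumes a 32-bit hash and uses
hypothetical function names; it is not the sorter's actual code. Each extra bit
doubles the number of buckets, so each bucket holds less data to sort in
memory, at the cost of more buckets to maintain during the split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bucket index = the top `bits` bits of a 32-bit hash (assumed layout). */
static unsigned radix_bucket(uint32_t hash, unsigned bits)
{
  assert(bits >= 1 && bits <= 16);
  return hash >> (32 - bits);
}

/* Count how many of the n hashes fall into each of the (1 << bits) buckets.
 * `counts` must have room for (1 << bits) entries. */
static void radix_count(const uint32_t *hashes, size_t n, unsigned bits, size_t *counts)
{
  size_t buckets = (size_t) 1 << bits;
  for (size_t i = 0; i < buckets; i++)
    counts[i] = 0;
  for (size_t i = 0; i < n; i++)
    counts[radix_bucket(hashes[i], bits)]++;
}
```

Running radix_count with different values of `bits` on real keys and timing the
split would give the bucket-size versus splitting-speed trade-off that the item
asks about.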