X-Git-Url: http://mj.ucw.cz/gitweb/?a=blobdiff_plain;f=lib%2Fsorter%2FTODO;h=52fb6cf523f51562b3fbf06694a58846a292f697;hb=573a9269e029f71af9fd1e4b3a32d4d692050da5;hp=b0417ad76e5309ae87fdaf4c074745fdb9e552ae;hpb=e7fcd506163c155afa5313fb28ee7e931018117f;p=libucw.git

diff --git a/lib/sorter/TODO b/lib/sorter/TODO
index b0417ad7..52fb6cf5 100644
--- a/lib/sorter/TODO
+++ b/lib/sorter/TODO
@@ -3,16 +3,30 @@
 o Giant runs.
 o Records of odd lengths.
 o Empty files.
-Improvements:
-o Alignment? Use of SSE?
-o Use radix-sort for internal sorting.
-o Parallelization of internal sorting.
+Cleanups:
 o Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
-o Buffer sizing in internal sorters.
-o Switching between direct and normal I/O.
-o When merging, choose the output file with less runs instead of always switching?
-o Implement multi-way merge.
-o Mode with only 2-way unification?
-o Speed up 2-way merge.
-o Speed up radix splitting.
-o A debug switch for disabling the presorter.
+o Log messages should show both original and new size of the data. The speed
+  should be probably calculated from the former.
+o Automatically tune ASORT_MIN_RADIX, ASORT_MIN_SHIFT and especially ASORT_RADIX_BITS.
+o Check undefs in sorter.h and array.h.
+
+Improvements:
+o Switching between direct and normal I/O. Should use normal I/O if the input is small enough.
+o How does the speed of radix splitting decrease with increasing number of hash bits?
+  Does it help to use more bits than we need, so that we sort less data in memory?
+o Add automatic joining to the custom presorter interface?
+
+Users of lib/arraysort.h on big arrays (consider conversion to lib/sorter/array.h):
+
+indexer/chewer.c       fixed   hash + others
+indexer/chewer.c       u32     id
+indexer/imagesigs.c    fixed   s32
+indexer/lexfreq.c      ptr     indirect int
+indexer/lexorder.c     ptr     complex
+indexer/lexorder.c     ptr     complex
+indexer/lexsort.c      ptr     complex
+indexer/mergeimages.c  fixed   s32
+indexer/mkgraph.c      u32     indirect int
+indexer/mkgraph.c      2*u32   complex, but have hash
+indexer/mkgraph.c      2*u32   complex, but have hash
+indexer/reftexts.c     fixed   indirect int