X-Git-Url: http://mj.ucw.cz/gitweb/?a=blobdiff_plain;ds=sidebyside;f=lib%2Fsorter%2FTODO;h=79a60204b09ea716c0f99192322586bc62d10d03;hb=a5ff98a53789157a6c96e58b2385bb898d688a22;hp=52fb6cf523f51562b3fbf06694a58846a292f697;hpb=573a9269e029f71af9fd1e4b3a32d4d692050da5;p=libucw.git

diff --git a/lib/sorter/TODO b/lib/sorter/TODO
index 52fb6cf5..79a60204 100644
--- a/lib/sorter/TODO
+++ b/lib/sorter/TODO
@@ -5,28 +5,15 @@ o Empty files.
 
 Cleanups:
 o Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
+o Clean up log levels.
+o Clean up introductory comments.
 o Log messages should show both original and new size of the data. The speed
   should be probably calculated from the former.
-o Automatically tune ASORT_MIN_RADIX, ASORT_MIN_SHIFT and especially ASORT_RADIX_BITS.
-o Check undefs in sorter.h and array.h.
+o Buffer sizing in shep-export.
+o Problems with thread stack limit in radix-sorting of arrays.
 
-Improvements:
-o Switching between direct and normal I/O. Should use normal I/O if the input is small enough.
-o How does the speed of radix splitting decrease with increasing number of hash bits?
-  Does it help to use more bits than we need, so that we sort less data in memory?
-o Add automatic joining to the custom presorter interface?
-
-Users of lib/arraysort.h on big arrays (consider conversion to lib/sorter/array.h):
-
-indexer/chewer.c        fixed   hash + others
-indexer/chewer.c        u32     id
-indexer/imagesigs.c     fixed   s32
-indexer/lexfreq.c       ptr     indirect int
-indexer/lexorder.c      ptr     complex
-indexer/lexorder.c      ptr     complex
-indexer/lexsort.c       ptr     complex
-indexer/mergeimages.c   fixed   s32
-indexer/mkgraph.c       u32     indirect int
-indexer/mkgraph.c       2*u32   complex, but have hash
-indexer/mkgraph.c       2*u32   complex, but have hash
-indexer/reftexts.c      fixed   indirect int
+Users of lib/sorter/array.h which might use radix-sorting:
+indexer/chewer.c
+indexer/lexfreq.c
+indexer/mkgraph.c
+indexer/reftexts.c