X-Git-Url: http://mj.ucw.cz/gitweb/?a=blobdiff_plain;ds=sidebyside;f=lib%2Fsorter%2FTODO;h=c2b653d5b96f4b2e5a3861befdb1cbbeddc88cd8;hb=90afcc18dbf7cb6c682e1efb994007f03e304422;hp=f4a802cacbceb191459a33a6bcd864a95aab51e6;hpb=054a2b84eb5fe6e5c08435e0936cc046db8e63ec;p=libucw.git

diff --git a/lib/sorter/TODO b/lib/sorter/TODO
index f4a802ca..c2b653d5 100644
--- a/lib/sorter/TODO
+++ b/lib/sorter/TODO
@@ -3,13 +3,22 @@ o Giant runs.
 o Records of odd lengths.
 o Empty files.
 
-Improvements:
-o Use radix-sort for internal sorting.
-o Parallelization of internal sorting.
+Cleanups:
 o Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
-o Switching between direct and normal I/O.
-o Deal with too rough range estimates in radix splitting.
-o How does the speed of radix splitting decrease with increasing number of hash bits?
-  Does it help to use more bits than we need, so that we sort less data in memory?
+o Clean up log levels.
+o Clean up introductory comments.
 o Log messages should show both original and new size of the data.
   The speed should be probably calculated from the former.
+o Automatically tune ASORT_MIN_RADIX, ASORT_MIN_SHIFT and especially ASORT_RADIX_BITS.
+o Buffer sizing in shep-export.
+
+Improvements:
+o Switching between direct and normal I/O. Should use normal I/O if the input is small enough.
+o How does the speed of radix splitting decrease with increasing number of hash bits?
+  Does it help to use more bits than we need, so that we sort less data in memory?
+
+Users of lib/sorter/array.h which might use radix-sorting:
+indexer/chewer.c
+indexer/lexfreq.c
+indexer/mkgraph.c
+indexer/reftexts.c