X-Git-Url: http://mj.ucw.cz/gitweb/?a=blobdiff_plain;f=lib%2Fsorter%2FTODO;h=d78ef9b9befe7c304b9ad982ea6d2a81769cff58;hb=6bd2ff95b10c8c409eb178684294f1d17d79265b;hp=f4fe053d8666e01d8f6536b01917fc279c9f4935;hpb=19e85513ebea91a53d30be05cb8047d3b6eea526;p=libucw.git

diff --git a/lib/sorter/TODO b/lib/sorter/TODO
index f4fe053d..d78ef9b9 100644
--- a/lib/sorter/TODO
+++ b/lib/sorter/TODO
@@ -1,12 +1,15 @@
-Testing:
-o Giant runs.
-o Records of odd lengths.
-o Empty files.
-
-Improvements:
-o Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
-o Switching between direct and normal I/O. Should use normal I/O if the input is small enough.
-o How does the speed of radix splitting decrease with increasing number of hash bits?
-  Does it help to use more bits than we need, so that we sort less data in memory?
+Cleanups:
+o Clean up introductory comments.
 o Log messages should show both original and new size of the data. The speed should be probably
   calculated from the former.
+o Buffer sizing in shep-export.
+
+Improvements:
+o When quicksorting a large input (especially in threaded case), invest more
+  time to picking a good pivot.
+
+Users of lib/sorter/array.h which might use radix-sorting:
+indexer/chewer.c
+indexer/lexfreq.c
+indexer/mkgraph.c
+indexer/reftexts.c