X-Git-Url: http://mj.ucw.cz/gitweb/?a=blobdiff_plain;f=lib%2Fsorter%2FTODO;h=bd399e2a0d6884cac7df928cae615ffc7eb5378e;hb=aafbea40b7613274180c8bab60012a0322f8d7dd;hp=30f02a8c452ba2c83144a79743fd3ffa684742c7;hpb=e95a189fcc53f9185fb8866b2b6388ed9b20cf50;p=libucw.git

diff --git a/lib/sorter/TODO b/lib/sorter/TODO
index 30f02a8c..bd399e2a 100644
--- a/lib/sorter/TODO
+++ b/lib/sorter/TODO
@@ -1,13 +1,15 @@
-Testing:
-o Giant runs.
-o Records of odd lengths.
-o Empty files.
+Cleanups:
+o Log messages should show both original and new size of the data. The speed
+  should be probably calculated from the former.
+o Buffer sizing in shep-export.
 
 Improvements:
-o Use radix-sort for internal sorting.
-o Parallelization of internal sorting.
-o Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
-o Buffer sizing in internal sorters.
-o Switching between direct and normal I/O.
-o When merging, choose the output file with less runs instead of always switching?
-o Deal with too rough range estimates in radix splitting.
+o When quicksorting a large input (especially in threaded case), invest more
+  time to picking a good pivot.
+o Overlay presorter I/O with internal sorting.
+
+Users of lib/sorter/array.h which might use radix-sorting:
+indexer/chewer.c
+indexer/lexfreq.c
+indexer/mkgraph.c
+indexer/reftexts.c
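
Note on the pivot item: one common way to "invest more time" in pivot selection for large inputs is to replace a plain median-of-three with a median of three medians (the "ninther"). The sketch below only illustrates that idea and is not taken from lib/sorter. The names median3, pick_pivot and LARGE_INPUT, the threshold value and the int element type are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold above which the extra comparisons are considered worthwhile. */
#define LARGE_INPUT 128

/* Return the index (one of i, j, k) holding the median of a[i], a[j], a[k]. */
static size_t median3(const int *a, size_t i, size_t j, size_t k)
{
  if (a[i] < a[j]) {
    if (a[j] < a[k])
      return j;
    return (a[i] < a[k]) ? k : i;
  } else {
    if (a[i] < a[k])
      return i;
    return (a[j] < a[k]) ? k : j;
  }
}

/*
 * Hypothetical helper: return the index of a pivot candidate in a[0..n-1].
 * Small inputs use a plain median of first/middle/last. Large inputs first
 * take a median of three samples near each of those positions (the ninther),
 * which costs more comparisons but is harder to fool with skewed data.
 */
size_t pick_pivot(const int *a, size_t n)
{
  assert(n > 0);
  size_t lo = 0, mid = n / 2, hi = n - 1;
  if (n >= LARGE_INPUT) {
    size_t step = n / 8;
    lo = median3(a, lo, lo + step, lo + 2 * step);
    mid = median3(a, mid - step, mid, mid + step);
    hi = median3(a, hi - 2 * step, hi - step, hi);
  }
  return median3(a, lo, mid, hi);
}
```

In a threaded sorter the first partitioning step is the one that decides how evenly the work splits between threads, so spending a few extra comparisons there is cheap relative to the cost of an unbalanced split.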