X-Git-Url: http://mj.ucw.cz/gitweb/?a=blobdiff_plain;ds=inline;f=lib%2Fsorter%2FTODO;h=a3193083089ef4d7fc45c406cc3515161b4085d9;hb=09108be37909fd087b301c221881fd11601e88b5;hp=b0417ad76e5309ae87fdaf4c074745fdb9e552ae;hpb=e7fcd506163c155afa5313fb28ee7e931018117f;p=libucw.git

diff --git a/lib/sorter/TODO b/lib/sorter/TODO
index b0417ad7..a3193083 100644
--- a/lib/sorter/TODO
+++ b/lib/sorter/TODO
@@ -1,18 +1,11 @@
-Testing:
-o Giant runs.
-o Records of odd lengths.
-o Empty files.
+Cleanups:
+o Clean up introductory comments.
+o Log messages should show both original and new size of the data. The speed
+  should be probably calculated from the former.
+o Buffer sizing in shep-export.
 
-Improvements:
-o Alignment? Use of SSE?
-o Use radix-sort for internal sorting.
-o Parallelization of internal sorting.
-o Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
-o Buffer sizing in internal sorters.
-o Switching between direct and normal I/O.
-o When merging, choose the output file with less runs instead of always switching?
-o Implement multi-way merge.
-o Mode with only 2-way unification?
-o Speed up 2-way merge.
-o Speed up radix splitting.
-o A debug switch for disabling the presorter.
+Users of lib/sorter/array.h which might use radix-sorting:
+indexer/chewer.c
+indexer/lexfreq.c
+indexer/mkgraph.c
+indexer/reftexts.c