mj.ucw.cz Git - libucw.git/blobdiff - lib/sorter/TODO
Moved the last few relevant NOTES to TODO.
index f4a802cacbceb191459a33a6bcd864a95aab51e6..52fb6cf523f51562b3fbf06694a58846a292f697 100644
@@ -3,13 +3,30 @@ o  Giant runs.
 o  Records of odd lengths.
 o  Empty files.
 
-Improvements:
-o  Use radix-sort for internal sorting.
-o  Parallelization of internal sorting.
+Cleanups:
 o  Clean up data types and make sure they cannot overflow. (size_t vs. u64 vs. sh_off_t vs. uns)
-o  Switching between direct and normal I/O.
-o  Deal with too rough range estimates in radix splitting.
-o  How does the speed of radix splitting decrease with increasing number of hash bits?
-   Does it help to use more bits than we need, so that we sort less data in memory?
 o  Log messages should show both the original and the new size of the data. The speed
    should probably be calculated from the former.
+o  Automatically tune ASORT_MIN_RADIX, ASORT_MIN_SHIFT and especially ASORT_RADIX_BITS.
+o  Check undefs in sorter.h and array.h.
+
+Improvements:
+o  Switching between direct and normal I/O. Should use normal I/O if the input is small enough.
+o  How does the speed of radix splitting decrease with increasing number of hash bits?
+   Does it help to use more bits than we need, so that we sort less data in memory?
+o  Add automatic joining to the custom presorter interface?
+
+Users of lib/arraysort.h on big arrays (consider conversion to lib/sorter/array.h):
+
+indexer/chewer.c                       fixed           hash + others
+indexer/chewer.c                       u32             id
+indexer/imagesigs.c                    fixed           s32
+indexer/lexfreq.c                      ptr             indirect int
+indexer/lexorder.c                     ptr             complex
+indexer/lexorder.c                     ptr             complex
+indexer/lexsort.c                      ptr             complex
+indexer/mergeimages.c                  fixed           s32
+indexer/mkgraph.c                      u32             indirect int
+indexer/mkgraph.c                      2*u32           complex, but have hash
+indexer/mkgraph.c                      2*u32           complex, but have hash
+indexer/reftexts.c                     fixed           indirect int
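The open question about radix splitting ("Does it help to use more bits than we need, so that we sort less data in memory?") is easier to reason about with a concrete splitting pass in hand. Below is a minimal, self-contained sketch of one counting-sort style split on the top `bits` bits of a 32-bit hash. It is not libucw's sorter code: `struct rec`, `radix_bucket()` and `radix_split()` are hypothetical names, and the assumption is a fixed-size record whose key is a 32-bit hash. The function returns the size of the largest bucket, i.e. the most data a later in-memory sort would have to handle at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size record: a 32-bit hash key plus a payload. */
struct rec {
  uint32_t hash;
  uint32_t payload;
};

/* Bucket index = top `bits` bits of the hash.
 * The 1..16 limit is arbitrary for this sketch (it also avoids a shift by 32). */
static inline unsigned radix_bucket(uint32_t hash, unsigned bits)
{
  assert(bits >= 1 && bits <= 16);
  return (unsigned)(hash >> (32 - bits));
}

/* One stable radix-splitting pass: count records per bucket, turn the
 * counts into start offsets, then scatter the records into `out`.
 * `count` must have room for 2^bits entries, `out` for n records.
 * On return, count[b] holds the end offset of bucket b.
 * Returns the size of the largest bucket. */
static size_t radix_split(const struct rec *in, struct rec *out, size_t n,
                          unsigned bits, size_t *count)
{
  size_t nb = (size_t)1 << bits;
  memset(count, 0, nb * sizeof(*count));

  /* Pass 1: histogram of bucket sizes. */
  for (size_t i = 0; i < n; i++)
    count[radix_bucket(in[i].hash, bits)]++;

  /* Exclusive prefix sums: count[b] becomes the start of bucket b. */
  size_t max = 0, pos = 0;
  for (size_t b = 0; b < nb; b++) {
    size_t c = count[b];
    if (c > max)
      max = c;
    count[b] = pos;
    pos += c;
  }

  /* Pass 2: scatter, preserving input order within each bucket. */
  for (size_t i = 0; i < n; i++)
    out[count[radix_bucket(in[i].hash, bits)]++] = in[i];

  return max;
}
```

With more bits, the histogram and offset table grow as 2^bits, but the largest bucket tends to shrink (for well-distributed hashes), so each bucket that is later sorted in memory is smaller. Where the trade-off stops paying off is exactly what the TODO item asks to measure.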