where $w_{max}$ is the maximum weight.
A~real breakthrough, however, was made by Fredman and Willard, who introduced
the Fusion trees~\cite{fw:fusion}. These trees also offer membership and predecessor
operations on a~set of $n$~word-sized integers, but they reach time complexity $\O(\log_W n)$
per operation on a~Word-RAM with $W$-bit words. As $W$ must be at least~$\log n$,
the operations take $\O(\log n/\log\log n)$ time each and thus we are able to sort
$n$~integers in time~$o(n\log n)$. (Of course, when $W=\Theta(\log n)$, we can even
do that in linear time using radix-sort in base~$n$; it is the case of large~$W$
that is important.)
Since then, a~long sequence of faster and faster sorting algorithms has
emerged, culminating in the work of Thorup and Han. They have improved the
time complexity of integer sorting to $\O(n\log\log n)$
deterministically~\cite{han:detsort} and to expected $\O(n\sqrt{\log\log n})$ for
randomized algorithms~\cite{hanthor:randsort}, both in linear space.
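
To make the parenthetical remark concrete, here is a~minimal C sketch of ours
(not an~algorithm from the cited papers): an~LSD radix-sort with digits of
$\lceil\log n\rceil$ bits performs $\O(W/\log n)$ counting-sort passes of
$\O(n)$ time each, which is linear when $W=\Theta(\log n)$.

#include <stdlib.h>
#include <string.h>

/* LSD radix-sort of n word-sized keys in base roughly n.
 * The digit width d satisfies 2^d >= n, so each counting-sort pass
 * costs O(n) and there are ceil(W/d) passes -- O(1) of them when
 * W = Theta(log n). Allocation error handling omitted for brevity. */
void radix_sort(unsigned long *a, size_t n)
{
    if (n < 2)
        return;
    unsigned d = 1;
    while (((size_t)1 << d) < n)
        d++;                                  /* smallest d with 2^d >= n */
    size_t base = (size_t)1 << d, mask = base - 1;
    unsigned long *tmp = malloc(n * sizeof *a);
    size_t *count = malloc((base + 1) * sizeof *count);
    unsigned W = 8 * sizeof(unsigned long);   /* word length in bits */
    for (unsigned shift = 0; shift < W; shift += d) {
        memset(count, 0, (base + 1) * sizeof *count);
        for (size_t i = 0; i < n; i++)        /* histogram of current digit */
            count[((a[i] >> shift) & mask) + 1]++;
        for (size_t b = 1; b <= base; b++)    /* prefix sums: bucket starts */
            count[b] += count[b - 1];
        for (size_t i = 0; i < n; i++)        /* stable scatter by digit */
            tmp[count[(a[i] >> shift) & mask]++] = a[i];
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
    free(count);
}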
The Fusion trees themselves have very limited use in graph algorithms, but the
principles behind them are ubiquitous in many other data structures and these
with those ${\rm AC}^0$ instructions present on real processors (see Thorup
\cite{thorup:aczero}). On the Word-RAM, we need to make use of the fact
that the set~$B$ is not changing too much --- there are $\O(1)$ changes
per Q-heap operation. As Fredman and Willard have shown \cite{fw:transdich},
it is possible to maintain a~``decoder'', whose state is stored in $\O(1)$
machine words and which helps us to extract $x[B]$ in a~constant number of
operations:
\lemman{Extraction of bits, Fredman and Willard \cite{fw:transdich}}\id{qhxtract}%
Under the assumptions on~$k$, $W$ and the preprocessing time as in the Q-heaps,\foot{%
Actually, this is the only place where we need~$k$ to be as low as $W^{1/4}$.
In the ${\rm AC}^0$ implementation, it is enough to ensure $k\log k\le W$.}
it is possible to maintain a~data structure for the set~$B$,
which allows~$x[B]$ to be extracted in $\O(1)$ time for an~arbitrary~$x$.
When a~single element is inserted into~$B$ or deleted from~$B$, the structure
can be updated in constant time, as long as $\vert B\vert \le k$.
\qed
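
To see what the decoder computes, here is a~naive C rendering of ours,
illustrating only the semantics of the operation, not Fredman and Willard's
constant-time construction: $x[B]$ gathers the bits of~$x$ at the positions
of~$B$ (encoded below as a~bitmask) and packs them into the low-order bits of
the result. The loop takes $\O(\vert B\vert)$ time; the point of the lemma is
that the maintained decoder achieves the same effect in $\O(1)$ word
operations. (Incidentally, modern x86 processors provide exactly this
operation as the single BMI2 instruction PEXT.)

#include <stdint.h>

/* Naive semantics of x[B]: pick the bits of x at the positions given
 * by the bitmask B and pack them contiguously from bit 0 upwards.
 * Runs in O(|B|) time -- the decoder of the lemma does this in O(1). */
uint64_t extract_bits(uint64_t x, uint64_t B)
{
    uint64_t result = 0;
    unsigned out = 0;                        /* next free output position */
    while (B) {
        unsigned pos = __builtin_ctzll(B);   /* lowest position in B (GCC/Clang) */
        result |= ((x >> pos) & 1) << out;
        out++;
        B &= B - 1;                          /* clear the lowest set bit */
    }
    return result;
}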
\para