We can therefore view the whole memory as a~directed graph, whose vertices
correspond to the cells (the registers are stored in a~single special cell).
The outgoing edges of each vertex correspond to pointer fields of the cells and they are
-labelled with distinct labels drawn from a~finite set. In addition to that,
+labeled with distinct labels drawn from a~finite set. In addition to that,
each vertex contains a~fixed amount of symbols. The program can directly access
vertices within distance~2 from the register vertex.
or we can observe that all such constants can be easily manufactured. For example,
$(\0^b\1)^d = \1^{(b+1)d} / \1^{b+1} = (2^{(b+1)d}-1)/(2^{b+1}-1)$. The only exceptions
are the~$w$ and~$b$ in the LSB algorithm \ref{lsb}, which we are unable to produce
-in constant time.
+in constant time. In practice, we use the ``bit tricks'' as frequently called subroutines
+in an~encompassing algorithm, so we can usually afford to precalculate the constants
+once during the startup of the algorithm.
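
As a small illustration of manufacturing such a constant, the following Python sketch evaluates the identity above with a single integer division and compares it with a bit-by-bit construction. The function names are hypothetical, and Python's unbounded integers merely stand in for machine words.

```python
def repeated_pattern(b: int, d: int) -> int:
    """Return the number written in binary as (0^b 1)^d, using one division."""
    all_ones = (1 << ((b + 1) * d)) - 1   # 1^{(b+1)d} = 2^{(b+1)d} - 1
    one_block = (1 << (b + 1)) - 1        # 1^{b+1}    = 2^{b+1} - 1
    return all_ones // one_block


def repeated_pattern_naive(b: int, d: int) -> int:
    """Reference construction: set the lowest bit of each of the d blocks."""
    value = 0
    for j in range(d):
        value |= 1 << (j * (b + 1))
    return value
```

For $b=3$ and $d=4$ both functions produce the 16-bit pattern 0001000100010001.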
+
+%--------------------------------------------------------------------------------
+
+\section{Q-Heaps}\id{qheaps}%
+
+We have shown how to perform relatively complicated operations on a~set of values
+in constant time, but so far only under the assumption that the number of these
+values is small enough and that the values themselves are also small enough
+(so that the whole set fits in $\O(1)$ machine words). Now we will show how to
+lift the restriction on the magnitude of the values and still keep constant time
+complexity. We will describe a~simplified version of the Q-Heaps developed by
+Fredman and Willard in~\cite{fw:transdich}.
+
+The Q-Heap represents a~set of at most~$k$ word-sized integers, where $k\le W^{1/4}$
+and $W$ is the word size of the machine. It will support insertion, deletion, finding
+the minimum, and other operations described below, in constant time, provided that
+we are willing to spend~$\O(2^{k^4})$ time on preprocessing.
+
+The exponential-time preprocessing may sound alarming, but a~typical application uses
+Q-Heaps of size $k=\log^{1/4} N$, where $N$ is the size of the algorithm's input.
+As a~machine word must be able to hold an~index into the input, we have $W\ge\log N$,
+which guarantees that $k\le W^{1/4}$ and $\O(2^{k^4}) = \O(N)$. Let us remark, however,
+that the whole construction is primarily of theoretical importance
+and that the huge constants involved everywhere make these heaps useless
+for practical algorithms. Still, many of the tricks used prove
+useful even in real-life implementations.
+
+Preprocessing makes it possible to precompute tables for almost arbitrary functions
+and then assume that they can be evaluated in constant time:
+
+\lemma\id{qhprecomp}%
+When~$f$ is a~function computable in polynomial time, $\O(2^{k^4})$ time is enough
+to precompute a~table of the values of~$f$ for all arguments whose total
+bit size is $\O(k^3)$.
+
+\proof
+There are $2^{\O(k^3)}$ possible combinations of arguments of the given size and for each of
+them we spend $\O(k^c)$ time on calculating the function (for some~$c\ge 1$). It remains
+to observe that $2^{\O(k^3)}\cdot \O(k^c) = \O(2^{k^4})$.
+\qed
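The idea of the lemma can be sketched in Python as follows. This is only an illustration under assumed names: `precompute_table` and the sample function `popcount` are hypothetical, and the argument size of 8 bits is chosen purely to keep the table small.

```python
def precompute_table(f, total_bits: int) -> list:
    """Tabulate f on every argument that can be encoded in total_bits bits."""
    return [f(arg) for arg in range(1 << total_bits)]


def popcount(v: int) -> int:
    """Sample polynomial-time function: the number of set bits of v."""
    return bin(v).count("1")


# Built once during preprocessing; afterwards every evaluation is one lookup.
table = precompute_table(popcount, 8)
```

After the table is built, `table[v]` replaces each call of `popcount(v)` for 8-bit arguments.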
+
+\para
+We will first show an~auxiliary construction based on tries and then derive
+the actual definition of the Q-Heap from it.
+
+\nota
+Let us introduce some notation first:
+\itemize\ibull
+\:$W$ --- the word size of the RAM,
+\:$k = \O(W^{1/4})$ --- the limit on the size of the heap,
+\:$n\le k$ --- the number of elements in the represented set,
+\:$X=\{x_1, \ldots, x_n\}$ --- the elements themselves: distinct $W$-bit numbers
+indexed so that $x_1 < \ldots < x_n$,
+\:$c_i = \<MSB>(x_i \bxor x_{i+1})$ --- the most significant bit in which $x_i$ and~$x_{i+1}$ differ (for $1\le i<n$),
+\:$R_X(x)$ --- the rank of~$x$ in~$X$, that is the number of elements of~$X$ which are less than~$x$
+(where $x$~itself need not be an~element of~$X$).\foot{We will dedicate the whole Chapter~\ref{rankchap} to the
+study of various ranks.}
+\endlist
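To make the notation concrete, here is a minimal Python sketch computing the $c_i$'s and the rank for a small example set. The helper names are hypothetical, indices are 0-based (unlike the 1-based indices of the text), and the rank is found by binary search here, not by the constant-time method developed below.

```python
from bisect import bisect_left


def msb(v: int) -> int:
    """Position of the most significant set bit of v > 0 (bit 0 is the lowest)."""
    return v.bit_length() - 1


def branching_bits(xs: list) -> list:
    """c_i = MSB(x_i xor x_{i+1}) for every pair of neighbours in sorted xs."""
    return [msb(a ^ b) for a, b in zip(xs, xs[1:])]


def rank(xs: list, x: int) -> int:
    """R_X(x): the number of elements of sorted xs which are less than x."""
    return bisect_left(xs, x)


xs = [3, 5, 12, 13]          # 0011, 0101, 1100, 1101 for W = 4
cs = branching_bits(xs)      # one value per pair of neighbours
```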
+
+\defn
+A~\df{trie} for a~set of strings~$S$ over a~finite alphabet~$\Sigma$ is
+a~rooted tree whose vertices are the prefixes of the strings in~$S$ and there
+is an~edge going from a~prefix~$\alpha$ to a~prefix~$\beta$ iff $\beta$ can be
+obtained from~$\alpha$ by adding a~single symbol of the alphabet. The edge
+will be labeled with the particular symbol. We will also define the \df{letter depth}
+of a~vertex as the length of the corresponding prefix, and we will mark the vertices
+which match a~string of~$S$.
+
+A~\df{compressed trie} is obtained from the trie by removing the vertices of outdegree~1.
+Wherever there is a~directed path whose internal vertices have outdegree~1, we replace this
+path by a~single edge labeled with the concatenation of the original edges' labels.
+
+In both kinds of tries, we will order the outgoing edges of every vertex by their labels
+lexicographically.
+
+\obs
+In both tries, the root of the tree is the empty word and for every vertex, the
+corresponding prefix is equal to the concatenation of edge labels on the path
+leading from the root to the vertex. The letter depth of the vertex is equal to
+the total size of these labels. All leaves correspond to strings in~$S$, but so can
+some internal vertices if there are two strings in~$S$ such that one is a~prefix
+of the other.
+
+Furthermore, the labels of all edges leaving a~common vertex are always
+distinct and when we compress the trie, no two such labels share their initial
+symbol. This allows us to search in the trie efficiently: when looking for
+a~string~$x$, we follow the path from the root and whenever we visit
+an~internal vertex of letter depth~$d$, we test the $d$-th character of~$x$,
+follow the edge whose label starts with this character, and check that the
+rest of the label matches.
+
+The compressed trie is also efficient in terms of space consumption --- it has
+$\O(\vert S\vert)$ vertices (this can be easily shown by induction on~$\vert S\vert$)
+and all edge labels can be represented in space linear in the sum of the
+lengths of the strings in~$S$.
+
+\defn
+For our set~$X$, we will define~$T$ as a~compressed trie for the set of binary
+encodings of the numbers~$x_i$, padded to exactly $W$~bits, i.e., for $S = \{ \(x)_W ; x\in X \}$.
+
+\obs
+The trie~$T$ has several interesting properties. Since all words in~$S$ have the same
+length, the leaves of the trie correspond to these exact words, that is to the numbers~$x_i$.
+The inorder traversal of the trie enumerates the words of~$S$ in lexicographic order
+and therefore also the~$x_i$'s in the order of their values. Between each
+pair of leaves $x_i$ and~$x_{i+1}$ it visits an~internal vertex whose letter depth
+is exactly~$W-1-c_i$.
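
The observation can be checked on a small example with the following Python sketch. It is only an illustration: the tuple representation of the trie and the function names are assumptions, only branching vertices are stored (edge labels are implied by the words in the leaves), and $W=4$ is chosen for readability.

```python
def build_trie(words, depth=0):
    """Compressed trie of sorted, distinct, equal-length bit strings.

    All words are assumed to agree on their first `depth` characters.
    Returns ("leaf", word) or ("node", letter_depth, zero_child, one_child).
    """
    if len(words) == 1:
        return ("leaf", words[0])
    d = depth
    # Skip the characters shared by all words: they form a compressed edge.
    while all(w[d] == words[0][d] for w in words):
        d += 1
    zeros = [w for w in words if w[d] == "0"]
    ones = [w for w in words if w[d] == "1"]
    return ("node", d, build_trie(zeros, d + 1), build_trie(ones, d + 1))


def inorder(trie):
    """List leaves and letter depths of internal vertices in inorder."""
    if trie[0] == "leaf":
        return [trie[1]]
    _, d, left, right = trie
    return inorder(left) + [d] + inorder(right)


W = 4
xs = [3, 5, 12, 13]
words = [format(x, "0{}b".format(W)) for x in xs]
trie = build_trie(words)
```

Here the $c_i$'s are 2, 3 and 0, and the letter depths met between consecutive leaves in `inorder(trie)` are 1, 0 and 3, in agreement with $W-1-c_i$.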
+
+\para
+Let us now modify the algorithm for searching in the trie and make it compare
+only the first symbols of the edges. For $x\in X$, the algorithm will still
+return the correct leaf. For all~$x$ outside~$X$, it will no longer fail
+and instead it will land in some leaf $x_i$. We will call the index of this leaf $T(x)$.
+At first sight, this leaf may seem unrelated to~$x$, but we will show that it can
+be used to determine the rank of~$x$ in~$X$, which will later form a~basis
+for all other Q-Heap operations:
+
+\lemma\id{qhdeterm}%
+The rank $R_X(x)$ is uniquely determined by a~combination of:
+\itemize\ibull
+\:the trie~$T$,
+\:the index $i=T(x)$ of the leaf visited when searching for~$x$ in~$T$,
+\:the relation ($<$, $=$, $>$) between $x$ and $x_i$,
+\:the bit position $b=\<MSB>(x\bxor x_i)$ of the first disagreement between~$x$ and~$x_i$.
+\endlist
+
+\proof
+If $x\in X$, we detect that from $x_i=x$ and the rank is obviously~$i-1$, because exactly $x_1,\ldots,x_{i-1}$ are smaller than~$x$.
+Let us assume that $x\not\in X$ and imagine that we follow the same path as during the search for~$T(x)$,
+but this time we check the full edge labels. The position~$b$ is the first position
+where~$\(x)$ disagrees with a~label. Before this point, all edges not taken by
+the search were leading either to subtrees containing elements all smaller than~$x$
+or all larger than~$x$ and the only values not known yet are those in the subtree
+below the edge which we currently consider. Now if $x[b]=0$ (and therefore $x<x_i$),
+all values in the subtree have $x_j[b]=1$ and thus they are larger. In the other
+case, $x[b]=1$ and $x_j[b]=0$, so they are smaller.
+\qed
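The argument of the proof can be turned into a short Python sketch. It is an illustration under several assumptions: all names are hypothetical, indices are 0-based, internal vertices store the tested bit position (counted from the least significant bit) instead of the letter depth, and the trie is walked explicitly, so this does not yet run in constant time.

```python
def msb(v: int) -> int:
    """Position of the most significant set bit of v > 0."""
    return v.bit_length() - 1


def build(xs, lo, hi):
    """Compressed trie over sorted distinct xs[lo:hi].

    Returns ("leaf", index) or ("node", bit, lo, hi, left, right).
    """
    if hi - lo == 1:
        return ("leaf", lo)
    bit = msb(xs[lo] ^ xs[hi - 1])       # topmost bit on which the range disagrees
    mid = lo
    while not (xs[mid] >> bit) & 1:      # first element having that bit set
        mid += 1
    return ("node", bit, lo, hi, build(xs, lo, mid), build(xs, mid, hi))


def blind_search(trie, x):
    """T(x): follow only the tested bits of x, ignoring the rest of the labels."""
    node = trie
    while node[0] == "node":
        node = node[5] if (x >> node[1]) & 1 else node[4]
    return node[1]


def rank(xs, trie, x):
    """Number of elements of xs smaller than x, derived as in the lemma."""
    i = blind_search(trie, x)
    if x == xs[i]:
        return i
    b = msb(x ^ xs[i])                   # first disagreement between x and x_i
    node = trie
    # Descend towards leaf i while the tested bit still lies above b.
    while node[0] == "node" and node[1] > b:
        node = node[5] if (xs[i] >> node[1]) & 1 else node[4]
    lo, hi = (node[1], node[1] + 1) if node[0] == "leaf" else (node[2], node[3])
    # The whole subtree xs[lo:hi] lies on one side of x, decided by bit b of x.
    return hi if (x >> b) & 1 else lo


xs = [3, 5, 12, 13]
trie = build(xs, 0, len(xs))
```

For example, `rank(xs, trie, 6)` first lands in the leaf holding 5, finds the disagreement at bit 1 and returns 2.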
+
+\para
+The preceding lemma shows that the rank can be computed in polynomial time, but
+unfortunately the variables on which it depends are too large for the table to
+be efficiently precomputed. We will therefore carefully choose a~representation
+of the trie which is compact enough.
+
+\lemma\id{citree}%
+The trie is uniquely determined by the order of the values~$c_1,\ldots,c_{n-1}$.
+
+\proof
+We already know that the letter depths of the internal vertices of the trie are exactly
+the numbers~$W-1-c_i$. The root of the trie must have the smallest of these
+letter depths, i.e., it must correspond to the highest numbered bit. Let
+us call this bit~$c_i$. This implies that the values $x_1,\ldots,x_i$
+must lie in the left subtree of the root and $x_{i+1},\ldots,x_n$ in its
+right subtree. Both subtrees can then be constructed recursively.\foot{This
+construction is also known as the \df{Cartesian tree} for the sequence
+$c_1,\ldots,c_{n-1}$.}
+\qed
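The recursive construction from the proof might look as follows in Python. This is a sketch with hypothetical names and 0-based indices: leaves are numbered from 0 and `c[i]` stands for the bit separating leaves `i` and `i+1`. Internal vertices are labeled by the index of the gap they split, so the result depends only on the relative order of the values.

```python
def cartesian_tree(c, lo=0, hi=None):
    """Shape of the trie over leaves lo..hi (inclusive), given the gaps c[lo:hi].

    Returns ("leaf", index) or ("node", gap_index, left, right).
    """
    if hi is None:
        hi = len(c)                      # leaves are numbered 0 .. len(c)
    if lo == hi:
        return ("leaf", lo)
    # The root of this range tests the highest bit among its gaps.
    i = max(range(lo, hi), key=lambda j: c[j])
    return ("node", i, cartesian_tree(c, lo, i), cartesian_tree(c, i + 1, hi))


shape = cartesian_tree([2, 3, 0])
```

Replacing `[2, 3, 0]` by any sequence with the same relative order, such as `[10, 40, 7]`, yields the same shape.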
+
+\para
+However, the vector of the $c_i$'s is also too long (it has $k\log W$ bits
+and we have no upper bound on~$W$ in terms of~$k$), so we will compress it even
+further:
+
+\nota
+\itemize\ibull
+\:$B = \{c_1,\ldots,c_{n-1}\}$ --- the set of all bit positions examined by the trie,
+stored as a~sorted array,
+\:$C : \{1,\ldots,n-1\} \rightarrow \{1,\ldots,n-1\}$ --- a~function such that
+$c_i = B[C(i)]$,
+\:$x[B]$ --- a~bit string containing the bits of~$x$ originally located
+at the positions indexed by~$B$.
+\endlist
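The compressed representation can be illustrated by the following Python sketch. The names are hypothetical and indices are 0-based; the bits are extracted by a simple loop, whereas the actual data structure needs a constant-time method for this step.

```python
def compress(c):
    """Return (B, C): B is the sorted list of distinct values of c and
    C[i] is the index in B such that c[i] == B[C[i]]."""
    B = sorted(set(c))
    C = [B.index(ci) for ci in c]
    return B, C


def extract_bits(x: int, B) -> int:
    """x[B]: the bits of x at the positions listed in B, packed together
    so that the bit from position B[j] ends up at position j."""
    out = 0
    for j, pos in enumerate(B):
        out |= ((x >> pos) & 1) << j
    return out


B, C = compress([2, 3, 0])
```

For the running example, `B` lists the positions 0, 2 and 3, so `extract_bits(0b1010, B)` keeps only the bit at position 3 and moves it to position 2.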
+
+\obs
+The set~$B$ has $\O(k\log W)=\O(W)$ bits, so it can be stored in a~constant number
+of machine words as a~vector. The function~$C$ can also be stored as a~vector
+of $\O(k\log k)$ bits.
+
+\lemma
+The rank $R_X(x)$ can be computed in constant time from:
+\itemize\ibull
+\:the function~$C$,
+\:the values $x_1,\ldots,x_n$,
+\:the bit string~$x[B]$,
+\:$x$ itself.
+\endlist
+
+\proof
+We know that the trie~$T$ is uniquely determined by the order of the $c_i$'s
+and therefore by the function~$C$ since the array~$B$ is sorted. The shape of
+the trie together with the bits in $x[B]$ determines the leaf $T(x)$ visited
+when searching for~$x$. All this can be computed in polynomial time and it
+depends on $\O(k\log k)$ bits of input, so according to Lemma~\ref{qhprecomp}
+we can look it up in a~precomputed table.
+
+Similarly we will determine all other ingredients of Lemma~\ref{qhdeterm} in
+constant time. As we know~$x$ and all the $x_i$'s, we can immediately find
+the relation between $x$ and $x_{T(x)}$ and use the LSB/MSB algorithm (\ref{lsb})
+to find the topmost disagreeing bit.
+
+All these ingredients can be stored in $\O(k\log k)$ bits, so we may assume
+that the rank can be looked up in constant time as well.
+\qed
\endpart