%--------------------------------------------------------------------------------
\section{Data structures on the RAM}
+\id{ramdssect}
There is a~lot of data structures designed specifically for the RAM, taking
advantage of both indexing and arithmetics. In many cases, they surpass the known
\:$\<Insert>(x,\alpha)$ --- inserts~$\alpha$ into a~sorted vector $\bf x$:
-We calculate $k = \<Rank>(x,\alpha)$ first, then insert~$\alpha$ as the $k$-th
+We calculate the rank of~$\alpha$ in~$x$ first, then we insert~$\alpha$ as the $k$-th
field of~$\bf x$ using masking operations and shifts.
+\algo
+\:$k\=\<Rank>(x,\alpha)$.
+\:$\ell\=x \band \1^{(b+1)(n-k-1)}\0^{(b+1)(k+1)}$. \cmt{``left'' part of the vector}
+\:$r=x \band \1^{(b+1)k}$. \cmt{``right'' part}
+\:Return $(\ell\shl (b+1)) \bor (\alpha\shl ((b+1)k)) \bor r$.
+\endalgo
+
\:$\<Unpack>(\alpha)$ --- creates a~vector whose elements are the bits of~$\(\alpha)_d$.
In other words, inserts blocks~$\0^b$ between the bits of~$\alpha$. Assuming that $b\ge d$,
we can do it as follows:
values is small enough and that the values themselves are also small enough
(so that the whole set fits in $\O(1)$ machine words). Now we will show how to
lift the restriction on the magnitude of the values and still keep constant time
-complexity. We will describe a~simplified version of the Q-heaps developed by
+complexity. We will describe a~slightly simplified version of the Q-heaps developed by
Fredman and Willard in~\cite{fw:transdich}.
The Q-heap represents a~set of at most~$k$ word-sized integers, where $k\le W^{1/4}$
we are willing to spend~$\O(2^{k^4})$ time on preprocessing.
The exponential-time preprocessing may sound alarming, but a~typical application uses
-Q-heaps of size $k=\log^{1/4} N$, where $N$ is the size of the algorithm's input,
-which guarantees that $k\le W^{1/4}$ and $\O(2^{k^4}) = \O(N)$. Let us however
+Q-heaps of size $k=\log^{1/4} N$, where $N$ is the size of the algorithm's input.
+This guarantees that $k\le W^{1/4}$ and $\O(2^{k^4}) = \O(N)$. Let us however
remark that the whole construction is primarily of theoretical importance
and that the huge constants involved everywhere make these heaps useless
-for practical algorithms. However, many of the tricks used prove themselves
+in practical algorithms. Many of the tricks used however prove themselves
useful even in real-life implementations.
-Preprocessing makes it possible to precompute tables for almost arbitrary functions
-and then assume that they can be evaluated in constant time:
+Spending the time on reprocessing makes it possible to precompute tables for
+almost arbitrary functions and then assume that they can be evaluated in
+constant time:
\lemma\id{qhprecomp}%
When~$f$ is a~function computable in polynomial time, $\O(2^{k^4})$ time is enough
-to precompute a~table of the values of~$f$ for the values of its arguments whose total
-bit size is $\O(k^3)$.
+to precompute a~table of the values of~$f$ for all arguments whose size is $\O(k^3)$ bits.
\proof
There are $2^{\O(k^3)}$ possible combinations of arguments of the given size and for each of
-them we spend $\O(k^c)$ time by calculating the function (for some~$c\ge 1$). It remains
-to observe that $2^{\O(k^3)}\cdot \O(k^c) = \O(2^{k^4})$.
+them we spend $\poly(k)$ time on calculating the function. It remains
+to observe that $2^{\O(k^3)}\cdot \poly(k) = \O(2^{k^4})$.
\qed
\para
We will first show an~auxiliary construction based on tries and then derive
-the actual definition of the Q-heap from it.
+the real definition of the Q-heap from it.
\nota
Let us introduce some notation first:
\itemize\ibull
\:$W$ --- the word size of the RAM,
\:$k = \O(W^{1/4})$ --- the limit on the size of the heap,
-\:$n\le k$ --- the number of elements in the represented set,
+\:$n\le k$ --- the number of elements stored in the heap,
\:$X=\{x_1, \ldots, x_n\}$ --- the elements themselves: distinct $W$-bit numbers
indexed in a~way that $x_1 < \ldots < x_n$,
-\:$c_i = \<MSB>(x_i \bxor x_{i+1})$ --- the most significant bit of those in which $x_i$ and~$x_{i+1}$ differ,
-\:$R_X(x)$ --- the rank of~$x$ in~$X$, that is the number of elements of~$X$, which are less than~$x$
+\:$g_i = \<MSB>(x_i \bxor x_{i+1})$ --- the position of the most significant bit in which $x_i$ and~$x_{i+1}$ differ,
+\:$R_X(x)$ --- the rank of~$x$ in~$X$, that is the number of elements of~$X$ which are less than~$x$
(where $x$~itself need not be an~element of~$X$).\foot{We will dedicate the whole chapter \ref{rankchap} to the
study of various ranks.}
\endlist
A~\df{trie} for a~set of strings~$S$ over a~finite alphabet~$\Sigma$ is
a~rooted tree whose vertices are the prefixes of the strings in~$S$ and there
is an~edge going from a~prefix~$\alpha$ to a~prefix~$\beta$ iff $\beta$ can be
-obtained from~$\alpha$ by adding a~single symbol of the alphabet. The edge
-will be labeled with the particular symbol. We will also define a~\df{letter depth}
-of a~vertex to be the length of the corresponding prefix and mark the vertices
+obtained from~$\alpha$ by appending a~single symbol of the alphabet. The edge
+will be labeled with the particular symbol. We will also define the~\df{letter depth}
+of a~vertex as the length of the corresponding prefix. We mark the vertices
which match a~string of~$S$.
-A~\df{compressed trie} is obtained from the trie by removing the vertices of outdegree~1.
-Whereever is a~directed path whose internal vertices have outdegree~1, we replace this
-path by a~single edge labeled with the contatenation of the original edge's labels.
+A~\df{compressed trie} is obtained from the trie by removing the vertices of outdegree~1
+except for the root and marked vertices.
+Whereever is a~directed path whose internal vertices have outdegree~1 and they carry
+no mark, we replace this path by a~single edge labeled with the contatenation
+of the original edge's labels.
-In both kinds of tries, we will order the outgoing edges of every vertex by their labels
+In both kinds of tries, we order the outgoing edges of every vertex by their labels
lexicographically.
\obs
-In both tries, the root of the tree is the empty word and for every vertex, the
+In both tries, the root of the tree is the empty word and for every other vertex, the
corresponding prefix is equal to the concatenation of edge labels on the path
-leading from the root to the vertex. The letter depth of the vertex is equal to
+leading from the root to that vertex. The letter depth of the vertex is equal to
the total size of these labels. All leaves correspond to strings in~$S$, but so can
some internal vertices if there are two strings in~$S$ such that one is a~prefix
of the other.
symbols. This allows us to search in the trie efficiently: when looking for
a~string~$x$, we follow the path from the root and whenever we visit
an~internal vertex of letter depth~$d$, we test the $d$-th character of~$x$,
-follow the edge whose label starts with this character, and check that the
+we follow the edge whose label starts with this character, and we check that the
rest of the label matches.
The compressed trie is also efficient in terms of space consumption --- it has
lengths of the strings in~$S$.
\defn
-For our set~$X$, we will define~$T$ as a~compressed trie for the set of binary
+For our set~$X$, we define~$T$ as a~compressed trie for the set of binary
encodings of the numbers~$x_i$, padded to exactly $W$~bits, i.e., for $S = \{ \(x)_W ; x\in X \}$.
\obs
The inorder traversal of the trie enumerates the words of~$S$ in lexicographic order
and therefore also the~$x_i$'s in the order of their values. Between each
pair of leaves $x_i$ and~$x_{i+1}$ it visits an~internal vertex whose letter depth
-is exactly~$W-1-c_i$.
+is exactly~$W-1-g_i$.
\para
Let us now modify the algorithm for searching in the trie and make it compare
-only the first symbols of the edges. For $x\in X$, the algorithm will still
-return the correct leave, for all~$x$ outside~$X$ it will no longer fail
-and instead it will land in some leaf $x_i$. We will call the index of this leaf $T(x)$.
-At the first sight this vertex may seem unrelated, but we will show that it can
-be used to determine the rank of~$x$ in~$X$, which will later form a~basis
-for all other Q-heap operations:
+only the first symbols of the edges. In other words, we will test only the bits~$g_i$
+which will be called \df{guides} (as they guide us through the tree). For $x\in
+X$, the modified algorithm will still return the correct leaf. For all~$x$ outside~$X$
+it will no longer fail and instead it will land on some leaf~$x_i$. At the
+first sight the number~$x_i$ may seem unrelated, but we will show that it can be
+used to determine the rank of~$x$ in~$X$, which will later form a~basis for all
+Q-heap operations:
\lemma\id{qhdeterm}%
The rank $R_X(x)$ is uniquely determined by a~combination of:
\itemize\ibull
\:the trie~$T$,
-\:the index $i=T(x)$ of the leaf visited when searching for~$x$ in~$T$,
+\:the index~$i$ of the leaf found when searching for~$x$ in~$T$,
\:the relation ($<$, $=$, $>$) between $x$ and $x_i$,
\:the bit position $b=\<MSB>(x\bxor x_i)$ of the first disagreement between~$x$ and~$x_i$.
\endlist
\proof
-If $x\in X$, we detect that from $x_i=x$ and the rank is obviously~$i$ itself.
-Let us assume that $x\not\in X$ and imagine that we follow the same path as during the search for~$T(x)$,
+If $x\in X$, we detect that from $x_i=x$ and the rank is obviously~$i-1$.
+Let us assume that $x\not\in X$ and imagine that we follow the same path as when
+searching for~$x$,
but this time we check the full edge labels. The position~$b$ is the first position
where~$\(x)$ disagrees with a~label. Before this point, all edges not taken by
the search were leading either to subtrees containing elements all smaller than~$x$
or all larger than~$x$ and the only values not known yet are those in the subtree
below the edge which we currently consider. Now if $x[b]=0$ (and therefore $x<x_i$),
-all values in the subtree have $x_j[b]=1$ and thus they are larger. In the other
+all values in that subtree have $x_j[b]=1$ and thus they are larger than~$x$. In the other
case, $x[b]=1$ and $x_j[b]=0$, so they are smaller.
\qed
\para
The preceding lemma shows that the rank can be computed in polynomial time, but
-unfortunately the variables on which it depends are too large for the table to
-be efficiently precomputed. We will therefore carefully choose a~representation
-of the trie, which is compact enough.
+unfortunately the variables on which it depends are too large for a~table to
+be efficiently precomputed. We will carefully choose an~equivalent representation
+of the trie which is compact enough.
\lemma\id{citree}%
-The trie is uniquely determined by the order of the values~$c_1,\ldots,c_{n-1}$.
+The trie is uniquely determined by the order of the guides~$g_1,\ldots,g_{n-1}$.
\proof
We already know that the letter depths of the trie vertices are exactly
-the numbers~$W-1-c_i$. The root of the trie must have the smallest of these
+the numbers~$W-1-g_i$. The root of the trie must have the smallest of these
letter depths, i.e., it must correspond to the highest numbered bit. Let
-us call this bit~$c_i$. This implies that the values $x_1,\ldots,x_i$
+us call this bit~$g_i$. This implies that the values $x_1,\ldots,x_i$
must lie in the left subtree of the root and $x_{i+1},\ldots,x_n$ in its
right subtree. Both subtrees can be then constructed recursively.\foot{This
construction is also known as the \df{cartesian tree} for the sequence
-$c_1,\ldots,c_n$.}
+$g_1,\ldots,g_n$ and it is useful in many other algorithms as it can be
+built in $\O(n)$ time. A~nice application on the Lowest Common Ancestor
+and Range Minimum problems has been described by Bender et al.~in \cite{bender:lca}.}
\qed
\para
-However, the vector of the $c_i$'s is also too long (is has $k\log W$ bits
+Unfortunately, the vector of the $g_i$'s is also too long (is has $k\log W$ bits
and we have no upper bound on~$W$ in terms of~$k$), so we will compress it even
further:
-\nota
+\nota\id{qhnota}%
\itemize\ibull
-\:$B = \{c_1,\ldots,c_n\}$ --- the set of all bit positions examined by the trie,
-stored as a~sorted array,
-\:$C : \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}$ --- a~function such that
-$c_i = B[C(i)]$,
+\:$B = \{g_1,\ldots,g_n\}$ --- the set of bit positions of all the guides, stored as a~sorted array,
+\:$G : \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}$ --- a~function mapping
+the guides to their bit positions in~$B$: $g_i = B[G(i)]$,
\:$x[B]$ --- a~bit string containing the bits of~$x$ originally located
-at the positions indexed by~$B$.
+at the positions given by~$B$, i.e., the concatenation of bits $x[B[1]],
+x[B[2]],\ldots, x[B[n]]$.
\endlist
-\obs
+\obs\id{qhsetb}%
The set~$B$ has $\O(k\log W)=\O(W)$ bits, so it can be stored in a~constant number
-of machine words as a~vector. The function~$C$ can be also stored as a~vector
-of $k\log k$ bits.
+of machine words in form of a~sorted vector. The function~$G$ can be also stored as a~vector
+of $\O(k\log k)$ bits. We can change a~single~$g_i$ in constant time using
+vector operations: First we delete the original value of~$g_i$ from~$B$ if it
+is not used anywhere else. Then we add the new value to~$B$ if it was not
+there yet and we write its position in~$B$ to~$G(i)$. Whenever we insert
+or delete a~value in~$B$, the values at the higher positions shift one position
+up or down and we have to update the pointers in~$G$. This can be fortunately
+accomplished by adding or subtracting a~result of vector comparison.
+
+In this representation, we can reformulate our lemma on ranks as follows:
\lemma\id{qhrank}%
The rank $R_X(x)$ can be computed in constant time from:
\itemize\ibull
-\:the function~$C$,
+\:the function~$G$,
\:the values $x_1,\ldots,x_n$,
\:the bit string~$x[B]$,
\:$x$ itself.
\endlist
\proof
-We know that the trie~$T$ is uniquely determined by the order of the $c_i$'s
-and therefore by the function~$C$ since the array~$B$ is sorted. The shape of
-the trie together with the bits in $x[B]$ determine the leaf $T[x]$ visited
-when searching for~$x$. All this can be computed in polynomial time and it
+Let us prove that all ingredients of Lemma~\ref{qhdeterm} are either small
+enough or computable in constant time.
+
+We know that the shape of the trie~$T$ is uniquely determined by the order of the $g_i$'s
+and therefore by the function~$G$ since the array~$B$ is sorted. The shape of
+the trie together with the bits in $x[B]$ determine the leaf~$x_i$ found when searching
+for~$x$ using only the guides. This can be computed in polynomial time and it
depends on $\O(k\log k)$ bits of input, so according to Lemma~\ref{qhprecomp}
we can look it up in a~precomputed table.
-Similarly we will determine all other ingredients of Lemma~\ref{qhdeterm} in
-constant time. As we know~$x$ and all the $x_i$'s, we can immediately find
-the relation between $x$ and $x_{T[x]}$ and use the LSB/MSB algorithm (\ref{lsb})
-to find the topmost disagreeing bit.
+The relation between $x$ and~$x_i$ can be obtained directly as we know the~$x_i$.
+The bit position of the first disagreement can be calculated in constant time
+using the LSB/MSB algorithm (\ref{lsb}).
All these ingredients can be stored in $\O(k\log k)$ bits, so we may assume
that the rank can be looked up in constant time as well.
for a~newly inserted element in constant time. However, the set is too large
to fit in a~vector and we cannot perform insertion on an~ordinary array in
constant time. This can be worked around by keeping the set in an~unsorted
-array together with a~vector containing a~permutation which sorts the array.
+array together with a~vector containing the permutation which sorts the array.
We can then insert a~new element at an~arbitrary place in the array and just
update the permutation to reflect the correct order.
-We are now ready for the real definition of the Q-heap and the description
+We are now ready for the real definition of the Q-heap and for the description
of the basic operations on it.
\defn
\:$k$, $n$ --- the capacity of the heap and the current number of elements (word-sized integers),
\:$X$ --- the set of word-sized elements stored in the heap (an~array of words in an~arbitrary order),
\:$\varrho$ --- a~permutation on~$\{1,\ldots,n\}$ such that $X[\varrho(1)] < \ldots < X[\varrho(n)]$
-(a~vector of $\O(n\log k)$ bits; we will write $x_i$ for $X[\varrho[i]]$),
+(a~vector of $\O(n\log k)$ bits; we will write $x_i$ for $X[\varrho(i)]$),
\:$B$ --- a~set of ``interesting'' bit positions
(a~sorted vector of~$\O(n\log W)$ bits),
-\:$C$ --- the function determining the order of bits as tested by the trie
+\:$G$ --- the function which maps the guides to the bit positions in~$B$
(a~vector of~$\O(n\log k)$ bits),
\:precomputed tables of various functions.
\endlist
-\algn{Search in the Q-heap}
+\algn{Search in the Q-heap}\id{qhfirst}%
\algo
\algin A~Q-heap and an~integer~$x$ to search for.
\:$i\=R_X(x)+1$, using Lemma~\ref{qhrank} to calculate the rank.
\:$i\=R_X(x)+1$, using Lemma~\ref{qhrank} to calculate the rank.
\:If $x=x_i$, return immediately (the value is already present).
\:Insert the new value to~$X$:
-\::$n\=n+1$
-\::$X[n]\=x$ and insert~$r$ at the $i$-th position in the permutation~$\varrho$.
-\:Update the $c_i$'s:
- Insert a~new element between~$C(i-1)$ and~$C(i)$ and recalculate $c_{i-1}$ and
- the new~$c_i$ according to the definition. For each changed~$c_j$:
-\::If the new value of~$c_j$ is not in~$B$, insert it there.
-\::\FIXME{Update~$C$ on shifts in~$B$}
-\::Update $C(j)$ to point at the position of~$c_j$ in~$B$.
-\::If the old value of the~$c_j$ is no longer used, delete it from~$B$.
+\::$n\=n+1$.
+\::$X[n]\=x$.
+\::Insert~$n$ at the $i$-th position in the permutation~$\varrho$.
+\:Update the $g_j$'s:
+\::Move all~$g_j$ for $j\ge i$ one position up. \hfil\break
+ This translates to insertion in the vector representing~$G$.
+\::Recalculate $g_{i-1}$ and~$g_i$ according to the definition.
+ \hfil\break Update~$B$ and~$G$ as described in~\ref{qhsetb}.
\algout The updated Q-heap.
\endalgo
\::$X[\varrho(i)]\=X[n]$.
\::Find $j$ such that~$\varrho(j)=n$ and set $\varrho(j)\=\varrho(i)$.
\::$n\=n-1$.
-\:Update the $c_i$'s like in the previous algorithm.
+\:Update the $g_j$'s like in the previous algorithm.
\algout The updated Q-heap.
\endalgo
-\algn{Finding the $i$-th smallest element in the Q-heap}
+\algn{Finding the $i$-th smallest element in the Q-heap}\id{qhlast}%
\algo
\algin A~Q-heap and an~index~$i$.
\:If $i<1$ or $i>n$, return {\sc undefined.}
-\:Return $X[\varrho(i)]$.
+\:Return~$x_i$.
\algout The $i$-th smallest element in the heap.
\endalgo
+\para
+The heap algorithms we have just described have been built from primitives
+operating in constant time, with one notable exception: the extraction
+$x[B]$ of all bits of~$x$ at positions specified by the set~$B$. This cannot be done
+in~$\O(1)$ time on the Word-RAM, but we can implement it with ${\rm AC}^0$
+instructions as suggested by Andersson in \cite{andersson:fusion} or even
+with those ${\rm AC}^0$ instructions present on real processors (see Thorup
+\cite{thorup:aczero}). On the Word-RAM, we need to make use of the fact
+that the set~$B$ is not changing too much --- there are $\O(1)$ changes
+per Q-heap operation. As Fredman and Willard have shown, it is possible
+to maintain a~``decoder'', whose state is stored in $\O(1)$ machine words,
+and which helps us to extract $x[B]$ in a~constant number of operations:
+
+\lemman{Extraction of bits}\id{qhxtract}%
+Under the assumptions on~$k$, $W$ and the preprocessing time as in the Q-heaps,\foot{%
+Actually, this is the only place where we need~$k$ to be as low as $W^{1/4}$.
+In the ${\rm AC}^0$ implementation, it is enough to ensure $k\log k\le W$.
+On the other hand, we need not care about the exponent because it can
+be arbitrarily increased using the Q-heap trees described below.}
+it is possible to maintain a~data structure for a~set~$B$ of bit positions,
+which allows~$x[B]$ to be extracted in $\O(1)$ time for an~arbitrary~$x$.
+When a~single element is inserted to~$B$ or deleted from~$B$, the structure
+can be updated in constant time, as long as $\vert B\vert \le k$.
+
+\proof
+See Fredman and Willard \cite{fw:transdich}.
+\qed
+
+\para
+This was the last missing bit of the mechanics of the Q-heaps. We are
+therefore ready to conclude this section by the following theorem
+and its consequences:
+
+\thmn{Q-heaps, Fredman and Willard \cite{fw:transdich}}\id{qh}%
+Let $W$ and~$k$ be positive integers such that $k=\O(W^{1/4})$. Let~$Q$
+be a~Q-heap of at most $k$-elements of $W$~bits each. Then the Q-heap
+operations \ref{qhfirst} to \ref{qhlast} on~$Q$ (insertion, deletion,
+search for a~given value and search for the $i$-th smallest element)
+run in constant time on a~Word-RAM with word size~$W$, after spending
+time $\O(2^{k^4})$ on the same RAM on precomputing of tables.
+
+\proof
+Every operation on the Q-heap can be performed in a~constant number of
+vector operations and calculations of ranks. The ranks are computed
+in $\O(1)$ steps involving again $\O(1)$ vector operations, binary
+logarithms and bit extraction. All these can be calculated in constant
+time using the results of section \ref{bitsect} and Lemma \ref{qhxtract}.
+\qed
+
+\rem
+We can also use the Q-heaps as building blocks of more complex structures
+like Atomic heaps and AF-heaps (see once again \cite{fw:transdich}). We will
+show a~simpler, but useful construction, sometimes called the \df{Q-heap tree.}
+Suppose we have a~Q-heap of capacity~$k$ and a~parameter $d\in{\bb N}^+$. We
+can build a~balanced $k$-ary tree of depth~$d$ such that its leaves contain
+a~given set and every internal vertex keeps the minimum value in the subtree
+rooted in it, together with a~Q-heap containing the values in all its sons.
+This allows minimum to be extracted in constant time (it is placed in the root)
+and when any element is changed, it is sufficient to recalculate the values
+from the path from this element to the root, which takes $\O(d)$ Q-heap
+operations.
+
+\corn{Q-heap trees}\id{qhtree}%
+For every positive integer~$r$ and $\delta>0$ there exists a~data structure
+capable of maintaining the minimum of a~set of at most~$r$ word-sized numbers
+under insertions and deletions. Each operation takes $\O(1)$ time on a~Word-RAM
+with word size $W=\Omega(r^{\delta})$, after spending time
+$\O(2^{r^\delta})$ on precomputing of tables.
+
+\proof
+Choose $\delta' \le \delta$ such that $r^{\delta'} = \O(W^{1/4})$. Build
+a~Q-heap tree of depth $d=\lceil \delta/\delta'\rceil$ containing Q-heaps of
+size $k=r^{\delta'}$. \qed
+
+\rem\id{qhtreerem}%
+When we have an~algorithm with input of size~$N$, the word size is at least~$\log N$
+and we can spend time $\O(N)$ on preprocessing, so we can choose $r=\log N$ and
+$\delta=1$ in the above corollary and get a~heap of size $\log N$ working in
+constant time per operation.
+
\endpart