Traditionally, computer scientists use a~variety of computational models
as a~formalism in which their algorithms are stated. If we were studying
-NP-completeness, we could safely assume that all the models are equivalent,
+NP-complete\-ness, we could safely assume that all the models are equivalent,
possibly up to polynomial slowdown which is negligible. In our case, the
differences between good and not-so-good algorithms are on a~much smaller
scale. In this chapter, we will replace the usual ``tape measure'' by a~micrometer,
\:Both restrictions at once.
\endlist
-Thorup discusses the usual techniques employed by RAM algorithms in~\cite{thorup:aczero}
+Thorup \cite{thorup:aczero} discusses the usual techniques employed by RAM algorithms
and he shows that they work on both Word-RAM and ${\rm AC}^0$-RAM, but the combination
of the two restrictions is too weak. On the other hand, the intersection of~${\rm AC}^0$
with the instruction set of modern processors is already strong enough (e.g., when we
algorithms, but the quadratic bound is good enough for our purposes.}
\qed
-\FIXME{Add references, especially to the unbounded parallelism remark.}
-
\thm
Every program for the PM running in polynomial time can be translated to a~program
computing the same on the Word-RAM with only $\O(1)$ slowdown.
techniques of design and analysis of randomized algorithms apply (see for
example Motwani and Raghavan~\cite{motwani:randalg}).
-\FIXME{Consult sources. Does it make more sense to generate random words at once on the RAM?}
-
\rem
There is one more interesting machine: the \df{Immutable Pointer Machine} (see
the description of LISP machines in \cite{benamram:pm}). It differs from the
This is also $\O(n+m)$.
\paran{Tree isomorphism}%
-Another nice example of pointer-based radix sorting is a~Pointer machine algorithm for
+Another nice example of pointer-based radix sorting is a~Pointer Machine algorithm for
deciding whether two rooted trees are isomorphic. Let us assume for a~moment that
the outdegree of each vertex is at most a~fixed constant~$k$. We begin by sorting the subtrees
of both trees by their depth. This can be accomplished by running depth-first search to calculate
\lemman{Sequence unification}\id{suniflemma}%
Partitioning of a~collection of sequences $S_1,\ldots,S_n$, whose elements are
arbitrary pointers and symbols from a~finite alphabet, to equality classes can
-be performed on the Pointer machine in time $\O(n + \sum_i \vert S_i \vert)$.
+be performed on the Pointer Machine in time $\O(n + \sum_i \vert S_i \vert)$.
\rem
The first linear-time algorithm that partitions all subtrees to isomorphism equivalence
encoded in numbers, which can be then used to index arrays containing the precomputed
results. As the previous algorithm for subtree isomorphism shows, indexing is not strictly
required for identifying equivalent microtrees and it can be replaced by bucket
-sorting on the Pointer machine. Buchsbaum et al.~\cite{buchsbaum:verify} have extended
+sorting on the Pointer Machine. Buchsbaum et al.~\cite{buchsbaum:verify} have extended
this technique to general graphs in form of so called topological graph computations.
Let us define them.
of graph encodings:
\defn
-A~\df{canonical encoding} of a~given labeled graph represented by adjancency lists
+A~\df{canonical encoding} of a~given labeled graph represented by adjacency lists
is obtained by running the depth-first search on the graph and recording its traces.
We start with an~empty encoding. When we enter
a~vertex, we assign an~identifier to it (again using a~yardstick to represent numbers)
\lemman{Graph unification}\id{guniflemma}%
A~collection~$\C$ of labeled graphs can be partitioned into classes which share the same
-canonical encoding in time $\O(\Vert\C\Vert)$ on the Pointer machine.
+canonical encoding in time $\O(\Vert\C\Vert)$ on the Pointer Machine.
\proof
Construct canonical encodings of all the graphs and then apply the Sequence unification lemma
the \df{actual graphs} in~$\C$ to the generic graphs in~$\cal G$. This gives us the following
theorem:
-\thmn{Batched topological computations, Buchsbaum et al.~\cite{buchsbaum:verify}}\id{topothm}%
+\thmn{Topological computations, Buchsbaum et al.~\cite{buchsbaum:verify}}\id{topothm}%
Suppose that we have a~topological graph computation~$\cal T$ that can be performed in time
$T(k)$ for graphs on $k$~vertices. Then we can run~$\cal T$ on a~collection~$\C$
of labeled graphs on~$k$ vertices in time $\O(\Vert\C\Vert + (k+s)^{k(k+2)}\cdot (T(k)+k^2))$,
\section{Data structures on the RAM}
\id{ramdssect}
-There is a~lot of data structures designed specifically for the RAM, taking
-advantage of both indexing and arithmetics. In many cases, they surpass the known
-lower bounds for the same problem on the~PM and they often achieve constant time
+There is a~lot of data structures designed specifically for the RAM. These structures
+take advantage of both indexing and arithmetics and they often surpass the known
+lower bounds for the same problem on the~PM. In many cases, they achieve constant time
per operation, at least when either the magnitude of the values or the size of
the data structure are suitably bounded.
-A~classical result of this type are the trees of van Emde Boas~\cite{boas:vebt},
-which represent a~subset of the integers $\{0,\ldots,U-1\}$, allowing insertion,
+A~classical result of this type are the trees of van Emde Boas~\cite{boas:vebt}
+which represent a~subset of the integers $\{0,\ldots,U-1\}$. They allow insertion,
deletion and order operations (minimum, maximum, successor etc.) in time $\O(\log\log U)$,
regardless of the size of the subset. If we replace the heap used in the Jarn\'\i{}k's
algorithm (\ref{jarnik}) by this structure, we immediately get an~algorithm
for finding the MST in integer-weighted graphs in time $\O(m\log\log w_{max})$,
-where $w_{max}$ is the maximum weight. We will show later that it is even
-possible to achieve linear time complexity for arbitrary integer weights.
+where $w_{max}$ is the maximum weight.
-A~real breakthrough has been made by Fredman and Willard, who introduced
-the Fusion trees~\cite{fw:fusion} which again perform membership and predecessor
-operation on a~set of $n$~integers, but this time with complexity $\O(\log_W n)$
+A~real breakthrough has been however made by Fredman and Willard who introduced
+the Fusion trees~\cite{fw:fusion}. They again perform membership and predecessor
+operation on a~set of $n$~integers, but with time complexity $\O(\log_W n)$
per operation on a~Word-RAM with $W$-bit words. This of course assumes that
each element of the set fits in a~single word. As $W$ must at least~$\log n$,
-the operations take $\O(\log n/\log\log n)$ and we are able to sort $n$~integers
+the operations take $\O(\log n/\log\log n)$ time and thus we are able to sort $n$~integers
in time~$o(n\log n)$. This was a~beginning of a~long sequence of faster and
faster sorting algorithms, culminating with the work by Thorup and Han.
They have improved the time complexity of integer sorting to $\O(n\log\log n)$ deterministically~\cite{han:detsort}
and expected $\O(n\sqrt{\log\log n})$ for randomized algorithms~\cite{hanthor:randsort},
both in linear space.
-Despite the recent progress, the corner-stone of most RAM data structures
-is still the representation of data structures by integers introduced by Fredman
-and Willard. It will also form a~basis for the rest of this chapter.
-
-\FIXME{Add more history.}
+The Fusion trees themselves have very limited use in graph algorithms, but the
+principles behind them are ubiquitous in many other data structures and these
+will serve us well and often. We are going to build the theory of Q-heaps in
+Section \ref{qheaps}, which will later lead to a~linear-time MST algorithm
+for arbitrary integer weights in Section \ref{iteralg}. Other such structures
+will help us in building linear-time RAM algorithms for computing the ranks
+of various combinatorial structures in Chapter~\ref{rankchap}.
+
+Outside our area, important consequences of these data structures include the
+Thorup's $\O(m)$ algorithm for single-source shortest paths in undirected
+graphs with positive integer weights \cite{thorup:usssp} and his $\O(m\log\log
+n)$ algorithm for the same problem in directed graphs \cite{thorup:sssp}. Both
+algorithms have been then significantly simplified by Hagerup
+\cite{hagerup:sssp}.
+
+Despite the progress in the recent years, the corner-stone of all RAM structures
+is still the representation of combinatorial objects by integers introduced by
+Fredman and Willard. It will also form a~basis for the rest of this chapter.
%--------------------------------------------------------------------------------
$x_0,\ldots,x_{d-1}$.
\para
-If we want to fit the whole vector in a~single word, the parameters $b$ and~$d$ must satisty
+If we want to fit the whole vector in a~single word, the parameters $b$ and~$d$ must satisfy
the condition $(b+1)d\le W$.
By using multiple-precision arithmetics, we can encode all vectors satisfying $bd=\O(W)$.
We will now describe how to translate simple vector manipulations to sequences of $\O(1)$ RAM operations
in an~encompassing algorithm, so we usually can spend a~lot of time on the precalculation
of constants performed once during algorithm startup.
+\rem
+The history of combining arithmetic and logical operations to obtain fast programs for various
+interesting functions is blurred. Many of the ``bit tricks'' we have described have been
+discovered independently by numerous people in the early ages of digital computers.
+Since then, they have become a~part of the computer science folklore. Probably the
+earliest documented occurrence is in the 1972's memo of the MIT Artificial Intelligence
+Lab \cite{hakmem}. However, until the work of Fredman and Willard nobody seemed to
+realize the full consequences.
+
%--------------------------------------------------------------------------------
\section{Q-Heaps}\id{qheaps}%
This guarantees that $k\le W^{1/4}$ and $\O(2^{k^4}) = \O(N)$. Let us however
remark that the whole construction is primarily of theoretical importance
and that the huge constants involved everywhere make these heaps useless
-in practical algorithms. Many of the tricks used however prove themselves
-useful even in real-life implementations.
+in practical algorithms. Despise this, many of the tricks we develop have proven
+themselves useful even in real-life implementations.
Spending the time on reprocessing makes it possible to precompute tables for
almost arbitrary functions and then assume that they can be evaluated in
to observe that $2^{\O(k^3)}\cdot \poly(k) = \O(2^{k^4})$.
\qed
-\para
+\paran{Tries and ranks}%
We will first show an~auxiliary construction based on tries and then derive
the real definition of the Q-heap from it.
A~\df{compressed trie} is obtained from the trie by removing the vertices of outdegree~1
except for the root and marked vertices.
-Whereever is a~directed path whose internal vertices have outdegree~1 and they carry
-no mark, we replace this path by a~single edge labeled with the contatenation
-of the original edge's labels.
+Wherever is a~directed path whose internal vertices have outdegree~1 and they carry
+no mark, we replace this path by a~single edge labeled with the concatenation
+of the original edges' labels.
In both kinds of tries, we order the outgoing edges of every vertex by their labels
lexicographically.
\defn
For our set~$X$, we define~$T$ as a~compressed trie for the set of binary
-encodings of the numbers~$x_i$, padded to exactly $W$~bits, i.e., for $S = \{ \(x)_W ; x\in X \}$.
+encodings of the numbers~$x_i$, padded to exactly $W$~bits, i.e., for $S = \{ \(x)_W \mid x\in X \}$.
\obs
The trie~$T$ has several interesting properties. Since all words in~$S$ have the same
case, $x[b]=1$ and $x_j[b]=0$, so they are smaller.
\qed
-\para
+\paran{A~better representation}%
The preceding lemma shows that the rank can be computed in polynomial time, but
unfortunately the variables on which it depends are too large for a~table to
be efficiently precomputed. We will carefully choose an~equivalent representation
of the trie which is compact enough.
\lemma\id{citree}%
-The trie is uniquely determined by the order of the guides~$g_1,\ldots,g_{n-1}$.
+The compressed trie is uniquely determined by the order of the guides~$g_1,\ldots,g_{n-1}$.
\proof
We already know that the letter depths of the trie vertices are exactly
\algout The $i$-th smallest element in the heap.
\endalgo
-\para
+\paran{Extraction}%
The heap algorithms we have just described have been built from primitives
operating in constant time, with one notable exception: the extraction
$x[B]$ of all bits of~$x$ at positions specified by the set~$B$. This cannot be done
time using the results of section \ref{bitsect} and Lemma \ref{qhxtract}.
\qed
-\rem
+\paran{Combining Q-heaps}%
We can also use the Q-heaps as building blocks of more complex structures
like Atomic heaps and AF-heaps (see once again \cite{fw:transdich}). We will
show a~simpler, but useful construction, sometimes called the \df{Q-heap tree.}