From b4632abd2cb4f9ea27f2bf8d027a6c2cce37616f Mon Sep 17 00:00:00 2001
From: Martin Mares <mj@ucw.cz>
Date: Sat, 1 Mar 2008 00:43:50 +0100
Subject: [PATCH] First part of Q-Heaps.

---
 PLAN     |  12 ++--
 mst.tex  |   1 +
 ram.tex  | 208 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 rank.tex |   1 +
 4 files changed, 216 insertions(+), 6 deletions(-)

diff --git a/PLAN b/PLAN
index 3e7490d..b0a17b1 100644
--- a/PLAN
+++ b/PLAN
@@ -11,18 +11,16 @@
   o  Models and machines
   o  Radix-sorting
   o  Bit tricks
-  .  Ranking sets
-  .  Bitwise B-trees
   .  Q-Heaps
 
 *  Advanced MST Algorithms
 
   o  Minor-closed classes
   o  Fredman-Tarjan algorithm
-  .  ?? Chazelle ??
-  .  ?? Pettie ??
   .  MST verification
   .  Randomized algorithms
+  .  ?? Chazelle ??
+  .  ?? Pettie ??
 
 *  Ranking combinatorial objects
 
@@ -57,11 +55,17 @@ Spanning trees:
 - Some algorithms (most notably Fredman-Tarjan) do not need flattening
 - reference to mixed Boruvka-Jarnik
 - use the notation for contraction by a set
+- practical considerations: katriel:cycle, moret:practice (mention pairing heaps)
+- parallel algorithms: p243-cole (are there others?)
 
 Models:
 
 - bit tricks: reference to HAKMEM
 - mention in-place radix-sorting?
+- consequences of Q-Heaps: Thorup's undirected SSSP etc.
+- refs on Cartesian trees
+- update notation.tex
+- iteration of Q-Heaps
 
 Ranking:
 
diff --git a/mst.tex b/mst.tex
index 8eea862..de4ffa9 100644
--- a/mst.tex
+++ b/mst.tex
@@ -611,6 +611,7 @@ which obviously works in multigraphs as well.)
 \rem
 In the previous algorithm, the role of the mapping~$\pi^{-1}$ is of course played by the edge labels~$\ell$.
 
+\para
 Finally, we will show a family of graphs where the $\O(m\log n)$ bound on time complexity
 is tight. The graphs do not have unique weights, but they are constructed in a way that
 the algorithm never compares two edges with the same weight. Therefore, when two such
diff --git a/ram.tex b/ram.tex
index 699d682..f327f38 100644
--- a/ram.tex
+++ b/ram.tex
@@ -161,7 +161,7 @@ constant time).
 We can therefore view the whole memory as a~directed graph, whose vertices
 correspond to the cells (the registers are stored in a~single special cell).
 The outgoing edges of each vertex correspond to pointer fields of the cells and they are
-labelled with distinct labels drawn from a~finite set. In addition to that,
+labeled with distinct labels drawn from a~finite set. In addition to that,
 each vertex contains a~fixed amount of symbols. The program can directly access
 vertices within distance~2 from the register vertex.
 
@@ -630,7 +630,211 @@ Either we can write non-uniform programs (see \ref{nonuniform}) and use native c
 or we can observe that all such constants can be easily manufactured. For example,
 $(\0^b\1)^d = \1^{(b+1)d} / \1^{b+1} = (2^{(b+1)d}-1)/(2^{b+1}-1)$. The only exceptions
 are the~$w$ and~$b$ in the LSB algorithm \ref{lsb}, which we are unable to produce
-in constant time.
+in constant time. In practice we use the ``bit tricks'' as frequently called subroutines
+in an~encompassing algorithm, so we usually can spend a~lot of time on the precalculation
+of constants performed once during algorithm startup.
+
+%--------------------------------------------------------------------------------
+
+\section{Q-Heaps}\id{qheaps}%
+
+We have shown how to perform relatively complicated operations on a~set of values
+in constant time, but so far only under the assumption that the number of these
+values is small enough and that the values themselves are also small enough
+(so that the whole set fits in $\O(1)$ machine words). Now we will show how to
+lift the restriction on the magnitude of the values and still keep constant time
+complexity. We will describe a~simplified version of the Q-Heaps developed by
+Fredman and Willard in~\cite{fw:transdich}.
+
+The Q-Heap represents a~set of at most~$k$ word-sized integers, where $k\le W^{1/4}$
+and $W$ is the word size of the machine. It will support insertion, deletion, finding
+of minimum, and other operations described below, in constant time, provided that
+we are willing to spend~$\O(2^{k^4})$ time on preprocessing.
+
+The exponential-time preprocessing may sound alarming, but a~typical application uses
+Q-Heaps of size $k=\log^{1/4} N$, where $N$ is the size of the algorithm's input,
+which guarantees that $k\le W^{1/4}$ and $\O(2^{k^4}) = \O(N)$. Let us however
+remark that the whole construction is primarily of theoretical importance
+and that the huge constants involved everywhere make these heaps useless
+for practical algorithms. However, many of the tricks used prove themselves
+useful even in real-life implementations.
+
+Preprocessing makes it possible to precompute tables for almost arbitrary functions
+and then assume that they can be evaluated in constant time:
+
+\lemma\id{qhprecomp}%
+When~$f$ is a~function computable in polynomial time, $\O(2^{k^4})$ time is enough
+to precompute a~table of the values of~$f$ for the values of its arguments whose total
+bit size is $\O(k^3)$.
+
+\proof
+There are $2^{\O(k^3)}$ possible combinations of arguments of the given size and for each of
+them we spend $\O(k^c)$ time by calculating the function (for some~$c\ge 1$). It remains
+to observe that $2^{\O(k^3)}\cdot \O(k^c) = \O(2^{k^4})$.
+\qed
+
+\para
+We will first show an~auxiliary construction based on tries and then derive
+the actual definition of the Q-Heap from it.
+
+\nota
+Let us introduce some notation first:
+\itemize\ibull
+\:$W$ --- the word size of the RAM,
+\:$k = \O(W^{1/4})$ --- the limit on the size of the heap,
+\:$n\le k$ --- the number of elements in the represented set,
+\:$X=\{x_1, \ldots, x_n\}$ --- the elements themselves: distinct $W$-bit numbers
+indexed in a~way that $x_1 < \ldots < x_n$,
+\:$c_i = \<MSB>(x_i \bxor x_{i+1})$ --- the most significant bit of those in which $x_i$ and~$x_{i+1}$ differ,
+\:$R_X(x)$ --- the rank of~$x$ in~$X$, that is the number of elements of~$X$, which are less than~$x$
+(where $x$~itself need not be an~element of~$X$).\foot{We will dedicate the whole chapter \ref{rankchap} to the
+study of various ranks.}
+\endlist
+
+\defn
+A~\df{trie} for a~set of strings~$S$ over a~finite alphabet~$\Sigma$ is
+a~rooted tree whose vertices are the prefixes of the strings in~$S$ and there
+is an~edge going from a~prefix~$\alpha$ to a~prefix~$\beta$ iff $\beta$ can be
+obtained from~$\alpha$ by adding a~single symbol of the alphabet. The edge
+will be labeled with the particular symbol. We will also define a~\df{letter depth}
+of a~vertex to be the length of the corresponding prefix and mark the vertices
+which match a~string of~$S$.
+
+A~\df{compressed trie} is obtained from the trie by removing the vertices of outdegree~1.
+Whereever is a~directed path whose internal vertices have outdegree~1, we replace this
+path by a~single edge labeled with the contatenation of the original edge's labels.
+
+In both kinds of tries, we will order the outgoing edges of every vertex by their labels
+lexicographically.
+
+\obs
+In both tries, the root of the tree is the empty word and for every vertex, the
+corresponding prefix is equal to the concatenation of edge labels on the path
+leading from the root to the vertex. The letter depth of the vertex is equal to
+the total size of these labels. All leaves correspond to strings in~$S$, but so can
+some internal vertices if there are two strings in~$S$ such that one is a~prefix
+of the other.
+
+Furthermore, the labels of all edges leaving a~common vertex are always
+distinct and when we compress the trie, no two such labels have share their initial
+symbols. This allows us to search in the trie efficiently: when looking for
+a~string~$x$, we follow the path from the root and whenever we visit
+an~internal vertex of letter depth~$d$, we test the $d$-th character of~$x$,
+follow the edge whose label starts with this character, and check that the
+rest of the label matches.
+
+The compressed trie is also efficient in terms of space consumption --- it has
+$\O(\vert S\vert)$ vertices (this can be easily shown by induction on~$\vert S\vert$)
+and all edge labels can be represented in space linear in the sum of the
+lengths of the strings in~$S$.
+
+\defn
+For our set~$X$, we will define~$T$ as a~compressed trie for the set of binary
+encodings of the numbers~$x_i$, padded to exactly $W$~bits, i.e., for $S = \{ \(x)_W ; x\in X \}$.
+
+\obs
+The trie~$T$ has several interesting properties. Since all words in~$S$ have the same
+length, the leaves of the trie correspond to these exact words, that is to the numbers~$x_i$.
+The inorder traversal of the trie enumerates the words of~$S$ in lexicographic order
+and therefore also the~$x_i$'s in the order of their values. Between each
+pair of leaves $x_i$ and~$x_{i+1}$ it visits an~internal vertex whose letter depth
+is exactly~$W-1-c_i$.
+
+\para
+Let us now modify the algorithm for searching in the trie and make it compare
+only the first symbols of the edges. For $x\in X$, the algorithm will still
+return the correct leave, for all~$x$ outside~$X$ it will no longer fail
+and instead it will land in some leaf $x_i$. We will call the index of this leaf $T(x)$.
+At the first sight this vertex may seem unrelated, but we will show that it can
+be used to determine the rank of~$x$ in~$X$, which will later form a~basis
+for all other Q-Heap operations:
+
+\lemma\id{qhdeterm}%
+The rank $R_X(x)$ is uniquely determined by a~combination of:
+\itemize\ibull
+\:the trie~$T$,
+\:the index $i=T(x)$ of the leaf visited when searching for~$x$ in~$T$,
+\:the relation ($<$, $=$, $>$) between $x$ and $x_i$,
+\:the bit position $b=\<MSB>(x\bxor x_i)$ of the first disagreement between~$x$ and~$x_i$.
+\endlist
+
+\proof
+If $x\in X$, we detect that from $x_i=x$ and the rank is obviously~$i$ itself.
+Let us assume that $x\not\in X$ and imagine that we follow the same path as during the search for~$T(x)$,
+but this time we check the full edge labels. The position~$b$ is the first position
+where~$\(x)$ disagrees with a~label. Before this point, all edges not taken by
+the search were leading either to subtrees containing elements all smaller than~$x$
+or all larger than~$x$ and the only values not known yet are those in the subtree
+below the edge which we currently consider. Now if $x[b]=0$ (and therefore $x<x_i$),
+all values in the subtree have $x_j[b]=1$ and thus they are larger. In the other
+case, $x[b]=1$ and $x_j[b]=0$, so they are smaller.
+\qed
+
+\para
+The preceding lemma shows that the rank can be computed in polynomial time, but
+unfortunately the variables on which it depends are too large for the table to
+be efficiently precomputed. We will therefore carefully choose a~representation
+of the trie, which is compact enough.
+
+\lemma\id{citree}%
+The trie is uniquely determined by the order of the values~$c_1,\ldots,c_{n-1}$.
+
+\proof
+We already know that the letter depths of the trie vertices are exactly
+the numbers~$W-1-c_i$. The root of the trie must have the smallest of these
+letter depths, i.e., it must correspond to the highest numbered bit. Let
+us call this bit~$c_i$. This implies that the values $x_1,\ldots,x_i$
+must lie in the left subtree of the root and $x_{i+1},\ldots,x_n$ in its
+right subtree. Both subtrees can be then constructed recursively.\foot{This
+construction is also known as the \df{cartesian tree} for the sequence
+$c_1,\ldots,c_n$.}
+\qed
+
+\para
+However, the vector of the $c_i$'s is also too long (is has $k\log W$ bits
+and we have no upper bound on~$W$ in terms of~$k$), so we will compress it even
+further:
+
+\nota
+\itemize\ibull
+\:$B = \{c_1,\ldots,c_n\}$ --- the set of all bit positions examined by the trie,
+stored as a~sorted array,
+\:$C : \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}$ --- a~function such that
+$c_i = B[C(i)]$,
+\:$x[B]$ --- a~bit string containing the bits of~$x$ originally located
+at the positions indexed by~$B$.
+\endlist
+
+\obs
+The set~$B$ has $\O(k\log W)=\O(W)$ bits, so it can be stored in a~constant number
+of machine words as a~vector. The function~$C$ can be also stored as a~vector
+of $k\log k$ bits.
+
+\lemma
+The rank $R_X(x)$ can be computed in constant time from:
+\itemize\ibull
+\:the function~$C$,
+\:the values $x_1,\ldots,x_n$,
+\:the bit string~$x[B]$,
+\:$x$ itself.
+\endlist
+
+\proof
+We know that the trie~$T$ is uniquely determined by the order of the $c_i$'s
+and therefore by the function~$C$ since the array~$B$ is sorted. The shape of
+the trie together with the bits in $x[B]$ determine the leaf $T[x]$ visited
+when searching for~$x$. All this can be computed in polynomial time and it
+depends on $\O(k\log k)$ bits of input, so according to Lemma~\ref{qhprecomp}
+we can look it up in a~precomputed table.
+
+Similarly we will determine all other ingredients of Lemma~\ref{qhdeterm} in
+constant time. As we know~$x$ and all the $x_i$'s, we can immediately find
+the relation $x$ and $x_{T[x]}$ and use the LSB/MSB algorithm (\ref{lsb})
+to find the topmost disagreeing bit.
+
+All these ingredients can be stored in $\O(k\log k)$ bits, so we may assume
+that the rank can be looked up in constant time as well.
+\qed
 
 
 \endpart
diff --git a/rank.tex b/rank.tex
index c457a81..51df62e 100644
--- a/rank.tex
+++ b/rank.tex
@@ -3,6 +3,7 @@
 \fi
 
 \chapter{Ranking Combinatorial Structures}
+\id{rankchap}
 
 \section{Ranking and unranking}
 
-- 
2.39.5