From 0fbdeb08e4a685906b268bd8414f95403a2731b5 Mon Sep 17 00:00:00 2001
From: Martin Mares <mj@ucw.cz>
Date: Tue, 4 Mar 2008 13:06:43 +0100
Subject: [PATCH] More Q-heaps.

---
 ram.tex | 89 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 51 insertions(+), 38 deletions(-)
diff --git a/ram.tex b/ram.tex
index df7aa41..db6fd5a 100644
--- a/ram.tex
+++ b/ram.tex
@@ -500,9 +500,16 @@ assuming that the result fits in~$b$ bits:
 
 \:$\<Insert>(x,\alpha)$ --- inserts~$\alpha$ into a~sorted vector $\bf x$:
 
-We calculate $k = \<Rank>(x,\alpha)$ first, then insert~$\alpha$ as the $k$-th
+We calculate the rank of~$\alpha$ in~$x$ first, then we insert~$\alpha$ as the $k$-th
 field of~$\bf x$ using masking operations and shifts.
 
+\algo
+\:$k\=\<Rank>(x,\alpha)$.
+\:$\ell\=x \band \1^{(b+1)(n-k-1)}\0^{(b+1)(k+1)}$. \cmt{``left'' part of the vector}
+\:$r=x \band \1^{(b+1)k}$. \cmt{``right'' part}
+\:Return $(\ell\shl (b+1)) \bor (\alpha\shl ((b+1)k)) \bor r$.
+\endalgo
+
 \:$\<Unpack>(\alpha)$ --- creates a~vector whose elements are the bits of~$\(\alpha)_d$.
 In other words, inserts blocks~$\0^b$ between the bits of~$\alpha$. Assuming that $b\ge d$,
 we can do it as follows:
@@ -685,7 +692,7 @@ Let us introduce some notation first:
 \:$n\le k$ --- the number of elements in the represented set,
 \:$X=\{x_1, \ldots, x_n\}$ --- the elements themselves: distinct $W$-bit numbers
 indexed in a~way that $x_1 < \ldots < x_n$,
-\:$c_i = \<MSB>(x_i \bxor x_{i+1})$ --- the most significant bit of those in which $x_i$ and~$x_{i+1}$ differ,
+\:$g_i = \<MSB>(x_i \bxor x_{i+1})$ --- the most significant bit of those in which $x_i$ and~$x_{i+1}$ differ,
 \:$R_X(x)$ --- the rank of~$x$ in~$X$, that is the number of elements of~$X$, which are less than~$x$
 (where $x$~itself need not be an~element of~$X$).\foot{We will dedicate the whole chapter \ref{rankchap} to the
 study of various ranks.}
@@ -738,29 +745,31 @@ length, the leaves of the trie correspond to these exact words, that is to the n
 The inorder traversal of the trie enumerates the words of~$S$ in lexicographic order
 and therefore also the~$x_i$'s in the order of their values. Between each
 pair of leaves $x_i$ and~$x_{i+1}$ it visits an~internal vertex whose letter depth
-is exactly~$W-1-c_i$.
+is exactly~$W-1-g_i$.
 
 \para
 Let us now modify the algorithm for searching in the trie and make it compare
-only the first symbols of the edges. For $x\in X$, the algorithm will still
-return the correct leave, for all~$x$ outside~$X$ it will no longer fail
-and instead it will land in some leaf $x_i$. We will call the index of this leaf $T(x)$.
-At the first sight this vertex may seem unrelated, but we will show that it can
-be used to determine the rank of~$x$ in~$X$, which will later form a~basis
-for all other Q-heap operations:
+only the first symbols of the edges. In other words, we will test only the bits~$g_i$
+which will be called \df{guides,} as they guide us through the tree. For $x\in
+X$, the modified algorithm will still return the correct leaf. For all~$x$ outside~$X$
+it will no longer fail and instead it will land on some leaf~$x_i$. At the
+first sight this vertex may seem unrelated, but we will show that it can be
+used to determine the rank of~$x$ in~$X$, which will later form a~basis for all
+other Q-heap operations:
 
 \lemma\id{qhdeterm}%
 The rank $R_X(x)$ is uniquely determined by a~combination of:
 \itemize\ibull
 \:the trie~$T$,
-\:the index $i=T(x)$ of the leaf visited when searching for~$x$ in~$T$,
+\:the index~$i$ of the leaf visited when searching for~$x$ in~$T$,
 \:the relation ($<$, $=$, $>$) between $x$ and $x_i$,
 \:the bit position $b=\<MSB>(x\bxor x_i)$ of the first disagreement between~$x$ and~$x_i$.
 \endlist
 
 \proof
 If $x\in X$, we detect that from $x_i=x$ and the rank is obviously~$i$ itself.
-Let us assume that $x\not\in X$ and imagine that we follow the same path as during the search for~$T(x)$,
+Let us assume that $x\not\in X$ and imagine that we follow the same path as when
+searching for~$x$,
 but this time we check the full edge labels. The position~$b$ is the first position
 where~$\(x)$ disagrees with a~label. Before this point, all edges not taken by
 the search were leading either to subtrees containing elements all smaller than~$x$
@@ -774,54 +783,60 @@ case, $x[b]=1$ and $x_j[b]=0$, so they are smaller.
 The preceding lemma shows that the rank can be computed in polynomial time, but
 unfortunately the variables on which it depends are too large for the table to
 be efficiently precomputed. We will therefore carefully choose a~representation
-of the trie, which is compact enough.
+of the trie which is compact enough.
 
 \lemma\id{citree}%
-The trie is uniquely determined by the order of the values~$c_1,\ldots,c_{n-1}$.
+The trie is uniquely determined by the order of the guides~$g_1,\ldots,g_{n-1}$.
 
 \proof
 We already know that the letter depths of the trie vertices are exactly
-the numbers~$W-1-c_i$. The root of the trie must have the smallest of these
+the numbers~$W-1-g_i$. The root of the trie must have the smallest of these
 letter depths, i.e., it must correspond to the highest numbered bit. Let
-us call this bit~$c_i$. This implies that the values $x_1,\ldots,x_i$
+us call this bit~$g_i$. This implies that the values $x_1,\ldots,x_i$
 must lie in the left subtree of the root and $x_{i+1},\ldots,x_n$ in its
 right subtree. Both subtrees can be then constructed recursively.\foot{This
 construction is also known as the \df{cartesian tree} for the sequence
-$c_1,\ldots,c_n$.}
+$g_1,\ldots,g_n$.}
 \qed
 
 \para
-However, the vector of the $c_i$'s is also too long (is has $k\log W$ bits
+However, the vector of the $g_i$'s is also too long (is has $k\log W$ bits
 and we have no upper bound on~$W$ in terms of~$k$), so we will compress it even
 further:
 
 \nota
 \itemize\ibull
-\:$B = \{c_1,\ldots,c_n\}$ --- the set of all bit positions examined by the trie,
-stored as a~sorted array,
-\:$C : \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}$ --- a~function such that
-$c_i = B[C(i)]$,
+\:$B = \{g_1,\ldots,g_n\}$ --- the set of bit positions of all the guides, stored as a~sorted array,
+\:$G : \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}$ --- a~function such that
+$g_i = B[G(i)]$, which maps guides to their bit positions,
 \:$x[B]$ --- a~bit string containing the bits of~$x$ originally located
-at the positions indexed by~$B$.
+at the positions given by~$B$, i.e., the concatenation of bits $x[B[1]],
+x[B[2]],\ldots, x[B[n]]$.
 \endlist
 
-\obs
+\obs\id{qhsetb}%
 The set~$B$ has $\O(k\log W)=\O(W)$ bits, so it can be stored in a~constant number
-of machine words as a~vector. The function~$C$ can be also stored as a~vector
-of $k\log k$ bits.
+of machine words as a~sorted vector. The function~$G$ can be also stored as a~vector
+of $k\log k$ bits. We can change a~single~$g_i$ in constant time using
+vector operations: First we delete the original value of~$g_i$ from~$B$ if it
+is not used anywhere else. Then we add the new value to~$B$ if it was not
+there yet and we write its position in~$B$ to~$G(i)$. Whenever we insert
+or delete a~value in~$B$, values at the higher positions shift one position
+up or down and we have to update the pointers in~$G$. This can be fortunately
+accomplished by adding or subtracting a~result of parallel comparison.
 
 \lemma\id{qhrank}%
 The rank $R_X(x)$ can be computed in constant time from:
 \itemize\ibull
-\:the function~$C$,
+\:the function~$G$,
 \:the values $x_1,\ldots,x_n$,
 \:the bit string~$x[B]$,
 \:$x$ itself.
 \endlist
 
 \proof
-We know that the trie~$T$ is uniquely determined by the order of the $c_i$'s
-and therefore by the function~$C$ since the array~$B$ is sorted. The shape of
+We know that the trie~$T$ is uniquely determined by the order of the $g_i$'s
+and therefore by the function~$G$ since the array~$B$ is sorted. The shape of
 the trie together with the bits in $x[B]$ determine the leaf $T[x]$ visited
 when searching for~$x$. All this can be computed in polynomial time and it
 depends on $\O(k\log k)$ bits of input, so according to Lemma~\ref{qhprecomp}
@@ -858,7 +873,7 @@ A~\df{Q-heap} consists of:
 (a~vector of $\O(n\log k)$ bits; we will write $x_i$ for $X[\varrho[i]]$),
 \:$B$ --- a~set of ``interesting'' bit positions
 (a~sorted vector of~$\O(n\log W)$ bits),
-\:$C$ --- the function determining the order of bits as tested by the trie
+\:$G$ --- the function which maps the guides to the bit positions in~$B$
 (a~vector of~$\O(n\log k)$ bits),
 \:precomputed tables of various functions.
 \endlist
@@ -879,13 +894,11 @@ A~\df{Q-heap} consists of:
 \:Insert the new value to~$X$:
 \::$n\=n+1$
 \::$X[n]\=x$ and insert~$r$ at the $i$-th position in the permutation~$\varrho$.
-\:Update the $c_i$'s:
-  Insert a~new element between~$C(i-1)$ and~$C(i)$ and recalculate $c_{i-1}$ and
-  the new~$c_i$ according to the definition. For each changed~$c_j$:
-\::If the new value of~$c_j$ is not in~$B$, insert it there.
-\::\FIXME{Update~$C$ on shifts in~$B$}
-\::Update $C(j)$ to point at the position of~$c_j$ in~$B$.
-\::If the old value of the~$c_j$ is no longer used, delete it from~$B$.
+\:Update the $g_j$'s:
+\::Move all~$g_j$ for $j\ge i$ one position up. \hfil\break
+   This translates to insertion in the vector representing~$G$.
+\::Recalculate $g_{i-1}$ and~$g_i$ according to the definition.
+   \hfil\break Update~$B$ and~$G$ as described in~\ref{qhsetb}.
 \algout The updated Q-heap.
 \endalgo
 
@@ -898,7 +911,7 @@ A~\df{Q-heap} consists of:
 \::$X[\varrho(i)]\=X[n]$.
 \::Find $j$ such that~$\varrho(j)=n$ and set $\varrho(j)\=\varrho(i)$.
 \::$n\=n-1$.
-\:Update the $c_i$'s like in the previous algorithm.
+\:Update the $g_i$'s like in the previous algorithm.
 \algout The updated Q-heap.
 \endalgo
 
@@ -906,7 +919,7 @@ A~\df{Q-heap} consists of:
 \algo
 \algin A~Q-heap and an~index~$i$.
 \:If $i<1$ or $i>n$, return {\sc undefined.}
-\:Return $X[\varrho(i)]$.
+\:Return~$x_i$.
 \algout The $i$-th smallest element in the heap.
 \endalgo
 
-- 
2.39.5