More Q-heaps.

author Martin Mares <mj@ucw.cz>

Tue, 4 Mar 2008 12:06:43 +0000 (13:06 +0100)

committer Martin Mares <mj@ucw.cz>

Tue, 4 Mar 2008 12:06:43 +0000 (13:06 +0100)
author Martin Mares <mj@ucw.cz>
Tue, 4 Mar 2008 12:06:43 +0000 (13:06 +0100)
committer Martin Mares <mj@ucw.cz>
Tue, 4 Mar 2008 12:06:43 +0000 (13:06 +0100)
diff --git a/ram.tex b/ram.tex

index df7aa4164ea5d5630b7bfa333b0ca05ec23d920d..db6fd5a68d247c0967e97905fa370b4ebced144f 100644 (file)
--- a/ram.tex
+++ b/ram.tex
@@ -500,9 +500,16 @@ assuming that the result fits in~$b$ bits:
  
  \:$\<Insert>(x,\alpha)$ --- inserts~$\alpha$ into a~sorted vector $\bf x$:
  
-We calculate $k = \<Rank>(x,\alpha)$ first, then insert~$\alpha$ as the $k$-th
+We calculate the rank of~$\alpha$ in~$x$ first, then we insert~$\alpha$ as the $k$-th
  field of~$\bf x$ using masking operations and shifts.
  
+\algo
+\:$k\=\<Rank>(x,\alpha)$.
+\:$\ell\=x \band \1^{(b+1)(n-k-1)}\0^{(b+1)(k+1)}$. \cmt{``left'' part of the vector}
+\:$r=x \band \1^{(b+1)k}$. \cmt{``right'' part}
+\:Return $(\ell\shl (b+1)) \bor (\alpha\shl ((b+1)k)) \bor r$.
+\endalgo
+
  \:$\<Unpack>(\alpha)$ --- creates a~vector whose elements are the bits of~$\(\alpha)_d$.
  In other words, inserts blocks~$\0^b$ between the bits of~$\alpha$. Assuming that $b\ge d$,
  we can do it as follows:
@@ -685,7 +692,7 @@ Let us introduce some notation first:
  \:$n\le k$ --- the number of elements in the represented set,
  \:$X=\{x_1, \ldots, x_n\}$ --- the elements themselves: distinct $W$-bit numbers
  indexed in a~way that $x_1 < \ldots < x_n$,
-\:$c_i = \<MSB>(x_i \bxor x_{i+1})$ --- the most significant bit of those in which $x_i$ and~$x_{i+1}$ differ,
+\:$g_i = \<MSB>(x_i \bxor x_{i+1})$ --- the most significant bit of those in which $x_i$ and~$x_{i+1}$ differ,
  \:$R_X(x)$ --- the rank of~$x$ in~$X$, that is the number of elements of~$X$, which are less than~$x$
  (where $x$~itself need not be an~element of~$X$).\foot{We will dedicate the whole chapter \ref{rankchap} to the
  study of various ranks.}
@@ -738,29 +745,31 @@ length, the leaves of the trie correspond to these exact words, that is to the n
  The inorder traversal of the trie enumerates the words of~$S$ in lexicographic order
  and therefore also the~$x_i$'s in the order of their values. Between each
  pair of leaves $x_i$ and~$x_{i+1}$ it visits an~internal vertex whose letter depth
-is exactly~$W-1-c_i$.
+is exactly~$W-1-g_i$.
  
  \para
  Let us now modify the algorithm for searching in the trie and make it compare
-only the first symbols of the edges. For $x\in X$, the algorithm will still
-return the correct leave, for all~$x$ outside~$X$ it will no longer fail
-and instead it will land in some leaf $x_i$. We will call the index of this leaf $T(x)$.
-At the first sight this vertex may seem unrelated, but we will show that it can
-be used to determine the rank of~$x$ in~$X$, which will later form a~basis
-for all other Q-heap operations:
+only the first symbols of the edges. In other words, we will test only the bits~$g_i$
+which will be called \df{guides,} as they guide us through the tree. For $x\in
+X$, the modified algorithm will still return the correct leaf. For all~$x$ outside~$X$
+it will no longer fail and instead it will land on some leaf~$x_i$. At the
+first sight this vertex may seem unrelated, but we will show that it can be
+used to determine the rank of~$x$ in~$X$, which will later form a~basis for all
+other Q-heap operations:
  
  \lemma\id{qhdeterm}%
  The rank $R_X(x)$ is uniquely determined by a~combination of:
  \itemize\ibull
  \:the trie~$T$,
-\:the index $i=T(x)$ of the leaf visited when searching for~$x$ in~$T$,
+\:the index~$i$ of the leaf visited when searching for~$x$ in~$T$,
  \:the relation ($<$, $=$, $>$) between $x$ and $x_i$,
  \:the bit position $b=\<MSB>(x\bxor x_i)$ of the first disagreement between~$x$ and~$x_i$.
  \endlist
  
  \proof
  If $x\in X$, we detect that from $x_i=x$ and the rank is obviously~$i$ itself.
-Let us assume that $x\not\in X$ and imagine that we follow the same path as during the search for~$T(x)$,
+Let us assume that $x\not\in X$ and imagine that we follow the same path as when
+searching for~$x$,
  but this time we check the full edge labels. The position~$b$ is the first position
  where~$\(x)$ disagrees with a~label. Before this point, all edges not taken by
  the search were leading either to subtrees containing elements all smaller than~$x$
@@ -774,54 +783,60 @@ case, $x[b]=1$ and $x_j[b]=0$, so they are smaller.
  The preceding lemma shows that the rank can be computed in polynomial time, but
  unfortunately the variables on which it depends are too large for the table to
  be efficiently precomputed. We will therefore carefully choose a~representation
-of the trie, which is compact enough.
+of the trie which is compact enough.
  
  \lemma\id{citree}%
-The trie is uniquely determined by the order of the values~$c_1,\ldots,c_{n-1}$.
+The trie is uniquely determined by the order of the guides~$g_1,\ldots,g_{n-1}$.
  
  \proof
  We already know that the letter depths of the trie vertices are exactly
-the numbers~$W-1-c_i$. The root of the trie must have the smallest of these
+the numbers~$W-1-g_i$. The root of the trie must have the smallest of these
  letter depths, i.e., it must correspond to the highest numbered bit. Let
-us call this bit~$c_i$. This implies that the values $x_1,\ldots,x_i$
+us call this bit~$g_i$. This implies that the values $x_1,\ldots,x_i$
  must lie in the left subtree of the root and $x_{i+1},\ldots,x_n$ in its
  right subtree. Both subtrees can be then constructed recursively.\foot{This
  construction is also known as the \df{cartesian tree} for the sequence
-$c_1,\ldots,c_n$.}
+$g_1,\ldots,g_n$.}
  \qed
  
  \para
-However, the vector of the $c_i$'s is also too long (is has $k\log W$ bits
+However, the vector of the $g_i$'s is also too long (is has $k\log W$ bits
  and we have no upper bound on~$W$ in terms of~$k$), so we will compress it even
  further:
  
  \nota
  \itemize\ibull
-\:$B = \{c_1,\ldots,c_n\}$ --- the set of all bit positions examined by the trie,
-stored as a~sorted array,
-\:$C : \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}$ --- a~function such that
-$c_i = B[C(i)]$,
+\:$B = \{g_1,\ldots,g_n\}$ --- the set of bit positions of all the guides, stored as a~sorted array,
+\:$G : \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}$ --- a~function such that
+$g_i = B[G(i)]$, which maps guides to their bit positions,
  \:$x[B]$ --- a~bit string containing the bits of~$x$ originally located
-at the positions indexed by~$B$.
+at the positions given by~$B$, i.e., the concatenation of bits $x[B[1]],
+x[B[2]],\ldots, x[B[n]]$.
  \endlist
  
-\obs
+\obs\id{qhsetb}%
  The set~$B$ has $\O(k\log W)=\O(W)$ bits, so it can be stored in a~constant number
-of machine words as a~vector. The function~$C$ can be also stored as a~vector
-of $k\log k$ bits.
+of machine words as a~sorted vector. The function~$G$ can be also stored as a~vector
+of $k\log k$ bits. We can change a~single~$g_i$ in constant time using
+vector operations: First we delete the original value of~$g_i$ from~$B$ if it
+is not used anywhere else. Then we add the new value to~$B$ if it was not
+there yet and we write its position in~$B$ to~$G(i)$. Whenever we insert
+or delete a~value in~$B$, values at the higher positions shift one position
+up or down and we have to update the pointers in~$G$. This can be fortunately
+accomplished by adding or subtracting a~result of parallel comparison.
  
  \lemma\id{qhrank}%
  The rank $R_X(x)$ can be computed in constant time from:
  \itemize\ibull
-\:the function~$C$,
+\:the function~$G$,
  \:the values $x_1,\ldots,x_n$,
  \:the bit string~$x[B]$,
  \:$x$ itself.
  \endlist
  
  \proof
-We know that the trie~$T$ is uniquely determined by the order of the $c_i$'s
-and therefore by the function~$C$ since the array~$B$ is sorted. The shape of
+We know that the trie~$T$ is uniquely determined by the order of the $g_i$'s
+and therefore by the function~$G$ since the array~$B$ is sorted. The shape of
  the trie together with the bits in $x[B]$ determine the leaf $T[x]$ visited
  when searching for~$x$. All this can be computed in polynomial time and it
  depends on $\O(k\log k)$ bits of input, so according to Lemma~\ref{qhprecomp}
@@ -858,7 +873,7 @@ A~\df{Q-heap} consists of:
  (a~vector of $\O(n\log k)$ bits; we will write $x_i$ for $X[\varrho[i]]$),
  \:$B$ --- a~set of ``interesting'' bit positions
  (a~sorted vector of~$\O(n\log W)$ bits),
-\:$C$ --- the function determining the order of bits as tested by the trie
+\:$G$ --- the function which maps the guides to the bit positions in~$B$
  (a~vector of~$\O(n\log k)$ bits),
  \:precomputed tables of various functions.
  \endlist
@@ -879,13 +894,11 @@ A~\df{Q-heap} consists of:
  \:Insert the new value to~$X$:
  \::$n\=n+1$
  \::$X[n]\=x$ and insert~$r$ at the $i$-th position in the permutation~$\varrho$.
-\:Update the $c_i$'s:
-  Insert a~new element between~$C(i-1)$ and~$C(i)$ and recalculate $c_{i-1}$ and
-  the new~$c_i$ according to the definition. For each changed~$c_j$:
-\::If the new value of~$c_j$ is not in~$B$, insert it there.
-\::\FIXME{Update~$C$ on shifts in~$B$}
-\::Update $C(j)$ to point at the position of~$c_j$ in~$B$.
-\::If the old value of the~$c_j$ is no longer used, delete it from~$B$.
+\:Update the $g_j$'s:
+\::Move all~$g_j$ for $j\ge i$ one position up. \hfil\break
+   This translates to insertion in the vector representing~$G$.
+\::Recalculate $g_{i-1}$ and~$g_i$ according to the definition.
+   \hfil\break Update~$B$ and~$G$ as described in~\ref{qhsetb}.
  \algout The updated Q-heap.
  \endalgo
  
@@ -898,7 +911,7 @@ A~\df{Q-heap} consists of:
  \::$X[\varrho(i)]\=X[n]$.
  \::Find $j$ such that~$\varrho(j)=n$ and set $\varrho(j)\=\varrho(i)$.
  \::$n\=n-1$.
-\:Update the $c_i$'s like in the previous algorithm.
+\:Update the $g_i$'s like in the previous algorithm.
  \algout The updated Q-heap.
  \endalgo
  
@@ -906,7 +919,7 @@ A~\df{Q-heap} consists of:
  \algo
  \algin A~Q-heap and an~index~$i$.
  \:If $i<1$ or $i>n$, return {\sc undefined.}
-\:Return $X[\varrho(i)]$.
+\:Return~$x_i$.
  \algout The $i$-th smallest element in the heap.
  \endalgo
author	Martin Mares <mj@ucw.cz>
	Tue, 4 Mar 2008 12:06:43 +0000 (13:06 +0100)
committer	Martin Mares <mj@ucw.cz>
	Tue, 4 Mar 2008 12:06:43 +0000 (13:06 +0100)