From ab63bbae88428b5cf6c9df0649f4ee4f340736f9 Mon Sep 17 00:00:00 2001
From: Martin Mares
Date: Sun, 16 Mar 2008 22:05:55 +0100
Subject: [PATCH] A plenty of corrections to the verification algorithm.

---
 adv.tex | 140 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 66 deletions(-)

diff --git a/adv.tex b/adv.tex
index cf39940..47f5845 100644
--- a/adv.tex
+++ b/adv.tex
@@ -669,7 +669,7 @@ As for every~$i$ the path $T[v,T_e[i+1]]$ is contained within $T[v,T_e[i]]$,
 the edges of~$P_e$ must have non-increasing weights, that is
 $w(P_e[i+1]) \le w(P_e[i])$.
 
-\alg $\<FindHeavy>(u,p,T_p,P_p)$ --- process all queries in the subtree rooted
+\alg $\<FindPeaks>(u,p,T_p,P_p)$ --- process all queries in the subtree rooted
 at~$u$ entered from its parent via an~edge~$p$.
 \id{findheavy}
 
@@ -685,26 +685,28 @@ the desired edge from~$P_p[i]$.
 if there is a~query path which has~$u$ as its top and which has bottom
 somewhere in the subtree rooted at~$v$.
 
-\::Construct the array of the peaks~$P_e$: Start with~$P_p$,
-  remove the entries corresponding to the tops which are no longer active.
+\::Prepare the array of the peaks~$P_e$: Start with~$P_p$, remove the entries
+  corresponding to the tops which are no longer active. If $u$ became an~active
+  top, append~$e$ to the array.
+
+\::Finish~$P_e$:
   Since the paths leading to all active tops have been extended by the edge~$e$,
   compare $w(e)$ with weights of the edges recorded in~$P_e$ and replace those edges
   which are lighter by~$e$. Since $P_p$ was sorted, we can use binary search
-  to locate the boundary between lighter and heavier edges in~$P_e$. Finally
-  append~$e$ to the array if the top $u$ became active.
+  to locate the boundary between lighter and heavier edges in~$P_e$.
 
-\::Recurse on~$v$: call $\<FindHeavy>(v,e,T_e,P_e)$.
+\::Recurse on~$v$: call $\<FindPeaks>(v,e,T_e,P_e)$.
 \endalgo
 
 \>As we need a~parent edge to start the recursion, we add an~imaginary parent
 edge~$p_0$ of the root vertex~$r$, for which no queries are defined. We can
-therefore start with $\<FindHeavy>(r,p_0,\emptyset,\emptyset)$.
+therefore start with $\<FindPeaks>(r,p_0,\emptyset,\emptyset)$.
 
 Let us account for the comparisons:
 
 \lemma\id{vercompares}%
-When the procedure \<FindHeavy> is called on the transformed problem, it
+When the procedure \<FindPeaks> is called on the transformed problem, it
 performs $\O(n+q)$ comparisons, where $n$ is the size of the tree and $q$ is
 the number of query paths.
 
@@ -764,7 +766,7 @@ of query paths in~$T$ (these are exactly the paths covered by the edges
 of $G\setminus T$). We use the reduction from Lemma \ref{verbranch} to get
 an~equivalent problem with a~full branching tree and a~set of parent-descendant
 paths. The reduction costs $\O(m+n)$ comparisons.
-Then we run the \<FindHeavy> procedure (Algorithm \ref{findheavy}) to find
+Then we run the \<FindPeaks> procedure (Algorithm \ref{findheavy}) to find
 the tops of all query paths. According to Lemma \ref{vercompares}, this spends
 another $\O(m+n)$ comparisons. Since we (as always) assume that~$G$ is
 connected, $\O(m+n)=\O(m)$.
 \qed
@@ -794,13 +796,13 @@ in time $\O(mn\log n)$.
 
 \section{Verification in linear time}
 
 We have proven that $\O(m)$ edge weight comparisons suffice to verify minimality
-of a~given spanning tree. In this section, we will show an~algorithm for the RAM,
+of a~given spanning tree. Now we will show an~algorithm for the RAM,
 which finds the required comparisons in linear time.
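The step that finishes~$P_e$ above hides the only nontrivial data manipulation: since the weights stored in the peak array are non-increasing, extending every active path by~$e$ amounts to one binary search followed by overwriting the lighter suffix. Below is a minimal C sketch of just that step; the type edge and the name extend_peaks are illustrative only and do not come from the patched text.

#include <stddef.h>

/* Sketch of the peak-array update: peaks[] holds, for every active top, the
 * heaviest edge on the path built so far, with non-increasing weights.  When
 * all paths are extended by the edge e, a binary search locates the first
 * entry lighter than e and the whole suffix from there on is replaced by e. */
typedef struct { int id; double w; } edge;

static void extend_peaks(edge *peaks, size_t len, edge e)
{
    size_t lo = 0, hi = len;           /* first index with weight < w(e) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (peaks[mid].w < e.w)
            hi = mid;                  /* boundary is at mid or earlier  */
        else
            lo = mid + 1;              /* this entry is heavy enough     */
    }
    for (size_t i = lo; i < len; i++)  /* overwrite the lighter suffix   */
        peaks[i] = e;
}

This costs $\O(\log\vert P_e\vert)$ comparisons per tree edge, which is consistent with the $\O(n+q)$ bound claimed by the lemma above.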
 We will follow the idea of King from \cite{king:verify}, but as we have
 the power of the RAM data structures from Section~\ref{bitsect} at our command,
 the low-level details will be easier, especially the construction of vertex
 and edge labels.
 
-\paran{Reduction}
+\para
 First of all, let us make sure that the reduction to fully branching trees
 in Lemma \ref{verbranch} can be made to run in linear time. As already noticed
 in the proof, Bor\o{u}vka's algorithm runs in linear time. Constructing
@@ -810,7 +812,7 @@ step of the algorithm.
 Finding the common ancestors is not trivial, but Harel and Tarjan have shown
 in \cite{harel:nca} that linear time is sufficient on the RAM. Several more
 accessible algorithms have been developed since then (see Alstrup's survey
-paper \cite{alstrup:nca} and a~particularly elegant algorithm shown by Bender
+paper \cite{alstrup:nca} and a~particularly elegant algorithm described by Bender
 and Farach-Colton in \cite{bender:lca}). Any of them implies the following
 theorem:
@@ -818,34 +820,32 @@ theorem:
 On the RAM, it is possible to preprocess a~tree~$T$ in time $\O(n)$ and then
 answer lowest common ancestor queries presented online in constant time.
 
-\proof
-See for example Bender and Farach-Colton \cite{bender:lca}.
-\qed
-
 \cor
 The reductions in Lemma \ref{verbranch} can be performed in time $\O(m)$.
 
 \para
-Having the reduced problem at hand, we have to implement the procedure \<FindHeavy>
+Having the reduced problem at hand, it remains to implement the procedure \<FindPeaks>
 of Algorithm \ref{findheavy} efficiently. We need a~compact representation of
-the arrays $T_e$ and~$P_e$ by vectors, so that the overhead of the algorithm
-will be linear in the number of comparisons performed.
+the arrays $T_e$ and~$P_e$, which will allow us to reduce the overhead of the algorithm
+to time linear in the number of comparisons performed. To achieve
+this goal, we will encode the arrays in RAM vectors (see Section \ref{bitsect}
+for the vector operations).
 
 \defn
-\em{Vertex identifiers:} Since all vertices referred to by the procedure
-lie on the path from root to the current vertex~$u$, we modify the algorithm
-to keep a~stack of these vertices in an~array and refer to each vertex by its
+\em{Vertex identifiers:} Since all vertices processed by the procedure
+lie on the path from the root to the current vertex~$u$, we modify the algorithm
+to keep a~stack of these vertices in an~array. We will often refer to each vertex by its
 index in this array, i.e., by its depth. We will call these identifiers \df{vertex
-labels} and we note that each label require only $\ell=\lceil \log\log n\rceil$
+labels} and we note that each label requires only $\ell=\lceil \log\lceil\log n\rceil\rceil$
 bits. As every tree edge is uniquely identified by its bottom vertex, we can
 use the same encoding for \df{edge labels.}
 
 \em{Slots:} As we will need several operations which are not computable
 in constant time on the RAM, we precompute tables for these operations
-like we did in the Q-Heaps (cf.~Lemma \ref{qhprecomp}). A~table for word-sized
+like we did in the Q-heaps (cf.~Lemma \ref{qhprecomp}). A~table for word-sized
 arguments would take too much time to precompute, so we will generally store
-our data structures in \df{slots} of $s=1/3\cdot\lceil\log n\rceil$ bits each.
+our data structures in \df{slots} of $s=\lceil 1/3\cdot\log n\rceil$ bits each.
 We will show soon that it is possible to precompute a~table of any reasonable
 function whose arguments fit in two slots.
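The slot size is chosen exactly so that precomputed tables stay affordable: two slot-sized arguments index only about $n^{2/3}$ table entries. The following small C program, whose names are made up for this sketch, merely evaluates these sizes for an example~$n$.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustration of the sizes used above: a label needs
 * l = ceil(log2 ceil(log2 n)) bits, a slot s = ceil((log2 n)/3) bits, so a
 * table indexed by two slots has 2^(2s) = O(n^{2/3}) entries and can be
 * filled in O(n) time even with polylogarithmic work per entry. */
static unsigned ceil_log2(uint64_t x)
{
    unsigned b = 0;
    while (((uint64_t)1 << b) < x)
        b++;
    return b;
}

int main(void)
{
    uint64_t n = 1u << 20;                    /* example input size         */
    unsigned l = ceil_log2(ceil_log2(n));     /* bits per vertex/edge label */
    unsigned s = (ceil_log2(n) + 2) / 3;      /* bits per slot              */
    size_t entries = (size_t)1 << (2 * s);    /* two slot-sized arguments   */
    unsigned char *tab = calloc(entries, 1);  /* tab[a << s | b] = f(a,b)   */
    printf("n=%llu  l=%u  s=%u  table entries=%zu\n",
           (unsigned long long)n, l, s, entries);
    free(tab);
    return 0;
}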
@@ -853,13 +853,15 @@ function whose arguments fit in two slots.
 of the possible tops~$t$ (i.e., the ancestors of the current vertex), we store
 a~single bit telling whether $t\in T_e$. Each top mask fits in $\lceil\log n\rceil$
 bits and therefore in a~single machine word. If needed, it can be split
 to three slots.
+Unions and intersections of sets of tops then translate to calling $\band$/$\bor$
+on the top masks.
 
 \em{Small and big lists:} The heaviest edge found so far for each top is
 stored by the algorithm in the array~$P_e$. Instead of keeping the real
 array, we store the labels of these edges in a~list encoded in a~bit string.
-Depending on the size of the list, we use two one of two possible encodings:
+Depending on the size of the list, we use one of two possible encodings:
 
 \df{Small lists} are stored in a~vector which fits in a~single slot, with
-the unused fields filled by a~special constant, so that we can infer the
+the unused fields filled by a~special constant, so that we can easily infer the
 length of the list.
 
 If the data do not fit in a~small list, we use a~\df{big list} instead, which
@@ -868,20 +870,20 @@ vector.
 Unlike the small lists, we use the big lists as arrays. If a~top~$t$ of depth~$d$
 is active, we keep the corresponding entry of~$P_e$ in the $d$-th field of the
 big list. Otherwise, we keep that entry unused.
 
-We will want to perform all operations on small lists in constant time,
+We want to perform all operations on small lists in constant time,
 but we can afford spending time $\O(\log\log n)$ on every big list. This
-is true because whenever we need a~big list, $\vert T_e\vert = \Omega(\log n/\log\log n)$,
-so we need $\log\vert T_e\vert = \Omega(\log\log n)$ comparisons anyway.
+is true because whenever we use a~big list, $\vert T_e\vert = \Omega(\log n/\log\log n)$,
+hence we need $\log\vert T_e\vert = \Omega(\log\log n)$ comparisons anyway.
 
 \em{Pointers:} When we need to construct a~small list containing a~sub-list
-of a~big list, we do not have enough time to see the whole big list. To solve
-this problem, we will introduce \df{pointers} as another kind of edge identifiers.
+of a~big list, we do not have enough time to see the whole big list. To handle
+this, we introduce \df{pointers} as another kind of edge identifiers.
 A~pointer is an~index to the nearest big list on the path from the small list
 containing the pointer to the root. As each big list has at most $\lceil\log n\rceil$
 fields, the pointer fits in~$\ell$ bits, but we need one extra bit to distinguish
 between normal labels and pointers.
 
-\lemma
+\lemman{Precomputation of tables}
 When~$f$ is a~function of two arguments computable in polynomial time, we can
 precompute a~table of the values of~$f$ for all values of arguments which fit
 in a~single slot. The precomputation takes $\O(n)$ time.
@@ -893,21 +895,22 @@ possible values of arguments, so the precomputation takes time $\O(n^{2/3}\cdot\
 \qed
 
 \example
-We can assume we can compute the following functions in constant time (after $\O(n)$ preprocessing):
+As we can afford spending $\O(n)$ time on preprocessing,
+we can assume that we can compute the following functions in constant time:
 \itemize\ibull
-\:$\<Weight>(x)$ --- computes the Hamming weight of a~slot-sized number~$x$
+\:$\<Weight>(x)$ --- the Hamming weight of a~slot-sized number~$x$
 (we already considered this operation in Algorithm \ref{lsbmsb}, but we needed
 quadratic word size for it).
 We can easily extend this to $\log n$-bit numbers by splitting the number
 in three slots and adding their weights.
-\:$\<FindKth>(x,k)$ --- find the $k$-th set bit from the top of the slot-sized
+\:$\<FindKth>(x,k)$ --- the $k$-th set bit from the top of the slot-sized
 number~$x$. Again, this can be extended to multi-slot numbers by calculating
 the \<Weight> of each slot first and then finding the slot containing the
-$k$-th one.
+$k$-th~\1.
 \:$\<Bits>(m)$ --- for a~slot-sized bit mask~$m$, it returns a~small list
-of the positions of bits set in~$\(m)$.
+of the positions of the bits set in~$\(m)$.
 \:$\ to find the fields of~$P_p$ which
-shall be deleted. Then we apply \ to actually delete them. Pointers
+\:\em{Small from small:} We use $\
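Among the operations listed above, the Hamming-weight one is the clearest example of the table-lookup technique: the weights of all slot-sized values are tabulated once, and a $\log n$-bit word is handled by three lookups. A rough C sketch follows; SLOT_BITS and the other constants are made up for the example and are not the actual slot size.

#include <stdint.h>
#include <stdio.h>

/* Precompute the Hamming weight of every slot-sized value, then answer
 * weight(x) for a three-slot word by summing three table lookups. */
#define SLOT_BITS 10
#define SLOT_MASK ((1u << SLOT_BITS) - 1)

static unsigned char weight_tab[1u << SLOT_BITS];

static void precompute_weights(void)
{
    for (unsigned v = 1; v < (1u << SLOT_BITS); v++)
        weight_tab[v] = (unsigned char)(weight_tab[v >> 1] + (v & 1));
}

static unsigned weight(uint32_t x)
{
    return weight_tab[x & SLOT_MASK]
         + weight_tab[(x >> SLOT_BITS) & SLOT_MASK]
         + weight_tab[(x >> 2 * SLOT_BITS) & SLOT_MASK];
}

int main(void)
{
    precompute_weights();
    printf("weight(0x2a2a2a) = %u\n", weight(0x2a2a2au));  /* prints 9 */
    return 0;
}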
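In the same spirit, the operation that lists the positions of the set bits of a mask can be read off a precomputed table; the sketch below also shows the small-list encoding, with the field width and the sentinel chosen arbitrarily for the example. A list that does not fit into the available fields would be kept as a big list instead.

#include <stdint.h>
#include <stdio.h>

/* Pack the positions of the set bits of a slot-sized mask into one word as
 * FIELD_BITS-bit fields; unused fields are filled with a sentinel so the
 * length of the list can be recovered.  In the real structure this whole
 * computation is a single lookup in a table indexed by m. */
#define FIELD_BITS 5                         /* bits per field (illustrative) */
#define SENTINEL   ((1u << FIELD_BITS) - 1)  /* marks the unused fields       */
#define FIELDS     12                        /* fields in a 64-bit word       */

static uint64_t bits_to_small_list(uint32_t m)   /* m is slot-sized */
{
    uint64_t list = 0;
    int used = 0;
    for (int pos = 0; m != 0 && used < FIELDS; pos++, m >>= 1)
        if (m & 1)
            list |= (uint64_t)pos << (FIELD_BITS * used++);
    while (used < FIELDS)                    /* pad with the sentinel value */
        list |= (uint64_t)SENTINEL << (FIELD_BITS * used++);
    return list;
}

int main(void)
{
    /* bits 1, 4 and 9 are set, so the fields read 1, 4, 9, then sentinels */
    printf("%#llx\n", (unsigned long long)bits_to_small_list(0x212u));
    return 0;
}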