+\section{Verification in linear time}\id{verifysect}%
+
+We have proven that $\O(m)$ edge weight comparisons suffice to verify minimality
+of a~given spanning tree. Now we will show an~algorithm for the RAM
+which finds the required comparisons in linear time. We will follow the idea
+of King from \cite{king:verifytwo}, but as we have the power of the RAM data structures
+from Section~\ref{bitsect} at our command, the low-level details will be easier,
+especially the construction of vertex and edge labels.
+
+\para
+First of all, let us make sure that the reduction to fully branching trees
+in the Balancing lemma (\ref{verbranch}) can be made to run in linear time. As already noticed
+in the proof, the Bor\o{u}vka's algorithm runs in linear time. Constructing
+the Bor\o{u}vka tree in the process adds at most a~constant overhead to every
+step of the algorithm.
+
+Finding the common ancestors is not trivial, but Harel and Tarjan have shown
+in \cite{harel:nca} that linear time is sufficient on the RAM. Several more
+accessible algorithms have been developed since then (see Alstrup's survey
+paper \cite{alstrup:nca} and a~particularly elegant algorithm described by Bender
+and Farach-Colton in \cite{bender:lca}). Any of them implies the following
+theorem:
+
+\thmn{Lowest common ancestors}\id{lcathm}%
+On the RAM, it is possible to preprocess a~tree~$T$ in time $\O(n)$ and then
+answer lowest common ancestor queries presented online in constant time.
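+While we use the theorem as a~black box, the following Python sketch shows the
+core idea of the Bender--Farach-Colton approach: reduce lowest common ancestors
+to range minima over the Euler tour of the tree. For brevity we build a~plain
+sparse table, which gives $\O(n\log n)$ preprocessing instead of $\O(n)$; the
+linear-time version additionally tabulates small blocks of the tour. The
+interface and names are ours, not part of the cited algorithms.

```python
class LCA:
    """Euler tour + sparse-table lowest common ancestors:
    O(n log n) preprocessing, O(1) per query.  (The linear-time algorithm
    additionally cuts the tour into blocks of about (log n)/2 entries and
    precomputes tables for all block shapes; we omit that refinement.)"""

    def __init__(self, n, edges, root=0):
        adj = [[] for _ in range(n)]
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        self.first = [0] * n      # first occurrence of each vertex in the tour
        self.tour = []            # Euler tour as (depth, vertex) pairs

        def dfs(u, parent, d):
            self.first[u] = len(self.tour)
            self.tour.append((d, u))
            for w in adj[u]:
                if w != parent:
                    dfs(w, u, d + 1)
                    self.tour.append((d, u))

        dfs(root, -1, 0)
        # sparse table: sp[j][i] = minimum of tour[i .. i + 2^j - 1]
        m = len(self.tour)
        self.sp = [list(self.tour)]
        j = 1
        while (1 << j) <= m:
            prev = self.sp[-1]
            half = 1 << (j - 1)
            self.sp.append([min(prev[i], prev[i + half])
                            for i in range(m - (1 << j) + 1)])
            j += 1

    def query(self, u, v):
        """The LCA is the minimum-depth tour entry between first occurrences."""
        l, r = sorted((self.first[u], self.first[v]))
        j = (r - l + 1).bit_length() - 1
        return min(self.sp[j][l], self.sp[j][r - (1 << j) + 1])[1]
```

+A~query then costs two table lookups, matching the constant query time claimed
+by the theorem.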
+
+\cor
+The reductions in Lemma \ref{verbranch} can be performed in time $\O(m)$.
+
+\para
+Having the balanced problem at hand, it remains to implement the procedure \<FindPeaks>
+of Algorithm \ref{findpeaks} efficiently. We need a~compact representation of
+the arrays $T_e$ and~$P_e$, which will allow us to reduce the overhead of the algorithm
+to time linear in the number of comparisons performed. To achieve
+this goal, we will encode the arrays in RAM vectors (see Section \ref{bitsect}
+for the vector operations).
+
+\defn
+
+\em{Vertex identifiers:} Since all vertices processed by the procedure
+lie on the path from the root to the current vertex~$u$, we modify the algorithm
+to keep a~stack of these vertices in an~array. We will often refer to each vertex by its
+index in this array, i.e., by its depth. We will call these identifiers \df{vertex
+labels} and we note that each label requires only $\ell=\lceil \log\lceil\log n\rceil\rceil$
+bits. As every tree edge is uniquely identified by its bottom vertex, we can
+use the same encoding for \df{edge labels.}
+
+\em{Slots:} As we are going to need several operations which are not computable
+in constant time on the RAM, we precompute tables for these operations
+like we did in the Q-heaps (cf.~Lemma \ref{qhprecomp}). A~table for word-sized
+arguments would take too much time to precompute, so we will generally store
+our data structures in \df{slots} of $s=\lceil 1/3\cdot\log n\rceil$ bits each.
+We will soon show that it is possible to precompute a~table of any reasonable
+function whose arguments fit in two slots.
+
+\em{Top masks:} The array~$T_e$ will be represented by a~bit mask~$M_e$ called the \df{top mask.} For each
+of the possible tops~$t$ (i.e., the ancestors of the current vertex), we store
+a~single bit telling whether $t\in T_e$. Each top mask fits in $\lceil\log n\rceil$
+bits and therefore in a~single machine word. If needed, it can be split to three slots.
+Unions and intersections of sets of tops then translate to $\band$/$\bor$
+on the top masks.
+
+\em{Small and big lists:} The heaviest edge found so far for each top is stored
+by the algorithm in the array~$P_e$. Instead of keeping the real array,
+we store the labels of these edges in a~list encoded in a~bit string.
+Depending on the size of the list, we use one of two possible encodings:
+\df{Small lists} are stored in a~vector that fits in a~single slot, with
+the unused fields filled by a~special constant, so that we can easily infer the
+length of the list.
+
+If the data do not fit in a~small list, we use a~\df{big list} instead. It
+is stored in $\O(\log\log n)$ words, each of them containing a~slot-sized
+vector. Unlike the small lists, we use the big lists as arrays. If a~top~$t$ of
+depth~$d$ is active, we keep the corresponding entry of~$P_e$ in the $d$-th
+field of the big list. Otherwise, we keep that entry unused.
+
+We want to perform all operations on small lists in constant time,
+but we can afford spending time $\O(\log\log n)$ on every big list. This
+is true because whenever we use a~big list, $\vert T_e\vert = \Omega(\log n/\log\log n)$,
+hence we need $\log\vert T_e\vert = \Omega(\log\log n)$ comparisons anyway.
+
+\em{Pointers:} When we need to construct a~small list containing a~sub-list
+of a~big list, we do not have enough time to see the whole big list. To handle
+this, we introduce \df{pointers} as another kind of edge identifiers.
+A~pointer is an~index to the nearest big list on the path from the small
+list containing the pointer to the root. As each big list has at most $\lceil\log n\rceil$
+fields, the pointer fits in~$\ell$ bits, but we need one extra bit to distinguish
+between regular labels and pointers.
+
+\lemman{Precomputation of tables}
+When~$f$ is a~function of up to two arguments computable in polynomial time, we can
+precompute a~table of the values of~$f$ for all values of arguments that fit
+in a~single slot. The precomputation takes $\O(n)$ time.
+
+\proof
+Similar to the proof of Lemma \ref{qhprecomp}. There are $\O(2^{2s}) = \O(n^{2/3})$
+possible values of arguments, so the precomputation takes time $\O(n^{2/3}\cdot\poly(s))
+= \O(n^{2/3}\cdot\poly(\log n)) = \O(n)$.
+\qed
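+For a~concrete feeling of the numbers involved, the following small Python
+check (the helper name is ours) verifies that the number of two-slot argument
+pairs indeed stays within a~constant factor of~$n^{2/3}$:

```python
import math

def slot_bits(n):
    # the slot size s = ceil((1/3) * log2 n) from the definition above
    return math.ceil(math.log2(n) / 3)

for n in (2 ** 10, 2 ** 20, 2 ** 30):
    pairs = 2 ** (2 * slot_bits(n))   # number of two-slot argument pairs
    # 2^{2s} <= 2^{2(log n / 3 + 1)} = 4 * n^{2/3}, comfortably sublinear
    assert pairs <= 4 * n ** (2 / 3)
```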
+
+\example
+As we can afford spending $\O(n)$ time on preprocessing,
+we can assume that we can compute the following functions in constant time:
+
+\itemize\ibull
+\:$\<Weight>(x)$ --- the Hamming weight of a~slot-sized number~$x$
+(we already considered this operation in Algorithm \ref{lsbmsb}, but we needed
+quadratic word size for it). We can easily extend this function to $\log n$-bit numbers
+by splitting the number in three slots and adding their weights.
+
+\:$\<FindKth>(x,k)$ --- the $k$-th set bit from the top of the slot-sized
+number~$x$. Again, this can be extended to multi-slot numbers by calculating
+the \<Weight> of each slot first and then finding the slot containing the
+$k$-th~\1.
+
+\:$\<Bits>(m)$ --- for a~slot-sized bit mask~$m$, it returns a~small list
+of the positions of the bits set in~$\(m)$.
+
+\:$\<Select>(x,m)$ --- constructs a~slot containing the substring of $\(x)$
+selected by the bits set in~$\(m)$.
+
+\:$\<SubList>(x,m)$ --- when~$x$ is a~small list and~$m$ a bit mask, it returns
+a~small list containing the elements of~$x$ selected by the bits set in~$m$.
+\endlist
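+To illustrate the table-driven style of these operations, here is a~small
+Python sketch of \<Weight> and \<FindKth>. The concrete slot size, the table
+layout and the counting of bits from the bottom instead of from the top are
+simplifications of ours; \<Bits>, \<Select> and \<SubList> are tabulated in
+the same spirit.

```python
# Toy illustration of the table-driven operations; SLOT = 8 stands in
# for the slot size s = ceil(log(n)/3) of the text.
SLOT = 8
MASK = (1 << SLOT) - 1

# one precomputed table entry per slot-sized argument
WEIGHT = [bin(x).count("1") for x in range(1 << SLOT)]

def weight(x):
    """Hamming weight of an arbitrary number, one table lookup per slot."""
    total = 0
    while x:
        total += WEIGHT[x & MASK]
        x >>= SLOT
    return total

def find_kth(x, k):
    """Position of the k-th set bit of x, counted here from the least
    significant bit (the text counts from the top; the principle is the
    same): locate the right slot via the weights, then search inside it."""
    pos = 0
    while x:
        w = WEIGHT[x & MASK]
        if k <= w:
            slot = x & MASK
            for i in range(SLOT):   # conceptually one more table lookup
                if slot >> i & 1:
                    k -= 1
                    if k == 0:
                        return pos + i
        k -= w
        x >>= SLOT
        pos += SLOT
    return None
```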
+
+\para
+We will now show how to perform all parts of the procedure \<FindPeaks>
+in the required time. We will denote the size of the tree by~$n$ and the
+number of query paths by~$q$.
+
+\lemma
+Depths of all vertices and all top masks can be computed in time $\O(n+q)$.
+
+\proof
+Run depth-first search on the tree, assign the depth of a~vertex when entering
+it and construct its top mask when leaving it. The top mask can be obtained
+by $\bor$-ing the masks of its sons, excluding the level of the sons and
+including the tops of all query paths that have their bottoms at the current vertex
+(the depths of the tops are already assigned).
+\qed
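+The two passes of the proof can be sketched as follows (Python; the convention
+that bit~$d$ of a~mask stands for a~top at depth~$d$ follows the definition of
+top masks, while the rest of the interface is ours):

```python
from collections import defaultdict

def depths_and_top_masks(children, root, queries):
    """children: vertex -> list of children of a rooted tree;
    queries: list of (bottom, top) pairs, top an ancestor of bottom.
    Bit d of mask[v] says: some query path whose top lies at depth d
    passes through v."""
    depth, mask = {root: 0}, {}
    by_bottom = defaultdict(list)
    for b, t in queries:
        by_bottom[b].append(t)
    order, stack = [], [root]
    while stack:                      # entering: assign depths
        v = stack.pop()
        order.append(v)
        for c in children.get(v, []):
            depth[c] = depth[v] + 1
            stack.append(c)
    for v in reversed(order):         # leaving: build masks bottom-up
        m = 0
        for c in children.get(v, []):
            # OR the sons' masks, excluding tops at the sons' own level:
            # a path whose top is the son does not continue upwards
            m |= mask[c] & ~(1 << depth[c])
        for t in by_bottom[v]:
            m |= 1 << depth[t]        # depths of tops are already assigned
        mask[v] = m
    return depth, mask
```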
+
+\lemma\id{verth}%
+The arrays $T_e$ and~$P_e$ can be indexed in constant time.
+
+\proof
+Indexing~$T_e$ is exactly the operation \<FindKth> applied on the corresponding
+top mask~$M_e$.
+
+If $P_e$ is stored in a~big list, we calculate the index of the particular
+slot and the position of the field inside the slot. This field can be then
+extracted using bit masking and shifts.
+
+If it is a~small list, we extract the field directly, but we have to
+dereference it in case it is a pointer. We modify the recursion in \<FindPeaks>
+to pass the depth of the lowest edge endowed with a~big list and when we
+encounter a~pointer, we index this big list.
+\qed
+
+\lemma\id{verhe}%
+For an~arbitrary active top~$t$, the corresponding entry of~$P_e$ can be
+extracted in constant time.
+
+\proof
+We look up the precomputed depth~$d$ of~$t$ first.
+If $P_e$ is stored in a~big list, we extract the $d$-th entry of the list.
+If the list is small, we find the position of the particular field
+by counting bits of the top mask~$M_e$ at position~$d$ and higher
+(this is \<Weight> of $M_e$ with the lower bits masked out).
+\qed
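+In code, the bit-counting step of the proof might look like this (a~Python
+sketch; the ordering of fields from the deepest active top and the helper
+names are our assumptions, not the thesis's exact layout):

```python
def field_index(M_e, d):
    """Index of the field for the active top at depth d in a small list,
    assuming fields are ordered from the deepest active top: count the
    set bits of the top mask at positions d and higher."""
    assert M_e >> d & 1               # the top at depth d must be active
    return bin(M_e >> d).count("1") - 1

def small_list_entry(P_e, M_e, d, field_bits):
    """Extract the entry of P_e corresponding to the top at depth d."""
    i = field_index(M_e, d)
    return (P_e >> (i * field_bits)) & ((1 << field_bits) - 1)
```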
+
+\lemma\id{verfh}%
+The procedure \<FindPeaks> processes an~edge~$e$ in time $\O(\log \vert T_e\vert + q_e)$,
+where $q_e$~is the number of query paths having~$e$ as their bottom edge.
+
+\proof
+The edge is examined in steps 1, 3, 4 and~5 of the algorithm. We will show how to
+perform each of these steps in constant time if $P_e$ is a~small list or
+$\O(\log\log n)$ if it is big.
+
+\em{Step~1} looks up $q_e$~tops in~$P_e$ and we already know from Lemma \ref{verhe}
+how to do that in constant time per top.
+
+\em{Step~3} is trivial as we have already computed the top masks and we can
+reconstruct the entries of~$T_e$ in constant time according to Lemma \ref{verth}.
+
+\em{Step~5} involves binary search on~$P_e$ in $\O(\log\vert T_e\vert)$ comparisons,
+each of them indexes~$P_e$, which is $\O(1)$ again by Lemma \ref{verth}. Rewriting the
+lighter edges is $\O(1)$ for small lists by replication and bit masking, for a~big
+list we do the same for each of its slots.
+
+\em{Step~4} is the only non-trivial one. We already know which tops to select
+(we have the top masks $M_e$ and~$M_p$ precomputed), but we have to carefully
+extract the sublist.
+We need to handle these four cases:
+
+\itemize\ibull
+\:\em{Small from small:} We use $\<Select>(M_e,M_p)$ to find the fields of~$P_p$
+that shall be deleted by a~subsequent call to \<SubList>. Pointers
+can be retained as they still refer to the same ancestor list.
+
+\:\em{Big from big:} We can copy the whole~$P_p$, since the layout of the
+big lists is fixed and the items, which we do not want, simply end up as unused
+fields in~$P_e$.
+
+\:\em{Small from big:} We use the operation \<Bits> to construct a~list
+of pointers (we use bit masking to add the ``this is a~pointer'' flags).
+
+\:\em{Big from small:} First we have to dereference the pointers in the
+small list~$S$. For each slot~$B_i$ of the ancestor big list, we construct
+a~subvector of~$S$ containing only the pointers referring to that slot,
+adjusted to be relative to the beginning of the slot (we use \<Compare>
+and \<Replicate> from Algorithm \ref{vecops} and bit masking). Then we
+use a~precomputed table to replace the pointers by the fields of~$B_i$
+they point to. We $\bor$ together the partial results and we again have
+a~small list.
+
+Finally, we have to spread the fields of this small list to the whole big list.
+This is similar: for each slot of the big list, we find the part of the small
+list keeping the fields we want (we call \<Weight> on the sub-words of~$M_e$ before
+and after the intended interval of depths) and we use a~tabulated function
+to shift the fields to the right locations in the slot (controlled by the
+sub-word of~$M_e$ in the intended interval).
+\qeditem
+\endlist
+
+\>We are now ready to combine these steps and get the following theorem:
+
+\thmn{Verification of MST on the RAM}\id{ramverify}%
+There is a~RAM algorithm which for every weighted graph~$G$ and its spanning tree~$T$
+determines whether~$T$ is minimum and finds all $T$-light edges in~$G$ in time $\O(m)$.
+
+\proof
+Implement the Koml\'os's algorithm from Theorem \ref{verify} with the data
+structures developed in this section.
+According to Lemma \ref{verfh}, the algorithm runs in time $\sum_e \O(\log\vert T_e\vert + q_e)
+= \O(\sum_e \log\vert T_e\vert) + \O(\sum_e q_e)$. The second sum is $\O(m)$
+as there are $\O(1)$ query paths per edge, the first sum is $\O(\#\hbox{comparisons})$,
+which is $\O(m)$ by Theorem \ref{verify}.
+\qed
+
+\>In Section \ref{kbestsect}, we will need a~more specialized statement:
+
+\cor\id{rampeaks}%
+There is a~RAM algorithm which for every weighted tree~$T$ and a~set~$P$ of
+paths in~$T$ calculates the peaks of these paths in time $\O(n(T) + \vert P\vert)$.
+
+\paran{Verification on the Pointer Machine}\id{pmverify}%
+Buchsbaum et al.~\cite{buchsbaum:verify} have recently shown that linear-time
+verification can be achieved even on the Pointer Machine. They first solve the
+problem of finding the lowest common ancestors for a~set of pairs of vertices
+by batch processing: They combine an~algorithm of time complexity $\O(m\alpha(m,n))$
+based on the Disjoint Set Union data structure with the framework of topological graph
+computations described in Section \ref{bucketsort}. Then they use a~similar
+technique for finding the peaks themselves.
+
+\paran{Online verification}%
+The online version of this problem has turned out to be more difficult. It calls for an~algorithm
+that preprocesses the tree and then answers queries for peaks of paths presented online. Pettie
+\cite{pettie:onlineverify} has proven an~interesting lower bound based on the inverses of the
+Ackermann's function. If we want to answer queries within $t$~comparisons, we
+have to invest $\Omega(n\log\lambda_t(n))$ time into preprocessing.\foot{$\lambda_t(n)$ is the
+$t$-th row inverse of the Ackermann's function, $\alpha(n)$ is its diagonal inverse. See
+\ref{ackerinv} for the exact definitions.} This implies that with
+preprocessing in linear time, the queries require $\Omega(\alpha(n))$ time.
+
+%--------------------------------------------------------------------------------
+
+\section{A randomized algorithm}\id{randmst}%
+
+When we analysed the Contractive Bor\o{u}vka's algorithm in Section~\ref{contalg},
+we observed that while the number of vertices per iteration decreases exponentially,
+the number of edges generally does not, so we spend $\Theta(m)$ time on every phase.
+Karger, Klein and Tarjan \cite{karger:randomized} have overcome this problem by
+combining the Bor\o{u}vka's algorithm with filtering based on random sampling.
+This leads to a~randomized algorithm which runs in linear expected time.
+
+The principle of the filtering is simple: Let us consider any spanning tree~$T$
+of the input graph~$G$. Each edge of~$G$ that is $T$-heavy is the heaviest edge
+of some cycle, so by the Red lemma (\ref{redlemma}) it cannot participate in
+the MST of~$G$. We can therefore discard all $T$-heavy edges and continue with
+finding the MST on the reduced graph. Of course, not all choices of~$T$ are equally
+good, but it will soon turn out that when we take~$T$ as the MST of a~randomly selected
+subgraph, only a~small expected number of edges remains.
+
+Selecting a~subgraph at random will unavoidably produce disconnected subgraphs
+on occasion, so we will drop the implicit assumption that all graphs are
+connected for this section and we will always search for the minimum spanning forest.
+As we already noted (\ref{disconn}), with a~little bit of care our
+algorithms and theorems keep working.
+
+Since we need the MST verification algorithm for finding the $T$-heavy edges,
+we will assume that we are working on the RAM.
+
+\lemman{Random sampling, Karger \cite{karger:sampling}}\id{samplemma}%
+Let $H$~be a~subgraph of~$G$ obtained by including each edge independently
+with probability~$p$. Let further $F$~be the minimum spanning forest of~$H$. Then the
+expected number of $F$-nonheavy\foot{That is, $F$-light edges and also edges of~$F$ itself.}
+edges in~$G$ is at most $n/p$.
+
+\proof
+Let us observe that we can obtain the forest~$F$ by running the Kruskal's algorithm
+(\ref{kruskal}) combined with the random process producing~$H$ from~$G$. We sort all edges of~$G$
+by their weights and we start with an~empty forest~$F$. For each edge, we first
+flip a~biased coin (that gives heads with probability~$p$) and if it comes up
+tails, we discard the edge. Otherwise we perform a~single step of the Kruskal's
+algorithm: We check whether $F+e$ contains a~cycle. If it does, we discard~$e$, otherwise
+we add~$e$ to~$F$. At the end, we have produced the subgraph~$H$ and its MSF~$F$.
+
+When we exchange the check for cycles with flipping the coin, we get an~equivalent
+algorithm which will turn out to be more convenient to analyse:
+\algo
+\:If $F+e$ contains a~cycle, we immediately discard~$e$ (we can flip
+the coin, but we need not to, because the edge will be discarded regardless of
+the outcome). We note that~$e$ is $F$-heavy with respect to both the
+current state of~$F$ and the final MSF.
+\:If $F+e$ is acyclic, we flip the coin:
+\::If it comes up heads, we add~$e$ to~$F$. In this case, $e$~is neither $F$-light
+ nor $F$-heavy.
+\::If it comes up tails, we discard~$e$. Such edges are $F$-light.
+\endalgo
+
+The number of $F$-nonheavy edges is therefore equal to the total number of the coin
+flips in step~2 of this algorithm. We also know that the algorithm stops before
+it adds $n$~edges to~$F$. Therefore it flips at most as many coins as a~simple
+random process that repeatedly flips a~coin until it gets~$n$ heads. As waiting for
+every occurrence of heads takes $1/p$~flips in expectation, waiting for~$n$ heads
+must take $n/p$ flips. This is the bound we wanted to achieve.
+\qed
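+The waiting-time argument is easy to check by a~seeded simulation (an
+illustrative Python sketch; the function name is ours):

```python
import random

def flips_until_heads(n, p, rng):
    """Flip a biased coin until n heads appear and count the flips; the
    proof bounds the number of F-nonheavy edges by exactly this count."""
    flips = heads = 0
    while heads < n:
        flips += 1
        if rng.random() < p:
            heads += 1
    return flips

# the expectation is n/p; with n = 50 and p = 1/2 that is 100
rng = random.Random(42)
avg = sum(flips_until_heads(50, 0.5, rng) for _ in range(2000)) / 2000
assert 95 < avg < 105
```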
+
+\para
+We will formulate the algorithm as a~doubly-recursive procedure. It alternately
+performs steps of the Bor\o{u}vka's algorithm and filtering based on the above lemma.
+The first recursive call computes the MSF of the sampled subgraph, the second one
+finds the MSF of the original graph, but without the heavy edges.
+
+As in all contractive algorithms, we use edge labels to keep track of the
+original locations of the edges in the input graph. For the sake of simplicity,
+we do not mention it in the algorithm explicitly.
+
+\algn{MSF by random sampling --- the KKT algorithm}\id{kkt}%
+\algo
+\algin A~graph $G$ with an~edge comparison oracle.
+\:Remove isolated vertices from~$G$. If no vertices remain, stop and return an~empty forest.
+\:Perform two Bor\o{u}vka steps (iterations of Algorithm \ref{contbor}) on~$G$ and
+ remember the set~$B$ of the edges having been contracted.
+\:Select a~subgraph~$H\subseteq G$ by including each edge independently with
+ probability $1/2$.
+\:$F\=\msf(H)$ calculated recursively.
+\:Construct $G'\subseteq G$ by removing all $F$-heavy edges of~$G$.
+\:$R\=\msf(G')$ calculated recursively.
+\:Return $R\cup B$.
+\algout The minimum spanning forest of~$G$.
+\endalgo
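+The recursion can be sketched in runnable form as follows. This is an
+illustrative Python sketch, not the thesis's implementation: the linear-time
+$F$-heavy filter of Theorem \ref{ramverify} is replaced by a~brute-force
+path-maximum check, so the code shows the control flow only, not the $\O(m)$
+behaviour; all names are ours.

```python
import random
from collections import defaultdict

# Edges are (u, v, w, label) with pairwise distinct weights w; `label`
# plays the role of the edge labels mentioned in the text.

class DSU:
    def __init__(self, vertices):
        self.p = {v: v for v in vertices}
    def find(self, v):
        while self.p[v] != v:
            self.p[v] = self.p[self.p[v]]
            v = self.p[v]
        return v
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.p[ra] = rb
        return True

def boruvka_step(edges):
    """Every vertex picks its lightest incident edge (all such edges belong
    to the MSF when weights are distinct); contract them."""
    best = {}
    for e in edges:
        for x in (e[0], e[1]):
            if x not in best or e[2] < best[x][2]:
                best[x] = e
    dsu = DSU(best.keys())
    contracted = set()
    for e in set(best.values()):
        dsu.union(e[0], e[1])         # never creates a cycle: distinct weights
        contracted.add(e[3])
    survivors = [(dsu.find(u), dsu.find(v), w, l) for u, v, w, l in edges
                 if dsu.find(u) != dsu.find(v)]
    return survivors, contracted

def path_max(adj, u, v):
    """Maximum edge weight on the u-v path of a forest; None if disconnected."""
    stack, seen = [(u, float("-inf"))], {u}
    while stack:
        x, mx = stack.pop()
        if x == v:
            return mx
        for y, w in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append((y, max(mx, w)))
    return None

def msf(edges, rng):
    """Labels of the minimum spanning forest edges of the given graph."""
    if not edges:
        return set()
    B = set()
    for _ in range(2):                               # two Boruvka steps
        edges, contracted = boruvka_step(edges)
        B |= contracted
        if not edges:
            return B
    H = [e for e in edges if rng.random() < 0.5]     # sample the subgraph
    F_labels = msf(H, rng)                           # first recursive call
    adj = defaultdict(list)
    for u, v, w, l in H:
        if l in F_labels:                            # F = msf(H) as a forest
            adj[u].append((v, w))
            adj[v].append((u, w))
    G2 = []                                          # drop all F-heavy edges
    for u, v, w, l in edges:
        mx = path_max(adj, u, v)
        if mx is None or mx >= w:                    # F-nonheavy: keep
            G2.append((u, v, w, l))
    return B | msf(G2, rng)                          # second recursive call
```

+The sketch returns the labels of the MSF edges, so the caller can map them
+back to the edges of the input graph.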
+
+\nota
+Let us analyse the time complexity of this algorithm by studying properties of its \df{recursion tree.}
+This tree describes the subproblems processed by the recursive calls. For any vertex~$v$
+of the tree, we denote the number of vertices and edges of the corresponding subproblem~$G_v$
+by~$n_v$ and~$m_v$ respectively.
+If $m_v>0$, the recursion continues: the left son of~$v$ corresponds to the
+call on the sampled subgraph~$H_v$, the right son to the reduced graph~$G^\prime_v$.
+(Similarly, we use letters subscripted with~$v$ for the state of the other variables
+of the algorithm.)
+The root of the recursion tree is obviously the original graph~$G$, the leaves are
+trivial graphs with no edges.
+
+\obs
+The Bor\o{u}vka steps together with the removal of isolated vertices guarantee that the number
+of vertices drops at least by a~factor of four in every recursive call. The size of a~subproblem~$G_v$
+at level~$i$ is therefore at most $n/4^i$ and the depth of the tree is at most $\lceil\log_4 n\rceil$.
+As there are no more than~$2^i$ subproblems at level~$i$, the sum of all~$n_v$'s
+on that level is at most $n/2^i$, which is at most~$2n$ when summed over the whole tree.
+
+We are going to show that the worst case of the KKT algorithm is no worse than
+that of the plain contractive algorithm, while the average case is linear.
+
+\lemma
+For every subproblem~$G_v$, the KKT algorithm runs in time $\O(m_v+n_v)$ plus the time
+spent on the recursive calls.
+
+\proof
+We know from Lemma \ref{contiter} that each Bor\o{u}vka step takes time $\O(m_v+n_v)$.\foot{We
+add $n_v$ as the graph could be disconnected.}
+The selection of the edges of~$H_v$ is straightforward.
+Finding the $F_v$-heavy edges is not, but we have already shown in Theorem \ref{ramverify}
+that linear time is sufficient on the RAM.
+\qed
+
+\thmn{Worst-case complexity of the KKT algorithm}
+The KKT algorithm runs in time $\O(\min(n^2,m\log n))$ in the worst case on the RAM.
+
+\proof
+The argument for the $\O(n^2)$ bound is similar to the analysis of the plain
+contractive algorithm. As every subproblem~$G_v$ is a~simple graph, the number
+of its edges~$m_v$ is less than~$n_v^2$. By the previous lemma, we spend time
+$\O(n_v^2)$ on it. Summing over all subproblems yields $\sum_v \O(n_v^2) =
+\O((\sum_v n_v)^2) = \O(n^2)$.
+
+In order to prove the $\O(m\log n)$ bound, it is sufficient to show that the total time
+spent on every level of the recursion tree is $\O(m)$. Suppose that $v$~is a~vertex
+of the recursion tree with its left son~$\ell$ and right son~$r$. Some edges of~$G_v$
+are removed in the Bor\o{u}vka steps, let us denote their number by~$b_v$.
+The remaining edges fall either to~$G_\ell = H_v$, or to $G_r = G^\prime_v$, or possibly
+to both.
+
+We can observe that the intersection $G_\ell\cap G_r$ cannot be large: The edges of~$H_v$ that
+are not in the forest~$F_v$ are $F_v$-heavy, so they do not end up in~$G_r$. Therefore the
+intersection can contain only the edges of~$F_v$. As there are at most $n_v/4$ such edges,
+we have $m_\ell + m_r + b_v \le m_v + n_v/4$.
+
+On the other hand, the first Bor\o{u}vka step selects at least $n_v/2$ edges,
+so $b_v \ge n_v/2$. The duplication of edges between $G_\ell$ and~$G_r$ is therefore
+compensated by the loss of edges by contraction and $m_\ell + m_r \le m_v$. So the total
+number of edges per level does not increase and it remains to apply the previous lemma.
+\qed
+
+\thmn{Expected complexity of the KKT algorithm}\id{kktavg}%
+The expected time complexity of the KKT algorithm on the RAM is $\O(m)$.
+
+\proof
+The structure of the recursion tree depends on the random choices taken,
+but as its worst-case depth is at most~$\lceil \log_4 n\rceil$, the tree
+is always a~subtree of the complete binary tree of that depth. We will
+therefore prove the theorem by examining the complete tree, possibly with
+empty subproblems in some vertices.
+
+The left edges of the tree (edges connecting a~parent with its left
+son) form a~set of \df{left paths.} Let us consider the expected time spent on
+a~single left path. When walking the path downwards from its top vertex~$r$,
+the expected size of the subproblems decreases exponentially: for a~son~$\ell$
+of a~vertex~$v$, we have $n_\ell \le n_v/4$ and $\E m_\ell = \E m_v/2$. The
+expected total time spent on the path is therefore $\O(n_r+m_r)$ and it remains
+to sum this over all left paths.
+
+With the exception of the path going from the root of the tree,
+the top~$r$ of a~left path is always a~right son of a~unique parent vertex~$v$.
+Since the subproblem~$G_r$ has been obtained from its parent subproblem~$G_v$
+by filtering out all heavy edges, we can use the Sampling lemma (\ref{samplemma}) to show that
+$\E m_r \le 2n_v$. The sum of the expected sizes of all top subproblems is
+then $\sum_r n_r + m_r \le \sum_v 3n_v = \O(n)$. After adding the exceptional path
+from the root, we get $\O(m+n)=\O(m)$.
+\qed