+\section{Verification in linear time}\id{verifysect}%
+
+We have proven that $\O(m)$ edge weight comparisons suffice to verify minimality
+of a~given spanning tree. Now we will show an~algorithm for the RAM
+which finds the required comparisons in linear time. We will follow the idea
+of King from \cite{king:verifytwo}, but as we have the power of the RAM data structures
+from Section~\ref{bitsect} at our command, the low-level details will be easier,
+especially the construction of vertex and edge labels.
+
+\para
+First of all, let us make sure that the reduction to fully branching trees
+in the Balancing lemma (\ref{verbranch}) can be made to run in linear time. As already noticed
+in the proof, Bor\o{u}vka's algorithm runs in linear time. Constructing
+the Bor\o{u}vka tree in the process adds at most a~constant overhead to every
+step of the algorithm.
+
+Finding the common ancestors is not trivial, but Harel and Tarjan have shown
+in \cite{harel:nca} that linear time is sufficient on the RAM. Several more
+accessible algorithms have been developed since then (see Alstrup's survey
+paper \cite{alstrup:nca} and a~particularly elegant algorithm described by Bender
+and Farach-Colton in \cite{bender:lca}). Any of them implies the following
+theorem:
+
+\thmn{Lowest common ancestors}\id{lcathm}%
+On the RAM, it is possible to preprocess a~tree~$T$ in time $\O(n)$ and then
+answer lowest common ancestor queries presented online in constant time.
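To make the theorem concrete, here is a Python sketch of the simpler scheme underlying these algorithms: an Euler tour of the tree followed by sparse-table range-minimum queries over the depths. It answers queries in constant time after $\O(n\log n)$ preprocessing; the linear-time bound of the theorem additionally needs the $\pm 1$-RMQ refinement of Bender and Farach-Colton, which is omitted here. The tree representation (a `children` dictionary and a `root`) is an assumption of this sketch.

```python
def lca_preprocess(children, root):
    """Euler tour + sparse-table RMQ: O(n log n) preprocessing,
    O(1) per query.  `children` maps a vertex to its list of sons."""
    euler, depth, first = [], [], {}

    def dfs(v, d):
        first[v] = len(euler)
        euler.append(v)
        depth.append(d)
        for w in children.get(v, []):
            dfs(w, d + 1)
            euler.append(v)          # revisit v after each subtree
            depth.append(d)

    dfs(root, 0)
    m = len(euler)
    log = [0] * (m + 1)
    for i in range(2, m + 1):
        log[i] = log[i // 2] + 1
    # table[k][i] = index of the minimum depth in euler[i : i + 2^k]
    table = [list(range(m))]
    k = 1
    while (1 << k) <= m:
        prev, half = table[-1], 1 << (k - 1)
        row = [prev[i] if depth[prev[i]] <= depth[prev[i + half]]
               else prev[i + half]
               for i in range(m - (1 << k) + 1)]
        table.append(row)
        k += 1

    def query(u, v):
        i, j = sorted((first[u], first[v]))
        k = log[j - i + 1]
        a, b = table[k][i], table[k][j - (1 << k) + 1]
        return euler[a if depth[a] <= depth[b] else b]

    return query
```

The LCA of two vertices is the shallowest vertex between their first occurrences in the Euler tour, which the sparse table finds in two lookups.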
+
+\cor
+The reductions in Lemma \ref{verbranch} can be performed in time $\O(m)$.
+
+\para
+Having the balanced problem at hand, it remains to implement the procedure \<FindPeaks>
+of Algorithm \ref{findpeaks} efficiently. We need a~compact representation of
+the arrays $T_e$ and~$P_e$, which will allow us to reduce the overhead of the algorithm
+to time linear in the number of comparisons performed. To achieve
+this goal, we will encode the arrays in RAM vectors (see Section \ref{bitsect}
+for the vector operations).
+
+\defn
+
+\em{Vertex identifiers:} Since all vertices processed by the procedure
+lie on the path from the root to the current vertex~$u$, we modify the algorithm
+to keep a~stack of these vertices in an~array. We will often refer to each vertex by its
+index in this array, i.e., by its depth. We will call these identifiers \df{vertex
+labels} and we note that each label requires only $\ell=\lceil \log\lceil\log n\rceil\rceil$
+bits. As every tree edge is uniquely identified by its bottom vertex, we can
+use the same encoding for \df{edge labels.}
+
+\em{Slots:} As we are going to need several operations which are not computable
+in constant time on the RAM, we precompute tables for these operations
+like we did in the Q-heaps (cf.~Lemma \ref{qhprecomp}). A~table for word-sized
+arguments would take too much time to precompute, so we will generally store
+our data structures in \df{slots} of $s=\lceil 1/3\cdot\log n\rceil$ bits each.
+We will soon show that it is possible to precompute a~table of any reasonable
+function whose arguments fit in two slots.
+
+\em{Top masks:} The array~$T_e$ will be represented by a~bit mask~$M_e$ called the \df{top mask.} For each
+of the possible tops~$t$ (i.e., the ancestors of the current vertex), we store
+a~single bit telling whether $t\in T_e$. Each top mask fits in $\lceil\log n\rceil$
+bits and therefore in a~single machine word. If needed, it can be split into three slots.
+Unions and intersections of sets of tops then translate to $\band$/$\bor$
+on the top masks.
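In a language with machine integers the translation is immediate; the following Python fragment (with hypothetical depths chosen purely for illustration) shows a top mask and the set operations on it:

```python
# Hypothetical example: tops of T_e at depths 0, 2 and 5,
# tops of a second set at depths 2 and 3.
M_e = (1 << 0) | (1 << 2) | (1 << 5)   # top mask 0b100101
M_f = (1 << 2) | (1 << 3)              # top mask 0b001100

union        = M_e | M_f               # union of the two sets of tops
intersection = M_e & M_f               # intersection of the two sets
in_T_e       = bool(M_e & (1 << 2))    # is the top at depth 2 in T_e?
```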
+
+\em{Small and big lists:} The heaviest edge found so far for each top is stored
+by the algorithm in the array~$P_e$. Instead of keeping the real array,
+we store the labels of these edges in a~list encoded in a~bit string.
+Depending on the size of the list, we use one of two possible encodings:
+\df{Small lists} are stored in a~vector that fits in a~single slot, with
+the unused fields filled by a~special constant, so that we can easily infer the
+length of the list.
+
+If the data do not fit in a~small list, we use a~\df{big list} instead. It
+is stored in $\O(\log\log n)$ words, each of them containing a~slot-sized
+vector. Unlike the small lists, we use the big lists as arrays. If a~top~$t$ of
+depth~$d$ is active, we keep the corresponding entry of~$P_e$ in the $d$-th
+field of the big list. Otherwise, we keep that entry unused.
+
+We want to perform all operations on small lists in constant time,
+but we can afford spending time $\O(\log\log n)$ on every big list. This
+is true because whenever we use a~big list, $\vert T_e\vert = \Omega(\log n/\log\log n)$,
+hence we need $\log\vert T_e\vert = \Omega(\log\log n)$ comparisons anyway.
+
+\em{Pointers:} When we need to construct a~small list containing a~sub-list
+of a~big list, we do not have enough time to see the whole big list. To handle
+this, we introduce \df{pointers} as another kind of edge identifiers.
+A~pointer is an~index to the nearest big list on the path from the small
+list containing the pointer to the root. As each big list has at most $\lceil\log n\rceil$
+fields, the pointer fits in~$\ell$ bits, but we need one extra bit to distinguish
+between regular labels and pointers.
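One possible encoding of the two kinds of edge identifiers, sketched in Python with a hypothetical label width `ELL`:

```python
ELL = 5                      # hypothetical label width (the ell bits)

def make_pointer(index):
    """Tag an index into the nearest big list with the extra flag bit."""
    return (1 << ELL) | index

def is_pointer(field):
    """Distinguish pointers from regular labels by the flag bit."""
    return bool(field >> ELL & 1)

def pointer_index(field):
    """Strip the flag bit, recovering the index into the big list."""
    return field & ((1 << ELL) - 1)
```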
+
+\lemman{Precomputation of tables}
+When~$f$ is a~function of up to two arguments computable in polynomial time, we can
+precompute a~table of the values of~$f$ for all values of arguments that fit
+in a~single slot. The precomputation takes $\O(n)$ time.
+
+\proof
+Similar to the proof of Lemma \ref{qhprecomp}. There are $\O(2^{2s}) = \O(n^{2/3})$
+possible values of arguments, so the precomputation takes time $\O(n^{2/3}\cdot\poly(s))
+= \O(n^{2/3}\cdot\poly(\log n)) = \O(n)$.
+\qed
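A sketch of the precomputation in Python; the choice of $n$ and the tabulated function are illustrative only:

```python
import math

def precompute_table(f, n):
    """Tabulate f over all pairs of slot-sized arguments.
    With s = ceil(log2(n) / 3) there are 2^(2s) = O(n^(2/3))
    entries, so for a polynomial-time f the loop takes o(n) time."""
    s = math.ceil(math.log2(n) / 3)
    size = 1 << s
    return s, [[f(a, b) for b in range(size)] for a in range(size)]

# Example: tabulated Hamming weight of the XOR of two slots.
s, tab = precompute_table(lambda a, b: bin(a ^ b).count("1"), 4096)
```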
+
+\example
+As we can afford spending $\O(n)$ time on preprocessing,
+we can assume that we can compute the following functions in constant time:
+
+\itemize\ibull
+\:$\<Weight>(x)$ --- the Hamming weight of a~slot-sized number~$x$
+(we already considered this operation in Algorithm \ref{lsbmsb}, but we needed
+quadratic word size for it). We can easily extend this function to $\log n$-bit numbers
+by splitting the number into three slots and adding their weights.
+
+\:$\<FindKth>(x,k)$ --- the $k$-th set bit from the top of the slot-sized
+number~$x$. Again, this can be extended to multi-slot numbers by calculating
+the \<Weight> of each slot first and then finding the slot containing the
+$k$-th~\1.
+
+\:$\<Bits>(m)$ --- for a~slot-sized bit mask~$m$, it returns a~small list
+of the positions of the bits set in~$\(m)$.
+
+\:$\<Select>(x,m)$ --- constructs a~slot containing the substring of $\(x)$
+selected by the bits set in~$\(m)$.
+
+\:$\<SubList>(x,m)$ --- when~$x$ is a~small list and~$m$ a bit mask, it returns
+a~small list containing the elements of~$x$ selected by the bits set in~$m$.
+\endlist
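The following Python stand-ins compute these functions directly; in the algorithm each of them would be a single lookup in a table precomputed for slot-sized arguments. The field width `ell` and the convention that `find_kth` counts from the most significant end are assumptions of this sketch:

```python
def weight(x):
    """Hamming weight of x."""
    return bin(x).count("1")

def find_kth(x, k):
    """Bit position of the k-th set bit of x, counted from the top
    (k = 1 gives the most significant set bit)."""
    for pos in range(x.bit_length() - 1, -1, -1):
        if x >> pos & 1:
            k -= 1
            if k == 0:
                return pos
    raise ValueError("fewer than k bits set")

def bits(m):
    """Positions of the set bits of m, lowest first."""
    return [p for p in range(m.bit_length()) if m >> p & 1]

def select(x, m):
    """Pack together the bits of x at positions where m is set
    (a software analogue of the PEXT instruction)."""
    out, shift = 0, 0
    for pos in range(m.bit_length()):
        if m >> pos & 1:
            out |= (x >> pos & 1) << shift
            shift += 1
    return out

def sub_list(lst, m, ell):
    """Keep the ell-bit fields of the small list lst whose index
    is set in m, packing the survivors together."""
    out, shift = 0, 0
    for i in range(m.bit_length()):
        if m >> i & 1:
            out |= (lst >> (i * ell) & ((1 << ell) - 1)) << shift
            shift += ell
    return out
```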
+
+\para
+We will now show how to perform all parts of the procedure \<FindPeaks>
+in the required time. We will denote the size of the tree by~$n$ and the
+number of query paths by~$q$.
+
+\lemma
+Depths of all vertices and all top masks can be computed in time $\O(n+q)$.
+
+\proof
+Run depth-first search on the tree, assign the depth of a~vertex when entering
+it and construct its top mask when leaving it. The top mask can be obtained
+by $\bor$-ing the masks of its sons, excluding the level of the sons and
+including the tops of all query paths that have their bottoms at the current vertex
+(the depths of the tops are already assigned).
+\qed
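A Python sketch of this traversal; the query paths are given as (top, bottom) pairs, and bit $d$ of `mask[v]` says whether some query path through the edge above $v$ has its top at depth $d$. The exact bookkeeping conventions are assumptions of the sketch:

```python
def depths_and_top_masks(children, root, queries):
    """Compute depth[v] for every vertex and the top mask of the
    edge above v.  `queries` is a list of (top, bottom) pairs."""
    by_bottom = {}
    for top, bottom in queries:
        by_bottom.setdefault(bottom, []).append(top)

    depth, mask, order = {}, {}, []
    stack = [(root, 0)]
    while stack:                     # iterative preorder DFS
        v, d = stack.pop()
        depth[v] = d
        order.append(v)
        for w in children.get(v, []):
            stack.append((w, d + 1))

    for v in reversed(order):        # descendants are finished first
        m = 0
        for w in children.get(v, []):
            m |= mask[w]             # OR the masks of the sons
        m &= ~(1 << depth[v])        # paths topping at v stop here
        for t in by_bottom.get(v, []):
            m |= 1 << depth[t]       # paths with bottom at v
        mask[v] = m
    return depth, mask
```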
+
+\lemma\id{verth}%
+The arrays $T_e$ and~$P_e$ can be indexed in constant time.
+
+\proof
+Indexing~$T_e$ is exactly the operation \<FindKth> applied on the corresponding
+top mask~$M_e$.
+
+If $P_e$ is stored in a~big list, we calculate the index of the particular
+slot and the position of the field inside the slot. This field can be then
+extracted using bit masking and shifts.
+
+If it is a~small list, we extract the field directly, but we have to
+dereference it in case it is a pointer. We modify the recursion in \<FindPeaks>
+to pass the depth of the lowest edge endowed with a~big list and when we
+encounter a~pointer, we index this big list.
+\qed
+
+\lemma\id{verhe}%
+For an~arbitrary active top~$t$, the corresponding entry of~$P_e$ can be
+extracted in constant time.
+
+\proof
+We look up the precomputed depth~$d$ of~$t$ first.
+If $P_e$ is stored in a~big list, we extract the $d$-th entry of the list.
+If the list is small, we find the position of the particular field
+by counting bits of the top mask~$M_e$ at position~$d$ and higher
+(this is \<Weight> of $M_e$ with the lower bits masked out).
+\qed
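In Python, the small-list case of this lemma looks as follows. As a convention of this sketch, the fields are packed with the deepest top first, so the field index is Weight of $M_e$ with the bits below $d$ masked out, minus one:

```python
def p_entry_for_top(P_small, M_e, d, ell):
    """Entry of a small-list P_e for the active top at depth d.
    Fields of ell bits are assumed packed deepest top first, so
    the field index is Weight(M_e >> d) - 1."""
    assert M_e >> d & 1, "the top at depth d must be active"
    idx = bin(M_e >> d).count("1") - 1   # Weight with low bits masked out
    return P_small >> (idx * ell) & ((1 << ell) - 1)
```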
+
+\lemma\id{verfh}%
+The procedure \<FindPeaks> processes an~edge~$e$ in time $\O(\log \vert T_e\vert + q_e)$,
+where $q_e$~is the number of query paths having~$e$ as their bottom edge.
+
+\proof
+The edge is examined in steps 1, 3, 4 and~5 of the algorithm. We will show how to
+perform each of these steps in constant time if $P_e$ is a~small list or
+$\O(\log\log n)$ if it is big.
+
+\em{Step~1} looks up $q_e$~tops in~$P_e$ and we already know from Lemma \ref{verhe}
+how to do that in constant time per top.
+
+\em{Step~3} is trivial as we have already computed the top masks and we can
+reconstruct the entries of~$T_e$ in constant time according to Lemma \ref{verth}.
+
+\em{Step~5} involves binary search on~$P_e$ in $\O(\log\vert T_e\vert)$ comparisons,
+each of them indexes~$P_e$, which is $\O(1)$ again by Lemma \ref{verth}. Rewriting the
+lighter edges is $\O(1)$ for small lists by replication and bit masking, for a~big
+list we do the same for each of its slots.
+
+\em{Step~4} is the only non-trivial one. We already know which tops to select
+(we have the top masks $M_e$ and~$M_p$ precomputed), but we have to carefully
+extract the sublist.
+We need to handle these four cases:
+
+\itemize\ibull
+\:\em{Small from small:} We use $\<Select>(M_e,M_p)$ to find the fields of~$P_p$
+that shall be deleted by a~subsequent call to \<SubList>. Pointers
+can be retained as they still refer to the same ancestor list.
+
+\:\em{Big from big:} We can copy the whole~$P_p$, since the layout of the
+big lists is fixed and the items, which we do not want, simply end up as unused
+fields in~$P_e$.
+
+\:\em{Small from big:} We use the operation \<Bits> to construct a~list
+of pointers (we use bit masking to add the ``this is a~pointer'' flags).
+
+\:\em{Big from small:} First we have to dereference the pointers in the
+small list~$S$. For each slot~$B_i$ of the ancestor big list, we construct
+a~subvector of~$S$ containing only the pointers referring to that slot,
+adjusted to be relative to the beginning of the slot (we use \<Compare>
+and \<Replicate> from Algorithm \ref{vecops} and bit masking). Then we
+use a~precomputed table to replace the pointers by the fields of~$B_i$
+they point to. We $\bor$ together the partial results and we again have
+a~small list.
+
+Finally, we have to spread the fields of this small list to the whole big list.
+This is similar: for each slot of the big list, we find the part of the small
+list keeping the fields we want (we call \<Weight> on the sub-words of~$M_e$ before
+and after the intended interval of depths) and we use a~tabulated function
+to shift the fields to the right locations in the slot (controlled by the
+sub-word of~$M_e$ in the intended interval).
+\qeditem
+\endlist
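To make the first of these cases concrete, here is a Python sketch of the small-from-small extraction. The functions `select` and `sub_list` stand in for the tabulated Select and SubList operations, and the convention that fields are packed deepest top first is an assumption of the sketch:

```python
def select(x, m):
    """Pack the bits of x at the positions where m is set,
    highest position first (matching deepest-first field order)."""
    out, shift = 0, 0
    for pos in range(m.bit_length() - 1, -1, -1):
        if m >> pos & 1:
            out |= (x >> pos & 1) << shift
            shift += 1
    return out

def sub_list(lst, keep, ell):
    """Keep the ell-bit fields of lst whose index is set in keep."""
    out, shift = 0, 0
    for i in range(keep.bit_length()):
        if keep >> i & 1:
            out |= (lst >> (i * ell) & ((1 << ell) - 1)) << shift
            shift += ell
    return out

def small_from_small(P_p, M_p, M_e, ell):
    """Restrict the parent's small list P_p (tops M_p) to the tops
    in M_e: select finds which fields survive, sub_list packs them
    into the new small list."""
    return sub_list(P_p, select(M_e, M_p), ell)
```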
+
+\>We are now ready to combine these steps and get the following theorem:
+
+\thmn{Verification of MST on the RAM}\id{ramverify}%
+There is a~RAM algorithm which for every weighted graph~$G$ and its spanning tree~$T$
+determines whether~$T$ is minimum and finds all $T$-light edges in~$G$ in time $\O(m)$.
+
+\proof
+Implement the Koml\'os's algorithm from Theorem \ref{verify} with the data
+structures developed in this section.
+According to Lemma \ref{verfh}, the algorithm runs in time $\sum_e \O(\log\vert T_e\vert + q_e)
+= \O(\sum_e \log\vert T_e\vert) + \O(\sum_e q_e)$. The second sum is $\O(m)$
+as there are $\O(1)$ query paths per edge; the first sum is $\O(\#\hbox{comparisons})$,
+which is $\O(m)$ by Theorem \ref{verify}.
+\qed
+
+\>In Section \ref{kbestsect}, we will need a~more specialized statement:
+
+\cor\id{rampeaks}%
+There is a~RAM algorithm which for every weighted tree~$T$ and a~set~$P$ of
+paths in~$T$ calculates the peaks of these paths in time $\O(n(T) + \vert P\vert)$.
+
+\paran{Verification on the Pointer Machine}\id{pmverify}%
+Buchsbaum et al.~\cite{buchsbaum:verify} have recently shown that linear-time
+verification can be achieved even on the Pointer Machine. They first solve the
+problem of finding the lowest common ancestors for a~set of pairs of vertices
+by batch processing: They combine an~algorithm of time complexity $\O(m\alpha(m,n))$
+based on the Disjoint Set Union data structure with the framework of topological graph
+computations described in Section \ref{bucketsort}. Then they use a~similar
+technique for finding the peaks themselves.
+
+\paran{Online verification}%
+The online version of this problem has turned out to be more difficult. It calls for an~algorithm
+that preprocesses the tree and then answers queries for peaks of paths presented online. Pettie
+\cite{pettie:onlineverify} has proven an~interesting lower bound based on the inverses of
+Ackermann's function. If we want to answer queries within $t$~comparisons, we
+have to invest $\Omega(n\log\lambda_t(n))$ time into preprocessing.\foot{$\lambda_t(n)$ is the
+$t$-th row inverse of Ackermann's function and $\alpha(n)$ is its diagonal inverse. See
+\ref{ackerinv} for the exact definitions.} This implies that with
+preprocessing in linear time, the queries require $\Omega(\alpha(n))$ time.