+paths. The reduction costs $\O(m+n)$ comparisons.
+Then we run the \<FindPeaks> procedure (Algorithm \ref{findpeaks}) to find
+the tops of all query paths. According to Lemma \ref{vercompares}, this spends another $\O(m+n)$
+comparisons. Since we (as always) assume that~$G$ is connected, $\O(m+n)=\O(m)$.
+\qed
+
+\paran{Applications}%
+The problem of computing path maxima or minima in a~weighted tree has several other interesting
+applications. One of them is computing minimum cuts separating given pairs of vertices in a~given
+weighted undirected graph~$G$. We construct a~Gomory-Hu tree~$T$ for the graph as described in \cite{gomoryhu}
+(see also \cite{bhalgat:ght} for a~more efficient algorithm running in time
+$\widetilde\O(mn)$ for unit-cost graphs). The crucial property of this tree is that for every two
+vertices $u$, $v$ of the graph~$G$, the minimum-cost edge on $T[u,v]$ has the same cost
+as the minimum cut separating $u$ and~$v$ in~$G$. Since the construction of~$T$ generally
+takes $\Omega(n^2)$ time, we could of course invest this time in precomputing the minima for
+all pairs of vertices. That would, however, require quadratic space, so it is better to use
+the method of this section, which fits in $\O(n+q)$ space for $q$~queries.
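The reduction from minimum cuts to path minima can be made concrete with a small sketch. This is a hypothetical, deliberately naive illustration (one query in $\O(n)$ time by walking the tree), not the $\O(n+q)$-space method of this section; the example tree and its costs are made up.

```python
def min_cut_via_gh_tree(parent, weight, depth, u, v):
    """Minimum cut separating u and v, read off a rooted Gomory-Hu tree.
    weight[x] = cost of the edge from x to parent[x]."""
    best = float('inf')
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u
        best = min(best, weight[u])   # step from the deeper vertex to its parent
        u = parent[u]
    return best

# A hand-made example tree rooted at 0: edges 0-1 (cost 3), 1-2 (cost 5), 0-3 (cost 2).
parent = {0: None, 1: 0, 2: 1, 3: 0}
weight = {1: 3, 2: 5, 3: 2}
depth = {0: 0, 1: 1, 2: 2, 3: 1}
```

The crucial property quoted above then says that `min_cut_via_gh_tree(parent, weight, depth, u, v)` equals the cost of the minimum $u$-$v$ cut in the original graph.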
+
+\paran{Dynamic verification}%
+A~dynamic version of the problem is also often considered. It calls for a~data structure
+representing a~weighted forest with operations for modifying the structure of the forest
+and querying minima or maxima on paths. Sleator and Tarjan have shown in \cite{sleator:trees}
+how to do this in $\O(\log n)$ time amortized per operation, which leads to
+an~implementation of Dinic's maximum flow algorithm \cite{dinic:flow}
+in time $\O(mn\log n)$.
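To make the interface of such a dynamic structure tangible, here is a deliberately naive stand-in supporting the same Link / Cut / PathMax operations on a weighted forest, in $\O(n)$ per operation instead of the $\O(\log n)$ amortized achieved by Sleator and Tarjan. The class and its representation are illustrative, not their data structure.

```python
class NaiveDynamicForest:
    """Weighted rooted forest; parent[v] = (u, w) means an edge v-u of weight w."""
    def __init__(self, n):
        self.parent = [None] * n

    def link(self, v, u, w):
        self.parent[v] = (u, w)        # v becomes a child of u

    def cut(self, v):
        self.parent[v] = None          # detach v from its parent

    def path_max(self, u, v):
        # Record the maximum weight from u up to each of its ancestors.
        up = {u: float('-inf')}
        x, m = u, float('-inf')
        while self.parent[x] is not None:
            p, w = self.parent[x]
            m = max(m, w)
            x = p
            up[x] = m
        # Climb from v until the first common ancestor is reached.
        x, m2 = v, float('-inf')
        while x not in up:
            p, w = self.parent[x]
            m2 = max(m2, w)
            x = p
        return max(up[x], m2)

forest = NaiveDynamicForest(4)
forest.link(1, 0, 4)
forest.link(2, 1, 7)
forest.link(3, 0, 1)
```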
+
+%--------------------------------------------------------------------------------
+
+\section{Verification in linear time}\id{verifysect}%
+
+We have proven that $\O(m)$ edge weight comparisons suffice to verify minimality
+of a~given spanning tree. Now we will show an~algorithm for the RAM
+which finds the required comparisons in linear time. We will follow the idea
+of King from \cite{king:verifytwo}, but as we have the power of the RAM data structures
+from Section~\ref{bitsect} at our command, the low-level details will be easier,
+especially the construction of vertex and edge labels.
+
+\para
+First of all, let us make sure that the reduction to fully branching trees
+in the Balancing lemma (\ref{verbranch}) can be made to run in linear time. As already noted
+in the proof, Bor\o{u}vka's algorithm runs in linear time. Constructing
+the Bor\o{u}vka tree in the process adds at most a~constant overhead to every
+step of the algorithm.
+
+Finding the common ancestors is not trivial, but Harel and Tarjan have shown
+in \cite{harel:nca} that linear time is sufficient on the RAM. Several more
+accessible algorithms have been developed since then (see Alstrup's survey
+paper \cite{alstrup:nca} and a~particularly elegant algorithm described by Bender
+and Farach-Colton in \cite{bender:lca}). Any of them implies the following
+theorem:
+
+\thmn{Lowest common ancestors}\id{lcathm}%
+On the RAM, it is possible to preprocess a~tree~$T$ in time $\O(n)$ and then
+answer lowest common ancestor queries presented online in constant time.
+
+\cor
+The reductions in Lemma \ref{verbranch} can be performed in time $\O(m)$.
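As a more accessible (though not linear-time) realization of the theorem, the classical Euler tour plus sparse-table range-minimum construction answers LCA queries in constant time after $\O(n\log n)$ preprocessing. The sketch below assumes the tree is given as a children map; it is an illustration, not the Harel-Tarjan structure.

```python
class LCA:
    """Euler tour + sparse-table RMQ: O(n log n) preprocessing, O(1) query."""
    def __init__(self, children, root):
        self.euler, self.depths, self.first = [], [], {}
        self._dfs(children, root, 0)
        n = len(self.euler)
        # sparse[j][i] = index of the minimum-depth entry in euler[i : i + 2^j]
        self.sparse = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            prev, half = self.sparse[j - 1], 1 << (j - 1)
            row = [min(prev[i], prev[i + half], key=lambda t: self.depths[t])
                   for i in range(n - (1 << j) + 1)]
            self.sparse.append(row)
            j += 1

    def _dfs(self, children, v, d):
        self.first.setdefault(v, len(self.euler))
        self.euler.append(v)
        self.depths.append(d)
        for c in children.get(v, []):
            self._dfs(children, c, d + 1)
            self.euler.append(v)   # revisit v after each child
            self.depths.append(d)

    def query(self, u, v):
        # Minimum-depth vertex between the first occurrences of u and v.
        l, r = sorted((self.first[u], self.first[v]))
        j = (r - l + 1).bit_length() - 1
        a, b = self.sparse[j][l], self.sparse[j][r - (1 << j) + 1]
        return self.euler[min(a, b, key=lambda t: self.depths[t])]

children = {0: [1, 4], 1: [2, 3]}
lca = LCA(children, 0)
```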
+
+\para
+Having the balanced problem at hand, it remains to implement the procedure \<FindPeaks>
+of Algorithm \ref{findpeaks} efficiently. We need a~compact representation of
+the arrays $T_e$ and~$P_e$, which will allow us to reduce the overhead of the algorithm
+to time linear in the number of comparisons performed. To achieve
+this goal, we will encode the arrays in RAM vectors (see Section \ref{bitsect}
+for the vector operations).
+
+\defn
+
+\em{Vertex identifiers:} Since all vertices processed by the procedure
+lie on the path from the root to the current vertex~$u$, we modify the algorithm
+to keep a~stack of these vertices in an~array. We will often refer to each vertex by its
+index in this array, i.e., by its depth. We will call these identifiers \df{vertex
+labels} and we note that each label requires only $\ell=\lceil \log\lceil\log n\rceil\rceil$
+bits. As every tree edge is uniquely identified by its bottom vertex, we can
+use the same encoding for \df{edge labels.}
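As a quick numeric sanity check of the label width (a toy computation, not part of the algorithm): depths in the balanced tree go up to $\lceil\log n\rceil$, so a label needs $\ell=\lceil\log\lceil\log n\rceil\rceil$ bits.

```python
from math import ceil, log2

def label_bits(n):
    # l = ceil(log2(ceil(log2 n))), as defined in the text
    return ceil(log2(ceil(log2(n))))

# For n = 1000 the depths go up to ceil(log2 1000) = 10, so 4 bits suffice.
```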
+
+\em{Slots:} As we are going to need several operations which are not computable
+in constant time on the RAM, we precompute tables for these operations
+like we did in the Q-heaps (cf.~Lemma \ref{qhprecomp}). A~table for word-sized
+arguments would take too much time to precompute, so we will generally store
+our data structures in \df{slots} of $s=\lceil 1/3\cdot\log n\rceil$ bits each.
+We will soon show that it is possible to precompute a~table of any reasonable
+function whose arguments fit in two slots.
+
+\em{Top masks:} The array~$T_e$ will be represented by a~bit mask~$M_e$ called the \df{top mask.} For each
+of the possible tops~$t$ (i.e., the ancestors of the current vertex), we store
+a~single bit telling whether $t\in T_e$. Each top mask fits in $\lceil\log n\rceil$
+bits and therefore in a~single machine word. If needed, it can be split into three slots.
+Unions and intersections of sets of tops then translate to $\band$/$\bor$
+on the top masks.
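A toy illustration of this encoding: the set of active tops, indexed by depth, becomes a single integer bit mask, so union and intersection of top sets are one OR/AND instruction each.

```python
def mask_of(depths):
    # Top mask: bit d is set iff the top at depth d belongs to the set.
    m = 0
    for d in depths:
        m |= 1 << d
    return m

def tops_of(mask):
    # Decode a top mask back into the set of depths (for checking only).
    return {d for d in range(mask.bit_length()) if mask >> d & 1}

Te, Tf = mask_of({0, 2, 5}), mask_of({2, 3, 5})
union, inter = Te | Tf, Te & Tf   # set union / intersection in O(1)
```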
+
+\em{Small and big lists:} The heaviest edge found so far for each top is stored
+by the algorithm in the array~$P_e$. Instead of keeping the real array,
+we store the labels of these edges in a~list encoded in a~bit string.
+Depending on the size of the list, we use one of two possible encodings:
+\df{Small lists} are stored in a~vector that fits in a~single slot, with
+the unused fields filled by a~special constant, so that we can easily infer the
+length of the list.
+
+If the data do not fit in a~small list, we use a~\df{big list} instead. It
+is stored in $\O(\log\log n)$ words, each of them containing a~slot-sized
+vector. Unlike the small lists, we use the big lists as arrays. If a~top~$t$ of
+depth~$d$ is active, we keep the corresponding entry of~$P_e$ in the $d$-th
+field of the big list. Otherwise, we keep that entry unused.
+
+We want to perform all operations on small lists in constant time,
+but we can afford spending time $\O(\log\log n)$ on every big list. This
+is true because whenever we use a~big list, $\vert T_e\vert = \Omega(\log n/\log\log n)$,
+hence we need $\log\vert T_e\vert = \Omega(\log\log n)$ comparisons anyway.
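The small-list encoding can be sketched as follows; the field width and the number of fields per slot are illustrative choices, and the sentinel of all ones marks unused fields so that the length of the list is recoverable.

```python
FIELD = 4                        # bits per field, enough for one label (illustrative)
SENTINEL = (1 << FIELD) - 1      # all-ones constant marking an unused field
SLOT_FIELDS = 5                  # fields per slot (illustrative)

def pack_small(labels):
    """Pack a short list of labels into one slot, padding with the sentinel."""
    assert len(labels) <= SLOT_FIELDS and all(l < SENTINEL for l in labels)
    slot = 0
    for i in range(SLOT_FIELDS):
        f = labels[i] if i < len(labels) else SENTINEL
        slot |= f << (i * FIELD)
    return slot

def unpack_small(slot):
    """Recover the list; the first sentinel field ends it."""
    out = []
    for i in range(SLOT_FIELDS):
        f = slot >> (i * FIELD) & SENTINEL
        if f == SENTINEL:
            break
        out.append(f)
    return out
```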
+
+\em{Pointers:} When we need to construct a~small list containing a~sub-list
+of a~big list, we do not have enough time to see the whole big list. To handle
+this, we introduce \df{pointers} as another kind of edge identifiers.
+A~pointer is an~index to the nearest big list on the path from the small
+list containing the pointer to the root. As each big list has at most $\lceil\log n\rceil$
+fields, the pointer fits in~$\ell$ bits, but we need one extra bit to distinguish
+between regular labels and pointers.
+
+\lemman{Precomputation of tables}
+When~$f$ is a~function of up to two arguments computable in polynomial time, we can
+precompute a~table of the values of~$f$ for all values of arguments that fit
+in a~single slot. The precomputation takes $\O(n)$ time.
+
+\proof
+Similar to the proof of Lemma \ref{qhprecomp}. There are $\O(2^{2s}) = \O(n^{2/3})$
+possible values of arguments, so the precomputation takes time $\O(n^{2/3}\cdot\poly(s))
+= \O(n^{2/3}\cdot\poly(\log n)) = \O(n)$.
+\qed
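The counting in the proof can be mirrored by a direct sketch: with slot size $s=\lceil\log n/3\rceil$ there are only $2^{2s}=\O(n^{2/3})$ argument pairs, so tabulating any polynomial-time function over them is cheap. The popcount function used here is just an example.

```python
from math import ceil, log2

def precompute(f, n):
    """Tabulate f over all pairs of slot-sized arguments, s = ceil(log2(n)/3)."""
    s = ceil(log2(n) / 3)
    size = 1 << s
    return [[f(x, y) for y in range(size)] for x in range(size)], s

# For n = 4096 = 2^12 we get s = 4, i.e., a table with only 2^8 = 256 entries.
table, s = precompute(lambda x, y: bin(x & y).count('1'), 4096)
```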
+
+\example
+As we can afford to spend $\O(n)$ time on preprocessing,
+we may assume that the following functions are computable in constant time:
+
+\itemize\ibull
+\:$\<Weight>(x)$ --- the Hamming weight of a~slot-sized number~$x$
+(we already considered this operation in Algorithm \ref{lsbmsb}, but we needed
+quadratic word size for it). We can easily extend this function to $\log n$-bit numbers
+by splitting the number in three slots and adding their weights.
+
+\:$\<FindKth>(x,k)$ --- the $k$-th set bit from the top of the slot-sized
+number~$x$. Again, this can be extended to multi-slot numbers by calculating
+the \<Weight> of each slot first and then finding the slot containing the
+$k$-th~\1.
+
+\:$\<Bits>(m)$ --- for a~slot-sized bit mask~$m$, it returns a~small list
+of the positions of the bits set in~$\(m)$.
+
+\:$\<Select>(x,m)$ --- constructs a~slot containing the substring of $\(x)$
+selected by the bits set in~$\(m)$.
+
+\:$\<SubList>(x,m)$ --- when~$x$ is a~small list and~$m$ a~bit mask, it returns
+a~small list containing the elements of~$x$ selected by the bits set in~$m$.
+\endlist
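The first two operations above can be sketched directly: \<Weight> by a lookup in a precomputed slot table, extended to a three-slot word by summing per-slot weights; \<FindKth> is shown here as a plain bit scan serving as a reference check (the slot size is an illustrative choice).

```python
S = 6                                                # slot size in bits (illustrative)
WEIGHT = [bin(x).count('1') for x in range(1 << S)]  # precomputed slot table

def weight(x):
    """Hamming weight of a word of up to three slots, via table lookups."""
    return (WEIGHT[x & ((1 << S) - 1)]
            + WEIGHT[(x >> S) & ((1 << S) - 1)]
            + WEIGHT[(x >> 2 * S) & ((1 << S) - 1)])

def find_kth(x, k):
    """Position of the k-th set bit from the top (reference implementation)."""
    for pos in reversed(range(3 * S)):
        if x >> pos & 1:
            k -= 1
            if k == 0:
                return pos
    return None
```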
+
+\para
+We will now show how to perform all parts of the procedure \<FindPeaks>
+in the required time. We will denote the size of the tree by~$n$ and the
+number of query paths by~$q$.
+
+\lemma
+Depths of all vertices and all top masks can be computed in time $\O(n+q)$.
+
+\proof
+Run depth-first search on the tree, assign the depth of a~vertex when entering
+it and construct its top mask when leaving it. The top mask can be obtained
+by $\bor$-ing the masks of its sons, excluding the level of the sons and
+including the tops of all query paths that have their bottoms at the current vertex
+(the depths of the tops are already assigned).
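The DFS of this proof can be sketched as follows (iterative, to keep it faithful while avoiding recursion limits); the input convention `tops_at[v]` = depths of the tops of query paths with bottom~$v$ is an assumption of this sketch.

```python
def depths_and_top_masks(children, root, tops_at):
    """Assign depth[v] on entry; on exit, build mask[v] by OR-ing the sons'
    masks, excluding the sons' own level, and adding the tops of the query
    paths whose bottom is v (tops given by their already-assigned depths)."""
    depth, mask = {}, {}
    stack = [(root, 0, False)]
    while stack:
        v, d, done = stack.pop()
        if not done:
            depth[v] = d
            stack.append((v, d, True))
            for c in children.get(v, []):
                stack.append((c, d + 1, False))
        else:
            m = 0
            for c in children.get(v, []):
                m |= mask[c] & ~(1 << (d + 1))   # drop tops at the sons' level
            for t in tops_at.get(v, []):
                m |= 1 << t
            mask[v] = m
    return depth, mask

# Path 0-1-2 with one query path: bottom 2, top 0 (depth 0).
children = {0: [1], 1: [2]}
depth, mask = depths_and_top_masks(children, 0, {2: [0]})
```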