Now we will turn our attention to a~slightly different problem: given a~spanning
tree, how to verify that it is minimum? We will show that this can be achieved
-in linear time and it will serve as a~basis for the randomized linear-time
-MST algorithm in the next section.
+in linear time and it will serve as a~basis for a~randomized linear-time
+MST algorithm in Section~\ref{randmst}.
MST verification has been studied by Koml\'os \cite{komlos:verify}, who has
proven that $\O(m)$ edge comparisons are sufficient, but his algorithm needed
superlinear time to find the edges to compare. Dixon, Rauch and Tarjan
have later shown in \cite{dixon:verify} that the overhead can be reduced
to linear time on the RAM using preprocessing and table lookup on small
-subtrees. This algorithm was then simplified by King in \cite{king:verifytwo}.
-We will follow the results of Koml\'os and King, but as we have the power of the
-RAM data structures from Section~\ref{bitsect} at our command, we will not have
-to worry about much technical details.
+subtrees. King later gave a~simpler algorithm in \cite{king:verifytwo}.
+In this section, we will follow Koml\'os's steps and study the number of comparisons
+needed, deferring an~efficient implementation to the next section.
To verify that a~spanning tree~$T$ is minimum, it is sufficient to check that all
edges outside~$T$ are $T$-heavy (by Theorem \ref{mstthm}). In fact, we will be
able to find all $T$-light edges efficiently. For each edge $uv\in E\setminus T$,
\df{lowest common ancestor} of~$x$ and~$y$ in~$T$. It is therefore sufficient to
consider only paths which connect a~vertex with one of its ancestors.
-Finding the common ancestors is not trivial, but Harel and Tarjan have shown
-in \cite{harel:nca} that linear time is sufficient on the RAM. Several more
-accessible algorithms have been developed since then (see the Alstrup's survey
-paper \cite{alstrup:nca} and a~particularly elegant algorithm shown by Bender
-and Falach-Colton in \cite{bender:lca}). Any of them implies the following
-\thmn{Lowest common ancestors}\id{lcathm}%
-On the RAM, it is possible to preprocess a~tree~$T$ in time $\O(n)$ and then
-answer lowest common ancestor queries presented online in constant time.
-See for example Bender and Falach-Colton \cite{bender:lca}.
-We summarize the reductions in the following lemma, also showing that they can
-be performed in linear time:
+When we combine the two transforms, we get:
-For each tree~$T$ and a~set of query paths~$Q$ on~$T$, it is possible to find
-a~complete branching tree~$T'$ in linear time together with a~set~$Q'$ of query
+For each tree~$T$ on $n$~vertices and a~set of $m$~query paths~$Q$ on~$T$, it is possible to find
+a~complete branching tree~$T'$ in $\O(n)$ comparisons, together with a~set~$Q'$ of
paths on~$T'$, such that the weights of the heaviest edges of the new paths can
-be transformed to the weights of the heaviest edges of the paths in~$Q$ in
-linear time.
-The tree $T$ has at most $2n(T)$ vertices and $\O(\log n(T))$ levels. The set~$Q'$ contains
-at most~$2\vert Q\vert$ paths and each of them connects a~vertex of~$T'$ with one
+be transformed to the weights of the heaviest edges of the paths in~$Q$ in $\O(m)$ comparisons.
+The tree $T'$ has at most $2n(T)$ vertices and $\O(\log n(T))$ levels. The set~$Q'$ contains
+at most~$2m$ paths and each of them connects a~vertex of~$T'$ with one
of its ancestors.
-The tree~$T'$ will be the Bor\o{u}vka tree for~$T$. We run the contractive version
-of the Bor\o{u}vka's algorithm (Algorithm \ref{contbor}) on~$T$. It runs in linear time,
-for example because trees are planar (Theorem \ref{planarbor}). The construction of~$T'$
-itself adds only a~constant overhead to every step of the algorithm. As~$T'$ has~$m(T)=n(T)-1$
-leaves and it is a~complete branching tree, it has at most~$m(T)$ internal vertices,
-so~$n(T')\le 2n(T)$ as promised. Since all internal vertices have at least two sons,
-the depth must be logarithmic.
+The tree~$T'$ will be the Bor\o{u}vka tree for~$T$, obtained by running the
+contractive version of Bor\o{u}vka's algorithm (Algorithm \ref{contbor})
+on~$T$. The algorithm runs in linear time, for example because trees are planar
+(Theorem \ref{planarbor}), so it performs only $\O(n)$ comparisons.
+The leaves of~$T'$ are the $n(T)$ vertices of~$T$. As~$T'$ is a~complete branching tree,
+each of its internal vertices has at least two sons, so there are at most $n(T)-1$ of them
+and~$n(T')\le 2n(T)$ as promised. Since every pass of Bor\o{u}vka's algorithm at least
+halves the number of vertices, there are $\O(\log n)$ passes and the depth of the tree
+is logarithmic as well.
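The following Python sketch shows how $T'$ can be constructed. It assumes that $T$ is given as a~list of weighted edges on vertices $0,\dots,n-1$ with $n\ge 1$. It labels the edge from every component to its parent with the weight of the edge that the component selected, as in King's construction. The text above does not fix these weights, so treat that choice as an~assumption. The sketch rescans all edges in every pass, so it illustrates the shape of~$T'$ but does not attain the linear bound. The name `boruvka_tree` is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]


def boruvka_tree(n: int, edges: List[Edge]) -> Tuple[Dict[int, Tuple[int, float]], int]:
    """Build a Boruvka tree of a weighted tree on vertices 0..n-1 (hypothetical helper).

    Returns (parent, root): parent[x] = (p, w) means that node x of T' hangs below
    node p by an edge of weight w. Nodes 0..n-1 are the leaves (original vertices);
    nodes n, n+1, ... are the components created by the passes.
    """
    comp = list(range(n))     # comp[v]: node of T' currently containing vertex v
    alive = set(range(n))     # nodes of T' that are components of the current pass
    parent: Dict[int, Tuple[int, float]] = {}
    next_id = n
    while len(alive) > 1:
        # Each component selects its lightest incident edge.
        best: Dict[int, Edge] = {}
        for u, v, w in edges:
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue
            for c in (cu, cv):
                if c not in best or w < best[c][2]:
                    best[c] = (u, v, w)
        # Merge components along the selected edges with a small union-find.
        uf = {c: c for c in alive}

        def find(c: int) -> int:
            while uf[c] != c:
                uf[c] = uf[uf[c]]
                c = uf[c]
            return c

        for u, v, _ in best.values():
            a, b = find(comp[u]), find(comp[v])
            if a != b:
                uf[a] = b
        # One new node of T' per merged group; the old components become its sons.
        new_node: Dict[int, int] = {}
        for c in alive:
            r = find(c)
            if r not in new_node:
                new_node[r] = next_id
                next_id += 1
            parent[c] = (new_node[r], best[c][2])
        comp = [new_node[find(c)] for c in comp]
        alive = set(new_node.values())
    return parent, next(iter(alive))
```

Consider the path with edges (0,1,4), (1,2,1), (2,3,3). A~single pass merges everything, so the root is node~4 and the leaves 0, 1, 2, 3 hang below it with weights 4, 1, 1, 3. Every pass gives each surviving component a~parent, so all leaves end up on the same level.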
For each query path $T[x,y]$ we find the lowest common ancestor of~$x$ and~$y$
-using Theorem \ref{lcathm} and replace the path by the two half-paths. This
-produces a~set~$Q'$ of at most~$2\vert Q\vert$ paths. If we remember the origin
-of each of the new paths, the reconstruction of answers to the original queries
-is then trivial.
+and replace the path with the two half-paths from~$x$ and~$y$ to this ancestor. This
+produces a~set~$Q'$ of at most~$2m$ half-paths. If we remember the origin
+of each of the new half-paths, the answer to an~original query is just
+the heavier of the answers for its two half-paths, so the reconstruction takes $\O(m)$ comparisons.
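The following Python sketch lets you try the reduction from general query paths to half-paths on small inputs. It is only an~illustration under our own assumptions:

- the tree is a~dictionary of parent pointers, with the root mapped to `None`;
- the weight of each edge is stored at its lower endpoint;
- the lowest common ancestor is found by naive climbing instead of by Theorem \ref{lcathm};
- the half-paths are evaluated by brute force.

Only the last step corresponds directly to the text: one comparison per original query to combine its two half-path answers. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Parent = Dict[int, Optional[int]]


def compute_depths(parent: Parent) -> Dict[int, int]:
    depth: Dict[int, int] = {}
    for v in parent:
        # Climb until a vertex of known depth (or past the root) is reached.
        path, u = [], v
        while u is not None and u not in depth:
            path.append(u)
            u = parent[u]
        d = -1 if u is None else depth[u]
        for x in reversed(path):
            d += 1
            depth[x] = d
    return depth


def naive_lca(parent: Parent, depth: Dict[int, int], x: int, y: int) -> int:
    # Lift the deeper endpoint to the same depth, then lift both until they meet.
    while depth[x] > depth[y]:
        x = parent[x]
    while depth[y] > depth[x]:
        y = parent[y]
    while x != y:
        x, y = parent[x], parent[y]
    return x


def half_path_max(parent: Parent, weight: Dict[int, float],
                  bottom: int, top: int) -> Optional[float]:
    # Heaviest edge weight between `bottom` and its ancestor `top` (None if empty).
    best, v = None, bottom
    while v != top:
        if best is None or weight[v] > best:
            best = weight[v]
        v = parent[v]
    return best


def path_maxima(parent: Parent, weight: Dict[int, float],
                queries: List[Tuple[int, int]]) -> List[Optional[float]]:
    depth = compute_depths(parent)
    answers: List[Optional[float]] = []
    for x, y in queries:
        a = naive_lca(parent, depth, x, y)
        # Split T[x,y] into the half-paths x..a and y..a.
        hx = half_path_max(parent, weight, x, a)
        hy = half_path_max(parent, weight, y, a)
        # Combine: the heavier of the two answers (one comparison if both exist).
        if hx is None or (hy is not None and hy > hx):
            answers.append(hy)
        else:
            answers.append(hx)
    return answers
```

For example, take `parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}` and `weight = {1: 5, 2: 3, 3: 7, 4: 2}`. Then the call `path_maxima(parent, weight, [(3, 2), (4, 1)])` returns `[7, 2]`.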
We will now describe a~simple variant of depth-first search which finds the
-maximum-weight edges for all query paths of the transformed problem. For the
-time being, we will not care about the time complexity of the algorithm (as long
-as it is polynomial) and we will minimize only the number of edge weight comparisons
+heaviest edges for all query paths of the transformed problem. As promised,
+we will count only the comparisons of edge weights, requiring merely that all other
+operations are well-defined and can be performed in polynomial time.
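The following Python sketch shows why monotonicity saves comparisons. It is a~simplified illustration in the spirit of the procedure described below, not a~faithful implementation of \<FindHeavy>. It rests on our own assumptions:

- the tree is a~dictionary of parent pointers;
- edge weights are stored at the lower endpoints;
- every query is a~pair (bottom, top) with top an~ancestor of bottom;
- the sets of active tops are computed by brute force, which needs no weight comparisons.

For every vertex~$v$, the sketch keeps the heaviest weight on the path from each active top down to~$v$. These weights do not increase as the top gets deeper, so extending the path by one edge takes a~single binary search. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def find_heavy_sketch(parent: Dict[int, Optional[int]], weight: Dict[int, float],
                      queries: List[Tuple[int, int]]) -> Tuple[List[Optional[float]], int]:
    """Heaviest edge weight on each half-path (bottom, top), plus the number of
    weight comparisons performed (hypothetical helper, not FindHeavy itself)."""
    children = defaultdict(list)
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)
    # Vertices in BFS order, so that every parent precedes its sons.
    order, depth = [root], {root: 0}
    i = 0
    while i < len(order):
        u = order[i]
        for v in children[u]:
            depth[v] = depth[u] + 1
            order.append(v)
        i += 1
    # need[v]: depths of the tops of queries that use the edge from v to its parent.
    need: Dict[int, set] = {v: set() for v in parent}
    for bottom, top in queries:
        v = bottom
        while v != top:
            need[v].add(depth[top])
            v = parent[v]
    comparisons = 0
    # heavy[v][t]: heaviest weight on the path from the top at depth t down to v.
    heavy: Dict[int, Dict[int, float]] = {root: {}}
    for v in order[1:]:
        u, w = parent[v], weight[v]
        tops = sorted(need[v])                  # from the root downwards
        h = [heavy[u].get(t) for t in tops]     # the top u itself gives None
        known = len(h) - 1 if h and h[-1] is None else len(h)
        # h[:known] is non-increasing: binary search for the first entry lighter than w.
        lo, hi = 0, known
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if h[mid] < w:
                hi = mid
            else:
                lo = mid + 1
        heavy[v] = dict(zip(tops, h[:lo] + [w] * (len(h) - lo)))
    answers = [heavy[b].get(depth[t]) if b != t else None for b, t in queries]
    return answers, comparisons
```

Take the tree with `parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}` and `weight = {1: 5, 2: 3, 3: 7, 4: 2}`. The queries `[(3, 0), (4, 0), (2, 0), (4, 1), (3, 1)]` give the answers `[7, 5, 3, 2, 7]` and use only two comparisons.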
For every edge~$e=uv$, we consider the set $Q_e$ of all query paths containing~$e$.
The vertex of a~path, which is closer to the root, will be called its \df{top,}
\:First find the tops~$T$ which will be shared by all edges going from~$u$ downwards.
These are the tops from~$T_p$ except for the ones which have ceased to be active,
-because all query paths which were referring to them have~$u$ as their bottom.
+because all query paths which were referring to them either have~$u$ as their bottom
+or continue from~$u$ downwards by another edge.
Select the corresponding array of the heaviest edges~$H$ from~$H_p$.
\:For every son~$v$ of~$u$, do:
Putting all three parts together, we conclude that:
-c \le n + m + \O(n) = \O(n+m).
+c \le n + m + \O(n) = \O(n+m). \qedmath
When we combine this lemma with the above reduction, we get the following theorem:
perform $\O(m)$ comparisons of edge weights to determine whether~$T$ is minimum
and to find all $T$-light edges in~$G$.
We first transform the problem to finding the heaviest edges for a~set
of query paths in~$T$ (these are exactly the paths covered by the edges
of $G\setminus T$). We use the reduction from Lemma \ref{verbranch} to get
an~equivalent problem with a~full branching tree and a~set of parent-descendant
-paths. Then we run the \<FindHeavy> procedure (\ref{findheavy}) to find
-the heaviest edges and we employ Lemma \ref{vercompares} to bound the number
-of comparisons used.
+paths, which costs $\O(m+n)$ comparisons.
+Then we run the \<FindHeavy> procedure (\ref{findheavy}) to find
+the heaviest edges, which by Lemma \ref{vercompares} takes
+another $\O(m+n)$ comparisons. Since we (as always) assume that~$G$ is connected,
+we have $m\ge n-1$, so $\O(m+n)=\O(m)$.
+The problem of computing path maxima or minima in a~tree has several other interesting applications,
+such as computing minimum cuts separating given pairs of vertices in a~given weighted undirected
+graph~$G$. We construct a~Gomory--Hu tree~$T$ for the graph as described in \cite{gomoryhu}
+(see also \cite{bhalgat:ght} for a~more efficient algorithm running in time
+$\widetilde\O(mn)$ for unit-cost graphs). This tree has the property that for every two
+vertices $u$, $v$ of the graph~$G$, the minimum-cost edge on $T[u,v]$ has the same cost
+as the minimum cut separating $u$ and~$v$ in~$G$. Since the construction of~$T$ generally
+takes $\Omega(n^2)$ time, we could also precompute the minima for all pairs of vertices
+at no extra cost. This would however require quadratic space, while the method of this
+section needs only $\O(n+q)$ space to process~$q$ queries.
+A~dynamic version of the problem is also often considered. It calls for a~data structure
+representing a~weighted tree with operations for modifying the structure of the tree
+and querying minima or maxima on paths. Sleator and Tarjan have shown in \cite{sleator:trees}
+how to do this in $\O(\log n)$ time amortized per operation, which for example
+allows an~implementation of Dinic's maximum flow algorithm \cite{dinic:flow}
+in time $\O(mn\log n)$.
+\section{Verification in linear time}
+We have proven that $\O(m)$ edge weight comparisons suffice to verify minimality
+of a~given spanning tree. In this section, we will show an~algorithm for the RAM
+that finds the required comparisons in linear time. We will follow the idea
+of King from \cite{king:verify}, but as we have the power of the RAM data structures
+from Section~\ref{bitsect} at our command, the low-level details will be easier.
+First of all, let us make sure that the reduction to complete branching trees
+in Lemma \ref{verbranch} can be performed in linear time. As already noticed
+in the proof, Bor\o{u}vka's algorithm runs in linear time. Constructing
+the Bor\o{u}vka tree in the process adds at most a~constant overhead to every
+step of the algorithm.
+Finding the common ancestors is not trivial, but Harel and Tarjan have shown
+in \cite{harel:nca} that linear time is sufficient on the RAM. Several more
+accessible algorithms have been developed since then (see Alstrup's survey
+paper \cite{alstrup:nca} and a~particularly elegant algorithm shown by Bender
+and Farach-Colton in \cite{bender:lca}). Any of them implies the following
+\thmn{Lowest common ancestors}\id{lcathm}%
+On the RAM, it is possible to preprocess a~tree~$T$ in time $\O(n)$ and then
+answer lowest common ancestor queries presented online in constant time.
+See for example Bender and Farach-Colton \cite{bender:lca}.
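For experiments, a~simpler relative of the algorithm of Bender and Farach-Colton is easy to write down. It reduces LCA to range minima over an~Euler tour of the tree and answers them with a~sparse table. The Python sketch below uses $\O(n\log n)$ preprocessing rather than the $\O(n)$ of the theorem; the linear bound additionally needs the block decomposition from their paper, which we omit. Queries still take constant time. The input format (a~dictionary of sons) and the names are our own assumptions.

```python
from typing import Callable, Dict, List


def build_lca(children: Dict[int, List[int]], root: int) -> Callable[[int, int], int]:
    """Euler tour + sparse table: O(n log n) preprocessing, O(1) per query."""
    euler: List[int] = [root]            # vertices in the order the DFS visits them
    first: Dict[int, int] = {root: 0}    # first occurrence of each vertex in the tour
    depth: Dict[int, int] = {root: 0}
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        v, sons = stack[-1]
        son = next(sons, None)
        if son is None:
            stack.pop()
            if stack:
                euler.append(stack[-1][0])   # back in the parent
        else:
            depth[son] = depth[v] + 1
            first[son] = len(euler)
            euler.append(son)
            stack.append((son, iter(children.get(son, []))))

    def shallower(a: int, b: int) -> int:
        # Of two positions in the tour, return the one holding the shallower vertex.
        return a if depth[euler[a]] <= depth[euler[b]] else b

    # table[j][i]: position of the shallowest vertex in euler[i : i + 2**j]
    table = [list(range(len(euler)))]
    j = 1
    while (1 << j) <= len(euler):
        prev, half = table[-1], 1 << (j - 1)
        table.append([shallower(prev[i], prev[i + half])
                      for i in range(len(euler) - (1 << j) + 1)])
        j += 1

    def lca(x: int, y: int) -> int:
        l, r = sorted((first[x], first[y]))
        k = (r - l + 1).bit_length() - 1    # largest power of two within the range
        return euler[shallower(table[k][l], table[k][r - (1 << k) + 1])]

    return lca
```

For the tree with sons `{0: [1, 2], 1: [3, 4]}` rooted at~0, `build_lca(...)(3, 4)` returns `1` and `(3, 2)` returns `0`.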
+The reductions in Lemma \ref{verbranch} can be performed in time $\O(m)$.
-We will now show an~efficient implementation of \<FindHeavy>, which will
-run in linear time on the RAM.
+Having the reduced problem at hand, we have to implement the procedure \<FindHeavy>
+of Algorithm \ref{findheavy} efficiently. We need a~compact representation of
+the arrays $T_e$ and~$H_e$ by vectors, so that the overhead of the algorithm
+will be linear in the number of comparisons performed.
+\em{Vertex identifiers:} Since all vertices referred to by the procedure
+lie on the path from the root to the current vertex~$u$, we modify the algorithm
+to keep a~stack of these vertices in an~array and refer to each vertex by its
+index in this array, i.e., by its depth. We will call these identifiers \df{vertex
+labels.} Since the depth of the tree is logarithmic, each label requires only $\ell=\lceil \log\log n\rceil$
+bits. As every tree edge is uniquely identified by its bottom vertex, we can
+use the same encoding for \df{edge labels.}
+\em{Slots:} As we will need several operations which are not computable
+in constant time on the RAM, we precompute tables for these operations
+as we did for the Q-Heaps (cf.~Lemma \ref{qhprecomp}). A~table for word-sized
+arguments would take too much time to precompute, so we will generally store
+our data structures in \df{slots} of $s=1/3\cdot\lceil\log n\rceil$ bits each.
+We will soon show that it is possible to precompute a~table of any reasonable
+function of two arguments, each of which fits in a~single slot.
+\em{Top masks:} The array~$T_e$ will be represented by bit masks. For each
+of the possible tops~$t$ (i.e., the ancestors of the current vertex), we store
+a~single bit telling whether $t\in T_e$. Each bit mask fits in $\lceil\log n\rceil$
+bits and therefore in a~single machine word. If needed, it can be split into three slots.
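Here is a~small Python sketch of the top masks. It assumes, as our own choice not fixed by the text, that bit~$d$ of the mask stands for the ancestor at depth~$d$. It also rounds the slot size up to $s=\lceil\lceil\log n\rceil/3\rceil$, so that three slots always cover the mask. The function names are hypothetical.

```python
from typing import Iterable, List


def slot_size(n: int) -> int:
    # ceil(log2 n) computed exactly via bit_length, then divided by 3 rounding up.
    return -(-(n - 1).bit_length() // 3)


def top_mask(top_depths: Iterable[int]) -> int:
    # Bit d is set iff the ancestor at depth d belongs to T_e.
    mask = 0
    for d in top_depths:
        mask |= 1 << d
    return mask


def split_slots(mask: int, s: int) -> List[int]:
    # Three s-bit slots, least significant slot first.
    return [(mask >> (i * s)) & ((1 << s) - 1) for i in range(3)]


def join_slots(slots: List[int], s: int) -> int:
    mask = 0
    for i, x in enumerate(slots):
        mask |= x << (i * s)
    return mask
```

For $n=2^{20}$ we get $s=7$, and the mask of the tops at depths 0, 3, 5 and 15 splits into the slots `[41, 0, 2]`.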
+\em{Small and big lists:} The heaviest edge found so far for each top is stored
+by the algorithm in the array~$H_e$. Instead of keeping the real array,
+we store the labels of these edges in a~list encoded in a~bit string.
+Depending on the size of the list, we use one of two possible encodings:
+\df{Small lists} are stored in a~vector which fits in a~single slot, with
+the unused entries filled by a~special constant, so that we can infer the
+length of the list.
+If the data do not fit in a~small list, we use a~\df{big list} instead, which
+is stored in $\O(\log\log n)$ words, each of them containing a~slot-sized
+vector. Unlike the small lists, we use the big lists as arrays. If a~top~$t$ of
+depth~$d$ is active, we keep the corresponding entry of~$H_e$ in the $d$-th
+entry of the big list. Otherwise, we keep that entry unused.
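The encoding of small lists can be sketched in Python as follows. The text leaves these details open, so we assume that:

- every entry occupies a~fixed number of bits (in the text's setting $\ell+1$, leaving room for the pointer flag introduced below);
- the special constant marking unused entries is~0;
- an~entry with value~$x$ is therefore stored as $x+1$.

The function names are hypothetical.

```python
from typing import List


def pack_small(values: List[int], width: int, s: int) -> int:
    """Pack values into one s-bit slot, `width` bits per entry; unused entries stay 0."""
    if len(values) > s // width:
        raise ValueError("too many entries for a small list")
    word = 0
    for i, x in enumerate(values):
        if not 0 <= x < (1 << width) - 1:
            raise ValueError("entry does not fit")
        word |= (x + 1) << (i * width)    # store x + 1, so that 0 means "unused"
    return word


def unpack_small(word: int, width: int, s: int) -> List[int]:
    """Inverse of pack_small; the length is inferred from the first unused entry."""
    values = []
    for i in range(s // width):
        code = (word >> (i * width)) & ((1 << width) - 1)
        if code == 0:
            break
        values.append(code - 1)
    return values
```

With 4-bit entries in a~13-bit slot, the list `[2, 0, 5]` is packed into the number 1555 and unpacked back. A~fourth entry would no longer fit, and the list would have to become a~big list.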
+We will want to perform all operations on small lists in constant time,
+but we can afford to spend time $\O(\log\log n)$ on every big list. This
+is true because a~small list has room for $\Theta(s/\ell)=\Theta(\log n/\log\log n)$ entries,
+so whenever we need a~big list, $\vert T_e\vert = \Omega(\log n/\log\log n)$
+and we need $\log\vert T_e\vert = \Omega(\log\log n)$ comparisons anyway.
+\em{Pointers:} When we need to construct a~small list containing a~sub-list
+of a~big list, we do not have enough time to scan the whole big list. To solve
+this problem, we will introduce \df{pointers} as another kind of edge identifier.
+A~pointer is an~index into the nearest big list on the path from the small
+list containing the pointer to the root. As each big list has at most $\lceil\log n\rceil$
+entries, the pointer fits in~$\ell$ bits, but we need one extra bit to distinguish
+between normal labels and pointers.
+When~$f$ is a~function of two arguments computable in polynomial time, we can
+precompute a~table of the values of~$f$ for all pairs of arguments, each of which
+fits in a~single slot. The precomputation takes $\O(n)$ time.
+The proof is similar to that of Lemma \ref{qhprecomp}. There are $\O(2^{2s}) = \O(n^{2/3})$
+possible values of arguments, so the precomputation takes time $\O(n^{2/3}\cdot\poly(s))
+= \O(n^{2/3}\cdot\poly(\log n)) = \O(n)$.
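The table precomputation and two of the derived operations can be sketched in Python. The sketch assumes the following:

- the slot size~$s$ is passed explicitly;
- a~table entry is indexed by concatenating its two slot-sized arguments;
- \<FindKth> numbers bit positions from~0 at the least significant end, while counting the set bits from the top.

The names `precompute` and `make_ops` are hypothetical. With a~realistic $s\approx\log n/3$, the tables have $\O(n^{2/3})$ entries.

```python
from typing import Callable, List, Tuple


def precompute(f: Callable[[int, int], int], s: int) -> List[int]:
    # table[(x << s) | y] == f(x, y) for all s-bit arguments x, y: 2**(2*s) entries.
    size = 1 << s
    return [f(x, y) for x in range(size) for y in range(size)]


def make_ops(s: int) -> Tuple[Callable[[int], int], Callable[[int, int], int]]:
    mask = (1 << s) - 1
    weight = [bin(x).count("1") for x in range(1 << s)]   # one-argument table

    def kth_from_top(x: int, k: int) -> int:
        # Position of the k-th set bit of the s-bit number x, counting from the
        # most significant end; -1 if x has fewer than k set bits (or k < 1).
        for pos in range(s - 1, -1, -1):
            if (x >> pos) & 1:
                k -= 1
                if k == 0:
                    return pos
        return -1

    find_kth = precompute(kth_from_top, s)

    def weight3(word: int) -> int:
        # Hamming weight of a three-slot number: three lookups, two additions.
        return sum(weight[(word >> (i * s)) & mask] for i in range(3))

    def find_kth3(word: int, k: int) -> int:
        # Locate the slot holding the k-th set bit, then look it up in the table.
        if k < 1:
            return -1
        for i in (2, 1, 0):                    # most significant slot first
            slot = (word >> (i * s)) & mask
            if k <= weight[slot]:
                return i * s + find_kth[(slot << s) | k]
            k -= weight[slot]
        return -1

    return weight3, find_kth3
```

For $s=4$ and the 12-bit number `0xA36`, the three-slot \<Weight> is 6. The set bits, counted from the top, sit at positions 11, 9, 5, 4, 2 and~1.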
+We can therefore assume that the following functions are computable in constant time (after $\O(n)$ preprocessing):
+\:$\<Weight>(x)$ --- computes the Hamming weight of a~slot-sized number~$x$
+(we already considered this operation in Algorithm \ref{lsbmsb}, but we needed
+quadratic word size for it). We can easily extend this to $\log n$-bit numbers
+by splitting the number into three slots and adding their weights.
+\:$\<FindKth>(x,k)$ --- finds the $k$-th set bit from the top of the slot-sized
+number~$x$. Again, this can be extended to multi-slot numbers by calculating
+the \<Weight> of each slot first and then finding the slot containing the
+$k$-th one.
+\:$\<Bits>(m)$ --- for a~slot-sized bit mask~$m$, it returns a~small list
+of the positions of bits set in~$m$.
+\:$\<SubList>(x,m)$ --- when~$x$ is a~small list and~$m$ a~bit mask, it returns
+a~small list containing the elements of~$x$ selected by the bits set in~$m$.
+We will now show how to perform all parts of the procedure \<FindHeavy>
+in the required time.
+The array $H_e$ can be indexed in constant time.
+\section{A~randomized algorithm}\id{randmst}%