-\section{Special cases and related problems}
-
-Finally, we will focus our attention on various special cases of the minimum
-spanning tree problem which frequently arise in practice.
-
-\examplen{Graphs with sorted edges}
-When the edges are already sorted by their weights, we can use the Kruskal's
-algorithm to find the MST in time $\O(m\times\alpha(n))$ (Theorem \ref{kruskal}).
-We however can do better: As the minimality of a~spanning tree depends only on the
-order of weights and not on the actual values (Theorem \ref{mstthm}), we can
-renumber the weights to $1, \ldots, m$ and find the MST using the Fredman-Willard
-algorithm for integer weights. According to Theorem \ref{intmst} it runs in
-time $\O(m)$ on the Word-RAM.
-
-\examplen{Graphs with a~small number of distinct weights}
-When the weights of edges are drawn from a~set of a~fixed size~$U$, we can
-sort them in linear time and so reduce the problem to the previous case.
-A~more practical way is to use the Jarn\'\i{}k's algorithm (\ref{jarnimpl}),
-but replace the heap by an~array of $U$~buckets. As the number of buckets
-is constant, we can find the minimum in constant time and hence the whole
-algorithm runs in time $\O(m)$, even on the Pointer Machine. For large
-values of~$U,$ we can build a~binary search tree or the van Emde-Boas
-tree (see Section \ref{ramdssect} and \cite{boas:vebt}) on the top of the buckets to bring the complexity
-of finding the minimum down to $\O(\log U)$ or $\O(\log\log U)$ respectively.
-
-\examplen{Graphs with floating-point weights}
-A~common case of non-integer weights are rational numbers in floating-point (FP)
-representation. Even in this case we will be able to find the MST in linear time.
-The most common representation of binary FP numbers specified by the IEEE
-standard 754-1985 \cite{ieee:binfp} has a~useful property: When the
-bit strings encoding non-negative FP numbers are read as ordinary integers,
-the order of these integers is the same as of the original FP numbers. We can
-therefore once again replace the edge weights by integers and use the linear-time
-integer algorithm. While the other FP representations (see \cite{dgoldberg:fp} for
-an~overview) need not have this property, the corresponding integers can be adjusted
-in $\O(1)$ time to the format we need. (More advanced tricks of this type have been
-employed by Thorup in \cite{thorup:floatint} to extend his linear-time algorithm
-for single-source shortest paths to FP edge lengths.)
-
-\examplen{Graphs with bounded degrees}
-For graphs with vertex degrees bounded by a~constant~$\Delta$, the problem is either
-trivial (if $\Delta<3$) or as hard as for arbitrary graphs. There is a~simple linear-time
-transform of arbitrary graphs to graphs with maximum degree~3 which preserves the MST:
-
-\lemman{Degree reduction}\id{degred}%
-For every graph~$G$ there exists a~graph~$G'$ with maximum degree at most~3 and
-a~function $\pi: E(G)\rightarrow E(G')$ such that $\mst(G) = \pi^{-1}(\mst(G'))$.
-The graph $G'$ and the embedding~$\pi$ can be constructed in time $\O(m)$.
-
-\figure{french.eps}{\epsfxsize}{Degree reduction in Lemma~\ref{degred}}
+\section{A randomized algorithm}\id{randmst}%
+
+When we analysed the Contractive Bor\o{u}vka's algorithm in Section~\ref{contalg},
+we observed that while the number of vertices decreases exponentially over the
+iterations, the number of edges generally does not, so we spend $\Theta(m)$ time on every phase.
+Karger, Klein and Tarjan \cite{karger:randomized} have overcome this problem by
+combining the Bor\o{u}vka's algorithm with filtering based on random sampling.
+This leads to a~randomized algorithm which runs in linear expected time.
+
+The principle of the filtering is simple: Let us consider any spanning tree~$T$
+of the input graph~$G$. Each edge of~$G$ that is $T$-heavy is the heaviest edge
+of some cycle, so by the Red lemma (\ref{redlemma}) it cannot participate in
+the MST of~$G$. We can therefore discard all $T$-heavy edges and continue with
+finding the MST on the reduced graph. Of course, not all choices of~$T$ are equally
+good, but it will soon turn out that when we take~$T$ as the MST of a~randomly selected
+subgraph, only a~small expected number of edges remains.
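The filtering principle lends itself to a direct illustration. With distinct edge weights, an edge is $T$-heavy exactly when it is heavier than every edge on the $T$-path between its endpoints. The following Python sketch (all names are ours, and the per-edge DFS makes it quadratic, unlike the linear-time verification procedure used later in this section) computes the $T$-heavy edges of a small graph:

```python
from collections import defaultdict

def t_heavy_edges(tree_edges, all_edges):
    """Return the T-heavy edges: those heavier than every edge on the
    tree path between their endpoints.  Quadratic per-edge DFS -- a
    stand-in for linear-time MST verification."""
    adj = defaultdict(list)
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def max_on_path(u, v):
        # Walk the tree from u, carrying the heaviest edge weight seen
        # on the unique path; return it once v is reached.
        stack = [(u, None, float("-inf"))]
        while stack:
            x, parent, best = stack.pop()
            if x == v:
                return best
            for y, w in adj[x]:
                if y != parent:
                    stack.append((y, x, max(best, w)))
        return None  # u and v lie in different components of T

    in_tree = {frozenset((u, v)) for u, v, _ in tree_edges}
    heavy = []
    for u, v, w in all_edges:
        if frozenset((u, v)) in in_tree:
            continue
        m = max_on_path(u, v)
        if m is not None and w > m:
            heavy.append((u, v, w))
    return heavy
```

On the path $0$--$1$--$2$--$3$ with weights $1,2,3$, the chord $(0,3)$ of weight~$5$ is heavy and gets discarded, while the chord $(0,2)$ of weight~$1.5$ is light and survives.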
+
+Selecting a~subgraph at random will unavoidably produce disconnected subgraphs
+on occasion, so we will drop the implicit assumption that all graphs are
+connected for this section and we will always search for the minimum spanning forest.
+As we already noted (\ref{disconn}), with a~little bit of care our
+algorithms and theorems keep working.
+
+Since we need the MST verification algorithm for finding the $T$-heavy edges,
+we will assume that we are working on the RAM.
+
+\lemman{Random sampling, Karger \cite{karger:sampling}}\id{samplemma}%
+Let $H$~be a~subgraph of~$G$ obtained by including each edge independently
+with probability~$p$. Let further $F$~be the minimum spanning forest of~$H$. Then the
+expected number of $F$-nonheavy\foot{That is, $F$-light edges and also edges of~$F$ itself.}
+edges in~$G$ is at most $n/p$.
+
+\proof
+Let us observe that we can obtain the forest~$F$ by running the Kruskal's algorithm
+(\ref{kruskal}) combined with the random process producing~$H$ from~$G$. We sort all edges of~$G$
+by their weights and we start with an~empty forest~$F$. For each edge, we first
+flip a~biased coin (that gives heads with probability~$p$) and if it comes up
+tails, we discard the edge. Otherwise we perform a~single step of the Kruskal's
+algorithm: We check whether $F+e$ contains a~cycle. If it does, we discard~$e$, otherwise
+we add~$e$ to~$F$. At the end, we have produced the subgraph~$H$ and its MSF~$F$.
+
+When we swap the order of the cycle check and the coin flip, we get an~equivalent
+algorithm which will turn out to be more convenient to analyse:
+\algo
+\:If $F+e$ contains a~cycle, we immediately discard~$e$ (we could flip
+the coin, but we need not, because the edge will be discarded regardless of
+the outcome). We note that~$e$ is $F$-heavy with respect to both the
+current state of~$F$ and the final MSF.
+\:If $F+e$ is acyclic, we flip the coin:
+\::If it comes up heads, we add~$e$ to~$F$. In this case, $e$~is neither $F$-light
+ nor $F$-heavy.
+\::If it comes up tails, we discard~$e$. Such edges are $F$-light.
+\endalgo
+
+The number of $F$-nonheavy edges is therefore equal to the total number of coin
+flips in step~2 of this algorithm. We also know that the algorithm stops before
+it adds $n$~edges to~$F$. Therefore it flips at most as many coins as a~simple
+random process that repeatedly flips a~coin until it gets~$n$ heads. As waiting for
+a~single occurrence of heads takes $1/p$ flips on average, waiting for~$n$ heads
+takes $n/p$ flips in expectation, by linearity. This is the bound we wanted to achieve.
+\qed
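The lemma is easy to check empirically. The sketch below (names are ours; it assumes distinct edge weights and substitutes Kruskal's algorithm for a linear-time MSF routine) performs one experiment: sample~$H$, compute $F$ as the MSF of~$H$, and count the $F$-nonheavy edges of the whole graph. Averaged over many runs, the count stays below $n/p$:

```python
import random

def kruskal_msf(n, edges):
    """Minimum spanning forest by Kruskal's algorithm (distinct weights)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    forest = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v, w))
    return forest

def count_f_nonheavy(n, edges, p, rng):
    """One experiment: sample H (keep each edge with probability p),
    take F = MSF(H), count the F-nonheavy edges of the whole graph.
    The lemma bounds the expectation of this count by n/p."""
    h = [e for e in edges if rng.random() < p]
    f = kruskal_msf(n, h)
    count = 0
    for e in edges:
        # With distinct weights, e is F-nonheavy iff the MSF of F + {e}
        # still contains e (e is not the heaviest edge on its cycle).
        if e in f or e in kruskal_msf(n, f + [e]):
            count += 1
    return count
```

For $p=1$ we get $F$ equal to the MSF of~$G$ itself, and on a connected graph the count is exactly $n-1$: every non-tree edge is the heaviest edge on its fundamental cycle, hence $F$-heavy.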
+
+\para
+We will formulate the algorithm as a~doubly-recursive procedure. It alternately
+performs steps of the Bor\o{u}vka's algorithm and filtering based on the above lemma.
+The first recursive call computes the MSF of the sampled subgraph, the second one
+finds the MSF of the original graph, but without the heavy edges.
+
+As in all contractive algorithms, we use edge labels to keep track of the
+original locations of the edges in the input graph. For the sake of simplicity,
+we do not mention it in the algorithm explicitly.
+
+\algn{MSF by random sampling --- the KKT algorithm}\id{kkt}%
+\algo
+\algin A~graph $G$ with an~edge comparison oracle.
+\:Remove isolated vertices from~$G$. If no vertices remain, stop and return an~empty forest.
+\:Perform two Bor\o{u}vka steps (iterations of Algorithm \ref{contbor}) on~$G$ and
+ remember the set~$B$ of edges that have been contracted.
+\:Select a~subgraph~$H\subseteq G$ by including each edge independently with
+ probability $1/2$.
+\:$F\=\msf(H)$ calculated recursively.
+\:Construct $G'\subseteq G$ by removing all $F$-heavy edges of~$G$.
+\:$R\=\msf(G')$ calculated recursively.
+\:Return $R\cup B$.
+\algout The minimum spanning forest of~$G$.
+\endalgo
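The recursion can be rendered compactly in Python. The helper routines below (our names) are simple quadratic stand-ins for the Bor\o{u}vka step and the $F$-heavy filtering, distinct edge weights are assumed, and isolated vertices are kept rather than removed, so only the structure of the algorithm is reproduced, not its complexity. Edge labels, as discussed above, track the original edges through the contractions:

```python
import random

def _find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def boruvka_step(n, edges):
    """One Boruvka step.  Edges are (u, v, weight, label) with distinct
    weights; labels identify edges of the original input graph.
    Returns (selected edges, new vertex count, contracted edge list)."""
    if not edges:
        return [], n, []
    best = {}
    for e in edges:
        for x in (e[0], e[1]):
            if x not in best or e[2] < best[x][2]:
                best[x] = e
    chosen = {e[3]: e for e in best.values()}  # dedupe by label
    parent = list(range(n))
    for u, v, _, _ in chosen.values():
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
    # Relabel components (isolated vertices are kept in this sketch)
    # and drop the self-loops created by the contraction.
    comp = {}
    for x in range(n):
        comp.setdefault(_find(parent, x), len(comp))
    new_edges = [(comp[_find(parent, u)], comp[_find(parent, v)], w, l)
                 for u, v, w, l in edges
                 if _find(parent, u) != _find(parent, v)]
    return list(chosen.values()), len(comp), new_edges

def f_heavy(n, f, e):
    """e is F-heavy iff its endpoints are already connected by F-edges
    strictly lighter than e (then e closes a cycle on which it is the
    heaviest edge)."""
    parent = list(range(n))
    for u, v, w, _ in f:
        if w < e[2]:
            parent[_find(parent, u)] = _find(parent, v)
    return _find(parent, e[0]) == _find(parent, e[1])

def kkt_msf(n, edges, rng):
    """The KKT recursion; returns the labels of the MSF edges."""
    msf = []
    for _ in range(2):                        # steps 1-2: Boruvka steps
        sel, n, edges = boruvka_step(n, edges)
        msf += [e[3] for e in sel]            # labels of contracted edges (B)
    if not edges:
        return msf
    h = [e for e in edges if rng.random() < 0.5]      # step 3: sample H
    f_labels = set(kkt_msf(n, h, rng))                # step 4: F = MSF(H)
    f = [e for e in h if e[3] in f_labels]
    g2 = [e for e in edges if not f_heavy(n, f, e)]   # step 5: drop F-heavy
    return msf + kkt_msf(n, g2, rng)                  # steps 6-7: R together with B
```

Since the algorithm is Las Vegas, the returned label set is the exact MSF for every outcome of the coin flips; only the running time is random.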
+
+\nota
+Let us analyse the time complexity of this algorithm by studying properties of its \df{recursion tree.}
+This tree describes the subproblems processed by the recursive calls. For any vertex~$v$
+of the tree, we denote the number of vertices and edges of the corresponding subproblem~$G_v$
+by~$n_v$ and~$m_v$ respectively.
+If $m_v>0$, the recursion continues: the left son of~$v$ corresponds to the
+call on the sampled subgraph~$H_v$, the right son to the reduced graph~$G^\prime_v$.
+(Similarly, we use letters subscripted with~$v$ for the state of the other variables
+of the algorithm.)
+The root of the recursion tree corresponds to the original graph~$G$; the leaves
+correspond to trivial graphs with no edges.
+
+\obs
+The Bor\o{u}vka steps together with the removal of isolated vertices guarantee that the number
+of vertices drops at least by a~factor of four in every recursive call. The number of
+vertices~$n_v$ of a~subproblem~$G_v$ at level~$i$ is therefore at most $n/4^i$ and the depth
+of the tree is at most $\lceil\log_4 n\rceil$.
+As there are no more than~$2^i$ subproblems at level~$i$, the sum of all~$n_v$'s
+on that level is at most $n/2^i$, which is at most~$2n$ when summed over the whole tree.
+
+We are going to show that the worst case of the KKT algorithm is no worse than
+that of the plain contractive algorithm, while the expected running time is linear.
+
+\lemma
+For every subproblem~$G_v$, the KKT algorithm runs in time $\O(m_v+n_v)$ plus the time
+spent on the recursive calls.
+
+\proof
+We know from Lemma \ref{contiter} that each Bor\o{u}vka step takes time $\O(m_v+n_v)$.\foot{We
+add $n_v$ as the graph could be disconnected.}
+The selection of the edges of~$H_v$ is straightforward.
+Finding the $F_v$-heavy edges is not, but we have already shown in Theorem \ref{ramverify}
+that linear time is sufficient on the RAM.
+\qed
+
+\thmn{Worst-case complexity of the KKT algorithm}
+The KKT algorithm runs in time $\O(\min(n^2,m\log n))$ in the worst case on the RAM.
+
+\proof
+The argument for the $\O(n^2)$ bound is similar to the analysis of the plain
+contractive algorithm. As every subproblem~$G_v$ is a~simple graph, the number
+of its edges~$m_v$ is less than~$n_v^2$. By the previous lemma, we spend time
+$\O(n_v^2)$ on it. Summing over all subproblems yields $\sum_v \O(n_v^2) =
+\O((\sum_v n_v)^2) = \O(n^2)$.
+
+In order to prove the $\O(m\log n)$ bound, it is sufficient to show that the total time
+spent on every level of the recursion tree is $\O(m)$. Suppose that $v$~is a~vertex
+of the recursion tree with its left son~$\ell$ and right son~$r$. Some edges of~$G_v$
+are removed in the Bor\o{u}vka steps, let us denote their number by~$b_v$.
+The remaining edges fall either to~$G_\ell = H_v$, or to $G_r = G^\prime_v$, or possibly
+to both.
+
+We can observe that the intersection $G_\ell\cap G_r$ cannot be large: The edges of~$H_v$ that
+are not in the forest~$F_v$ are $F_v$-heavy, so they do not end up in~$G_r$. Therefore the
+intersection can contain only the edges of~$F_v$. As there are at most $n_v/4$ such edges,
+we have $m_\ell + m_r + b_v \le m_v + n_v/4$.
+
+On the other hand, the first Bor\o{u}vka step selects at least $n_v/2$ edges,
+so $b_v \ge n_v/2$. The duplication of edges between $G_\ell$ and~$G_r$ is therefore
+compensated by the loss of edges due to contraction and $m_\ell + m_r \le m_v$. So the total
+number of edges per level does not increase and it remains to apply the previous lemma.
+\qed
+
+\thmn{Expected complexity of the KKT algorithm}\id{kktavg}%
+The expected time complexity of the KKT algorithm on the RAM is $\O(m)$.