+\proof
+Let us count the comparisons performed by the Contractive Bor\o{u}vka's algorithm
+(\ref{contbor}), tightening up the constants in its previous analysis in Theorem
+\ref{contborthm}. In the first iteration, each edge participates in two comparisons
+(one per endpoint), so the algorithm performs at most $2m \le 2{n\choose 2} \le n^2$
+comparisons. Then the number of vertices drops by at least a~factor of two, so
+the subsequent iterations spend at most $(n/2)^2, (n/4)^2, \ldots$ comparisons, which sums
+to less than $n^2\cdot\sum_{i=0}^\infty (1/4)^i = 4/3 \cdot n^2$. Between the Bor\o{u}vka steps,
+we flatten the multigraph to a~simple graph, which also needs some comparisons,
+but for every such comparison we remove one of the participating edges, which saves
+at least one comparison in the subsequent steps.
+\qed
+
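+\para
+Spelled out in full: after the $i$-th Bor\o{u}vka step at most $n/2^i$ vertices
+remain, so the following step performs at most $(n/2^i)^2$ comparisons, and the
+total over all iterations is bounded by
+$$\sum_{i=0}^\infty \left({n\over 2^i}\right)^2 = n^2\cdot\sum_{i=0}^\infty {1\over 4^i} = {4\over 3}\cdot n^2.$$
+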
+\para
+Of course we can get sharper bounds from the better algorithms, but we will first
+show how to find the optimal trees by brute force. The complexity of the search
+will of course be enormous, but as we have already promised, we will need the
+optimal trees only for very small subgraphs.
+
+\lemman{Construction of optimal decision trees}\id{odtconst}%
+An~optimal MST decision tree for a~graph~$G$ on~$n$ vertices can be constructed on
+the Pointer Machine in time $\O(2^{2^{4n^2}})$.
+
+\proof
+We will try all possible decision trees of depth at most $2n^2$
+(we know from the previous lemma that the desired optimal tree is shallower). We can obtain
+any such tree by taking the complete binary tree of exactly this depth
+and labeling its $2\cdot 2^{2n^2}-1$ vertices with comparisons and spanning trees. Those labeled
+with comparisons become internal vertices of the decision tree, the others
+become leaves and the parts of the tree below them are removed. There are fewer
+than $n^4$ possible comparisons and fewer than $2^{n^2}$ spanning trees of~$G$,
+so the number of candidate decision trees is bounded by
+$(n^4+2^{n^2})^{2^{2n^2+1}} \le \bigl(2^{n^2+1}\bigr)^{2^{2n^2+1}} = 2^{(n^2+1)\cdot 2^{2n^2+1}} \le 2^{2^{3n^2}}$.
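+(The first inequality uses $n^4 \le 2^{n^2}$; the last one holds because
+$n^2+1 \le 2^{n^2-1}$ and hence $(n^2+1)\cdot 2^{2n^2+1} \le 2^{3n^2}$ for every $n\ge 2$.)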
+
+We will enumerate the trees in an~arbitrary order, test each of them for correctness and
+find the shallowest tree among those correct. Testing can be accomplished by running
+through all possible permutations of edges, each time calculating the MSF using any
+of the known algorithms and comparing it with the result given by the decision tree.
+The number of permutations does not exceed $(n^2)! \le (n^2)^{n^2} \le n^{2n^2} \le 2^{n^3}$
+and each permutation can be checked in time $\O(\poly(n))$.
+
+On the Pointer Machine, trees and permutations can certainly be enumerated in time
+$\O(\poly(n))$ per object. The time complexity of the whole algorithm is therefore
+$\O(2^{2^{3n^2}} \cdot 2^{n^3} \cdot \poly(n)) = \O(2^{2^{4n^2}})$.
+\qed
+
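+\para
+The last simplification is a~routine estimate on iterated exponents: taking binary
+logarithms, $\log\bigl(2^{2^{3n^2}}\cdot 2^{n^3}\cdot\poly(n)\bigr) = 2^{3n^2} + n^3 + \O(\log n)
+\le 2^{4n^2}$ for all sufficiently large~$n$, which yields the claimed bound $\O(2^{2^{4n^2}})$.
+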
+\paran{Basic properties of decision trees}%
+The following properties will be useful for the analysis of algorithms based
+on precomputed decision trees. We will omit some technical details, referring
+the reader to Section 5.1 of the paper of Pettie and Ramachandran \cite{pettie:optimal}.
+
+\lemma\id{dtbasic}%
+The decision tree complexity $D(m,n)$ of the MSF satisfies:
+\numlist\ndotted
+\:$D(m,n) \ge m/2$ for $m,n > 2$.
+\:$D(m',n') \ge D(m,n)$ whenever $m'\ge m$ and $n'\ge n$.
+\endlist
+
+\proof
+For every $m,n>2$ there is a~graph on $n$~vertices and $m$~edges such that
+every edge lies on a~cycle. Every correct MSF decision tree for this graph
+has to compare each edge at least once. Otherwise the decision tree cannot
+distinguish between the case when an~edge has the lowest of all weights (and
+thus it is forced to belong to the MSF) and when it has the highest weight (so
+it is forced out of the MSF). As every comparison involves only two edges,
+at least $m/2$ comparisons are needed.
+
+Decision trees for graphs on $n'$~vertices can be used for graphs with $n$~vertices
+as well --- it suffices to add isolated vertices, which does not change the MSF.
+Similarly, we can increase $m$ to~$m'$ by adding edges parallel to an~existing
+edge and making them heavier than the rest of the graph, so that they can never
+belong to the MSF.
+\qed
+
+\defn
+Subgraphs $C_1,\ldots,C_k$ of a~graph~$G$ are called the \df{compartments} of~$G$
+iff they are edge-disjoint, their union is the whole graph~$G$ and
+$\msf(G) = \bigcup_i \msf(C_i)$ for every permutation of edge weights.
+
+\lemma\id{partiscomp}%
+The clusters $C_1,\ldots,C_k$ generated by the Partition procedure of the
+previous section (Algorithm \ref{partition}) are compartments of the graph
+$H=\bigcup_i C_i$.
+
+\proof
+The first two conditions of the definition of compartments follow
+from the Partitioning theorem (\ref{partthm}), so it remains to show that $\msf(H)$
+is the union of the MSF's of the individual compartments. By the Cycle rule
+(Lemma \ref{redlemma}), an~edge $h\in H$ is not contained in $\msf(H)$ if and only if
+it is the heaviest edge on some cycle. It is therefore sufficient to prove that
+every cycle in~$H$ is contained within a~single~$C_i$.
+
+Let us consider a~cycle $K\subseteq H$ and let $C_i$ be the last cluster constructed
+by the procedure that contains an~edge of~$K$; call this edge~$e$. If $K$~is not fully
+contained in~$C_i$, we can extend the edge~$e$ to a~maximal path~$P$ contained in
+both~$K$ and~$C_i$. By maximality of~$P$ and by the choice of~$C_i$, the edges of~$K$
+adjacent to the endpoints of~$P$ lie outside~$C_i$ and hence in earlier clusters,
+so both endpoints of~$P$ are vertices shared with the earlier clusters. Since $C_i$
+shares at most one vertex with the earlier clusters, this is impossible.
+\qed
+
+\lemma
+Let $C_1,\ldots,C_k$ be compartments of a~graph~$G$. Then there exists an~optimal
+MSF decision tree for~$G$ that does not compare edges of distinct compartments.
+
+\proofsketch
+Consider the set~$\cal P$ of edge weight permutations~$w$ that satisfy $w(e) < w(f)$
+whenever $e\in C_i, f\in C_j, i<j$. For such permutations, no decision tree can
+gain any information on relations between edge weights in a~single compartment by
+inter-compartment comparisons --- the results of all such comparisons are determined
+in advance.
+
+Let us take an~arbitrary correct decision tree for~$G$ and restrict it to
+vertices reachable by computations on~$\cal P$. Whenever a~vertex contains
+an~inter-compartment comparison, its outcome is the same for all permutations
+in~$\cal P$, so the vertex has lost one of its sons and we can remove it
+by contracting its only outgoing edge. We observe that we get a~decision tree
+satisfying the desired condition and that this tree is correct.
+
+As for the correctness, the MSF of a~single~$C_i$ is uniquely determined by
+comparisons of its own edge weights and the set~$\cal P$ contains all combinations
+of orderings of weights inside the individual compartments. Therefore every
+possible MSF of every~$C_i$, and thus also of~$G$, is properly recognized.
+\qed
+
+\lemma\id{compartsum}%
+Let $C_1,\ldots,C_k$ be compartments of a~graph~$G$. Then $D(G) = \sum_i D(C_i)$.
+
+\proofsketch
+A~collection of decision trees for the individual compartments can be ``glued together''
+to a~decision tree for~$G$. We take the decision tree for~$C_1$, replace each of its leaves
+by a~copy of the tree for~$C_2$ and so on. Every leaf~$\ell$ of the compound tree will be
+labeled with the union of labels of the original leaves encountered on the path from
+the root to~$\ell$. This proves that $D(G) \le \sum_i D(C_i)$.
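+For instance, with $k=2$ and trees of depths $d_1$ and~$d_2$, every root-to-leaf path
+of the compound tree is a~path in the tree for~$C_1$ followed by a~path in a~copy of
+the tree for~$C_2$, so its length is at most $d_1+d_2$ and the leaf reached is labeled
+with the union of the corresponding spanning forests of~$C_1$ and~$C_2$.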
+
+The other inequality requires more effort. We use the previous lemma to transform
+the optimal decision tree for~$G$ to another of the same depth, but without inter-compartment
+comparisons. Then we prove by induction on~$k$ and then on the depth of the tree
+that this tree can be re-arranged so that every computation first compares edges
+from~$C_1$, then from~$C_2$, and so on. This means that the tree can be decomposed
+into decision trees for the $C_i$'s. Also, without loss of efficiency, all trees for
+a~single~$C_i$ can be made isomorphic to~${\cal D}(C_i)$.
+\qed
+
+\cor\id{dtpart}%
+If $C_1,\ldots,C_k$ are the clusters generated by the Partition procedure (Algorithm \ref{partition}),
+then $D(\bigcup_i C_i) = \sum_i D(C_i)$.
+
+\proof
+Lemma \ref{partiscomp} tells us that $C_1,\ldots,C_k$ are compartments of the graph
+$\bigcup C_i$, so we can apply Lemma \ref{compartsum} on them.
+\qed
+
+\cor\id{dttwice}%
+$2D(m,n) \le D(2m,2n)$ for every $m,n$.
+
+\proof
+For an~arbitrary graph~$G$ with $m$~edges and $n$~vertices, we create a~graph~$G_2$
+consisting of two copies of~$G$ sharing a~single vertex. The copies of~$G$ are obviously
+compartments of~$G_2$, so by Lemma \ref{compartsum} it holds that $D(G_2) = 2D(G)$.
+As $G_2$ has $2m$~edges and $2n-1 \le 2n$ vertices, part~2 of Lemma \ref{dtbasic} gives
+$D(2m,2n) \ge D(G_2)$. Taking a~maximum over all choices of~$G$ yields
+$D(2m,2n) \ge \max_G D(G_2) = 2D(m,n)$.
+\qed
+
+%--------------------------------------------------------------------------------
+
+\section{An optimal algorithm}\id{optalgsect}%
+
+Once we have developed the soft heaps, partitioning and MST decision trees,
+it is now simple to state the MST algorithm of Pettie and Ramachandran
+and prove that it is asymptotically optimal among all MST algorithms in
+comparison-based models. Several standard MST algorithms from the previous
+chapters will also play their roles.
+
+We will describe the algorithm as a~recursive procedure. When the procedure is
+called on a~graph~$G$, it sets the parameter~$t$ to roughly $\log^{(3)} n$ and
+it calls the \<Partition> procedure to split the graph into a~collection of
+clusters of size~$t$ and a~set of corrupted edges. Then it uses precomputed decision
+trees to find the MSF of the clusters. The graph obtained by contracting
+the clusters is, on the other hand, dense enough for the Iterated Jarn\'\i{}k's
+algorithm to run on it in linear time. Afterwards we combine the MSF's of the clusters
+and of the contracted graph, mix in the corrupted edges, and run two iterations
+of the Contractive Bor\o{u}vka's algorithm. This guarantees a~reduction in the number of
+both vertices and edges by a~constant factor, so we can efficiently recurse on the
+resulting graph.
+
+\algn{Optimal MST algorithm, Pettie and Ramachandran \cite{pettie:optimal}}\id{optimal}%
+\algo
+\algin A~connected graph~$G$ with an~edge comparison oracle.
+\:If $G$ has no edges, return an~empty tree.
+\:$t\=\lfloor\log^{(3)} n\rfloor$. \cmt{the size of clusters}
+\:Call \<Partition> (\ref{partition}) on $G$ and $t$ with $\varepsilon=1/8$. It returns
+ a~collection~$\C=\{C_1,\ldots,C_k\}$ of clusters and a~set~$R^\C$ of corrupted edges.
+\:$F_i \= \mst(C_i)$ for all~$i$, obtained using optimal decision trees.
+\:$G_A \= (G / \bigcup_i C_i) \setminus R^\C$. \cmt{the contracted graph}
+\:$F_A \= \msf(G_A)$ calculated by the Iterated Jarn\'\i{}k's algorithm (\ref{itjar}).
+\:$G_B \= \bigcup_i F_i \cup F_A \cup R^\C$. \cmt{combine subtrees with corrupted edges}
+\:Run two Bor\o{u}vka steps (iterations of the Contractive Bor\o{u}vka's algorithm, \ref{contbor}) on~$G_B$,
+ getting a~contracted graph~$G_C$ and a~set~$F_B$ of MST edges.
+\:$F_C \= \mst(G_C)$ obtained by a~recursive call to this algorithm.
+\:Return $F_B \cup F_C$.
+\algout The minimum spanning tree of~$G$.
+\endalgo
+
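+\para
+To get a~feeling for the magnitudes involved: for a~graph on $n = 2^{2^{16}}$ vertices
+we have $\log n = 2^{16}$, $\log^{(2)} n = 16$, and thus $t = \lfloor\log^{(3)} n\rfloor = 4$.
+The precomputed decision trees are therefore built only for clusters of at most four
+vertices, while the contracted graph~$G_A$ still has up to $n/t$ vertices.
+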
+Correctness of this algorithm immediately follows from the Partitioning theorem (\ref{partthm})
+and from the proofs of the respective algorithms used as subroutines. Let us take a~look at
+the time complexity. We will be careful to use only the operations offered by the Pointer Machine.
+
+\lemma\id{optlemma}%
+The time complexity $T(m,n)$ of the Optimal algorithm satisfies the following recurrence:
+$$
+T(m,n) \le \sum_i c_1 D(C_i) + T(m/2, n/4) + c_2 m,
+$$
+where~$c_1$ and~$c_2$ are some positive constants and $D$~is the decision tree complexity
+from the previous section.
+
+\proof
+The first two steps of the algorithm are trivial as we have linear time at our
+disposal.
+
+By the Partitioning theorem (\ref{partthm}), the call to \<Partition> with~$\varepsilon$
+set to a~constant takes $\O(m)$ time and it produces a~collection of clusters of size
+at most~$t$ and at most $m/4$ corrupted edges. It also guarantees that the
+connected components of the union of the $C_i$'s have at least~$t$ vertices
+(unless there is just a~single component).
+
+To apply the decision trees, we will use the framework of topological computations developed
+in Section \ref{bucketsort}. We pad all clusters in~$\C$ with isolated vertices, so that they
+have exactly~$t$ vertices. We use a~computation that labels the graph with a~pointer to
+its optimal decision tree. Then we apply Theorem \ref{topothm} combined with the
+brute-force construction of optimal decision trees from Lemma \ref{odtconst}. Together they guarantee
+that we can assign the decision trees to the clusters in time:
+$$\O\Bigl(\Vert\C\Vert + t^{t(t+2)} \cdot \bigl(2^{2^{4t^2}} + t^2\bigr)\Bigr)
+= \O\Bigl(m + 2^{2^{2^t}}\Bigr)
+= \O(m).$$
+Execution of the decision tree on each cluster~$C_i$ then takes $\O(D(C_i))$ steps.
+
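+\para
+The last equality deserves a~comment: since $t = \lfloor\log^{(3)} n\rfloor$, we have
+$2^t \le \log^{(2)} n$ and hence
+$$2^{2^{2^t}} \le 2^{2^{\log^{(2)} n}} = 2^{\log n} = n \le m+1,$$
+while $t^{t(t+2)}\cdot\bigl(2^{2^{4t^2}}+t^2\bigr) \le 2^{2^{2^t}}$ holds for all
+sufficiently large~$t$ (for bounded~$t$ the whole term is a~constant).
+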
+The contracted graph~$G_A$ has at most $n/t = \O(n / \log^{(3)}n)$ vertices and asymptotically
+the same number of edges as~$G$, so according to Corollary \ref{ijdens}, the Iterated Jarn\'\i{}k's
+algorithm runs on it in linear time.
+
+The combined graph~$G_B$ has~$n$ vertices, but fewer than~$n$ edges from the
+individual spanning trees and at most~$m/4$ additional edges which were
+corrupted. The Bor\o{u}vka steps on~$G_B$ take $\O(m)$
+time by Lemma \ref{boruvkaiter} and they produce a~graph~$G_C$ with at most~$n/4$
+vertices and at most $n/4 + m/4 \le m/2$ edges. (The $n$~tree edges in~$G_B$ are guaranteed
+to be reduced by the Bor\o{u}vka's algorithm.) It is easy to verify that this
+graph is still connected, so we can recurse on it.
+
+The remaining steps of the algorithm can be easily performed in linear time either directly
+or in case of the contractions by the bucket-sorting techniques of Section \ref{bucketsort}.
+\qed
+
+\paran{Optimality}%
+The properties of decision tree complexity, which we have proven in the previous
+section, will help us show that the time complexity recurrence is satisfied by a~constant
+multiple of the decision tree complexity $D(m,n)$ itself. This way, we will prove
+the following theorem:
+
+\thmn{Optimality of the Optimal algorithm}
+The time complexity of the Optimal MST algorithm \ref{optimal} is $\Theta(D(m,n))$.
+
+\proof
+We will prove by induction that $T(m,n) \le cD(m,n)$ for some $c>0$. The base
+case is trivial; for the induction step we will expand on the previous lemma:
+$$\vcenter{\openup\jot\halign{\strut\hfil $\displaystyle{#}$&$\displaystyle{{}#}$\hfil&\quad#\hfil\cr
+T(m,n)
+ &\le \sum_i c_1 D(C_i) + T(m/2, n/4) + c_2 m &(Lemma \ref{optlemma})\cr
+ &\le c_1 D({\textstyle\bigcup}_i C_i) + T(m/2, n/4) + c_2 m &(Corollary \ref{dtpart})\cr
+ &\le c_1 D(m,n) + T(m/2, n/4) + c_2m &(definition of $D(m,n)$)\cr
+ &\le c_1 D(m,n) + cD(m/2, n/4) + c_2m &(induction hypothesis)\cr
+ &\le c_1 D(m,n) + c/2\cdot D(m,n/2) + c_2m &(Corollary \ref{dttwice})\cr
+ &\le c_1 D(m,n) + c/2\cdot D(m,n) + 2c_2 D(m,n) &(Lemma \ref{dtbasic})\cr
+ &\le (c_1 + c/2 + 2c_2) \cdot D(m,n)&\cr
+ &\le cD(m,n). &(by setting $c=2c_1+4c_2$)\cr
+}}$$
+The other inequality is obvious as $D(m,n)$ is an~asymptotic lower bound on
+the time complexity of every comparison-based algorithm.
+\qed
+
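+\rem
+The final choice of the constant is easily verified: with $c = 2c_1 + 4c_2$ we get
+$c_1 + c/2 + 2c_2 = c_1 + (c_1 + 2c_2) + 2c_2 = 2c_1 + 4c_2 = c$.
+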
+\paran{Complexity of MST}%
+As we have already noted, the exact decision tree complexity $D(m,n)$ of the MST problem
+is still open and therefore so is the time complexity of the optimal algorithm. However,
+every time we come up with another comparison-based algorithm, we can use its complexity
+(or more specifically the number of comparisons it performs, which can be even lower)
+as an~upper bound on the optimal algorithm.
+
+The best explicit comparison-based algorithm known to date achieves complexity $\O(m\alpha(m,n))$.\foot{$\alpha(m,n)$
+is a~certain inverse of Ackermann's function and $\lambda_k(n)$ is the row inverse of the same
+function. See \ref{ackerinv} for the exact definitions.}
+It has been discovered by Chazelle \cite{chazelle:ackermann} as an~improvement of his previous
+$\O(m\alpha(m,n)\cdot\log\alpha(m,n))$ algorithm \cite{chazelle:almostacker}.
+It is also based on the ideas of this chapter --- in particular the soft heaps and robust contractions.
+The algorithm is very complex and it involves a~lot of elaborate
+technical details, so we only refer to the original paper here. Another algorithm of the same
+complexity, using similar ideas, has been discovered independently by Pettie \cite{pettie:ackermann},
+who, having the optimal algorithm at hand, does not concern himself with the low-level details
+and only bounds the number of comparisons. Using any of these results, we can prove an~Ackermannian
+upper bound on the optimal algorithm:
+
+\thmn{Upper bound on complexity of the Optimal algorithm}\id{optthm}%
+The time complexity of the Optimal MST algorithm is $\O(m\alpha(m,n))$.
+
+\proof
+We bound $D(m,n)$ by the number of comparisons performed by the algorithm of Chazelle \cite{chazelle:ackermann}
+or Pettie \cite{pettie:ackermann}.
+\qed
+
+As with the Iterated Jarn\'\i{}k's algorithm, this bound is actually linear for all classes of graphs
+whose density is not extremely close to constant:
+
+\cor\id{lambdacor}%
+The Optimal MST algorithm runs in linear time whenever $m\ge n\cdot \lambda_k(n)$ for any fixed~$k$.
+
+\proof
+Combine the previous theorem with Lemma \ref{alphaconst}.
+\qed
+
+\rem
+It is also known from \cite{pettie:optimal} that when we run the Optimal algorithm on a~random
+graph drawn from either $G_{n,p}$ (random graphs on~$n$ vertices, each edge is included with probability~$p$
+independently of the other edges) or $G_{n,m}$ (we draw the graph uniformly at random from the
+set of all graphs with~$n$ vertices and $m$~edges), it runs in linear time with high probability,
+regardless of the edge weights.
+
+\paran{Models of computation}%
+Another important consequence of the optimal algorithm is that when we aim for a~linear-time
+MST algorithm (or for proving that it does not exist), we do not need to care about computational
+models at all. The elaborate RAM data structures of Chapter \ref{ramchap}, which have helped us
+so much in the case of integer weights, cannot help if we are allowed to access the edge weights
+by performing comparisons only. We can even make use of non-uniform objects given by some sort
+of oracle. Indeed, whatever trick we employ to achieve linear time complexity, we can mimic it in the
+world of decision trees, and thus we can use it to show that the algorithm we already have is
+also linear.
+
+This however applies to deterministic algorithms only --- we have shown that access to a~source
+of random bits allows us to compute the MST in expected linear time (the KKT algorithm, \ref{kkt}).
+There were attempts to derandomize the KKT algorithm, but so far the best result in this direction
+is a~randomized algorithm, also by Pettie \cite{pettie:minirand}, which achieves expected linear
+time complexity using only $\O(\log^* n)$ random bits.