Abstract: two more chapters.

author Martin Mares <mj@ucw.cz>

Tue, 3 Jun 2008 09:53:53 +0000 (11:53 +0200)

committer Martin Mares <mj@ucw.cz>

Tue, 3 Jun 2008 09:53:53 +0000 (11:53 +0200)
author Martin Mares <mj@ucw.cz>
Tue, 3 Jun 2008 09:53:53 +0000 (11:53 +0200)
committer Martin Mares <mj@ucw.cz>
Tue, 3 Jun 2008 09:53:53 +0000 (11:53 +0200)
diff --git a/abstract.tex b/abstract.tex

index c657c5cba9540c4ae7fe079a6e0ff544fc7529e9..4574e20bc7a410112bd94c5e3ce743fcbd6f1971 100644 (file)
--- a/abstract.tex
+++ b/abstract.tex
@@ -126,7 +126,6 @@ The lightest edge of every cut is contained in the MST.
  \lemman{Red lemma, also known as the Cycle rule}\id{redlemma}%
  An~edge~$e$ is not contained in the MST iff it is the heaviest on some cycle.
  
-\para
  The algorithm repeatedly colors lightest edges of cuts blue and heaviest
  edges of cycles red. We prove that no matter which order of the colorings
  we use, the algorithm always stops and the blue edges form the MST.
@@ -139,7 +138,7 @@ time complexity of standard implementations of these algorithms.
  \algo
  \algin A~graph~$G$ with an edge comparison oracle.
  \:$T\=$ a forest consisting of vertices of~$G$ and no edges.
-\:While $T$ is not connected:
+\:While $T$ is not connected, perform a~\df{Bor\o{u}vka step:}
  \::For each component $T_i$ of~$T$, choose the lightest edge $e_i$ from the cut
     separating $T_i$ from the rest of~$T$.
  \::Add all $e_i$'s to~$T$.
@@ -176,10 +175,10 @@ The Jarn\'\i{}k's algorithm computes the MST of a~given graph in time $\O(m\log
  \thm
  The Kruskal's algorithm finds the MST of a given graph in time $\O(m\log n)$.
  If the edges are already sorted by their weights, the time drops to
-$\O(m\timesalpha(m,n))$.\foot{Here $\alpha(m,n)$ is the inverse Ackermann's
+$\O(m\timesalpha(m,n))$.\foot{Here $\alpha(m,n)$ is a~certain inverse of the Ackermann's
  function.}
  
-\section{Contractive algorithms}
+\section{Contractive algorithms}\id{contalg}%
  
  While the classical algorithms are based on growing suitable trees, they
  can be also reformulated in terms of edge contraction. Instead of keeping
@@ -190,7 +189,7 @@ expense of having to perform the contractions.
  A~contractive version of the Bor\o{u}vka's algorithm is easy to formulate
  and also to analyse:
  
-\algn{Contractive version of Bor\o{u}vka's algorithm}
+\algn{Contractive version of Bor\o{u}vka's algorithm}\id{contbor}%
  \algo
  \algin A~graph~$G$ with an edge comparison oracle.
  \:$T\=\emptyset$.
@@ -219,7 +218,7 @@ time $\O(n)$.
  Graph contractions are indeed a~very powerful tool and they can be used in other MST
  algorithms as well. The following lemma shows the gist:
  
-\lemma
+\lemman{Contraction lemma}\id{contlemma}%
  Let $G$ be a weighted graph, $e$~an arbitrary edge of~$\mst(G)$, $G/e$ the multigraph
  produced by contracting~$e$ in~$G$, and $\pi$ the bijection between edges of~$G-e$ and
  their counterparts in~$G/e$. Then $\mst(G) = \pi^{-1}[\mst(G/e)] + e.$
@@ -256,7 +255,7 @@ but unlike the RAM's they turn out to be equivalent up to constant slowdown.
  Our formal definition is closely related to the \em{linking automaton} proposed
  by Knuth in~\cite{knuth:fundalg}.
  
-\section{Pointer machine techniques}
+\section{Pointer machine techniques}\id{bucketsort}%
  
  In the Contractive Bor\o{u}vka's algorithm, we needed to contract a~given
  set of edges in the current graph and then flatten the graph, all this in time $\O(m)$.
@@ -296,7 +295,7 @@ $T(k)$ for graphs on $k$~vertices. Then we can run~$\cal T$ on a~collection~$\C$
  of labeled graphs on~$k$ vertices in time $\O(\Vert\C\Vert + (k+s)^{k(k+2)}\cdot (T(k)+k^2))$,
  where~$s$ is a~constant depending only on the number of symbols used as vertex/edge labels.
  
-\section{Data structures on the RAM}
+\section{Data structures on the RAM}\id{ramds}%
  
  There is a~lot of data structures designed specifically for the RAM. These structures
  take advantage of both indexing and arithmetics and they often surpass the known
@@ -385,6 +384,494 @@ under insertions and deletions. Each operation takes $\O(1)$ time on a~Word-RAM
  with word size $W=\Omega(r^{\delta})$, after spending time
  $\O(2^{r^\delta})$ on precomputing of tables.
  
+\chapter{Advanced MST Algorithms}
+
+\section{Minor-closed graph classes}\id{minorclosed}%
+
+The contractive algorithm given in Section~\ref{contalg} has been found to perform
+well on planar graphs, but in the general case its time complexity was not linear.
+Can we find any broader class of graphs where this algorithm is still efficient?
+The right context turns out to be the minor-closed graph classes, which are
+closed under contractions and have bounded density.
+
+\defn\id{minordef}%
+A~graph~$H$ is a \df{minor} of a~graph~$G$ (written as $H\minorof G$) iff it can be obtained
+from a~subgraph of~$G$ by a sequence of simple graph contractions.
+
+\defn
+A~class~$\cal C$ of graphs is \df{minor-closed}, when for every $G\in\cal C$ and
+every minor~$H$ of~$G$, the graph~$H$ lies in~$\cal C$ as well. A~class~$\cal C$ is called
+\df{non-trivial} if at least one graph lies in~$\cal C$ and at least one lies outside~$\cal C$.
+
+\example
+Non-trivial minor-closed classes include:
+\itemize\ibull
+\:planar graphs,
+\:graphs embeddable in any fixed surface (i.e., graphs of bounded genus),
+\:graphs embeddable in~${\bb R}^3$ without knots or without interlocking cycles,
+\:graphs of bounded tree-width or path-width.
+\endlist
+
+\para
+Many of the nice structural properties of planar graphs extend to
+minor-closed classes, too (see Lov\'asz \cite{lovasz:minors} for a~nice survey
+of this theory and Diestel \cite{diestel:gt} for some of the deeper results).
+For analysis of the contractive algorithm, we will make use of the bounded
+density of minor-closed classes:
+
+\defn\id{density}%
+Let $G$ be a~graph and $\cal C$ be a class of graphs. We define the \df{edge density}
+$\varrho(G)$ of~$G$ as the average number of edges per vertex, i.e., $m(G)/n(G)$. The
+edge density $\varrho(\cal C)$ of the class is then defined as the infimum of $\varrho(G)$ over all $G\in\cal C$.
+
+\thmn{Density of minor-closed classes, Mader~\cite{mader:dens}}
+Every non-trivial minor-closed class of graphs has finite edge density.
+
+\thmn{MST on minor-closed classes, Mare\v{s} \cite{mm:mst}}\id{mstmcc}%
+For any fixed non-trivial minor-closed class~$\cal C$ of graphs, the Contractive Bor\o{u}vka's
+algorithm (\ref{contbor}) finds the MST of any graph of this class in time
+$\O(n)$. (The constant hidden in the~$\O$ depends on the class.)
+
+\paran{Local contractions}\id{nobatch}%
+The contractive algorithm uses ``batch processing'' to perform many contractions
+in a single step. It is also possible to perform them one edge at a~time,
+batching only the flattenings. A~contraction of an edge~$uv$ can be done in time~$\O(\deg(u))$,
+so we have to make sure that there is a~steady supply of low-degree vertices.
+It indeed is in minor-closed classes:
+
+\lemman{Low-degree vertices}\id{lowdeg}%
+Let $\cal C$ be a graph class with density~$\varrho$ and $G\in\cal C$ a~graph
+with $n$~vertices. Then at least $n/2$ vertices of~$G$ have degree at most~$4\varrho$.
+
+So we get the following algorithm:
+
+\algn{Local Bor\o{u}vka's Algorithm, Mare\v{s} \cite{mm:mst}}%
+\algo
+\algin A~graph~$G$ with an edge comparison oracle and a~parameter~$t\in{\bb N}$.
+\:$T\=\emptyset$.
+\:$\ell(e)\=e$ for all edges~$e$.
+\:While $n(G)>1$:
+\::While there exists a~vertex~$v$ such that $\deg(v)\le t$:
+\:::Select the lightest edge~$e$ incident with~$v$.
+\:::Contract~$e$.
+\:::$T\=T + \ell(e)$.
+\::Flatten $G$, removing parallel edges and loops.
+\algout Minimum spanning tree~$T$.
+\endalgo
+
+\thm
+When $\cal C$ is a minor-closed class of graphs with density~$\varrho$, the
+Local Bor\o{u}vka's Algorithm with the parameter~$t$ set to~$4\varrho$ 
+finds the MST of any graph from this class in time $\O(n)$. (The constant
+in the~$\O$ depends on~the class.)
+
+\section{Iterated algorithms}\id{iteralg}%
+
+We have seen that the Jarn\'\i{}k's Algorithm \ref{jarnik} runs in $\Theta(m\log n)$ time.
+Fredman and Tarjan \cite{ft:fibonacci} have shown a~faster implementation using their Fibonacci
+heaps, which runs in time $\O(m+n\log n)$. This is $\O(m)$ whenever the density of the
+input graph reaches $\Omega(\log n)$. This suggests that we could combine the algorithm with
+another MST algorithm, which identifies a~part of the MST edges and contracts
+them to increase the density of the graph. For example, if we perform several Bor\o{u}vka
+steps and then run the Jarn\'\i{}k's algorithm, we find the MST in time $\O(m\log\log n)$.
+
+Actually, there is a~much better choice of the algorithms to combine: use the
+Jarn\'\i{}k's algorithm with a~Fibonacci heap multiple times, each time stopping it after a~while.
+A~good choice of the stopping condition is to place a~limit on the size of the heap.
+We start with an~arbitrary vertex, grow the tree as usually and once the heap gets too large,
+we conserve the current tree and start with a~different vertex and an~empty heap. When this
+process runs out of vertices, it has identified a~sub-forest of the MST, so we can
+contract the edges of~this forest and iterate.
+
+\algn{Iterated Jarn\'\i{}k; Fredman and Tarjan \cite{ft:fibonacci}}\id{itjar}%
+\algo
+\algin A~graph~$G$ with an edge comparison oracle.
+\:$T\=\emptyset$. \cmt{edges of the MST}
+\:$\ell(e)\=e$ for all edges~$e$. \cmt{edge labels as usually}
+\:$m_0\=m$.
+\:While $n>1$: \cmt{We will call iterations of this loop \df{phases}.}
+\::$F\=\emptyset$. \cmt{forest built in the current phase}
+\::$t\=2^{\lceil 2m_0/n \rceil}$. \cmt{the limit on heap size}
+\::While there is a~vertex $v_0\not\in F$:
+\:::Run the Jarn\'\i{}k's algorithm from~$v_0$, stop when:
+\::::all vertices have been processed, or
+\::::a~vertex of~$F$ has been added to the tree, or
+\::::the heap has grown to more than~$t$ elements.
+\:::Denote the resulting tree~$R$.
+\:::$F\=F\cup R$.
+\::$T\=T\cup \ell[F]$. \cmt{Remember MST edges found in this phase.}
+\::Contract all edges of~$F$ and flatten~$G$.
+\algout Minimum spanning tree~$T$.
+\endalgo
+
+\thm\id{itjarthm}%
+The Iterated Jarn\'\i{}k's algorithm finds the MST of the input graph in time
+$\O(m\timesbeta(m,n))$, where $\beta(m,n):=\min\{ i \mid \log^{(i)}n \le m/n \}$.
+
+\cor
+The Iterated Jarn\'\i{}k's algorithm runs in time $\O(m\log^* n)$.
+
+\paran{Integer weights}%
+The algorithm spends most of the time in phases which have small heaps. Once the
+heap grows to $\Omega(\log^{(k)} n)$ for any fixed~$k$, the graph gets dense enough
+to guarantee that at most~$k$ phases remain. This means that if we are able to
+construct a~heap of size $\Omega(\log^{(k)} n)$ with constant time per operation,
+we can get a~linear-time algorithm for MST. This is the case when the weights are
+integers (we can use the Q-heap trees from Section~\ref{ramds}.
+
+\thmn{MST for integer weights, Fredman and Willard \cite{fw:transdich}}\id{intmst}%
+MST of a~graph with integer edge weights can be found in time $\O(m)$ on the Word-RAM.
+
+\section{Verification of minimality}\id{verifysect}%
+
+Now we will turn our attention to a~slightly different problem: given a~spanning
+tree, how to verify that it is minimum? We will show that this can be achieved
+in linear time and it will serve as a~basis for a~randomized linear-time
+MST algorithm in the next section.
+
+MST verification has been studied by Koml\'os \cite{komlos:verify}, who has
+proven that $\O(m)$ edge comparisons are sufficient, but his algorithm needed
+super-linear time to find the edges to compare. Dixon, Rauch and Tarjan \cite{dixon:verify}
+have later shown that the overhead can be reduced
+to linear time on the RAM using preprocessing and table lookup on small
+subtrees. Later, King has given a~simpler algorithm in \cite{king:verifytwo}.
+
+To verify that a~spanning tree~$T$ is minimum, it is sufficient to check that all
+edges outside~$T$ are $T$-heavy. For each edge $uv\in E\setminus T$, we will
+find the heaviest edge of the tree path $T[u,v]$ (we will call it the \df{peak}
+of the path) and compare its weight to $w(uv)$. We have therefore transformed
+the MST verification to the problem of finding peaks for a~set of \df{query
+paths} on a~given tree. By a~sequence of further transformations, we can even
+assume that the given tree is \df{complete branching} (all vertices are on
+the same level and internal vertices always have outdegree~2) and that the
+query paths join a~vertex with one of its ancestors.
+
+Koml\'os has given a~simple algorithm that traverses the complete branching
+tree recursively. At each moment, it maintains an~array of peaks of the restrictions
+of the query paths to the subtree below the current vertex. If we account for the
+comparisons performed by this algorithm carefully and express the bound in terms
+of the size of the original problem (before all the transformations), we get:
+
+\thmn{Verification of the MST, Koml\'os \cite{komlos:verify}}\id{verify}%
+For every weighted graph~$G$ and its spanning tree~$T$, it is sufficient to
+perform $\O(m)$ comparisons of edge weights to determine whether~$T$ is minimum
+and to find all $T$-light edges in~$G$.
+
+It remains to demonstrate that the overhead of the algorithm needed to find
+the required comparisons and infer the peaks from their results can be decreased,
+so that it gets bounded by the number of comparisons and therefore also by $\O(m)$.
+We will follow the idea of King from \cite{king:verifytwo}, but as we have the power
+of the RAM data structures from Section~\ref{ramds} at our command, the low-level
+details will be easier. Still, the construction is rather technical, so we omit
+it in this abstract and state only the final theorem:
+
+\thmn{Verification of MST on the RAM}\id{ramverify}%
+There is a~RAM algorithm which for every weighted graph~$G$ and its spanning tree~$T$
+determines whether~$T$ is minimum and finds all $T$-light edges in~$G$ in time $\O(m)$.
+
+\rem
+Buchsbaum et al.~\cite{buchsbaum:verify} have recently shown that linear-time
+verification can be achieved even on the Pointer Machine.
+They combine an~algorithm of time complexity $\O(m\timesalpha(m,n))$
+based on the Disjoint Set Union data structure with the framework of topological graph
+computations described in Section \ref{bucketsort}.
+The online version of this problem has turned out to be more difficult: there
+is a~super-linear lower bound for it due to Pettie \cite{pettie:onlineverify}.
+
+\section{A randomized algorithm}\id{randmst}%
+
+When we analysed the Contractive Bor\o{u}vka's algorithm in Section~\ref{contalg},
+we observed that while the number of vertices per iteration decreases exponentially,
+the number of edges generally does not, so we spend $\Theta(m)$ time on every phase.
+Karger, Klein and Tarjan \cite{karger:randomized} have overcome this problem by
+combining the Bor\o{u}vka's algorithm with filtering based on random sampling.
+This leads to a~randomized algorithm which runs in linear expected time.
+
+The principle of the filtering is simple: Let us consider any spanning tree~$T$
+of the input graph~$G$. Each edge of~$G$ that is $T$-heavy is the heaviest edge
+of some cycle, so by the Red lemma it cannot participate in
+the MST of~$G$. We can therefore discard all $T$-heavy edges and continue with
+finding the MST on the reduced graph. Of course, not all choices of~$T$ are equally
+good, but it will soon turn out that when we take~$T$ as the MST of a~randomly selected
+subgraph, only a~small expected number of edges remains:
+
+\lemman{Random sampling, Karger \cite{karger:sampling}}\id{samplemma}%
+Let $H$~be a~subgraph of~$G$ obtained by including each edge independently
+with probability~$p$. Let further $F$~be the minimum spanning forest of~$H$. Then the
+expected number of $F$-nonheavy\foot{That is, $F$-light edges and also edges of~$F$ itself.}
+edges in~$G$ is at most $n/p$.
+
+\para
+We will formulate the algorithm as a~doubly-recursive procedure. It alternatively
+performs steps of the Bor\o{u}vka's algorithm and filtering based on the above lemma.
+The first recursive call computes the MSF of the sampled subgraph, the second one
+finds the MSF of the original graph, but without the heavy edges.
+
+\algn{MSF by random sampling --- the KKT algorithm}\id{kkt}%
+\algo
+\algin A~graph $G$ with an~edge comparison oracle.
+\:Remove isolated vertices from~$G$. If no vertices remain, stop and return an~empty forest.
+\:Perform two Bor\o{u}vka steps (iterations of Algorithm \ref{contbor}) on~$G$ and
+  remember the set~$B$ of the edges having been contracted.
+\:Select a~subgraph~$H\subseteq G$ by including each edge independently with
+  probability $1/2$.
+\:$F\=\msf(H)$ calculated recursively.
+\:Construct $G'\subseteq G$ by removing all $F$-heavy edges of~$G$.
+\:$R\=\msf(G')$ calculated recursively.
+\:Return $R\cup B$.
+\algout The minimum spanning forest of~$G$.
+\endalgo
+
+A~careful analysis of this algorithm, based on properties of its recursion tree
+and the peak-finding algorithm of the previous section yields the following time bounds:
+
+\thm
+The KKT algorithm runs in time $\O(\min(n^2,m\log n))$ in the worst case on the RAM.
+
+\thm
+The expected time complexity of the KKT algorithm on the RAM is $\O(m)$.
+
+\chapter{Approaching Optimality}\id{optchap}%
+
+\section{Soft heaps}\id{shsect}%
+
+A~vast majority of MST algorithms that we have encountered so far is based on
+the Tarjan's Blue rule (Lemma \ref{bluelemma}). The rule serves to identify
+edges that belong to the MST, while all other edges are left in the process. This
+unfortunately means that the later stages of computation spend most of
+their time on these edges that never enter the MSF. A~notable exception is the randomized
+algorithm of Karger, Klein and Tarjan. It adds an~important ingredient: it uses
+the Red rule (Lemma \ref{redlemma}) to filter out edges that are guaranteed to stay
+outside the MST, so that the graphs with which the algorithm works get smaller
+with time.
+
+Recently, Chazelle \cite{chazelle:ackermann} and Pettie \cite{pettie:ackermann}
+have presented new deterministic algorithms for the MST which are also based
+on the combination of both rules. They have reached worst-case time complexity
+$\O(m\timesalpha(m,n))$ on the Pointer Machine. We will devote this chapter to their results
+and especially to another algorithm by Pettie and Ramachandran \cite{pettie:optimal}
+which is provably optimal.
+
+At the very heart of all these algorithms lies the \df{soft heap} discovered by
+Chazelle \cite{chazelle:softheap}. It is a~meldable priority queue, roughly
+similar to the Vuillemin's binomial heaps \cite{vuillemin:binheap} or Fredman's
+and Tarjan's Fibonacci heaps \cite{ft:fibonacci}. The soft heaps run faster at
+the expense of \df{corrupting} a~fraction of the inserted elements by raising
+their values (the values are however never lowered). This allows for
+a~trade-off between accuracy and speed, controlled by a~parameter~$\varepsilon$.
+The heap operations take $\O(\log(1/\varepsilon))$ amortized time and at every
+moment at most~$\varepsilon n$ elements of the $n$~elements inserted can be
+corrupted.
+
+\defnn{Soft heap interface}%
+The \df{soft heap} contains a~set of distinct items from a~totally ordered universe and it
+supports the following operations:
+\itemize\ibull
+\:$\<Create>(\varepsilon)$ --- Create an~empty soft heap with the given accuracy parameter~$\varepsilon$.
+\:$\<Insert>(H,x)$ --- Insert a~new item~$x$ into the heap~$H$.
+\:$\<Meld>(P,Q)$ --- Merge two heaps into one, more precisely move all items of a~heap~$Q$
+  to the heap~$P$, destroying~$Q$ in the process. Both heaps must have the same~$\varepsilon$.
+\:$\<DeleteMin>(H)$ --- Delete the minimum item of the heap~$H$ and return its value
+  (optionally signalling that the value has been corrupted).
+\:$\<Explode>(H)$ --- Destroy the heap and return a~list of all items contained in it
+  (again optionally marking those corrupted).
+\endlist
+
+\>We describe the exact mechanics of the soft heaps and analyse its complexity.
+The important properties are summarized in the following theorem:
+
+\thmn{Performance of soft heaps, Chazelle \cite{chazelle:softheap}}\id{softheap}%
+A~soft heap with error rate~$\varepsilon$ ($0<\varepsilon\le 1/2$) processes
+a~sequence of operations starting with an~empty heap and containing $n$~\<Insert>s
+in time $\O(n\log(1/\varepsilon))$ on the Pointer Machine. At every moment, the
+heap contains at most $\varepsilon n$ corrupted items.
+
+\section{Robust contractions}
+
+Having the soft heaps at hand, we would like to use them in a~conventional MST
+algorithm in place of a~normal heap. The most efficient specimen of a~heap-based
+algorithm we have seen so far is the Jarn\'\i{}k's algorithm.
+We can try implanting the soft heap in this algorithm, preferably in the earlier
+version without Fibonacci heaps as the soft heap lacks the \<Decrease> operation.
+This brave, but somewhat simple-minded attempt is however doomed to
+fail. The reason is of course the corruption of items inside the heap, which
+leads to increase of weights of some subset of edges. In presence of corrupted
+edges, most of the theory we have so carefully built breaks down. There is
+fortunately some light in this darkness. While the basic structural properties
+of MST's no longer hold, there is a~weaker form of the Contraction lemma that
+takes the corrupted edges into account. Before we prove this lemma, we expand
+our awareness of subgraphs which can be contracted.
+
+\defn
+A~subgraph $C\subseteq G$ is \df{contractible} iff for every pair of edges $e,f\in\delta(C)$\foot{That is,
+of~$G$'s edges with exactly one endpoint in~$C$.} there exists a~path in~$C$ connecting the endpoints
+of the edges $e,f$ such that all edges on this path are lighter than either $e$ or~$f$.
+
+For example, when we stop the Jarn\'\i{}k's algorithm at some moment and
+we take a~subgraph~$C$ induced by the constructed tree, this subgraph is contractible.
+We can now easily reformulate the Contraction lemma (\ref{contlemma}) in the language
+of contractible subgraphs:
+
+\lemman{Generalized contraction}
+When~$C\subseteq G$ is a~contractible subgraph, then $\msf(G)=\msf(C) \cup \msf(G/C)$.
+
+We can now bring corruption back to the game and state a~``robust'' version
+of this lemma. A~notation for corrupted graphs will be handy:
+
+\nota\id{corrnota}%
+When~$G$ is a~weighted graph and~$R$ a~subset of its edges, we will use $G\crpt
+R$ to denote an arbitrary graph obtained from~$G$ by increasing the weights of
+some of the edges in~$R$.
+Whenever~$C$ is a~subgraph of~$G$, we will use $R^C$ to refer to the edges of~$R$ with
+exactly one endpoint in~$C$ (i.e., $R^C = R\cap \delta(C)$).
+
+\lemman{Robust contraction, Chazelle \cite{chazelle:almostacker}}\id{robcont}%
+Let $G$ be a~weighted graph and $C$~its subgraph contractible in~$G\crpt R$
+for some set~$R$ of edges. Then $\msf(G) \subseteq \msf(C) \cup \msf((G/C) \setminus R^C) \cup R^C$.
+
+\para
+We will mimic the Iterated Jarn\'\i{}k's algorithm. We will partition the given graph to a~collection~$\C$
+of non-overlapping contractible subgraphs called \df{clusters} and we put aside all edges that got corrupted in the process.
+We recursively compute the MSF of those subgraphs and of the contracted graph. Then we take the
+union of these MSF's and add the corrupted edges. According to the previous lemma, this does not produce
+the MSF of~$G$, but a~sparser graph containing it, on which we can continue.
+
+\thmn{Partitioning to contractible clusters, Chazelle \cite{chazelle:almostacker}}\id{partthm}%
+Given a~weighted graph~$G$ and parameters $\varepsilon$ ($0<\varepsilon\le 1/2$)
+and~$t$, we can construct a~collection $\C=\{C_1,\ldots,C_k\}$ of clusters and a~set~$R^\C$ of edges such that:
+
+\numlist\ndotted
+\:All the clusters and the set~$R^\C$ are mutually edge-disjoint.
+\:Each cluster contains at most~$t$ vertices.
+\:Each vertex of~$G$ is contained in at least one cluster.
+\:The connected components of the union of all clusters have at least~$t$ vertices each,
+  except perhaps for those which are equal to a~connected component of $G\setminus R^\C$.
+\:$\vert R^\C\vert \le 2\varepsilon m$.
+\:$\msf(G) \subseteq \bigcup_i \msf(C_i) \cup \msf\bigl((G / \bigcup_i C_i) \setminus R^\C\bigr) \cup R^\C$.
+\:The construction takes $\O(n+m\log (1/\varepsilon))$ time.
+\endlist
+
+\section{Decision trees}\id{dtsect}%
+
+The Pettie's and Ramachandran's algorithm combines the idea of robust partitioning with optimal decision
+trees constructed by brute force for very small subgraphs.
+Formally, the decision trees are defined as follows:
+
+\defnn{Decision trees and their complexity}\id{decdef}%
+A~\df{MSF decision tree} for a~graph~$G$ is a~binary tree. Its internal vertices
+are labeled with pairs of $G$'s edges to be compared, each of the two outgoing tree edges
+corresponds to one possible result of the comparison.
+Leaves of the tree are labeled with spanning trees of the graph~$G$.
+
+A~\df{computation} of the decision tree on a~specific permutation of edge weights
+in~$G$ is the path from the root to a~leaf such that the outcome of every comparison
+agrees with the edge weights. The result of the computation is the spanning tree
+assigned to its final leaf.
+A~decision tree is \df{correct} iff for every permutation the corresponding
+computation results in the real MSF of~$G$ with the particular weights.
+
+The \df{time complexity} of a~decision tree is defined as its depth. It therefore
+bounds the number of comparisons spent on every path. (It need not be equal since
+some paths need not correspond to an~actual computation --- the sequence of outcomes
+on the path could be unsatisfiable.)
+
+A~decision tree is called \df{optimal} if it is correct and its depth is minimum possible
+among the correct decision trees for the given graph.
+We will denote an~arbitrary optimal decision tree for~$G$ by~${\cal D}(G)$ and its
+complexity by~$D(G)$.
+
+The \df{decision tree complexity} $D(m,n)$ of the MSF problem is the maximum of~$D(G)$
+over all graphs~$G$ with $n$~vertices and~$m$ edges.
+
+\obs
+Decision trees are the most general deterministic comparison-based computation model possible.
+The only operations that count in its time complexity are comparisons. All
+other computation is free, including solving NP-complete problems or having
+access to an~unlimited source of non-uniform constants. The decision tree
+complexity is therefore an~obvious lower bound on the time complexity of the
+problem in all other comparison-based models.
+
+The downside is that we do not know any explicit construction of the optimal
+decision trees, or at least a~non-constructive proof of their complexity.
+On the other hand, the complexity of any existing comparison-based algorithm
+can be used as an~upper bound on the decision tree complexity. Also, we can
+construct an~optimal decision tree using brute force:
+
+\lemma
+An~optimal MST decision tree for a~graph~$G$ on~$n$ vertices can be constructed on
+the Pointer Machine in time $\O(2^{2^{4n^2}})$.
+
+\section{An optimal algorithm}\id{optalgsect}%
+
+Once we have developed the soft heaps, partitioning and MST decision trees,
+it is now simple to state the Pettie's and Ramachandran's MST algorithm
+and prove that it is asymptotically optimal among all MST algorithms in
+comparison-based models. Several standard MST algorithms from the previous
+chapters will also play their roles.
+
+We will describe the algorithm as a~recursive procedure. When the procedure is
+called on a~graph~$G$, it sets the parameter~$t$ to roughly $\log^{(3)} n$ and
+it calls the \<Partition> procedure to split the graph into a~collection of
+clusters of size~$t$ and a~set of corrupted edges. Then it uses precomputed decision
+trees to find the MSF of the clusters. The graph obtained by contracting
+the clusters is on the other hand dense enough, so that the Iterated Jarn\'\i{}k's
+algorithm runs on it in linear time. Afterwards we combine the MSF's of the clusters
+and of the contracted graphs, we mix in the corrupted edges and run two iterations
+of the Contractive Bor\o{u}vka's algorithm. This guarantees reduction in the number of
+both vertices and edges by a~constant factor, so we can efficiently recurse on the
+resulting graph.
+
+\algn{Optimal MST algorithm, Pettie and Ramachandran \cite{pettie:optimal}}\id{optimal}%
+\algo
+\algin A~connected graph~$G$ with an~edge comparison oracle.
+\:If $G$ has no edges, return an~empty tree.
+\:$t\=\lfloor\log^{(3)} n\rfloor$. \cmt{the size of clusters}
+\:Call \<Partition> (\ref{partition}) on $G$ and $t$ with $\varepsilon=1/8$. It returns
+  a~collection~$\C=\{C_1,\ldots,C_k\}$ of clusters and a~set~$R^\C$ of corrupted edges.
+\:$F_i \= \mst(C_i)$ for all~$i$, obtained using optimal decision trees.
+\:$G_A \= (G / \bigcup_i C_i) \setminus R^\C$. \cmt{the contracted graph}
+\:$F_A \= \msf(G_A)$ calculated by the Iterated Jarn\'\i{}k's algorithm (\ref{itjar}).
+\:$G_B \= \bigcup_i F_i \cup F_A \cup R^\C$. \cmt{combine subtrees with corrupted edges}
+\:Run two Bor\o{u}vka steps (iterations of the Contractive Bor\o{u}vka's algorithm, \ref{contbor}) on~$G_B$,
+  getting a~contracted graph~$G_C$ and a~set~$F_B$ of MST edges.
+\:$F_C \= \mst(G_C)$ obtained by a~recursive call to this algorithm.
+\:Return $F_B \cup F_C$.
+\algout The minimum spanning tree of~$G$.
+\endalgo
+
+Correctness of this algorithm immediately follows from the Partitioning theorem (\ref{partthm})
+and from the proofs of the respective algorithms used as subroutines. As for time complexity:
+
+\lemma\id{optlemma}%
+The time complexity $T(m,n)$ of the Optimal algorithm satisfies the following recurrence:
+$$
+T(m,n) \le \sum_i c_1 D(C_i) + T(m/2, n/4) + c_2 m,
+$$
+where~$c_1$ and~$c_2$ are some positive constants and $D$~is the decision tree complexity
+from the previous section.
+
+It turns out that the recurrence is satisfied by the decision tree complexity function
+$D(m,n)$ itself, so we can prove the following theorem by induction:
+
+\thm
+The time complexity of the Optimal algorithm is $\Theta(D(m,n))$.
+
+\paran{Complexity of MST}%
+As we have already noted, the exact decision tree complexity $D(m,n)$ of the MST problem
+is still open and so therefore is the time complexity of the optimal algorithm. However,
+every time we come up with another comparison-based algorithm, we can use its complexity
+(or more specifically the number of comparisons it performs, which can be even lower)
+as an~upper bound on the optimal algorithm.
+The best explicit comparison-based algorithm known to date has been discovered by Chazelle
+\cite{chazelle:ackermann} and independently by Pettie \cite{pettie:ackermann}. It achieves complexity $\O(m\timesalpha(m,n))$.
+Using any of these results, we can prove an~Ackermannian upper bound on the
+optimal algorithm:
+
+\thm
+The time complexity of the Optimal algorithm is $\O(m\alpha(m,n))$.
+
+%--------------------------------------------------------------------------------
+
  \chapter{Ranking Combinatorial Structures}\id{rankchap}%
  
  \chapter{Bibliography}
author	Martin Mares <mj@ucw.cz>
	Tue, 3 Jun 2008 09:53:53 +0000 (11:53 +0200)
committer	Martin Mares <mj@ucw.cz>
	Tue, 3 Jun 2008 09:53:53 +0000 (11:53 +0200)