appl.tex

   1 \ifx\endpart\undefined
   2 \input macros.tex
   3 \fi
   4
   5 \chapter{Applications}
   6
   7 \section{Special cases and related problems}
   8
   9 Towards the end of our story of the minimum spanning trees, we will now focus our attention
  10 on various special cases of the MST problem and also to several related problems that
  11 frequently arise in practice.
  12
  13 \paran{Graphs with sorted edges}\id{sortededges}%
  14 When the edges of the given graph are already sorted by their weights, we can use the Kruskal's
  15 algorithm to find the MST in time $\O(m\timesalpha(n))$ (Theorem \ref{kruskal}).
  16 We can however do better: As the minimality of a~spanning tree depends only on the
  17 order of weights and not on the actual values (The Minimality Theorem, \ref{mstthm}), we can
  18 renumber the weights to $1, \ldots, m$ and find the MST using the Fredman-Willard
  19 algorithm for integer weights. According to Theorem \ref{intmst} it runs in
  20 time $\O(m)$ on the Word-RAM.
  21
  22 \paran{Graphs with a~small number of distinct weights}
  23 When the weights of edges are drawn from a~set of a~fixed size~$U$, we can
  24 sort them in linear time and so reduce the problem to the previous case.
  25 A~more practical way is to use the Jarn\'\i{}k's algorithm (\ref{jarnimpl}),
  26 but replace the heap by an~array of $U$~buckets. As the number of buckets
  27 is constant, we can find the minimum in constant time and hence the whole
  28 algorithm runs in time $\O(m)$, even on the Pointer Machine. For large
  29 values of~$U,$ we can build a~binary search tree or the van Emde-Boas
  30 tree (see Section \ref{ramdssect} and \cite{boas:vebt}) on the top of the buckets to bring the complexity
  31 of finding the minimum down to $\O(\log U)$ or $\O(\log\log U)$ respectively.
  32
  33 \paran{Graphs with floating-point weights}
  34 A~common case of non-integer weights are rational numbers in floating-point (FP)
  35 representation. Even in this case we will be able to find the MST in linear time.
  36 The most common representation of binary FP numbers specified by the IEEE
  37 standard 754-1985 \cite{ieee:binfp} has a~useful property:  When the
  38 bit strings encoding non-negative FP numbers are read as ordinary integers,
  39 the order of these integers is the same as of the original FP numbers. We can
  40 therefore once again replace the edge weights by integers and use the linear-time
  41 integer algorithm. While the other FP representations (see \cite{dgoldberg:fp} for
  42 an~overview) need not have this property, the corresponding integers can be adjusted
  43 in $\O(1)$ time to the format we need using bit masking. (More advanced tricks of this type have been
  44 employed by Thorup \cite{thorup:floatint} to extend his linear-time algorithm
  45 for single-source shortest paths to FP edge lengths.)
  46
  47 \paran{Graphs with bounded degrees}
  48 For graphs with vertex degrees bounded by a~constant~$\Delta$, the problem is either
  49 trivial (if $\Delta<3$) or as hard as for arbitrary graphs. There is a~simple linear-time
  50 transform of arbitrary graphs to graphs with maximum degree~3 which preserves the MST:
  51
  52 \lemman{Degree reduction}\id{degred}%
  53 For every graph~$G$ there exists a~graph~$G'$ with maximum degree at most~3 and
  54 a~function $\pi: E(G)\rightarrow E(G')$ such that $\mst(G) = \pi^{-1}(\mst(G'))$.
  55 The graph $G'$ and the embedding~$\pi$ can be constructed in time $\O(m)$.
  56
  57 \figure{french.eps}{\epsfxsize}{Degree reduction in Lemma~\ref{degred}}
  58
  59 \proof
  60 We show how to eliminate a~single vertex~$v$ of degree $d>3$ and then apply
  61 induction.
  62
  63 Assume that $v$~has neighbors $w_1,\ldots,w_d$. We replace~$v$ and the edges~$vw_i$
  64 by $d$~new vertices $v_1,\ldots,v_d$, joined by a~path $v_1v_2\ldots v_d$, and
  65 edges~$v_iw_i$. Each edge of the path will receive a~weight smaller than all
  66 original weights, the other edges will inherit the weights of the edges $vw_i$
  67 they replace. The edges of the path will therefore lie in the MST (this is
  68 obvious from the Kruskal's algorithm) and as~$G$ can be obtained from the
  69 new~$G'$ by contracting the path, the rest follows from the Contraction lemma
  70 (\ref{contlemma}).
  71
  72 This step can be carried out in time $\O(d)$. It replaces a high-degree
  73 vertex by vertices of degree~3 and it does not change degrees of any other
  74 vertices. So the whole procedure stops in at most~$n$ such
  75 steps, so it takes time $\O(\sum_{v\in V}\deg(v)) = \O(m)$, including the
  76 time needed to find the high-degree vertices at the beginning.
  77 \qed
  78
  79 \paran{Euclidean MST}
  80 The MST also has its counterparts in the realm of geometric algorithms. Suppose
  81 that we have $n$~points $x_1,\ldots,x_n$ in the plane and we want to find the
  82 shortest system of segments connecting these points. If we want the segments to
  83 touch only in the given points, this is equivalent to finding the MST of the
  84 complete graph on the vertices $V=\{x_1,\ldots,x_n\}$ with edge weights
  85 defined as the Euclidean distances of the points. Since the graph is dense, many of the MST
  86 algorithms discussed run in linear time with the size of the graph, hence
  87 in time $\O(n^2)$.
  88
  89 There is a~more efficient method based on the observation that the MST
  90 is always a~subgraph of the Delaunay's tesselation for the given points
  91 (this was first noted by Shamos and Hoey \cite{shamos:closest}). The
  92 tesselation is a~planar graph, which guarantees that it has $\O(n)$ edges,
  93 and it is a~dual graph of the Voronoi diagram of the given points, which can
  94 be constructed in time $\O(n\log n)$ using for example the Fortune's
  95 algorithm \cite{fortune:voronoi}. We can therefore reduce the problem
  96 to finding the MST of the tesselation for which $\O(n\log n)$ time
  97 is more than sufficient.
  98
  99 This approach fails for non-Euclidean metrics, but in some cases
 100 (in particular for the rectilinear metric) the $\O(n\log n)$ time bound
 101 is also achievable by the algorithm of Zhou et al.~\cite{zhou:nodel}
 102 based on the sweep-line technique and the Red rule. For other
 103 variations on the geometric MST, see Eppstein's survey paper
 104 \cite{eppstein:spanning}.
 105
 106 \paran{Steiner trees}
 107 The constraint that the segments in the previous example are allowed to touch
 108 each other only in the given points looks artificial and it is indeed uncommon in
 109 practical applications (including the problem of designing electrical transmission
 110 lines originally studied by Bor\o{u}vka). If we lift this restriction, we get
 111 the problem known by the name Steiner tree.\foot{It is named after the Swiss mathematician
 112 Jacob Steiner who studied a~special case of this problem in the 19th century.}
 113 We can also define it in terms of graphs:
 114
 115 \defn A~\df{Steiner tree} of a~weighted graph~$(G,w)$ with a~set~$M\subseteq V$
 116 of \df{mandatory vertices} is a~tree~$T\subseteq G$ that contains all the mandatory
 117 vertices and its weight is minimum possible.
 118
 119 When $M=V$, the Steiner tree is identical to the MST, but if we allow an~arbitrary
 120 choice of the mandatory vertices, it is NP-hard. This has been proven by Garey and Johnson
 121 \cite{garey:steiner,garey:rectisteiner} for not only the graph version with
 122 weights $\{1,2\}$, but also for the planar version with Euclidean or rectilinear
 123 metric. There is a~polynomial-time approximation algorithm with ratio $5/3$ for
 124 graphs due to Pr\"omel and Steger \cite{proemel:steiner} and a~polynomial-time
 125 approximation scheme for the Euclidean Steiner tree in an~arbitrary dimension
 126 by Arora \cite{arora:tspapx}.
 127
 128 \paran{Approximating the weight of the MST}
 129 Sometimes we are not interested in the actual edges forming the MST and only
 130 the weight matters. If we are willing to put up with a~randomized approximation,
 131 we can even achieve sub-linear complexity. Chazelle et al.~\cite{chazelle:mstapprox}
 132 have shown an~algorithm which, given $0 < \varepsilon < 1/2$, approximates
 133 the weight of the MST of a~graph with average degree~$d$ and edge weights from the set
 134 $\{1,\ldots,w\}$ in time $\O(dw\varepsilon^{-2}\cdot\log(dw/\varepsilon))$,
 135 producing a~weight which has relative error at most~$\varepsilon$ with probability
 136 at least $3/4$. They have also proven a~close lower bound $\Omega(dw\varepsilon^{-2})$.
 137
 138 For the $d$-dimensional Euclidean case, there is a~randomized approximation
 139 algorithm by Czumaj et al.~\cite{czumaj:euclidean} which with high probability
 140 produces a~spanning tree within relative error~$\varepsilon$ in $\widetilde\O(\sqrt{n}\cdot \poly(1/\varepsilon))$\foot{%
 141 $\widetilde\O(f) = \O(f\cdot\log^{\O(1)} f)$ and $\poly(n)=n^{\O(1)}$.}
 142 queries to a~data structure containing the points. The data structure is expected
 143 to answer orthogonal range queries and cone approximate nearest neighbor queries.
 144 There is also an~$\widetilde\O(n\cdot \poly(1/\varepsilon))$ time approximation
 145 algorithm for the MST weight in arbitrary metric spaces by Czumaj and Sohler \cite{czumaj:metric}.
 146 (This is still sub-linear since the corresponding graph has roughly $n^2$ edges.)
 147
 148 %--------------------------------------------------------------------------------
 149
 150 \section{Practical algorithms}
 151
 152 So far, we were studying the theoretical aspects of the MST algorithms.
 153 How should we find the MST on a~real computer?
 154
 155 Moret and Shapiro \cite{moret:expmst} have conducted an~extensive experimental
 156 study of performance of implementations of various MST algorithms on different
 157 computers. They have tested the algorithms on several sets of graphs, varying
 158 in the number of vertices (up to millions) and in edge density (from constant to
 159 $n^{1/2}$. In almost all tests, the winner was an~ordinary Prim's algorithm
 160 implemented with pairing heaps \cite{fredman:pairingheap}. The pairing heaps
 161 are known to perform surprisingly well in practice, but they still elude attempts
 162 at complete theoretical analysis. So far the best results are those of Pettie
 163 \cite{pettie:pairing}, who has proven that deletion of the minimum takes $\O(\log n)$
 164 time and all other operations take $\O(2^{2\scriptscriptstyle\sqrt{\log\log n}})$; both bounds are amortized.
 165
 166 The Moret's study however completely misses variants of the Bor\o{u}vka's algorithm
 167 and many of those have very promising characteristics, especially for planar graphs
 168 and minor-closed classes.
 169
 170 Also, Katriel et al.~\cite{katriel:cycle} have proposed a~new algorithm based on~the
 171 Red rule. It is a~randomized algorithm which uses a~simplified form of the idea of edge
 172 filtering from the algorithm of Karger, Klein and Tarjan (see Section \ref{randmst}).
 173 The expected time complexity is slightly worse: $\O(n\log n + m)$. However, for dense
 174 graphs it performs very well in practice, even in comparison with the Moret's results.
 175
 176 \paran{Parallel algorithms}
 177 Most of the early parallel algorithms for the MST are variants of the Bor\o{u}vka's algorithm.
 178 The operations on the individual trees are independent of each other, so they can be carried out in parallel.
 179 There are $\O(\log n)$ steps and each of them can be executed in $\O(\log n)$ parallel time using
 180 standard PRAM techniques (see \cite{jaja:parallel} for the description of the model).
 181
 182 Several better algorithms have been constructed, the best of which run in time $\O(\log n)$.
 183 Chong, Han and Lam \cite{chong:parallel} have recently discovered an~algorithm that achieves
 184 this time complexity even on the EREW PRAM --- the weakest of the parallel RAM's which does
 185 not allow concurrent reading nor writing to the same memory cell by multiple processors.
 186 In this model, the $\O(\log n)$ bound is clearly the best possible.
 187
 188 As in the sequential models, the basic question still remains open: Is it
 189 possible to compute the MST in parallel on EREW PRAM, spending only linear
 190 work? This would of course imply existence of a~linear-time sequential
 191 algorithm, so none of the known parallel algorithms achieve that. Algorithms
 192 with linear work can be obtained by utilizing randomness, as shown for example
 193 by Pettie and Ramachandran \cite{pettie:minirand}, but so far only at the
 194 expense of higher time complexity.
 195
 196 \endpart