Abstract: added Conclusions and reviewed chapters 1--5.

author Martin Mares <mj@ucw.cz>

Tue, 3 Jun 2008 21:34:31 +0000 (23:34 +0200)

committer Martin Mares <mj@ucw.cz>

Tue, 3 Jun 2008 21:34:31 +0000 (23:34 +0200)
diff --git a/abstract.tex b/abstract.tex
index 06184e1655c2223d693264d643c4b87bfecb1901..ee556c8ec5060dcf886d9b98f501fd95c0ddb78a 100644
--- a/abstract.tex
+++ b/abstract.tex
@@ -1,6 +1,10 @@
  \input macros.tex
-\finalfalse  %%FIXME
+\finaltrue
+\hwobble=0mm
+\advance\hsize by 1cm
+\advance\vsize by 20pt
  
+\font\chapfont=csb17
  \def\rawchapter#1{\vensure{0.5in}\bigbreak\bigbreak
  \leftline{\chapfont #1}
  }
@@ -8,12 +12,12 @@
  \def\rawsection#1{\bigskip
  \leftline{\secfont #1}
  \nobreak
-\medskip
+\smallskip
  \nobreak
  }
  
  \chapter{Introduction}
-\bigskip
+\medskip
  
  This thesis tells the story of two well-established problems of algorithmic
  graph theory: the minimum spanning trees and ranks of permutations. At distance,
@@ -173,10 +177,10 @@ The Jarn\'\i{}k's algorithm computes the MST of a~given graph in time $\O(m\log
  \endalgo
  
  \thm
-The Kruskal's algorithm finds the MST of a given graph in time $\O(m\log n)$.
+The Kruskal's algorithm finds the MST of the graph given as input in time $\O(m\log n)$.
  If the edges are already sorted by their weights, the time drops to
-$\O(m\timesalpha(m,n))$.\foot{Here $\alpha(m,n)$ is a~certain inverse of the Ackermann's
-function.}
+$\O(m\timesalpha(m,n))$, where $\alpha(m,n)$ is a~certain inverse of the Ackermann's
+function.
  
  \section{Contractive algorithms}\id{contalg}%
  
@@ -196,7 +200,7 @@ and also to analyse:
  \:$\ell(e)\=e$ for all edges~$e$. \cmt{Initialize edge labels.}
  \:While $n(G)>1$:
  \::For each vertex $v_k$ of~$G$, let $e_k$ be the lightest edge incident to~$v_k$.
-\::$T\=T\cup \{ \ell(e_1),\ldots,\ell(e_n) \}$.\hfil\break\cmt{Remember labels of all selected edges.}
+\::$T\=T\cup \{ \ell(e_1),\ldots,\ell(e_n) \}$.\cmt{Remember labels of all selected edges.}
  \::Contract all edges $e_k$, inheriting labels and weights.\foot{In other words, we will ask the comparison oracle for the edge $\ell(e)$ instead of~$e$.}
  \::Flatten $G$ (remove parallel edges and loops).
  \algout Minimum spanning tree~$T$.
@@ -234,13 +238,8 @@ possibly up to polynomial slowdown which is negligible. In our case, the
  differences between good and not-so-good algorithms are on a~much smaller
  scale, so we need to state our computation models carefully and develop
  a repertoire of basic data structures tailor-made for the fine details of the
-models.
-
-In recent decades, most researchers in the area of combinatorial algorithms
-have been considering two computational models: the Random Access Machine and the Pointer
-Machine. The former is closer to the programmer's view of a~real computer,
-the latter is slightly more restricted and ``asymptotically safe.''
-We will follow this practice and study our algorithms in both models.
+models. In recent decades, most researchers in the area of combinatorial algorithms
+have been considering the following two computational models, and we will do likewise.
  
  The \df{Random Access Machine (RAM)} is not a~single coherent model, but rather a~family
  of closely related machines (See Cook and Reckhow \cite{cook:ram} for one of the usual formal definitions
@@ -260,7 +259,7 @@ by Knuth in~\cite{knuth:fundalg}.
  In the Contractive Bor\o{u}vka's algorithm, we needed to contract a~given
  set of edges in the current graph and then flatten the graph, all this in time $\O(m)$.
  This can be easily handled on both the RAM and the PM by bucket sorting. We develop
-a~bunch of pointer-based sorted techniques which can be summarized by the following
+a~bunch of pointer-based sorting techniques which can be summarized by the following
  lemma:
  
  \lemma
@@ -304,7 +303,7 @@ per operation, at least when either the magnitude of the values or the size of
  the data structure is suitably bounded.
  
  A~classical result of this type is the tree of van Emde Boas~\cite{boas:vebt}
-which represent a~subset of the integers $\{0,\ldots,U-1\}$. It allows insertion,
+which represents a~subset of the integers $\{0,\ldots,U-1\}$. It allows insertion,
  deletion and order operations (minimum, maximum, successor etc.) in time $\O(\log\log U)$,
  regardless of the size of the subset. If we replace the heap used in the Jarn\'\i{}k's
  algorithm (\ref{jarnik}) by this structure, we immediately get an~algorithm
@@ -317,11 +316,7 @@ operation on a~set of $n$~integers, but with time complexity $\O(\log_W n)$
  per operation on a~Word-RAM with $W$-bit words. This of course assumes that
  each element of the set fits in a~single word. As $W$ must at least~$\log n$,
  the operations take $\O(\log n/\log\log n)$ time and thus we are able to sort $n$~integers
-in time~$o(n\log n)$. This was a~beginning of a~long sequence of faster and
-faster sorting algorithms, culminating with the work of Thorup and Han.
-They have improved the time complexity of integer sorting to $\O(n\log\log n)$ deterministically~\cite{han:detsort}
-and expected $\O(n\sqrt{\log\log n})$ for randomized algorithms~\cite{hanthor:randsort},
-both in linear space.
+in time~$o(n\log n)$. This was further improved by Han and Thorup \cite{han:detsort,hanthor:randsort}.
  
  The Fusion trees themselves have very limited use in graph algorithms, but the
  principles behind them are ubiquitous in many other data structures and these
@@ -340,7 +335,6 @@ algorithms have been then significantly simplified by Hagerup
  Despite the progress in the recent years, the corner-stone of all RAM structures
  is still the representation of combinatorial objects by integers introduced by
  Fredman and Willard.
-
  First of all, we observe that we can encode vectors in integers:
  
  \notan{Bit strings}\id{bitnota}%
@@ -405,12 +399,10 @@ every minor~$H$ of~$G$, the graph~$H$ lies in~$\cal C$ as well. A~class~$\cal C$
  
  \example
  Non-trivial minor-closed classes include:
-\itemize\ibull
-\:planar graphs,
-\:graphs embeddable in any fixed surface (i.e., graphs of bounded genus),
-\:graphs embeddable in~${\bb R}^3$ without knots or without interlocking cycles,
-\:graphs of bounded tree-width or path-width.
-\endlist
+planar graphs,
+graphs embeddable in any fixed surface (i.e., graphs of bounded genus),
+graphs embeddable in~${\bb R}^3$ without knots or without interlocking cycles,
+and graphs of bounded tree-width or path-width.
  
  \para
  Many of the nice structural properties of planar graphs extend to
@@ -471,9 +463,9 @@ We have seen that the Jarn\'\i{}k's Algorithm \ref{jarnik} runs in $\Theta(m\log
  Fredman and Tarjan \cite{ft:fibonacci} have shown a~faster implementation using their Fibonacci
  heaps, which runs in time $\O(m+n\log n)$. This is $\O(m)$ whenever the density of the
  input graph reaches $\Omega(\log n)$. This suggests that we could combine the algorithm with
-another MST algorithm, which identifies a~part of the MST edges and contracts
+another MST algorithm, which identifies a~subset of the MST edges and contracts
  them to increase the density of the graph. For example, if we perform several Bor\o{u}vka
-steps and then run the Jarn\'\i{}k's algorithm, we find the MST in time $\O(m\log\log n)$.
+steps and then we run the Jarn\'\i{}k's algorithm, we find the MST in time $\O(m\log\log n)$.
  
  Actually, there is a~much better choice of the algorithms to combine: use the
  Jarn\'\i{}k's algorithm with a~Fibonacci heap multiple times, each time stopping it after a~while.
@@ -517,7 +509,7 @@ heap grows to $\Omega(\log^{(k)} n)$ for any fixed~$k$, the graph gets dense eno
  to guarantee that at most~$k$ phases remain. This means that if we are able to
  construct a~heap of size $\Omega(\log^{(k)} n)$ with constant time per operation,
  we can get a~linear-time algorithm for MST. This is the case when the weights are
-integers (we can use the Q-heap trees from Section~\ref{ramds}.
+integers (we can use the Q-heap trees from Section~\ref{ramds}).
  
  \thmn{MST for integer weights, Fredman and Willard \cite{fw:transdich}}\id{intmst}%
  MST of a~graph with integer edge weights can be found in time $\O(m)$ on the Word-RAM.
@@ -558,26 +550,17 @@ perform $\O(m)$ comparisons of edge weights to determine whether~$T$ is minimum
  and to find all $T$-light edges in~$G$.
  
  It remains to demonstrate that the overhead of the algorithm needed to find
-the required comparisons and infer the peaks from their results can be decreased,
+the required comparisons and to infer the peaks from their results can be decreased,
  so that it gets bounded by the number of comparisons and therefore also by $\O(m)$.
  We will follow the idea of King from \cite{king:verifytwo}, but as we have the power
  of the RAM data structures from Section~\ref{ramds} at our command, the low-level
  details will be easier. Still, the construction is rather technical, so we omit
-it in this abstract and state only the final theorem:
+it from this abstract and state only the final theorem:
  
  \thmn{Verification of MST on the RAM}\id{ramverify}%
  There is a~RAM algorithm which for every weighted graph~$G$ and its spanning tree~$T$
  determines whether~$T$ is minimum and finds all $T$-light edges in~$G$ in time $\O(m)$.
  
-\rem
-Buchsbaum et al.~\cite{buchsbaum:verify} have recently shown that linear-time
-verification can be achieved even on the Pointer Machine.
-They combine an~algorithm of time complexity $\O(m\timesalpha(m,n))$
-based on the Disjoint Set Union data structure with the framework of topological graph
-computations described in Section \ref{bucketsort}.
-The online version of this problem has turned out to be more difficult: there
-is a~super-linear lower bound for it due to Pettie \cite{pettie:onlineverify}.
-
  \section{A randomized algorithm}\id{randmst}%
  
  When we analysed the Contractive Bor\o{u}vka's algorithm in Section~\ref{contalg},
@@ -623,7 +606,7 @@ finds the MSF of the original graph, but without the heavy edges.
  \endalgo
  
  A~careful analysis of this algorithm, based on properties of its recursion tree
-and the peak-finding algorithm of the previous section yields the following time bounds:
+and on the peak-finding algorithm of the previous section, yields the following time bounds:
  
  \thm
  The KKT algorithm runs in time $\O(\min(n^2,m\log n))$ in the worst case on the RAM.
@@ -636,21 +619,14 @@ The expected time complexity of the KKT algorithm on the RAM is $\O(m)$.
  \section{Soft heaps}\id{shsect}%
  
  A~vast majority of MST algorithms that we have encountered so far is based on
-the Tarjan's Blue rule (Lemma \ref{bluelemma}). The rule serves to identify
-edges that belong to the MST, while all other edges are left in the process. This
-unfortunately means that the later stages of computation spend most of
-their time on these edges that never enter the MSF. A~notable exception is the randomized
-algorithm of Karger, Klein and Tarjan. It adds an~important ingredient: it uses
-the Red rule (Lemma \ref{redlemma}) to filter out edges that are guaranteed to stay
-outside the MST, so that the graphs with which the algorithm works get smaller
-with time.
-
-Recently, Chazelle \cite{chazelle:ackermann} and Pettie \cite{pettie:ackermann}
-have presented new deterministic algorithms for the MST which are also based
-on the combination of both rules. They have reached worst-case time complexity
-$\O(m\timesalpha(m,n))$ on the Pointer Machine. We will devote this chapter to their results
-and especially to another algorithm by Pettie and Ramachandran \cite{pettie:optimal}
-which is provably optimal.
+the Tarjan's Blue rule (Lemma \ref{bluelemma}), the only exception being the
+randomized KKT algorithm, which also uses the Red rule (Lemma \ref{redlemma}). Recently, Chazelle
+\cite{chazelle:ackermann} and Pettie \cite{pettie:ackermann} have presented new
+deterministic algorithms for the MST which are also based on the combination of
+both rules. They have reached worst-case time complexity
+$\O(m\timesalpha(m,n))$ on the Pointer Machine. We will devote this chapter to
+their results and especially to another algorithm by Pettie and Ramachandran
+\cite{pettie:optimal} which is provably optimal.
  
  At the very heart of all these algorithms lies the \df{soft heap} discovered by
  Chazelle \cite{chazelle:softheap}. It is a~meldable priority queue, roughly
@@ -659,9 +635,6 @@ and Tarjan's Fibonacci heaps \cite{ft:fibonacci}. The soft heaps run faster at
  the expense of \df{corrupting} a~fraction of the inserted elements by raising
  their values (the values are however never lowered). This allows for
  a~trade-off between accuracy and speed, controlled by a~parameter~$\varepsilon$.
-The heap operations take $\O(\log(1/\varepsilon))$ amortized time and at every
-moment at most~$\varepsilon n$ elements of the $n$~elements inserted can be
-corrupted.
  
  \defnn{Soft heap interface}%
  The \df{soft heap} contains a~set of distinct items from a~totally ordered universe and it
@@ -677,8 +650,8 @@ supports the following operations:
    (again optionally marking those corrupted).
  \endlist
  
-\>We describe the exact mechanics of the soft heaps and analyse its complexity.
-The important properties are summarized in the following theorem:
+\>In the thesis, we describe the exact mechanics of the soft heaps and analyse their complexity.
+The important properties are characterized by the following theorem:
  
  \thmn{Performance of soft heaps, Chazelle \cite{chazelle:softheap}}\id{softheap}%
  A~soft heap with error rate~$\varepsilon$ ($0<\varepsilon\le 1/2$) processes
@@ -691,16 +664,13 @@ heap contains at most $\varepsilon n$ corrupted items.
  Having the soft heaps at hand, we would like to use them in a~conventional MST
  algorithm in place of a~normal heap. The most efficient specimen of a~heap-based
  algorithm we have seen so far is the Jarn\'\i{}k's algorithm.
-We can try implanting the soft heap in this algorithm, preferably in the earlier
+We can try implanting the soft heap in it, preferably in the earlier
  version without Fibonacci heaps as the soft heap lacks the \<Decrease> operation.
  This brave, but somewhat simple-minded attempt is however doomed to
-fail. The reason is of course the corruption of items inside the heap, which
-leads to increase of weights of some subset of edges. In presence of corrupted
-edges, most of the theory we have so carefully built breaks down. There is
-fortunately some light in this darkness. While the basic structural properties
-of MST's no longer hold, there is a~weaker form of the Contraction lemma that
-takes the corrupted edges into account. Before we prove this lemma, we expand
-our awareness of subgraphs which can be contracted.
+fail because of the corruption of items inside the soft heap.
+While the basic structural properties of MST's no longer hold in corrupted graphs,
+there is a~weaker form of the Contraction lemma that takes the corrupted edges into account.
+Before we prove this lemma, we expand our awareness of subgraphs which can be contracted.
  
  \defn
  A~subgraph $C\subseteq G$ is \df{contractible} iff for every pair of edges $e,f\in\delta(C)$\foot{That is,
@@ -730,7 +700,7 @@ Let $G$ be a~weighted graph and $C$~its subgraph contractible in~$G\crpt R$
  for some set~$R$ of edges. Then $\msf(G) \subseteq \msf(C) \cup \msf((G/C) \setminus R^C) \cup R^C$.
  
  \para
-We will mimic the Iterated Jarn\'\i{}k's algorithm. We will partition the given graph to a~collection~$\C$
+We will now mimic the Iterated Jarn\'\i{}k's algorithm. We will partition the given graph to a~collection~$\C$
  of non-overlapping contractible subgraphs called \df{clusters} and we put aside all edges that got corrupted in the process.
  We recursively compute the MSF of those subgraphs and of the contracted graph. Then we take the
  union of these MSF's and add the corrupted edges. According to the previous lemma, this does not produce
@@ -808,18 +778,7 @@ it is now simple to state the Pettie's and Ramachandran's MST algorithm
  and prove that it is asymptotically optimal among all MST algorithms in
  comparison-based models. Several standard MST algorithms from the previous
  chapters will also play their roles.
-
-We will describe the algorithm as a~recursive procedure. When the procedure is
-called on a~graph~$G$, it sets the parameter~$t$ to roughly $\log^{(3)} n$ and
-it calls the \<Partition> procedure to split the graph into a~collection of
-clusters of size~$t$ and a~set of corrupted edges. Then it uses precomputed decision
-trees to find the MSF of the clusters. The graph obtained by contracting
-the clusters is on the other hand dense enough, so that the Iterated Jarn\'\i{}k's
-algorithm runs on it in linear time. Afterwards we combine the MSF's of the clusters
-and of the contracted graphs, we mix in the corrupted edges and run two iterations
-of the Contractive Bor\o{u}vka's algorithm. This guarantees reduction in the number of
-both vertices and edges by a~constant factor, so we can efficiently recurse on the
-resulting graph.
+We will describe the algorithm as a~recursive procedure:
  
  \algn{Optimal MST algorithm, Pettie and Ramachandran \cite{pettie:optimal}}\id{optimal}%
  \algo
@@ -839,7 +798,7 @@ resulting graph.
  \algout The minimum spanning tree of~$G$.
  \endalgo
  
-Correctness of this algorithm immediately follows from the Partitioning theorem (\ref{partthm})
+\>Correctness of this algorithm immediately follows from the Partitioning theorem (\ref{partthm})
  and from the proofs of the respective algorithms used as subroutines. As for time complexity:
  
  \lemma\id{optlemma}%
@@ -1447,6 +1406,51 @@ permanents in advance. When we plug it in the general algorithm, we get:
  For every~$n$, the derangements on the set~$[n]$ can be ranked and unranked according to the
  lexicographic order in time~$\O(n)$ after spending $\O(n^2)$ on initialization of auxiliary tables.
  
+\chapter{Conclusions}
+
+We have seen the many facets of the minimum spanning tree problem. It has
+turned out that while the major question of the existence of a~linear-time
+MST algorithm is still open, backing off a~little bit in an~almost arbitrary
+direction leads to a~linear solution. This includes classes of graphs with edge
+density at least $\lambda_k(n)$ for an~arbitrary fixed~$k$,
+minor-closed classes, and graphs whose edge weights are
+integers. Using randomness also helps, as does having the edges pre-sorted.
+
+If we do not know anything about the structure of the graph and we are only allowed
+to compare the edge weights, we can use the Pettie's and Ramachandran's MST algorithm.
+Its time complexity is guaranteed to be asymptotically optimal,
+but we do not know what it really is: the best we have is
+an~$\O(m\timesalpha(m,n))$ upper bound and the trivial $\Omega(m)$ lower bound.
+
+One thing we however know for sure. The algorithm runs on the weakest of our
+computational models ---the Pointer Machine--- and its complexity is linear
+in the minimum number of comparisons needed to decide the problem. We therefore
+need not worry about the details of computational models, which have contributed
+so much to the linear-time algorithms for our special cases. Instead, it is sufficient
+to study the complexity of MST decision trees. However, not much is known about these trees so far.
+
+As for the dynamic algorithms, we have an~algorithm which maintains the minimum
+spanning forest within poly-logarithmic time per operation.
+The optimum complexity is once again undecided --- the known lower bounds are very far
+from the upper ones.
+The known algorithms run on the Pointer Machine and we do not know if using a~stronger
+model can help.
+
+For the ranking problems, the situation is completely different. We have shown
+linear-time algorithms for three important problems of this kind. The techniques
+we have used seem to be applicable to other ranking problems. On the other
+hand, ranking of general restricted permutations has turned out to balance on the
+verge of $\#{\rm P}$-completeness. All our algorithms run
+on the RAM model, which seems to be the only sensible choice for problems of
+inherently arithmetic nature. While the unit-cost assumption on arithmetic operations
+is not universally accepted, our results imply that the complexity of our algorithms
+is dominated by the necessary arithmetic.
+
+Aside from the concrete problems we have solved, we have also built several algorithmic
+techniques of general interest: the unification procedures using pointer-based
+bucket sorting and the vector computations on the RAM. We hope that they will
+be useful in many other algorithms.
+
  \chapter{Bibliography}
  
  \dumpbib