Fixed to KKT.

author Martin Mares <mj@ucw.cz>

Mon, 24 Mar 2008 16:34:25 +0000 (17:34 +0100)

committer Martin Mares <mj@ucw.cz>

Mon, 24 Mar 2008 16:34:25 +0000 (17:34 +0100)
author Martin Mares <mj@ucw.cz>
Mon, 24 Mar 2008 16:34:25 +0000 (17:34 +0100)
committer Martin Mares <mj@ucw.cz>
Mon, 24 Mar 2008 16:34:25 +0000 (17:34 +0100)
diff --git a/adv.tex b/adv.tex

index 9928648957d421ace7715c4374e11d1bab888905..2e99c5bd1588a70ec3122c25cd6f5bcb07b7b007 100644 (file)
--- a/adv.tex
+++ b/adv.tex
@@ -1054,32 +1054,27 @@ is still open even for the RAM.
  
  When we analysed the contractive Bor\o{u}vka's algorithm in Section~\ref{contalg},
  we observed that while the number of vertices per iteration decreases exponentially,
-the number of edges can remain asymptotically the same. Karger, Klein and Tarjan
-\cite{karger:randomized} have overcome this problem by combining the Bor\o{u}vka's
-algorithm with filtering based on random sampling. This leads to a~randomized
-algorithm which runs in expected linear time.
-
-As we will need the algorithm for verification of minimality from the previous
-section, we will again assume that we are working on the RAM.
+the number of edges generally does not, so we spend $\Theta(m)$ time on every phase.
+Karger, Klein and Tarjan \cite{karger:randomized} have overcome this problem by
+combining the Bor\o{u}vka's algorithm with filtering based on random sampling.
+This leads to a~randomized algorithm which runs in linear expected time.
  
  The principle of the filtering is simple: Let us consider any spanning tree~$T$
  of the input graph~$G$. Each edge of~$G$ that is $T$-heavy is the heaviest edge
  of some cycle, so by the Red lemma (\ref{redlemma}) it cannot participate in
  the MST of~$G$. We can therefore discard all $T$-heavy edges and continue with
-finding the MST on the reduced graph. Not all choices of~$T$ are equally good,
-but the following lemma shows that when we take~$T$ as a~MST of a~randomly selected
-subgraph, only a~small expected number of edges remain.
+finding the MST on the reduced graph. Of course, not all choices of~$T$ are equally
+good, but it will soon turn out that when we take~$T$ as the MST of a~randomly selected
+subgraph, only a~small expected number of edges remains.
  
-\rem
-The random sampling will sometimes select a~disconnected subgraph,
-so we will not assume connectedness in this section. As we already noted
-(Remark \ref{disconn}), with a~little bit of care our algorithms and theorems
-keep working. We also extend the notion of light and heavy edges with respect
-to a~tree to forests: When an~edge~$e$ connects two vertices lying in the same
-tree~$T$ of a~forest~$F$, it is $F$-heavy iff it is $T$-heavy (similarly
-for $F$-light). Edges connecting two different trees are always considered
-$F$-light. Again, a~spanning forest~$F$ is minimum iff there are no $F$-light
-edges.
+Selecting a~subgraph at random will unavoidably produce disconnected subgraphs
+at occassion, so we will drop the implicit assumption that all graphs are
+connected for this section and we will always search for the minimum spanning forest.
+As we already noted (Remark \ref{disconn}), with a~little bit of care our
+algorithms and theorems keep working.
+
+Since we need the MST verification algorithm for finding the $T$-heavy edges,
+we will assume that we are working on the RAM.
  
  \lemman{Random sampling, Karger \cite{karger:sampling}}
  Let $H$~be a~subgraph of~$G$ obtained by including each edge independently
@@ -1087,28 +1082,28 @@ with probability~$p$ and $F$~the minimum spanning forest of~$H$. Then the
  expected number of $F$-nonheavy edges in~$G$ is at most $n/p$.
  
  \proof
-Let us observe that we can obtain the forest~$F$ by combining the Kruskal's algorithm
-(\ref{kruskal}) with the random process producing~$H$ from~$G$. We sort all edges of~$G$
-by their weights and start with an~empty forest~$F$. For each edge, we first
-flip a~biased coin (it gives heads with probability~$p$) and if it comes up
+Let us observe that we can obtain the forest~$F$ by running the Kruskal's algorithm
+(\ref{kruskal}) combined with the random process producing~$H$ from~$G$. We sort all edges of~$G$
+by their weights and we start with an~empty forest~$F$. For each edge, we first
+flip a~biased coin (which gives heads with probability~$p$) and if it comes up
  tails, we discard the edge. Otherwise we perform a~single step of the Kruskal's
-algoritm: We check if~$F+e$ contains a~cycle. If it does, we discard~$e$, otherwise
+algoritm: We check whether $F+e$ contains a~cycle. If it does, we discard~$e$, otherwise
  we add~$e$ to~$F$. At the end, we have produced the subgraph~$H$ and its MSF~$F$.
  
-We can modify the algorithm by swapping the check for cycles with flipping
-the coin:
+When we  exchange the check for cycles with flipping the coin, we get an~equivalent
+algorithm which will turn out to be more convenient to analyse:
  \algo
  \:If $F+e$ contains a~cycle, we immediately discard~$e$ (we can flip
  the coin, but we need not to, because the edge will be discarded regardless of
  the outcome). We note that~$e$ is $F$-heavy with respect to both the
-current~$F$ and the final MSF.
+current state of~$F$ and the final MSF.
  \:If $F+e$ is acyclic, we flip the coin:
  \::If it comes up heads, we add~$e$ to~$F$. In this case, $e$~is neither $F$-light
     nor $F$-heavy.
  \::If it comes up tails, we discard~$e$. Such edges are $F$-light.
  \endalgo
  
-The number of $F$-nonheavy edges is therefore equal to the number of the coin
+The number of $F$-nonheavy edges is therefore equal to the total number of the coin
  flips in step~2 of this algorithm. We also know that the algorithm stops before
  it adds $n$~edges to~$F$. Therefore it flips at most as many coins as a~simple
  random process which repeatedly flips until it gets~$n$ heads. As waiting for
@@ -1135,14 +1130,14 @@ we do not mention it in the algorithm.
  \:Select subgraph~$H\subseteq G$ by including each edge independently with
    probability $1/2$.
  \:$F\=\msf(H)$ calculated recursively.
-\:Construct $G'\subseteq G$ by removing all $F$-heavy edges.
+\:Construct $G'\subseteq G$ by removing all $F$-heavy edges of~$G$.
  \:$R\=\msf(G')$ calculated recursively.
  \:Return $R\cup B$.
  \algout The minimum spanning forest of~$G$.
  \endalgo
  
  \nota
-Let us analyse the time complexity this algorithm by studying properties of its \df{recursion tree.}
+Let us analyse the time complexity of this algorithm by studying properties of its \df{recursion tree.}
  The tree describes the subproblems processed by the recursive calls. For any vertex~$t$
  of the tree, we denote the number of vertices and edges of the corresponding subproblem~$G_t$
  by~$n_t$ and~$m_t$ respectively.
@@ -1158,7 +1153,7 @@ The Bor\o{u}vka steps together with the removal of isolated vertices guarantee t
  of vertices drops at least by a~factor of four in every recursive call. The size of a~subproblem~$G_t$
  at level~$i$ is therefore at most $n/4^i$ and the depth of the tree is at most $\lceil\log_4 n\rceil$.
  As there are no more than~$2^i$ subproblems at level~$i$, the sum of all~$n_t$'s
-on that level is at most $n/2^i$, which is at most~$2n$ summed over the whole tree.
+on that level is at most $n/2^i$, which is at most~$2n$ for the whole tree.
  
  We are going to show that the worst case of the KKT algorithm is not worse than
  of the plain contractive algorithm, while the average case is linear.
@@ -1182,7 +1177,7 @@ The KKT algorithm runs in time $\O(\min(n^2,m\log n))$ in the worst case on the
  The argument for the $\O(n^2)$ bound is similar to the analysis of the plain
  contractive algorithm. As every subproblem~$G_t$ is a~simple graph, the number
  of its edges~$m_t$ is less than~$n_t^2$. By the previous lemma, we spend time
-$\O(n_t^2)$ there. Summing over all subproblems yields $\sum_t \O(n_t^2) =
+$\O(n_t^2)$ on it. Summing over all subproblems yields $\sum_t \O(n_t^2) =
  \O((\sum_t n_t)^2) = \O(n^2)$.
  
  In order to prove the $\O(m\log n)$ bound, it is sufficient to show that the total time
@@ -1200,7 +1195,7 @@ we have $m_\ell + m_r + b_t \le m_t + n_t/4$.
  On the other hand, the first Bor\o{u}vka step selects at least $n_t/2$ edges,
  so $b_t \ge n_t/2$. The duplication of edges between $G_\ell$ and~$G_r$ is therefore
  compensated by the loss of edges by contraction and $m_\ell + m_r \le m_t$. So the total
-number of edges per level cannot decrease and it remains to apply the previous lemma.
+number of edges per level does not decrease and it remains to apply the previous lemma.
  \qed
  
  \thmn{Average-case complexity of the KKT algorithm}
@@ -1208,7 +1203,7 @@ The expected time complexity of the KKT algorithm on the RAM is $\O(m)$.
  
  \proof
  The structure of the recursion tree depends on the random choices taken,
-but as its worst case depth is at most~$\lceil \log_4 n\rceil$, the tree
+but as its worst-case depth is at most~$\lceil \log_4 n\rceil$, the tree
  is always a~subtree of the complete binary tree of that depth. We will
  therefore prove the theorem by examining the complete tree, possibly with
  empty subproblems at some vertices.
@@ -1217,14 +1212,14 @@ The set of all left edges in the tree (edges connecting a~parent with its left
  son) form a~set of \df{left paths.} Let us consider the expected time spent on
  a~single left path. When walking the path downwards from its top vertex~$r$,
  the expected size of the subproblems decreases exponentially: for a~son~$\ell$
-of a~vertex~$t$, we have $n_\ell \le n_t/4$ and $\E m_\ell = 2\cdot\E m_t$. The
+of a~vertex~$t$, we have $n_\ell \le n_t/4$ and $\E m_\ell = \E m_t/2$. The
  expected total time spend on the path is therefore $\O(n_r+m_r)$ and it remains
  to sum this over all left paths.
  
  With the exception of the path going from the root of the tree,
  the top~$r$ of a~left path is always a~right son of a~unique parent vertex~$t$.
  Since the subproblem~$G_r$ has been obtained from its parent subproblem~$G_t$
-by filtering out all heavy edges, we can use the Sampling lemma to obtain
+by filtering out all heavy edges, we can use the Sampling lemma to show that
  $\E m_r \le 2n_t$. The sum of the expected sizes of all top subproblems is
  then $\sum_r n_r + m_r \le \sum_t 3n_t = \O(n)$. After adding the exceptional path
  from the root, we get $\O(m+n)=\O(m)$.
diff --git a/mst.tex b/mst.tex

index e178d8b30bca3f5e84a32fa4096e1ac10f53b7e0..b915272b62a7ab165ff03d38805731aa41a7db00 100644 (file)
--- a/mst.tex
+++ b/mst.tex
@@ -683,4 +683,11 @@ of vertices and existence of paths, so they are always local to a~single
  connected component. The Bor\o{u}vka's and Kruskal's algorithm need no changes,
  the Jarn\'\i{}k's algorithm has to be invoked separately for each component.
  
+We can also extend the notion of light and heavy edges with respect
+to a~tree to forests: When an~edge~$e$ connects two vertices lying in the same
+tree~$T$ of a~forest~$F$, it is $F$-heavy iff it is $T$-heavy (similarly
+for $F$-light). Edges connecting two different trees are always considered
+$F$-light. Again, a~spanning forest~$F$ is minimum iff there are no $F$-light
+edges.
+
  \endpart
diff --git a/notation.tex b/notation.tex

index eb70c7224c38125bba52f23fbeb1a15eff808d42..83c6292574685f7293e7288a15e7ffd188964dfa 100644 (file)
--- a/notation.tex
+++ b/notation.tex
@@ -32,6 +32,7 @@
  \n{MST}{minimum spanning tree \[mstdef]}
  \n{MSF}{minimum spanning forest \[mstdef]}
  \n{$\mst(G)$}{the unique minimum spanning tree of a graph~$G$ \[mstnota]}
+\n{$\msf(G)$}{the unique minimum spanning forest of a graph~$G$ \[mstnota]}
  \n{$X \choose k$}{a set of all $k$-element subsets of a set~$X$}
  \n{$G/e$}{multigraph contraction \[contract]}
  \n{$G.e$}{simple graph contraction \[simpcont]}
author	Martin Mares <mj@ucw.cz>
	Mon, 24 Mar 2008 16:34:25 +0000 (17:34 +0100)
committer	Martin Mares <mj@ucw.cz>
	Mon, 24 Mar 2008 16:34:25 +0000 (17:34 +0100)
adv.tex		patch \| blob \| history
mst.tex		patch \| blob \| history
notation.tex		patch \| blob \| history