From 0e783e3fc00288b976fc0277e8948f3bc90cab98 Mon Sep 17 00:00:00 2001
From: Martin Mares
Date: Tue, 18 Mar 2008 12:49:17 +0100
Subject: [PATCH] Finished KKT.

---
 PLAN         |  7 +++----
 adv.tex      | 47 +++++++++++++++++++++++++++++++++++++++--------
 biblio.bib   | 11 +++++++++++
 macros.tex   |  1 +
 notation.tex |  2 +-
 5 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/PLAN b/PLAN
index 5554906..20a4c0d 100644
--- a/PLAN
+++ b/PLAN
@@ -19,7 +19,7 @@
 o Fredman-Tarjan algorithm
 o MST verification
 o Linear-time verification
-. A randomized algorithm
+o A randomized algorithm
 . ?? Chazelle ??
 . ?? Pettie ??
 o Special cases and related problems
@@ -55,11 +55,10 @@ Spanning trees:
 - reference to mixed Boruvka-Jarnik
 - use the notation for contraction by a set
 - practical considerations: katriel:cycle, moret:practice (mention pairing heaps)
-- parallel algorithms: p243-cole (are there others?)
+- parallel algorithms: p243-cole (see also remarks in Karger and pettie:minirand)
 - bounded expansion classes?
 - restricted cases and arborescences
-- mention parallel algorithms (see remarks in Karger)
-- Pettie's paper on random bits
+- Pettie's paper on random bits (pettie:minirand)
 
 Models:
 
diff --git a/adv.tex b/adv.tex
index 58d3c5c..9928648 100644
--- a/adv.tex
+++ b/adv.tex
@@ -167,7 +167,7 @@ of edges being at most $\varrho n$.
 \rem
 The proof can be also viewed probabilistically: let $X$ be the degree
 of a vertex of~$G$ chosen uniformly at
-random. Then ${\bb E}X \le 2\varrho$, hence by the Markov's inequality
+random. Then $\E X \le 2\varrho$, hence by Markov's inequality
 ${\rm Pr}[X > 4\varrho] < 1/2$, so for at least $n/2$ vertices~$v$ we have
 $\deg(v)\le 4\varrho$.
 
@@ -1034,7 +1034,7 @@ as there are $\O(1)$ query paths per edge, the first sum is $\O(\#\hbox{comparis
 which is $\O(m)$ by Theorem \ref{verify}.
 \qed
 
-\rem
+\rem\id{pmverify}%
 Buchsbaum et al.~have recently shown in \cite{buchsbaum:verify} that linear-time
 verification can be achieved even on the pointer machine. They first solve the
 problem of finding the lowest common ancestors for a~set of pairs of vertices
@@ -1095,7 +1095,7 @@ tails, we discard the edge. Otherwise we perform a~single step of the Kruskal's
 algoritm: We check if~$F+e$ contains a~cycle. If it does, we discard~$e$,
 otherwise we add~$e$ to~$F$. At the end, we have produced the subgraph~$H$
 and its MSF~$F$.
-We can modify the algorithm by swapping the check for cycles with flipping of
+We can modify the algorithm by swapping the check for cycles with flipping the
 coin:
 \algo
 \:If $F+e$ contains a~cycle, we immediately discard~$e$ (we can flip
@@ -1160,6 +1160,9 @@ at level~$i$ is therefore at most $n/4^i$ and the depth of the tree is at most $
 As there are no more than~$2^i$ subproblems at level~$i$, the sum of all~$n_t$'s
 on that level is at most $n/2^i$, which is at most~$2n$ summed over the whole tree.
 
+We are going to show that the worst case of the KKT algorithm is no worse than
+that of the plain contractive algorithm, while the expected time complexity is linear.
+
 \lemma
 For every subproblem~$G_t$, the KKT algorithm spends time $\O(m_t+n_t)$ plus the
 time spent on the recursive calls.
@@ -1204,20 +1207,48 @@ number of edges per level cannot decrease and it remains to apply the previous l
 The expected time complexity of the KKT algorithm on the RAM is $\O(m)$.
 
 \proof
-
+The structure of the recursion tree depends on the random choices taken,
+but as its worst-case depth is at most~$\lceil \log_4 n\rceil$, the tree
+is always a~subtree of the complete binary tree of that depth. We will
+therefore prove the theorem by examining the complete tree, possibly with
+empty subproblems at some vertices.
+
+The set of all left edges in the tree (edges connecting a~parent with its left
+son) forms a~set of \df{left paths.} Let us consider the expected time spent on
+a~single left path. When walking the path downwards from its top vertex~$r$,
+the expected size of the subproblems decreases exponentially: for the left son~$\ell$
+of a~vertex~$t$, we have $n_\ell \le n_t/4$ and $\E m_\ell \le \E m_t/2$. The
+expected total time spent on the path is therefore $\O(n_r+m_r)$ and it remains
+to sum this over all left paths.
+
+With the exception of the path going from the root of the tree,
+the top~$r$ of a~left path is always a~right son of a~unique parent vertex~$t$.
+Since the subproblem~$G_r$ has been obtained from its parent subproblem~$G_t$
+by filtering out all heavy edges, we can use the Sampling lemma to obtain
+$\E m_r \le 2n_t$. The sum of the expected sizes of all top subproblems is
+then $\sum_r (n_r + \E m_r) \le \sum_t 3n_t = \O(n)$. After adding the exceptional path
+from the root, we get $\O(m+n)=\O(m)$.
 \qed
 
-\FIXME{High probability result.}
+\rem
+There is also a~high-probability version of the above theorem. According to
+Karger, Klein and Tarjan \cite{karger:randomized}, the time complexity
+of the algorithm is $\O(m)$ with probability $1-\exp(-\Omega(m))$. The proof
+again follows the recursion tree and it involves applying the Chernoff bound
+\cite{chernoff} to bound the tail probabilities.
 
 \rem
 We could also use a~slightly different formulation of the sampling lemma
 suggested by Chan \cite{chan:backward}. It changes the selection of the subgraph~$H$
 to choosing an~$mp$-edge subset of~$E(G)$ uniformly at random. The proof is then
 a~straightforward application of the backward analysis method. We however prefered
-the Karger's original version, because generating a~random subset of a~given size
-cannot be generally performed in bounded worst-case time.
+Karger's original version, because generating a~random subset of a~given size
+requires an~unbounded number of random bits in the worst case.
 
-\FIXME{Pointer machine.}
+\rem
+The only place where we need the power of the RAM is the verification algorithm,
+so we can use the pointer-machine verification algorithm mentioned in Remark \ref{pmverify}
+to bring the results of this section to the~PM.
 
 %--------------------------------------------------------------------------------
 
diff --git a/biblio.bib b/biblio.bib
index 0e406ad..01aca86 100644
--- a/biblio.bib
+++ b/biblio.bib
@@ -1098,3 +1098,14 @@ inproceedings{ pettie:minirand,
   year={1983},
   publisher={Academic Press, Inc. Orlando, FL, USA}
 }
+
+@article{ chernoff,
+  title={{A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations}},
+  author={Chernoff, H.},
+  journal={The Annals of Mathematical Statistics},
+  volume={23},
+  number={4},
+  pages={493--507},
+  year={1952},
+  publisher={Institute of Mathematical Statistics}
+}
diff --git a/macros.tex b/macros.tex
index 7eed678..2ac7bbf 100644
--- a/macros.tex
+++ b/macros.tex
@@ -49,6 +49,7 @@
 \def\minorof{\preccurlyeq}
 \def\per{\mathop{\rm per}}
 \def\poly{\mathop{\rm poly}}
+\def\E{{\bb E}}
 
 % Bit strings
 \def\0{{\bf 0}}
diff --git a/notation.tex b/notation.tex
index f2c84dd..eb70c72 100644
--- a/notation.tex
+++ b/notation.tex
@@ -41,7 +41,7 @@
 \n{$f[e]$}{as edges are two-element sets, $f[e]$ maps both endpoints of an edge~$e$}
 \n{$\varrho({\cal C})$}{edge density of a graph class~$\cal C$ \[density]}
 \n{$\deg_G(v)$}{degree of vertex~$v$ in graph~$G$; we omit $G$ if it is clear from context}
-\n{${\bb E}X$}{expected value of a~random variable~$X$}
+\n{${\E}X$}{expected value of a~random variable~$X$}
 \n{${\rm Pr}[\varphi]$}{probability that a predicate~$\varphi$ is true}
 \n{$\log n$}{a binary logarithm of the number~$n$}
 \n{$f^{(i)}$}{function~$f$ iterated $i$~times: $f^{(0)}(x):=x$, $f^{(i+1)}(x):=f(f^{(i)}(x))$}
-- 
2.39.5
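
The sampling-lemma proof touched by the adv.tex hunk around line 1095 builds the sampled subgraph H and its MSF F in one pass: edges are scanned in order of increasing weight, a coin flip with success probability p decides membership in H, and a Kruskal step keeps exactly those sampled edges that do not close a cycle in F. The following is a minimal sketch of that pass; Python, the union-find helper DSU and the name sample_and_kruskal are choices made for the sketch, not part of the committed sources.

import random

class DSU:
    """Union-find used for the Kruskal cycle test (illustrative helper)."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False   # u and v already connected: the edge would close a cycle
        self.parent[ru] = rv
        return True

def sample_and_kruskal(n, edges, p=0.5, coin=random.random):
    """Scan edges (weight, u, v) in order of increasing weight; a coin with
    success probability p decides whether an edge enters the sample H, and a
    Kruskal step keeps exactly the edges of the MSF F of H."""
    dsu = DSU(n)
    H, F = [], []
    for w, u, v in sorted(edges):
        if coin() < p:              # heads: e belongs to the sampled subgraph H
            H.append((u, v, w))
            if dsu.union(u, v):     # e does not close a cycle in F
                F.append((u, v, w))
    return H, F

# Example: H is a random subgraph of a weighted 4-cycle, F is its MSF.
if __name__ == "__main__":
    example_edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0)]  # (weight, u, v)
    print(sample_and_kruskal(4, example_edges))

Swapping the cycle test and the coin flip, as the hunk goes on to do, leaves F unchanged: an edge that closes a cycle in the current F can never enter F, whatever its coin says, so only the heavy edges stop being offered a flip.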
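
The claim in the new proof that a single left path costs $\O(n_r+m_r)$ expected time is a geometric sum; written out in the notation of the patch (a worked restatement, not part of the committed sources):

$$\sum_{i\ge 0} \left( n_{t_i} + \E m_{t_i} \right) \;\le\; \sum_{i\ge 0} \left( {n_r\over 4^i} + {\E m_{r}\over 2^i} \right) \;\le\; {4\over 3}\,n_r + 2\,\E m_r \;=\; \O(n_r + \E m_r),$$

where $t_0=r, t_1, t_2, \ldots$ are the vertices of the left path listed from its top vertex downwards and the bounds $n_{t_{i+1}} \le n_{t_i}/4$ and $\E m_{t_{i+1}} \le \E m_{t_i}/2$ are iterated $i$~times.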