From 7bc4f10b763512ad5777a1d1ea0d9069d78a72b2 Mon Sep 17 00:00:00 2001
From: Martin Mares <mj@ucw.cz>
Date: Wed, 2 Apr 2008 15:32:13 +0200
Subject: [PATCH] More robust contractions.

---
 adv.tex |   2 +-
 opt.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 121 insertions(+), 13 deletions(-)
diff --git a/adv.tex b/adv.tex
index 6836d43..f5f3360 100644
--- a/adv.tex
+++ b/adv.tex
@@ -449,7 +449,7 @@ Each edge is considered at most twice (once per its endpoint), so the number
 of the other heap operations is~$\O(m_i)$. Together, it equals $\O(m_i + n_i\log t_i) = \O(m_i+m) = \O(m)$.
 \qed
 
-\lemma
+\lemma\id{ijsize}%
 Unless the $i$-th phase is final, the forest~$F_i$ consists of at most $2m_i/t_i$ trees.
 
 \proof
diff --git a/opt.tex b/opt.tex
index 7aa4c54..07da833 100644
--- a/opt.tex
+++ b/opt.tex
@@ -28,11 +28,13 @@ The soft heap contains a~set of distinct items from a~totally ordered universe a
 supports the following operations:
 \itemize\ibull
 \:$\<Create>(\varepsilon)$ --- create an~empty soft heap with the given accuracy parameter~$\varepsilon$,
-\:$\<Insert>(H,x)$ --- insert a~new element~$x$ to the heap~$H$,
-\:$\<Meld>(P,Q)$ --- merge two heaps into one, more precisely move all elements of a~heap~$Q$
+\:$\<Insert>(H,x)$ --- insert a~new item~$x$ to the heap~$H$,
+\:$\<Meld>(P,Q)$ --- merge two heaps into one, more precisely move all items of a~heap~$Q$
   to the heap~$P$, destroying~$Q$ in the process (both heaps must have the same~$\varepsilon$),
-\:$\<DeleteMin>(H)$ --- delete the minimum element of the heap~$H$ and return its value
+\:$\<DeleteMin>(H)$ --- delete the minimum item of the heap~$H$ and return its value
   (optionally signalling that the value has been corrupted).
+\:$\<Explode>(H)$ --- destroy the heap and return a~list of all items contained in it
+  (again optionally marking those corrupted).
 \endlist
 
 \examplen{Linear-time selection}
@@ -273,6 +275,11 @@ Let us translate these ideas to real (pseudo)code:
 \algout A~modified soft queue.
 \endalgo
 
+\para
+\<Explode> is trivial once we know how to recognize the corrupted items. It simply examines
+all queues in the heap, walks the trees and the item lists of all vertices. It records
+all items seen, the corrupted ones are those that different from their \<ckey>.
+
 \paran{Analysis of accuracy}
 The description of the operations is now complete, so let us analyse their behavior
 and verify that we have delivered what we promised --- first the accuracy of
@@ -457,6 +464,17 @@ leftmost path and therefore it can be also paid for by \<Refill>. (Incidentally,
 this was the only place where we needed the invariant.)
 \qed
 
+\<Explode>s are easy not only to implement, but also to analyse:
+
+\lemma
+Every \<Explode> takes $\O(1)$ time amortized.
+
+\proof
+As all queues, vertices and items examined by \<Explode> are forever gone from the heap,
+we can charge the constant time spent on each of them against the operations
+that have created them.
+\qed
+
 It remains to take care of the calculation with ranks:
 
 \lemma\id{shyards}%
@@ -482,7 +500,7 @@ and their sum telescopes, again to the real cost of the meld.
 
 Now we can put the bits together and laurel our effort with the following theorem:
 
-\thmn{Performance of soft heaps, Chazelle \cite{chazelle:softheap}}
+\thmn{Performance of soft heaps, Chazelle \cite{chazelle:softheap}}\id{softheap}%
 A~soft heap with error rate~$\varepsilon$ ($0<\varepsilon\le 1/2$) processes
 a~sequence of operations starting with an~empty heap and containing $n$~\<Insert>s
 in time $\O(n\log(1/\varepsilon))$ on the Pointer machine. At every moment, the
@@ -509,8 +527,12 @@ by rebuilding the whole data structure completely after $n/2$ deletes. This
 increases the number of potentially corrupted items, but at worst twice, so it
 suffices to decrease~$\varepsilon$ twice.
 
+%--------------------------------------------------------------------------------
+
 \section{Robust contractions}
 
+\def\C{{\cal C}}
+
 Having the soft heaps at hand, we would like to use them in a~conventional MST
 algorithm in place of a~usual heap. The most efficient specimen of a~heap-based
 algorithm we have seen so far is the Iterated Jarn\'\i{}k's algorithm (\ref{itjar}).
@@ -561,7 +583,7 @@ between $G-C$ and~$G/C$, so we can write $G=C \cup (G/C)$.
 When~$C\subseteq G$ is a~contractible subgraph, then $\msf(G)=\msf(C) \cup \msf(G/C)$.
 
 \proof
-As both sides of the equality are trees on the same number of vertices, it suffices
+As both sides of the equality are forests spanning the same graph, it suffices
 to show that $\msf(G) \subseteq \msf(C)\cup\msf(G/C)$. We know that the edges which
 does not participate in the MSF are exactly those which are the heaviest on some cycle
 (this is the Cycle rule \ref{cyclerule}).
@@ -615,14 +637,100 @@ trick with expanding the vertex~$c$ to~$P$ in the cycle~$C$. Hence $g\not\in\mst
 \qed
 
 \para
-It therefore suffices to run the Jarn\'\i{}k's algorithm with a~soft heap to find the
-$(G\crpt R)$-contractible subgraph~$C$ and then recurse three times: first find $\msf(C)$
-and $\msf((G/C)\setminus R_C)$, then put together the pieces which are known to contain
-$\msf(G)$ by the Robust contraction lemma and find the MSF of this graph. As in the Iterative
-Jarn\'\i{}k's algorithm, we will construct as many non-overlapping contractible subgraphs in a~single
-phase of the algorithm as possible. Formally speaking, we will use the following partitioning
-procedure.
+Our plan still is to mimic the Iterative Jarn\'\i{}k's algorithm. We will grow a~collection~$\C$
+of non-overlapping contractible subgraphs and put aside the edges which got corrupted in the process.
+We recursively compute the MSF of the subgraphs and of the contracted graph. Then we take the
+union of these MSF's and add the corrupted edges. According to the previous lemma, this does not produce
+the final MSF, but a~sparser graph containing it on which we can continue. More precisely:
+
+\corn{Gradual robust contraction}\id{gradrc}%
+Suppose that $G$~is a~weighted graph, $R$~a~set of its edges, and $\C=\{C_1,\ldots,C_k\}$ is a~collection
+of subgraphs contractible in~$G\crpt R_\C$, where $R_\C = R_{C_1} \cup \ldots \cup R_{C_k}$. Then:
+$$
+\msf(G) = \msf\left( \bigcup_i \msf(C_i) \cup \msf\biggl( \biggl( G / \bigcup_i C_i \biggr) \setminus R_\C \biggr) \cup R_\C \right).
+$$
+
+\proof
+By iterating the previous lemma and noting that subgraphs contractible in~$G\crpt R^\prime_\C$
+for a~subset~$R^\prime_\C$ of $R_\C$ are also contractible in~$G\crpt R_\C$.
+\qed
 
+We can formulate the exact partitioning algorithm as follows:
 
+\algn{Partition a~graph to a~collection of contractible subgraphs}\id{partition}%
+\algo
+\algin A~graph~$G$ with an~edge comparison oracle, a~parameter~$t$ controlling the size of the subgraphs,
+  and an~accuracy parameter~$\varepsilon$.
+\:Mark all vertices as ``live''.
+\:$\C\=\emptyset$, $R_\C\=\emptyset$. \cmt{Start with an~empty collection and no corrupted edges.}
+\:While there is a~live vertex~$v_0$:
+\::$K=\{v_0\}$. \cmt{vertices of the tree we currently grow}
+\::$R=\emptyset$. \cmt{edges known to be corrupted}
+\::\<Create> a~soft heap with accuracy~$\varepsilon$ and \<Insert> the edges adjacent to~$v_0$ in it.
+\::While the heap is not empty and $\vert K\vert \le t$:
+\:::\<DeleteMin> an~edge $uv$ from the heap, assume $u\in K$.
+\:::If $uv$ was corrupted, add it to~$R$.
+\:::If $v\in K$, drop the edge and repeat the previous two steps.
+\:::$K\=K\cup\{v\}$.
+\:::If $v$ is dead, break out of the current loop.
+\:::Insert all edges incident with~$v$ to the heap.
+\::$\C\=\C \cup \{G[K]\}$. \cmt{Record the subgraph.}
+\::\<Explode> the heap and add all remaining corrupted edges to~$R$.
+\::$R_\C\=R_\C \cup \{ R_K \}$. \cmt{Record the ``interesting'' corrupted edges.}
+\::$G\=G\setminus R_K$. \cmt{Remove the corrupted edges from~$G$.}
+\::Mark all vertices of~$K$ as ``dead''.
+\algout The collection $\C$ of contractible subgraphs and the set~$R_\C$ of
+corrupted edges in the neighborhood of these subgraphs.
+\endalgo
+
+\lemman{Partitioning to contractible subgraphs, Pettie \cite{pettie:optimal}}\id{partlemma}%
+Given a~weighted graph~$G$ and parameters $\varepsilon$ ($0<\varepsilon\le 1/2$)
+and~$t$, the Partition algorithm \ref{partition} constructs a~collection
+$\C=\{C_1,\ldots,C_k\}$ of subgraphs and a~set~$R_\C$ of corrupted edges such that:
+
+\numlist\ndotted
+\:All the $C_i$'s and the set~$R_\C$ are edge-disjoint.
+\:Each $C_i$ contains at most~$t$ vertices.
+\:Each vertex of~$G$ is contained in at least one~$C_i$.
+\:The connected components of the union of all $C_i$'s have at least~$t$ vertices each,
+  except perhaps for those which are equal to a~connected component of~$G\setminus R_\C$.
+\:$\vert R_\C\vert \le 2\varepsilon m$.
+\:$\msf(G) = \msf\Bigl( \bigcup_i \msf(C_i) \cup \msf\bigl((G / \bigcup_i C_i) \setminus R_\C\bigr) \cup R_C \Bigr)$.
+\:The algorithm runs in time $\O(n+m\log (1/\varepsilon))$.
+\endlist
+
+\proof
+The algorithm grows a~series of trees. A~tree is built from edges adjacent to live vertices
+and once it is finished, all vertices of the tree die, so no edge is ever reused in another
+tree. Edges of~$R_\C$ are immediately deleted from the graph, so they never participate
+in any tree. The algorithm stops only if all vertices are dead, so each vertex must have
+entered some tree. Each $C_i$ is induced by some tree and the trees have at most~$t$ vertices
+each, which limits the size of the $C_i$'s as well. This proves claims 1--3.
+
+Claim~4: We can show that each connected component has $t$~or more vertices as we already did
+in the proof of Lemma \ref{ijsize}: How can a~new tree stop growing? Either it already
+has~$t$ vertices, or it joins one of the existing trees (which only increases the
+size of the component), or the heap becomes empty (which means that the tree spans
+a~full component of the current graph and hence also of~$G\setminus R_\C$).
+
+Claim~5: The set~$R_\C$ contains a~subset of edges corrupted by the soft heaps over
+the course of the algorithm. As every edge is inserted to a~heap at most once
+in every direction, the heaps can corrupt at worst $2\varepsilon m$ edges altogether.
+
+Claim~6 follows from the above Corollary \ref{gradrc} on Gradual robust contraction.
+\FIXME{It does not so far!}
+
+It remains to calculate the time complexity (claim~7):
+The inner loop (steps 7--13) processes the edges of the current subgraph~$G[K]$ and also
+the edges in its neighborhood $\delta(G[K])$. Steps 6 and~13 insert each such edge to the heap
+at most once, so steps 8--12 also visit each edge at most once. For every edge, we spend
+$\O(\log(1/\varepsilon))$ time amortized on inserting it and $\O(1)$ on the other operations
+(by Theorem \ref{softheap} on performance of the soft heaps).
+
+Each edge of~$G$ can participate in some $G[K]\cup \delta(G[K])$ only twice, then both of its ends
+are dead. The inner loop therefore processes $\O(m)$ edges total, so we spend $\O(m\log(1/\varepsilon))$
+time there. The rest of the algorithm spends $\O(m)$ time on the other operations with subgraphs
+and $\O(n)$ on identifying the live vertices.
+\qed
 
 \endpart
-- 
2.39.5