Finished K best.

author Martin Mares <mj@ucw.cz>

Thu, 24 Apr 2008 10:48:53 +0000 (12:48 +0200)

committer Martin Mares <mj@ucw.cz>

Thu, 24 Apr 2008 10:48:53 +0000 (12:48 +0200)
diff --git a/biblio.bib b/biblio.bib

index f489749bfce79ba73fa8e11e7aa63496c282b05b..a2cbfa0aa827f460fa3de921740f6d441d01944c 100644 (file)
--- a/biblio.bib
+++ b/biblio.bib
@@ -1580,3 +1580,17 @@
    note = {In Czech},
    isbn = {80-7015-109-9},
  }
+
+@article{ tarjan:applpc,
+ author = {Robert Endre Tarjan},
+ title = {Applications of Path Compression on Balanced Trees},
+ journal = {Journal of the ACM},
+ volume = {26},
+ number = {4},
+ year = {1979},
+ issn = {0004-5411},
+ pages = {690--715},
+ doi = {http://doi.acm.org/10.1145/322154.322161},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+}
diff --git a/dyn.tex b/dyn.tex

index 7274616c725adb3ae47720f4b45025bff7061054..e7432fca64e7604960cbd43e8afd8dc0785015dc 100644 (file)
--- a/dyn.tex
+++ b/dyn.tex
@@ -731,21 +731,22 @@ by Theorem \ref{sletar}.
  
  \section{Almost minimum trees}\id{kbestsect}%
  
-In some situations, finding the minimum spanning tree is not enough and we are interested
+In some situations, finding the single minimum spanning tree is not enough and we are interested
  in the $K$~lightest spanning trees, usually for some small value of~$K$. Katoh, Ibaraki
  and Mine \cite{katoh:kmin} have given an~algorithm of time complexity $\O(m\log\beta(m,n) + Km)$,
  building on the MST algorithm of Gabow et al.~\cite{gabow:mst}.
  Subsequently, Eppstein \cite{eppstein:ksmallest} has discovered an~elegant preprocessing step which allows to reduce
  the running time to $\O(m\log\beta(m,n) + \min(K^2,Km))$ by eliminating edges
  which are either present in all $K$ trees or in none of them.
-We will show a~simplified algorithm based on the MST verification procedure of Section~\ref{verifysect}.
+We will show a~variant of their algorithm based on the MST verification
+procedure of Section~\ref{verifysect}.
  
  In this section, we will require the edge weights to be real numbers (or integers), because
  comparisons are certainly not enough to determine the second best spanning tree. We will
  assume that our computation model is able to add, subtract and compare the edge weights
  in constant time.
  
-Let us focus on finding the second best spanning tree, to begin with.
+Let us focus on finding the second best spanning tree first.
  
  \paran{Second best spanning tree}%
  Suppose that we have a~weighted graph~$G$ and a~sequence $T_1,\ldots,T_z$ of all its spanning
@@ -775,10 +776,186 @@ in linear time. This implies the following:
  \lemma
  Given~$G$ and~$T_1$, we can find~$T_2$ in time $\O(m)$.
  
-\paran{Getting further spanning trees}%
-When we know~$T_1$ and~$T_2$, how to get~$T_3$? According to Lemma \ref{kbl}, it can be
+\paran{Third best spanning tree}%
+When we know~$T_1$ and~$T_2$, how to get~$T_3$? According to Lemma \ref{kbl}, $T_3$~can be
  obtained by a~single exchange from either~$T_1$ or~$T_2$. Therefore we need to find the
  best exchange for~$T_2$ and the second best exchange for~$T_1$ and use the better of them.
+The latter is not easy to find directly, so we will make a~minor side step.
+
+We know that $T_2=T_1-e+f$ for some edges $e$ and~$f$. We define two auxiliary graphs:
+$G_1 := G\sgc e$ ($G$~with the edge~$e$ contracted) and $G_2 := G-e$. The tree~$T_1\sgc e$ is
+obviously the MST of~$G_1$ (by the Contraction lemma) and $T_2$ is the MST of~$G_2$ (all
+$T_2$-light edges in~$G_2$ would be $T_1$-light in~$G$).
+
+\obs
+The tree $T_3$~can be obtained by a~single edge exchange in either $(G_1,T_1\sgc e)$ or $(G_2,T_2)$:
+
+\itemize\ibull
+\:If $T_3 = T_1-e'+f'$ for $e'\ne e$, then $T_3\sgc e = (T_1\sgc e)-e'+f'$ in~$G_1$.
+\:If $T_3 = T_1-e+f'$, then $T_3 = T_2 - f + f'$ in~$G_2$.
+\:If $T_3 = T_2-e'+f'$, then this exchange is found in~$G_2$.
+\endlist
+
+\>Conversely, a~single exchange in $(G_1,T_1\sgc e)$ or in $(G_2,T_2)$ corresponds
+to an~exchange in either~$(G,T_1)$ or $(G,T_2)$.
+Moreover, every spanning tree~$T$ of~$G$ either contains~$e$, in which case $T\sgc
+e$ is a~spanning tree of~$G_1$, or does not contain~$e$, in which case it is
+a~spanning tree of~$G_2$.
+
+Thus we can run the previous algorithm for finding the best edge exchange
+on both~$G_1$ and~$G_2$ and find~$T_3$ again in time $\O(m)$.
+
+\paran{Further spanning trees}%
+The construction of auxiliary graphs can be iterated to obtain $T_1,\ldots,T_K$
+for an~arbitrary~$K$. We will build a~\df{meta-tree} of auxiliary graphs. Each node of this meta-tree
+is assigned a~minor of~$G$ and its minimum spanning tree. The root node contains~$(G,T_1)$,
+its sons have $(G_1,T_1\sgc e)$ and $(G_2,T_2)$. When $T_3$ is obtained by an~exchange
+in one of these sons, we attach two new leaves to that son and we assign the two auxiliary
+graphs derived by contracting or deleting the exchanged edge. Then we find the best
+edge exchanges among the new sons and repeat the process. By the above observation,
+each spanning tree of~$G$ is generated exactly once. Lemma \ref{kbl} guarantees that
+the trees are enumerated in order of increasing weight.
+
+Recalculating the best exchanges in all leaves of the meta-tree after generating each~$T_i$
+is of course not necessary, because most leaves stay unchanged. We will rather remember
+the best exchange for each leaf and keep their values in a~heap. In every step, we will
+delete the minimum from the heap and use the exchange in the particular leaf to generate
+a~new tree. Then we will create the new leaves, calculate their best exchanges and insert
+them into the heap. The algorithm is now straightforward and so will be its analysis:
+
+\algn{Finding $K$ best spanning trees}\id{kbestalg}%
+\algo
+\algin A~weighted graph~$G$, its MST~$T_1$ and an~integer $K>0$.
+\:$R\=$ a~meta-tree whose vertices carry triples $(G',T',F')$. Initially
+  it contains just a~root with $(G,T_1,\emptyset)$.
+  \hfil\break
+  \cmt{$G'$ is a~minor of~$G$, $T'$~is its MST, and~$F'$ is a~set of edges of~$G$
+  that are contracted in~$G'$.}
+\:$H\=$ a~heap of quadruples $(\delta,r,e,f)$ ordered on~$\delta$, initially empty.
+  \hfil\break
+  \cmt{Each quadruple describes an~exchange of~$e$ for~$f$ in a~leaf~$r$ of~$R$ and $\delta=w(T'\cup F')-w(e)+w(f)$
+  is the weight of the resulting spanning tree of~$G$, which makes exchanges in different leaves comparable.}
+\:Find the best edge exchange in~$(G,T_1)$ and insert it into~$H$.
+\:$i\= 1$.
+\:While $i<K$:
+\::Delete the minimum quadruple $(\delta,r,e,f)$ from~$H$.
+\::$(G',T',F') \=$ the triple carried by the leaf~$r$.
+\::$i\=i+1$.
+\::$T_i\=(T'-e+f) \cup F'$. \cmt{The next spanning tree}
+\::$r_1\=$ a~new leaf carrying $(G'\sgc e,T'\sgc e,F'+e)$.
+\::$r_2\=$ a~new leaf carrying $(G'-e,T'-e+f,F')$.
+\::Attach~$r_1$ and~$r_2$ as sons of~$r$.
+\::Find the best edge exchanges in~$r_1$ and~$r_2$ and insert them into~$H$.
+\algout The spanning trees $T_2,\ldots,T_K$.
+\endalgo
+
+\lemma\id{kbestl}%
+Given~$G$ and~$T_1$, we can find $T_2,\ldots,T_K$ in time $\O(Km + K\log K)$.
+
+\proof
+Generating each~$T_i$ requires finding the best exchange for two graphs and $\O(1)$
+operations on the heap. The former takes $\O(m)$ according to Corollary \ref{rampeaks},
+and each heap operation takes $\O(\log K)$.
+\qed
+
+\paran{Arbitrary weights}%
+While the assumption that the weights of all spanning trees are distinct has helped us
+in thinking about the problem, we should not forget that it is somewhat unrealistic.
+We could refine the proof of our algorithm and demonstrate that the algorithm indeed works
+without this assumption, but we will rather show that the ties can be broken easily.
+
+Let~$\delta$ be the minimum positive difference of weights of spanning trees
+of~$G$ and $e_1,\ldots,e_m$ be the edges of~$G$. We observe that it suffices to
+increase $w(e_i)$ by~$\delta_i = \delta/2^{i+1}$. The cost of every spanning tree
+has increased by at most $\sum_i\delta_i < \delta/2$, so if $T$~was lighter
+than~$T'$, it still is. On the other hand, no two trees receive the same
+total increase, so all tree weights are now distinct.
+
+The exact value of~$\delta$ is not easy to calculate, but examination of the algorithm
+reveals that it is not needed at all. The only place where the edge weights are examined
+is when we search for the best exchange. In this case, we compare the differences of
+pairs of edge weights with each other. Each such difference is therefore adjusted
+by $\delta\cdot(2^{-i}-2^{-j})$ for some $i,j>1$, which again does not influence comparison
+of originally distinct differences. If the differences were the same, it is sufficient
+to look at their values of~$i$ and~$j$, i.e., at the identifiers of the edges.
+
+\paran{Invariant edges}%
+Our algorithm can be further improved for small values of~$K$ (which seems to be the common
+case in most applications) by the reduction of Eppstein \cite{eppstein:ksmallest}.
+We will observe that there are many edges of~$T_1$
+which are guaranteed to be contained in $T_2,\ldots,T_K$ as well, and likewise there are
+many edges of $G\setminus T_1$ which are certainly not present in those spanning trees.
+The idea is the following (we again assume that the tree weights are distinct):
+
+\defn
+For an~edge $e\in T_1$, we define its \df{gain} $g(e)$ as the minimum weight gained by an~edge exchange
+replacing~$e$. Similarly, we define $G(f)$ for $f\not\in T_1$. Put formally:
+$$\eqalign{
+g(e) &:= \min\{ w(f)-w(e) \mid f\in E, e\in T[f] \} \cr
+G(f) &:= \min\{ w(f)-w(e) \mid e\in E, e\in T[f] \}.\cr
+}$$
+
+\lemma\id{gaina}%
+When $t_1,\ldots,t_{n-1}$ are the edges of~$T_1$ in order of increasing gain,
+the edges $t_K,\ldots,t_{n-1}$ are present in all trees $T_2,\ldots,T_K$.
+
+\proof
+The best exchanges in~$T_1$ involving $t_1,\ldots,t_{K-1}$ produce~$K-1$ spanning trees
+of increasing weights. Any exchange involving $t_K,\ldots,t_{n-1}$ produces a~tree
+which is at least as heavy as those. (The Monotone exchange lemma assures us
+that the gain of such exchanges cannot be reverted by any later exchanges.)
+\qed
+
+\lemma\id{gainb}%
+When $q_1,\ldots,q_{m-n+1}$ are the edges of $G\setminus T_1$ in order of increasing gain,
+the edges $q_K,\ldots,q_{m-n+1}$ are not present in any of $T_2,\ldots,T_K$.
+
+\proof
+Similar to the previous lemma.
+\qed
+
+\para
+It is therefore sufficient to find $T_2,\ldots,T_K$ in the graph obtained from~$G$ by
+contracting the edges $t_K,\ldots,t_{n-1}$ and deleting $q_K,\ldots,q_{m-n+1}$. This graph
+has only $\O(K)$ vertices and $\O(K)$ edges. The only remaining question is how to
+calculate the gains. For edges outside~$T_1$, it again suffices to find the peaks of the
+covered paths. The gains of MST edges require a~different algorithm, but Tarjan \cite{tarjan:applpc}
+has shown that they can be obtained in time $\O(m\timesalpha(m,n))$.
+
+When we put the results of this section together, we obtain:
+
+\thmn{Finding $K$ best spanning trees}\id{kbestthm}%
+For a~given graph~$G$ with real edge weights, the $K$~best spanning trees can be found
+in time $\O(m\timesalpha(m,n) + \min(K^2,Km + K\log K))$.
+
+\proof
+First we find the MST of~$G$ in time $\O(m\timesalpha(m,n))$ using Pettie's Optimal
+MST algorithm (Theorem \ref{optthm}). Then we calculate the gains of MST edges by
+Tarjan's algorithm from \cite{tarjan:applpc}, again in $\O(m\timesalpha(m,n))$, and
+the gains of the other edges using our MST verification algorithm (Corollary \ref{rampeaks})
+in $\O(m)$. We use Lemma \ref{gaina} to identify edges that are required, and Lemma \ref{gainb}
+to find edges that are useless. We contract the former edges, remove the latter ones
+and run Algorithm \ref{kbestalg} to find the trees. By Lemma \ref{kbestl}, it runs in
+the desired time.
+
+If~$K\ge m$, this reduction does not pay off, so we run Algorithm \ref{kbestalg}
+directly on the input graph.
+\qed
+
+\paran{Improvements}%
+It is an~interesting open question whether the algorithms of Section \ref{verifysect} can
+be modified to calculate all gains. The main procedure can, but it requires reducing
+the input to a~balanced tree first and here the Bor\o{u}vka trees fail. Buchsbaum's
+Pointer-Machine algorithm (\ref{pmverify}) is more promising.
+
+\paran{Large~$K$}%
+When $K$~is large, re-running the verification algorithm for every change of the graph
+is too costly. Frederickson \cite{frederickson:ambivalent} has shown how to find the best
+swaps dynamically, reducing the overall time complexity of Algorithm \ref{kbestalg}
+to $\O(Km^{1/2})$ and improving Theorem \ref{kbestthm} to $\O(m\timesalpha(m,n)
++ \min( K^{3/2}, Km^{1/2} ))$. It is open whether the dynamic data structures of this
+chapter could be modified to bring the complexity of finding the next tree down
+to polylogarithmic.
  
  
  \endpart
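
The meta-tree algorithm added to dyn.tex in this patch can be tried out with the short Python sketch below. The sketch is not part of the commit and makes several simplifying assumptions: the names `tree_path`, `best_exchange` and `k_best_spanning_trees` are hypothetical, a contracted edge is modelled by putting it into a `forced` set and a deleted edge by putting it into a `deleted` set (instead of building the minors explicitly), and the best exchange is found by a naive path walk rather than by the linear-time verification procedure the text relies on. The heap is keyed by the total weight of the tree an exchange would produce, so that exchanges found in different leaves can be compared.

```python
import heapq
from itertools import count


def tree_path(edges, tree, u, v):
    """Return the indices of the tree edges on the path from u to v."""
    adj = {}
    for i in tree:
        a, b, _ = edges[i]
        adj.setdefault(a, []).append((b, i))
        adj.setdefault(b, []).append((a, i))
    parent = {u: None}  # vertex -> (previous vertex, connecting edge index)
    stack = [u]
    while stack:
        x = stack.pop()
        for y, i in adj.get(x, []):
            if y not in parent:
                parent[y] = (x, i)
                stack.append(y)
    path = []
    while v != u:
        v, i = parent[v]
        path.append(i)
    return path


def best_exchange(edges, tree, forced, deleted):
    """Cheapest swap (delta, e, f): remove tree edge e, add non-tree edge f.

    Forced edges may not leave the tree, deleted edges may not enter it.
    Returns None when no exchange is possible. Naive, not linear time.
    """
    best = None
    for f, (u, v, wf) in enumerate(edges):
        if f in tree or f in deleted:
            continue
        for e in tree_path(edges, tree, u, v):
            if e in forced:
                continue
            delta = wf - edges[e][2]
            if best is None or delta < best[0]:
                best = (delta, e, f)
    return best


def k_best_spanning_trees(edges, mst, k):
    """Return up to k spanning trees (frozensets of edge indices), lightest first.

    edges is a list of (u, v, weight); mst is the set of indices of an MST.
    """
    trees = [frozenset(mst)]
    heap = []
    tick = count()  # unique tie-breaker, so the heap never compares sets

    def push(tree, forced, deleted):
        found = best_exchange(edges, tree, forced, deleted)
        if found is not None:
            delta, e, f = found
            weight = sum(edges[i][2] for i in tree) + delta
            heapq.heappush(heap, (weight, next(tick), tree, forced, deleted, e, f))

    push(trees[0], frozenset(), frozenset())
    while len(trees) < k and heap:
        _, _, tree, forced, deleted, e, f = heapq.heappop(heap)
        new_tree = (tree - {e}) | {f}
        trees.append(new_tree)
        push(tree, forced | {e}, deleted)      # leaf with e contracted
        push(new_tree, forced, deleted | {e})  # leaf with e deleted
    return trees
```

On a small multigraph with three parallel edges between vertices 0 and 1, one edge between 1 and 2, and two parallel edges between 2 and 3, the sketch enumerates all six spanning trees in order of weight. Keying the heap by the bare gain of each exchange would misorder the third and fourth tree in this instance, which is why the total weight is used here.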