From 583801851a8de6cc312a112ef170d4a279d55b3d Mon Sep 17 00:00:00 2001
From: Martin Mares <mj@ucw.cz>
Date: Fri, 7 Mar 2008 22:28:43 +0100
Subject: [PATCH] Komlos's theorem.

---
 adv.tex | 136 +++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 109 insertions(+), 27 deletions(-)
diff --git a/adv.tex b/adv.tex
index 3a174c8..dde5acb 100644
--- a/adv.tex
+++ b/adv.tex
@@ -544,7 +544,7 @@ which need careful handling, so we omit the description of this algorithm.
 \section{Verification of minimality}
 
 Now we will turn our attention to a~slightly different problem: given a~spanning
-tree, how to verify that it is minimal? We will show that this can be achieved
+tree, how to verify that it is minimum? We will show that this can be achieved
 in linear time and it will serve as a~basis for the randomized linear-time
 MST algorithm in the next section.
 
@@ -558,7 +558,7 @@ We will follow the results of Koml\'os and King, but as we have the power of the
 RAM data structures from Section~\ref{bitsect} at our command, we will not have
 to worry about much technical details.
 
-To verify that a~spanning~$T$ is minimal, it is sufficient to check that all
+To verify that a~spanning~$T$ is minimum, it is sufficient to check that all
 edges outside~$T$ are $T$-heavy (by Theorem \ref{mstthm}). In fact, we will be
 able to find all $T$-light edges efficiently. For each edge $uv\in E\setminus T$,
 we will find the heaviest edge of the tree path $T[u,v]$ and compare its weight
@@ -642,7 +642,7 @@ See for example Bender and Falach-Colton \cite{bender:lca}.
 We summarize the reductions in the following lemma, also showing that they can
 be performed in linear time:
 
-\lemma
+\lemma\id{verbranch}%
 For each tree~$T$ and a~set of query paths~$Q$ on~$T$, it is possible to find
 a~complete branching tree~$T'$ in linear time together with a~set~$Q'$ of query
 paths on~$T'$, such that the weights of the heaviest edges of the new paths can
@@ -669,43 +669,125 @@ is then trivial.
 \qed
 
 \para
-We will now describe a~simple variant of the depth-first search which finds the
-maximum-weight edges for all query paths.
+We will now describe a~simple variant of depth-first search which finds the
+maximum-weight edges for all query paths of the transformed problem. For the
+time being, we will not care about the time complexity of the algorithm (as long
+as it is polynomial) and we will minimize only the number of edge weight comparisons
+performed.
 
 For every edge~$e=uv$, we consider the set $Q_e$ of all query paths containing~$e$.
-The vertex of a~path which is closer to the root will be called its \df{top,}
+The vertex of a~path, which is closer to the root, will be called its \df{top,}
 the other vertex its \df{bottom.}
 We define arrays $T_e$ and~$H_e$ as follows: $T_e$ contains
 the tops of the paths in~$Q_e$ in order of their increasing depth (we
 will call them \df{active tops} and each of them will be stored exactly once). For
-each active top~$t=T_e[i]$, the $H_e[i]$ is defined as the heaviest edge of the path $T[v,t]$.
+each active top~$t=T_e[i]$, we define $H_e[i]$ as the heaviest edge of the path $T[v,t]$.
 As for every~$i$ the path $T[v,T_e[i+1]]$ is contained within $T[v,T_e[i]]$,
-the edges in~$H_e$ must have non-increasing weights: $w(H_e[i+1]) \le
+the edges in~$H_e$ must have non-increasing weights, that is $w(H_e[i+1]) \le
 w(H_e[i])$.
 
-The algorithm constructs the $T_e$'s and $H_e$'s recursively. We assume that the
-root has a~parent edge~$r$ and we define $T_r$ and~$H_r$ as empty arrays (no query
-path contains~$r$). When we arrived to a~vertex~$u$ via an~edge~$p$ from its parent,
-we visit each son~$v$ of~$u$ and we calculate $T_e$ and~$H_e$ for the edge $e=uv$
-from $T_p$ and~$H_p$:
+\alg $\<FindHeavy>(u,p,T_p,H_p)$ --- process all queries in the subtree rooted
+at~$u$ entered from its parent via an~edge~$p$.
+\id{findheavy}
 
-\itemize\ibull
-\:First we remove all tops which are no longer contained in paths crossing the
-current edge~$e$.
-%For example, we can keep a~use count for every active top and
-%decrease counters of the tops of all paths whose bottom is~$u$.
+\algo
+\:Process all query paths whose bottom is~$u$. For each of them, find the top of
+the path in~$T_p$ and record the heaviest edge remembered for it in~$H_p$.
 
-\:Then we update the heaviest edges for all tops. The new edge~$e$ will replace
-all lighter edges in $H_p$. Since $H_p$ is sorted, we can use binary search
-to locate the boundary between lighter and heavier edges in it.
+\:First find the tops~$T$ which will be shared by all edges going from~$u$ downwards.
+These are the tops from~$T_p$ except for the ones which have ceased to be active,
+because all query paths which were referring to them have~$u$ as their bottom.
+Select the corresponding array of the heaviest edges~$H$ from~$H_p$.
+
+\:For every son~$v$ of~$u$, do:
+
+\::Construct the array of tops~$T_e$ for the edge $e=uv$. It contains all tops
+   from~$T$ and possibly also the vertex~$u$ itself if there is a~query path
+   which has~$u$ as its top and which has bottom somewhere in the subtree
+   rooted at~$v$.
+
+\::Construct the array of the heaviest edges~$H_e$. For all tops from~$T$, just compare
+   the weight of the edge recorded in~$H$ with $w(e)$ and if it was smaller,
+   replace the remembered edge by~$e$. Since $H_p$ was sorted, so is~$H$
+   and we can use binary search to locate the boundary between lighter and
+   heavier edges in~$H$.
+   For the new top~$u$, the edge~$e$ itself is obviously the heaviest.
+
+\::Recurse on~$v$: call $\<FindHeavy>(v,e,T_e,H_e)$.
+\endalgo
+
+\>As we need a~parent edge to start the recursion, we add an~imaginary parent
+edge~$p_0$ of the root vertex~$r$, for which no queries are defined. We can
+therefore start with $\<FindHeavy>(r,p_0,\emptyset,\emptyset)$.
+
+Let us account for the comparisons:
+
+\lemma\id{vercompares}%
+When the procedure \<FindHeavy> is called on the transformed problem, it
+performs $\O(n+m)$ comparisons, where $n$ is the size of the tree and
+$m$ is the number of query paths.
+
+\proof
+We will calculate the number of comparisons~$c_i$ performed when processing the edges
+going from the $(i+1)$-th to the $i$-th level of the tree.
+The levels are numbered from the bottom, so leaves are at level~0, the root
+is at level $\ell\le \lceil \log_2 n\rceil$ and there are $n_i\le n/2^i$ vertices
+at the $i$-th level, so we consider exactly $n_i$ edges.
+\def\eqalign#1{\null\,\vcenter{\openup\jot
+  \ialign{\strut\hfil$\displaystyle{##}$&$\displaystyle{{}##}$\hfil
+      \crcr#1\crcr}}\,}
+$$\vcenter{\openup\jot\halign{\strut\hfil $\displaystyle{#}$&$\displaystyle{{}#}$\hfil&\quad#\hfil\cr
+c_i &\le \sum_e \left( 1 + \log \vert T_e\vert \right)&(Total cost of the binary search.)\cr
+    &\le n_i + \sum_e \log\vert T_e\vert&(We sum over $n_i$ edges.)\cr
+    &\le n_i + n_i \cdot {\sum_e \log\vert T_e\vert \over n_i}&(Consider the average of logarithms.) \cr
+    &\le n_i + n_i \cdot \log{\sum_e \vert T_e\vert \over n_i}&(Logarithm is concave.) \cr
+    &\le n_i + n_i \cdot \log{m\over n_i}&(Bound the number of tops by queries.) \cr
+    &= n_i \cdot \left( 1 + \log\left({m\over n}\cdot{n\over n_i}\right) \right)\cr
+    &= n_i + n_i\log{m\over n} + n_i\log{n\over n_i}.\cr
+}}$$
+Summing over all levels, we estimate the total number of comparisons as:
+$$
+c = \sum_i c_i = \left( \sum_i n_i \right) + \left( \sum_i n_i \log{m\over n}\right) + \left( \sum_i n_i \log{n\over n_i}\right).
+$$
+The first part is equal to~$n$, the second one to $n\log(m/n)\le m$. For the third
+one, we would like to plug in $n_i \le n/2^i$, but we unfortunately have one~$n_i$
+in the denominator. We save the situation by observing that the function $f(x)=x\log(n/x)$
+is decreasing\foot{We can easily check the derivative: $f(x)=(x\ln n-x\ln x)/\ln 2$, so $f'(x)\cdot \ln2 =
+\ln n - \ln x - 1$. We want $f'(x)<0$ and therefore $\ln x > \ln n - 1$, i.e., $x > n/e$.}
+for $x > n/e$, so except for $i=0$ and $i=1$ it holds that:
+$$
+n_i\log{n\over n_i} \le {n\over 2^i}\cdot\log{n\over n/2^i} = {n\over 2^i} \cdot i.
+$$
+We can therefore rewrite the third part as:
+$$\eqalign{
+\sum_i n_i\log{n\over n_i} &\le n_0\log{n\over n_0} + n_1\log{n\over n_1} + n\cdot\sum_{i\ge 2}{i\over 2^i} \le\cr
+&\le n\log1 + n_1\cdot {n\over n_1} + n\cdot\O(1) = \O(n).\cr
+}$$
+Putting all three parts together, we conclude that:
+$$
+c \le n + m + \O(n) = \O(n+m).
+$$
+\qed
+
+\para
+When we combine this lemma with the above reduction, we get the following theorem:
+
+\thmn{Verification of the MST, Koml\'os \cite{komlos:verify}}
+For every weighted graph~$G$ and its spanning tree~$T$, it is sufficient to
+perform $\O(m)$ comparisons of edge weights to determine whether~$T$ is minimum
+and to find all $T$-light edges in~$G$.
+
+\para
+We first transform the problem to finding the heaviest edges for a~set
+of query paths in~$T$ (these are exactly the paths covered by the edges
+of $G\setminus T$). We use the reduction from Lemma \ref{verbranch} to get
+an~equivalent problem with a~full branching tree and a~set of parent-descendant
+paths. Then we run the \<FindHeavy> procedure (\ref{findheavy}) to find
+the heaviest edges and employ Lemma \ref{vercompares} to bound the number
+of comparisons used.
+\qed
 
-\:Finally we have to add~$u$ as a~new top of there is a~query path which has~$u$
-at its top and its bottom lies anywhere in the subtree rooted at~$v$.
-\endlist
 
-So far, we will be interested in minimizing only the number of comparisons of
-edge weights, so we will postpone the exact implementation of the first and the
-third step for a~while.
 
 
 
-- 
2.39.5