From 91a2e3f900548436b8bf764b2c8dcef62c00636c Mon Sep 17 00:00:00 2001
From: Martin Mares
Date: Sun, 16 Mar 2008 20:57:48 +0100
Subject: [PATCH] Cleaning up verification chapter.

---
 adv.tex | 165 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 81 insertions(+), 84 deletions(-)

diff --git a/adv.tex b/adv.tex
index f9cec90..cf39940 100644
--- a/adv.tex
+++ b/adv.tex
@@ -563,18 +563,18 @@
To verify that a~spanning tree~$T$ is minimum, it is sufficient to check that all
edges outside~$T$ are $T$-heavy (by Theorem \ref{mstthm}). In fact, we will be able
to find all $T$-light edges efficiently. For each edge $uv\in E\setminus T$, we will
find the heaviest edge of the tree path $T[u,v]$ and compare its weight
-to $w(uv)$. It is therefore sufficient to solve this problem:
+to $w(uv)$. It is therefore sufficient to solve the following problem:

\problem
Given a~weighted tree~$T$ and a~set of \df{query paths} $Q \subseteq \{ T[u,v] ; u,v\in V(T) \}$
-specified by their endpoints, find the heaviest edge for every path in~$Q$.
+specified by their endpoints, find the heaviest edge (\df{peak}) for every path in~$Q$.

\para
-Finding the heaviest edges can be burdensome if the tree~$T$ is degenerated,
+Finding the peaks can be burdensome if the tree~$T$ is degenerate,
so we will first reduce it to the same problem on a~balanced tree. We run
Bor\o{u}vka's algorithm on~$T$, which certainly produces $T$ itself, and we
record the order in which the subtrees have been merged in another tree~$B(T)$.
-The queries on~$T$ can be then easily translated to queries on~$B(T)$.
+The peak queries on~$T$ can then be easily translated to peak queries on~$B(T)$.

\defn
For a~weighted tree~$T$ we define its \df{Bor\o{u}vka tree} $B(T)$ as a~rooted tree which records

@@ -592,9 +592,8 @@
is really a~tree. All its leaves are on the same level and each internal vertex has
at least two sons. Such trees will be called \df{complete branching trees.}

\lemma
-For every tree~$T$ and every pair of its vertices $x,y\in V(T)$, the heaviest edge
-of the path $T[x,y]$ has the same weight as the heaviest edge of~the path
-$B(T)[x,y]$.
+For every tree~$T$ and every pair of its vertices $x,y\in V(T)$, the peak
+of the path $T[x,y]$ has the same weight as the peak of~the path $B(T)[x,y]$.

\proof
Let us denote the path $T[x,y]$ by~$P$ and its heaviest edge by~$h=ab$. Similarly,

@@ -647,56 +646,55 @@
algorithm is $\O(\log n)$, the depth of the Bor\o{u}vka tree must be logarithmic.
For each query path $T[x,y]$ we find the lowest common ancestor of~$x$ and~$y$
and split the path there into two half-paths. This produces a~set~$Q'$ of at most~$2q$ half-paths.
-The heaviest edge of every original query path is then the maximum of the results for
-its halves.
+The peak of every original query path is then the heavier of the peaks of its halves.
\qed

\para
-We will now describe a~simple variant of depth-first search which finds the
-heaviest edges for all query paths of the transformed problem. As we promised,
+We will now describe a~simple variant of depth-first search which finds the
+peaks of all query paths of the transformed problem. As we promised,
we will take care of the number of comparisons only; all other
operations are well-defined and can certainly be performed in polynomial time.

+\defn
For every edge~$e=uv$, we consider the set $Q_e$ of all query paths containing~$e$.
The vertex of a~path which is closer to the root will be called its \df{top,}
the other vertex its \df{bottom.}
-We define arrays $T_e$ and~$H_e$ as follows: $T_e$ contains
+We define arrays $T_e$ and~$P_e$ as follows: $T_e$ contains
the tops of the paths in~$Q_e$ in order of their increasing depth (we will call
them \df{active tops} and each of them will be stored exactly once). For
-each active top~$t=T_e[i]$, we define $H_e[i]$ as the heaviest edge of the path $T[v,t]$.
+each active top~$t=T_e[i]$, we define $P_e[i]$ as the peak of the path $T[v,t]$.
+
+\obs
Since for every~$i$ the path $T[v,T_e[i+1]]$ is contained within $T[v,T_e[i]]$,
-the edges in~$H_e$ must have non-increasing weights, that is $w(H_e[i+1]) \le
-w(H_e[i])$.
+the edges of~$P_e$ must have non-increasing weights, that is $w(P_e[i+1]) \le
+w(P_e[i])$.

-\alg $\<FindPeaks>(u,p,T_p,H_p)$ --- process all queries in the subtree rooted
+\alg $\<FindPeaks>(u,p,T_p,P_p)$ --- process all queries in the subtree rooted
at~$u$ entered from its parent via an~edge~$p$. \id{findheavy}

\algo
-\:Process all query paths whose bottom is~$u$. For each of them, find the top of
-the path in~$T_p$ and record the heaviest edge remembered for it in~$H_p$.
-
-\:First find the tops~$T$ which will be shared by all edges going from~$u$ downwards.
-These are the tops from~$T_p$ except for the ones which have ceased to be active,
-because all query paths which were referring to them either have~$u$ as their bottom
-or continue from~$u$ downwards by another edge.
-Select the corresponding array of the heaviest edges~$H$ from~$H_p$.
-
-\:For every son~$v$ of~$u$, do:
-
-\::Construct the array of tops~$T_e$ for the edge $e=uv$. It contains all tops
-  from~$T$ and possibly also the vertex~$u$ itself if there is a~query path
-  which has~$u$ as its top and which has bottom somewhere in the subtree
-  rooted at~$v$.
-
-\::Construct the array of the heaviest edges~$H_e$. For all tops from~$T$, just compare
-  the weight of the edge recorded in~$H$ with $w(e)$ and if it was smaller,
-  replace the remembered edge by~$e$. Since $H_p$ was sorted, so is~$H$
-  and we can use binary search to locate the boundary between lighter and
-  heavier edges in~$H$.
-  For the new top~$u$, the edge~$e$ itself is obviously the heaviest.
-
-\::Recurse on~$v$: call $\<FindPeaks>(v,e,T_e,H_e)$.
+\:Process all query paths whose bottom is~$u$ and record their peaks.
+This is accomplished by finding the index~$i$ of each path's top in~$T_p$ and reading
+the desired edge from~$P_p[i]$.
+
+\:For every son~$v$ of~$u$, process the edge $e=uv$:
+
+\::Construct the array of tops~$T_e$ for the edge~$e$: Start with~$T_p$, remove
+  the tops of the paths which do not contain~$e$ and add the vertex~$u$ itself
+  if there is a~query path which has~$u$ as its top and whose bottom lies somewhere
+  in the subtree rooted at~$v$.
+
+\::Construct the array of the peaks~$P_e$: Start with~$P_p$ and
+  remove the entries corresponding to the tops which are no longer active.
+  Since the paths leading to all active tops have been extended by the
+  edge~$e$, compare $w(e)$ with the weights of the edges recorded in~$P_e$ and replace
+  those edges which are lighter by~$e$.
+  Since $P_p$ was sorted, we can use binary search
+  to locate the boundary between lighter and heavier edges in~$P_e$. Finally
+  append~$e$ to the array if the top~$u$ became active.
+
+\::Recurse on~$v$: call $\<FindPeaks>(v,e,T_e,P_e)$.
\endalgo

\>As we need a~parent edge to start the recursion, we add an~imaginary parent

@@ -707,38 +705,38 @@
Let us account for the comparisons:

\lemma\id{vercompares}%
When the procedure \<FindPeaks> is called on the transformed problem, it
-performs $\O(n+m)$ comparisons, where $n$ is the size of the tree and
-$m$ is the number of query paths.
+performs $\O(n+q)$ comparisons, where $n$ is the size of the tree and
+$q$ is the number of query paths.

\proof
We will calculate the number of comparisons~$c_i$ performed when processing the edges
going from the $(i+1)$-th to the $i$-th level of the tree.
-The levels are numbered from the bottom, so leaves are at level~0, the root
-is at level $\ell\le \lceil \log_2 n\rceil$ and there are $n_i\le n/2^i$ vertices
+The levels are numbered from the bottom, so leaves are at level~0 and the root
+is at level $\ell\le \lceil \log_2 n\rceil$. There are $n_i\le n/2^i$ vertices
at the $i$-th level, so we consider exactly $n_i$ edges. To avoid taking a~logarithm
-of zero, we will define $\vert T_e\vert=1$ for $T_e=\emptyset$.
+of zero, we define $\vert T_e\vert=1$ for $T_e=\emptyset$.

\def\eqalign#1{\null\,\vcenter{\openup\jot
  \ialign{\strut\hfil$\displaystyle{##}$&$\displaystyle{{}##}$\hfil
  \crcr#1\crcr}}\,}

$$\vcenter{\openup\jot\halign{\strut\hfil $\displaystyle{#}$&$\displaystyle{{}#}$\hfil&\quad#\hfil\cr
-c_i &\le \sum_e \left( 1 + \log \vert T_e\vert \right)&(Total cost of the binary search.)\cr
+c_i &\le \sum_e \left( 1 + \log \vert T_e\vert \right)&(Total cost of the binary searches.)\cr
&\le n_i + \sum_e \log\vert T_e\vert&(We sum over $n_i$ edges.)\cr
&\le n_i + n_i \cdot {\sum_e \log\vert T_e\vert \over n_i}&(Consider the average of logarithms.) \cr
&\le n_i + n_i \cdot \log{\sum_e \vert T_e\vert \over n_i}&(Logarithm is concave.) \cr
-  &\le n_i + n_i \cdot \log{m\over n_i}&(Bound the number of tops by queries.) \cr
-  &= n_i \cdot \left( 1 + \log\left({m\over n}\cdot{n\over n_i}\right) \right)\cr
-  &= n_i + n_i\log{m\over n} + n_i\log{n\over n_i}.\cr
+  &\le n_i + n_i \cdot \log{q+n\over n_i}&(Bound the number of tops by queries.) \cr
+  &= n_i \cdot \left( 1 + \log\left({q+n\over n}\cdot{n\over n_i}\right) \right)\cr
+  &= n_i + n_i\log{q+n\over n} + n_i\log{n\over n_i}.\cr
}}$$
+(In the fourth inequality, every query path crosses at most one edge between the
+levels $i+1$ and~$i$ and its top is then counted in that single~$T_e$, so
+$\sum_e \vert T_e\vert \le q+n_i \le q+n$; the extra~$n_i$ accounts for the empty
+sets, for which we defined $\vert T_e\vert=1$.)

Summing over all levels, we estimate the total number of comparisons as:
$$
-c = \sum_i c_i = \left( \sum_i n_i \right) + \left( \sum_i n_i \log{m\over n}\right) + \left( \sum_i n_i \log{n\over n_i}\right).
+c = \sum_i c_i = \left( \sum_i n_i \right) + \left( \sum_i n_i \log{q+n\over n}\right) + \left( \sum_i n_i \log{n\over n_i}\right).
$$
-The first part is equal to~$n$, the second one to $n\log(m/n)\le m$. For the third
-one, we would like to plug in $n_i \le n/2^i$, but we unfortunately have one~$n_i$
+The first part is equal to~$n$, the second one to $n\log((q+n)/n)\le q+n$. For the third
+one, we would like to plug in the bound $n_i \le n/2^i$, but we unfortunately have one~$n_i$
in the denominator. We save the situation by observing that the function
$f(x)=x\log(n/x)$ is decreasing\foot{We can easily check the derivative:
$f(x)=(x\ln n-x\ln x)/\ln 2$, so $f'(x)\cdot \ln2 = \ln n - \ln x - 1$. We want
$f'(x)<0$ and therefore $\ln x > \ln n - 1$, i.e., $x > n/e$.}
-for $x > n/e$, so except for $i=0$ and $i=1$ it holds that:
+for $x > n/e$, so for $i\ge 2$ it holds that:
$$
n_i\log{n\over n_i} \le {n\over 2^i}\cdot\log{n\over n/2^i} = {n\over 2^i} \cdot i.
$$

@@ -749,7 +747,7 @@ $$\eqalign{
}$$
Putting all three parts together, we conclude that:
$$
-c \le n + m + \O(n) = \O(n+m). \qedmath
+c \le n + (q+n) + \O(n) = \O(n+q). \qedmath
$$

\para

@@ -761,35 +759,34 @@
perform $\O(m)$ comparisons of edge weights to determine whether~$T$ is minimum
and to find all $T$-light edges in~$G$.

\proof
-We first transform the problem to finding the heaviest edges for a~set
+We first transform the problem to finding all peaks of a~set
of query paths in~$T$ (these are exactly the paths covered by the edges
of $G\setminus T$). We use the reduction from Lemma \ref{verbranch} to get
an~equivalent problem with a~complete branching tree and a~set of parent-descendant
-paths, which costs $\O(m+n)$ comparisons.
-Then we run the \<FindPeaks> procedure (\ref{findheavy}) to find
-the heaviest edges and according to Lemma \ref{vercompares} it spends
-another $\O(m+n)$ comparisons. Since we (as always) assume that~$G$ is connected,
-$\O(m+n)=\O(m)$.
+paths. The reduction costs $\O(m+n)$ comparisons.
+Then we run the \<FindPeaks> procedure (Algorithm \ref{findheavy}) to find
+the peaks of all query paths. According to Lemma \ref{vercompares}, this spends another $\O(m+n)$
+comparisons. Since we (as always) assume that~$G$ is connected, $\O(m+n)=\O(m)$.
\qed

\rem
-The problem of computing path maxima or minima in a~tree has several other interesting applications,
-such as computing minimum cuts separating given pairs of vertices in a~given weighted undirected
-graph~$G$. We construct a~Gomory-Hu tree~$T$ for the graph as described in \cite{gomoryhu}
+The problem of computing path maxima or minima in a~weighted tree has several other interesting
+applications. One of them is computing minimum cuts separating given pairs of vertices in a~given
+weighted undirected graph~$G$. We construct a~Gomory-Hu tree~$T$ for the graph as described in \cite{gomoryhu}
(see also \cite{bhalgat:ght} for a~more efficient algorithm running in time
-$\widetilde\O(mn)$ for unit-cost graphs). This tree has the property that for every two
+$\widetilde\O(mn)$ for unit-cost graphs). The crucial property of this tree is that for every two
vertices $u$, $v$ of the graph~$G$, the minimum-cost edge on $T[u,v]$ has the same cost
as the minimum cut separating $u$ and~$v$ in~$G$. Since the construction of~$T$ generally
-takes $\Omega(n^2)$ time, we could also precompute the minima for all pairs of vertices
-at no extra cost. This would however require quadratic space, while the method of this
-section needs only $\O(n+q)$ to process~$q$ queries.
+takes $\Omega(n^2)$ time, we could of course invest this time in precomputing the minima for
+all pairs of vertices. This would however require quadratic space, so it is better to use
+the method of this section, which fits in $\O(n+q)$ space for $q$~queries.

\rem
A~dynamic version of the problem is also often considered. It calls for a~data structure
-representing a~weighted tree with operations for modifying the structure of the tree
+representing a~weighted forest with operations for modifying the structure of the forest
and querying minima or maxima on paths. Sleator and Tarjan have shown in \cite{sleator:trees}
-how to do this in $\O(\log n)$ time amortized per operation, which for example
-allows an~implementation of the Dinic's maximum flow algorithm \cite{dinic:flow}
+how to do this in $\O(\log n)$ time amortized per operation, which leads to
+an~implementation of Dinic's maximum flow algorithm \cite{dinic:flow}
in time $\O(mn\log n)$.
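[Editorial illustration, not part of the patch.] To make the procedure above concrete, here is a minimal Python sketch of the same scheme; all function and variable names are ours, and the word-RAM representation developed in the next section is deliberately ignored. It answers vertex-to-ancestor queries, the form produced by the reduction of Lemma \ref{verbranch}. Instead of materialising $T_e$ and $P_e$ for every edge, it keeps the peaks of the current root-to-vertex path as a monotone stack of (depth of top, peak weight) pairs, which is exactly the non-increasing array $P_e$ with runs of equal entries collapsed; the "replace the lighter edges" step then becomes the pop loop, and each query is answered by one binary search, mirroring the comparison counts of Lemma \ref{vercompares}.

import bisect

def find_peaks(sons, wt, queries, root=0):
    # sons[u]  -- list of sons of vertex u in the rooted tree
    # wt[v]    -- weight of the edge between v and its parent
    # queries  -- (top, bottom) pairs, top a proper ancestor of bottom
    # Returns the peak (maximum edge weight) of every query path.
    depth = {root: 0}
    by_bottom = {}
    for qi, (t, b) in enumerate(queries):
        by_bottom.setdefault(b, []).append((t, qi))
    ans = [None] * len(queries)
    tops, peaks = [], []   # depths (increasing) and peak weights (decreasing)

    def dfs(u):
        for t, qi in by_bottom.get(u, ()):
            # peak of T[t,u] = shallowest stacked edge strictly below t
            i = bisect.bisect_left(tops, depth[t] + 1)
            ans[qi] = peaks[i]
        for v in sons.get(u, ()):
            depth[v] = depth[u] + 1
            undo = []
            while peaks and peaks[-1] <= wt[v]:   # replace lighter edges by e
                undo.append((tops.pop(), peaks.pop()))
            tops.append(depth[v]); peaks.append(wt[v])
            dfs(v)
            tops.pop(); peaks.pop()               # restore before a sibling
            while undo:
                d, w = undo.pop()
                tops.append(d); peaks.append(w)

    dfs(root)
    return ans

For example, find_peaks({0: [1], 1: [2, 3]}, {1: 4, 2: 2, 3: 7}, [(0, 2), (1, 3)]) returns [4, 7], the heaviest edges on the two paths. The per-edge undo keeps the sketch short but gives no good worst-case bound on the non-comparison overhead; making that overhead linear is precisely what the bit-parallel representation of the following section is for.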
%--------------------------------------------------------------------------------

@@ -831,7 +828,7 @@
The reductions in Lemma \ref{verbranch} can be performed in time $\O(m)$.

\para
Having the reduced problem at hand, we have to implement the procedure \<FindPeaks>
of Algorithm \ref{findheavy} efficiently. We need a~compact representation of
-the arrays $T_e$ and~$H_e$ by vectors, so that the overhead of the algorithm
+the arrays $T_e$ and~$P_e$ by vectors, so that the overhead of the algorithm
will be linear in the number of comparisons performed.

\defn

@@ -858,7 +855,7 @@
a~single bit telling whether $t\in T_e$. Each top mask fits in $\lceil\log n\rceil$
bits and therefore in a~single machine word. If needed, it can be split into three slots.

\em{Small and big lists:} The heaviest edge found so far for each top is stored
-by the algorithm in the array~$H_e$. Instead of keeping the real array,
+by the algorithm in the array~$P_e$. Instead of keeping the real array,
we store the labels of these edges in a~list encoded in a~bit string.
Depending on the size of the list, we use one of two possible encodings:
\df{Small lists} are stored in a~vector which fits in a~single slot, together with
the length of the list. If the data do not fit in a~small list, we use a~\df{big list}
instead, which is stored in $\O(\log\log n)$ words, each of them containing
a~slot-sized vector. Unlike the small lists, we use the big lists as arrays. If a~top~$t$ of
-depth~$d$ is active, we keep the corresponding entry of~$H_e$ in the $d$-th
+depth~$d$ is active, we keep the corresponding entry of~$P_e$ in the $d$-th
field of the big list. Otherwise, we keep that entry unused.

We will want to perform all operations on small lists in constant time,

@@ -936,13 +933,13 @@
including the tops of all query paths which have bottom at the current vertex
\qed

\lemma\id{verth}%
-The arrays $T_e$ and~$H_e$ can be indexed in constant time.
+The arrays $T_e$ and~$P_e$ can be indexed in constant time.

\proof
Indexing~$T_e$ is exactly the operation \<Select> applied to the corresponding
top mask~$M_e$.

-If $H_e$ is stored in a~big list, we calculate the index of the particular
+If $P_e$ is stored in a~big list, we calculate the index of the particular
slot and the position of the field inside the slot. This field can then be
extracted using bit masking and shifts.

If the list is small, we extract the desired field in the same way. If we
encounter a~pointer, we index this big list.
\qed

\lemma\id{verhe}%
-For an~arbitrary active top~$t$, the corresponding entry of~$H_e$ can be
+For an~arbitrary active top~$t$, the corresponding entry of~$P_e$ can be
extracted in constant time.

\proof
We look up the precomputed depth~$d$ of~$t$ first.
-If $H_e$ is stored in a~big list, we extract the $d$-th entry of the list.
+If $P_e$ is stored in a~big list, we extract the $d$-th entry of the list.
If the list is small, we find the position of the particular field
by counting bits of the top mask~$M_e$ at position~$d$ and higher
(this is \<Weight> of $M_e$ with the lower bits masked out).
\qed

where $q_e$~is the number of query paths having~$e$ as its bottommost edge.

\proof
The edge is examined in steps 1, 3 and~4 of the algorithm. Step~3 is trivial
as we have already computed the top masks and we can reconstruct the entries
of~$T_e$ in constant time (Lemma \ref{verth}). We have to perform the other steps
-in constant time if $H_e$ is a~small list or $\O(\log\log n)$ if it is big.
+in constant time if $P_e$ is a~small list, or in $\O(\log\log n)$ time if it is big.
-\em{Step 5} involves binary search on~$H_e$ in $\O(\log\vert T_e\vert)$ comparisons,
-each of them indexes~$H_e$, which is $\O(1)$ by Lemma \ref{verth}. Rewriting the
+\em{Step 4} involves binary search on~$P_e$ in $\O(\log\vert T_e\vert)$ comparisons,
+each of which indexes~$P_e$ in $\O(1)$ time by Lemma \ref{verth}. Rewriting the
lighter edges is $\O(1)$ for small lists by replication and bit masking; for a~big
list we do the same for each of its slots.

-\em{Step 1} looks up $q_e$~tops in~$H_e$ and we already know from Lemma \ref{verhe}
+\em{Step 1} looks up $q_e$~tops in~$P_e$ and we already know from Lemma \ref{verhe}
how to do that in constant time.

The only non-trivial part is the extraction, in \em{Step~4}, of the sublist
of~$P_p$ corresponding to the tops which remain active. We already know which
tops to select, but we have to carefully extract the sublist.
We have to handle these four cases:

\itemize\ibull
\:\em{Small from small:} We use \ to find the fields of~$P_p$ which shall be
deleted. Then we apply \ to actually delete them. Pointers can be retained as
they still refer to the same ancestor list.

-\:\em{Big from big:} We can copy the whole~$H_p$, since the layout of the
+\:\em{Big from big:} We can copy the whole~$P_p$, since the layout of the
big lists is fixed and the items we do not want simply end up as unused
-fields in~$H_e$.
+fields in~$P_e$.

\:\em{Small from big:} We use the operation \ to construct a~list of pointers
(we use bit masking to add the ``this is a~pointer'' flags).
-- 
2.39.2
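[Editorial illustration, appended after the patch.] The constant-time behaviour of the small lists rests on standard word-RAM tricks, chiefly replication (copying a value into every field of a word) and parallel comparison by guarded subtraction. The Python sketch below shows the idea behind "rewriting the lighter edges is $\O(1)$ by replication and bit masking". It is a sketch under assumptions of ours: Python integers stand in for machine words, each list entry occupies an $f$-bit field whose top (guard) bit is kept zero, and the fields hold the weights themselves, whereas the lists in the text store edge labels. Also, the lighter fields are located here by parallel comparison rather than by the binary search of the algorithm; the search is what keeps the count of edge-weight comparisons low, while the masking keeps the remaining overhead constant.

def ones(k, f):
    # the constant 0..01 0..01 ... 0..01: a 1 in the lowest bit of each of k fields
    return sum(1 << (i * f) for i in range(k))

def replicate(x, k, f):
    # a vector all of whose k fields contain x ("replication")
    return x * ones(k, f)

def fields_below(A, x, k, f):
    # Mask with all f bits set in exactly those fields of A that are < x.
    # Requires every field value and x to fit in f-1 bits.  Classic parallel
    # comparison: set the guard bit of every field, subtract x from all
    # fields at once, and see where the guard bit was borrowed away.
    F = replicate(1 << (f - 1), k, f)          # guard bit of every field
    ge = ((A | F) - replicate(x, k, f)) & F    # guard survives iff field >= x
    return ((ge ^ F) >> (f - 1)) * ((1 << f) - 1)  # smear lost guards over fields

def replace_lighter(P, x, k, f):
    # The rewriting step on a small list: every field lighter than x is
    # overwritten by x, in O(1) word operations and with no loop over fields.
    m = fields_below(P, x, k, f)
    return (P & ~m) | (replicate(x, k, f) & m)

For instance, replace_lighter(0x312, 0x2, 3, 4) returns 0x322: of the three 4-bit fields holding 2, 1 and 3, exactly the middle one is lighter than 2 and is overwritten. The guard bit is what stops borrows from one field propagating into the next, so one machine subtraction compares all fields at once.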