Cleaning up verification chapter.

author Martin Mares <mj@ucw.cz>

Sun, 16 Mar 2008 19:57:48 +0000 (20:57 +0100)

committer Martin Mares <mj@ucw.cz>

Sun, 16 Mar 2008 19:57:48 +0000 (20:57 +0100)
author Martin Mares <mj@ucw.cz>
Sun, 16 Mar 2008 19:57:48 +0000 (20:57 +0100)
committer Martin Mares <mj@ucw.cz>
Sun, 16 Mar 2008 19:57:48 +0000 (20:57 +0100)
diff --git a/adv.tex b/adv.tex

index f9cec903c48b3885e7a31488ad57ff1d7a659162..cf3994014bd8b5ab9d2f8a85d146cc1ea50971b7 100644 (file)
--- a/adv.tex
+++ b/adv.tex
@@ -563,18 +563,18 @@ To verify that a~spanning~$T$ is minimum, it is sufficient to check that all
  edges outside~$T$ are $T$-heavy (by Theorem \ref{mstthm}). In fact, we will be
  able to find all $T$-light edges efficiently. For each edge $uv\in E\setminus T$,
  we will find the heaviest edge of the tree path $T[u,v]$ and compare its weight
-to $w(uv)$. It is therefore sufficient to solve this problem:
+to $w(uv)$. It is therefore sufficient to solve the following problem:
  
  \problem
  Given a~weighted tree~$T$ and a~set of \df{query paths} $Q \subseteq \{ T[u,v] ; u,v\in V(T) \}$
-specified by their endpoints, find the heaviest edge for every path in~$Q$.
+specified by their endpoints, find the heaviest edge (\df{peak}) for every path in~$Q$.
  
  \para
-Finding the heaviest edges can be burdensome if the tree~$T$ is degenerated,
+Finding the peaks can be burdensome if the tree~$T$ is degenerated,
  so we will first reduce it to the same problem on a~balanced tree. We run
  the Bor\o{u}vka's algorithm on~$T$, which certainly produces $T$ itself, and we
  record the order in which the subtrees have been merged in another tree~$B(T)$.
-The queries on~$T$ can be then easily translated to queries on~$B(T)$.
+The peak queries on~$T$ can be then easily translated to peak queries on~$B(T)$.
  
  \defn
  For a~weighted tree~$T$ we define its \df{Bor\o{u}vka tree} $B(T)$ as a~rooted tree which records
@@ -592,9 +592,8 @@ is really a~tree. All its leaves are on the same level and each internal vertex
  at least two sons. Such trees will be called \df{complete branching trees.}
  
  \lemma
-For every tree~$T$ and every pair of its vertices $x,y\in V(T)$, the heaviest edge
-of the path $T[x,y]$ has the same weight as the heaviest edge of~the path
-$B(T)[x,y]$.
+For every tree~$T$ and every pair of its vertices $x,y\in V(T)$, the peak
+of the path $T[x,y]$ has the same weight as the peak of~the path $B(T)[x,y]$.
  
  \proof
  Let us denote the path $T[x,y]$ by~$P$ and its heaviest edge by~$h=ab$. Similarly,
@@ -647,56 +646,55 @@ algorithm is $\O(\log n)$, the depth of the Bor\o{u}vka tree must be logarithmic
  
  For each query path $T[x,y]$ we find the lowest common ancestor of~$x$ and~$y$
  and split the path by the two half-paths. This produces a~set~$Q'$ of at most~$2q$ half-paths.
-The heaviest edge of every original query path is then the maximum of the results for
-its halves.
+The peak of every original query path is then the heavier of the peaks of its halves.
  \qed
  
  \para
-We will now describe a~simple variant of depth-first search which finds the
-heaviest edges for all query paths of the transformed problem. As we promised,
+We will now describe a~simple variant of the depth-first search which finds the
+peaks of all query paths of the transformed problem. As we promised,
  we will take care of the number of comparisons only, as long as all other operations
  are well-defined and they can be performed in polynomial time.
  
+\defn
  For every edge~$e=uv$, we consider the set $Q_e$ of all query paths containing~$e$.
  The vertex of a~path, which is closer to the root, will be called its \df{top,}
  the other vertex its \df{bottom.}
-We define arrays $T_e$ and~$H_e$ as follows: $T_e$ contains
+We define arrays $T_e$ and~$P_e$ as follows: $T_e$ contains
  the tops of the paths in~$Q_e$ in order of their increasing depth (we
  will call them \df{active tops} and each of them will be stored exactly once). For
-each active top~$t=T_e[i]$, we define $H_e[i]$ as the heaviest edge of the path $T[v,t]$.
+each active top~$t=T_e[i]$, we define $P_e[i]$ as the peak of the path $T[v,t]$.
+
+\obs
  As for every~$i$ the path $T[v,T_e[i+1]]$ is contained within $T[v,T_e[i]]$,
-the edges in~$H_e$ must have non-increasing weights, that is $w(H_e[i+1]) \le
-w(H_e[i])$.
+the edges of~$P_e$ must have non-increasing weights, that is $w(P_e[i+1]) \le
+w(P_e[i])$.
  
-\alg $\<FindHeavy>(u,p,T_p,H_p)$ --- process all queries in the subtree rooted
+\alg $\<FindHeavy>(u,p,T_p,P_p)$ --- process all queries in the subtree rooted
  at~$u$ entered from its parent via an~edge~$p$.
  \id{findheavy}
  
  \algo
-\:Process all query paths whose bottom is~$u$. For each of them, find the top of
-the path in~$T_p$ and record the heaviest edge remembered for it in~$H_p$.
-
-\:First find the tops~$T$ which will be shared by all edges going from~$u$ downwards.
-These are the tops from~$T_p$ except for the ones which have ceased to be active,
-because all query paths which were referring to them either have~$u$ as their bottom
-or continue from~$u$ downwards by another edge.
-Select the corresponding array of the heaviest edges~$H$ from~$H_p$.
-
-\:For every son~$v$ of~$u$, do:
-
-\::Construct the array of tops~$T_e$ for the edge $e=uv$. It contains all tops
-   from~$T$ and possibly also the vertex~$u$ itself if there is a~query path
-   which has~$u$ as its top and which has bottom somewhere in the subtree
-   rooted at~$v$.
-
-\::Construct the array of the heaviest edges~$H_e$. For all tops from~$T$, just compare
-   the weight of the edge recorded in~$H$ with $w(e)$ and if it was smaller,
-   replace the remembered edge by~$e$. Since $H_p$ was sorted, so is~$H$
-   and we can use binary search to locate the boundary between lighter and
-   heavier edges in~$H$.
-   For the new top~$u$, the edge~$e$ itself is obviously the heaviest.
-
-\::Recurse on~$v$: call $\<FindHeavy>(v,e,T_e,H_e)$.
+\:Process all query paths whose bottom is~$u$ and record their peaks.
+This is accomplished by finding the index~$i$ of each path's top in~$T_p$ and reading
+the desired edge from~$P_p[i]$.
+
+\:For every son~$v$ of~$u$, process the edge $e=uv$:
+
+\::Construct the array of tops~$T_e$ for the edge~$e$: Start with~$T_p$, remove
+   the tops of the paths which do not contain~$e$ and add the vertex~$u$ itself
+   if there is a~query path which has~$u$ as its top and which has bottom somewhere
+   in the subtree rooted at~$v$.
+
+\::Construct the array of the peaks~$P_e$: Start with~$P_p$,
+   remove the entries corresponding to the tops which are no longer active.
+   Since the paths leading to all active tops have been extended by the
+   edge~$e$, compare $w(e)$ with weights of the edges recorded in~$P_e$ and replace
+   those edges which are lighter by~$e$.
+   Since $P_p$ was sorted, we can use binary search
+   to locate the boundary between lighter and heavier edges in~$P_e$. Finally
+   append~$e$ to the array if the top $u$ became active.
+
+\::Recurse on~$v$: call $\<FindHeavy>(v,e,T_e,P_e)$.
  \endalgo
  
  \>As we need a~parent edge to start the recursion, we add an~imaginary parent
@@ -707,38 +705,38 @@ Let us account for the comparisons:
  
  \lemma\id{vercompares}%
  When the procedure \<FindHeavy> is called on the transformed problem, it
-performs $\O(n+m)$ comparisons, where $n$ is the size of the tree and
-$m$ is the number of query paths.
+performs $\O(n+q)$ comparisons, where $n$ is the size of the tree and
+$q$ is the number of query paths.
  
  \proof
  We will calculate the number of comparisons~$c_i$ performed when processing the edges
  going from the $(i+1)$-th to the $i$-th level of the tree.
-The levels are numbered from the bottom, so leaves are at level~0, the root
-is at level $\ell\le \lceil \log_2 n\rceil$ and there are $n_i\le n/2^i$ vertices
+The levels are numbered from the bottom, so leaves are at level~0 and the root
+is at level $\ell\le \lceil \log_2 n\rceil$. There are $n_i\le n/2^i$ vertices
  at the $i$-th level, so we consider exactly $n_i$ edges. To avoid taking a~logarithm
-of zero, we will define $\vert T_e\vert=1$ for $T_e=\emptyset$.
+of zero, we define $\vert T_e\vert=1$ for $T_e=\emptyset$.
  \def\eqalign#1{\null\,\vcenter{\openup\jot
    \ialign{\strut\hfil$\displaystyle{##}$&$\displaystyle{{}##}$\hfil
        \crcr#1\crcr}}\,}
  $$\vcenter{\openup\jot\halign{\strut\hfil $\displaystyle{#}$&$\displaystyle{{}#}$\hfil&\quad#\hfil\cr
-c_i &\le \sum_e \left( 1 + \log \vert T_e\vert \right)&(Total cost of the binary search.)\cr
+c_i &\le \sum_e \left( 1 + \log \vert T_e\vert \right)&(Total cost of the binary searches.)\cr
      &\le n_i + \sum_e \log\vert T_e\vert&(We sum over $n_i$ edges.)\cr
      &\le n_i + n_i \cdot {\sum_e \log\vert T_e\vert \over n_i}&(Consider the average of logarithms.) \cr
      &\le n_i + n_i \cdot \log{\sum_e \vert T_e\vert \over n_i}&(Logarithm is concave.) \cr
-    &\le n_i + n_i \cdot \log{m\over n_i}&(Bound the number of tops by queries.) \cr
-    &= n_i \cdot \left( 1 + \log\left({m\over n}\cdot{n\over n_i}\right) \right)\cr
-    &= n_i + n_i\log{m\over n} + n_i\log{n\over n_i}.\cr
+    &\le n_i + n_i \cdot \log{q+n\over n_i}&(Bound the number of tops by queries.) \cr
+    &= n_i \cdot \left( 1 + \log\left({q+n\over n}\cdot{n\over n_i}\right) \right)\cr
+    &= n_i + n_i\log{q+n\over n} + n_i\log{n\over n_i}.\cr
  }}$$
  Summing over all levels, we estimate the total number of comparisons as:
  $$
-c = \sum_i c_i = \left( \sum_i n_i \right) + \left( \sum_i n_i \log{m\over n}\right) + \left( \sum_i n_i \log{n\over n_i}\right).
+c = \sum_i c_i = \left( \sum_i n_i \right) + \left( \sum_i n_i \log{q+n\over n}\right) + \left( \sum_i n_i \log{n\over n_i}\right).
  $$
-The first part is equal to~$n$, the second one to $n\log(m/n)\le m$. For the third
-one, we would like to plug in $n_i \le n/2^i$, but we unfortunately have one~$n_i$
+The first part is equal to~$n$, the second one to $n\log((q+n)/n)\le q+n$. For the third
+one, we would like to plug in the bound $n_i \le n/2^i$, but we unfortunately have one~$n_i$
  in the denominator. We save the situation by observing that the function $f(x)=x\log(n/x)$
  is decreasing\foot{We can easily check the derivative: $f(x)=(x\ln n-x\ln x)/\ln 2$, so $f'(x)\cdot \ln2 =
  \ln n - \ln x - 1$. We want $f'(x)<0$ and therefore $\ln x > \ln n - 1$, i.e., $x > n/e$.}
-for $x > n/e$, so except for $i=0$ and $i=1$ it holds that:
+for $x > n/e$, so for $i\ge 2$ it holds that:
  $$
  n_i\log{n\over n_i} \le {n\over 2^i}\cdot\log{n\over n/2^i} = {n\over 2^i} \cdot i.
  $$
@@ -749,7 +747,7 @@ $$\eqalign{
  }$$
  Putting all three parts together, we conclude that:
  $$
-c \le n + m + \O(n) = \O(n+m). \qedmath
+c \le n + (q+n) + \O(n) = \O(n+q). \qedmath
  $$
  
  \para
@@ -761,35 +759,34 @@ perform $\O(m)$ comparisons of edge weights to determine whether~$T$ is minimum
  and to find all $T$-light edges in~$G$.
  
  \proof
-We first transform the problem to finding the heaviest edges for a~set
+We first transform the problem to finding all peaks of a~set
  of query paths in~$T$ (these are exactly the paths covered by the edges
  of $G\setminus T$). We use the reduction from Lemma \ref{verbranch} to get
  an~equivalent problem with a~full branching tree and a~set of parent-descendant
-paths, which costs $\O(m+n)$ comparisons.
-Then we run the \<FindHeavy> procedure (\ref{findheavy}) to find
-the heaviest edges and according to Lemma \ref{vercompares} it spends
-another $\O(m+n)$ comparisons. Since we (as always) assume that~$G$ is connected,
-$\O(m+n)=\O(m)$.
+paths. The reduction costs $\O(m+n)$ comparisons.
+Then we run the \<FindHeavy> procedure (Algorithm \ref{findheavy}) to find
+the tops of all query paths. According to Lemma \ref{vercompares}, this spends another $\O(m+n)$
+comparisons. Since we (as always) assume that~$G$ is connected, $\O(m+n)=\O(m)$.
  \qed
  
  \rem
-The problem of computing path maxima or minima in a~tree has several other interesting applications,
-such as computing minimum cuts separating given pairs of vertices in a~given weighted undirected
-graph~$G$. We construct a~Gomory-Hu tree~$T$ for the graph as described in \cite{gomoryhu}
+The problem of computing path maxima or minima in a~weighted tree has several other interesting
+applications. One of them is computing minimum cuts separating given pairs of vertices in a~given
+weighted undirected graph~$G$. We construct a~Gomory-Hu tree~$T$ for the graph as described in \cite{gomoryhu}
  (see also \cite{bhalgat:ght} for a~more efficient algorithm running in time
-$\widetilde\O(mn)$ for unit-cost graphs). This tree has the property that for every two
+$\widetilde\O(mn)$ for unit-cost graphs). The crucial property of this tree is that for every two
  vertices $u$, $v$ of the graph~$G$, the minimum-cost edge on $T[u,v]$ has the same cost
  as the minimum cut separating $u$ and~$v$ in~$G$. Since the construction of~$T$ generally
-takes $\Omega(n^2)$ time, we could also precompute the minima for all pairs of vertices
-at no extra cost. This would however require quadratic space, while the method of this
-section needs only $\O(n+q)$ to process~$q$ queries.
+takes $\Omega(n^2)$ time, we could of course invest this time in precomputing the minima for
+all pairs of vertices. This would however require quadratic space, so we can better use
+the method of this section which fits in $\O(n+q)$ space for $q$~queries.
  
  \rem
  A~dynamic version of the problem is also often considered. It calls for a~data structure
-representing a~weighted tree with operations for modifying the structure of the tree
+representing a~weighted forest with operations for modifying the structure of the forest
  and querying minima or maxima on paths. Sleator and Tarjan have shown in \cite{sleator:trees}
-how to do this in $\O(\log n)$ time amortized per operation, which for example
-allows an~implementation of the Dinic's maximum flow algorithm \cite{dinic:flow}
+how to do this in $\O(\log n)$ time amortized per operation, which leads to
+an~implementation of the Dinic's maximum flow algorithm \cite{dinic:flow}
  in time $\O(mn\log n)$.
  
  %--------------------------------------------------------------------------------
@@ -831,7 +828,7 @@ The reductions in Lemma \ref{verbranch} can be performed in time $\O(m)$.
  \para
  Having the reduced problem at hand, we have to implement the procedure \<FindHeavy>
  of Algorithm \ref{findheavy} efficiently. We need a~compact representation of
-the arrays $T_e$ and~$H_e$ by vectors, so that the overhead of the algorithm
+the arrays $T_e$ and~$P_e$ by vectors, so that the overhead of the algorithm
  will be linear in the number of comparisons performed.
  
  \defn
@@ -858,7 +855,7 @@ a~single bit telling whether $t\in T_e$. Each top mask fits in $\lceil\log n\rce
  bits and therefore in a~single machine word. If needed, it can be split to three slots.
  
  \em{Small and big lists:} The heaviest edge found so far for each top is stored
-by the algorithm in the array~$H_e$. Instead of keeping the real array,
+by the algorithm in the array~$P_e$. Instead of keeping the real array,
  we store the labels of these edges in a~list encoded in a~bit string.
  Depending on the size of the list, we use two one of two possible encodings:
  \df{Small lists} are stored in a~vector which fits in a~single slot, with
@@ -868,7 +865,7 @@ length of the list.
  If the data do not fit in a~small list, we use a~\df{big list} instead, which
  is stored in $\O(\log\log n)$ words, each of them containing a~slot-sized
  vector. Unlike the small lists, we use the big lists as arrays. If a~top~$t$ of
-depth~$d$ is active, we keep the corresponding entry of~$H_e$ in the $d$-th
+depth~$d$ is active, we keep the corresponding entry of~$P_e$ in the $d$-th
  field of the big list. Otherwise, we keep that entry unused.
  
  We will want to perform all operations on small lists in constant time,
@@ -936,13 +933,13 @@ including the tops of all query paths which have bottom at the current vertex
  \qed
  
  \lemma\id{verth}%
-The arrays $T_e$ and~$H_e$ can be indexed in constant time.
+The arrays $T_e$ and~$P_e$ can be indexed in constant time.
  
  \proof
  Indexing~$T_e$ is exactly the operation \<FindKth> applied on the corresponding
  top mask~$M_e$.
  
-If $H_e$ is stored in a~big list, we calculate the index of the particular
+If $P_e$ is stored in a~big list, we calculate the index of the particular
  slot and the position of the field inside the slot. This field can be then
  extracted using bit masking and shifts.
  
@@ -953,12 +950,12 @@ encounter a~pointer, we index this big list.
  \qed
  
  \lemma\id{verhe}%
-For an~arbitrary active top~$t$, the corresponding entry of~$H_e$ can be
+For an~arbitrary active top~$t$, the corresponding entry of~$P_e$ can be
  extracted in constant time.
  
  \proof
  We look up the precomputed depth~$d$ of~$t$ first.
-If $H_e$ is stored in a~big list, we extract the $d$-th entry of the list.
+If $P_e$ is stored in a~big list, we extract the $d$-th entry of the list.
  If the list is small, we find the position of the particular field
  by counting bits of the top mask~$M_e$ at position~$d$ and higher
  (this is \<Weight> of $M_e$ with the lower bits masked out).
@@ -972,14 +969,14 @@ where $q_e$~is the number of query paths having~$e$ as its bottommost edge.
  The edge is examined in steps 1, 2, 4 and~5 of the algorithm. Step~4 is trivial
  as we have already computed the top masks and we can reconstruct the entries
  of~$T_e$ in constant time (Lemma \ref{verth}). We have to perform the other
-in constant time if $H_e$ is a~small list or $\O(\log\log n)$ if it is big.
+in constant time if $P_e$ is a~small list or $\O(\log\log n)$ if it is big.
  
-\em{Step 5} involves binary search on~$H_e$ in $\O(\log\vert T_e\vert)$ comparisons,
-each of them indexes~$H_e$, which is $\O(1)$ by Lemma \ref{verth}. Rewriting the
+\em{Step 5} involves binary search on~$P_e$ in $\O(\log\vert T_e\vert)$ comparisons,
+each of them indexes~$P_e$, which is $\O(1)$ by Lemma \ref{verth}. Rewriting the
  lighter edges is $\O(1)$ for small lists by replication and bit masking, for a~big
  list we do the same for each of its slot.
  
-\em{Step 1} looks up $q_e$~tops in~$H_e$ and we already know from Lemma \ref{verhe}
+\em{Step 1} looks up $q_e$~tops in~$P_e$ and we already know from Lemma \ref{verhe}
  how to do that in constant time.
  
  \em{Step 2} is the only non-trivial one. We already know which tops to select
@@ -988,13 +985,13 @@ extract the sublist.
  We have to handle these four cases:
  
  \itemize\ibull
-\:\em{Small from small:} We use \<Select> to find the fields of~$H_p$ which
+\:\em{Small from small:} We use \<Select> to find the fields of~$P_p$ which
  shall be deleted. Then we apply \<SubList> to actually delete them. Pointers
  can be retained as they still refer to the same ancestor list.
  
-\:\em{Big from big:} We can copy the whole~$H_p$, since the layout of the
+\:\em{Big from big:} We can copy the whole~$P_p$, since the layout of the
  big lists is fixed and the items we do not want simply end up as unused
-fields in~$H_e$.
+fields in~$P_e$.
  
  \:\em{Small from big:} We use the operation \<Bits> to construct a~list
  of pointers (we use bit masking to add the ``this is a~pointer'' flags).
author	Martin Mares <mj@ucw.cz>
	Sun, 16 Mar 2008 19:57:48 +0000 (20:57 +0100)
committer	Martin Mares <mj@ucw.cz>
	Sun, 16 Mar 2008 19:57:48 +0000 (20:57 +0100)