Analysis of ET-trees.

[saga.git] / dyn.tex
diff --git a/dyn.tex b/dyn.tex

index ea51c44f2922ad7f483462c273ccbc580f62e47f..967c86675473d9c19a272b648cac1492b6174fc8 100644 (file)
--- a/dyn.tex
+++ b/dyn.tex
@@ -11,53 +11,53 @@ of graphs that differ only a~little, so recomputing the solution from scratch fo
  every graph would be a~waste of time. In such cases, we usually turn our attention
  to \df{dynamic graph algorithms.} A~dynamic algorithm is in fact a~data structure
  that remembers a~graph and offers operations that modify the structure of the graph
-(let's say by insertion and deletion of edges) and also operations that query the
-result of the problem for the current state of the graph.
-A~typical example of a~problem solved by such algorithms is dynamic maintenance of
-connected components:
+and also operations that query the result of the problem for the current state
+of the graph. A~typical example of a~problem of this kind is dynamic
+maintenance of connected components:
  
  \problemn{Dynamic connectivity}
  Maintain an~undirected graph under a~sequence of the following operations:
  \itemize\ibull
-\:$\<Init>(n)$ --- create a~graph with $n$~isolated vertices $\{1,\ldots,n\}$,\foot{%
+\:$\<Init>(n)$ --- Create a~graph with $n$~isolated vertices $\{1,\ldots,n\}$.\foot{%
  The structure could support dynamic addition and removal of vertices, too,
-but this is easy to add and infrequently used, so we will rather define
-the structure with a~fixed set of vertices for clarity.}
-\:$\<Insert>(G,u,v)$ --- insert an~edge $uv$ to~$G$ and return its unique
-identifier (assuming that the edge did not exist yet),
-\:$\<Delete>(G,e)$ --- delete an~edge specified by its identifier from~$G$,
-\:$\<Connected>(G,u,v)$ --- test if $u$ and~$v$ are in the same connected component of~$G$.
+but this is easy to add and infrequently used, so we will rather keep the set
+of vertices fixed for clarity.}
+\:$\<Insert>(G,u,v)$ --- Insert an~edge $uv$ to~$G$ and return its unique
+identifier. This assumes that the edge did not exist yet.
+\:$\<Delete>(G,e)$ --- Delete an~edge specified by its identifier from~$G$.
+\:$\<Connected>(G,u,v)$ --- Test if $u$ and~$v$ are in the same connected component of~$G$.
  \endlist
  
  \para
  We have already encountered a~special case of dynamic connectivity when implementing the
  Kruskal's algorithm in Section \ref{classalg}. At that time, we did not need to delete
  any edges from the graph, which makes the problem substantially easier. This special
-case is sometimes called an~\df{incremental} or \df{semidynamic} graph algorithm.
-We mentioned the Union-Find data structure of Tarjan and van Leeuwen \cite{tarjan:setunion}
-which can be used for that: queries on connectedness translate to \<Find>, edge
-insertions to \<Find> followed by \<Union> if the edge connects two different
-components. This way, a~sequence of $m$~operations starting with an~empty graph
-on $n$~vertices is processed in time $\O(m\timesalpha(m,n))$ and this works even
-on the Pointer Machine. Fredman and Saks \cite{fredman:cellprobe} have proven
-a~matching lower bound in the cell-probe model which is stronger than RAM with
-$\O(\log n)$-bit words.
+case is customarily called an~\df{incremental} or \df{semidynamic} graph algorithm.
+We mentioned the Disjoint Set Union data structure of Tarjan and van Leeuwen (Theorem \ref{dfu})
+which can be used for that: Connected components are represented as an~equivalence classes.
+Queries on connectedness translate to \<Find>, edge insertions to \<Find>
+followed by \<Union> if the new edge joins two different components. This way,
+a~sequence of $m$~operations starting with an~empty graph on $n$~vertices is
+processed in time $\O(n+m\timesalpha(m,n))$ and this holds even for the Pointer
+Machine. Fredman and Saks \cite{fredman:cellprobe} have proven a~matching lower
+bound in the cell-probe model which is stronger than RAM with $\O(\log n)$-bit
+words.
  
  The edges that have triggered the \<Union>s form a~spanning forest of the current graph.
  So far, all known algorithms for dynamic connectivity maintain some sort of a~spanning
  tree. This suggests that a~dynamic MST algorithm could be obtained by modifying the
  dynamic connectivity algorithms. This will indeed turn out to be true. Semidynamic MST
-is easy to achieve before the end of this section, but making it fully dynamic will require
-more effort, so we will review some of the existing data structures in the meantime.
+is easy to achieve even in the few pages of this section, but making it fully dynamic will require
+more effort, so we will review some of the required building blocks before going into that.
  
  We however have to answer one important question first: What should be the output of
  our MSF data structure? Adding an~operation that would return the MSF of the current
  graph is of course possible, but somewhat impractical as this operation has to
  spend $\Omega(n)$ time on the mere writing of its output. A~better way seems to
-be to make the \<Insert> and \<Delete> operations report the list of modifications
-of the MSF caused by the change in the graph.
+be making the \<Insert> and \<Delete> operations report the list of modifications
+of the MSF implied by the change in the graph.
  
-Let us see what happens when we \<Insert> an~edge~$e$ to a~graph~$G$ with a~minimum spanning
+Let us see what happens when we \<Insert> an~edge~$e$ to a~graph~$G$ with its minimum spanning
  forest~$F$, obtaining a~new graph~$G'$ with its MSF~$F'$. If $e$~connects two components of~$G$ (and
  therefore also of~$F$), we have to add~$e$ to~$F$. Otherwise, one of the following cases happens:
  Either $e$~is $F$-heavy and so the forest~$F$ is also the MSF of the new graph. Or it is $F$-light
@@ -68,43 +68,261 @@ Correctness of the former case follows immediately from the Theorem on Minimalit
  minimum. In the latter case, the edge~$f$ is not contained in~$F'$ because it is the heaviest
  on the cycle $F[e]+e$ (by the Red lemma, \ref{redlemma}). We can now use the Blue lemma
  (\ref{bluelemma}) to prove that it should be replaced with~$e$. Consider the tree~$T$
-of~$F$ containing both endpoints of the edge~$e$. When we remove~$f$ from~$F$, this tree falls
+of~$F$ that contains both endpoints of the edge~$e$. When we remove~$f$ from~$F$, this tree falls
  apart to two components $T_1$ and~$T_2$. The edge~$f$ was the lightest edge of the cut~$\delta_G(T_1)$
  and $e$~is lighter than~$f$, so $e$~is the lightest in~$\delta_{G'}(T_1)$ and hence $e\in F'$.
  
  A~\<Delete> of an~edge that is not contained in~$F$ does not change~$F$. When we delete
  an~MSF edge, we have to reconnect~$F$ by choosing the lightest edge of the cut separating
-the new components. If there is no such replacement edge, we have deleted a~bridge,
-so the MSF has to remain disconnected.
+the new components (again the Blue lemma in action). If there is no such
+replacement edge, we have deleted a~bridge, so the MSF has to remain
+disconnected.
  
  The idea of reporting differences in the MSF indeed works very well. We can summarize
-what we have shown by the following lemma:
+what we have shown in the following lemma and use it to define the dynamic MSF.
  
  \lemma
  An~\<Insert> or \<Delete> of an~edge in~$G$ causes at most one edge addition, edge
  removal or edge exchange in $\msf(G)$.
  
+\problemn{Dynamic minimum spanning forest}
+Maintain an~undirected graph with distinct weights on edges (drawn from a~totally ordered set)
+and its minimum spanning forest under a~sequence of the following operations:
+\itemize\ibull
+\:$\<Init>(n)$ --- Create a~graph with $n$~isolated vertices $\{1,\ldots,n\}$.
+\:$\<Insert>(G,u,v)$ --- Insert an~edge $uv$ to~$G$. Return its unique
+       identifier and the list of additions and deletions of edges in $\msf(G)$.
+\:$\<Delete>(G,e)$ --- Delete an~edge specified by its identifier from~$G$.
+       Return the list of additions and deletions of edges in $\msf(G)$.
+\endlist
+
  \paran{Semidynamic MSF}%
-To obtain a~semidynamic MSF algorithm, we need to keep the forest in a~data structure which
+To obtain a~semidynamic MSF algorithm, we need to keep the forest in a~data structure that
  supports search for the heaviest edge on the path connecting a~given pair
  of vertices. This can be handled efficiently by the Link-Cut trees of Sleator and Tarjan:
  
-\thmn{Link-Cut Trees, Sleator and Tarjan \cite{sleator:trees}}
+\thmn{Link-Cut Trees, Sleator and Tarjan \cite{sleator:trees}}\id{sletar}%
  There is a~data structure that represents a~forest of rooted trees on~$n$ vertices.
  Each edge of the forest has a~weight drawn from a~totally ordered set. The structure
-supports the following operations in time $\O(\log n)$ amortized:
+supports the following operations in time $\O(\log n)$ amortized:\foot{%
+The Link-Cut trees offer many other operations, but we do not mention them
+as they are not needed in our application.}
+\itemize\ibull
+\:$\<Parent>(v)$ --- Return the parent of~$v$ in its tree or \<null> if $v$~is a~root.
+\:$\<Root>(v)$ --- Return the root of the tree containing~$v$.
+\:$\<Weight>(v)$ --- Return the weight of the edge $(\<Parent(v)>,v)$.
+\:$\<PathMax>(v)$ --- Return the vertex~$w$ closest to $\<Root>(v)$ such that the edge
+       $(\<Parent>(w),w)$ is the heaviest of those on the path from the root to~$v$.
+       If more edges have the maximum weight, break the tie arbitrarily.
+       If there is no such edge ($v$~is the root itself), return \<null>.
+\:$\<Link>(u,v,x)$ --- Connect the trees containing $u$ and~$v$ by an~edge $(u,v)$ of
+       weight~$x$. Assumes that $u~$is a tree root and $v$~lies in a~different tree.
+\:$\<Cut>(v)$ --- Split the tree containing the non-root vertex $v$ to two trees by
+       removing the edge $(\<Parent>(v),v)$. Returns the weight of this edge.
+\:$\<Evert>(v)$ --- Modify the orientations of the edges in the tree containing~$v$
+       to make~$v$ the tree's root.
+\endlist
  
-XXX
+%% \>Additionally, all edges on the path from~$v$ to $\<Root>(v)$ can be enumerated in
+%% time $\O(\ell + \log n)$, where $\ell$~is the length of that path. This operation
+%% (and also the path itself) is called $\<Path>(v)$.
+%% 
+%% \>If the weights are real numbers (or in general an~arbitrary group), the $\O(\log n)$
+%% operations also include:
+%% 
+%% \itemize\ibull
+%% \:$\<PathWeight>(v)$ --- Return the sum of the weights on $\<Path>(v)$.
+%% \:$\<PathUpdate>(v,x)$ --- Add~$x$ to the weights of all edges on $\<Path>(v)$.
+%% \endlist
  
  \proof
  See \cite{sleator:trees}.
  \qed
  
+Once we have this structure, we can turn our ideas on updating of the MSF to
+an~incremental algorithm:
+
+\algn{\<Insert> in a~semidynamic MSF}
+\algo
+\algin A~graph~$G$ with its MSF $F$ represented as a~Link-Cut forest, an~edge~$uv$
+with weight~$w$ to be inserted.
+\:$\<Evert>(u)$. \cmt{$u$~is now the root of its tree.}
+\:If $\<Root>(v) \ne u$: \cmt{$u$~and~$v$ lie in different trees.}
+\::$\<Link>(u,v,w)$. \cmt{Connect the trees.}
+\::Return ``$uv$ added''.
+\:Otherwise: \cmt{both are in the same tree}
+\::$y\=\<PathMax>(v)$.
+\::$x\=\<Parent>(y)$.  \cmt{Edge~$xy$ is the heaviest on $F[uv]$.}
+\::If $\<Weight>(y) > w$: \cmt{We have to exchange~$xy$ with~$uv$.}
+\:::$\<Cut>(y)$, $\<Evert>(v)$, $\<Link>(v,y,w)$.
+\:::Return ``$uv$~added, $xy$~removed''.
+\::Otherwise return ``no changes''.
+\algout The list of changes in~$F$.
+\endalgo
+
+\thmn{Incremental MSF}
+When only edge insertions are allowed, the dynamic MSF can be maintained in time $\O(\log n)$
+amortized per operation.
+
+\proof
+Every \<Insert> performs $\O(1)$ operations on the Link-Cut forest, which take
+$\O(\log n)$ each by Theorem \ref{sletar}.
+\qed
  
+\rem
+We can easily extend the semidynamic MSF algorithm to allow an~operation commonly called
+\<Backtrack> --- removal of the most recently inserted edge. It is sufficient to keep the
+history of all MSF changes in a~stack and reverse the most recent change upon backtrack.
+
+What are the obstacles to making the structure fully dynamic?
+Deletion of edges that do not belong to the MSF is trivial (we do not
+need to change anything) and so is deletion of bridges (we just remove the bridge
+from the Link-Cut tree, knowing that there is no edge to replace it). The hard part
+is the search for replacement edges after an~edge of the MSF is deleted.
+
+This very problem also has to be solved by algorithms for fully dynamic connectivity,
+we will take a~look on them first.
  
  %--------------------------------------------------------------------------------
  
-\section{ET trees}
+\section{Eulerian Tour trees}
+
+An~important stop on the road to fully dynamic algorithms has the name \df{Eulerian Tour trees} or
+simply \df{ET-trees}. It is a~representation of forests introduced by Henzinger and King
+\cite{henzinger:randdyn} in their randomized dynamic algorithms. It is similar to the one by Sleator
+and Tarjan, but it is much simpler and instead of path operations it offers efficient operations on
+subtrees. It is also possible to attach auxiliary data to vertices and edges of the original tree.
+
+\defn\id{eulseq}%
+The \df{Eulerian Tour sequence} $\Eul(T)$ of a~rooted tree~$T$ is the sequence of vertices of~$T$ as visited
+by the depth-first traversal of~$T$. More precisely, it is generated by the following algorithm $\<ET>(v)$
+when it is invoked on the root of the tree:
+\algo
+\:Record~$v$ in the sequence.
+\:For each son~$w$ of~$v$:
+\::Call $\<ET>(w)$ recursively.
+\::Record~$w$.
+\endalgo
+\>One of the occurrences of each vertex is defined as its \df{active occurrence} and it will
+be used to store auxiliary data associated with the vertex.
+
+\obs
+The ET-sequence contains a~vertex of degree~$d$ exactly $d$~times except for the root which
+occurs $d+1$ times. The whole sequence therefore contains $2n-1$ elements. It indeed describes the
+order vertices on an~Eulerian tour in the tree with all edges doubled. Let us observe what happens
+to the ET-sequence when we modify the tree.
+
+When we \em{delete} an~edge $uv$ from the tree~$T$ (let $u$~be the parent of~$v$), the sequence
+$\Eul(T) = AuvBvuC$ (with no~$u$ nor~$v$ in~$B$) splits to two ET-sequences $AuC$ and $vBv$.
+If there was only a~single occurrence of~$v$, it corresponded to a~leaf and thus the second
+sequence should consist of $v$~alone.
+
+\em{Changing the root} of the tree~$T$ from~$v$ to~$w$ changes $\Eul(T)$ from $vAwBwCv$ to $wBwCvAw$.
+If $w$~was a~leaf, the sequence changes from $vAwCv$ to $wCvAw$. If $vw$ was the only edge of~$T$,
+the sequence $vw$ becomes $wv$. Note that this works regardless of the presence of~$w$ inside~$B$.
+
+\em{Joining} the roots of two trees by a~new edge makes their ET-sequences $vAv$ and~$wBw$
+combine to $vAvwBwv$. Again, we have to handle the cases when $v$ or~$w$ has degree~1 separately:
+$v$~and~$wBw$ combine to $vwBwv$, and $v$~with~$w$ makes $vwv$.
+
+If any of the occurrences that we have removed from the sequence was active, there is always
+a~new occurrence of the same vertex that can stand in its place and inherit the auxiliary data.
+
+The ET-trees will represent the ET-sequences by $(a,b)$-trees with the parameter~$a$ set upon
+initialization of the structure and with $b=2a$. We know from the standard theorems of $(a,b)$-trees
+(see for example \cite{clrs}) that the depth of a~tree with $n$~leaves is always $\O(\log_a n)$
+and that all basic operations including insertion, deletion, search, splitting and joining the trees
+run in time $\O(a\log_a n)$ in the worst case.
+
+We will use the ET-trees to maintain a~spanning forest of the current graph. The auxiliary data of
+each vertex will hold a~list of edges incident with the given vertex, which do not lie in the
+forest. Such edges are usually called the \df{non-tree edges.}
+
+\defn
+\df{Eulerian Tour trees} are a~data structure that represents a~forest of trees and a~set of non-tree
+edges associated with the vertices of the forest. To avoid confusion, we will distinguish between
+\df{original} vertices and edges (of the original trees) and the vertices and edges of the
+data structure. The structure consists of:
+\itemize\ibull
+\:A~collection of $(a,b)$-trees with some fixed parameters $a$ and~$b$.
+       Each such tree corresponds to one of the original trees~$T$. Its
+       leaves (in the usual tree order) correspond to the elements
+       of the ET-sequence $\Eul(T)$. Each two consecutive leaves $u$ and~$v$ are separated
+       by a~unique key stored in an~internal vertex of the $(a,b)$-tree. This key is used to represent
+       the original edge~$uv$. Each original edge is therefore kept in both its orientations.
+\:A~mapping $\<act>(v)$ that maps each original vertex to the leaf containing its active occurrence.
+\:A~mapping $\<edge>(e)$ that maps an~original edge~$e$ to one of the internal keys representing it.
+\:A~mapping $\<twin>(k)$ that maps an~internal key~$k$ to the other internal key of the same
+       original edge.
+\:A~list of non-tree edges placed in each leaf. The lists are allowed to be non-empty only
+       in the leaves that represent active occurrences of original vertices.
+\:Boolean \df{markers} in the internal vertices that signal presence of a~non-tree
+       edge anywhere in the subtree rooted at that internal vertex.
+\endlist
+\>The structure supports the following operations on the original trees:
+\itemize\ibull
+\:\<Create> --- Create a~single-vertex tree.
+\:$\<Link>(u,v)$ --- Join two different trees by an~edge~$uv$ and return a~unique identifier
+       of this edge.
+\:$\<Cut>(e)$ --- Split a~tree by removing the edge~$e$ given by its identifier.
+\:$\<Root>(v)$ --- Return the root of the ET-tree containing the vertex~$v$. This can be used
+       to test whether two vertices lie in the same tree. However, the root is not guaranteed
+       to stay the same when the tree is modified by a~\<Link> or \<Cut>.
+\:$\<InsertNontree>(v,e)$ --- Add a~non-tree edge~$e$ to the list at~$v$ and return a~unique
+       identifier of this edge.
+\:$\<DeleteNontree>(e)$ --- Delete a~non-tree edge~$e$ given by its identifier.
+\:$\<ScanNontree>(v)$ --- Return non-tree edges stored in the same tree as~$v$.
+\endlist
+
+\impl
+We will implement the operations on the ET-trees by translating the intended changes of the
+ET-sequences to operations on the $(a,b)$-trees. The role of identifiers of the original vertices
+and edges will be of course played by pointers to the respective leaves and internal keys of
+the $(a,b)$-trees.
+
+\<Cut> of an~edge splits the $(a,b)$-tree at both internal keys representing the given edge
+and joins them back in the different order.
+
+\<Link> of two trees can be accomplished by making both vertices the roots of their trees first
+and joining the roots by an~edge afterwards. Re-rooting again involves splits and joins of $(a,b)$-trees.
+As we can split at any occurrence of the new root vertex, we will use the active occurrence
+which we remember. Joining of the roots translated to joining of the $(a,b)$-trees.
+
+\<Root> just follows parent pointers in the $(a,b)$-tree and it walks the path from the leaf
+to the root.
+
+\<InsertNontree> finds the leaf $\<act>(v)$ containing the list of $v$'s non-tree edges
+and inserts the new edge there. The returned identifier will consist from the pointer to
+the edge and the vertex in whose list it is stored. Then we have to recalculate the markers
+on the path from $\<act>(v)$ to the root. \<DeleteNontree> is analogous. Whenever any
+other operation changes a~vertex of the tree, it will also update its marker and, if necessary,
+the markers on the path to the root.
+
+\<ScanNontree> traverses the tree recursively from the root, but it does not enter the
+subtrees whose roots are not marked.
+
+Analysis of the time complexity is now straightforward:
+
+\thmn{Eulerian Tour trees, Henzinger and Rauch \cite{henzinger:randdyn}}
+The ET-trees perform the operations \<Link> and \<Cut> in time $\O(a\log_a n)$, \<Create>
+in $\O(1)$, \<Root>, \<InsertNontree>, and \<DeleteNontree> in $\O(\log_a n)$, and
+\<ScanNontree> in $\O(a\log_a n)$ per edge reported. Here $n$~is the number of vertices
+in the original forest and $a\ge 2$ is an~arbitrary constant.
+
+\proof
+We set $b=2a$. Our implementation performs $\O(1)$ operations on the $(a,b)$-trees
+per operation on the ET-tree, plus $\O(1)$ other operations. We apply the standard theorems
+on the complexity of $(a,b)$-trees \cite{clrs}.
+\qed
+
+\examplen{Connectivity acceleration}
+In most cases, the ET-trees are used with $a$~constant, but sometimes choosing~$a$ as a~function
+of~$n$ can also have its beauty. Suppose that there is a~data structure which maintains an~arbitrary
+spanning forest of a~dynamic graph. Suppose also that the structure works in time $\O(\log^k n)$
+per operation and that it reports $\O(1)$ changes in the spanning forest for every change
+in the graph. If we keep the spanning forest in ET-trees with $a=\log n$, the updates of the
+data structure cost an~extra $\O(\log^2 n / \log\log n)$, but queries accelerate to $\O(\log
+n/\log\log n)$.
  
  
  \endpart