\input macros.tex
\fi
-\chapter{Approaching Optimality}
+\chapter{Approaching Optimality}\id{optchap}%
\section{Soft heaps}
will of course be enormous, but as we already promised, we will need the optimal
trees only for very small subgraphs.
-\lemman{Construction of optimal decision trees}
-Optimal MST decision trees for all graphs on at most~$k$ vertices can be
-constructed on the Pointer machine in time $\O(2^{2^{4k^2}})$.
+\lemman{Construction of optimal decision trees}\id{odtconst}%
+An~optimal MST decision tree for a~graph~$G$ on~$n$ vertices can be constructed on
+the Pointer machine in time $\O(2^{2^{4n^2}})$.
\proof
-There are $2^{k\choose 2} \le 2^{k^2}$ undirected graphs on~$k$ vertices. Any
-graph on less than~$k$ vertices can be extended to $k$~vertices by adding isolated
-vertices, which obviously do not affect the optimal decision tree.
-
-For every graph~$G$, we will try all possible decision trees of depth at most $2k^2$
-(we know from the previous lemma that the optimal tree is shallower). We can obtain
+We will try all possible decision trees of depth at most $2n^2$
+(we know from the previous lemma that the desired optimal tree is shallower). We can obtain
any such tree by taking the complete binary tree of exactly this depth
-and labeling its $2\cdot 2^{2k^2}-1$ vertices with comparisons and spanning trees. Those labeled
+and labeling its $2\cdot 2^{2n^2}-1$ vertices with comparisons and spanning trees. Those labeled
with comparisons become internal vertices of the decision tree, the others
become leaves and the parts of the tree below them are cut. There are less
-than $k^4$ possible comparisons and less than $2^{k^2}$ spanning trees of~$G$,
+than $n^4$ possible comparisons and less than $2^{n^2}$ spanning trees of~$G$,
so the number of candidate decision trees is bounded by
-$(k^4+2^{k^2})^{2^{2k^2+1}} \le 2^{(k^2+1)\cdot 2^{2k^2+1}} \le 2^{2^{2k^2+2}} \le 2^{2^{3k^2}}$.
+$(n^4+2^{n^2})^{2^{2n^2+1}} \le 2^{(n^2+1)\cdot 2^{2n^2+1}} \le 2^{2^{3n^2}}$
+(here we use that $n^4 \le 2^{n^2}$ and $n^2+1 \le 2^{n^2-1}$ for every $n\ge 2$).
We will enumerate the trees in an~arbitrary order, test each for correctness and
find the shallowest tree among those correct. Testing can be accomplished by running
through all possible permutations of edges, each time calculating the MSF using any
of the known algorithms and comparing it with the result given by the decision tree.
-The number of permutations does not exceed $(k^2)! \le (k^2)^{k^2} \le k^{2k^2} \le 2^{k^3}$
-and each permutation can be checked in time $\O(\poly(k))$.
+The number of permutations does not exceed $(n^2)! \le (n^2)^{n^2} \le n^{2n^2} \le 2^{n^3}$
+and each permutation can be checked in time $\O(\poly(n))$.
-On the Pointer machine, graphs, trees and permutations can be certainly enumerated in time
-$\O(\poly(k))$ per object. The time complexity of the whole algorithm is therefore
-$\O(2^{k^2} \cdot 2^{2^{3k^2}} \cdot 2^{k^3} \cdot \poly(k)) = \O(2^{2^{4k^2}})$.
+On the Pointer machine, trees and permutations can certainly be enumerated in time
+$\O(\poly(n))$ per object. The time complexity of the whole algorithm is therefore
+$\O(2^{2^{3n^2}} \cdot 2^{n^3} \cdot \poly(n)) = \O(2^{2^{4n^2}})$.
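+Indeed, combining the exponents yields, for all sufficiently large~$n$,
+$$2^{2^{3n^2}} \cdot 2^{n^3} \cdot \poly(n) = 2^{2^{3n^2} + n^3 + \O(\log n)} \le 2^{2^{4n^2}}.$$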
\qed
\paran{Basic properties of decision trees}%
%--------------------------------------------------------------------------------
-\section{An optimal algorithm}
+\section{An optimal algorithm}\id{optalgsect}%
Once we have developed the soft heaps, partitioning and MST decision trees,
it is now simple to state Pettie's and Ramachandran's MST algorithm \cite{pettie:optimal}
connected components of the union of the $C_i$'s have at least~$t$ vertices
(unless there is just a~single component).
-\FIXME{Decision trees and sorting}
+To apply the decision trees, we will use the framework of topological computations developed
+in Section \ref{bucketsort}. We pad all subgraphs in~$\C$ with isolated vertices, so that they
+have exactly~$t$ vertices. We define the computation so that it labels the graph with a~pointer to
+its optimal decision tree. Then we apply Theorem \ref{topothm} combined with the
+brute-force construction of optimal decision trees from Lemma \ref{odtconst}. Together they guarantee
+that we can assign the decision trees to the subgraphs in time:
+$$\O(\Vert\C\Vert + t^{t(2t+1)} \cdot (2^{2^{4t^2}} + t^2)) = \O(m).$$
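+The second term is indeed negligible: as $t$~is of order $\log^{(3)} n$, a~rough estimate valid for
+all sufficiently large~$n$ gives
+$$t^{t(2t+1)} \cdot (2^{2^{4t^2}} + t^2) \le 2^{2^{5t^2}} \le 2^{2^{\log^{(2)} n}} = 2^{\log n} = n.$$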
+Execution of the decision tree on each subgraph~$C_i$ then takes $\O(D(C_i))$ steps.
The contracted graph~$G_A$ has at most $n/t = \O(n / \log^{(3)}n)$ vertices and asymptotically
the same number of edges as~$G$, so according to Corollary \ref{ijdens} the Iterated Jarn\'\i{}k's
NP-completeness, we could safely assume that all the models are equivalent,
possibly up to polynomial slowdown which is negligible. In our case, the
differences between good and not-so-good algorithms are on a~much smaller
-scale. In this chapter, we will replace the usual ``yardsticks'' by a~micrometer,
+scale. In this chapter, we will replace the usual ``tape measure'' by a~micrometer,
state our computation models carefully and develop a repertoire of basic
data structures taking advantage of the fine details of the models.
the isomorphism equivalence classes of subtrees of that particular depth. We will assign some
sort of identifiers to the classes; at most~$n+1$ of them are needed as there are
$n+1$~subtrees in the tree (including the empty subtree). As the PM does not
-have numbers as a~first-class type, we just create a~list of $n+1$~distinct items
-and use pointers to these items as identifiers. Isomorphism of the whole trees
-can be finally decided by comparing the identifiers assigned to their roots.
+have numbers as a~first-class type, we just create a~``\df{yardstick}'' ---
+a~list of $n+1$~distinct items and we use pointers to these items as identifiers.
+Isomorphism of the whole trees can be finally decided by comparing the
+identifiers assigned to their roots.
Suppose that classes of depths $1,\ldots,d-1$ are already computed and we want
to identify those of depth~$d$. We will denote their number by~$n_d$. We take
that have just ended. (They are obviously not equivalent to any other sequences.)
The second sort is linear in the sum of the lengths of the sequences, which is
$n_{d+1}$ for depth~$d$. We can therefore decide isomorphism of the whole trees
-in time $\O(\sum_d n_d + n_{d+1}) = \O(n)$.
+in time $\O(\sum_d (n_d + n_{d+1})) = \O(n)$.
+
+The unification of sequences by bucket sorting will be useful in many
+other situations, so we will state it as a~separate lemma:
+
+\lemman{Unification of sequences}\id{sequnifl}%
+Partitioning of a~collection of sequences $S_1,\ldots,S_n$, whose elements are
+arbitrary pointers and symbols from a~finite alphabet, into equality classes can
+be performed on the Pointer machine in time $\O(n + \sum_i \vert S_i \vert)$.
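+
+For instance, if $p$ and~$q$ are two distinct pointers and $a$~is a~symbol of the alphabet
+(an arbitrary choice for the sake of illustration), then the sequences
+$$S_1 = \langle p, a, q\rangle, \qquad S_2 = \langle p, a\rangle, \qquad S_3 = \langle p, a, q\rangle$$
+are partitioned into the two classes $\{S_1, S_3\}$ and $\{S_2\}$.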
\rem
The first linear-time algorithm that partitions all subtrees to isomorphism equivalence
because it replaces the need for auxiliary data structures by more elaborate bucket
sorting.
-\FIXME{Buchsbaum's trick}
+\paran{Topological graph computations}%
+Many graph algorithms are based on the idea of the so-called \df{micro/macro decomposition:}
+We decompose a~graph into subgraphs on roughly~$k$ vertices and solve the problem
+separately inside these ``micrographs'' and in the ``macrograph'' obtained by
+contraction of the micrographs. If $k$~is small enough, many of the micrographs
+are isomorphic, so we can compute the result only once for each isomorphism class
+and recycle it for all micrographs in that class. On the other hand, the macrograph
+is roughly $k$~times smaller than the original graph, so we can use a~less efficient
+algorithm and it will still run in linear time with respect to the size of the original
+graph.
+
+This type of decomposition was traditionally used for trees, especially in the
+algorithms for the Lowest Common Ancestor problem (cf.~Section \ref{verifysect}
+and the survey paper \cite{alstrup:nca}) and for online maintenance of marked ancestors
+(cf.~Alstrup et al.~\cite{alstrup:marked}). Let us take a~glimpse at what happens when
+we set~$k$ to $1/4\cdot\log n$. There are at most $2^{2k} = \sqrt n$ non-isomorphic subtrees,
+because each isomorphism class is uniquely determined by the sequence of $2k$~up/down steps
+performed by depth-first search. Suppose that we are able to decompose the input and identify
+the equivalence classes of microtrees in linear time, then solve the problem in time $\O(\poly(k))$ for
+each equivalence class of microtrees, and finally in time $\O(n'\log n')$ for the macrotree
+on~$n' = \O(n/\log n)$ vertices. When we put these pieces
+together, we get an~algorithm with time complexity $\O(n + \sqrt{n}\cdot\poly(\log n) + n/\log n\cdot\log(n/\log n)) = \O(n)$
+for the whole problem.
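+To verify the final equality, observe that both non-trivial terms are at most linear in~$n$:
+$$\sqrt n\cdot\poly(\log n) = \O(n) \qquad\hbox{and}\qquad (n/\log n)\cdot\log(n/\log n) \le (n/\log n)\cdot\log n = n.$$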
+
+Decompositions are usually implemented on the RAM, because subgraphs can be easily
+encoded in numbers, which can then be used to index arrays containing precomputed
+results. As the previous algorithm for subtree isomorphism shows, indexing is not
+required for the identification of equivalent microtrees and it can be replaced by bucket
+sorting on the Pointer machine. Buchsbaum et al.~\cite{buchsbaum:verify} have extended
+this technique to general graphs in the form of topological graph computations.
+
+\defn
+A~\df{graph computation} takes a~\df{labeled undirected graph} as its input. The labels of both
+vertices and edges can be arbitrary symbols drawn from a~finite alphabet. The output
+of the computation is another labeling of the same graph. This time, the vertices and
+edges can be labeled not only with symbols of the alphabet, but also with pointers to the vertices
+and edges of the graph given as the input, and possibly also with pointers to outside objects.
+A~graph computation is called \df{topological} iff it produces isomorphic
+outputs for isomorphic inputs. The isomorphism must of course preserve not only
+the structure of the graph, but also the labels in the obvious way.
+
+\obs
+The topological graph computations cover a~great variety of graph problems, ranging
+from searching for spanning trees or Eulerian tours to the Traveling Salesman Problem.
+The MST problem itself, however, does not lie in this class, because we do not have any means
+of representing the edge weights, unless of course there is only a~fixed number
+of possible weights.
+
+As in the case of tree decompositions, we would like to identify the equivalent subgraphs
+and process only a~single instance from each equivalence class. The obstacle is that
+graph isomorphism is known to be computationally hard (it is one of the few
+problems that are neither known to lie in~$\rm P$ nor to be $\rm NP$-complete,
+see Arvind and Kurur \cite{arvind:isomorph} for recent results on its complexity).
+We will therefore manage with a~weaker form of equivalence, based on some sort
+of graph encodings:
+
+\defn
+A~\df{canonical encoding} of a~given labeled graph represented by adjacency lists
+can be obtained by running the depth-first search on the graph. When we enter
+a~vertex, we assign an~identifier to it (again using a~yardstick to represent numbers)
+and we append the label of this vertex to the encoding. Then we scan all edges
+going from this vertex to the already visited vertices (including the tree edge through
+which we have entered the vertex) and append the identifiers of their destinations, accompanied
+by the edges' labels. Finally we append a~special terminator to mark the boundary
+between the code of this vertex and its successor.
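+
+\para
+As a~toy example, consider a~triangle whose vertices all carry a~label~$a$ and whose edges carry
+a~label~$b$ (the labels are chosen arbitrarily for the sake of illustration). Suppose that the
+depth-first search visits the vertices in the order $v_1$, $v_2$, $v_3$, assigns them the
+identifiers 1,~2 and~3 (written here as plain numbers instead of yardstick pointers), and that
+the adjacency list of~$v_3$ mentions $v_2$ before~$v_1$. Then $v_1$ contributes its label and
+the terminator (written as~$\#$), $v_2$ additionally records the edge leading back to~$v_1$, and
+$v_3$ records both edges to the already visited vertices, so the encoding reads
+$$a\,\#\quad a\,1\,b\,\#\quad a\,2\,b\,1\,b\,\#,$$
+with $2\cdot 3$ items contributed by the vertices and $2\cdot 3$ items by the edges.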
+
+\obs
+The canonical encoding is well defined in the sense that two non-isomorphic graphs
+receive different encodings. Obviously, encodings of isomorphic graphs can differ,
+depending on the order of vertices and also of the adjacency lists. A~graph
+on~$n$ vertices with $m$~edges is assigned an~encoding of length at most $2n+2m$
+(for each vertex, we record its label and a~single terminator; each edge contributes
+the identifier of one of its endpoints and its label). This encoding can be constructed in linear time.
+Let us use the encodings for unification of graphs:
+
+\lemman{Unification of graphs}\id{uniflemma}%
+A~collection~$\C$ of labeled graphs can be partitioned into classes which
+share the same canonical encoding in time $\O(\Vert\C\Vert)$, where $\Vert\C\Vert$
+is the total size of the collection, i.e., $\sum_{G\in\C} (n(G) + m(G))$.
+
+\para
+When we have to perform a~topological computation on a~collection of graphs
+on $k$~vertices, we first precompute its result for all possible canonical
+encodings (the \df{generic graphs}) and then we use unification to match
+the actual graphs to the generic graphs. This gives us the following theorem:
+
+\thmn{Batched topological computations, Buchsbaum et al.~\cite{buchsbaum:verify}}\id{topothm}%
+Suppose we have a~topological graph computation~$\cal T$ that can be performed in time
+$T(k)$ for graphs on $k$~vertices. Then we can run~$\cal T$ on a~collection~$\C$
+of graphs on~$k$ vertices in time $\O(\Vert\C\Vert + (k+s)^{k(2k+1)}\cdot (T(k)+k^2))$,
+where~$s$ is the number of symbols used as vertex/edge labels.
+
+\proof
+A~graph on~$k$ vertices has less than~$k^2/2$ edges, so the canonical encodings of
+all such graphs are shorter than $2k + 2\cdot k^2/2 = k(k+2) \le k(2k+1)$. Each element of the encoding
+is either a~vertex identifier or a~symbol, so it can attain at most $k+s$ possible values.
+We can therefore enumerate all possible encodings, convert them to the collection $\cal G$
+of all generic graphs and run the computation on them in time $\O(\vert{\cal G}\vert \cdot T(k))
+= \O((k+s)^{k(2k+1)}\cdot T(k))$.
+
+We then use the Unification lemma (\ref{uniflemma}) on the union of the collections
+$\C$ and~$\cal G$ to match the generic graphs with the equivalent graphs in~$\C$
+in time $\O(\Vert\C\Vert + \Vert{\cal G}\Vert) = \O(\Vert\C\Vert + \vert{\cal G}\vert \cdot k^2)$.
+Finally we create a~copy of the generic result for each of the actual graphs.
+If the computation uses pointers to input vertices in its output, this involves
+redirecting them to the actual input vertices, but we can do that by associating
+the output vertices that refer to an~input vertex with the corresponding places
+in the encoding of the input graph. The whole output can be generated in time
+$\O(\Vert\C\Vert + \Vert{\cal G}\Vert)$.
+\qed
+
+\rem
+The topological computations and the Unification lemma will play important
+roles in Sections \ref{verifysect} and \ref{optalgsect}.
%--------------------------------------------------------------------------------