From: Martin Mares
Date: Mon, 7 Apr 2008 14:38:20 +0000 (+0200)
Subject: Cleaned up topological computations and unification.
X-Git-Tag: printed~122
X-Git-Url: http://mj.ucw.cz/gitweb/?a=commitdiff_plain;h=4277fa92b24cbc04758c30d894249bce24790529;p=saga.git

Cleaned up topological computations and unification.
---

diff --git a/PLAN b/PLAN
index 74de098..342bf81 100644
--- a/PLAN
+++ b/PLAN
@@ -98,6 +98,7 @@ Notation:
 - change the notation for contractions -- use double slash instead of the dot?
 - introduce \widehat\O early
 - unify { x ; ... }, { x | ...} and { x : ... }
+- capitalize Pointer Machine
 
 Varia:
 
diff --git a/ram.tex b/ram.tex
index af55492..6691434 100644
--- a/ram.tex
+++ b/ram.tex
@@ -304,6 +304,13 @@ Each such structure will be endowed with a~pointer to the head of the list of it
 the corresponding bucket. Inserting an~edge to a~bucket can be then done in constant
 time and scanning the contents of all~$n$ buckets takes $\O(n+m)$ time.
 
+Finally, we must not forget that while it was easy to \df{normalize} the pairs on the RAM
+by putting the smaller identifier first, this fails on the PM because we can directly
+compare the identifiers only for equality. We can work around this again by bucket-sorting:
+we sort the multiset $\{ (x,i) \mid \hbox{$x$~occurs in the $i$-th pair} \}$ on~$x$.
+Then we reset all pairs and re-insert the values in their increasing order.
+This also takes $\O(n+m)$ time.
+
 \paran{Tree isomorphism}%
 Another nice example of pointer-based radix sorting is a~Pointer machine algorithm
 for deciding whether two rooted trees are isomorphic. Let us assume for a~moment that
@@ -329,11 +336,8 @@ reordering of elements. We therefore sort the codes inside each tuple and then
 sort the tuples, which brings the equivalent tuples together.
 
 The first sort (inside the tuples) would be easy on the RAM, but on the PM we
-have no means of comparing two identifiers for anything else than equality.
-To work around this, we sort the set $\{ (x,i,j) \mid \hbox{$x$ is the $i$-th
-element of the $j$-th tuple} \}$ on~$x$, reset all tuples and insert the elements
-back in the increasing order of~$x$, ignoring the original positions. The second
-sort is a~straightforward $k$-pass bucket sort.
+have to use the normalization trick mentioned above. The second sort is
+a~straightforward $k$-pass bucket sort.
 
 If we are not careful, a~single sorting pass takes $\O(n_d + n)$ time, because
 while we have only $n_d$~items to sort, we have to scan all $n$~buckets. This can
@@ -356,7 +360,7 @@ in time $\O(\sum_d (n_d + n_{d+1})) = \O(n)$.
 The unification of sequences by bucket sorting will be useful in many other situations,
 so we will state it as a~separate lemma:
 
-\lemman{Unification of sequences}\id{uniflemma}%
+\lemman{Sequence unification}\id{suniflemma}%
 Partitioning of a~collection of sequences $S_1,\ldots,S_n$, whose elements are
 arbitrary pointers and symbols from a~finite alphabet, to equality classes can
 be performed on the Pointer machine in time $\O(n + \sum_i \vert S_i \vert)$.
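The pair-normalization trick added above is perhaps easiest to see in running code. The sketch below (Python; the names Vertex and normalize_pairs are illustrative and not part of the thesis sources) simulates the two bucket-sorting passes on a Pointer machine: vertex identifiers are opaque objects compared only for identity, each carries a bucket, and after the two passes equal unordered pairs have identical representations, in O(n+m) time overall. The list of vertices stands for the fixed order in which the machine scans its structures.

# A sketch of the pair-normalization trick, simulating the Pointer machine:
# vertex identifiers are opaque objects with no order, only identity, but we
# may attach auxiliary fields (buckets) to them.  All names are illustrative.

class Vertex:
    """An opaque identifier; it can be compared only for identity."""
    def __init__(self):
        self.bucket = []      # indices of the pairs in which this vertex occurs

def normalize_pairs(vertices, pairs):
    """Reorder each pair so that equal unordered pairs become identical.

    vertices -- all Vertex objects in some fixed global order
    pairs    -- list of two-element lists [u, v] of Vertex objects

    Runs in O(n + m): one pass over the pairs, one pass over the vertices.
    """
    # First pass: bucket the multiset {(x, i) | x occurs in the i-th pair} on x.
    for i, (u, v) in enumerate(pairs):
        u.bucket.append(i)
        v.bucket.append(i)

    # Reset all pairs.
    for p in pairs:
        p.clear()

    # Second pass: scan the vertices in their global order and re-insert them,
    # so that inside every pair the earlier vertex ends up first.
    for x in vertices:
        for i in x.bucket:
            pairs[i].append(x)
        x.bucket.clear()

if __name__ == "__main__":
    a, b, c = Vertex(), Vertex(), Vertex()
    edges = [[b, a], [a, b], [c, b]]
    normalize_pairs([a, b, c], edges)
    assert edges[0] == edges[1] == [a, b]     # the two copies of {a,b} now match
    assert edges[2] == [b, c]
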
@@ -436,18 +440,27 @@ by the edges' labels. Finally we append a~special terminator to mark the boundar
 between the code of this vertex and its successor.
 
 \obs
-The canonical encoding is well defined in the sense that non-isomorphic graphs always
+The canonical encoding is well defined in the sense that non-iso\-morphic graphs always
 receive different encodings. Obviously, encodings of isomorphic graphs can differ,
 depending on the order of vertices and also of the adjacency lists.
 A~graph on~$n$ vertices with $m$~edges is assigned an~encoding of length at most $2n+2m$ ---
 for each vertex, we record its label and a~single terminator; edges contribute by
 identifiers and labels. These encodings can be constructed in linear time and
-we will use them for our unification of graphs:
+within the same time bound we can also create a~graph corresponding to any encoding.
+We will use the encodings for our unification of graphs:
+
+\defn
+For a~collection~$\C$ of graphs, we define $\vert\C\vert$ as the number of graphs in
+the collection and $\Vert\C\Vert$ as their total size, i.e., $\Vert\C\Vert = \sum_{G\in\C} (n(G) + m(G))$.
 
-\lemman{Unification of graphs}\id{uniflemma}%
-A~collection~$\C$ of labeled graphs can be partitioned into classes which
-share the same canonical encoding in time $\O(\Vert\C\Vert)$, where $\Vert\C\Vert$
-is the total size of the collection, i.e., $\sum_{G\in\C} n(G) + m(G)$.
+\lemman{Graph unification}\id{guniflemma}%
+A~collection~$\C$ of labeled graphs can be partitioned into classes which share the same
+canonical encoding in time $\O(\Vert\C\Vert)$ on the Pointer machine.
+
+\proof
+Construct canonical encodings of all the graphs and then apply the Sequence unification lemma
+(\ref{suniflemma}) to them.
+\qed
 
 \para
 When we want to perform a~topological computation on a~collection~$\C$ of graphs
@@ -467,23 +480,24 @@ A~graph on~$k$ vertices has less than~$k^2/2$ edges, so the canonical encodings
 all such graphs are shorter than $2k + 2k^2/2 = k(k+2)$. Each element of the encoding
 is either a~vertex identifier, or a~symbol, or a~separator, so it can attain at most $k+s$
 possible values for some fixed~$s$.
-We can therefore enumerate all possible encodings, convert them to a~collection $\cal G$
-of all generic graphs and run the computation on all of them in time $\O(\vert{\cal G}\vert \cdot T(k))
-= \O((k+s)^{k(k+2)}\cdot T(k))$.
+We can therefore enumerate all possible encodings and convert them to a~collection $\cal G$
+of all generic graphs such that $\vert{\cal G}\vert \le (k+s)^{k(k+2)}$ and $\Vert{\cal G}\Vert
+\le \vert{\cal G}\vert \cdot k^2$.
 
-Then we use the Unification lemma (\ref{uniflemma}) on the union of the collections
+We run the computation on all generic graphs in time $\O(\vert{\cal G}\vert \cdot T(k))$
+and then we use the Graph unification lemma (\ref{guniflemma}) on the union of the collections
 $\C$ and~$\cal G$ to match the generic graphs with the equivalent actual graphs in~$\C$
-in time $\O(\Vert\C\Vert + \Vert{\cal G}\Vert) = \O(\Vert\C\Vert + \vert{\cal G}\vert \cdot k^2)$.
+in time $\O(\Vert\C\Vert + \Vert{\cal G}\Vert)$.
 Finally we create a~copy of the generic result for each of the actual graphs.
 If the computation uses pointers to the input vertices in its output, we have to
-redirect them to the actual input vertices, but we can do that by associating
+redirect them to the actual input vertices, which we can do by associating
 the output vertices that refer to an~input vertex with the corresponding places
 in the encoding of the input graph. This way, the whole output can be generated in time
 $\O(\Vert\C\Vert + \Vert{\cal G}\Vert)$.
 \qed
 
 \rem
-The topological computations and the Unification lemma will play important
+The topological computations and the Graph unification lemma will play important
 roles in Sections \ref{verifysect} and \ref{optalgsect}.
 
 %--------------------------------------------------------------------------------
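For the Graph unification lemma, a similarly minimal sketch may help (Python; the names END, canonical_encoding and unify_graphs are illustrative only). It assumes each graph already comes with a fixed numbering of its vertices and ordered adjacency lists, and it groups equal encodings with a dictionary purely for brevity; the lemma itself obtains the same partitioning on the Pointer machine by handing the encodings to the Sequence unification lemma instead of hashing.

# A sketch of graph unification via canonical encodings.  Each graph is assumed
# to come with a fixed vertex numbering 0..n-1 and ordered adjacency lists; the
# encoding records, per vertex, its label, the incident edges (neighbour index
# and edge label) and a terminator.  Equal encodings are grouped with a dict
# for brevity; the lemma uses bucket sorting on the Pointer machine instead.

END = "#"   # terminator marking the boundary between the codes of two vertices

def canonical_encoding(graph):
    """graph = list of (vertex_label, [(neighbour_index, edge_label), ...])."""
    code = []
    for label, adjacency in graph:
        code.append(label)
        for neighbour, edge_label in adjacency:
            code.append(neighbour)
            code.append(edge_label)
        code.append(END)
    return tuple(code)

def unify_graphs(collection):
    """Partition the collection into classes sharing the same canonical encoding."""
    classes = {}
    for graph in collection:
        classes.setdefault(canonical_encoding(graph), []).append(graph)
    return list(classes.values())

if __name__ == "__main__":
    # Two copies of a single labelled edge a-b and one edgeless graph on two vertices.
    g1 = [("a", [(1, "e")]), ("b", [(0, "e")])]
    g2 = [("a", [(1, "e")]), ("b", [(0, "e")])]
    g3 = [("a", []), ("b", [])]
    print([len(cls) for cls in unify_graphs([g1, g2, g3])])   # prints [2, 1]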