Tree isomorphism.

author Martin Mares <mj@ucw.cz>

Sun, 6 Apr 2008 10:52:43 +0000 (12:52 +0200)

committer Martin Mares <mj@ucw.cz>

Sun, 6 Apr 2008 10:52:43 +0000 (12:52 +0200)
diff --git a/biblio.bib b/biblio.bib
index b8ae5526015b132489728abce09cec1b2ba6473e..3859f497b51063f9f16c6bb195673a2182ddd7c9 100644
--- a/biblio.bib
+++ b/biblio.bib
@@ -119,7 +119,7 @@
      volume = "III",
      year = "1926",
      pages = "37--58",
-    note = "Czech with German summary"
+    note = "In Czech with German summary"
  }
  
  @article { boruvka:networks,
@@ -129,7 +129,7 @@
      volume = "15",
      year = "1926",
      pages = "153--154",
-    note = "Czech"
+    note = "In Czech"
  }
  
  @article { jarnik:ojistem,
@@ -139,7 +139,7 @@
      volume = "VI",
      year = "1930",
      pages = "57--63",
-    note = "Czech"
+    note = "In Czech"
  }
  
  @book { tarjan:dsna,
@@ -381,7 +381,7 @@
    volume={206},
    pages={310},
    year={1938},
-  note={French}
+  note={In French}
  }
  
  @incollection { sollin:mst,
@@ -391,7 +391,7 @@
    editor={Berge, C. and Ghouilla-Houri, A.},
    publisher={Wiley, New York},
    year={1965},
-  note={French}
+  note={In French}
  }
  
  @article{ boyer:cutting,
@@ -711,7 +711,7 @@
    pages={265--268},
    year={1967},
    publisher={Springer},
-  note={German}
+  note={In German}
  }
  
  @inproceedings{ gustedt:parallel,
@@ -1223,3 +1223,25 @@
   publisher = {ACM},
   address = {New York, NY, USA},
  }
+
+@article{ dinitz:treeiso,
+  title={{On an algorithm of Zemlyachenko for subtree isomorphism}},
+  author={Dinitz, Y. and Itai, A. and Rodeh, M.},
+  journal={Information Processing Letters},
+  volume={70},
+  number={3},
+  pages={141--146},
+  year={1999},
+  publisher={Elsevier}
+}
+
+@inproceedings{ zemlay:treeiso,
+  title={{Determining tree isomorphism}},
+  author={Zemlyachenko, V. N.},
+  booktitle={{Voprosy Kibernetiki, Proceedings of the Seminar on Combinatorial Mathematics}},
+  location={Moscow 1971},
+  publisher={{Scientific Council of the Complex Problem ``Kibernetika'', Akad. Nauk SSSR}},
+  year={1973},
+  pages={54--60},
+  note={In Russian},
+}
diff --git a/ram.tex b/ram.tex
index c12c87e2b38568eb37bccfc3e25b7bf4d18568ff..038ad0c7735ebd0dbd401cfd27bce7a116f12da5 100644
--- a/ram.tex
+++ b/ram.tex
@@ -313,7 +313,7 @@ of both trees by their depth by running the depth-first search to calculate the
  depths and bucket-sorting them with $n$~buckets afterwards.
  
  Then we proceed from depth~1 to the maximum depth and for each of them we identify
-the isomorphism classes of subtrees of that particular depth. We will assign some
+the isomorphism equivalence classes of subtrees of that particular depth. We will assign some
  sort of identifiers to the classes; at most~$n+1$ of them are needed as there are
  $n+1$~subtrees in the tree (including the empty subtree). As the PM does not
  have numbers as a~first-class type, we just create a~list of $n+1$~distinct items
@@ -321,24 +321,44 @@ and use pointers to these items as identifiers. Isomorphism of the whole trees
  can be finally decided by comparing the identifiers assigned to their roots.
  
  Suppose that classes of depths $1,\ldots,d-1$ are already computed and we want
-to identify those of depth~$d$. We take a~root of every such tree and label it
-with an~ordered $k$-tuple of identifiers of its subtrees; when it has less than
-$k$ sons, we pad the tuple with empty subtrees. Tuples corresponding to isomorphic
-subtrees are identical up to reordering of elements. We therefore sort the codes
-inside each tuple and then sort the tuples, which brings the equivalent tuples
-together.
+to identify those of depth~$d$. We will denote their number by~$n_d$. We take
+a~root of every such tree and label it with an~ordered $k$-tuple of identifiers
+of its subtrees; when it has fewer than $k$ sons, we pad the tuple with empty
+subtrees. Tuples corresponding to isomorphic subtrees are identical up to
+reordering of elements. We therefore sort the codes inside each tuple and then
+sort the tuples, which brings the equivalent tuples together.
  
  The first sort (inside the tuples) would be easy on the RAM, but on the PM we
  have no means of comparing two identifiers for anything else than equality.
-We work around this by sorting the set $\{ (x,i,j) \mid \hbox{$x$ is the $i$-th
+To work around this, we sort the set $\{ (x,i,j) \mid \hbox{$x$ is the $i$-th
  element of the $j$-th tuple} \}$ on~$x$, reset all tuples and insert the elements
-back in the increasing order of~$x$.
+back in the increasing order of~$x$, ignoring the original positions. The second
+sort is a~straightforward $k$-pass bucket sort.
+
+If we are not careful, a~single sorting pass takes $\O(n_d + n)$ time, because
+while we have only $n_d$~items to sort, we have to scan all $n$~buckets. This can
+be easily avoided if we realize that the order of buckets does not need to be
+fixed --- in every pass, we can use a~completely different order and it still
+does bring the equivalent tuples together. Thus we can keep a~list of buckets
+which were used in the current pass and look only inside these buckets. This way,
+the pass takes $\O(n_d)$ time only and the whole algorithm is $\O(\sum_d n_d) = \O(n)$.
+
+Our algorithm can be easily modified for trees with unrestricted degrees.
+We replace the fixed $k$-tuples by general sequences of identifiers. The first
+sort does not need any changes. In the second sort, we proceed from the first
+position to the last one and after each bucket sort pass we put aside the sequences
+that have just ended. (They are obviously not equivalent to any other sequences.)
+The second sort is linear in the sum of the lengths of the sequences, which is
+$n_{d+1}$ for depth~$d$. We can therefore decide isomorphism of the whole trees
+in time $\O(\sum_d (n_d + n_{d+1})) = \O(n)$.
  
-\FIXME{Finish}
-
-The second sort...
-
-$n$ buckets...
+\rem
+The first linear-time algorithm that partitions all subtrees into isomorphism equivalence
+classes is probably due to Zemlyachenko \cite{zemlay:treeiso}, but it lacks many
+details. Dinitz et al.~\cite{dinitz:treeiso} have recast this algorithm in modern
+terminology and filled the gaps. Our algorithm is easier to formulate than those,
+because it replaces the need for auxiliary data structures with more elaborate bucket
+sorting.
  
  \FIXME{Buchsbaum's trick}
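
The text added to ram.tex describes a level-by-level procedure: sort identifiers inside each sequence, group identical sequences by bucket passes, and scan only the buckets touched in the current pass. The following is a minimal Python sketch of that idea, not the author's implementation. It assumes a RAM with integer identifiers (the text works on the Pointer Machine with pointers to list items), rooted trees given as non-empty lists of child lists in which every node is reachable from the root, and the hypothetical names `subtree_depths`, `sort_inside`, `group_equal` and `isomorphic`. The grouping step refines groups position by position, which is one possible reading of "putting aside the sequences that have just ended".

```python
def subtree_depths(children, roots):
    """Depth of the subtree rooted at each node (a leaf has depth 1)."""
    depth = [0] * len(children)
    order, stack = [], list(roots)
    while stack:                      # iterative DFS: parents precede descendants
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    for v in reversed(order):         # so descendants are finished before parents
        depth[v] = 1 + max((depth[c] for c in children[v]), default=0)
    return depth


def sort_inside(seqs, buckets):
    """Reorder every sequence consistently, touching only the used buckets.

    The order is the order in which identifiers were first seen in this
    pass, not numeric order; it only has to be the same for all sequences.
    """
    used = []
    for j, seq in enumerate(seqs):
        for x in seq:
            if not buckets[x]:
                used.append(x)        # remember buckets touched in this pass
            buckets[x].append(j)
    out = [[] for _ in seqs]
    for x in used:
        for j in buckets[x]:
            out[j].append(x)
        buckets[x] = []               # reset only what we used
    return out


def group_equal(seqs, buckets):
    """Partition indices of seqs into classes of identical sequences."""
    classes, groups, pos = [], [list(range(len(seqs)))], 0
    while groups:
        next_groups = []
        for group in groups:          # all members agree on positions < pos
            used, ended = [], []
            for j in group:
                if len(seqs[j]) == pos:
                    ended.append(j)   # sequence has just ended: put it aside
                else:
                    x = seqs[j][pos]
                    if not buckets[x]:
                        used.append(x)
                    buckets[x].append(j)
            if ended:
                classes.append(ended)
            for x in used:
                next_groups.append(buckets[x])
                buckets[x] = []
        groups, pos = next_groups, pos + 1
    return classes


def isomorphic(children_a, children_b, root_a=0, root_b=0):
    """Decide isomorphism of two rooted trees by classifying all subtrees."""
    off = len(children_a)             # merge both trees into one forest
    children = [list(cs) for cs in children_a]
    children += [[c + off for c in cs] for cs in children_b]
    root_b += off
    n = len(children)
    depth = subtree_depths(children, [root_a, root_b])
    levels = [[] for _ in range(n + 1)]
    for v in range(n):
        levels[depth[v]].append(v)
    ident = [None] * n                # class identifier of each subtree
    buckets = [[] for _ in range(n)]  # one bucket per possible identifier
    next_id = 0
    for level in levels[1:]:
        seqs = sort_inside([[ident[c] for c in children[v]] for v in level],
                           buckets)
        for cls in group_equal(seqs, buckets):
            for j in cls:
                ident[level[j]] = next_id
            next_id += 1
    return ident[root_a] == ident[root_b]
```

For instance, `isomorphic([[1, 2], [3], [], []], [[1, 2], [], [3], []])` compares two trees that differ only in the order of the root's sons. Because no pass ever scans an untouched bucket, the work per level is proportional to the number of identifiers handled at that level rather than to the total number of buckets.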