More on bucket sorting (unfinished).

author Martin Mares <mj@ucw.cz>

Sat, 5 Apr 2008 21:42:17 +0000 (23:42 +0200)

committer Martin Mares <mj@ucw.cz>

Sat, 5 Apr 2008 21:42:17 +0000 (23:42 +0200)
diff --git a/PLAN b/PLAN

index 04e4429649a34bb9782a0402ed4dc5818fe82892..308ad010b3557fe676e7912da01a84a70ddaa10f 100644
--- a/PLAN
+++ b/PLAN
@@ -77,6 +77,7 @@ Models:
  - expand the section on radix-sorting, mention Buchsbaum
  - move Q-Heaps to the chapter on the MST's?
  - Tarjan79 is claimed by Pettie to define Pointer machines
+- add references to the C language
  
  Ranking:
  
@@ -97,6 +98,7 @@ Notation:
  - use calligraphic letters from ams?
  - change the notation for contractions -- use double slash instead of the dot?
  - introduce \widehat\O early
+- unify { x ; ... }, { x | ...} and { x : ... }
  
  Varia:
  
diff --git a/ram.tex b/ram.tex

index f6dabdb4ca83eb13cb705a477823f398ff85bae4..c12c87e2b38568eb37bccfc3e25b7bf4d18568ff 100644
--- a/ram.tex
+++ b/ram.tex
@@ -261,48 +261,86 @@ data structures in the Okasaki's monograph~\cite{okasaki:funcds}.
  
  %--------------------------------------------------------------------------------
  
-\section{Bucket sorting and contractions}\id{bucketsort}%
+\section{Bucket sorting and unification}\id{bucketsort}%
  
  The Contractive Bor\o{u}vka's algorithm (\ref{contbor}) needed to contract a~given
  set of edges in the current graph and flatten it afterwards, all this in time $\O(m)$.
-We have spared the technical details for this section and they will be useful
-in further algorithms, too.
-
-As already suggested, the contractions can be performed by building an~auxiliary
-graph and finding its connected components. Thus we will take care of the flattening
-only.
-
-\para
-On the RAM, it is sufficient to sort the edges lexicographically (each edge viewed
-as an~ordered pair of vertex identifiers with the smaller of the identifiers placed
-first). We can do that by a two-pass bucket sort with~$n$ buckets corresponding
-to the vertex identifiers.
+We have spared the technical details for this section, in which we are going to
+explain several techniques based on bucket sorting. These will be useful in further
+algorithms, too.
+
+As already suggested in the proof of Lemma~\ref{contbor}, contractions can be performed
+in linear time by building an~auxiliary graph and finding its connected components.
+We will thus take care only of the subsequent flattening.
+
+\paran{Flattening on RAM}%
+On the RAM, we can view the edges as ordered pairs of vertex identifiers with the
+smaller of the identifiers placed first and sort them lexicographically. This brings
+parallel edges together, so that a~simple linear scan suffices to find each bunch
+of parallel edges and remove all but the lightest one.
+Lexicographic sorting of pairs can be accomplished in linear time by a~two-pass
+bucket sort with $n$~buckets corresponding to the vertex identifiers.
  
  However, there is a~catch in this. Suppose that we use the standard representation
  of graphs by adjacency lists whose heads are stored in an array indexed by vertex
  identifiers. When we contract and flatten the graph, the number of vertices decreases,
  but if we inherit the original vertex identifiers, the arrays will still have the
-same size. Hence we spend a~super-linear amount of time on scanning the increasingly
+same size. We could then waste a~super-linear amount of time by scanning the increasingly
  sparse arrays, most of the time skipping unused entries.
  
-To avoid this, we have to renumber the vertices after each contraction to component
-identifiers from the auxiliary graph and we create a~new vertex array. This way,
-the representation of the graph will be kept linear with respect to the size of the
-current graph.
+To avoid this problem, we have to renumber the vertices after each contraction to component
+identifiers from the auxiliary graph and create a~new vertex array. This helps to
+keep the size of the representation of the graph linear with respect to its current
+size.
  
-\para
-The pointer representation of graphs does not suffer from sparsity as the vertices
+\paran{Flattening on PM}%
+The pointer representation of graphs does not suffer from sparsity since the vertices
  are always identified by pointers to per-vertex structures. Each such structure
  then contains all attributes associated with the vertex, including the head of its
  adjacency list. However, we have to find a~way how to perform bucket sorting
-without arrays.
-
-We will keep a~list of the per-vertex structures which defines the order of~vertices.
-Each such structure will contain a~pointer to the head of the corresponding bucket,
-again stored as a~list. Putting an~edge to a~bucket can be done in constant time then,
-scanning all~$n$ buckets takes $\O(n+m)$ time.
-
-\FIXME{Add an example of locally determined orders, e.g., tree isomorphism?}
+without indexing of arrays.
+
+We will keep a~list of the per-vertex structures that defines the order of~vertices.
+Each such structure will be endowed with a~pointer to the head of the list of items in
+the corresponding bucket. Inserting an~edge into a~bucket can then be done in constant time
+and scanning all~$n$ buckets takes $\O(n+m)$ time.
+
+\paran{Tree isomorphism}%
+Another nice example of pointer-based radix sorting is a~pointer algorithm for
+deciding whether two rooted trees are isomorphic. Let us assume for a~moment that
+the outdegree of each vertex is at most a~fixed constant~$k$. We sort the subtrees
+of both trees by their depth by running a~depth-first search to calculate the
+depths and bucket-sorting them with $n$~buckets afterwards.
+
+Then we proceed from depth~1 to the maximum depth and for each of them we identify
+the isomorphism classes of subtrees of that particular depth. We will assign some
+sort of identifiers to the classes; at most~$n+1$ of them are needed as there are
+$n+1$~subtrees in the tree (including the empty subtree). As the PM does not
+have numbers as a~first-class type, we just create a~list of $n+1$~distinct items
+and use pointers to these items as identifiers. Isomorphism of the whole trees
+can be finally decided by comparing the identifiers assigned to their roots.
+
+Suppose that classes of depths $1,\ldots,d-1$ are already computed and we want
+to identify those of depth~$d$. We take a~root of every such tree and label it
+with an~ordered $k$-tuple of identifiers of its subtrees; when it has fewer than
+$k$ sons, we pad the tuple with empty subtrees. Tuples corresponding to isomorphic
+subtrees are identical up to reordering of elements. We therefore sort the identifiers
+inside each tuple and then sort the tuples, which brings the equivalent tuples
+together.
+
+The first sort (inside the tuples) would be easy on the RAM, but on the PM we
+have no means of comparing two identifiers for anything other than equality.
+We work around this by sorting the set $\{ (x,i,j) \mid \hbox{$x$ is the $i$-th
+element of the $j$-th tuple} \}$ on~$x$, resetting all tuples and inserting the elements
+back in increasing order of~$x$.
+
+\FIXME{Finish}
+
+The second sort...
+
+$n$ buckets...
+
+\FIXME{Buchsbaum's trick}
  
  %--------------------------------------------------------------------------------
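
Note on the "Flattening on RAM" paragraph: the two-pass bucket sort followed by a linear scan could look roughly like the Python sketch below. This is only an illustration under stated assumptions, not code from the thesis: the function name `flatten`, the representation of edges as `(u, v, weight)` triples with vertex identifiers in `0..n-1`, and the removal of self-loops are all assumptions made for the example.

```python
def flatten(n, edges):
    """Remove parallel edges, keeping the lightest edge of each bunch.

    Assumptions (not from the source): vertices are integers 0..n-1,
    edges are (u, v, weight) triples, and self-loops are dropped too.
    """
    # View each edge as an ordered pair with the smaller identifier first.
    pairs = [(min(u, v), max(u, v), w) for (u, v, w) in edges if u != v]

    # Two-pass bucket sort with n buckets: sort by the second component,
    # then stably by the first one, which yields lexicographic order.
    for key in (1, 0):
        buckets = [[] for _ in range(n)]
        for e in pairs:
            buckets[e[key]].append(e)
        pairs = [e for bucket in buckets for e in bucket]

    # Parallel edges are now adjacent; a single linear scan keeps the lightest.
    result = []
    for e in pairs:
        if result and result[-1][:2] == e[:2]:
            if e[2] < result[-1][2]:
                result[-1] = e
        else:
            result.append(e)
    return result
```

Each pass touches every bucket and every edge once, so the sketch runs in time proportional to n + m, which is why the text insists on renumbering vertices so that n matches the current graph.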
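
Note on the "Tree isomorphism" paragraphs: the level-by-level assignment of class identifiers can be sketched on the RAM as follows. This is a hedged illustration, not the pointer-machine algorithm the section is building towards: Python's built-in comparison sort stands in for the bucket sorts, no padding to $k$-tuples is done, and the names `subtree_heights` and `isomorphic` as well as the `children` dictionaries (mapping every vertex to the list of its sons) are assumptions of the example.

```python
def subtree_heights(children, root):
    """Return the depth of the subtree rooted at each vertex (a leaf has depth 1)."""
    height = {}
    stack = [(root, False)]
    while stack:
        v, done = stack.pop()
        if done:
            # All sons were finished before this entry was popped again.
            height[v] = 1 + max((height[c] for c in children[v]), default=0)
        else:
            stack.append((v, True))
            stack.extend((c, False) for c in children[v])
    return height


def isomorphic(children_a, root_a, children_b, root_b):
    """Decide isomorphism of two rooted trees by comparing root identifiers."""
    trees = {"a": (children_a, root_a), "b": (children_b, root_b)}

    # Group the subtrees of both trees by their depth.
    levels = {}
    for name, (children, root) in trees.items():
        for v, h in subtree_heights(children, root).items():
            levels.setdefault(h, []).append((name, v))

    ident = {}    # (tree name, vertex) -> identifier of its isomorphism class
    next_id = 0
    for h in sorted(levels):
        # Label each root with the sorted tuple of its sons' identifiers.
        coded = [
            (tuple(sorted(ident[(name, c)] for c in trees[name][0][v])), name, v)
            for name, v in levels[h]
        ]
        # Sorting the tuples brings equivalent ones together.
        coded.sort(key=lambda item: item[0])
        previous = None
        for code, name, v in coded:
            if code != previous:
                next_id += 1      # a fresh identifier for a new class
                previous = code
            ident[(name, v)] = next_id
    return ident[("a", root_a)] == ident[("b", root_b)]
```

Because fresh identifiers are issued at every depth, subtrees of different depths never share a class, so comparing the two root identifiers is enough.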