From e0783236c7c0618f4a3ad1f8e584423cfbb085fa Mon Sep 17 00:00:00 2001
From: Martin Mares
Date: Wed, 30 Jan 2008 13:09:23 +0100
Subject: [PATCH] Split off a chapter on computation models and added
 description of the PM.

---
 Makefile     |   2 +-
 biblio.bib   |   2 +-
 macros.tex   |   1 +
 notation.tex | 130 ------------------------------
 ram.tex      | 218 +++++++++++++++++++++++++++++++++++++++++++++++++++
 saga.tex     |   1 +
 6 files changed, 222 insertions(+), 132 deletions(-)
 create mode 100644 ram.tex

diff --git a/Makefile b/Makefile
index 9802547..6b931d1 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 all: saga.ps
 
-CHAPTERS=cover mst notation
+CHAPTERS=cover mst notation ram
 
 %.dvi: %.tex macros.tex biblio.bib
 	tex $< && if grep -q citation $*.aux ; then bibtex $* && tex $< && tex $< ; fi
diff --git a/biblio.bib b/biblio.bib
index e06b94f..e2ea71b 100644
--- a/biblio.bib
+++ b/biblio.bib
@@ -61,7 +61,7 @@
 }
 
 @article{ benamram:pm,
-  author = "Ben-Amram",
+  author = "Ben-Amram, Amir M.",
   title = "What is a ``Pointer Machine''?",
   journal = "SIGACTN: SIGACT News (ACM Special Interest Group on Automata and Computability Theory)",
   volume = "26",
diff --git a/macros.tex b/macros.tex
index 2fe07dd..2a99ea1 100644
--- a/macros.tex
+++ b/macros.tex
@@ -323,6 +323,7 @@
 \def\algn{\alg\label}
 
 \def\proof{\noindent {\sl Proof.}\enspace}
+\def\proofsketch{\noindent {\sl Proof sketch.}\enspace}
 
 %%% References %%%
diff --git a/notation.tex b/notation.tex
index 1466c9d..1f75d13 100644
--- a/notation.tex
+++ b/notation.tex
@@ -86,134 +86,4 @@
 E' &= \{ \{m(x),m(y)\} ; xy\in E \land m(x)\ne m(y) \},\cr
 m(x) &= \cases{v_e & \hbox{for $v=x,y,$}\cr v & \hbox{otherwise.}} \cr
 }$$
-
-%--------------------------------------------------------------------------------
-
-\section{Models of computation}
-
-Traditionally, computer scientists use a~variety of computational models
-for a~formalism in which their algorithms are stated. If we were studying
-NP-completeness, we could safely assume that all the models are equivalent,
-possibly up to polynomial slowdown, which is negligible. In our case, the
-differences between good and not-so-good algorithms are on a~much smaller
-scale, so we need to replace yardsticks with a~micrometer and define our
-computation model carefully.
-
-We would like to keep the formalism close enough to the reality of the contemporary
-computers. This rules out Turing machines and similar sequentially addressed
-models, but even the remaining models are subtly different. For example, some of them
-allow indexing of arrays in constant time, while the others have no such operation
-and arrays have to be emulated with pointer structures, requiring $\Omega(\log n)$
-time to access a~single element of an~$n$-element array. It is hard to say which
-way is superior --- most ``real'' computers have instructions for constant-time
-indexing, but on the other hand it seems to be physically impossible to fulfil
-this promise with an~arbitrary size of memory. Indeed, at the level of logical
-gates, the depth of the actual indexing circuits is logarithmic.
-
-In recent decades, most researchers in the area of combinatorial algorithms
-were considering two computational models: the Random Access Machine and the Pointer
-Machine. The former one is closer to the programmer's view of a~real computer,
-the latter one is slightly more restricted and ``asymptotically safe.''
-We will follow this practice and study our algorithms in both models.
-
-\para
-The \df{Random Access Machine (RAM)} is not a~single model, but rather a~family
-of closely related models, sharing the following properties.
-(See Cook and Reckhow \cite{cook:ram} for one of the common formal definitions
-and Hagerup \cite{hagerup:wordram} for a~thorough description of the differences
-between the RAM variants.)
-
-The \df{memory} of the model is represented by an~array of \df{memory cells}
-addressed by non-negative integers, each of them containing a~single non-negative integer.
-The \df{program} is a~sequence of \df{instructions} of two basic kinds: calculation
-instructions and control instructions.
-
-\df{Calculation instructions} have two source arguments and one destination
-argument, each \df{argument} being either an~immediate constant (not available
-as destination), a~directly addressed memory cell (specified by its number)
-or an~indirectly addressed memory cell (its address is stored in a~directly
-addressed memory cell).
-
-\df{Control instructions} include branches (to a~specific instruction in
-the program), conditional branches (e.g., jump if two arguments specified as
-in the calculation instructions are equal) and an~instruction to halt the program.
-
-At the beginning of the computation, the memory contains the input data
-in specified memory cells and arbitrary values in all other cells.
-Then the program is executed one instruction at a~time. When it halts,
-specified memory cells are interpreted as the program's output.
-
-\para\id{wordsize}%
-In the description of the RAM family, we have omitted several properties
-on~purpose, because different members of the family define them differently.
-These are: the size of the available integers, the time complexity of a~single
-instruction, the space complexity of a~single memory cell and the repertoire
-of operations available in calculation instructions.
-
-If we impose no limits on the magnitude of the numbers and we assume that
-arithmetic and logical operations work on them in constant time, we get
-a~very powerful parallel computer --- we can emulate an~exponential number
-of parallel processors using arithmetics and suddenly almost everything can be
-computed in constant time, modulo encoding and decoding of input and output.
-Such models are unrealistic and there are two basic possibilities how to
-avoid this behavior:
-
-\numlist\ndotted
-\:Keep unbounded numbers, but increase costs of instructions: each instruction
-  consumes time proportional to the number of bits of the numbers it processes,
-  including memory addresses. Similarly, space usage is measured in bits,
-  counting not only the values, but also the addresses of the respective memory
-  cells.
-\:Place a~limit on the size of the numbers ---define the \df{word size~$W$,}
-  the number of bits available in the memory cells--- and keep the cost of
-  of instructions and memory cells constant. The word size must not be constant,
-  since we can address only~$2^W$ cells of memory. If the input of the algorithm
-  is stored in~$N$ cells, we need~$W\ge\log N$ just to be able to read the input.
-  On the other hand, we are interested in polynomial-time algorithms only, so $\Theta(\log N)$-bit
-  numbers should be sufficient. In practice, we pick~$w$ to be the larger of
-  $\Theta(\log N)$ and the size of integers used in the algorithm's input and output.
-\endlist
-
-Both restrictions easily avoid the problems of unbounded parallelism. The first
-choice is theoretically cleaner and Cook et al.~show nice correspondences to the
-standard complexity classes, but the calculations of time and space complexity tend
-to be somewhat tedious. What more, when compared with the RAM with restricted
-word size, the complexities are usually exactly $\Theta(w)$ times higher.
-This does not hold in general (consider a~program which uses many small numbers
-and $\O(1)$ large ones), but it is true for the algorithms we are interested in.
-Therefore we will always assume that the operations have unit cost and we make
-sure that all numbers are limited by the available word size.
-
-\para
-As for the choice of RAM operations, the following three instruction sets are used:
-
-\itemize\ibull
-\:\df{Word-RAM} --- allows the ``C-language operators'', i.e., addition,
-  subtraction, multiplication, division, remainder, bitwise {\sc and,} {\sc or,} exclusive
-  {\sc or} ({\sc xor}), negation ({\sc not}) and bitwise shifts ($\ll$ and~$\gg$).
-\:\df{${\rm AC}^0$-RAM} --- allows all operations from the class ${\rm AC}^0$, i.e.,
-  those computable by constant-depth polynomial-size boolean circuits with unlimited
-  fan-in and fan-out. This includes all operations of the Word-RAM except for multiplication,
-  division and remainders, and also many other operations like computing the Hamming
-  weight (number of bits set in a~given number).
-\:Both restrictions at once.
-\endlist
-
-Thorup discusses the usual techniques employed by RAM algorithms in~\cite{thorup:aczero}
-and he shows that they work on both Word-RAM and ${\rm AC}^0$-RAM, but the combination
-of the two restrictions is too weak. On the other hand, taking the intersection of~${\rm AC}^0$
-with the instruction set of modern processors (like the multimedia instructions of Intel's Pentium4)
-is already strong enough.
-
-\FIXME{References to CPU manuals}
-
-We will therefore use the Word-RAM instruction set, mentioning differences with
-${\rm AC}^0$-RAM where necessary.
-
-\nota
-When speaking of the \df{RAM,} we implicitly mean the version with numbers limited
-by a~specified word size of $W$~bits, unit cost of operations and memory cells and the instruction
-set of the Word-RAM. This corresponds to the usage in recent algorithmic literature,
-although the authors rarely mention the details.
-
-
 \endpart
diff --git a/ram.tex b/ram.tex
new file mode 100644
index 0000000..d8e2391
--- /dev/null
+++ b/ram.tex
@@ -0,0 +1,218 @@
+\ifx\endpart\undefined
+\input macros.tex
+\fi
+
+\chapter{Fine Details of Computation}
+
+\section{Models and machines}
+
+Traditionally, computer scientists use a~variety of computational models
+as a~formalism in which their algorithms are stated. If we were studying
+NP-completeness, we could safely assume that all the models are equivalent,
+possibly up to polynomial slowdown, which is negligible. In our case, the
+differences between good and not-so-good algorithms are on a~much smaller
+scale. In this chapter, we will replace our yardsticks by a~micrometer,
+define our computation models carefully and develop a~repertoire of basic
+data structures taking advantage of the fine details of the models.
+
+We would like to keep the formalism close enough to the reality of
+contemporary computers. This rules out Turing machines and similar
+sequentially addressed models, but even the remaining models are subtly
+different. For example, some of them allow indexing of arrays in constant
+time, while others have no such operation and arrays have to be emulated
+with pointer structures, requiring $\Omega(\log n)$ time to access a~single
+element of an~$n$-element array. It is hard to say which way is superior ---
+most ``real'' computers have instructions for constant-time indexing, but on
+the other hand it seems to be physically impossible to fulfil this promise
+with an~arbitrary size of memory. Indeed, at the level of logical gates, the
+depth of the actual indexing circuits is logarithmic.
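+
+\para
+To make the difference concrete, consider the following C~sketch. It is our
+illustration only: the function names and the trie layout are hypothetical
+and belong to no formal model, and we assume a~verbatim display macro for
+typesetting code. The $i$-th element of an~$n$-element array is read once
+directly and once through a~complete binary trie over the bits of the index,
+paying one pointer step per bit:
+
+\verbatim
+#include <stddef.h>
+
+struct node {                     /* one vertex of the pointer structure */
+    struct node *child[2];        /* inner nodes: two children */
+    int value;                    /* leaves: the stored element */
+};
+
+int get_direct(const int *a, size_t i)
+{
+    return a[i];                  /* a single indexed load: O(1) on a RAM */
+}
+
+int get_pointer(const struct node *root, size_t i, int depth)
+{
+    for (int b = depth - 1; b >= 0; b--)   /* Theta(log n) pointer steps */
+        root = root->child[(i >> b) & 1];
+    return root->value;
+}
+\endverbatim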
+
+In recent decades, most researchers in the area of combinatorial algorithms
+have considered two computational models: the Random Access Machine and the
+Pointer Machine. The former is closer to the programmer's view of a~real
+computer, while the latter is slightly more restricted and ``asymptotically
+safe.'' We will follow this practice and study our algorithms in both models.
+
+\para
+The \df{Random Access Machine (RAM)} is not a~single model, but rather a~family
+of closely related models, sharing the following properties.
+(See Cook and Reckhow \cite{cook:ram} for one of the common formal definitions
+and Hagerup \cite{hagerup:wordram} for a~thorough description of the differences
+between the RAM variants.)
+
+The \df{memory} of the model is represented by an~array of \df{memory cells}
+addressed by non-negative integers, each of them containing a~single
+non-negative integer. The \df{program} is a~sequence of \df{instructions}
+of two basic kinds: calculation instructions and control instructions.
+
+\df{Calculation instructions} have two source arguments and one destination
+argument, each \df{argument} being either an~immediate constant (not available
+as a~destination), a~directly addressed memory cell (specified by its number),
+or an~indirectly addressed memory cell (its address is stored in a~directly
+addressed memory cell).
+
+\df{Control instructions} include branches (to a~specific instruction in
+the program), conditional branches (e.g., jump if two arguments specified as
+in the calculation instructions are equal) and an~instruction to halt the
+program.
+
+At the beginning of the computation, the memory contains the input data
+in specified memory cells and arbitrary values in all other cells.
+Then the program is executed one instruction at a~time. When it halts,
+specified memory cells are interpreted as the program's output.
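+
+\para
+The following C~sketch of a~single calculation step is again only an~illustration
+of ours; the instruction encoding is hypothetical, and merely the three operand
+modes come from the definition above. An~addition instruction fetches its two
+sources, adds them and stores the result:
+
+\verbatim
+#include <stdint.h>
+
+enum mode { IMM, DIRECT, INDIRECT };          /* the three argument kinds */
+
+struct arg { enum mode m; uint64_t x; };      /* hypothetical encoding */
+
+static uint64_t mem[1 << 20];                 /* the array of memory cells */
+
+static uint64_t fetch(struct arg a)
+{
+    switch (a.m) {
+    case IMM:     return a.x;                 /* immediate constant */
+    case DIRECT:  return mem[a.x];            /* cell given by its number */
+    default:      return mem[mem[a.x]];       /* address stored in a cell */
+    }
+}
+
+static void store(struct arg a, uint64_t v)   /* no immediate destinations */
+{
+    if (a.m == DIRECT) mem[a.x] = v;
+    else               mem[mem[a.x]] = v;
+}
+
+void exec_add(struct arg dst, struct arg s1, struct arg s2)
+{
+    store(dst, fetch(s1) + fetch(s2));        /* dst := s1 + s2 */
+}
+\endverbatim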
+
+\para\id{wordsize}%
+In the description of the RAM family, we have omitted several properties
+on~purpose, because different members of the family define them differently.
+These are: the size of the available integers, the time complexity of a~single
+instruction, the space complexity of a~single memory cell and the repertoire
+of operations available in calculation instructions.
+
+If we impose no limits on the magnitude of the numbers and we assume that
+arithmetic and logical operations work on them in constant time, we get
+a~very powerful parallel computer --- we can emulate an~exponential number
+of parallel processors using arithmetic and suddenly almost everything can be
+computed in constant time, modulo encoding and decoding of input and output.
+Such models are unrealistic and there are two basic ways to avoid this
+behavior:
+
+\numlist\ndotted
+\:Keep unbounded numbers, but increase the costs of instructions: each
+  instruction consumes time proportional to the number of bits of the numbers
+  it processes, including memory addresses. Similarly, space usage is measured
+  in bits, counting not only the values, but also the addresses of the
+  respective memory cells.
+\:Place a~limit on the size of the numbers --- define the \df{word size~$W$,}
+  the number of bits available in the memory cells --- and keep the cost of
+  instructions and memory cells constant. The word size cannot be constant,
+  since we can address only~$2^W$ cells of memory. If the input of the
+  algorithm is stored in~$N$ cells, we need~$W\ge\log N$ just to be able to
+  read the input. On the other hand, we are interested in polynomial-time
+  algorithms only, so $\Theta(\log N)$-bit numbers should be sufficient.
+  In practice, we pick~$W$ to be the larger of $\Theta(\log N)$ and the size
+  of integers used in the algorithm's input and output.
+\endlist
+
+Both restrictions easily avoid the problems of unbounded parallelism. The
+first choice is theoretically cleaner and Cook et al.~show nice
+correspondences to the standard complexity classes, but the calculations of
+time and space complexity tend to be somewhat tedious. What is more, when
+compared with the RAM with restricted word size, the complexities are usually
+exactly $\Theta(W)$ times higher. This does not hold in general (consider
+a~program which uses many small numbers and $\O(1)$ large ones), but it is
+true for the algorithms we are interested in. Therefore we will always assume
+that the operations have unit cost and we make sure that all numbers are
+limited by the available word size.
+
+\para
+As for the choice of RAM operations, the following three instruction sets
+are often used:
+
+\itemize\ibull
+\:\df{Word-RAM} --- allows the ``C-language operators'', i.e., addition,
+  subtraction, multiplication, division, remainder, bitwise {\sc and,}
+  {\sc or,} exclusive {\sc or} ({\sc xor}), negation ({\sc not}) and bitwise
+  shifts ($\ll$ and~$\gg$).
+\:\df{${\rm AC}^0$-RAM} --- allows all operations from the class ${\rm AC}^0$,
+  i.e., those computable by constant-depth polynomial-size boolean circuits
+  with unlimited fan-in and fan-out. This includes all operations of the
+  Word-RAM except for multiplication, division and remainders, and also many
+  other operations like computing the Hamming weight (the number of bits set
+  in a~given number).
+\:Both restrictions at once.
+\endlist
+
+Thorup discusses the usual techniques employed by RAM algorithms
+in~\cite{thorup:aczero} and he shows that they work on both the Word-RAM and
+the ${\rm AC}^0$-RAM, but that the combination of the two restrictions is too
+weak. On the other hand, the intersection of~${\rm AC}^0$ with the instruction
+set of modern processors (e.g., adding some floating-point operations and
+multimedia instructions present on Intel's Pentium~4) is already strong
+enough.
+
+\FIXME{References to CPU manuals}
+
+We will therefore use the Word-RAM instruction set, mentioning differences
+from the ${\rm AC}^0$-RAM where necessary.
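+
+\para
+The Hamming weight mentioned above is a~standard example of how the two
+instruction sets differ in practice: where an~instruction set provides it
+directly, one native operation suffices, while the Word-RAM can compute it
+from its C-language operators by the folklore bit-parallel trick shown below.
+The sketch is well known and not taken from the cited papers; it fixes
+$W=64$, and its last step uses multiplication, which is exactly one of the
+operations absent from the ${\rm AC}^0$-RAM:
+
+\verbatim
+#include <stdint.h>
+
+unsigned hamming_weight(uint64_t x)
+{
+    x = x - ((x >> 1) & 0x5555555555555555ULL);      /* sums of bit pairs */
+    x = (x & 0x3333333333333333ULL)
+      + ((x >> 2) & 0x3333333333333333ULL);          /* 4-bit field sums */
+    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;      /* per-byte sums */
+    return (x * 0x0101010101010101ULL) >> 56;        /* add up the bytes */
+}
+\endverbatim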
+
+\nota
+When speaking of the \df{RAM,} we implicitly mean the version with numbers
+limited by a~specified word size of $W$~bits, unit cost of operations and
+memory cells, and the instruction set of the Word-RAM. This corresponds to
+the usage in recent algorithmic literature, although the authors rarely
+mention the details.
+
+\para
+The \df{Pointer Machine (PM)} also does not have any well-established
+definition. The various kinds of pointer machines are surveyed by Ben-Amram
+in~\cite{benamram:pm}, but unlike the RAM variants, they turn out to be
+equivalent up to constant slowdown.
+Our definition will be closely related to the {\em linking automaton} proposed
+by Knuth in~\cite{knuth:fundalg}; we only adapt it to use RAM-like
+instructions instead of an~opaque control unit.
+
+The PM works with two different types of data: \df{symbols} from a~finite
+alphabet and \df{pointers}. The memory of the machine consists of a~fixed
+number of \df{registers} (some of them hold a~single symbol each, the others
+a~single pointer each) and an~arbitrary number of \df{cells}. The structure
+of all cells is the same: each of them again contains a~fixed number of
+fields for symbols and pointers. Registers can be addressed directly, while
+cells can be accessed only via pointers --- either by using a~pointer stored
+in a~register, or in a~cell pointed to by a~register (longer chains of
+pointers cannot be followed in constant time).
+
+We can therefore view the whole memory as a~directed graph, whose vertices
+correspond to the cells (the registers are stored in a~single special cell).
+The outgoing edges of each vertex correspond to pointer fields and they are
+labelled with distinct labels drawn from a~finite set. In addition to that,
+each vertex contains a~fixed number of symbols. The program can directly
+access vertices within distance~2 from the register vertex.
+
+The program is a~sequence of instructions of the following kinds:
+
+\itemize\ibull
+\:\df{symbol instructions,} which read a~pair of symbols, apply an~arbitrary
+  function to them and write the result to a~symbol register or field;
+\:\df{pointer instructions} for assignment of pointers to pointer
+  registers/fields and for creation of new memory cells (a~pointer to the
+  new cell is assigned immediately);
+\:\df{control instructions} --- similar to those of the RAM; conditional
+  jumps can decide on~arbitrary unary relations on symbols and compare
+  pointers for equality.
+\endlist
+
+Time and space complexity are defined in the straightforward way: all
+instructions have unit cost and so do all memory cells.
+
+Both the input and the output of the machine are passed in the form of
+a~linked structure pointed to by a~designated register. For example, we can
+pass graphs back and forth without having to encode them as strings of
+numbers or symbols. This is important, because with the finite alphabet of
+the~PM, all symbolic representations of graphs require super-linear space
+and therefore also super-linear time.
+
+\para
+Compared to the RAM, the PM lacks two important capabilities: indexing of
+arrays and arithmetic instructions. We can emulate both with
+poly-logarithmic slowdown, but it will turn out that they are rarely needed
+in graph algorithms. We are also going to prove that the RAM is strictly
+stronger, so we will prefer to formulate our algorithms in the PM model and
+use the RAM only when necessary.
+
+\thm
+Every program for the Word-RAM with word size~$W$ can be translated to
+a~program computing the same on the~PM with $\O(W^2)$ slowdown. If the RAM
+program does not use multiplication, division and remainder operations,
+$\O(W)$~slowdown is sufficient.
+
+\proofsketch
+Represent the memory of the RAM by a~balanced binary search tree or by
+a~radix trie of depth~$\O(W)$. Values are encoded as~linked lists of symbols
+pointed to by the nodes of the tree. Both direct and indirect accesses to
+the memory can therefore be done in~$\O(W)$ time. Use standard algorithms
+for arithmetic on big numbers: $\O(W)$ per operation except for
+multiplication, division and remainders, which take $\O(W^2)$.
+\qed
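+
+\para
+For concreteness, here is a~small C~model of the list-based arithmetic used
+in the proof sketch. It is our illustration only; an~actual PM program would
+express the same with symbol and pointer instructions. $W$-bit values are
+kept as linked lists of bits, least significant bit first, and added in
+a~single pass, i.e., in $\O(W)$ time, with no arrays and no arithmetic on
+addresses:
+
+\verbatim
+#include <stdlib.h>
+
+struct bit {
+    struct bit *next;
+    unsigned char b;             /* a symbol from the alphabet {0, 1} */
+};
+
+static struct bit *cons(unsigned char b, struct bit *next)
+{
+    struct bit *c = malloc(sizeof *c);   /* allocate a new cell */
+    c->b = b;
+    c->next = next;
+    return c;
+}
+
+/* x + y modulo 2^W, where both lists have exactly W nodes */
+struct bit *add(const struct bit *x, const struct bit *y)
+{
+    struct bit head = { 0 }, *tail = &head;
+    unsigned carry = 0;
+    for (; x != NULL; x = x->next, y = y->next) {
+        unsigned s = x->b + y->b + carry;
+        carry = s >> 1;
+        tail = tail->next = cons(s & 1, NULL);
+    }
+    return head.next;            /* the final carry is dropped */
+}
+\endverbatim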
+
+\thm
+Every program for the PM running in polynomial time can be translated to
+a~program computing the same on the Word-RAM with only $\O(1)$ slowdown.
+
+\proofsketch
+Encode each cell of the PM's memory into $\O(1)$ integers. Store the encoded
+cells in the memory of the RAM sequentially and use memory addresses as
+pointers. As the alphabet of symbols is finite and only a~polynomial number
+of cells is allocated during the execution of the program, $\O(\log N)$-bit
+integers suffice.
+\qed
+
+
+\FIXME{Mention functional programming and immutable models}
+
+\endpart
diff --git a/saga.tex b/saga.tex
index ca3f848..752d01e 100644
--- a/saga.tex
+++ b/saga.tex
@@ -3,6 +3,7 @@
 
 \input cover.tex
 \input mst.tex
+\input ram.tex
 \input notation.tex
-- 
2.39.2