From e0783236c7c0618f4a3ad1f8e584423cfbb085fa Mon Sep 17 00:00:00 2001
From: Martin Mares
Date: Wed, 30 Jan 2008 13:09:23 +0100
Subject: [PATCH] Split off a chapter on computation models and added
 description of the PM.

---
 Makefile     |   2 +-
 biblio.bib   |   2 +-
 macros.tex   |   1 +
 notation.tex | 130 ------------------------------
 ram.tex      | 218 +++++++++++++++++++++++++++++++++++++++++++++++++++
 saga.tex     |   1 +
 6 files changed, 222 insertions(+), 132 deletions(-)
 create mode 100644 ram.tex

diff --git a/Makefile b/Makefile
index 9802547..6b931d1 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 all: saga.ps
 
-CHAPTERS=cover mst notation
+CHAPTERS=cover mst notation ram
 
 %.dvi: %.tex macros.tex biblio.bib
 	tex $< && if grep -q citation $*.aux ; then bibtex $* && tex $< && tex $< ; fi
diff --git a/biblio.bib b/biblio.bib
index e06b94f..e2ea71b 100644
--- a/biblio.bib
+++ b/biblio.bib
@@ -61,7 +61,7 @@
 }
 
 @article{ benamram:pm,
-  author = "Ben-Amram",
+  author = "Ben-Amram, Amir M.",
   title = "What is a ``Pointer Machine''?",
   journal = "SIGACTN: SIGACT News (ACM Special Interest Group on Automata and Computability Theory)",
   volume = "26",
diff --git a/macros.tex b/macros.tex
index 2fe07dd..2a99ea1 100644
--- a/macros.tex
+++ b/macros.tex
@@ -323,6 +323,7 @@
 \def\algn{\alg\label}
 
 \def\proof{\noindent {\sl Proof.}\enspace}
+\def\proofsketch{\noindent {\sl Proof sketch.}\enspace}
 
 %%% References %%%
diff --git a/notation.tex b/notation.tex
index 1466c9d..1f75d13 100644
--- a/notation.tex
+++ b/notation.tex
@@ -86,134 +86,4 @@
 E' &= \{ \{m(x),m(y)\} ; xy\in E \land m(x)\ne m(y) \},\cr
 m(x) &= \cases{v_e & \hbox{for $v=x,y,$}\cr v & \hbox{otherwise.}} \cr
 }$$
-
-%--------------------------------------------------------------------------------
-
-\section{Models of computation}
-
-Traditionally, computer scientists use a~variety of computational models
-for a~formalism in which their algorithms are stated. If we were studying
-NP-completeness, we could safely assume that all the models are equivalent,
-possibly up to polynomial slowdown, which is negligible. In our case, the
-differences between good and not-so-good algorithms are on a~much smaller
-scale, so we need to replace yardsticks with a~micrometer and define our
-computation model carefully.
-
-We would like to keep the formalism close enough to the reality of the contemporary
-computers. This rules out Turing machines and similar sequentially addressed
-models, but even the remaining models are subtly different. For example, some of them
-allow indexing of arrays in constant time, while the others have no such operation
-and arrays have to be emulated with pointer structures, requiring $\Omega(\log n)$
-time to access a~single element of an~$n$-element array. It is hard to say which
-way is superior --- most ``real'' computers have instructions for constant-time
-indexing, but on the other hand it seems to be physically impossible to fulfil
-this promise with an~arbitrary size of memory. Indeed, at the level of logical
-gates, the depth of the actual indexing circuits is logarithmic.
-
-In recent decades, most researchers in the area of combinatorial algorithms
-were considering two computational models: the Random Access Machine and the Pointer
-Machine. The former one is closer to the programmer's view of a~real computer,
-the latter one is slightly more restricted and ``asymptotically safe.''
-We will follow this practice and study our algorithms in both models.
-
-\para
-The \df{Random Access Machine (RAM)} is not a~single model, but rather a~family
-of closely related models, sharing the following properties.
-(See Cook and Reckhow \cite{cook:ram} for one of the common formal definitions
-and Hagerup \cite{hagerup:wordram} for a~thorough description of the differences
-between the RAM variants.)
-
-The \df{memory} of the model is represented by an~array of \df{memory cells}
-addressed by non-negative integers, each of them containing a~single non-negative integer.
-The \df{program} is a~sequence of \df{instructions} of two basic kinds: calculation
-instructions and control instructions.
-
-\df{Calculation instructions} have two source arguments and one destination
-argument, each \df{argument} being either an~immediate constant (not available
-as destination), a~directly addressed memory cell (specified by its number)
-or an~indirectly addressed memory cell (its address is stored in a~directly
-addressed memory cell).
-
-\df{Control instructions} include branches (to a~specific instruction in
-the program), conditional branches (e.g., jump if two arguments specified as
-in the calculation instructions are equal) and an~instruction to halt the program.
-
-At the beginning of the computation, the memory contains the input data
-in specified memory cells and arbitrary values in all other cells.
-Then the program is executed one instruction at a~time. When it halts,
-specified memory cells are interpreted as the program's output.
-
-\para\id{wordsize}%
-In the description of the RAM family, we have omitted several properties
-on~purpose, because different members of the family define them differently.
-These are: the size of the available integers, the time complexity of a~single
-instruction, the space complexity of a~single memory cell and the repertoire
-of operations available in calculation instructions.
-
-If we impose no limits on the magnitude of the numbers and we assume that
-arithmetic and logical operations work on them in constant time, we get
-a~very powerful parallel computer --- we can emulate an~exponential number
-of parallel processors using arithmetics and suddenly almost everything can be
-computed in constant time, modulo encoding and decoding of input and output.
-Such models are unrealistic and there are two basic possibilities how to
-avoid this behavior:
-
-\numlist\ndotted
-\:Keep unbounded numbers, but increase costs of instructions: each instruction
-  consumes time proportional to the number of bits of the numbers it processes,
-  including memory addresses. Similarly, space usage is measured in bits,
-  counting not only the values, but also the addresses of the respective memory
-  cells.
-\:Place a~limit on the size of the numbers ---define the \df{word size~$W$,}
-  the number of bits available in the memory cells--- and keep the cost of
-  of instructions and memory cells constant. The word size must not be constant,
-  since we can address only~$2^W$ cells of memory. If the input of the algorithm
-  is stored in~$N$ cells, we need~$W\ge\log N$ just to be able to read the input.
-  On the other hand, we are interested in polynomial-time algorithms only, so $\Theta(\log N)$-bit
-  numbers should be sufficient. In practice, we pick~$w$ to be the larger of
-  $\Theta(\log N)$ and the size of integers used in the algorithm's input and output.
-\endlist
-
-Both restrictions easily avoid the problems of unbounded parallelism. The first
-choice is theoretically cleaner and Cook et al.~show nice correspondences to the
-standard complexity classes, but the calculations of time and space complexity tend
-to be somewhat tedious. What more, when compared with the RAM with restricted
-word size, the complexities are usually exactly $\Theta(w)$ times higher.
-This does not hold in general (consider a~program which uses many small numbers
-and $\O(1)$ large ones), but it is true for the algorithms we are interested in.
-Therefore we will always assume that the operations have unit cost and we make
-sure that all numbers are limited by the available word size.
-
-\para
-As for the choice of RAM operations, the following three instruction sets are used:
-
-\itemize\ibull
-\:\df{Word-RAM} --- allows the ``C-language operators'', i.e., addition,
-  subtraction, multiplication, division, remainder, bitwise {\sc and,} {\sc or,} exclusive
-  {\sc or} ({\sc xor}), negation ({\sc not}) and bitwise shifts ($\ll$ and~$\gg$).
-\:\df{${\rm AC}^0$-RAM} --- allows all operations from the class ${\rm AC}^0$, i.e.,
-  those computable by constant-depth polynomial-size boolean circuits with unlimited
-  fan-in and fan-out. This includes all operations of the Word-RAM except for multiplication,
-  division and remainders, and also many other operations like computing the Hamming
-  weight (number of bits set in a~given number).
-\:Both restrictions at once.
-\endlist
-
-Thorup discusses the usual techniques employed by RAM algorithms in~\cite{thorup:aczero}
-and he shows that they work on both Word-RAM and ${\rm AC}^0$-RAM, but the combination
-of the two restrictions is too weak. On the other hand, taking the intersection of~${\rm AC}^0$
-with the instruction set of modern processors (like the multimedia instructions of Intel's Pentium4)
-is already strong enough.
-
-\FIXME{References to CPU manuals}
-
-We will therefore use the Word-RAM instruction set, mentioning differences with
-${\rm AC}^0$-RAM where necessary.
-
-\nota
-When speaking of the \df{RAM,} we implicitly mean the version with numbers limited
-by a~specified word size of $W$~bits, unit cost of operations and memory cells and the instruction
-set of the Word-RAM. This corresponds to the usage in recent algorithmic literature,
-although the authors rarely mention the details.
-
-
 \endpart
diff --git a/ram.tex b/ram.tex
new file mode 100644
index 0000000..d8e2391
--- /dev/null
+++ b/ram.tex
@@ -0,0 +1,218 @@
+\ifx\endpart\undefined
+\input macros.tex
+\fi
+
+\chapter{Fine Details of Computation}
+
+\section{Models and machines}
+
+Traditionally, computer scientists use a~variety of computational models
+as a~formalism in which their algorithms are stated. If we were studying
+NP-completeness, we could safely assume that all the models are equivalent,
+possibly up to polynomial slowdown, which is negligible. In our case, the
+differences between good and not-so-good algorithms are on a~much smaller
+scale. In this chapter, we will replace our yardsticks by a~micrometer,
+define our computation models carefully and develop a~repertoire of basic
+data structures taking advantage of the fine details of the models.
+
+We would like to keep the formalism close enough to the reality of
+contemporary computers. This rules out Turing machines and similar
+sequentially addressed models, but even the remaining models are subtly
+different. For example, some of them allow indexing of arrays in constant
+time, while others have no such operation and arrays have to be emulated
+with pointer structures, requiring $\Omega(\log n)$ time to access a~single
+element of an~$n$-element array. It is hard to say which way is superior ---
+most ``real'' computers have instructions for constant-time indexing, but on
+the other hand it seems to be physically impossible to fulfil this promise
+with an~arbitrary size of memory. Indeed, at the level of logical gates, the
+depth of the actual indexing circuits is logarithmic.
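+
+\para
+To make the difference concrete, consider the following C~sketch. It is our
+illustration only: the function names and the trie layout are hypothetical
+and belong to no formal model, and we assume a~verbatim display macro for
+typesetting code. The $i$-th element of an~$n$-element array is read once
+directly and once through a~complete binary trie over the bits of the index,
+paying one pointer step per bit:
+
+\verbatim
+#include <stddef.h>
+
+struct node {                     /* one vertex of the pointer structure */
+    struct node *child[2];        /* inner nodes: two children */
+    int value;                    /* leaves: the stored element */
+};
+
+int get_direct(const int *a, size_t i)
+{
+    return a[i];                  /* a single indexed load: O(1) on a RAM */
+}
+
+int get_pointer(const struct node *root, size_t i, int depth)
+{
+    for (int b = depth - 1; b >= 0; b--)   /* Theta(log n) pointer steps */
+        root = root->child[(i >> b) & 1];
+    return root->value;
+}
+\endverbatim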
+
+In recent decades, most researchers in the area of combinatorial algorithms
+have considered two computational models: the Random Access Machine and the
+Pointer Machine. The former is closer to the programmer's view of a~real
+computer, while the latter is slightly more restricted and ``asymptotically
+safe.'' We will follow this practice and study our algorithms in both models.
+
+\para
+The \df{Random Access Machine (RAM)} is not a~single model, but rather a~family
+of closely related models, sharing the following properties.
+(See Cook and Reckhow \cite{cook:ram} for one of the common formal definitions
+and Hagerup \cite{hagerup:wordram} for a~thorough description of the differences
+between the RAM variants.)
+
+The \df{memory} of the model is represented by an~array of \df{memory cells}
+addressed by non-negative integers, each of them containing a~single
+non-negative integer. The \df{program} is a~sequence of \df{instructions}
+of two basic kinds: calculation instructions and control instructions.
+
+\df{Calculation instructions} have two source arguments and one destination
+argument, each \df{argument} being either an~immediate constant (not available
+as a~destination), a~directly addressed memory cell (specified by its number),
+or an~indirectly addressed memory cell (its address is stored in a~directly
+addressed memory cell).
+
+\df{Control instructions} include branches (to a~specific instruction in
+the program), conditional branches (e.g., jump if two arguments specified as
+in the calculation instructions are equal) and an~instruction to halt the
+program.
+
+At the beginning of the computation, the memory contains the input data
+in specified memory cells and arbitrary values in all other cells.
+Then the program is executed one instruction at a~time. When it halts,
+specified memory cells are interpreted as the program's output.
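+
+\para
+The following C~sketch of a~single calculation step is again only an~illustration
+of ours; the instruction encoding is hypothetical, and merely the three operand
+modes come from the definition above. An~addition instruction fetches its two
+sources, adds them and stores the result:
+
+\verbatim
+#include <stdint.h>
+
+enum mode { IMM, DIRECT, INDIRECT };          /* the three argument kinds */
+
+struct arg { enum mode m; uint64_t x; };      /* hypothetical encoding */
+
+static uint64_t mem[1 << 20];                 /* the array of memory cells */
+
+static uint64_t fetch(struct arg a)
+{
+    switch (a.m) {
+    case IMM:     return a.x;                 /* immediate constant */
+    case DIRECT:  return mem[a.x];            /* cell given by its number */
+    default:      return mem[mem[a.x]];       /* address stored in a cell */
+    }
+}
+
+static void store(struct arg a, uint64_t v)   /* no immediate destinations */
+{
+    if (a.m == DIRECT) mem[a.x] = v;
+    else               mem[mem[a.x]] = v;
+}
+
+void exec_add(struct arg dst, struct arg s1, struct arg s2)
+{
+    store(dst, fetch(s1) + fetch(s2));        /* dst := s1 + s2 */
+}
+\endverbatim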
+
+\para\id{wordsize}%
+In the description of the RAM family, we have omitted several properties
+on~purpose, because different members of the family define them differently.
+These are: the size of the available integers, the time complexity of a~single
+instruction, the space complexity of a~single memory cell and the repertoire
+of operations available in calculation instructions.
+
+If we impose no limits on the magnitude of the numbers and we assume that
+arithmetic and logical operations work on them in constant time, we get
+a~very powerful parallel computer --- we can emulate an~exponential number
+of parallel processors using arithmetic and suddenly almost everything can be
+computed in constant time, modulo encoding and decoding of input and output.
+Such models are unrealistic and there are two basic ways to avoid this
+behavior:
+
+\numlist\ndotted
+\:Keep unbounded numbers, but increase the costs of instructions: each
+  instruction consumes time proportional to the number of bits of the numbers
+  it processes, including memory addresses. Similarly, space usage is measured
+  in bits, counting not only the values, but also the addresses of the
+  respective memory cells.
+\:Place a~limit on the size of the numbers --- define the \df{word size~$W$,}
+  the number of bits available in the memory cells --- and keep the cost of
+  instructions and memory cells constant. The word size cannot be constant,
+  since we can address only~$2^W$ cells of memory. If the input of the
+  algorithm is stored in~$N$ cells, we need~$W\ge\log N$ just to be able to
+  read the input. On the other hand, we are interested in polynomial-time
+  algorithms only, so $\Theta(\log N)$-bit numbers should be sufficient.
+  In practice, we pick~$W$ to be the larger of $\Theta(\log N)$ and the size
+  of integers used in the algorithm's input and output.
+\endlist
+
+Both restrictions easily avoid the problems of unbounded parallelism. The
+first choice is theoretically cleaner and Cook et al.~show nice
+correspondences to the standard complexity classes, but the calculations of
+time and space complexity tend to be somewhat tedious. What is more, when
+compared with the RAM with restricted word size, the complexities are usually
+exactly $\Theta(W)$ times higher. This does not hold in general (consider
+a~program which uses many small numbers and $\O(1)$ large ones), but it is
+true for the algorithms we are interested in. Therefore we will always assume
+that the operations have unit cost and we make sure that all numbers are
+limited by the available word size.
+
+\para
+As for the choice of RAM operations, the following three instruction sets
+are often used:
+
+\itemize\ibull
+\:\df{Word-RAM} --- allows the ``C-language operators'', i.e., addition,
+  subtraction, multiplication, division, remainder, bitwise {\sc and,}
+  {\sc or,} exclusive {\sc or} ({\sc xor}), negation ({\sc not}) and bitwise
+  shifts ($\ll$ and~$\gg$).
+\:\df{${\rm AC}^0$-RAM} --- allows all operations from the class ${\rm AC}^0$,
+  i.e., those computable by constant-depth polynomial-size boolean circuits
+  with unlimited fan-in and fan-out. This includes all operations of the
+  Word-RAM except for multiplication, division and remainders, and also many
+  other operations like computing the Hamming weight (the number of bits set
+  in a~given number).
+\:Both restrictions at once.
+\endlist
+
+Thorup discusses the usual techniques employed by RAM algorithms
+in~\cite{thorup:aczero} and he shows that they work on both the Word-RAM and
+the ${\rm AC}^0$-RAM, but that the combination of the two restrictions is too
+weak. On the other hand, the intersection of~${\rm AC}^0$ with the instruction
+set of modern processors (e.g., adding some floating-point operations and
+multimedia instructions present on Intel's Pentium~4) is already strong
+enough.
+
+\FIXME{References to CPU manuals}
+
+We will therefore use the Word-RAM instruction set, mentioning differences
+from the ${\rm AC}^0$-RAM where necessary.
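+
+\para
+The Hamming weight mentioned above is a~standard example of how the two
+instruction sets differ in practice: where an~instruction set provides it
+directly, one native operation suffices, while the Word-RAM can compute it
+from its C-language operators by the folklore bit-parallel trick shown below.
+The sketch is well known and not taken from the cited papers; it fixes
+$W=64$, and its last step uses multiplication, which is exactly one of the
+operations absent from the ${\rm AC}^0$-RAM:
+
+\verbatim
+#include <stdint.h>
+
+unsigned hamming_weight(uint64_t x)
+{
+    x = x - ((x >> 1) & 0x5555555555555555ULL);      /* sums of bit pairs */
+    x = (x & 0x3333333333333333ULL)
+      + ((x >> 2) & 0x3333333333333333ULL);          /* 4-bit field sums */
+    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;      /* per-byte sums */
+    return (x * 0x0101010101010101ULL) >> 56;        /* add up the bytes */
+}
+\endverbatim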
+
+\nota
+When speaking of the \df{RAM,} we implicitly mean the version with numbers
+limited by a~specified word size of $W$~bits, unit cost of operations and
+memory cells, and the instruction set of the Word-RAM. This corresponds to
+the usage in recent algorithmic literature, although the authors rarely
+mention the details.
+
+\para
+The \df{Pointer Machine (PM)} also does not have any well-established
+definition. The various kinds of pointer machines are surveyed by Ben-Amram
+in~\cite{benamram:pm}, but unlike the RAM variants, they turn out to be
+equivalent up to constant slowdown.
+Our definition will be closely related to the {\em linking automaton} proposed
+by Knuth in~\cite{knuth:fundalg}; we only adapt it to use RAM-like
+instructions instead of an~opaque control unit.
+
+The PM works with two different types of data: \df{symbols} from a~finite
+alphabet and \df{pointers}. The memory of the machine consists of a~fixed
+number of \df{registers} (some of them hold a~single symbol each, the others
+a~single pointer each) and an~arbitrary number of \df{cells}. The structure
+of all cells is the same: each of them again contains a~fixed number of
+fields for symbols and pointers. Registers can be addressed directly, while
+cells can be accessed only via pointers --- either by using a~pointer stored
+in a~register, or in a~cell pointed to by a~register (longer chains of
+pointers cannot be followed in constant time).
+
+We can therefore view the whole memory as a~directed graph, whose vertices
+correspond to the cells (the registers are stored in a~single special cell).
+The outgoing edges of each vertex correspond to pointer fields and they are
+labelled with distinct labels drawn from a~finite set. In addition to that,
+each vertex contains a~fixed number of symbols. The program can directly
+access vertices within distance~2 from the register vertex.
+
+The program is a~sequence of instructions of the following kinds:
+
+\itemize\ibull
+\:\df{symbol instructions,} which read a~pair of symbols, apply an~arbitrary
+  function to them and write the result to a~symbol register or field;
+\:\df{pointer instructions} for assignment of pointers to pointer
+  registers/fields and for creation of new memory cells (a~pointer to the
+  new cell is assigned immediately);
+\:\df{control instructions} --- similar to those of the RAM; conditional
+  jumps can decide on~arbitrary unary relations on symbols and compare
+  pointers for equality.
+\endlist
+
+Time and space complexity are defined in the straightforward way: all
+instructions have unit cost and so do all memory cells.
+
+Both the input and the output of the machine are passed in the form of
+a~linked structure pointed to by a~designated register. For example, we can
+pass graphs back and forth without having to encode them as strings of
+numbers or symbols. This is important, because with the finite alphabet of
+the~PM, all symbolic representations of graphs require super-linear space
+and therefore also super-linear time.
+
+\para
+Compared to the RAM, the PM lacks two important capabilities: indexing of
+arrays and arithmetic instructions. We can emulate both with
+poly-logarithmic slowdown, but it will turn out that they are rarely needed
+in graph algorithms. We are also going to prove that the RAM is strictly
+stronger, so we will prefer to formulate our algorithms in the PM model and
+use the RAM only when necessary.
+
+\thm
+Every program for the Word-RAM with word size~$W$ can be translated to
+a~program computing the same on the~PM with $\O(W^2)$ slowdown. If the RAM
+program does not use multiplication, division and remainder operations,
+$\O(W)$~slowdown is sufficient.
+
+\proofsketch
+Represent the memory of the RAM by a~balanced binary search tree or by
+a~radix trie of depth~$\O(W)$. Values are encoded as~linked lists of symbols
+pointed to by the nodes of the tree. Both direct and indirect accesses to
+the memory can therefore be done in~$\O(W)$ time. Use standard algorithms
+for arithmetic on big numbers: $\O(W)$ per operation except for
+multiplication, division and remainders, which take $\O(W^2)$.
+\qed
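+
+\para
+For concreteness, here is a~small C~model of the list-based arithmetic used
+in the proof sketch. It is our illustration only; an~actual PM program would
+express the same with symbol and pointer instructions. $W$-bit values are
+kept as linked lists of bits, least significant bit first, and added in
+a~single pass, i.e., in $\O(W)$ time, with no arrays and no arithmetic on
+addresses:
+
+\verbatim
+#include <stdlib.h>
+
+struct bit {
+    struct bit *next;
+    unsigned char b;             /* a symbol from the alphabet {0, 1} */
+};
+
+static struct bit *cons(unsigned char b, struct bit *next)
+{
+    struct bit *c = malloc(sizeof *c);   /* allocate a new cell */
+    c->b = b;
+    c->next = next;
+    return c;
+}
+
+/* x + y modulo 2^W, where both lists have exactly W nodes */
+struct bit *add(const struct bit *x, const struct bit *y)
+{
+    struct bit head = { 0 }, *tail = &head;
+    unsigned carry = 0;
+    for (; x != NULL; x = x->next, y = y->next) {
+        unsigned s = x->b + y->b + carry;
+        carry = s >> 1;
+        tail = tail->next = cons(s & 1, NULL);
+    }
+    return head.next;            /* the final carry is dropped */
+}
+\endverbatim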
+
+\thm
+Every program for the PM running in polynomial time can be translated to
+a~program computing the same on the Word-RAM with only $\O(1)$ slowdown.
+
+\proofsketch
+Encode each cell of the PM's memory into $\O(1)$ integers. Store the encoded
+cells in the memory of the RAM sequentially and use memory addresses as
+pointers. As the alphabet of symbols is finite and only a~polynomial number
+of cells is allocated during the execution of the program, $\O(\log N)$-bit
+integers suffice.
+\qed
+
+
+\FIXME{Mention functional programming and immutable models}
+
+\endpart
diff --git a/saga.tex b/saga.tex
index ca3f848..752d01e 100644
--- a/saga.tex
+++ b/saga.tex
@@ -3,6 +3,7 @@
 
 \input cover.tex
 \input mst.tex
+\input ram.tex
 \input notation.tex
-- 
2.39.2