From: Martin Mares Date: Tue, 29 Jan 2008 22:35:58 +0000 (+0100) Subject: More RAM for you. X-Git-Tag: printed~266 X-Git-Url: http://mj.ucw.cz/gitweb/?a=commitdiff_plain;h=786b715f6f60744b90f73ad7d283277a40c4aeca;p=saga.git More RAM for you. --- diff --git a/biblio.bib b/biblio.bib index 852cc0e..e06b94f 100644 --- a/biblio.bib +++ b/biblio.bib @@ -524,3 +524,15 @@ inproceedings{ pettie:minirand, publisher = {Springer-Verlag}, address = {London, UK}, } + +@inproceedings{ cook:ram, + author = {Stephen A. Cook and Robert A. Reckhow}, + title = {Time-bounded random access machines}, + booktitle = {STOC '72: Proceedings of the fourth annual ACM symposium on Theory of computing}, + year = {1972}, + pages = {73--80}, + location = {Denver, Colorado, United States}, + doi = {http://doi.acm.org/10.1145/800152.804898}, + publisher = {ACM}, + address = {New York, NY, USA}, +} diff --git a/notation.tex b/notation.tex index 7095226..1466c9d 100644 --- a/notation.tex +++ b/notation.tex @@ -41,6 +41,7 @@ \n{$2\tower n$}{the tower function (iterated exponential): $2\tower 0:=1$, $2\tower (n+1):=2^{2\tower n}$} \n{$\log^* n$}{the iterated logarithm: $\log^*n := \min\{i: \log^{(i)}n < 1\}$; the inverse of~$2\tower n$} \n{$\beta(m,n)$}{$\beta(m,n) := \min\{i: \log^{(i)}n < m/n \}$ \[itjarthm]} +\n{$W$}{word size of the RAM \[wordsize]} } %-------------------------------------------------------------------------------- @@ -106,87 +107,89 @@ time to access a~single element of an~$n$-element array. It is hard to say which way is superior --- most ``real'' computers have instructions for constant-time indexing, but on the other hand it seems to be physically impossible to fulfil this promise with an~arbitrary size of memory. Indeed, at the level of logical -gates, the depth of the actual indexing circuit is logarithmic. +gates, the depth of the actual indexing circuits is logarithmic. 
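The iterated logarithm and tower function added to the notation table can be sketched directly from their definitions. This is an illustrative aside, not part of the thesis sources; note that it follows the strict inequality $\log^{(i)}n < 1$ used in the definition above.

```python
# An illustrative sketch of the notation-table definitions:
# log* n = min { i : log^(i) n < 1 }  and  2^^0 = 1, 2^^(n+1) = 2^(2^^n).
import math

def log_star(n):
    """How many times log2 must be applied to n before the value drops below 1."""
    i = 0
    x = float(n)
    while x >= 1:
        x = math.log2(x)
        i += 1
    return i

def tower(n):
    """The tower function 2^^n (iterated exponential)."""
    return 1 if n == 0 else 2 ** tower(n - 1)

print(log_star(tower(4)))   # 5: the strict inequality needs one more application
```

With the strict inequality, $\log^* 2\tower n = n+1$ rather than $n$, which is why the inverse relationship in the table is stated only up to this off-by-one.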
In recent decades, most researchers in the area of combinatorial algorithms -consider two computational models: the Random Access Machine and the Pointer -Machine. The former one is closer to the programmer's view of a~computer, -the latter one is a~little more restricted and ``asymptotically safe.'' +have considered two computational models: the Random Access Machine and the Pointer +Machine. The former one is closer to the programmer's view of a~real computer, +the latter one is slightly more restricted and ``asymptotically safe.'' We will follow this practice and study our algorithms in both models. \para -The \df{Random Access Machines (RAMs)} are a~family of computational models -which share the following properties. (For one of the possible formal -definitions, see~\cite{knuth:fundalg}.) +The \df{Random Access Machine (RAM)} is not a~single model, but rather a~family +of closely related models, sharing the following properties. +(See Cook and Reckhow \cite{cook:ram} for one of the common formal definitions +and Hagerup \cite{hagerup:wordram} for a~thorough description of the differences +between the RAM variants.) The \df{memory} of the model is represented by an~array of \df{memory cells} -addressed by non-negative integers, each of them containing a~single integer. +addressed by non-negative integers, each of them containing a~single non-negative integer. The \df{program} is a~sequence of \df{instructions} of two basic kinds: calculation instructions and control instructions. \df{Calculation instructions} have two source arguments and one destination -argument, the arguments being either immediate constants (not available +argument, each \df{argument} being either an~immediate constant (not available as destination), a~directly addressed memory cell (specified by its number) or an~indirectly addressed memory cell (its address is stored in a~directly addressed memory cell). 
\df{Control instructions} include branches (to a~specific instruction in -the program), conditional branches (jump if two arguments specified as -in the calculation instructions are equal and so on) and an~instruction -to halt the program. +the program), conditional branches (e.g., jump if two arguments specified as +in the calculation instructions are equal) and an~instruction to halt the program. At the beginning of the computation, the memory contains the input data -in specified memory cells and undefined values in all other cells. -Then the program is executed one instruction at a~time. When it stops, -some specified memory cells are interpreted as the program's output. +in specified memory cells and arbitrary values in all other cells. +Then the program is executed one instruction at a~time. When it halts, +specified memory cells are interpreted as the program's output. -\para +\para\id{wordsize}% In the description of the RAM family, we have omitted several properties on~purpose, because different members of the family define them differently. -The differences are: the size of the numbers we can calculate with, the time -complexity of a~single instruction, the memory complexity of a~single memory -cell and the repertoire of operations available in calculation instructions. +These are: the size of the available integers, the time complexity of a~single +instruction, the space complexity of a~single memory cell and the repertoire +of operations available in calculation instructions. 
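The machine just described can be made concrete with a toy interpreter. The instruction encoding below is our own illustration, not part of any of the cited formal definitions; for simplicity, cells not holding input read as zero instead of an arbitrary value.

```python
# A toy interpreter for the RAM family described above (illustrative only).
# Memory maps non-negative addresses to non-negative integers; calculation
# instructions take two source arguments and one destination, and arguments
# are immediate ("imm"), direct ("dir") or indirect ("ind").

def fetch(mem, arg):
    kind, x = arg
    if kind == "imm":                 # immediate constant
        return x
    if kind == "dir":                 # directly addressed memory cell
        return mem.get(x, 0)
    if kind == "ind":                 # address stored in a directly addressed cell
        return mem.get(mem.get(x, 0), 0)

def run(program, mem):
    pc = 0                            # program counter
    while True:
        op, *args = program[pc]
        if op == "halt":              # control: stop, memory holds the output
            return mem
        if op == "add":               # calculation: dst <- src1 + src2
            a, b, dst = args
            mem[dst] = fetch(mem, a) + fetch(mem, b)
        elif op == "jeq":             # control: conditional branch (jump if equal)
            a, b, target = args
            if fetch(mem, a) == fetch(mem, b):
                pc = target
                continue
        elif op == "jmp":             # control: unconditional branch
            pc = args[0]
            continue
        pc += 1

# Sum the n numbers in cells 10..10+n-1 (n is in cell 0) into cell 1,
# walking the array via the indirect pointer in cell 2.
program = [
    ("add", ("imm", 10), ("imm", 0), 2),   # ptr <- 10
    ("add", ("imm", 10), ("dir", 0), 3),   # end <- 10 + n
    ("jeq", ("dir", 2), ("dir", 3), 6),    # if ptr == end, jump to halt
    ("add", ("dir", 1), ("ind", 2), 1),    # acc <- acc + memory[ptr]
    ("add", ("dir", 2), ("imm", 1), 2),    # ptr <- ptr + 1
    ("jmp", 2),
    ("halt",),
]
print(run(program, {0: 3, 10: 5, 11: 7, 12: 9})[1])   # 21
```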
If we impose no limits on the magnitude of the numbers and we assume that arithmetic and logical operations work on them in constant time, we get -a~very powerful parallel computer --- we can emulate an~arbitrary number -of processors using arithmetics and suddenly almost everything can be +a~very powerful parallel computer --- we can emulate an~exponential number +of parallel processors using arithmetic and suddenly almost everything can be computed in constant time, modulo encoding and decoding of input and output. Such models are unrealistic and there are two basic possibilities how to -avoid them: +avoid this behavior: \numlist\ndotted -\:Keep unlimited numbers, but increase cost of instructions: each instruction +\:Keep unbounded numbers, but increase the cost of instructions: each instruction consumes time proportional to the number of bits of the numbers it processes, - including memory addresses. Similarly, memory consumption is measured in bits, + including memory addresses. Similarly, space usage is measured in bits, counting not only the values, but also the addresses of the respective memory cells. -\:Place a~limit on the size of the numbers. It must not be constant, since - such machines would be able to address only a~constant amount of memory. - On the other hand, we are interested in polynomial-time algorithms only, so - $\Theta(\log n)$-bit numbers, where~$n$ is the size of the input, should be sufficient. - Then we can keep the cost of instructions and memory cells constant. +\:Place a~limit on the size of the numbers --- define the \df{word size~$W$,} + the number of bits available in the memory cells --- and keep the cost + of instructions and memory cells constant. The word size must not be constant, + since we can address only~$2^W$ cells of memory. If the input of the algorithm + is stored in~$N$ cells, we need~$W\ge\log N$ just to be able to read the input. 
+ On the other hand, we are interested in polynomial-time algorithms only, so $\Theta(\log N)$-bit + numbers should be sufficient. In practice, we pick~$W$ to be the larger of + $\Theta(\log N)$ and the size of integers used in the algorithm's input and output. \endlist -\FIXME{Mention the word size parameter and cite \cite{hagerup:wordram}} - -Both restrictions avoid the problems of unbounded parallelism. The first -choice is theoretically cleaner, but it makes the calculations of time and -space complexity somewhat tedious. What more, these calculations usually result in both -complexities being exactly $\Theta(\log n)$ times higher that with the second -choice. This does not hold in general (consider a~program which uses many -small numbers and $\O(1)$ large ones), but it is true for the algorithms we are -interested in. Therefore we will always assume that the operations have unit -cost and we make sure that all numbers are limited either by $\O(\log n)$ bits -or by the size of numbers on the algorithm's input, whatever is bigger. +Both restrictions easily avoid the problems of unbounded parallelism. The first +choice is theoretically cleaner and Cook and Reckhow show nice correspondences to the +standard complexity classes, but the calculations of time and space complexity tend +to be somewhat tedious. Moreover, when compared with the RAM with restricted +word size, the complexities are usually exactly $\Theta(W)$ times higher. +This does not hold in general (consider a~program which uses many small numbers +and $\O(1)$ large ones), but it is true for the algorithms we are interested in. +Therefore we will always assume that the operations have unit cost and we make +sure that all numbers are limited by the available word size. 
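The parallel power of unit-cost arithmetic on unbounded numbers can be glimpsed in a few lines. This sketch shows only the flavor of the emulation (carry-free packed addition); the exponential-processor construction alluded to above is considerably more involved.

```python
# A glimpse of unbounded parallelism (illustrative sketch, not the full
# emulation): packing k small numbers into one big integer lets a single
# unit-cost addition act as k independent additions, provided each field
# is wide enough that carries never cross field boundaries.

def pack(values, field_bits):
    x = 0
    for i, v in enumerate(values):
        assert 0 <= v < 1 << (field_bits - 1)   # reserve one bit so sums stay in-field
        x |= v << (i * field_bits)
    return x

def unpack(x, field_bits, k):
    mask = (1 << field_bits) - 1
    return [(x >> (i * field_bits)) & mask for i in range(k)]

a = pack([1, 2, 3, 4], 16)
b = pack([10, 20, 30, 40], 16)
print(unpack(a + b, 16, 4))   # [11, 22, 33, 44] -- four additions for the price of one
```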
-As for the choice of RAM operations, the following three variants are usually -considered: +\para +As for the choice of RAM operations, the following three instruction sets are used: \itemize\ibull \:\df{Word-RAM} --- allows the ``C-language operators'', i.e., addition, subtraction, multiplication, division, remainder, bitwise {\sc and,} {\sc or,} exclusive - {\sc or} ({\sc xor}) and negation ({\sc not}). + {\sc or} ({\sc xor}), negation ({\sc not}) and bitwise shifts ($\ll$ and~$\gg$). \:\df{${\rm AC}^0$-RAM} --- allows all operations from the class ${\rm AC}^0$, i.e., those computable by constant-depth polynomial-size boolean circuits with unlimited fan-in and fan-out. This includes all operations of the Word-RAM except for multiplication, @@ -195,15 +198,22 @@ considered: \:Both restrictions at once. \endlist -As shown by Thorup in \cite{thorup:aczero}, for the usual purposes the first two choices -are equivalent, while the third one is strictly weaker. We will therefore use the -Word-RAM instruction set, mentioning differences of ${\rm AC}^0$-RAM where necessary. +Thorup discusses the usual techniques employed by RAM algorithms in~\cite{thorup:aczero} +and he shows that they work on both Word-RAM and ${\rm AC}^0$-RAM, but the combination +of the two restrictions is too weak. On the other hand, taking the intersection of~${\rm AC}^0$ +with the instruction set of modern processors (like the multimedia instructions of Intel's Pentium4) +is already strong enough. + +\FIXME{References to CPU manuals} + +We will therefore use the Word-RAM instruction set, mentioning differences with +${\rm AC}^0$-RAM where necessary. \nota -When speaking of the \df{RAM model,} we implicitly mean the version with limited numbers, -unit cost of operations and memory cells and the instruction set of the Word-RAM. -This corresponds to the usage in recent algorithmic literature, although the -authors rarely mention the details. 
+When speaking of the \df{RAM,} we implicitly mean the version with numbers limited +by a~specified word size of $W$~bits, unit cost of operations and memory cells and the instruction +set of the Word-RAM. This corresponds to the usage in recent algorithmic literature, +although the authors rarely mention the details. \endpart
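The Word-RAM semantics with a fixed word size $W$ can be mimicked in software by truncating every result to $W$ bits. The helper names below are ours; this is a sketch of the cost model, not a definition from the literature.

```python
# Mimicking W-bit Word-RAM cells (illustrative sketch; helper names are ours):
# every result is truncated to W bits, so cells always hold values in 0..2^W-1.
W = 8
MASK = (1 << W) - 1       # 2^W - 1

def wadd(a, b): return (a + b) & MASK      # addition modulo 2^W
def wsub(a, b): return (a - b) & MASK      # subtraction modulo 2^W
def wmul(a, b): return (a * b) & MASK      # multiplication modulo 2^W
def wshl(a, k): return (a << k) & MASK     # left shift, high bits fall off
def wshr(a, k): return a >> k              # right shift
def wnot(a):    return ~a & MASK           # bitwise negation within W bits

print(wadd(200, 100), wshl(1, 7), wnot(0))   # 44 128 255
```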