From d9c45baa67e7655522cae7e99d062814a47d9e47 Mon Sep 17 00:00:00 2001
From: Martin Mares
Date: Wed, 30 Jan 2008 15:08:04 +0100
Subject: [PATCH] Finish PM.

---
 biblio.bib | 15 +++++++++++++++
 ram.tex    | 54 ++++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/biblio.bib b/biblio.bib
index e2ea71b..2adbf87 100644
--- a/biblio.bib
+++ b/biblio.bib
@@ -536,3 +536,18 @@ @inproceedings{ pettie:minirand,
   publisher = {ACM},
   address = {New York, NY, USA},
 }
+
+@book{ okasaki:funcds,
+  author = {Chris Okasaki},
+  title = {Purely Functional Data Structures},
+  year = {1999},
+  publisher = {Cambridge University Press}
+}
+
+@book{ intel:pentium,
+  author = {{Intel Corp.}},
+  title = {Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2: Instruction Set Reference},
+  year = {2007},
+  publisher = {Intel Corp.},
+  xxx = {available online at http://www.intel.com/products/processor/manuals/index.htm}
+}
diff --git a/ram.tex b/ram.tex
index d8e2391..c798933 100644
--- a/ram.tex
+++ b/ram.tex
@@ -9,10 +9,10 @@
 Traditionally, computer scientists use a~variety of computational models
 for a~formalism in which their algorithms are stated. If we were studying
 NP-completeness, we could safely assume that all the models are equivalent,
-possibly up to polynomial slowdown, which is negligible. In our case, the
+possibly up to polynomial slowdown which is negligible. In our case, the
 differences between good and not-so-good algorithms are on a~much smaller
-scale. In this chapter, we will replace our yardsticks by a~micrometer,
-define our computation models carefully and develop a repertoire of basic
+scale. In this chapter, we will replace the usual ``yardsticks'' by a~micrometer,
+state our computation models carefully and develop a~repertoire of basic
 data structures taking advantage of the fine details of the models.
 
 We would like to keep the formalism close enough to the reality of the contemporary
@@ -21,13 +21,13 @@ models, but even the remaining models are subtly different. For example, some of
 allow indexing of arrays in constant time, while the others have no such operation
 and arrays have to be emulated with pointer structures, requiring $\Omega(\log n)$
 time to access a~single element of an~$n$-element array. It is hard to say which
-way is superior --- most ``real'' computers have instructions for constant-time
-indexing, but on the other hand it seems to be physically impossible to fulfil
-this promise with an~arbitrary size of memory. Indeed, at the level of logical
-gates, the depth of the actual indexing circuits is logarithmic.
+way is superior --- while most ``real'' computers have instructions for constant-time
+indexing, it seems to be physically impossible to fulfil this promise regardless of
+the size of memory. Indeed, at the level of logical gates, the depth of the
+actual indexing circuits is logarithmic.
 
 In recent decades, most researchers in the area of combinatorial algorithms
-were considering two computational models: the Random Access Machine and the Pointer
+have been considering two computational models: the Random Access Machine and the Pointer
 Machine. The former one is closer to the programmer's view of a~real computer,
 the latter one is slightly more restricted and ``asymptotically safe.''
 We will follow this practice and study our algorithms in both models.
 
@@ -35,7 +35,7 @@ We will follow this practice and study our algorithms in both models.
 \para
 The \df{Random Access Machine (RAM)} is not a~single model, but rather
 a~family of closely related models, sharing the following properties.
-(See Cook and Reckhow \cite{cook:ram} for one of the common formal definitions
+(See Cook and Reckhow \cite{cook:ram} for one of the usual formal definitions
 and Hagerup \cite{hagerup:wordram} for a~thorough description of the differences
 between the RAM variants.)
 
@@ -119,10 +119,7 @@ Thorup discusses the usual techniques employed by RAM algorithms in~\cite{thorup
 and he shows that they work on both Word-RAM and ${\rm AC}^0$-RAM, but the combination
 of the two restrictions is too weak. On the other hand, the intersection of~${\rm AC}^0$
 with the instruction set of modern processors (e.g., adding some floating-point
-operations and multimedia instructions present on the Intel's Pentium4) is
-already strong enough.
-
-\FIXME{References to CPU manuals}
+operations and multimedia instructions available on Intel's Pentium~4~\cite{intel:pentium}) is already strong enough.
 
 We will therefore use the Word-RAM instruction set, mentioning differences from the
 ${\rm AC}^0$-RAM where necessary.
@@ -187,7 +184,8 @@ formulate our algorithms in the PM model and use RAM only when necessary.
 
 \thm
 Every program for the Word-RAM with word size~$W$ can be translated to a~program
-computing the same on the~PM with $\O(W^2)$ slowdown. If the RAM program does not
+computing the same\foot{Given a~suitable encoding of inputs and outputs, of course.}
+on the~PM with $\O(W^2)$ slowdown. If the RAM program does not
 use multiplication, division and remainder operations, $\O(W)$~slowdown
 is sufficient.
 
@@ -197,9 +195,12 @@ trie of depth~$\O(W)$. Values are encoded as~linked lists of symbols pointed
 to by the nodes of the tree. Both direct and indirect accesses to the memory
 can therefore be done in~$\O(W)$ time. Use standard algorithms for arithmetic
 on big numbers: $\O(W)$ per operation except for multiplication, division and
-remainders which take $\O(W^2)$.
+remainders which take $\O(W^2)$.\foot{We could use more efficient arithmetic
+algorithms, of course, but the quadratic bound is good enough for our purposes.}
 \qed
 
+\FIXME{Add references.}
+
 \thm
 Every program for the PM running in polynomial time can be translated to a~program
 computing the same on the Word-RAM with only $\O(1)$ slowdown.
@@ -211,8 +212,29 @@ are finite and there is only a~polynomial number of cells allocated during execu
 of the program, $\O(\log N)$-bit integers suffice.
 \qed
 
+\rem
+There is one more interesting machine: the \df{Immutable Pointer Machine} (see
+the description of LISP machines in \cite{benamram:pm}). It differs from the
+ordinary PM by the inability to modify existing memory cells. Only the contents
+of the registers are allowed to change. All cell modifications thus have to
+be performed by creating a~copy of the particular cell with some fields changed.
+This in turn requires the pointers to the cell to be updated, possibly triggering
+a~cascade of cell copies. For example, when a~node of a~binary search tree is
+updated, all nodes on the path from that node to the root have to be copied.
+
+One of the advantages of this model is that the states of the machine are
+persistent --- it is possible to return to a~previously visited state by recalling
+the $\O(1)$ values of the registers (everything else could not have changed
+since that time) and ``fork'' the computations. This corresponds to the semantics
+of pure functional languages, e.g., Haskell.
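+
+For concreteness, the following Haskell fragment shows the path copying at work
+on a~simple binary search tree. (This is only a~minimal sketch invented for this
+remark, not code taken from any existing source; the \verbatim display is assumed
+to be provided by the surrounding macros.) Inserting a~key rebuilds just the
+nodes on the search path and shares all remaining subtrees, so the old and the
+new version of the tree both stay accessible:
+
+\verbatim
+-- Illustrative sketch only: persistent insertion via path copying.
+data Tree a = Leaf | Node (Tree a) a (Tree a)
+
+insert :: Ord a => a -> Tree a -> Tree a
+insert x Leaf = Node Leaf x Leaf
+insert x t@(Node l y r)
+  | x < y     = Node (insert x l) y r   -- copy this node, share r
+  | x > y     = Node l y (insert x r)   -- copy this node, share l
+  | otherwise = t                       -- key already present, share everything
+
+-- Example: t0 remains valid after the update; both versions coexist.
+twoVersions :: (Tree Int, Tree Int)
+twoVersions = let t0 = foldr insert Leaf [5, 3, 8]
+              in (t0, insert 4 t0)
+\endverbatim
+
+Note that \|insert| allocates one new node per level of the search path, which
+is exactly the cascade of cell copies described above.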
+
+Unless we are willing to accept a~logarithmic penalty in execution time and space
+(in fact, our emulation of the Word-RAM on the PM can be easily made immutable),
+the design of efficient algorithms for the immutable PM requires very different
+techniques. Therefore, we will concentrate on the imperative models instead
+and refer the interested reader to the thorough treatment of purely functional
+data structures in Okasaki's book~\cite{okasaki:funcds}.
 
-\FIXME{Mention functional programming and immutable models}
 \endpart
-- 
2.39.5