-\section{Models of computation}
-
-Traditionally, computer scientists use a~variety of computational models
-as the formalisms in which their algorithms are stated. If we were studying
-NP-completeness, we could safely assume that all the models are equivalent,
-possibly up to polynomial slowdown, which is negligible. In our case, the
-differences between good and not-so-good algorithms are on a~much smaller
-scale, so we need to replace yardsticks with a~micrometer and define our
-computation model carefully.
-
-We would like to keep the formalism close enough to the reality of contemporary
-computers. This rules out Turing machines and similar sequentially addressed
-models, but even the remaining models are subtly different. For example, some of them
-allow indexing of arrays in constant time, while the others have no such operation
-and arrays have to be emulated with pointer structures, requiring $\Omega(\log n)$
-time to access a~single element of an~$n$-element array. It is hard to say which
-way is superior --- most ``real'' computers have instructions for constant-time
-indexing, but on the other hand it seems physically impossible to fulfil
-this promise for an~arbitrarily large memory. Indeed, at the level of logical
-gates, the depth of the actual indexing circuits is logarithmic in the size
-of the memory.
-
-In recent decades, most researchers in the area of combinatorial algorithms
-have considered two computational models: the Random Access Machine and the Pointer
-Machine. The former is closer to the programmer's view of a~real computer,
-the latter is slightly more restricted and ``asymptotically safe.''
-We will follow this practice and study our algorithms in both models.
-
-\para
-The \df{Random Access Machine (RAM)} is not a~single model, but rather a~family
-of closely related models, sharing the following properties.
-(See Cook and Reckhow \cite{cook:ram} for one of the common formal definitions
-and Hagerup \cite{hagerup:wordram} for a~thorough description of the differences
-between the RAM variants.)
-
-The \df{memory} of the model is represented by an~array of \df{memory cells}
-addressed by non-negative integers, each of them containing a~single non-negative integer.
-The \df{program} is a~sequence of \df{instructions} of two basic kinds: calculation
-instructions and control instructions.
-
-\df{Calculation instructions} have two source arguments and one destination
-argument, each \df{argument} being either an~immediate constant (not available
-as destination), a~directly addressed memory cell (specified by its number)
-or an~indirectly addressed memory cell (its address is stored in a~directly
-addressed memory cell).
-
-\df{Control instructions} include branches (to a~specific instruction in
-the program), conditional branches (e.g., jump if two arguments specified as
-in the calculation instructions are equal) and an~instruction to halt the program.
-
-At the beginning of the computation, the memory contains the input data
-in specified memory cells and arbitrary values in all other cells.
-Then the program is executed one instruction at a~time. When it halts,
-specified memory cells are interpreted as the program's output.
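-
-To make this description concrete, the following C~sketch (ours) emulates
-a~tiny RAM of the above kind. It is meant only to illustrate the three
-addressing modes and the shape of a~calculation instruction --- the encoding
-and the particular operation are not part of any of the cited formal definitions.
-
-\verbatim
-#include <stdio.h>
-
-#define CELLS 1024
-static unsigned long mem[CELLS];     /* the memory cells */
-
-/* The three kinds of instruction arguments described above. */
-enum mode { IMM, DIR, IND };         /* immediate, direct, indirect */
-struct arg { enum mode m; unsigned long v; };
-
-static unsigned long load(struct arg a)
-{
-    if (a.m == IMM) return a.v;      /* immediate constant */
-    if (a.m == DIR) return mem[a.v]; /* directly addressed cell */
-    return mem[mem[a.v]];            /* indirectly addressed cell */
-}
-
-static void store(struct arg a, unsigned long x)
-{
-    /* Immediate constants are not available as destinations. */
-    if (a.m == DIR) mem[a.v] = x;
-    else if (a.m == IND) mem[mem[a.v]] = x;
-}
-
-/* One calculation instruction: dest := src1 + src2. Control
- * instructions would manipulate a program counter in the same spirit. */
-static void add(struct arg dest, struct arg src1, struct arg src2)
-{
-    store(dest, load(src1) + load(src2));
-}
-
-int main(void)
-{
-    mem[0] = 7, mem[1] = 2;          /* the input */
-    struct arg d = { DIR, 2 }, a = { DIR, 0 }, b = { DIR, 1 };
-    add(d, a, b);                    /* mem[2] := mem[0] + mem[1] */
-    printf("%lu\n", mem[2]);         /* the output: 9 */
-    return 0;
-}
-\endverbatim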
-
-\para\id{wordsize}%
-In the description of the RAM family, we have omitted several properties
-on~purpose, because different members of the family define them differently.
-These are: the size of the available integers, the time complexity of a~single
-instruction, the space complexity of a~single memory cell and the repertoire
-of operations available in calculation instructions.
-
-If we impose no limits on the magnitude of the numbers and we assume that
-arithmetic and logical operations work on them in constant time, we get
-a~very powerful parallel computer --- we can emulate an~exponential number
-of parallel processors using arithmetic and suddenly almost everything can be
-computed in constant time, modulo encoding and decoding of input and output.
-Such models are unrealistic and there are two basic ways to
-avoid this behavior:
-
-\numlist\ndotted
-\:Keep unbounded numbers, but increase costs of instructions: each instruction
- consumes time proportional to the number of bits of the numbers it processes,
- including memory addresses. Similarly, space usage is measured in bits,
- counting not only the values, but also the addresses of the respective memory
- cells.
-\:Place a~limit on the size of the numbers --- define the \df{word size~$W$,}
- the number of bits available in each memory cell --- and keep the cost of
- instructions and memory cells constant. The word size must not be constant,
- since we can address only~$2^W$ cells of memory. If the input of the algorithm
- is stored in~$N$ cells, we need~$W\ge\log N$ just to be able to read the input.
- On the other hand, we are interested in polynomial-time algorithms only, so $\Theta(\log N)$-bit
- numbers should be sufficient. In practice, we pick~$W$ to be the larger of
- $\Theta(\log N)$ and the size of integers used in the algorithm's input and output.
-\endlist
-
-Both restrictions easily avoid the problems of unbounded parallelism. The first
-choice is theoretically cleaner and Cook et al.~show nice correspondences to the
-standard complexity classes, but the calculations of time and space complexity tend
-to be somewhat tedious. What is more, when compared with the RAM with restricted
-word size, the complexities are usually exactly $\Theta(W)$ times higher.
-This does not hold in general (consider a~program which uses many small numbers
-and $\O(1)$ large ones), but it is true for the algorithms we are interested in.
-Therefore we will always assume that the operations have unit cost and we make
-sure that all numbers are limited by the available word size.
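-
-A~concrete (made-up) example of this correspondence: summing $N$~numbers of
-$W$~bits each by a~simple loop takes $\Theta(N)$ steps under unit costs. Under
-the bit-counting costs, every addition and every memory access in the loop is
-charged $\Theta(W)$ --- assuming $W\ge\log N$, this also covers the addresses
-and the partial sums of $\O(W)$~bits --- giving $\Theta(NW)$ in total, exactly
-$\Theta(W)$ times more.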
-
-\para
-As for the choice of RAM operations, the following three instruction sets are used:
-
-\itemize\ibull
-\:\df{Word-RAM} --- allows the ``C-language operators'', i.e., addition,
- subtraction, multiplication, division, remainder, bitwise {\sc and,} {\sc or,} exclusive
- {\sc or} ({\sc xor}), negation ({\sc not}) and bitwise shifts ($\ll$ and~$\gg$).
-\:\df{${\rm AC}^0$-RAM} --- allows all operations from the class ${\rm AC}^0$, i.e.,
- those computable by constant-depth polynomial-size boolean circuits with unlimited
- fan-in and fan-out. This includes all operations of the Word-RAM except for multiplication,
- division and remainders, and also many other operations like computing the Hamming
- weight (the number of bits set in a~given number; see also the example below).
-\:Both restrictions at once.
-\endlist
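-
-The Hamming weight also illustrates how the two instruction sets differ:
-the ${\rm AC}^0$-RAM has it as a~primitive, while the Word-RAM can compute it
-with a~handful of its operators by the well-known ``SWAR'' trick, which
-crucially relies on multiplication. The following C~sketch (ours) shows the
-idea; the constants are hard-wired for 32-bit words, but with division they
-can be derived for an~arbitrary word size.
-
-\verbatim
-#include <stdio.h>
-
-/* Hamming weight of a 32-bit word, using C-language operators only. */
-static unsigned popcount32(unsigned x)
-{
-    x = x - ((x >> 1) & 0x55555555u);                 /* sums of 2-bit blocks */
-    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* sums of 4-bit blocks */
-    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* sums of 8-bit blocks */
-    return (x * 0x01010101u) >> 24;                   /* multiplication adds
-                                                         the blocks together */
-}
-
-int main(void)
-{
-    printf("%u\n", popcount32(0xdeadbeefu));          /* prints 24 */
-    return 0;
-}
-\endverbatim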
-
-Thorup discusses the usual techniques employed by RAM algorithms in~\cite{thorup:aczero}
-and he shows that they work on both Word-RAM and ${\rm AC}^0$-RAM, but the combination
-of the two restrictions is too weak. On the other hand, taking the intersection of~${\rm AC}^0$
-with the instruction set of modern processors (like the multimedia instructions of Intel's Pentium~4)
-is already strong enough.
-
-\FIXME{References to CPU manuals}
-
-We will therefore use the Word-RAM instruction set, mentioning differences with
-${\rm AC}^0$-RAM where necessary.
-
-\nota
-When speaking of the \df{RAM,} we implicitly mean the version with numbers limited
-by a~specified word size of $W$~bits, unit cost of operations and memory cells and the instruction
-set of the Word-RAM. This corresponds to the usage in recent algorithmic literature,
-although the authors rarely mention the details.
+\section{Ackermann's function and its inverses}\id{ackersec}%
+
+Ackermann's function is an~extremely fast-growing function introduced
+by Ackermann \cite{ackermann:function} in the context of
+computability theory. Its original purpose was to demonstrate that not every recursive
+function is also primitive recursive. At first sight, it does not
+seem related to efficient algorithms at all. Its various inverses, however, occur in
+analyses of algorithms and mathematical structures surprisingly often:
+We meet them in Section \ref{classalg} in the time complexity of the Disjoint Set Union
+data structure and also in the best known upper bound on the decision tree
+complexity of minimum spanning trees in Section \ref{optalgsect}. Another
+important application is in the complexity of Davenport--Schinzel sequences (see
+Klazar's survey \cite{klazar:gdss}), but as far as we know, these are not otherwise
+related to the topic of our study.
+
+Various sources differ in the exact definition of both Ackermann's
+function and its inverse, but most of these differences are in factors that
+are negligible in the light of the giant asymptotic growth of the function.\foot{%
+To quote Pettie \cite{pettie:onlineverify}: ``In the field of algorithms \& complexity,
+Ackermann's function is rarely defined the same way twice. We would not presume to buck
+such a~well-established precedent. Here is a~slight variant.''}
+We will use the definition by double recursion given by Tarjan \cite{tarjan:setunion},
+which is predominant in the literature on graph algorithms.
+
+\defn\id{ackerdef}%
+\df{Ackermann's function} $A(x,y)$ is a~function on non-negative integers defined as follows:
+$$\eqalign{
+A(0,y) &:= 2y, \cr
+A(x,0) &:= 0, \cr
+A(x,1) &:= 2 \quad \hbox{for $x\ge 1$}, \cr
+A(x,y) &:= A(x-1, A(x,y-1)) \quad \hbox{for $x\ge 1$, $y\ge 2$}. \cr
+}$$
+The functions $A(x,\cdot)$ are called the \df{rows} of $A(x,y)$, similarly $A(\cdot,y)$ are
+its \df{columns.}
+
+Sometimes, a~single-parameter version of this function is also used. It is defined
+as the diagonal of the previous function, i.e., $A(x):=A(x,x)$.
+
+\example
+We can try evaluating $A(x,y)$ at some points:
+$$\eqalign{
+A(x,2) &= A(x-1, A(x,1)) = A(x-1,2) = \cdots = A(0,2) = 4, \cr
+A(1,y) &= A(0, A(1,y-1)) = 2A(1,y-1) = \cdots = 2^{y-1}A(1,1) = 2^y, \cr
+A(2,y) &= A(1, A(2,y-1)) = 2^{A(2,y-1)} = 2\tower y \hbox{~~(the tower of exponentials),} \cr
+A(3,y) &= \hbox{the tower function iterated $y$~times,} \cr
+A(4,3) &= A(3,A(4,2)) = A(3,4) = A(2,A(3,3)) = A(2,A(2,A(3,2))) = \cr
+ &= A(2,A(2,4)) = 2\tower(2\tower 4) = 2\tower 65536. \cr
+}$$
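+
+The values above are easy to check mechanically. The following C~sketch (ours)
+transcribes the definition directly; as it uses machine integers, it works for
+very small arguments only --- already $A(2,5)=2\tower 5$ overflows 64-bit words.
+
+\verbatim
+#include <stdio.h>
+
+/* Ackermann's function, exactly as defined above. */
+static unsigned long long A(unsigned x, unsigned long long y)
+{
+    if (x == 0) return 2 * y;
+    if (y == 0) return 0;
+    if (y == 1) return 2;
+    return A(x - 1, A(x, y - 1));
+}
+
+int main(void)
+{
+    printf("A(1,10) = %llu\n", A(1, 10)); /* 2^10 = 1024 */
+    printf("A(2,4)  = %llu\n", A(2, 4));  /* 2 tower 4 = 65536 */
+    printf("A(3,3)  = %llu\n", A(3, 3));  /* = A(2,4) = 65536 */
+    printf("A(4,2)  = %llu\n", A(4, 2));  /* = 4 */
+    return 0;
+}
+\endverbatim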