rank.tex

   1 \ifx\endpart\undefined
   2 \input macros.tex
   3 \fi
   4
   5 \chapter{Ranking Combinatorial Structures}
   6
   7 \section{Ranking and unranking}
   8
   9 The techniques for building efficient data structures on the RAM described
  10 in Chapter~\ref{ramchap} can be also used for a~variety of problems related
  11 to ranking of combinatorial structures. Generally, the problems are stated
  12 in the following way:
  13
  14 \defn\id{rankdef}%
  15 Let~$C$ be a~set of objects and~$\prec$ a~linear order on~$C$. The \df{rank}
  16 $R_{C,\prec}(x)$ of an~element $x\in C$ is the number of elements $y\in C$ such that $y\prec x$.
  17 When~$\prec$ is defined on some superset of~$C$, we naturally extend the rank
  18 to elements outside~$C$.
  19 We will call the function $R_{C,\prec}$ the \df{ranking function} for $C$ and~$\prec$
  20 and its inverse $R^{-1}_{C,\prec}$ the \df{unranking function}. When the set
  21 and the order are clear from the context, we will use just~$R(x)$ and $R^{-1}(x)$.
  22
  23 \example
  24 Let us consider the set $C_k=\{\0,\1\}^k$ of all binary strings of length~$k$ ordered
  25 lexicographically. Then $R^{-1}(i)$ is the $i$-th smallest element of this set, that
  26 is the number~$i$ written in binary and padded to~$k$ digits (i.e., $\(i)_k$ in the
  27 notation of Section~\ref{bitsect}). Obviously, $R(x)$ is the integer whose binary
  28 representation is~$x$.
  29
  30 \para
  31 In this chapter, we will investigate how to compute the ranking and unranking
  32 functions on different sets. Usually, we will make use of the fact that the ranks
  33 (and hence the input and output of our algorithm) are large numbers, so we can
  34 use the integers of a~similar magnitude to represent non-trivial data structures.
  35 This will allow us to design efficient algorithms on the RAM.
  36
  37 \FIXME{We will assume RAM in the whole chapter.}
  38
  39 \section{Ranking of permutations}
  40 \id{pranksect}
  41
  42 One of the most common ranking problems is ranking of permutations on the set~$[n]=\{1,2,\ldots,n\}$.
  43 This is frequently used to create arrays indexed by permutations: for example in Ruskey's algorithm
  44 for finding Hamilton cycles in Cayley graphs (see~\cite{ruskey:ham} and \cite{ruskey:hce})
  45 or when exploring state spaces of combinatorial puzzles like the Loyd's Fifteen \cite{ss:fifteen}.
  46 Many other applications are surveyed by Critani et al.~in~\cite{critani:rau} and in
  47 most cases, the time complexity of the whole algorithm is limited by the efficiency
  48 of the (un)ranking functions.
  49
  50 In most cases, the permutations are ranked according to their lexicographic order.
  51 In fact, an~arbitrary order is often sufficient if the ranks are used solely
  52 for indexing of arrays, but the lexicographic order has an~additional advantage
  53 of a~nice structure allowing many additional operations on permutations to be
  54 performed directly on their ranks.
  55
  56 Na\"\i{}ve algorithms for lexicographic ranking require time $\Theta(n^2)$ in the
  57 worst case \cite{reingold:catp} and even on average~\cite{liehe:raulow}.
  58 This can be easily improved to $O(n\log n)$ by using either a binary search
  59 tree to calculate inversions, or by a divide-and-conquer technique, or by clever
  60 use of modular arithmetic (all three algorithms are described in
  61 \cite{knuth:sas}). Myrvold and Ruskey \cite{myrvold:rank} mention further
  62 improvements to $O(n\log n/\log \log n)$ by using the RAM data structures of Dietz
  63 \cite{dietz:oal}.
  64
  65 Linear time complexity was reached by Myrvold and Ruskey \cite{myrvold:rank}
  66 for a~non-lexicographic order, which is defined locally by the history of the
  67 data structure --- in fact, they introduce a linear-time unranking algorithm
  68 first and then they derive an inverse algorithm without describing the order
  69 explicitly. However, they leave the problem of lexicographic ranking open.
  70
  71 We will describe a~general procedure which, when combined with suitable
  72 RAM data structures, yields a~linear-time algorithm for lexicographic
  73 (un)ranking.
  74
  75 \nota\id{brackets}%
  76 We will view permutations on a~ordered set~$A\subseteq {\bb N}$ as ordered $\vert A\vert$-tuples
  77 (in other words, arrays) containing every element of~$A$ exactly once. We will
  78 use square brackets for indexing of these tuples: $\pi=(\pi[1],\ldots,\pi[\vert A\vert])$.
  79 The corresponding lexicographic ranking and unranking functions will be denoted by~$L(\pi,A)$
  80 and $L^{-1}(i,A)$ respectively.
  81
  82 \obs
  83 Let us first observe that permutations have a simple recursive structure.
  84 If we fix the first element $\pi[1]$ of a~permutation~$\pi$ on the set~$[n]$, the
  85 elements $\pi[2], \ldots, \pi[n]$ form a~permutation on $[n]-\{\pi[1]\} = \{1,\ldots,\pi[1]-1,\pi[1]+1,\ldots,n\}$.
  86 The lexicographic order of two permutations $\pi$ and~$\pi'$ on the original set is then determined
  87 by $\pi[1]$ and $\pi'[1]$ and only if these elements are equal, it is decided
  88 by the lexicographic comparison of permutations $(\pi[2],\ldots,\pi[n])$ and
  89 $(\pi'[2],\ldots,\pi'[n])$. Therefore the rank of $\pi$ is $(\pi[1]-1)\cdot
  90 (n-1)!$ plus the rank of $(\pi[2],\ldots,\pi[n])$.
  91
  92 This gives us a~reduction from (un)ranking of permutations on $[n]$ to (un)ranking
  93 of permutations on a $(n-1)$-element set, which suggests a straightforward
  94 algorithm, but unfortunately this set is different from $[n-1]$ and it even
  95 depends on the value of~$\pi[1]$. We could renumber the elements to get $[n-1]$,
  96 but it would require linear time per iteration. Instead, we generalize the
  97 problem to permutations on subsets of $[n]$. For a permutation $\pi$ on a~set
  98 $A\subseteq [n]$ of size~$m$, similar reasoning gives a~simple formula:
  99 $$
 100 L((\pi[1],\ldots,\pi[m]),A) = R_A(\pi[1]) \cdot (m-1)! +
 101 L((\pi[2],\ldots,\pi[m]), A\setminus\{\pi[1]\}),
 102 $$
 103 which uses the ranking function~$R_A$ for~$A$. This formula leads to the
 104 following algorithms (a~similar formulation can be found for example in~\cite{knuth:sas}):
 105
 106 \alg $\<Rank>(\pi,i,n,A)$: compute the rank of a~permutation $\pi[i\ldots n]$ on~$A$.
 107 \algo
 108 \:If $i\ge n$, return~0.
 109 \:$a\=R_A(\pi[i])$.
 110 \:$b\=\<Rank>(\pi,i+1,n,A \setminus \{\pi[i]\})$.
 111 \:Return $a\cdot(n-i)! + b$.
 112 \endalgo
 113
 114 \>We can call $\<Rank>(\pi,1,n,[n])$ for ranking on~$[n]$, i.e., to calculate
 115 $L(\pi,[n])$.
 116
 117 \alg $\<Unrank>(j,i,n,A)$: Fill $\pi[i,\ldots,n]$ with the $j$-th permutation on~$A$.
 118 \algo
 119 \:If $i>n$, return immediately.
 120 \:$a\=R^{-1}_A(\lfloor j/(n-i)! \rfloor)$.
 121 \:$\pi\=\<Unrank>(j\bmod (n-i)!,i+1,n,A\setminus \{a\})$.
 122 \:$\pi[i]\=a$.
 123 \endalgo
 124
 125 \>We can call $\<Unrank>(j,1,n,[n])$ for the unranking problem on~$[n]$, i.e., to get $L^{-1}(j,[n])$.
 126
 127 \para
 128 The most time-consuming parts of the above algorithms are of course operations
 129 on the set~$A$. If we use a~data structure of a~known time complexity, the complexity
 130 of the whole algorithm is easy to calculate:
 131
 132 \lemma\id{ranklemma}%
 133 Suppose that there is a~data structure maintaining a~subset of~$[n]$ under insertion
 134 and deletion of single elements and ranking and unranking of elements, all in time
 135 at most $t(n)$ per operation. Then lexicographic ranking and unranking of permutations
 136 can be performed in time $\O(n\cdot t(n))$.
 137
 138 \proof
 139 Let us analyse the above algorithms. The depth of the recursion is~$n$ and in each
 140 nested invokation of the recursive procedure we perform a~constant number of operations.
 141 All of them are either trivial or factorials (which can be precomputed in~$\O(n)$ time)
 142 or operations on the data structure.
 143 \qed
 144
 145 \example
 146 If we store~$A$ in an~ordinary array, we have insertion and deletion in constant time,
 147 but ranking and unranking in~$\O(n)$, so $t(n)=\O(n)$ and the algorithm is quadratic.
 148 Binary search trees give $t(n)=\O(\log n)$. The data structure of Dietz \cite{dietz:oal}
 149 improves it to $t(n)=O(\log n/\log \log n)$. In fact, all these variants are equivalent
 150 to the classical algorithms based on inversion vectors, because at the time of processing~$\pi[i]$,
 151 the value of $R_A(\pi[i])$ is exactly the number of elements forming inversions with~$\pi[i]$.
 152
 153 If we relax the requirements on the data structure to allow ordering of elements
 154 dependent on the history of the structure (i.e., on the sequence of deletes performed
 155 so far), we can implement all three operations in time $\O(1)$. We store
 156 the values in an array and we also keep an inverse permutation. Ranking is done by direct
 157 indexing, unranking by indexing of the inverse permutation and deleting by swapping
 158 the element with the last element. We can observe that although it no longer
 159 gives the lexicographic order, the unranking function is still the inverse of the
 160 ranking function (the sequence of deletes from~$A$ is the same in both functions),
 161 so this leads to $\O(n)$-time ranking and unranking in a~non-lexicographic order.
 162 This order is the same as the one used by Myrvold and Ruskey in~\cite{myrvold:rank}.
 163
 164 However, for our purposes we need to keep the same order on~$A$ over the whole
 165 course of the algorithm. We will make use of the representation of vectors
 166 by integers on the RAM as developed in Section~\ref{bitsect}, but first of
 167 all, we will note that the ranks are large numbers, so the word size of the machine
 168 has to be large as well for them to fit:
 169
 170 \obs
 171 $\log n! = \Theta(n\log n)$, therefore the word size~$W$ must be~$\Omega(n\log n)$.
 172
 173 \proof
 174 We have $n^n \ge n! \ge \lfloor n/2\rfloor^{\lfloor n/2\rfloor}$, so $n\log n \ge \log n! \ge \lfloor n/2\rfloor\cdot\log \lfloor n/2\rfloor$.
 175 \qed
 176
 177 \thmn{Ranking of permutations \cite{mm:rank}} When we order the permutations
 178 on the set~$[n]$ lexicographically, both ranking and unranking can be performed on the
 179 RAM in time~$\O(n)$.
 180
 181 \proof
 182 We will store the elements of the set~$A$ as a~sorted vector. Each element has
 183 $\O(\log n)$ bits, so the whole vector requires $\O(n\log n)$ bits, which by the
 184 above observation fits in a~constant number of machine words. We know from
 185 Algorithm~\ref{vecops} that ranks can be calculated in constant time in such
 186 vectors and that insertions and deletions can be translated to ranks and
 187 masking. Unranking, that is indexing of the vector, is masking alone.
 188 So we can apply the previous Lemma~\ref{ranklemma} with $t(n)=\O(1)$.
 189 \qed
 190
 191 \endpart