rank.tex

   1 \ifx\endpart\undefined
   2 \input macros.tex
   3 \fi
   4
   5 \chapter{Ranking Combinatorial Structures}
   6
   7 \section{Ranking and unranking}
   8
   9 The techniques for building efficient data structures on the RAM described
  10 in Chapter~\ref{ramchap} can be also used for a~variety of problems related
  11 to ranking of combinatorial structures. Generally, the problems are stated
  12 in the following way:
  13
  14 \defn\id{rankdef}%
  15 Let~$C$ be a~set of objects and~$\prec$ a~linear order on~$C$. The \df{rank}
  16 $R_{C,\prec}(x)$ of an~element $x\in C$ is the number of elements $y\in C$ such that $y\prec x$.
  17 We will call the function $R_{C,\prec}$ the \df{ranking function} for $C$ ordered by~$\prec$
  18 and its inverse $R^{-1}_{C,\prec}$ the \df{unranking function} for $C$ and~$\prec$. When the set
  19 and the order are clear from the context, we will use plain~$R(x)$ and $R^{-1}(x)$.
  20 Also, when $\prec$ is defined on a~superset~$C'$ of~$C$, we naturally extend $R_C(x)$
  21 to elements $x\in C'\setminus C$.
  22
  23 \example
  24 Let us consider the set $C_k=\{\0,\1\}^k$ of all binary strings of length~$k$ ordered
  25 lexicographically. Then $R^{-1}(i)$ is the $i$-th smallest element of this set, that
  26 is the number~$i$ written in binary and padded to~$k$ digits (i.e., $\(i)_k$ in the
  27 notation of Section~\ref{bitsect}). Obviously, $R(x)$ is the integer whose binary
  28 representation is the string~$x$.
  29
  30 \para
  31 In this chapter, we will investigate how to compute the ranking and unranking
  32 functions for different sets efficiently. Usually, we will make use of the fact
  33 that the ranks (and hence the input and output of our algorithm) are large
  34 numbers, so we can use the integers of a~similar magnitude to represent non-trivial
  35 data structures.
  36
  37 \para
  38 Until the end of the chapter, we will always assume that our model of computation
  39 is the Random Access Machine (more specifically, the Word-RAM).
  40
  41 %--------------------------------------------------------------------------------
  42
  43 \section{Ranking of permutations}
  44 \id{pranksect}
  45
  46 One of the most common ranking problems is ranking of permutations on the set~$[n]=\{1,2,\ldots,n\}$.
  47 This is frequently used to create arrays indexed by permutations: for example in Ruskey's algorithm
  48 for finding Hamilton cycles in Cayley graphs (see~\cite{ruskey:ham} and \cite{ruskey:hce})
  49 or when exploring state spaces of combinatorial puzzles like the Loyd's Fifteen \cite{ss:fifteen}.
  50 Many other applications are surveyed by Critani et al.~in~\cite{critani:rau} and in
  51 most cases, the time complexity of the whole algorithm is limited by the efficiency
  52 of the (un)ranking functions.
  53
  54 The permutations are usually ranked according to their lexicographic order.
  55 In fact, an~arbitrary order is often sufficient if the ranks are used solely
  56 for indexing of arrays. The lexicographic order however has an~additional advantage
  57 of a~nice structure, which allows various operations on permutations to be
  58 performed directly on their ranks.
  59
  60 Na\"\i{}ve algorithms for lexicographic ranking require time $\Theta(n^2)$ in the
  61 worst case \cite{reingold:catp} and even on average~\cite{liehe:raulow}.
  62 This can be easily improved to $O(n\log n)$ by using either a binary search
  63 tree to calculate inversions, or by a divide-and-conquer technique, or by clever
  64 use of modular arithmetic (all three algorithms are described in
  65 \cite{knuth:sas}). Myrvold and Ruskey \cite{myrvold:rank} mention further
  66 improvements to $O(n\log n/\log \log n)$ by using the RAM data structures of Dietz
  67 \cite{dietz:oal}.
  68
  69 Linear time complexity was reached by Myrvold and Ruskey \cite{myrvold:rank}
  70 for a~non-lexicographic order, which is defined locally by the history of the
  71 data structure --- in fact, they introduce a linear-time unranking algorithm
  72 first and then they derive an inverse algorithm without describing the order
  73 explicitly. However, they leave the problem of lexicographic ranking open.
  74
  75 We will describe a~general procedure which, when combined with suitable
  76 RAM data structures, yields a~linear-time algorithm for lexicographic
  77 (un)ranking.
  78
  79 \nota\id{brackets}%
  80 We will view permutations on a~set $A\subseteq {\bb N}$ as ordered $\vert A\vert$-tuples
  81 (in other words, arrays) containing every element of~$A$ exactly once. We will
  82 use square brackets to index these tuples: $\pi=(\pi[1],\ldots,\pi[\vert A\vert])$.
  83 The corresponding lexicographic ranking and unranking functions will be denoted by~$L(\pi,A)$
  84 and $L^{-1}(i,A)$ respectively.
  85
  86 \obs
  87 Let us first observe that permutations have a simple recursive structure.
  88 If we fix the first element $\pi[1]$ of a~permutation~$\pi$ on the set~$[n]$, the
  89 elements $\pi[2], \ldots, \pi[n]$ form a~permutation on $[n]-\{\pi[1]\} = \{1,\ldots,\pi[1]-1,\pi[1]+1,\ldots,n\}$.
  90 The lexicographic order of two permutations $\pi$ and~$\pi'$ on the original set is then determined
  91 by $\pi[1]$ and $\pi'[1]$ and only if these elements are equal, it is decided
  92 by the lexicographic comparison of permutations $(\pi[2],\ldots,\pi[n])$ and
  93 $(\pi'[2],\ldots,\pi'[n])$. Moreover, for fixed~$\pi[1]$ all permutations on
  94 the smaller set occur exactly once, so the rank of $\pi$ is $(\pi[1]-1)\cdot
  95 (n-1)!$ plus the rank of $(\pi[2],\ldots,\pi[n])$.
  96
  97 This gives us a~reduction from (un)ranking of permutations on $[n]$ to (un)ranking
  98 of permutations on a $(n-1)$-element set, which suggests a straightforward
  99 algorithm, but unfortunately this set is different from $[n-1]$ and it even
 100 depends on the value of~$\pi[1]$. We could renumber the elements to get $[n-1]$,
 101 but it would require linear time per iteration. To avoid this, we generalize the
 102 problem to permutations on subsets of $[n]$. For a permutation $\pi$ on a~set
 103 $A\subseteq [n]$ of size~$m$, similar reasoning gives a~simple formula:
 104 $$
 105 L((\pi[1],\ldots,\pi[m]),A) = R_A(\pi[1]) \cdot (m-1)! +
 106 L((\pi[2],\ldots,\pi[m]), A\setminus\{\pi[1]\}),
 107 $$
 108 which uses the ranking function~$R_A$ for~$A$. This recursive formula immediately
 109 translates to the following recursive algorithms for both ranking and unranking
 110 (described for example in \cite{knuth:sas}):
 111
 112 \alg $\<Rank>(\pi,i,n,A)$: compute the rank of a~permutation $\pi[i\ldots n]$ on~$A$.
 113 \id{rankalg}
 114 \algo
 115 \:If $i\ge n$, return~0.
 116 \:$a\=R_A(\pi[i])$.
 117 \:$b\=\<Rank>(\pi,i+1,n,A \setminus \{\pi[i]\})$.
 118 \:Return $a\cdot(n-i)! + b$.
 119 \endalgo
 120
 121 \>We can call $\<Rank>(\pi,1,n,[n])$ for ranking on~$[n]$, i.e., to calculate
 122 $L(\pi,[n])$.
 123
 124 \alg $\<Unrank>(j,i,n,A)$: Return an~array~$\pi$ such that $\pi[i,\ldots,n]$ is the $j$-th permutation on~$A$.
 125 \id{unrankalg}
 126 \algo
 127 \:If $i>n$, return $(0,\ldots,0)$.
 128 \:$a\=R^{-1}_A(\lfloor j/(n-i)! \rfloor)$.
 129 \:$\pi\=\<Unrank>(j\bmod (n-i)!,i+1,n,A\setminus \{a\})$.
 130 \:$\pi[i]\=a$.
 131 \:Return~$\pi$.
 132 \endalgo
 133
 134 \>We can call $\<Unrank>(j,1,n,[n])$ for the unranking problem on~$[n]$, i.e., to get $L^{-1}(j,[n])$.
 135
 136 \para
 137 The most time-consuming parts of the above algorithms are of course operations
 138 on the set~$A$. If we store~$A$ in a~data structure of a~known time complexity, the complexity
 139 of the whole algorithm is easy to calculate:
 140
 141 \lemma\id{ranklemma}%
 142 Suppose that there is a~data structure maintaining a~subset of~$[n]$ under a~sequence
 143 of deletions, which supports ranking and unranking of elements, and that
 144 the time complexity of a~single operation is at most~$t(n)$.
 145 Then lexicographic ranking and unranking of permutations can be performed in time $\O(n\cdot t(n))$.
 146
 147 \proof
 148 Let us analyse the above algorithms. The depth of the recursion is~$n$ and in each
 149 nested invokation of the recursive procedure we perform a~constant number of operations.
 150 All of them are either trivial, or calculations of factorials (which can be precomputed in~$\O(n)$ time),
 151 or operations on the data structure.
 152 \qed
 153
 154 \example
 155 If we store~$A$ in an~ordinary array, we have insertion and deletion in constant time,
 156 but ranking and unranking in~$\O(n)$, so $t(n)=\O(n)$ and the algorithm is quadratic.
 157 Binary search trees give $t(n)=\O(\log n)$. The data structure of Dietz \cite{dietz:oal}
 158 improves it to $t(n)=O(\log n/\log \log n)$. In fact, all these variants are equivalent
 159 to the classical algorithms based on inversion vectors, because at the time of processing~$\pi[i]$,
 160 the value of $R_A(\pi[i])$ is exactly the number of elements forming inversions with~$\pi[i]$.
 161
 162 \para
 163 To obtain linear time complexity, we will make use of the representation of
 164 vectors by integers on the RAM as developed in Section~\ref{bitsect}, but first
 165 of all, we will make sure that the ranks are large numbers, so the word size of the
 166 machine has to be large as well:
 167
 168 \obs
 169 $\log n! = \Theta(n\log n)$, therefore the word size~$W$ must be~$\Omega(n\log n)$.
 170
 171 \proof
 172 We have $n^n \ge n! \ge \lfloor n/2\rfloor^{\lfloor n/2\rfloor}$, so $n\log n \ge \log n! \ge \lfloor n/2\rfloor\cdot\log \lfloor n/2\rfloor$.
 173 \qed
 174
 175 \thmn{Lexicographic ranking of permutations \cite{mm:rank}}
 176 When we order the permutations on the set~$[n]$ lexicographically, both ranking
 177 and unranking can be performed on the RAM in time~$\O(n)$.
 178
 179 \proof
 180 We will store the elements of the set~$A$ in a~sorted vector. Each element has
 181 $\O(\log n)$ bits, so the whole vector takes $\O(n\log n)$ bits, which by the
 182 above observation fits in a~constant number of machine words. We know from
 183 Algorithm~\ref{vecops} that ranks can be calculated in constant time in such
 184 vectors and that insertions and deletions can be translated to ranks and
 185 masking. Unranking, that is indexing of the vector, is masking alone.
 186 So we can apply the previous Lemma \ref{ranklemma} with $t(n)=\O(1)$.
 187 \qed
 188
 189 \rem
 190 We can also easily derive the non-lexicographic linear-time algorithm of Myrvold
 191 and Ruskey~\cite{myrvold:rank} from our algorithm. We will relax the requirements
 192 on the data structure to allow order of elements dependent on the history of the
 193 structure (i.e., on the sequence of deletes performed so far). We can observe that
 194 although the algorithm no longer gives the lexicographic ranks, the unranking function
 195 is still an~inverse of the ranking function, because the sequence of deletes
 196 from~$A$ is the same when both ranking and unraking.
 197
 198 The implementation of the relaxed structure is straightforward. We store the set~$A$
 199 in an~array~$\alpha$ and use the order of the elements in~$\alpha$ determine the
 200 order on~$A$. We will also maintain an~``inverse'' array $\alpha^{-1}$ such that
 201 $\alpha[\alpha^{-1}[x]]=x$ for every~$x\in A$. Ranking and unranking can be performed
 202 by a~simple lookup in these arrays: $R_A(x)=\alpha^{-1}[x]$, $R^{-1}(i)=\alpha[i]$.
 203 When we want to delete an~element, we exchange it with the last element in the
 204 array~$\alpha$ and update~$\alpha^{-1}$ accordingly.
 205
 206
 207 %--------------------------------------------------------------------------------
 208
 209 \section{Ranking of {\secitfont k\/}-permutations}
 210 \id{kpranksect}
 211
 212 The technique from the previous section can be also generalized to lexicographic ranking of
 213 \df{$k$-permutations,} that is of ordered $k$-tuples drawn from the set~$[n]$.
 214 There are $n^{\underline k} = n\cdot(n-1)\cdot\ldots\cdot(n-k+1)$
 215 such $k$-permutations and they have a~recursive structure similar to the one of
 216 the permutations. We will therefore use the same recursive scheme as before
 217 (algorithms \ref{rankalg} and \ref{unrankalg}), but we will modify the first step of both algorithms
 218 to stop after the first~$k$ iterations. We will also replace the number $(n-i)!$
 219 of permutations on the remaining elements by the number of $(k-i)$-permutations on the same elements,
 220 i.e., by $(n-i)^{\underline{k-i}}$. As $(n-i)^{\underline{k-i}} = (n-i) \cdot (n-i-1)^{\underline{k-i-1}}$,
 221 we can precalculate all these numbers in linear time.
 222
 223 Unfortunately, the ranks of $k$-permutations can be much smaller, so we can no
 224 longer rely on the same data structure fitting in a constant number of word-sized integers.
 225 For example, if $k=1$, the ranks are $\O(\log n)$-bit numbers, but the data
 226 structure still requires $\Theta(n\log n)$ bits.
 227
 228 We do a minor side step by remembering the complement of~$A$ instead, that is
 229 the set of the at most~$k$ elements we have already seen. We will call this set~$H$
 230 (because it describes the ``holes'' in~$A$). Let us prove that $\Omega(k\log n)$ bits
 231 are needed to represent the rank, so the vector representation of~$H$ fits in
 232 a~constant number of words.
 233
 234 \lemma
 235 The number of $k$-permutations on~$[n]$ is $2^{\Omega(k\log n)}$.
 236
 237 \proof
 238 We already know that there $n^{\underline k}$ such $k$-permutations. If $k\le n/2$,
 239 then every term in the product is $n/2$ or more, so $\log n^{\underline k} \ge
 240 k\cdot (\log n - 1)$. If $k\ge n/2$, then $n^{\underline k} \ge n^{\underline{\smash{n/2}}}$
 241 and $\log n^{\underline k} \ge (n/2)(\log n - 1) \ge (k/2)(\log n - 1)$.
 242 \qed
 243
 244 \para
 245 It remains to show how to translate the operations on~$A$ to operations on~$H$,
 246 again stored as a~sorted vector~${\bf h}$. Insertion to~$A$ correspond to
 247 deletion from~$H$ and vice versa. The rank of any~$x\in[n]$ in~$A$ is $x$ minus
 248 the number of holes which are smaller than~$x$, therefore $R_A(x)=x-R_H(x)$.
 249 To calculate $R_H(x)$, we can again use the vector operation \<Rank> from Algorithm \ref{vecops},
 250 this time on the vector~$\bf h$.
 251
 252 The only operation we cannot translate directly is unranking in~$A$. We will
 253 therefore define an~auxiliary vector~$\bf b$ of the same size as~$\bf h$,
 254 such that $b_i=h_i-i$ (this is the number of elements of~$A$ smaller
 255 than~$h_i$). Now, if we want to find the $r$-th smallest element of~$A$, we find
 256 the largest $i$ such that $b_i<r$ (the rank of~$r$ in~$\bf b$) and we return
 257 $h_i+1+r-b_i$. This works because there is no hole in~$A$ between the element
 258 $h_i+1$, which has rank~$b_i$, and the desired element of~rank~$r$.
 259
 260 \example
 261 If $A=\{2,5,6\}$ and $n=8$, then ${\bf h}=(1,3,4,7,8)$ and ${\bf b}
 262 = (0,1,1,3,3)$. When we want to calculate $R^{-1}_A(2)$, we find $i=3$ and we
 263 get $g_3+1+2-b_3 = 4+1+2-1 = 6$.
 264
 265 \para
 266 The vector~$\bf b$ can be updated in constant time whenever an~element is
 267 inserted to~$\bf h$. It is sufficient to shift the fields apart (we know
 268 that the position of the new element in~$\bf b$ is the same as in~$\bf h$),
 269 insert the new value using masking operations and decrease all higher fields
 270 by one in parallel by using a~single subtraction. Updates after deletions
 271 from~$\bf h$ are analogous.
 272
 273 \FIXME{State the algorithms properly.}
 274
 275 We have replaced all operations on~$A$ by the corresponding operations on the
 276 modified data structure, each of which works again in constant time. Therefore
 277 we have just proven the following theorem, which brings this section to
 278 a~happy ending:
 279
 280 \thmn{Lexicographic ranking of $k$-permutations \cite{mm:rank}}
 281 When we order the $k$-per\-mu\-ta\-tions on the set~$[n]$ lexicographically, both
 282 ranking and unranking can be performed on the RAM in time~$\O(k)$.
 283
 284 \proof
 285 We modify algorithms \ref{rankalg} and \ref{unrankalg} as described in this section.
 286 The modified data structure performs all required operations on the set~$A$ in
 287 constant time. The depth of the recursion is $\O(k)$ and we spend $\O(1)$ time
 288 at every level. The time bound follows.
 289 \qed
 290
 291 \endpart