\chapter{Ranking Combinatorial Structures}

\section{Ranking and unranking}
The techniques for building efficient data structures on the RAM described
in Chapter~\ref{ramchap} can also be used for a~variety of problems related
to ranking of combinatorial structures. Generally, the problems are stated
as follows:

\defn
Let~$C$ be a~set of objects and~$\prec$ a~linear order on~$C$. The \df{rank}
$R_{C,\prec}(x)$ of an~element $x\in C$ is the number of elements $y\in C$ such that $y\prec x$.
When~$\prec$ is defined on some superset of~$C$, we naturally extend the rank
to elements outside~$C$.
We will call the function $R_{C,\prec}$ the \df{ranking function} for $C$ and~$\prec$
and its inverse $R^{-1}_{C,\prec}$ the \df{unranking function}. When the set
and the order are clear from the context, we will use just~$R(x)$ and $R^{-1}(x)$.
\example
Let us consider the set $C_k=\{\0,\1\}^k$ of all binary strings of length~$k$ ordered
lexicographically. Then $R^{-1}(i)$ is the $i$-th smallest element of this set, that
is the number~$i$ written in binary and padded to~$k$ digits (i.e., $\(i)_k$ in the
notation of Section~\ref{bitsect}). Obviously, $R(x)$ is the integer whose binary
representation is~$x$.
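To make the example concrete, here is a minimal Python sketch (the helper names are illustrative only, not part of the text):

```python
# Rank/unrank for C_k = {0,1}^k in lexicographic order: the rank of a string
# is the integer it spells in binary, and unranking pads the binary
# expansion of i back to k digits.
def rank_binary(x: str) -> int:
    return int(x, 2)

def unrank_binary(i: int, k: int) -> str:
    return format(i, "0%db" % k)
```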
In this chapter, we will investigate how to compute the ranking and unranking
functions on different sets efficiently. Usually, we will make use of the fact
that the ranks (and hence the input and output of our algorithm) are large
numbers, so we can use integers of a~similar magnitude to represent non-trivial
data structures.

Until the end of the chapter, we will always assume that we are working on the
Random Access Machine.
%--------------------------------------------------------------------------------

\section{Ranking of permutations}
One of the most common ranking problems is ranking of permutations on the set~$[n]=\{1,2,\ldots,n\}$.
This is frequently used to create arrays indexed by permutations: for example in Ruskey's algorithm
for finding Hamilton cycles in Cayley graphs (see~\cite{ruskey:ham} and \cite{ruskey:hce})
or when exploring state spaces of combinatorial puzzles like Loyd's Fifteen puzzle \cite{ss:fifteen}.
Many other applications are surveyed by Critani et al.~in~\cite{critani:rau}; in
most cases, the time complexity of the whole algorithm is dominated by the efficiency
of the (un)ranking functions.
In most cases, the permutations are ranked according to their lexicographic order.
In fact, an~arbitrary order is often sufficient if the ranks are used solely
for indexing of arrays, but the lexicographic order has the additional advantage
of a~nice structure that allows many further operations on permutations to be
performed directly on their ranks.
Na\"\i{}ve algorithms for lexicographic ranking require time $\Theta(n^2)$ in the
worst case \cite{reingold:catp} and even on average~\cite{liehe:raulow}.
This can be easily improved to $\O(n\log n)$ by using either a~binary search
tree to calculate inversions, or a~divide-and-conquer technique, or clever
use of modular arithmetic (all three algorithms are described in
\cite{knuth:sas}). Myrvold and Ruskey \cite{myrvold:rank} mention further
improvements to $\O(n\log n/\log \log n)$ by using the RAM data structures of Dietz
\cite{dietz:oal}.
Linear time complexity was reached by Myrvold and Ruskey \cite{myrvold:rank}
for a~non-lexicographic order, which is defined locally by the history of the
data structure --- in fact, they introduce a~linear-time unranking algorithm
first and then they derive an inverse algorithm without describing the order
explicitly. However, they leave the problem of lexicographic ranking open.
We will describe a~general procedure which, when combined with suitable
RAM data structures, yields a~linear-time algorithm for lexicographic
(un)ranking of permutations.
We will view permutations on an~ordered set~$A\subseteq {\bb N}$ as ordered $\vert A\vert$-tuples
(in other words, arrays) containing every element of~$A$ exactly once. We will
use square brackets for indexing of these tuples: $\pi=(\pi[1],\ldots,\pi[\vert A\vert])$.
The corresponding lexicographic ranking and unranking functions will be denoted by~$L(\pi,A)$
and $L^{-1}(i,A)$, respectively.
Let us first observe that permutations have a~simple recursive structure.
If we fix the first element $\pi[1]$ of a~permutation~$\pi$ on the set~$[n]$, the
elements $\pi[2], \ldots, \pi[n]$ form a~permutation on $[n]-\{\pi[1]\} = \{1,\ldots,\pi[1]-1,\pi[1]+1,\ldots,n\}$.
The lexicographic order of two permutations $\pi$ and~$\pi'$ on the original set is then determined
by $\pi[1]$ and $\pi'[1]$; only if these elements are equal is the comparison decided
by the lexicographic order of the permutations $(\pi[2],\ldots,\pi[n])$ and
$(\pi'[2],\ldots,\pi'[n])$. Therefore the rank of~$\pi$ is $(\pi[1]-1)\cdot
(n-1)!$ plus the rank of $(\pi[2],\ldots,\pi[n])$.
This gives us a~reduction from (un)ranking of permutations on $[n]$ to (un)ranking
of permutations on an~$(n-1)$-element set, which suggests a~straightforward
algorithm. Unfortunately, this set is different from $[n-1]$ and it even
depends on the value of~$\pi[1]$. We could renumber the elements to get $[n-1]$,
but that would require linear time per iteration. Instead, we generalize the
problem to permutations on subsets of $[n]$. For a~permutation $\pi$ on a~set
$A\subseteq [n]$ of size~$m$, similar reasoning gives a~simple formula:
$$
L((\pi[1],\ldots,\pi[m]),A) = R_A(\pi[1]) \cdot (m-1)! +
L((\pi[2],\ldots,\pi[m]), A\setminus\{\pi[1]\}),
$$
which uses the ranking function~$R_A$ for~$A$. This formula leads to the
following algorithms (a~similar formulation can be found for example in~\cite{knuth:sas}):
\alg $\<Rank>(\pi,i,n,A)$: compute the rank of a~permutation $\pi[i\ldots n]$ on~$A$.
\id{rankalg}
\algo
\:If $i\ge n$, return~0.
\:$a\=R_A(\pi[i])$.
\:$b\=\<Rank>(\pi,i+1,n,A \setminus \{\pi[i]\})$.
\:Return $a\cdot(n-i)! + b$.
\endalgo
\>We can call $\<Rank>(\pi,1,n,[n])$ for ranking on~$[n]$, i.e., to calculate
$L(\pi,[n])$.
\alg $\<Unrank>(j,i,n,A)$: Fill $\pi[i\ldots n]$ with the $j$-th permutation on~$A$.
\id{unrankalg}
\algo
\:If $i>n$, return immediately.
\:$a\=R^{-1}_A(\lfloor j/(n-i)! \rfloor)$.
\:$\pi[i]\=a$.
\:$\<Unrank>(j\bmod (n-i)!,i+1,n,A\setminus \{a\})$.
\endalgo

\>We can call $\<Unrank>(j,1,n,[n])$ for the unranking problem on~$[n]$, i.e., to get $L^{-1}(j,[n])$.
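The two recursions can be sketched in Python as follows (iteratively, with a plain sorted list standing in for the set~$A$, so every set operation here costs $\O(n)$ rather than the $t(n)$ of a cleverer structure; the function names are illustrative only):

```python
from math import factorial

def rank_perm(pi):
    """Lexicographic rank (0-based) of a permutation pi of a set of integers."""
    A = sorted(pi)
    r = 0
    for i, x in enumerate(pi):
        a = A.index(x)                     # R_A(x): elements of A smaller than x
        r += a * factorial(len(pi) - i - 1)
        A.remove(x)                        # recurse on A \ {x}
    return r

def unrank_perm(j, n):
    """The permutation of [n] with lexicographic rank j (0-based)."""
    A = list(range(1, n + 1))
    pi = []
    for i in range(n):
        f = factorial(n - i - 1)
        pi.append(A.pop(j // f))           # R^{-1}_A(floor(j / (n-i)!))
        j %= f
    return pi
```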
The most time-consuming parts of the above algorithms are of course the operations
on the set~$A$. If we use a~data structure of a~known time complexity, the complexity
of the whole algorithm is easy to calculate:
\lemma\id{ranklemma}%
Suppose that there is a~data structure maintaining a~subset of~$[n]$ under insertion
and deletion of single elements and under ranking and unranking of elements, all in time
at most $t(n)$ per operation. Then lexicographic ranking and unranking of permutations
can be performed in time $\O(n\cdot t(n))$.
\proof
Let us analyse the above algorithms. The depth of the recursion is~$n$ and in each
nested invocation of the recursive procedure we perform a~constant number of operations.
All of them are either trivial, or factorials (which can be precomputed in~$\O(n)$ time),
or operations on the data structure.
\qed
If we store~$A$ in an~ordinary array, we have insertion and deletion in constant time,
but ranking and unranking in~$\O(n)$, so $t(n)=\O(n)$ and the algorithm is quadratic.
Binary search trees give $t(n)=\O(\log n)$. The data structure of Dietz \cite{dietz:oal}
improves this to $t(n)=\O(\log n/\log \log n)$. In fact, all these variants are equivalent
to the classical algorithms based on inversion vectors, because at the time of processing~$\pi[i]$,
the value of $R_A(\pi[i])$ is exactly the number of elements forming inversions with~$\pi[i]$.
If we relax the requirements on the data structure to allow the order of elements
to depend on the history of the structure (i.e., on the sequence of deletions performed
so far), we can implement all three operations in time $\O(1)$. We store
the values in an~array and we also keep its inverse permutation. Ranking is done by direct
indexing, unranking by indexing of the inverse permutation, and deletion by swapping
the element with the last element. We can observe that although this no longer
gives the lexicographic order, the unranking function is still the inverse of the
ranking function (the sequence of deletions from~$A$ is the same in both functions),
so this leads to $\O(n)$-time ranking and unranking in a~non-lexicographic order.
This order is the same as the one used by Myrvold and Ruskey in~\cite{myrvold:rank}.
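A sketch of this history-dependent structure (the class and method names are ours, not the authors'):

```python
# An array of the current elements plus its inverse index, supporting rank,
# unrank and delete in O(1): deletion swaps the deleted element with the
# last one, which changes the order but keeps rank and unrank inverse to
# each other.
class SwapSet:
    def __init__(self, n):
        self.val = list(range(1, n + 1))                   # val[r] = element of rank r
        self.pos = {x: i for i, x in enumerate(self.val)}  # inverse permutation

    def rank(self, x):
        return self.pos[x]

    def unrank(self, r):
        return self.val[r]

    def delete(self, x):
        i, last = self.pos[x], self.val[-1]
        self.val[i] = last          # move the last element into the hole
        self.pos[last] = i
        self.val.pop()
        del self.pos[x]
```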
However, for our purposes we need to keep the same order on~$A$ over the whole
course of the algorithm. We will make use of the representation of vectors
by integers on the RAM as developed in Section~\ref{bitsect}, but first of
all, we will note that the ranks are large numbers, so the word size of the machine
has to be large as well for them to fit:
\lemma
$\log n! = \Theta(n\log n)$, therefore the word size~$W$ must be~$\Omega(n\log n)$.

\proof
We have $n^n \ge n! \ge \lfloor n/2\rfloor^{\lfloor n/2\rfloor}$, so $n\log n \ge \log n! \ge \lfloor n/2\rfloor\cdot\log \lfloor n/2\rfloor = \Omega(n\log n)$.
\qed
\thmn{Ranking of permutations \cite{mm:rank}} When we order the permutations
on the set~$[n]$ lexicographically, both ranking and unranking can be performed on the
RAM in time~$\O(n)$.
\proof
We will store the elements of the set~$A$ as a~sorted vector. Each element has
$\O(\log n)$ bits, so the whole vector requires $\O(n\log n)$ bits, which by the
above observation fits in a~constant number of machine words. We know from
Algorithm~\ref{vecops} that ranks can be calculated in constant time in such
vectors and that insertions and deletions can be translated to ranks and
masking. Unranking, that is indexing of the vector, is masking alone.
So we can apply the previous Lemma~\ref{ranklemma} with $t(n)=\O(1)$.
\qed
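To illustrate the kind of word-parallel trick involved, here is a sketch of a constant-time \<Rank> on a packed sorted vector, simulated with Python's unbounded integers as stand-ins for long machine words. The guard-bit comparison is in the spirit of the vector operations of Section~\ref{bitsect}; the exact Algorithm~\ref{vecops} may differ in details.

```python
# Each field holds b value bits plus one guard bit. A single subtraction
# compares all fields with x at once: after (w | guards) - broadcast(x),
# field i keeps its guard bit set iff w_i >= x, and no borrow crosses
# field boundaries because every per-field result is non-negative.
def pack(values, b):
    """Pack a list of values (each < 2**b) into one integer, guard bits clear."""
    w = 0
    for i, v in enumerate(values):
        w |= v << (i * (b + 1))
    return w

def vec_rank(w, m, b, x):
    """Number of fields of the m-field packed vector w that are < x (x <= 2**b)."""
    F = 1 << b                                        # guard bit of one field
    guards = sum(F << (i * (b + 1)) for i in range(m))
    bcast  = sum(x << (i * (b + 1)) for i in range(m))
    d = (w | guards) - bcast
    survived = bin(d & guards).count("1")             # fields with w_i >= x
    return m - survived
```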
%--------------------------------------------------------------------------------

\section{Ranking of {\secitfont k\/}-permutations}
The technique from the previous section can also be generalized to ranking of
\df{$k$-permutations,} that is of ordered $k$-tuples of distinct elements drawn from the set~$[n]$,
again in lexicographic order. There are $n^{\underline k} = n\cdot(n-1)\cdot\ldots\cdot(n-k+1)$
such $k$-permutations and they have a~recursive structure similar to that of
the permutations. We will therefore use the same recursive scheme as before
(Algorithms \ref{rankalg} and \ref{unrankalg}), but we will modify both algorithms
to stop after the first~$k$ iterations. We will also replace the number $(n-i)!$
of permutations on the remaining elements by the number of $(k-i)$-permutations on the same elements,
i.e., $(n-i)^{\underline{k-i}}$. As $(n-i)^{\underline{k-i}} = (n-i) \cdot (n-i-1)^{\underline{k-i-1}}$,
we can precalculate all these numbers in linear time.
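The precalculation is a single right-to-left pass; a small sketch (the function name is hypothetical):

```python
# f[i] = (n-i) falling (k-i), the number of (k-i)-permutations of the n-i
# elements remaining after step i, computed from f[k] = 1 backwards via
# (n-i)^(k-i) = (n-i) * (n-i-1)^(k-i-1).
def falling_factorials(n, k):
    f = [1] * (k + 1)
    for i in range(k - 1, -1, -1):
        f[i] = (n - i) * f[i + 1]
    return f
```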
Unfortunately, the ranks of $k$-permutations can be much smaller, so we can no
longer rely on the same data structure fitting in a~constant number of word-sized integers.
For example, if $k=1$, the ranks are $\O(\log n)$-bit numbers, but the data
structure still requires $\Theta(n\log n)$ bits.
We therefore take a minor side step and remember the complement of~$A$ instead, that is
the set of the at most~$k$ elements we have already seen. We will call this set~$H$
(because it describes the ``holes'' in~$A$). Let us prove that $\Omega(k\log n)$ bits
are needed to represent the rank; the word size is therefore $\Omega(k\log n)$ and
the vector representation of~$H$ fits in a~constant number of words.
\lemma
The number of $k$-permutations on~$[n]$ is $2^{\Omega(k\log n)}$.

\proof
We already know that there are $n^{\underline k}$ such $k$-permutations. If $k\le n/2$,
then every term in the product is $n/2$ or more, so $\log n^{\underline k} \ge
k\cdot (\log n - 1)$. If $k\ge n/2$, then $n^{\underline k} \ge n^{\underline{\smash{n/2}}}$
and $\log n^{\underline k} \ge (n/2)(\log n - 1) \ge (k/2)(\log n - 1)$.
\qed
It remains to show how to translate the operations on~$A$ to operations on~$H$,
again stored as a~sorted vector~${\bf h}$. Insertions to~$A$ correspond to
deletions in~$H$ and vice versa. The rank of any~$x\in[n]$ in~$A$ is the number
of elements of~$[n]$ smaller than~$x$, which is $x-1$, minus the number of holes
which are smaller than~$x$, therefore $R_A(x)=x-1-R_H(x)$.
To calculate $R_H(x)$, we can again use the vector operation \<Rank> from Algorithm \ref{vecops},
this time on the vector~$\bf h$.
The only operation we cannot translate directly is unranking in~$A$. We will
therefore define an~auxiliary vector~$\bf b$ of the same size as~$\bf h$,
such that $b_i=h_i-i$ (this is the number of elements of~$A$ smaller
than~$h_i$). Now, if we want to find the element of~$A$ of rank~$r$, we find
the largest $i$ such that $b_i\le r$ (i.e., the rank of~$r+1$ in~$\bf b$) and we return
$h_i+1+r-b_i$; if no such~$i$ exists, the answer is simply $r+1$. This works
because there is no hole in~$A$ between the element
$h_i+1$, which has rank~$b_i$, and the desired element of~rank~$r$.
\example
If $A=\{2,5,6\}$ and $n=8$, then ${\bf h}=(1,3,4,7,8)$ and ${\bf b}
= (0,1,1,3,3)$. When we want to calculate $R^{-1}_A(2)$, we find $i=3$ and we
get $h_3+1+2-b_3 = 4+1+2-1 = 6$.
The vector~$\bf b$ can be updated in constant time whenever an~element is
inserted to~$\bf h$. It is sufficient to shift the fields apart (we know
that the position of the new element in~$\bf b$ is the same as in~$\bf h$),
set the new value using masking operations and decrease all higher fields
by one in parallel by using a~single subtraction. Updates after deletions
from~$\bf h$ are analogous.
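Putting the pieces together, here is a sketch of the hole representation with plain Python lists in place of the packed vectors~$\bf h$ and~$\bf b$ (so the tail update below is a loop rather than the single masked subtraction available on the RAM; insertions back into~$A$ are omitted for brevity, and the names are ours):

```python
import bisect

class HoleSet:
    """Rank/unrank on A, a subset of [n], via the sorted vector h of holes
    and the auxiliary vector b with b[i] = h[i] - (i+1)."""
    def __init__(self, n):
        self.n = n
        self.h = []                        # sorted holes (complement of A)
        self.b = []                        # elements of A below each hole

    def delete(self, x):                   # remove x from A, i.e. punch a hole
        i = bisect.bisect_left(self.h, x)
        self.h.insert(i, x)
        self.b.insert(i, x - (i + 1))
        for j in range(i + 1, len(self.h)):
            self.b[j] -= 1                 # one parallel subtraction when packed

    def rank(self, x):                     # R_A(x) = x - 1 - R_H(x)
        return x - 1 - bisect.bisect_left(self.h, x)

    def unrank(self, r):                   # element of A with rank r (0-based)
        i = bisect.bisect_right(self.b, r) # largest 1-based i with b_i <= r
        if i == 0:
            return r + 1                   # no holes below the answer
        return self.h[i - 1] + 1 + r - self.b[i - 1]
```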
\FIXME{State the algorithms properly.}
We have replaced all operations on~$A$ by the corresponding operations on the
modified data structure, each of which works again in constant time. Therefore
we have just proven the following theorem, which brings this section to
a~close:
\thmn{Ranking of $k$-permutations \cite{mm:rank}}
When we order the $k$-per\-mu\-ta\-tions on the set~$[n]$ lexicographically, both
ranking and unranking can be performed on the RAM in time~$\O(k)$.
\proof
Modify Algorithms \ref{rankalg} and \ref{unrankalg} as described in this section.
The modified data structure performs all required operations on the set~$A$ in
constant time. The depth of the recursion is $\O(k)$ and we spend $\O(1)$ time
at every level. The time bound follows.
\qed
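For concreteness, the complete $k$-permutation (un)ranking specializes as follows (again with a plain list for~$A$, so this sketch runs in $\O(kn)$ time rather than the $\O(k)$ of the theorem; function names are illustrative):

```python
def _falling(n, k):
    """f[i] = (n-i) falling (k-i), precomputed right to left."""
    f = [1] * (k + 1)
    for i in range(k - 1, -1, -1):
        f[i] = (n - i) * f[i + 1]
    return f

def rank_kperm(pi, n):
    """Lexicographic rank (0-based) of a k-permutation pi of [n]."""
    f = _falling(n, len(pi))
    A = list(range(1, n + 1))
    r = 0
    for i, x in enumerate(pi):
        r += A.index(x) * f[i + 1]        # R_A(x) * (n-i-1)^(k-i-1)
        A.remove(x)
    return r

def unrank_kperm(j, k, n):
    """The k-permutation of [n] with lexicographic rank j (0-based)."""
    f = _falling(n, k)
    A = list(range(1, n + 1))
    pi = []
    for i in range(k):
        pi.append(A.pop(j // f[i + 1]))
        j %= f[i + 1]
    return pi
```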