5 \chapter{Approaching Optimality}
9 Recently, Chazelle \cite{chazelle:ackermann} and Pettie \cite{pettie:ackermann}
10 have presented algorithms for the MST with worst-case time complexity
11 $\O(m\timesalpha(m,n))$. We will devote this chapter to their results
12 and especially to another algorithm by Pettie and Ramachandran \cite{pettie:optimal},
13 which is provably optimal.
15 At the very heart of all these algorithms lies the \df{soft heap} discovered by
16 Chazelle \cite{chazelle:softheap}. It is a~meldable priority queue, roughly
17 similar to the Vuillemin's binomial heaps \cite{vuillemin:binheap} or Fredman's
18 and Tarjan's Fibonacci heaps \cite{ft:fibonacci}. The soft heaps run faster at
19 the expense of \df{corrupting} a~fraction of the inserted elements by raising
20 their values (the values are however never decreased). This allows for
21 an~trade-off between accuracy and speed controlled by a~parameter~$\varepsilon$.
22 The heap operations take $\O(\log(1/\varepsilon))$ amortized time and at every
23 moment at most~$\varepsilon n$ elements of the~$n$ elements inserted can be
26 \defnn{Soft heap operations}%
27 The soft heap contains distinct elements from a~totally ordered universe and it
28 supports the following operations:
30 \:$\<Create>(\varepsilon)$ --- create an~empty soft heap with the given accuracy parameter,
31 \:$\<Insert>(H,x)$ --- insert a~new element~$x$ to the heap~$H$,
32 \:$\<Meld>(P,Q)$ --- merge two heaps into one, more precisely insert all elements of a~heap~$Q$
33 to the heap~$P$, destroying~$Q$ in the process (both heaps must have the same~$\varepsilon$),
34 \:$\<DeleteMin>(H)$ --- delete the minimum element of the heap~$H$ and return its value
35 (optionally signalling that the value has been corrupted).
39 The soft heap is built from \df{soft queues} (we will omit the adjective soft
40 in the rest of this section). Each queue has a~shape of a~binary tree.\foot{%
41 Actually, Chazelle defines the queues as binomial trees, but he transforms them in ways that are
42 somewhat counter-intuitive, albeit well-defined. We prefer describing the queues as binary
43 trees with a~special distribution of values. In fact, the original C~code in the Chazelle's
44 paper uses this representation internally.}
45 Each vertex~$v$ of the tree remembers a~doubly-linked list $\<list>(v)$ of values. The
46 item list in every left son will be used only temporarily and it will be kept
47 empty between operations. Only right sons and the root have their lists
48 permanently occupied. The former vertices will be called \df{white}, the latter
49 \df{black.} (See the picture.)
51 The first value in the list is called the \df{controlling key} of the vertex,
52 denoted by $\<ckey>(v)$. If the list is empty, we keep the most recently used
53 value or we set $\<ckey>(v)=+\infty$. The \<ckey>'s obey the standard \df{heap order}
54 --- a~\<ckey> of a~parent is always less than the \<ckey>'s of its sons.
56 Each vertex is also assigned its \df{rank,} which is a~non-negative integer.
57 The rank of leaves is always zero, the rank of every internal vertex must be strictly
58 greater than the ranks of its sons. We define the rank of a~queue to be equal to the
59 rank of its root vertex and similarly for its \<ckey>.
61 A~queue is called \df{complete} if every two vertices joined by an~edge have rank
62 difference exactly one. In other words, it is a~complete binary tree and the
63 ranks correspond to vertex heights.
65 \figure{softheap1.eps}{\epsfxsize}{\multicap{A~complete and a~partial soft queue tree\\
66 (black vertices contain items, numbers indicate ranks)}}
69 A~complete queue of rank~$k$ contains exactly~$2^{k+1}-1$ vertices, $2^k$~of which are
70 black (by induction). Any other queue can be trivially embedded to a~complete queue of the same
71 rank, which we will call the \df{master tree} of the queue. This embedding preserves vertex
72 ranks, colors and the ancestor relation.
74 The queues have a~nice recursive structure. We can construct a~queue of rank~$k$ by \df{joining}
75 two queues of rank~$k-1$ under a~new root. The root will inherit the item list of one
76 of the original roots and also its \<ckey>. To preserve the heap order, we will choose
77 the one whose \<ckey> is smaller.
79 Sometimes, we will need to split a~queue to smaller queues. We will call this operation
80 \df{dismantling} the queue and it will happen only in cases when the item list in the root
81 is empty. It suffices to remove the leftmost (all white) path going from the root. From
82 a~queue of rank~$k$, we get queues of ranks $0,1,\ldots,k-1$, some of which may be missing
83 if the original queue was not complete.
85 We will now combine the soft queues to the soft heap.
87 \figure{softheap2.eps}{\epsfxsize}{Joining and dismantling of soft queues}
89 \defn A~\df{soft heap} consists of:
91 \:a~doubly linked list of soft queues of distinct ranks (in increasing order of ranks),
92 we will call the first queue the \df{head}, the last queue will be its \df{tail};
93 \:\df{suffix minima:} each queue contains a~pointer to the queue with minimum \<ckey>
94 of those following it in the list;
95 \:and a~global parameter~$r\in{\bb N}$, to be set depending on~$\varepsilon$.
97 \>The heap always keeps the \df{Rank invariant:} When a~root of a~tree has rank~$k$,
98 its leftmost path has length at least~$k/2$.
101 \em{Melding} of two soft heaps involves merging of their lists of queues. We disassemble
102 the heap with the smaller maximum rank and we insert its queues to the other heap.
103 If two queues of the same rank~$k$ appear in both lists, we \em{join} them to
104 a~single queue of rank~$k+1$ as already described and we propagate the new queue as
105 a~\df{carry} to the next iteration, similarly to addition of binary numbers.
106 Finally we have to update the suffix minima by walking the new list backwards
107 from the last inserted item.
109 \em{Creation} of a~new soft heap is trivial, \em{insertion} is handled by creating
110 a~single-element heap and melding it to the destination heap.
112 \algn{Creating a~new soft heap}
114 \algin A~parameter~$\varepsilon$.
115 \:Allocate memory for a~new heap structure~$H$.
116 \:Initialize the list of queues in~$H$ as an~empty list.
117 \:Set the parameter~$r$ to~$2\lceil\log(1/\varepsilon)\rceil+2$ (to be justified later).
118 \algout A~newly minted soft heap~$H$.
121 \algn{Melding of two soft heaps}
123 \algin Two soft heaps~$P$ and~$Q$.
124 \:If the tail of~$P$ has smaller rank than the tail of~$Q$, exchange their item lists.
125 \brk\cmt{Whenever we run into an~end of a~list, we assume that it contains an~empty queue of infinite rank.}
127 \:While $Q$ still has some queues:
129 \::If $\<rank>(p) < \<rank>(q)$, then $p\=$ the successor of~$p$,
130 \::else if $\<rank>(p) > \<rank>(q)$, remove~$q$ from~$Q$ and insert it before~$p$,
131 \::otherwise (the ranks are equal, we need to propagate the carry):
133 \:::$p\=$ the successor of~$p$.
134 \:::While $\<rank>(q)=\<rank>(\<carry>)$:
135 \::::Remove~$q$ from~$Q$.
136 \::::$\<carry>\=\<join>(q, \<carry>)$.
137 \::::$q\=\<head>(Q)$.
138 \:::Insert~\<carry> before~$q$.
139 \:Update the suffix minima: Walk with~$p$ backwards to the head of~$P$:
140 \::$p'\=$ successor of~$p$.
141 \::If $\<ckey>(p) < \<ckey>(p')$, set $\<suffix\_min>(p)\=p$.
142 \::Otherwise set $\<suffix\_min>(p)\=\<suffix\_min>(p')$.
143 \:Destroy the heap~$Q$.
144 \algout The merged heap~$P$.
147 \algn{Insertion of an~element to a~soft heap}
149 \algin A~heap~$H$ and a~new element~$x$.
150 \:Create a~new heap~$H'$ with the same parameters as~$H$. Let~$H'$ contain a~sole queue of rank~0,
151 whose only vertex has the element~$x$ in its item list.
153 \algout An~updated heap~$H$.
157 So far, the mechanics of the soft heaps was almost identical to the binomial heaps
158 and the reader could rightfully yawn. The things are going to get interesting
159 now as we approach the parts where corruption of items takes place.
161 If all item lists contain at most one item equal to the \<ckey> of the particular
162 vertex, no information is lost and the heap order guarantees that the minimum item
163 of every queue stays in its root. We can however allow longer lists and let the items
164 stored in a~single list travel together between the vertices of the tree, still
165 represented by a~common \<ckey>. This data-structural analogue of car pooling will
166 allow the items to travel at a~faster rate, but as only a~single item can be equal
167 to the \<ckey>, all other items will be inevitably corrupted.
169 We of course have to be careful about the size of the lists, as we must avoid corrupting
170 too many items. We will control the growth according to the vertex ranks. Vertices
171 with rank at most~$r$ will always contain just a~single item. Above this level,
172 the higher is the rank, the longer list will we allow.
175 \em{Deleting of minimum} will follow this principle. The minimum is easy to locate
176 --- we follow the \<suffix\_min> of the head of the heap to the queue with the minimum
177 \<ckey> and we look inside the item list of the root of this queue. We remove the \em{last}
178 item from the list (we do not want the \<ckey> to change) and return it as the minimum.
179 (It is not necessarily the real minimum of all items, but always the minimum of those
182 If the list becomes empty, we \em{refill} it with items from the lower levels of the
183 same tree. This can be best described recursively: We ask the left son to refill itself
184 (remember that the left son is always white, so there are no items to lose). If the new
185 \<ckey> of the left son is smaller than of the right son, we immediately move the left
186 son's list to its parent. Otherwise, we exchange the sons and move the list from the
187 new left son to the parent. This way, we obey the heap order and the same time we keep
188 the white left son free of items.
190 Ocassionally, we repeat this process once again and we concatenate the resulting lists
191 (we append the latter list to the former, using the smaller of the two \<ckey>'s). This
192 makes the lists grow longer and we want to do that roughly on every other level of the
193 tree. The exact condition will be that either the rank of the current vertex is odd,
194 or the difference in ranks between this vertex and its right son is at least two.
196 If refilling of the left son fails because there are no more items in that subtree
197 (we report this by setting its \<ckey> to $+\infty$), the current vertex is no longer
198 needed, as the items would just travel through it unmodified. We therefore want to
199 remove it. Instead of deleting it directly, we will rather make it point to its former
200 grandsons and we will remove the (now orhpaned) original son. This helps us to ensure
201 that both sons always have the same rank, which will be useful for the analysis.
203 When all refilling is done, we update the suffix minima by walking from the current
204 queue to the head of the heap exactly as we did in the \<Meld> procedure.
206 Our only remaining worry is that the Rank invariant can be broken after the
207 refilling. When the leftmost path of the tree becomes too short, we just
208 \em{dismantle} the tree as already described and we meld the new trees back to
209 the heap. This is easier to handle when the item list at the root vertex is
210 empty. We will therefore move this check before the refilling of the root list.
211 It will turn out that we have enough time to always walk the leftmost path
212 completely, so no explicit counters are needed.
214 Let us translate these ideas to real (pseudo)code:
216 \algn{Deleting the minimum item of a~soft heap}
218 \algin A~soft heap~$H$.
219 \:Use \<suffix\_min> of the head queue of~$H$ to locate the queue~$q$ with the smallest \<ckey>.
220 \:Remove the last element~$x$ of the item list in the root of~$q$.
221 \:If the item list is empty:
222 \::Count the vertices on the leftmost path of~$q$.
223 \::If there are less than $\<rank>(q)$ of them:
224 \:::Remove~$q$ from the list of queues.
225 \:::Recalculate the suffix minima as in the \<Meld> procedure.
226 \:::Dismantle~$q$ and create a~heap~$H'$ holding the resulting trees.
227 \:::Meld them back: $\<Meld>(H,H')$.
229 \:::Call \<Refill> on the root of~$q$.
230 \:::If $\<ckey>(q)=+\infty$ (no items left), remove the tree~$q$.
231 \:::Recalculate the suffix minima.
232 \algout The deleted minimum item~$x$ (possibly corrupted).
235 \algn{Refilling the item list of a~vertex}
237 \algin A~soft queue and its vertex~$v$ with an~empty item list.
238 \:Handle trivial cases: If~$v$ has no children or both have $\<ckey>=+\infty$,
239 set $\<ckey>(v)$ to~$+\infty$ and return.
240 \:Let \<left> and~\<right> denote the sons of~$v$.
241 \:Recurse: call $\<Refill>(\<left>)$.
242 \:If $\<ckey>(\<left>) > \<ckey>(\<right>)$, swap the sons.
243 \:Move the item list from \<left> to~$v$ (implying $\<ckey>(v)=\<ckey>(\<left>)$).
244 \:If $\<rank>(v) > r$ and either $\<rank>(v)$ is odd or $\<rank>(v) > \<rank>(\<right>)+1$, recurse once more:
245 \::Repeat steps 3--4.
246 \::Append the item list from \<left> to the item list at~$v$.
247 \:Clean up. If $\<ckey>(\<right>) = +\infty$:
248 \::If $\<ckey>(\<left>) = +\infty$, unlink and discard both sons.
249 \::Otherwise relink the sons of \<left> to~$v$ and discard \<left>.
250 \algout A~modified soft queue.
253 \paran{Analysis of accuracy}
254 The description of the operations is complete, let us analyse their behavior
255 and verify that we have delivered what we promised --- first the accuracy of
256 the structure, then the time complexity of operations.
258 We start with bounding the size of the item lists. We will assume that
259 the threshold~$r$ is even.
262 For every vertex~$v$ of a~soft queue, the size $\ell(v)$ of its item list
264 $$\ell(v) \le \max(1, 2^{\lceil \<rank>(v)/2 \rceil - r/2}).$$
267 Initially, all item lists contain at most one item, so the ineqality trivially
268 holds. Let us continue by induction. Melds can affect it only in the favorable
269 direction (they sometimes move an~item list to a~vertex of a~higher rank)
270 and so do deletes (they only remove items from lists). The only potentially
271 dangerous place is the \<Refill> procedure.
273 Refilling sometimes just moves items upwards, which is safe, and sometimes it
274 joins two lists into one, which generally is not. When $\<rank>(v) \le r$,
275 no joining takes place and~$\ell(v)$ is still~1. Otherwise we join when either
276 $\<rank>(v)$ is odd or $\<rank>(w) < \<rank>(v)-1$ for any son~$w$ of~$v$ (remember
277 that both sons have the same rank). In both cases, $\lceil\<rank>(w)/2\rceil \le
278 \lceil\<rank>(v)/2\rceil - 1$. By the induction hypothesis, the size of each
279 of the two joined lists is at most $2^{\max(1,\lceil\<rank>(v)/2\rceil - 1 - r/2)}$,
280 so the result has at most $2^{\lceil\<rank>(v)/2\rceil - r/2}$ items. (The maximum
281 has disappeared since $\<rank>(v)>r$ and therefore the desired bound is at least~2.)
284 We will now sum the sizes of the lists over all vertices containing corrupted items.
287 After~$n$ items have been inserted, the heap contains at most~$n/2^{r-3}$ corrupted
288 items at any given time.
291 We first prove an~auxiliary claim: The master trees of all queues contain at most~$n$
292 black vertices. This follows by induction: If no deletions have taken place,
293 there are exactly~$n$ of them, because insertion adds one black vertex and
294 melding preserves their number. A~deletion affects the master trees only when
295 dismantling takes place and then it only removes a~black vertex.
297 An~obvious upper bound on the number of corrupted items is the total size of item
298 lists in all vertices of rank greater than~$r$. We already know from the previous
299 lemma that the list sizes are limited by a~function of the ranks. A~complete tree
300 is obviously the worst case, so we will prove that this lemma holds for the master
301 tree of every queue in the heap. The actual trees can be much sparser, but the
302 above claim guarantees that the total size of the master trees is bounded by the
303 number of insertions properly.
305 So let us have a~complete tree of~rank~$k$. It has exactly $2^{k-i}$ vertices
306 of rank~$i$ and each such vertex contains a~list of at most~$2^{\lceil i/2\rceil - r/2}$
307 items by the previous lemma. Summing over all ranks greater than~$r$, we get that
308 the total number of corrupted items in this tree is at most:
310 \sum_{i=r+1}^k 2^{k-i}\cdot 2^{\lceil i/2\rceil - r/2}
311 = 2^{k-r/2} \cdot \sum_i 2^{\lceil i/2\rceil - i}
312 \le 2^{k-r/2} \cdot \sum_i 2^{-i/2}
315 \FIXME{Finish the proof and update the claim of the lemma.}