From dacceb3347cf2848b8a44000e8c6d19d65e1d172 Mon Sep 17 00:00:00 2001
From: Martin Mares <mj@ucw.cz>
Date: Thu, 3 Apr 2008 00:18:44 +0200
Subject: [PATCH] Decision trees started.

---
 PLAN         |   5 ++-
 mst.tex      |   2 +-
 notation.tex |   3 ++
 opt.tex      | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/PLAN b/PLAN
index 6792895..f1481d7 100644
--- a/PLAN
+++ b/PLAN
@@ -24,8 +24,8 @@
 
 *  Approaching Optimality
 
-  .  Soft heaps
-  .  Robust contractions
+  o  Soft heaps
+  o  Robust contractions
   .  An optimal algorithm
   .  Decision trees
 
@@ -67,6 +67,7 @@ Spanning trees:
 - random sampling (propp:randommst)
 - mention bugs in Valeria's verification paper
 - Pettie's optimal algorithm runs in average linear time
+- add references to other applications of decomposition
 
 Models:
 
diff --git a/mst.tex b/mst.tex
index 8b8e74f..00d818f 100644
--- a/mst.tex
+++ b/mst.tex
@@ -562,7 +562,7 @@ edges together. The bucket sort uses two passes with $n_i$~buckets, so it takes
 $\O(n_i+m_i)=\O(m_i)$.
 \qed
 
-\thm
+\thm\id{contborthm}%
 The Contractive Bor\o{u}vka's algorithm finds the MST of the input graph in
 time $\O(\min(n^2,m\log n))$.
 
diff --git a/notation.tex b/notation.tex
index f954ee3..e3016ae 100644
--- a/notation.tex
+++ b/notation.tex
@@ -85,6 +85,9 @@
 \n{$\per M$}{the permanent of a~square matrix~$M$}
 \n{$G\crpt R$}{graph~$G$ with edges in~$R$ corrupted \[corrnota]}
 \n{$R_C$}{$R_C = R\cup \delta(C)$ \[corrnota]}
+\n{${\cal D}(G)$}{The optimal MSF decision tree for a~graph~$G$ \[decdef]}
+\n{$D(G)$}{The depth of ${\cal D}(G)$ \[decdef]}
+\n{$D(m,n)$}{Decision tree complexity of MSF \[decdef]}
 }
 
 %--------------------------------------------------------------------------------
diff --git a/opt.tex b/opt.tex
index 6ba150d..c2c87c6 100644
--- a/opt.tex
+++ b/opt.tex
@@ -747,6 +747,109 @@ time there. The rest of the algorithm spends $\O(m)$ time on the other operation
 and $\O(n)$ on identifying the live vertices.
 \qed
 
+%--------------------------------------------------------------------------------
+
+\section{Decision trees}
+
+The Pettie's algorithm combines the idea of robust partitioning with optimal decision
+trees constructed by brute force for very small subgraphs. In this section, we will
+explain the basics of the decision trees and prove several lemmata which will
+turn out to be useful for the analysis of time complexity of the whole algorithm.
+
+Let us consider all computations of a~comparison-based MST algorithm when we
+run it on a~fixed graph~$G$ with all possible permutations of edge weights.
+They can be described by a~tree. The root of the tree corresponds to the first
+comparison performed by the algorithm and depending to its result, the computation
+continues either in the left subtree or in the right subtree. There it encounters
+another comparison and so on, until it arrives to a~leaf of the tree where the
+spanning tree found by the algorithm is recorded.
+
+Formally, the decision trees are defined as follows:
+
+\defnn{Decision trees and their complexity}\id{decdef}%
+A~\df{MSF decision tree} for a~graph~$G$ is a~binary tree. Its internal vertices
+are labeled with pairs of edges to be compared, each of the two outgoing edges
+corresponds to one possible result of the comparison.\foot{There is no reason
+to compare two identical edges.}
+Leaves of the tree are labeled with spanning trees of the graph~$G$.
+
+A~\df{computation} of the decision tree on a~specific permutation of edge weights
+in~$G$ is a~path from the root to a~leaf such that the outcome of every comparison
+agrees with the edge weights. The result of the computation is the spanning tree
+assigned to its final leaf.
+A~decision tree is \df{correct} iff for every permutation the corresponding
+computation results in the real MSF of~$G$ with the particular weights.
+
+The \df{time complexity} of a~decision tree is defined as its depth. It therefore
+bounds the number of comparisons spent on every path. (It need not be equal since
+some paths need not correspond to an~actual computation --- the sequence of outcomes
+on the path can be unsatifisfiable.)
+
+A~decision tree is called \df{optimal} if it is correct and its depth is minimum possible
+among the correct decision trees for the given graph.
+We will denote an~arbitrary optimal decision tree for~$G$ by~${\cal D}(G)$ and its
+complexity by~$D(G)$.
+
+The \df{decision tree complexity} $D(m,n)$ of the MSF problem is the maximum of~$D(G)$
+over all graphs~$G$ with $n$~vertices and~$m$ edges.
+
+\obs
+Decision trees are the most general comparison-based computation model possible.
+The only operations which count in the time complexity are the comparisons. All
+other computation is free, including solving NP-complete problems or having
+access to an~unlimited source of non-uniform constants. The decision tree
+complexity is therefore an~obvious lower bound on the time complexity of the
+problem in all other models.
+
+The downside is that we do not know any explicit construction of the optimal
+decision trees, or at least a~non-constructive proof of their complexity.
+On the other hand, the complexity of any existing comparison-based algorithm
+can be used as an~upper bound for the decision tree complexity. For example:
+
+\lemma
+$D(m,n) \le 4/3 \cdot n^2$.
+
+\proof
+Let us count the comparisons performed by the contractive Bor\o{u}vka's algorithm.
+(\ref{contbor}), tightening up the constants in the original analysis in Theorem
+\ref{contborthm}. In the first phase, each edge participates in two comparisons
+(one for each its endpoint), so the algorithm performs at most $2m \le 2{n\choose 2} \le n^2$
+comparisons. Then the number of vertices drops at least by a~factor of two, so
+the subsequent phases spend $(n/2)^2, (n/4)^2, \ldots$ comparisons, which sums
+to less than $n^2\cdot\sum_{i=0}^\infty (1/4)^i = 4/3 \cdot n^2$.
+\qed
+
+\para
+Of course we can get sharper bounds from the better algorithms, but we will first
+show how to find the optimal trees using brute force. The complexity of the search
+will be of course enormous, but as we already promised, we will need the optimal
+trees only for very small subgraphs.
+
+\lemma
+Optimal MST decision trees for all graphs on at most~$k$ vertices can be
+constructed on the Pointer machine in time $\O(2^{2^{k^3}})$.
+
+\proof
+There are $2^{k\choose 2} \le 2^{k^2}$ undirected graphs on~$k$ vertices. Any
+graph on less than~$k$ vertices can be expanded to $k$~vertices by adding isolated
+vertices, which obviously do not affect the optimal decision tree.
+
+For every graph, we will try all possible trees of depth at most $4/3\cdot k^2$
+(we know from the previous lemma that the optimal tree is not deeper). Each such
+tree has at most $2^{2k^2}$ internal vertices and at most $2^{2k^2}$ leaves.
+All internal vertices are labeled with comparisons and there are less than $k^4$
+ways how to select the edges to be compared. Each leaf is labeled with one of the
+at most $2^{k^2}$ spanning trees. There are therefore at most $(2^{2k^2})^{2^{2k^2}}
+\le 2^{2k^2\cdot 2^{2k^2}} \le 2^{2^{k^3}}$ candidate trees and we will enumerate them
+in order of increasing depth.
+
+\FIXME{Continue with verifying correctness.}
+
+\qed
+
+
+
+
 %--------------------------------------------------------------------------------
 
 \section{An optimal algorithm}
-- 
2.39.5