boofun.analysis.decision_trees
Decision tree analysis for Boolean functions.
This module implements algorithms for analyzing decision tree complexity, including dynamic programming approaches for computing optimal trees and game-theoretic analysis for randomized complexity.
The algorithms in this module are based on Avishay Tal’s PhD-era library, with modernized implementations, type hints, and documentation.
Key concepts:
- D(f): Deterministic decision tree depth (worst-case)
- D_avg(f): Average-case decision tree depth under the uniform distribution
- D_μ(f): Average-case depth under an arbitrary distribution μ
- R(f): Randomized decision tree complexity (via the minimax theorem)
The core insight is that decision trees correspond to subcubes of {0,1}^n, and we can use dynamic programming over the “cube lattice” where each node represents a partial assignment (some variables fixed, others free).
References:
- Tal’s PhD library (BooleanFunc.py)
- Buhrman & de Wolf, “Complexity Measures and Decision Tree Complexity” (2002)
- O’Donnell, “Analysis of Boolean Functions” (2014), Chapter 1
Functions

- compute_randomized_complexity: Compute the randomized decision tree complexity R(f).
- count_decision_trees: Count the number of distinct decision trees (without enumeration).
- decision_tree_depth_dp: Compute optimal decision tree depth using dynamic programming.
- decision_tree_depth_uniform_dp: Compute optimal average-case decision tree depth under the uniform distribution.
- decision_tree_depth_weighted_dp: Compute optimal decision tree depth under an arbitrary input distribution.
- decision_tree_size_dp: Compute a decision tree optimizing for size (number of leaves).
- enumerate_decision_trees: Enumerate all valid decision trees for a Boolean function.
- randomized_complexity_matrix: Build the game matrix for randomized decision tree complexity.
- reconstruct_tree: Reconstruct a decision tree from back pointers.
- tree_depth: Compute the depth of a decision tree.
- tree_size: Compute the size (number of leaves) of a decision tree.

Classes

- DecisionTree: Representation of a decision tree for a Boolean function.
- boofun.analysis.decision_trees.decision_tree_depth_dp(f: BooleanFunction) → int
Compute optimal decision tree depth using dynamic programming.
This implements Tal’s calc_decision_tree_DP algorithm, which uses a clever representation of subcubes. Each subcube is encoded in ternary, where each variable can be:
- 0: fixed to 0
- 1: fixed to 1
- 2: free (not yet queried)
The DP computes the optimal depth for each subcube bottom-up, from fully-specified inputs to the full hypercube.
Time complexity: O(3^n), where n is the number of variables. Space complexity: O(3^n).
- Parameters:
f – BooleanFunction to analyze
- Returns:
Optimal decision tree depth D(f)
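The recurrence is easiest to see in code. Below is a minimal, self-contained sketch of the approach described above (ternary subcube encoding, processed bottom-up by number of free variables), operating directly on a raw truth table rather than the library’s BooleanFunction; the name subcube_depth_dp and all helpers are illustrative, not the library’s implementation.

```python
from itertools import product

def subcube_depth_dp(truth_table):
    """Worst-case decision tree depth D(f) via DP over ternary-encoded subcubes.

    Digit i of the base-3 cube index is 0/1 if variable i is fixed to that
    value, and 2 if it is still free. Illustrative sketch only.
    """
    n = len(truth_table).bit_length() - 1
    pow3 = [3 ** i for i in range(n + 1)]
    depth = [0] * pow3[n]

    def members(digits):
        # all full assignments lying inside the subcube
        free = [i for i, d in enumerate(digits) if d == 2]
        base = sum(1 << i for i, d in enumerate(digits) if d == 1)
        for bits in product((0, 1), repeat=len(free)):
            x = base
            for i, b in zip(free, bits):
                x |= b << i
            yield x

    # process subcubes in order of increasing number of free variables
    for cube in sorted(range(pow3[n]),
                       key=lambda c: sum((c // p) % 3 == 2 for p in pow3[:n])):
        digits = [(cube // p) % 3 for p in pow3[:n]]
        values = {truth_table[x] for x in members(digits)}
        if len(values) == 1:        # f constant on this subcube: no queries
            depth[cube] = 0
            continue
        # otherwise query some free variable i and recurse into both halves
        depth[cube] = min(1 + max(depth[cube - 2 * pow3[i]],   # digit 2 -> 0
                                  depth[cube - pow3[i]])       # digit 2 -> 1
                          for i, d in enumerate(digits) if d == 2)
    return depth[pow3[n] - 1]       # index with every digit = 2 (all free)
```

For example, AND on two variables (truth table [0, 0, 0, 1], bit i of the index being variable i) has depth 2, while a dictator function has depth 1.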
- boofun.analysis.decision_trees.decision_tree_depth_uniform_dp(f: BooleanFunction, weights: List[float] | None = None) → Tuple[float, DecisionTree | None]
Compute optimal average-case decision tree depth under uniform distribution.
This implements Tal’s calc_decision_tree_DP_uniform algorithm with NumPy optimizations for better performance on larger inputs.
Unlike worst-case depth, this minimizes the expected number of queries when inputs are drawn uniformly at random.
- Parameters:
f – BooleanFunction to analyze
weights – Optional per-variable query costs (default: all 1s)
- Returns:
Tuple of (average depth, optimal tree if reconstructible)
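As a sanity check on the quantity being minimized, here is a small brute-force recursion over restrictions (exponential, but fine for tiny n) with unit query costs. It works on a raw truth table and only illustrates the recurrence the DP evaluates bottom-up; it is not the library’s NumPy implementation.

```python
def average_depth(tt):
    """Minimum expected number of queries under uniform inputs.

    tt is a truth table indexed by the input integer (bit i = variable i).
    Recurrence: 0 on constant restrictions, else
    min over variables i of 1 + (avg(f|x_i=0) + avg(f|x_i=1)) / 2,
    since under the uniform distribution each branch has probability 1/2.
    """
    n = len(tt).bit_length() - 1
    if len(set(tt)) == 1:
        return 0.0
    best = float("inf")
    for i in range(n):
        # restrict variable i to 0 and to 1 (remaining variables reindex,
        # which is harmless because the recursion ranges over all of them)
        tt0 = [tt[x] for x in range(len(tt)) if not (x >> i) & 1]
        tt1 = [tt[x] for x in range(len(tt)) if (x >> i) & 1]
        best = min(best, 1 + 0.5 * (average_depth(tt0) + average_depth(tt1)))
    return best
```

For AND on two variables this gives 1.5: query x0 first; with probability 1/2 the answer is already determined, otherwise one more query is needed.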
- boofun.analysis.decision_trees.decision_tree_depth_weighted_dp(f: BooleanFunction, probabilities: List[float], weights: List[float] | None = None) → Tuple[float, DecisionTree | None]
Compute optimal decision tree depth under arbitrary input distribution.
This implements Tal’s calc_decision_tree_DP_with_prob algorithm which handles non-uniform input distributions. This is useful for analyzing decision tree performance when some inputs are more likely than others.
- Parameters:
f – BooleanFunction to analyze
probabilities – Probability of each input (length 2^n)
weights – Optional per-variable query costs
- Returns:
Tuple of (expected depth, optimal tree)
- boofun.analysis.decision_trees.decision_tree_size_dp(f: BooleanFunction, optimize_size_first: bool = True) → Tuple[int, int, DecisionTree | None]
Compute decision tree optimizing for size (number of leaves).
This implements Tal’s calc_decision_tree_size_DP which finds trees that minimize the number of leaves, with depth as a tiebreaker.
Smaller trees are often preferred for interpretability and can sometimes be converted to smaller Boolean formulas.
- Parameters:
f – BooleanFunction to analyze
optimize_size_first – If True, minimize size then depth. If False, minimize depth then size.
- Returns:
Tuple of (size, depth, optimal tree)
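The size objective changes only the combining step of the recurrence: instead of 1 + max over the two branches, a node contributes the sum of the leaf counts of its subtrees. A brute-force sketch (illustrative, not the library’s DP, and without the depth tiebreaker):

```python
def min_leaves(tt):
    """Fewest leaves of any decision tree computing f (brute force).

    tt is a truth table indexed by the input integer (bit i = variable i).
    Recurrence: 1 on constant restrictions, else
    min over variables i of leaves(f|x_i=0) + leaves(f|x_i=1).
    """
    n = len(tt).bit_length() - 1
    if len(set(tt)) == 1:
        return 1
    return min(
        min_leaves([tt[x] for x in range(len(tt)) if not (x >> i) & 1])
        + min_leaves([tt[x] for x in range(len(tt)) if (x >> i) & 1])
        for i in range(n)
    )
```

AND on two variables needs 3 leaves; XOR on two variables needs 4, since neither restriction is constant.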
- boofun.analysis.decision_trees.enumerate_decision_trees(f: BooleanFunction, prune_dominated: bool = True) → List[DecisionTree]
Enumerate all valid decision trees for a Boolean function.
This implements Tal’s all_decision_trees algorithm with optional pruning of dominated trees (trees that are strictly worse than another tree on all inputs).
Warning: The number of trees can be exponential in n. Only use for small functions (n ≤ 6 recommended).
- Parameters:
f – BooleanFunction to analyze
prune_dominated – If True, remove dominated trees
- Returns:
List of decision trees
- boofun.analysis.decision_trees.count_decision_trees(f: BooleanFunction) → int
Count the number of distinct decision trees (without enumeration).
This is faster than enumerate_decision_trees when you only need the count.
- Parameters:
f – BooleanFunction to analyze
- Returns:
Number of distinct decision trees
- boofun.analysis.decision_trees.randomized_complexity_matrix(f: BooleanFunction, output_value: int | None = None) → np.ndarray
Build the game matrix for randomized decision tree complexity.
The rows correspond to inputs, columns to decision trees. Entry (i, j) is the number of queries tree j makes on input i.
By the minimax theorem, the randomized complexity R(f) equals the value of this matrix game.
- Parameters:
f – BooleanFunction to analyze
output_value – If specified, only include inputs where f(x) = output_value
- Returns:
NumPy matrix where M[i,j] = depth of tree j on input i
- boofun.analysis.decision_trees.compute_randomized_complexity(f: BooleanFunction, output_value: int | None = None) → float
Compute the randomized decision tree complexity R(f).
This solves the minimax game between:
- the algorithm, which chooses a distribution over decision trees
- the adversary, which chooses an input
R(f) = min over tree distributions, max over inputs, E[queries]
By von Neumann’s minimax theorem, this equals: R(f) = max over input distributions, min over trees, E[queries]
Requires scipy for linear programming.
- Parameters:
f – BooleanFunction to analyze
output_value – If specified, compute R_b(f) for inputs with f(x)=b
- Returns:
Randomized query complexity
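In general the game value needs a linear program, but for intuition here is a dependency-free sketch for the special case of mixing over just two trees: the worst-case expected cost is convex piecewise-linear in the mixing probability, so ternary search finds the game value. Matrix convention as above (rows = inputs, columns = trees, entries = query counts); the function name is illustrative, not part of the library.

```python
def two_tree_game_value(M, iters=200):
    """Value of the minimax game when the algorithm mixes over two trees.

    M: list of rows, one per input; each row gives the query counts of the
    two trees on that input.
    """
    def worst(p):
        # adversary's best response when tree 0 is played with probability p
        return max(p * r0 + (1 - p) * r1 for r0, r1 in M)

    # worst(p) is convex in p, so ternary search converges to its minimum
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if worst(m1) < worst(m2):
            hi = m2
        else:
            lo = m1
    return worst((lo + hi) / 2)
```

For instance, with costs [[1, 2], [2, 1]] (each tree is cheap on one input and expensive on the other), mixing fifty-fifty gives value 1.5, beating either pure tree’s worst case of 2.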
- class boofun.analysis.decision_trees.DecisionTree(var: int = -1, left: DecisionTree | None = None, right: DecisionTree | None = None, value: int | None = None)
Representation of a decision tree for a Boolean function.
A decision tree is either:
- a leaf with a constant output (0 or 1), or
- an internal node that queries variable var and branches to the left (var=0) or right (var=1) subtree
- left: DecisionTree | None = None
Left subtree (taken when var=0), or None for leaves
- right: DecisionTree | None = None
Right subtree (taken when var=1), or None for leaves
- evaluate(x: int, n_vars: int) → int
Evaluate the decision tree on input x.
- Parameters:
x – Input as integer (bit i = value of variable i)
n_vars – Number of variables
- Returns:
Output value (0 or 1)
- query_depth(x: int, n_vars: int) → int
Return the number of queries made to evaluate x.
- Parameters:
x – Input as integer
n_vars – Number of variables
- Returns:
Number of queries (depth of path to leaf)
- __init__(var: int = -1, left: DecisionTree | None = None, right: DecisionTree | None = None, value: int | None = None) → None
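To make the branching convention concrete, here is a minimal stand-in class with the same shape (left on var=0, right on var=1); the real class additionally takes n_vars in evaluate, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MiniTree:
    var: int = -1
    left: Optional["MiniTree"] = None
    right: Optional["MiniTree"] = None
    value: Optional[int] = None

    def evaluate(self, x: int) -> int:
        if self.value is not None:      # leaf: constant output
            return self.value
        bit = (x >> self.var) & 1       # value of the queried variable
        return (self.right if bit else self.left).evaluate(x)

# tree computing AND(x0, x1): query x0; if 0 answer 0, else query x1
and_tree = MiniTree(var=0,
                    left=MiniTree(value=0),
                    right=MiniTree(var=1,
                                   left=MiniTree(value=0),
                                   right=MiniTree(value=1)))
```

Evaluating and_tree on all four inputs (x encoded with bit i = variable i) reproduces the AND truth table.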
- boofun.analysis.decision_trees.reconstruct_tree(back_ptr: List[int], index: int, n_vars: int, truth_table: List[int]) → DecisionTree | None
Reconstruct a decision tree from back pointers.
- Parameters:
back_ptr – Array of back pointers from DP
index – Current index in the cube representation
n_vars – Number of variables
truth_table – Original truth table
- Returns:
Reconstructed DecisionTree
- boofun.analysis.decision_trees.tree_depth(tree: DecisionTree | List | Tuple) → int
Compute the depth of a decision tree.
Handles both DecisionTree objects and list/tuple representations from Tal’s original code: [var, left_subtree, right_subtree] or [].
- Parameters:
tree – Decision tree in any supported format
- Returns:
Tree depth (longest root-to-leaf path)
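For the list format described above ([] is a leaf, [var, left_subtree, right_subtree] an internal node), the depth computation is a short recursion; this sketch handles only that format, not DecisionTree objects.

```python
def list_tree_depth(tree):
    # [] encodes a leaf; [var, left, right] an internal node
    if not tree:
        return 0
    _var, left, right = tree
    return 1 + max(list_tree_depth(left), list_tree_depth(right))
```

For example, the tree that queries variable 0 and then, on one branch, variable 1, has depth 2.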