boofun.analysis.decision_trees

Decision tree analysis for Boolean functions.

This module implements algorithms for analyzing decision tree complexity, including dynamic programming approaches for computing optimal trees and game-theoretic analysis for randomized complexity.

The algorithms in this module are based on Avishay Tal’s PhD-era library, with modernized implementations, type hints, and documentation.

Key concepts:

- D(f): Deterministic decision tree depth (worst-case)
- D_avg(f): Average-case decision tree depth under uniform distribution
- D_μ(f): Average-case under arbitrary distribution μ
- R(f): Randomized decision tree complexity (via minimax theorem)

The core insight is that decision trees correspond to subcubes of {0,1}^n, and we can use dynamic programming over the “cube lattice” where each node represents a partial assignment (some variables fixed, others free).

References:

- Tal’s PhD library (BooleanFunc.py)
- Buhrman & de Wolf, “Complexity Measures and Decision Tree Complexity” (2002)
- O’Donnell, “Analysis of Boolean Functions” (2014), Chapter 1

Functions

compute_randomized_complexity(f[, output_value])

Compute the randomized decision tree complexity R(f).

count_decision_trees(f)

Count the number of distinct decision trees (without enumeration).

decision_tree_depth_dp(f)

Compute optimal decision tree depth using dynamic programming.

decision_tree_depth_uniform_dp(f[, weights])

Compute optimal average-case decision tree depth under uniform distribution.

decision_tree_depth_weighted_dp(f, probabilities)

Compute optimal decision tree depth under arbitrary input distribution.

decision_tree_size_dp(f[, optimize_size_first])

Compute decision tree optimizing for size (number of leaves).

enumerate_decision_trees(f[, prune_dominated])

Enumerate all valid decision trees for a Boolean function.

randomized_complexity_matrix(f[, output_value])

Build the game matrix for randomized decision tree complexity.

reconstruct_tree(back_ptr, index, n_vars, ...)

Reconstruct a decision tree from back pointers.

tree_depth(tree)

Compute the depth of a decision tree.

tree_size(tree)

Compute the size (number of leaves) of a decision tree.

Classes

DecisionTree([var, left, right, value])

Representation of a decision tree for a Boolean function.

boofun.analysis.decision_trees.decision_tree_depth_dp(f: BooleanFunction) → int

Compute optimal decision tree depth using dynamic programming.

This implements Tal’s calc_decision_tree_DP algorithm, which uses a clever representation of subcubes. Each subcube is encoded using a ternary representation where each variable can be:

- 0: fixed to 0
- 1: fixed to 1
- 2: free (not yet queried)

The DP computes the optimal depth for each subcube bottom-up, from fully-specified inputs to the full hypercube.

Time complexity: O(3^n), where n is the number of variables. Space complexity: O(3^n).

Parameters:

f – BooleanFunction to analyze

Returns:

Optimal decision tree depth D(f)
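The recurrence behind this DP can be sketched in plain Python. This is an illustrative reimplementation over an explicit truth table, not the library’s calc_decision_tree_DP; the names (depth_dp, solve) are made up for the sketch. A subcube is a tuple over {0, 1, 2}, mirroring the ternary encoding described above:

```python
from functools import lru_cache

def depth_dp(truth_table, n):
    """Worst-case decision tree depth D(f) via DP over subcubes.

    truth_table[x] is f evaluated at the input whose bit i is the
    value of variable i. A subcube is a tuple over {0, 1, 2}:
    variable i is fixed to 0 or 1, or free (2).
    """
    def outputs(cube):
        # All outputs of f on inputs consistent with the subcube.
        return {truth_table[x] for x in range(2 ** n)
                if all(c == 2 or (x >> i) & 1 == c for i, c in enumerate(cube))}

    @lru_cache(maxsize=None)
    def solve(cube):
        if len(outputs(cube)) == 1:   # f is constant here: a leaf suffices
            return 0
        best = n
        for i, c in enumerate(cube):
            if c == 2:                # try querying free variable i
                lo = solve(cube[:i] + (0,) + cube[i + 1:])
                hi = solve(cube[:i] + (1,) + cube[i + 1:])
                best = min(best, 1 + max(lo, hi))
        return best

    return solve((2,) * n)
```

For example, AND on two variables (truth table [0, 0, 0, 1]) has depth 2, while a dictator function ([0, 1, 0, 1]) has depth 1.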

boofun.analysis.decision_trees.decision_tree_depth_uniform_dp(f: BooleanFunction, weights: List[float] | None = None) → Tuple[float, DecisionTree | None]

Compute optimal average-case decision tree depth under uniform distribution.

This implements Tal’s calc_decision_tree_DP_uniform algorithm with NumPy optimizations for better performance on larger inputs.

Unlike worst-case depth, this minimizes the expected number of queries when inputs are drawn uniformly at random.

Parameters:
  • f – BooleanFunction to analyze

  • weights – Optional per-variable query costs (default: all 1s)

Returns:

Tuple of (average depth, optimal tree if reconstructible)
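The average-case recurrence differs from the worst-case one only in how the two halves are combined: under the uniform distribution, conditioned on being in a subcube, each half of a queried variable is equally likely, so a query costs 1 plus the average of the two subproblems. A stdlib-only sketch (illustrative, not the library’s NumPy-optimized calc_decision_tree_DP_uniform, and omitting per-variable weights):

```python
from functools import lru_cache

def avg_depth_dp(truth_table, n):
    """Minimum expected number of queries under the uniform distribution.

    Recurrence: a constant subcube costs 0; otherwise querying free
    variable i costs 1 + (cost of the x_i=0 half + cost of the
    x_i=1 half) / 2, since each half is equally likely.
    """
    def is_constant(cube):
        vals = {truth_table[x] for x in range(2 ** n)
                if all(c == 2 or (x >> i) & 1 == c for i, c in enumerate(cube))}
        return len(vals) == 1

    @lru_cache(maxsize=None)
    def solve(cube):
        if is_constant(cube):
            return 0.0
        return min(
            1 + 0.5 * solve(cube[:i] + (0,) + cube[i + 1:])
              + 0.5 * solve(cube[:i] + (1,) + cube[i + 1:])
            for i, c in enumerate(cube) if c == 2
        )

    return solve((2,) * n)
```

For AND on two variables the optimal strategy answers after one query half the time, giving expected depth 1.5, whereas XOR always needs both queries (2.0).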

boofun.analysis.decision_trees.decision_tree_depth_weighted_dp(f: BooleanFunction, probabilities: List[float], weights: List[float] | None = None) → Tuple[float, DecisionTree | None]

Compute optimal decision tree depth under arbitrary input distribution.

This implements Tal’s calc_decision_tree_DP_with_prob algorithm, which handles non-uniform input distributions. This is useful for analyzing decision tree performance when some inputs are more likely than others.

Parameters:
  • f – BooleanFunction to analyze

  • probabilities – Probability of each input (length 2^n)

  • weights – Optional per-variable query costs

Returns:

Tuple of (expected depth, optimal tree)

boofun.analysis.decision_trees.decision_tree_size_dp(f: BooleanFunction, optimize_size_first: bool = True) → Tuple[int, int, DecisionTree | None]

Compute decision tree optimizing for size (number of leaves).

This implements Tal’s calc_decision_tree_size_DP, which finds trees that minimize the number of leaves, with depth as a tiebreaker.

Smaller trees are often preferred for interpretability and can sometimes be converted to smaller Boolean formulas.

Parameters:
  • f – BooleanFunction to analyze

  • optimize_size_first – If True, minimize size then depth. If False, minimize depth then size.

Returns:

Tuple of (size, depth, optimal tree)
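The size recurrence is the same subcube DP with a different combiner: a constant subcube costs one leaf, and a query costs the sum (not the max or average) of the two halves. A minimal sketch, ignoring the depth tiebreaker and tree reconstruction (size_dp is a made-up name, not the library function):

```python
from functools import lru_cache

def size_dp(truth_table, n):
    """Minimum number of leaves over all decision trees computing f."""
    @lru_cache(maxsize=None)
    def solve(cube):
        vals = {truth_table[x] for x in range(2 ** n)
                if all(c == 2 or (x >> i) & 1 == c for i, c in enumerate(cube))}
        if len(vals) == 1:
            return 1              # a single leaf suffices on a constant subcube
        return min(
            solve(cube[:i] + (0,) + cube[i + 1:])
            + solve(cube[:i] + (1,) + cube[i + 1:])
            for i, c in enumerate(cube) if c == 2
        )

    return solve((2,) * n)
```

AND on two variables needs 3 leaves, XOR needs 4, and a dictator needs 2.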

boofun.analysis.decision_trees.enumerate_decision_trees(f: BooleanFunction, prune_dominated: bool = True) → List[DecisionTree]

Enumerate all valid decision trees for a Boolean function.

This implements Tal’s all_decision_trees algorithm with optional pruning of dominated trees (trees that are strictly worse than another tree on all inputs).

Warning: The number of trees can grow exponentially in n. Only use this for small functions (n ≤ 6 recommended).

Parameters:
  • f – BooleanFunction to analyze

  • prune_dominated – If True, remove dominated trees

Returns:

List of decision trees

boofun.analysis.decision_trees.count_decision_trees(f: BooleanFunction) → int

Count the number of distinct decision trees (without enumeration).

This is faster than enumerate_decision_trees when you only need the count.

Parameters:

f – BooleanFunction to analyze

Returns:

Number of distinct decision trees

boofun.analysis.decision_trees.randomized_complexity_matrix(f: BooleanFunction, output_value: int | None = None) → np.ndarray

Build the game matrix for randomized decision tree complexity.

The rows correspond to inputs, columns to decision trees. Entry (i, j) is the number of queries tree j makes on input i.

By the minimax theorem, the randomized complexity R(f) equals the value of this matrix game.

Parameters:
  • f – BooleanFunction to analyze

  • output_value – If specified, only include inputs where f(x) = output_value

Returns:

NumPy matrix where M[i,j] = depth of tree j on input i

boofun.analysis.decision_trees.compute_randomized_complexity(f: BooleanFunction, output_value: int | None = None) → float

Compute the randomized decision tree complexity R(f).

This solves the minimax game between:

- The algorithm, which chooses a distribution over decision trees
- The adversary, which chooses an input

R(f) = min over tree distributions, max over inputs, E[queries]

By von Neumann’s minimax theorem, this equals: R(f) = max over input distributions, min over trees, E[queries]

Requires scipy for linear programming.

Parameters:
  • f – BooleanFunction to analyze

  • output_value – If specified, compute R_b(f) for inputs with f(x)=b

Returns:

Randomized query complexity
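To make the minimax definition concrete, here is a toy game value computation for a game matrix with exactly two trees (columns). This is only an illustration of the min-over-distributions/max-over-inputs definition; it is not the library’s scipy-based LP solver, and grid search over one probability only works for two columns:

```python
def game_value_2x2(M, steps=10_000):
    """Value of a 2-column matrix game by grid search over the algorithm's
    mixed strategy: probability p on tree 0, (1 - p) on tree 1.

    Computes min over p of max over inputs (rows) of expected queries.
    Real solvers use linear programming instead of a grid.
    """
    best = float("inf")
    for k in range(steps + 1):
        p = k / steps
        worst = max(p * row[0] + (1 - p) * row[1] for row in M)
        best = min(best, worst)
    return best

# Hypothetical game matrix: two inputs (rows), two trees (columns);
# tree 0 is fast on input 0 and slow on input 1, tree 1 the reverse.
M = [[1, 2],
     [2, 1]]
```

Mixing the two trees with probability 1/2 each makes the adversary indifferent, so the value is 1.5, strictly better than either deterministic tree’s worst case of 2.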

class boofun.analysis.decision_trees.DecisionTree(var: int = -1, left: DecisionTree | None = None, right: DecisionTree | None = None, value: int | None = None)

Representation of a decision tree for a Boolean function.

A decision tree is either:

- A leaf with a constant output (0 or 1)
- An internal node that queries variable var and branches to the left (var=0) or right (var=1) subtree

var

Variable index to query (-1 for leaf nodes)

Type:

int

left

Left subtree (when var=0), or None for leaves

Type:

boofun.analysis.decision_trees.DecisionTree | None

right

Right subtree (when var=1), or None for leaves

Type:

boofun.analysis.decision_trees.DecisionTree | None

value

Output value for leaf nodes (0 or 1), None for internal nodes

Type:

int | None

var: int = -1
left: DecisionTree | None = None
right: DecisionTree | None = None
value: int | None = None
is_leaf() → bool

Return True if this is a leaf node.

depth() → int

Compute the depth (longest root-to-leaf path) of this tree.

size() → int

Compute the size (number of leaves) of this tree.

evaluate(x: int, n_vars: int) → int

Evaluate the decision tree on input x.

Parameters:
  • x – Input as integer (bit i = value of variable i)

  • n_vars – Number of variables

Returns:

Output value (0 or 1)

query_depth(x: int, n_vars: int) → int

Return the number of queries made to evaluate x.

Parameters:
  • x – Input as integer

  • n_vars – Number of variables

Returns:

Number of queries (depth of path to leaf)

to_dict() → Dict[str, Any]

Convert tree to dictionary representation.

__init__(var: int = -1, left: DecisionTree | None = None, right: DecisionTree | None = None, value: int | None = None) → None
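A minimal stand-in for this class shows how the four fields fit together (named Node here to make clear it is a sketch, not the library’s DecisionTree; it also omits the n_vars parameter and list-format interop):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Minimal decision tree: a leaf holds `value`; an internal node
    queries variable `var` and branches on its bit."""
    var: int = -1
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[int] = None

    def is_leaf(self) -> bool:
        return self.var == -1

    def evaluate(self, x: int) -> int:
        # Follow the path selected by the bits of x (bit i = variable i).
        if self.is_leaf():
            return self.value
        branch = self.right if (x >> self.var) & 1 else self.left
        return branch.evaluate(x)

    def depth(self) -> int:
        if self.is_leaf():
            return 0
        return 1 + max(self.left.depth(), self.right.depth())

    def size(self) -> int:
        if self.is_leaf():
            return 1
        return self.left.size() + self.right.size()

# Tree for AND on two variables: query x0; if 0, output 0; else query x1.
and_tree = Node(var=0,
                left=Node(value=0),
                right=Node(var=1, left=Node(value=0), right=Node(value=1)))
```

This AND tree has depth 2 and size 3, matching the optimal values found by the DP routines above.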
boofun.analysis.decision_trees.reconstruct_tree(back_ptr: List[int], index: int, n_vars: int, truth_table: List[int]) → DecisionTree | None

Reconstruct a decision tree from back pointers.

Parameters:
  • back_ptr – Array of back pointers from DP

  • index – Current index in the cube representation

  • n_vars – Number of variables

  • truth_table – Original truth table

Returns:

Reconstructed DecisionTree

boofun.analysis.decision_trees.tree_depth(tree: DecisionTree | List | Tuple) → int

Compute the depth of a decision tree.

Handles both DecisionTree objects and list/tuple representations from Tal’s original code: [var, left_subtree, right_subtree] or [].

Parameters:

tree – Decision tree in any supported format

Returns:

Tree depth (longest root-to-leaf path)
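The list-format case reduces to a short recursion. This sketch handles only Tal’s list representation, not DecisionTree objects (the real tree_depth accepts both):

```python
def list_tree_depth(tree) -> int:
    """Depth of a tree in Tal's list format: [] is a leaf,
    [var, left_subtree, right_subtree] is an internal node."""
    if not tree:                  # empty list: a leaf, no queries made
        return 0
    _var, left, right = tree
    return 1 + max(list_tree_depth(left), list_tree_depth(right))
```

For example, the AND tree [0, [], [1, [], []]] (query x0, then x1 on the 1-branch) has depth 2.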

boofun.analysis.decision_trees.tree_size(tree: DecisionTree | List | Tuple) → int

Compute the size (number of leaves) of a decision tree.

Parameters:

tree – Decision tree in any supported format

Returns:

Number of leaves