boofun.analysis.decision_trees

Decision tree analysis for Boolean functions.

This module implements algorithms for analyzing decision tree complexity, including dynamic programming approaches for computing optimal trees and game-theoretic analysis for randomized complexity.

The algorithms in this module are based on Avishay Tal’s PhD-era library, with modernized implementations, type hints, and documentation.

Key concepts:

- D(f): Deterministic decision tree depth (worst-case)
- D_avg(f): Average-case decision tree depth under uniform distribution
- D_μ(f): Average-case under arbitrary distribution μ
- R(f): Randomized decision tree complexity (via minimax theorem)

The core insight is that decision trees correspond to subcubes of {0,1}^n, and we can use dynamic programming over the “cube lattice” where each node represents a partial assignment (some variables fixed, others free).

References:

- Tal’s PhD library (BooleanFunc.py)
- Buhrman & de Wolf, “Complexity Measures and Decision Tree Complexity” (2002)
- O’Donnell, “Analysis of Boolean Functions” (2014), Chapter 1

Functions

compute_randomized_complexity(f[, output_value])

Compute the randomized decision tree complexity R(f).

count_decision_trees(f)

Count the number of distinct decision trees (without enumeration).

decision_tree_depth_dp(f)

Compute optimal decision tree depth using dynamic programming.

decision_tree_depth_uniform_dp(f[, weights])

Compute optimal average-case decision tree depth under uniform distribution.

decision_tree_depth_weighted_dp(f, probabilities)

Compute optimal decision tree depth under arbitrary input distribution.

decision_tree_size_dp(f[, optimize_size_first])

Compute decision tree optimizing for size (number of leaves).

enumerate_decision_trees(f[, prune_dominated])

Enumerate all valid decision trees for a Boolean function.

randomized_complexity_matrix(f[, output_value])

Build the game matrix for randomized decision tree complexity.

reconstruct_tree(back_ptr, index, n_vars, ...)

Reconstruct a decision tree from back pointers.

tree_depth(tree)

Compute the depth of a decision tree.

tree_size(tree)

Compute the size (number of leaves) of a decision tree.

Classes

DecisionTree([var, left, right, value])

Representation of a decision tree for a Boolean function.

boofun.analysis.decision_trees.decision_tree_depth_dp(f: BooleanFunction) → int

Compute optimal decision tree depth using dynamic programming.

This implements Tal’s calc_decision_tree_DP algorithm, which uses a clever representation of subcubes. Each subcube is encoded using a ternary representation where each variable can be:

- 0: fixed to 0
- 1: fixed to 1
- 2: free (not yet queried)

The DP computes the optimal depth for each subcube bottom-up, from fully-specified inputs to the full hypercube.

Time complexity: O(3^n), where n is the number of variables. Space complexity: O(3^n).

Parameters:

f – BooleanFunction to analyze

Returns:

Optimal decision tree depth D(f)
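The recurrence behind this DP can be sketched in plain Python. This is an illustrative reimplementation over an explicit truth table, not the library’s calc_decision_tree_DP; the names (depth_dp, solve) are made up for the sketch. A subcube is a tuple over {0, 1, 2}, mirroring the ternary encoding described above:

```python
from functools import lru_cache

def depth_dp(truth_table, n):
    """Worst-case decision tree depth D(f) via DP over subcubes.

    truth_table[x] is f evaluated at the input whose bit i is the
    value of variable i. A subcube is a tuple over {0, 1, 2}:
    variable i is fixed to 0 or 1, or free (2).
    """
    def outputs(cube):
        # All outputs of f on inputs consistent with the subcube.
        return {truth_table[x] for x in range(2 ** n)
                if all(c == 2 or (x >> i) & 1 == c for i, c in enumerate(cube))}

    @lru_cache(maxsize=None)
    def solve(cube):
        if len(outputs(cube)) == 1:   # f is constant here: a leaf suffices
            return 0
        best = n
        for i, c in enumerate(cube):
            if c == 2:                # try querying free variable i
                lo = solve(cube[:i] + (0,) + cube[i + 1:])
                hi = solve(cube[:i] + (1,) + cube[i + 1:])
                best = min(best, 1 + max(lo, hi))
        return best

    return solve((2,) * n)
```

For example, AND on two variables (truth table [0, 0, 0, 1]) has depth 2, while a dictator function ([0, 1, 0, 1]) has depth 1.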

boofun.analysis.decision_trees.decision_tree_depth_uniform_dp(f: BooleanFunction, weights: List[float] | None = None) → Tuple[float, DecisionTree | None]

Compute optimal average-case decision tree depth under uniform distribution.

This implements Tal’s calc_decision_tree_DP_uniform algorithm with NumPy optimizations for better performance on larger inputs.

Unlike worst-case depth, this minimizes the expected number of queries when inputs are drawn uniformly at random.

Parameters:
  • f – BooleanFunction to analyze

  • weights – Optional per-variable query costs (default: all 1s)

Returns:

Tuple of (average depth, optimal tree if reconstructible)
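The average-case recurrence differs from the worst-case one only in how the two halves are combined: under the uniform distribution, conditioned on being in a subcube, each half of a queried variable is equally likely, so a query costs 1 plus the average of the two subproblems. A stdlib-only sketch (illustrative, not the library’s NumPy-optimized calc_decision_tree_DP_uniform, and omitting per-variable weights):

```python
from functools import lru_cache

def avg_depth_dp(truth_table, n):
    """Minimum expected number of queries under the uniform distribution.

    Recurrence: a constant subcube costs 0; otherwise querying free
    variable i costs 1 + (cost of the x_i=0 half + cost of the
    x_i=1 half) / 2, since each half is equally likely.
    """
    def is_constant(cube):
        vals = {truth_table[x] for x in range(2 ** n)
                if all(c == 2 or (x >> i) & 1 == c for i, c in enumerate(cube))}
        return len(vals) == 1

    @lru_cache(maxsize=None)
    def solve(cube):
        if is_constant(cube):
            return 0.0
        return min(
            1 + 0.5 * solve(cube[:i] + (0,) + cube[i + 1:])
              + 0.5 * solve(cube[:i] + (1,) + cube[i + 1:])
            for i, c in enumerate(cube) if c == 2
        )

    return solve((2,) * n)
```

For AND on two variables the optimal strategy answers after one query half the time, giving expected depth 1.5, whereas XOR always needs both queries (2.0).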

boofun.analysis.decision_trees.decision_tree_depth_weighted_dp(f: BooleanFunction, probabilities: List[float], weights: List[float] | None = None) → Tuple[float, DecisionTree | None]

Compute optimal decision tree depth under arbitrary input distribution.

This implements Tal’s calc_decision_tree_DP_with_prob algorithm, which handles non-uniform input distributions. This is useful for analyzing decision tree performance when some inputs are more likely than others.

Parameters:
  • f – BooleanFunction to analyze

  • probabilities – Probability of each input (length 2^n)

  • weights – Optional per-variable query costs

Returns:

Tuple of (expected depth, optimal tree)

boofun.analysis.decision_trees.decision_tree_size_dp(f: BooleanFunction, optimize_size_first: bool = True) → Tuple[int, int, DecisionTree | None]

Compute decision tree optimizing for size (number of leaves).

This implements Tal’s calc_decision_tree_size_DP, which finds trees that minimize the number of leaves, with depth as a tiebreaker.

Smaller trees are often preferred for interpretability and can sometimes be converted to smaller Boolean formulas.

Parameters:
  • f – BooleanFunction to analyze

  • optimize_size_first – If True, minimize size then depth. If False, minimize depth then size.

Returns:

Tuple of (size, depth, optimal tree)
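The size recurrence is the same subcube DP with a different combiner: a constant subcube costs one leaf, and a query costs the sum (not the max or average) of the two halves. A minimal sketch, ignoring the depth tiebreaker and tree reconstruction (size_dp is a made-up name, not the library function):

```python
from functools import lru_cache

def size_dp(truth_table, n):
    """Minimum number of leaves over all decision trees computing f."""
    @lru_cache(maxsize=None)
    def solve(cube):
        vals = {truth_table[x] for x in range(2 ** n)
                if all(c == 2 or (x >> i) & 1 == c for i, c in enumerate(cube))}
        if len(vals) == 1:
            return 1              # a single leaf suffices on a constant subcube
        return min(
            solve(cube[:i] + (0,) + cube[i + 1:])
            + solve(cube[:i] + (1,) + cube[i + 1:])
            for i, c in enumerate(cube) if c == 2
        )

    return solve((2,) * n)
```

AND on two variables needs 3 leaves, XOR needs 4, and a dictator needs 2.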

boofun.analysis.decision_trees.enumerate_decision_trees(f: BooleanFunction, prune_dominated: bool = True) → List[DecisionTree]

Enumerate all valid decision trees for a Boolean function.

This implements Tal’s all_decision_trees algorithm with optional pruning of dominated trees (trees that are strictly worse than another tree on all inputs).

Warning: The number of trees can grow exponentially in n. Only use this for small functions (n ≤ 6 recommended).

Parameters:
  • f – BooleanFunction to analyze

  • prune_dominated – If True, remove dominated trees

Returns:

List of decision trees

boofun.analysis.decision_trees.count_decision_trees(f: BooleanFunction) → int

Count the number of distinct decision trees (without enumeration).

This is faster than enumerate_decision_trees when you only need the count.

Parameters:

f – BooleanFunction to analyze

Returns:

Number of distinct decision trees

boofun.analysis.decision_trees.randomized_complexity_matrix(f: BooleanFunction, output_value: int | None = None) → np.ndarray

Build the game matrix for randomized decision tree complexity.

The rows correspond to inputs, columns to decision trees. Entry (i, j) is the number of queries tree j makes on input i.

By the minimax theorem, the randomized complexity R(f) equals the value of this matrix game.

Parameters:
  • f – BooleanFunction to analyze

  • output_value – If specified, only include inputs where f(x) = output_value

Returns:

NumPy matrix where M[i,j] = depth of tree j on input i

boofun.analysis.decision_trees.compute_randomized_complexity(f: BooleanFunction, output_value: int | None = None) → float

Compute the randomized decision tree complexity R(f).

This solves the minimax game between:

- The algorithm, which chooses a distribution over decision trees
- The adversary, which chooses an input

R(f) = min over tree distributions, max over inputs, E[queries]

By von Neumann’s minimax theorem, this equals: R(f) = max over input distributions, min over trees, E[queries]

Requires scipy for linear programming.

Parameters:
  • f – BooleanFunction to analyze

  • output_value – If specified, compute R_b(f) for inputs with f(x)=b

Returns:

Randomized query complexity
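To make the minimax definition concrete, here is a toy game value computation for a game matrix with exactly two trees (columns). This is only an illustration of the min-over-distributions/max-over-inputs definition; it is not the library’s scipy-based LP solver, and grid search over one probability only works for two columns:

```python
def game_value_2x2(M, steps=10_000):
    """Value of a 2-column matrix game by grid search over the algorithm's
    mixed strategy: probability p on tree 0, (1 - p) on tree 1.

    Computes min over p of max over inputs (rows) of expected queries.
    Real solvers use linear programming instead of a grid.
    """
    best = float("inf")
    for k in range(steps + 1):
        p = k / steps
        worst = max(p * row[0] + (1 - p) * row[1] for row in M)
        best = min(best, worst)
    return best

# Hypothetical game matrix: two inputs (rows), two trees (columns);
# tree 0 is fast on input 0 and slow on input 1, tree 1 the reverse.
M = [[1, 2],
     [2, 1]]
```

Mixing the two trees with probability 1/2 each makes the adversary indifferent, so the value is 1.5, strictly better than either deterministic tree’s worst case of 2.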

class boofun.analysis.decision_trees.DecisionTree(var: int = -1, left: DecisionTree | None = None, right: DecisionTree | None = None, value: int | None = None)

Representation of a decision tree for a Boolean function.

A decision tree is either:

- A leaf with a constant output (0 or 1)
- An internal node that queries variable var and branches to the left (var=0) or right (var=1) subtree

var

Variable index to query (-1 for leaf nodes)

Type:

int

left

Left subtree (when var=0), or None for leaves

Type:

boofun.analysis.decision_trees.DecisionTree | None

right

Right subtree (when var=1), or None for leaves

Type:

boofun.analysis.decision_trees.DecisionTree | None

value

Output value for leaf nodes (0 or 1), None for internal nodes

Type:

int | None

var: int = -1
left: DecisionTree | None = None
right: DecisionTree | None = None
value: int | None = None
is_leaf() → bool

Return True if this is a leaf node.

depth() → int

Compute the depth (longest root-to-leaf path) of this tree.

size() → int

Compute the size (number of leaves) of this tree.

evaluate(x: int, n_vars: int) → int

Evaluate the decision tree on input x.

Parameters:
  • x – Input as integer (bit i = value of variable i)

  • n_vars – Number of variables

Returns:

Output value (0 or 1)

query_depth(x: int, n_vars: int) → int

Return the number of queries made to evaluate x.

Parameters:
  • x – Input as integer

  • n_vars – Number of variables

Returns:

Number of queries (depth of path to leaf)

to_dict() → Dict[str, Any]

Convert tree to dictionary representation.

__init__(var: int = -1, left: DecisionTree | None = None, right: DecisionTree | None = None, value: int | None = None) → None
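A minimal stand-in for this class shows how the four fields fit together (named Node here to make clear it is a sketch, not the library’s DecisionTree; it also omits the n_vars parameter and list-format interop):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Minimal decision tree: a leaf holds `value`; an internal node
    queries variable `var` and branches on its bit."""
    var: int = -1
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[int] = None

    def is_leaf(self) -> bool:
        return self.var == -1

    def evaluate(self, x: int) -> int:
        # Follow the path selected by the bits of x (bit i = variable i).
        if self.is_leaf():
            return self.value
        branch = self.right if (x >> self.var) & 1 else self.left
        return branch.evaluate(x)

    def depth(self) -> int:
        if self.is_leaf():
            return 0
        return 1 + max(self.left.depth(), self.right.depth())

    def size(self) -> int:
        if self.is_leaf():
            return 1
        return self.left.size() + self.right.size()

# Tree for AND on two variables: query x0; if 0, output 0; else query x1.
and_tree = Node(var=0,
                left=Node(value=0),
                right=Node(var=1, left=Node(value=0), right=Node(value=1)))
```

This AND tree has depth 2 and size 3, matching the optimal values found by the DP routines above.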
boofun.analysis.decision_trees.reconstruct_tree(back_ptr: List[int], index: int, n_vars: int, truth_table: List[int]) → DecisionTree | None

Reconstruct a decision tree from back pointers.

Parameters:
  • back_ptr – Array of back pointers from DP

  • index – Current index in the cube representation

  • n_vars – Number of variables

  • truth_table – Original truth table

Returns:

Reconstructed DecisionTree

boofun.analysis.decision_trees.tree_depth(tree: DecisionTree | List | Tuple) → int

Compute the depth of a decision tree.

Handles both DecisionTree objects and list/tuple representations from Tal’s original code: [var, left_subtree, right_subtree] or [].

Parameters:

tree – Decision tree in any supported format

Returns:

Tree depth (longest root-to-leaf path)
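The list-format case reduces to a short recursion. This sketch handles only Tal’s list representation, not DecisionTree objects (the real tree_depth accepts both):

```python
def list_tree_depth(tree) -> int:
    """Depth of a tree in Tal's list format: [] is a leaf,
    [var, left_subtree, right_subtree] is an internal node."""
    if not tree:                  # empty list: a leaf, no queries made
        return 0
    _var, left, right = tree
    return 1 + max(list_tree_depth(left), list_tree_depth(right))
```

For example, the AND tree [0, [], [1, [], []]] (query x0, then x1 on the 1-branch) has depth 2.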

boofun.analysis.decision_trees.tree_size(tree: DecisionTree | List | Tuple) → int

Compute the size (number of leaves) of a decision tree.

Parameters:

tree – Decision tree in any supported format

Returns:

Number of leaves