Architecture

BooFun is a modular Boolean function analysis library designed around lazy evaluation, pluggable representations, and mathematical rigor. This document describes the high-level architecture and key design decisions.

Architecture Diagram

BooFun Architecture

Mermaid source (for GitHub rendering)

flowchart TB
    subgraph API["Public API"]
        create["bf.create()"]
        builtins["bf.majority()<br/>bf.parity()<br/>bf.tribes()<br/>bf.f2_polynomial()"]
        io["bf.load()<br/>bf.save()"]
    end

    subgraph Core["Core Module"]
        BF["BooleanFunction<br/>• fourier(), influences()<br/>• is_global(α)<br/>• __eq__, __hash__"]
        Factory["BooleanFunctionFactory"]
        Space["Space + Measure"]
        ErrorModel["ErrorModel"]
    end

    subgraph Representations["Representations (Strategy Pattern)"]
        Registry["Registry<br/>(STRATEGY_REGISTRY)"]

        subgraph RepTypes["Representation Types"]
            TT["TruthTable<br/>(dense / packed / sparse / adaptive)"]
            Fourier["FourierExpansion"]
            ANF["AlgebraicNormalForm"]
            Poly["Polynomial<br/>(real multilinear)"]
            BDD["BinaryDecisionDiagram"]
            DNF["DNF / CNF"]
            Circuit["Circuit"]
            Symbolic["Symbolic"]
            LTF["LinearThreshold"]
            Dist["Distribution"]
        end
    end

    subgraph Conversion["Conversion System"]
        ConvGraph["ConversionGraph"]
        ConvCost["ConversionCost"]
        PathFind["Two-Level Hub Dispatch<br/>(via truth_table)"]
    end

    subgraph Analysis["Analysis Module (28 modules)"]
        Spectral["SpectralAnalyzer<br/>• fourier(), influences()<br/>• noise_stability()"]
        PropTest["PropertyTester<br/>• BLR linearity<br/>• monotonicity, junta"]
        Complexity["Complexity<br/>• D(f), s(f), bs(f)<br/>• certificates, Ambainis"]
        PBiased["P-Biased Analysis<br/>• p_biased_expectation<br/>• threshold_curve<br/>• Fourier tails L_{1,k}"]
        Hyper["Hypercontractivity<br/>• KKL, Bonami, Friedgut<br/>• Global (Keevash et al.)"]
        Invariance["Invariance Principle<br/>• Gaussian analysis<br/>• Berry-Esseen<br/>• Majority is Stablest"]
        Crypto["Cryptographic<br/>• nonlinearity, bent<br/>• LAT/DDT, SAC"]
        Learning["Learning<br/>• Goldreich-Levin<br/>• PAC, junta learning"]
        Sampling["Sampling<br/>• Monte Carlo estimation<br/>• Adaptive sampling<br/>• RandomVariableView"]
    end

    subgraph Families["Built-in Families"]
        Majority["majority(n)"]
        Parity["parity(n)"]
        Tribes["tribes(k,n)"]
        Dictator["dictator(n,i)"]
        Threshold["threshold(n,k)"]
        F2Poly["f2_polynomial(n,monomials)"]
    end

    subgraph Viz["Visualization"]
        Plotter["BooleanFunctionVisualizer"]
        DecTree["Decision Tree Export"]
        Growth["GrowthVisualizer"]
    end

    subgraph Utils["Utilities"]
        Exceptions["Exception Hierarchy"]
        Math["Math / Number Theory"]
    end

    %% Connections
    create --> Factory
    builtins --> Factory
    io --> BF

    Factory --> BF
    Factory --> Registry

    BF --> Registry
    BF --> ConvGraph
    BF --> Space
    BF --> ErrorModel

    Registry --> RepTypes

    ConvGraph --> ConvCost
    ConvGraph --> PathFind
    ConvGraph --> Registry

    BF --> Spectral
    BF --> PropTest
    BF --> Complexity
    BF --> PBiased
    BF --> Hyper
    BF --> Invariance
    BF --> Crypto
    BF --> Learning
    BF --> Sampling

    Families --> Factory

    BF --> Plotter
    BF --> Growth

    Analysis --> Math
    Core --> Exceptions

Component Overview

Public API (`api.py`, `init.py`)

The public API provides a unified entry point for users:

import boofun as bf

# All roads lead through bf.create()
f = bf.create([0, 1, 1, 0])           # Truth table
f = bf.create(lambda x: x[0] ^ x[1], n=2)  # Callable
f = bf.create("x0 & x1", n=2)         # Symbolic
f = bf.majority(5)                     # Built-in

Design decision: A single create() function with type detection simplifies the API. Storage hints (storage='packed', 'sparse', 'lazy') allow users to optimize without understanding internals.

Core Module (`core/`)

BooleanFunction (`base.py`)

The central class representing a Boolean function. Key features:

Lazy evaluation: Representations are computed on-demand
Multiple representations: A function can have truth table, Fourier, ANF simultaneously
Callable interface: f([1,0,1]) or f(5) both work
Operator overloading: f & g, f | g, ~f, f ^ g

class BooleanFunction:
    def __init__(self, space, error_model, ...):
        self.representations: Dict[str, Any] = {}  # Lazy cache
        self.n_vars: int
        self.space: Space

Factory (`factory.py`)

Handles creation from various input types:

Auto-detects input type (list, callable, string, file path)
Delegates to appropriate from_* methods
Validates inputs and sets n_vars

Space and Measure (`spaces.py`)

Defines the mathematical domain/codomain and probability measures:

class Space(Enum):
    BOOLEAN_CUBE = "boolean_cube"      # {0,1}^n → {0,1}
    PLUS_MINUS_CUBE = "plus_minus_cube" # {-1,+1}^n → {-1,+1}
    REAL = "real"                       # For Fourier coefficients

class Measure:
    """Probability measure on the hypercube."""
    uniform = Measure.uniform()         # p = 0.5
    biased = Measure.p_biased(p=0.3)    # μ_p with p = 0.3

Representations (`core/representations/`)

Uses the Strategy Pattern for pluggable representations.

Registry (`registry.py`)

STRATEGY_REGISTRY: Dict[str, Type[BooleanFunctionRepresentation]] = {}

@register_strategy("truth_table")
class TruthTableRepresentation(BooleanFunctionRepresentation):
    def evaluate(self, inputs, data, space, n_vars): ...
    def convert_to(self, target, data, space, n_vars): ...

Design decision: New representations can be added without modifying core code. Each representation handles its own evaluation and conversion logic.

Available Representations

Representation	Best For	Space Complexity
`truth_table`	Small n (≤14), fast lookup	O(2^n)
`packed_truth_table`	Medium n (14-20)	O(2^n / 64)
`sparse_truth_table`	Skewed functions	O(k) where k = #exceptions
`adaptive_truth_table`	Auto-selecting dense/sparse	O(min(2^n, k))
`fourier_expansion`	Spectral analysis	O(2^n)
`anf`	Algebraic analysis (GF(2))	O(2^n)
`polynomial`	Real multilinear polynomials	O(terms)
`symbolic`	Human-readable, n=any	O(expression)
`bdd`	Structure exploitation	O(varies)
`dnf` / `cnf`	Logic optimization	O(clauses)
`circuit`	Gate-level evaluation	O(gates)
`ltf`	Linear threshold functions	O(n) weights + threshold
`distribution`	Probabilistic analysis	O(2^n)
`function`	Oracles, lazy eval	O(1)

Conversion Graph (`conversion_graph.py`)

Uses two-level hub dispatch through truth_table as the universal hub (replaced Dijkstra pathfinding in v1.3.0).

class ConversionGraph:
    def find_optimal_path(self, source, target, n_vars=None):
        # 1. source == target → no conversion
        # 2. Direct edge exists → use it
        # 3. Otherwise → source → truth_table → target

Design decision: Every representation can convert to/from truth_table, so routing through the hub is simpler and more predictable than general shortest-path search, with no loss of functionality. Conversion costs still consider time, space, and accuracy loss.

Analysis Module (`analysis/`)

Implements algorithms from O’Donnell’s “Analysis of Boolean Functions”:

Submodule	O’Donnell Ch.	Key Functions
`fourier.py`	1-2	WHT, influences, spectral weight, Fourier tails \(L_{1,k}\)
`sensitivity.py`	2, 4	Sensitivity, max/min, moments, histogram
`block_sensitivity.py`	4	Block sensitivity bs(f), everywhere sensitivity
`certificates.py`	4	Certificate complexity C(f), C_0, C_1
`complexity.py`	4	D(f), s(f), bs(f), C(f) combined
`query_complexity.py`	-	D, R, Q, Ambainis, spectral adversary, polynomial method
`decision_trees.py`	4	DP optimal depth, tree enumeration, randomized complexity
`huang.py`	-	Huang’s sensitivity theorem verification
`hypercontractivity.py`	9-10	KKL, Bonami, Friedgut, noise operator
`global_hypercontractivity.py`	Keevash+	α-globality, threshold curves, critical p
`p_biased.py`	8	P-biased Fourier, influences, expectation
`invariance.py`	11	Invariance distance, Majority is Stablest
`gaussian.py`	10-11	Multilinear extension, Berry-Esseen, Hermite
`sampling.py`	1-3	Monte Carlo estimation, adaptive sampling, RandomVariableView
`learning.py`	3	Goldreich-Levin, junta learning
`pac_learning.py`	3	PAC learning: low-degree, junta, sparse Fourier, monotone
`cryptographic.py`	-	Nonlinearity, bent, Walsh spectrum, LAT/DDT, SAC, SBoxAnalyzer
`arrow.py`	2	Arrow’s theorem, social choice analysis
`fkn.py`	2	FKN theorem, dictator proximity
`communication_complexity.py`	-	Deterministic CC, log-rank, fooling sets, discrepancy
`ltf_analysis.py`	5	Chow parameters, critical index, regularity, LTF fitting
`restrictions.py`	6-7	Random restrictions, switching lemma, restriction shrinkage
`symmetry.py`	1	Symmetry detection, symmetrization, symmetric profile
`sparsity.py`	3	Fourier sparsity, granularity, sparse representation
`canalization.py`	-	Canalizing variables, nested canalizing depth
`equivalence.py`	-	Function equivalence testing
`basic_properties.py`	1	Balance, monotonicity, linearity checks
`gf2.py`	-	GF(2) Fourier transform

Design decision: Analysis functions work on BooleanFunction objects and request representations as needed. Functions are stateless and composable.

Built-in Families (`families/`, `core/builtins.py`)

Standard Boolean functions used in research:

bf.majority(n)      # MAJ_n: output 1 if >n/2 inputs are 1
bf.parity(n)        # XOR_n: output XOR of all inputs
bf.tribes(k, n)     # Tribes function (k groups of n)
bf.dictator(n, i)   # Output = x_i
bf.threshold(n, k)  # Output 1 if ≥k inputs are 1
bf.AND(n), bf.OR(n) # Basic gates

Visualization (`visualization/`)

BooleanFunctionVisualizer: Influence plots, Fourier spectrum, heatmaps
decision_tree.py: Export decision trees
interactive.py: Jupyter widgets
Supports matplotlib, plotly, seaborn

Utilities (`utils/`)

Exception hierarchy: Structured errors (ConversionError, EvaluationError, etc.)
Math helpers: Bit manipulation, combinatorics
Sampling: Uniform and biased sampling for Monte Carlo methods

Key Design Decisions

1. Lazy Evaluation

Representations are computed only when needed:

f = bf.majority(5)  # Only stores callable
f.fourier()         # Computes truth_table → fourier on first call
f.fourier()         # Returns cached result

Rationale: For large n, materializing truth tables is expensive. Lazy evaluation allows working with oracles.

2. LSB Bit Ordering

Input index i maps to binary [x_0, x_1, ..., x_{n-1}] where x_j = (i >> j) & 1:

Index 5 (binary 101) → [1, 0, 1] (LSB first)

Rationale: Matches standard Fourier indexing where subset S corresponds to bits set in index.

3. Plus-Minus vs Boolean Space

Fourier analysis uses {-1, +1} internally (plus_minus_cube) but accepts {0, 1} inputs:

f = bf.create([0, 1, 1, 0])  # {0,1} input
f.fourier()  # Internally converts: 0→+1, 1→-1

Rationale: Fourier theory is cleaner in ±1 space (characters are χ_S(x) = ∏_{i∈S} x_i).

4. Error Models

Support for uncertainty quantification:

f = bf.create(data, error_model=bf.PACErrorModel(epsilon=0.1))

Rationale: Enables PAC learning and property testing with rigorous error bounds.

5. Extensibility via Registry

New representations can be added without modifying core:

@register_strategy("my_custom_rep")
class MyRepresentation(BooleanFunctionRepresentation):
    ...

Rationale: Research often requires custom representations (e.g., for specific function families).

Data Flow Example

User: bf.create(lambda x: x[0] ^ x[1], n=2)
  ↓
Factory.from_callable()
  → stores callable in representations["function"]
  → returns BooleanFunction

User: f.fourier()
  ↓
BooleanFunction.get_representation("fourier_expansion")
  → not in cache
  → ConversionGraph.find_optimal_path("function", "fourier_expansion")
  → routes through hub: function → truth_table → fourier_expansion
  → executes conversions, caches results
  → returns Fourier coefficients

Module Dependencies

api.py
  └── core/
       ├── base.py (BooleanFunction)
       ├── factory.py
       ├── builtins.py
       ├── conversion_graph.py (hub-and-spoke dispatch)
       ├── spaces.py (Space, Measure)
       ├── io.py (load/save)
       ├── adapters.py (adapt_callable for large-n oracles)
       ├── gpu.py (CuPy acceleration)
       ├── optimizations.py (NumPy/Numba/pyfwht backend selection)
       ├── numba_optimizations.py (JIT-compiled hot paths)
       ├── batch_processing.py (BatchProcessorManager)
       ├── auto_representation.py (recommend_representation)
       ├── errormodels.py (PACErrorModel)
       └── representations/
            ├── registry.py (STRATEGY_REGISTRY)
            ├── truth_table.py, packed_truth_table.py, sparse_truth_table.py
            ├── fourier_expansion.py, anf_form.py, polynomial.py
            ├── dnf_form.py, cnf_form.py, bdd.py, circuit.py
            ├── symbolic.py, ltf.py, distribution.py
            └── base.py (BooleanFunctionRepresentation)

analysis/
  ├── __init__.py (SpectralAnalyzer, PropertyTester)
  ├── fourier.py, sensitivity.py, block_sensitivity.py
  ├── complexity.py, query_complexity.py, certificates.py
  ├── decision_trees.py, huang.py
  ├── hypercontractivity.py, global_hypercontractivity.py
  ├── p_biased.py, sampling.py, learning.py, pac_learning.py
  ├── cryptographic.py, gaussian.py, invariance.py
  ├── arrow.py, fkn.py, communication_complexity.py
  ├── ltf_analysis.py, restrictions.py, symmetry.py
  └── sparsity.py, canalization.py, equivalence.py, gf2.py, basic_properties.py

families/
  ├── base.py (FunctionFamily, InductiveFamily)
  ├── builtins.py (MajorityFamily, ParityFamily, ...)
  ├── tracker.py (GrowthTracker)
  └── theoretical.py

visualization/
  ├── decision_tree.py, decision_tree_export.py
  ├── growth_plots.py, interactive.py
  └── animation.py, latex_export.py, widgets.py

Performance Considerations

n ≤ 14: Dense truth tables, exact algorithms
14 < n ≤ 20: Packed truth tables, still tractable
n > 20: Sampling-based methods, oracle access
n ≥ 64: Integer overflow protection (use bit-array sampling)

Numba JIT compilation and pyfwht are used for hot paths when available. See performance.md for benchmarks and optimization tiers.

Architecture

Architecture Diagram

Component Overview

Public API (api.py, __init__.py)

Core Module (core/)

BooleanFunction (base.py)

Factory (factory.py)

Space and Measure (spaces.py)

Representations (core/representations/)

Registry (registry.py)