Performance Guide

This document describes BooFun’s performance characteristics, optimization strategies, and benchmarks.

Quick Summary

Operation

Complexity

n=10

n=14

n=18

n=20

Truth table creation

O(2^n)

<1ms

~10ms

~200ms

~1s

Walsh-Hadamard Transform

O(n·2^n)

<1ms

~5ms

~100ms

~500ms

Influence computation

O(2^n)

<1ms

~5ms

~80ms

~350ms

Property testing

O(queries)

<1ms

<1ms

<5ms

<10ms

Optimization Tiers

BooFun uses multiple optimization strategies, automatically selecting the best available:

Tier 1: NumPy Vectorization (Default)

  • Always available

  • 10-100x faster than pure Python

  • Used for all array operations

Tier 3: GPU Acceleration (Optional)

  • Requires: pip install cupy-cuda11x (match your CUDA version)

  • 10-100x faster for large n (n > 16)

  • Used for: WHT, batch operations

Memory Optimization

Truth Table Representations

Format

Memory (n=20)

Access Time

Best For

numpy bool

1 MB

O(1)

n ≤ 14

packed bitarray

128 KB

O(1)

14 < n ≤ 20

sparse

~k·12 bytes

O(1)

High sparsity

Auto-Selection

from boofun.core.auto_representation import recommend_representation

# Get recommendation for your use case
rec = recommend_representation(n_vars=18, sparsity=0.1)
print(rec)
# {'representation': 'sparse_truth_table',
#  'reason': 'Sparsity 10.0% < 30%'}

Using Packed Truth Tables

from boofun.core.representations.packed_truth_table import create_packed_truth_table

# Convert existing truth table
packed = create_packed_truth_table(truth_table)

# Memory savings
from boofun.core.representations.packed_truth_table import memory_comparison
print(memory_comparison(20))
# packed_bitarray: 131,072 bytes (128.0 KB)
# numpy_bool: 1,048,576 bytes (1024.0 KB)
# savings: 8x

Parallelization

Batch Operations

from boofun.core.optimizations import parallel_batch_influences, parallel_batch_fourier

# Compute influences for many functions at once
functions = [bf.random(n=10) for _ in range(100)]
all_influences = parallel_batch_influences(functions)

# Fourier coefficients in parallel
all_fourier = parallel_batch_fourier(functions)

Numba Parallel Loops

Numba functions automatically use all CPU cores:

# These are JIT-compiled with Numba:
# - _vectorized_influences_numba
# - _total_influence_numba
# - _fast_wht_numba

# Check if Numba is being used:
from boofun.core.optimizations import HAS_NUMBA, INFLUENCES_BACKEND
print(f"Numba available: {HAS_NUMBA}")
print(f"Influences backend: {INFLUENCES_BACKEND}")

Caching and Memoization

Global Compute Cache

from boofun.core.optimizations import get_global_cache

cache = get_global_cache()

# Check cache statistics
print(cache.stats())
# {'size': 42, 'max_size': 500, 'hits': 156, 'misses': 42, 'hit_rate': 0.79}

# Clear cache if needed
cache.clear()

Instance-Level Caching

BooleanFunction instances cache:

  • Fourier coefficients (_fourier_cache)

  • Influences (_influences_cache)

  • Decision tree depth (_dt_cache)

f = bf.majority(5)

# First call computes
_ = f.fourier()  # ~1ms

# Second call returns cached
_ = f.fourier()  # ~0.001ms

Benchmarks

Walsh-Hadamard Transform

n=10:  NumPy: 0.5ms, Numba: 0.2ms, GPU: 0.1ms
n=14:  NumPy: 8ms,   Numba: 3ms,   GPU: 0.5ms
n=18:  NumPy: 150ms, Numba: 50ms,  GPU: 5ms
n=20:  NumPy: 700ms, Numba: 200ms, GPU: 15ms

Influence Computation

n=10:  NumPy: 0.3ms, Numba: 0.1ms
n=14:  NumPy: 5ms,   Numba: 1ms
n=18:  NumPy: 90ms,  Numba: 20ms
n=20:  NumPy: 400ms, Numba: 80ms

Property Testing (1000 queries)

BLR linearity:  ~2ms (independent of n)
Junta test:     ~5ms (for k-junta)
Monotonicity:   ~3ms

Best Practices

1. Install Numba

pip install numba

This alone provides 2-10x speedup for most operations.

2. Use Appropriate Representations

# For n > 14, consider sparse or packed
from boofun.core.auto_representation import AdaptiveFunction

# Automatically chooses best format
f = AdaptiveFunction(truth_table, n_vars=18)
print(f.format)  # 'packed' or 'sparse'

3. Batch Operations

# Bad: sequential
results = [f.influences() for f in functions]

# Good: parallel
from boofun.core.optimizations import parallel_batch_influences
results = parallel_batch_influences(functions)

4. Reuse Functions

# Bad: recreate function each time
for _ in range(100):
    f = bf.AND(10)
    print(f.total_influence())

# Good: reuse function
f = bf.AND(10)
for _ in range(100):
    print(f.total_influence())  # Uses cached values

5. Profile Your Code

from boofun.core.optimizations import WHT_BACKEND, INFLUENCES_BACKEND

print(f"WHT backend: {WHT_BACKEND}")
print(f"Influences backend: {INFLUENCES_BACKEND}")

# Time specific operations
import time
f = bf.random(n=16)

start = time.time()
_ = f.fourier()
print(f"WHT: {time.time() - start:.3f}s")

start = time.time()
_ = f.influences()
print(f"Influences: {time.time() - start:.3f}s")

Running Benchmarks

# Run all benchmarks
pytest tests/benchmarks/ -v --benchmark-only

# Specific benchmark
pytest tests/benchmarks/test_external_benchmarks.py -v

# With comparison
pytest tests/benchmarks/ --benchmark-compare

Docker Performance

The Docker images include Numba by default:

# Run benchmarks in Docker
docker-compose run benchmark

Scaling Guidelines

n

Recommended Approach

≤ 14

Dense truth table, Numba

15-18

Packed/sparse, Numba, consider GPU

19-22

GPU required, sparse representation

> 22

Consider sampling/approximation algorithms

Future Optimizations

  • Distributed computation with Dask

  • Further GPU kernel optimization

  • Memory-mapped truth tables for very large n