Test Guidelines
How to write tests that actually catch bugs.
The Convolution Bug: A Case Study
What Happened
# The bug: silently thresholded real values to Boolean
result_tt = (conv_values < 0).astype(bool) # Lost magnitude!
Why Tests Didn’t Catch It
# ORIGINAL TESTS (BAD)
def test_convolution_same_function(self):
result = convolution(f, f)
assert result is not None # Only checks existence
assert result.n_vars == 2 # Only checks structure
What Tests Should Have Done
# FIXED TESTS (GOOD)
def test_convolution_theorem_holds(self):
"""Verify (f*g)^(S) = f̂(S)·ĝ(S)."""
conv_coeffs = convolution(f, g)
expected = f.fourier() * g.fourier()
assert np.allclose(conv_coeffs, expected) # Verifies math!
Anti-Patterns to Avoid
1. Existence Tests
# BAD: Only checks something returned
assert result is not None
# GOOD: Verify the actual value
assert np.allclose(result, expected_value)
2. Structure-Only Tests
# BAD: Only checks shape/type
assert result.n_vars == 3
assert isinstance(result, BooleanFunction)
# GOOD: Also verify correctness
assert result.evaluate([0,0,0]) == expected_output
3. Missing Contract Tests
# BAD: Doesn't verify the mathematical claim
def test_parseval():
f = bf.parity(3)
result = parseval_verify(f)
assert result # What does this even mean?
# GOOD: Explicitly verify Parseval's identity
def test_parseval_identity():
f = bf.parity(3)
sum_squared = sum(c**2 for c in f.fourier())
assert np.isclose(sum_squared, 1.0) # Σ f̂(S)² = 1
Test Checklist
Before writing a test, answer:
What is the mathematical contract?
What theorem/property should hold?
Write the equation in the docstring
What are the edge cases?
Empty input, n=0, n=1
Constant functions (all 0s, all 1s)
Dictator functions (single variable)
What should fail?
Test that bad inputs raise exceptions
Verify error messages are helpful
Does this test the implementation or the math?
Implementation tests: “does it run?”
Math tests: “does it compute correctly?”
You need both, but math tests catch more bugs
Property-Based Testing with Hypothesis
Parameterized tests cover specific cases. Property-based tests cover invariants that must hold across all inputs.
BooFun uses Hypothesis for this. Key properties to test:
from hypothesis import given, strategies as st
@given(n=st.integers(min_value=1, max_value=8))
def test_parseval_identity_any_majority(n):
"""Parseval: Σ f̂(S)² = 1 for any ±1-valued f."""
if n % 2 == 0:
n += 1 # Majority needs odd n
f = bf.majority(n)
assert np.isclose(sum(f.fourier()**2), 1.0)
@given(n=st.integers(min_value=1, max_value=6))
def test_influences_are_nonnegative(n):
"""Influences are probabilities: always in [0, 1]."""
f = bf.random(n)
for inf in f.influences():
assert 0 <= inf <= 1
When to use property tests: monotonicity, symmetry, Parseval, nonnegativity, consistency between related functions (e.g., is_linear ↔ nonlinearity).
Cross-Validation Testing
The strongest tests verify library output against independent computations: closed-form formulas, other libraries, or hand-calculated values.
def test_majority_total_influence_exact(self):
"""Cross-validate against closed-form: I[Maj_n] = n * C(n-1,(n-1)/2) / 2^(n-1)."""
from math import comb
for n in [3, 5, 7, 9]:
exact = n * comb(n - 1, (n - 1) // 2) / (2 ** (n - 1))
assert np.isclose(bf.majority(n).total_influence(), exact)
Rule: For every mathematical function, there should be at least one test that computes the expected value independently of the library code being tested.
Test Template
class TestFunctionName:
"""Tests for function_name following [mathematical reference]."""
def test_mathematical_property(self):
"""Verify [specific mathematical property].
Theorem: [state the theorem being tested]
"""
# Arrange
f = bf.some_function(3)
# Act
result = function_name(f)
# Assert - verify the MATH, not just existence
expected = compute_expected_value(f)
assert np.allclose(result, expected), f"Expected {expected}, got {result}"
def test_edge_case_constant(self):
"""Edge case: constant function."""
f = bf.create([0, 0, 0, 0]) # All zeros
result = function_name(f)
assert result == expected_for_constant
def test_invalid_input_raises(self):
"""Invalid input should fail loud."""
with pytest.raises(ValueError, match="helpful error message"):
function_name(invalid_input)
Files Status
Fixed (Jan 2026)
File |
Issue |
Fix |
|---|---|---|
test_pac_learning.py |
24 weak assertions |
Added error rate verification |
test_builtins.py |
14 weak assertions |
Added truth table verification |
test_module_exploration.py |
20 weak assertions |
Added correctness checks |
test_quantum_complexity.py |
13 weak assertions |
Added mathematical property tests |
test_circuit.py |
10 weak assertions |
Added circuit evaluation tests |
test_property_tester.py |
(new) |
Comprehensive PropertyTester tests |
Remaining Low Priority
File |
Count |
Reason |
|---|---|---|
test_visualization_*.py |
70+ |
Visual output, hard to verify |
test_gpu_*.py |
21 |
Infrastructure tests |
test_benchmarks.py |
9 |
Performance tests |
Running Tests
pytest tests/ # All tests
pytest tests/ -x -q # Stop on first failure, quiet
pytest tests/unit/ # Unit tests only
pytest tests/analysis/ # Analysis module tests
pytest tests/ --cov=boofun # With coverage report
pytest tests/ -k "sensitivity" # Tests matching keyword
Test Structure
tests/
├── unit/ # Individual function tests
├── integration/ # Cross-module tests
├── analysis/ # Analysis module tests (Fourier, sensitivity, etc.)
├── families/ # Built-in function family tests
├── correctness/ # Bit-ordering, numerical precision
├── fuzz/ # Edge case fuzzing (Hypothesis)
├── benchmarks/ # Performance tests
└── test_*.py # Top-level test files
Coverage: 72% with 3200+ tests. Core analysis modules are 70-85%; visualization and quantum_complexity are lower priority.
CI: All tests run on every push via GitHub Actions. Pre-commit hooks enforce formatting (black, isort, flake8, codespell).