Learning Theory Guide

Algorithms for learning Boolean functions from queries or samples.

Overview

BooFun provides implementations of key learning algorithms from computational learning theory:

Goldreich-Levin: Find heavy Fourier coefficients with query access
PAC Learning: Probably Approximately Correct learning framework
Junta Learning: Learn functions that depend on few variables
LMN Algorithm: Learn decision trees from uniform samples
Sparse Fourier Learning: Learn functions with few Fourier coefficients

Goldreich-Levin Algorithm

The Goldreich-Levin algorithm finds all “heavy” Fourier coefficients (|f̂(S)| ≥ τ) using only O(n/τ²) queries.

from boofun.analysis.learning import goldreich_levin, find_heavy_coefficients

f = bf.majority(7)

# Find all coefficients with |f̂(S)| ≥ 0.1
heavy = goldreich_levin(f, threshold=0.1, num_samples=5000)
print(f"Found {len(heavy)} heavy coefficients")

for S, coeff in heavy:
    print(f"  f̂({S}) ≈ {coeff:.4f}")

Estimating Single Coefficients

from boofun.analysis.learning import estimate_fourier_coefficient

f = bf.parity(5)

# Estimate f̂({0,1,2,3,4}) - should be ±1 for parity
S = 0b11111  # All variables
estimate, std_err = estimate_fourier_coefficient(f, S, num_samples=10000)
print(f"Estimated f̂(S) = {estimate:.4f} ± {std_err:.4f}")

GoldreichLevinLearner Class

For more control over the learning process:

from boofun.analysis.learning import GoldreichLevinLearner

learner = GoldreichLevinLearner(threshold=0.1, confidence=0.95)

# Learn from query access
heavy_coeffs = learner.learn(f, num_samples=10000)

# Get learning statistics
print(f"Queries used: {learner.query_count}")
print(f"Coefficients found: {len(heavy_coeffs)}")

PAC Learning

The PAC (Probably Approximately Correct) framework:

Given: Sample access to f (can draw (x, f(x)) for random x)
Goal: Output hypothesis h such that Pr[h(x) != f(x)] <= epsilon
With probability: at least 1 - delta

Learning Low-Degree Functions

Functions with spectral concentration at low degrees:

from boofun.analysis.pac_learning import pac_learn_low_degree

f = bf.majority(5)

# Learn degree-2 approximation
hypothesis, error = pac_learn_low_degree(
    f,
    degree=2,
    epsilon=0.1,
    delta=0.05,
    num_samples=10000
)

print(f"Approximation error: {error:.4f}")

Learning Juntas

A k-junta depends on at most k variables:

from boofun.analysis.pac_learning import pac_learn_junta

# Create a 3-junta (depends on variables 0, 2, 4 only)
def junta_func(x):
    return x[0] ^ x[2] ^ x[4]

f = bf.create(junta_func, n=7)

# Learn the junta
hypothesis, relevant_vars = pac_learn_junta(
    f,
    k=3,
    epsilon=0.1,
    delta=0.05
)

print(f"Relevant variables: {relevant_vars}")

LMN Algorithm

Learn decision trees from uniform samples (Linial-Mansour-Nisan):

from boofun.analysis.pac_learning import lmn_algorithm

f = bf.tribes(2, 3)  # A read-once DNF

# Learn using LMN
hypothesis = lmn_algorithm(
    f,
    epsilon=0.1,
    delta=0.05
)

# Test accuracy
from boofun.analysis.pac_learning import sample_function
test_samples = sample_function(f, 1000)
correct = sum(1 for x, y in test_samples if hypothesis(x) == y)
print(f"Accuracy: {correct/1000:.1%}")

Learning Sparse Fourier Functions

Functions with few non-zero Fourier coefficients:

from boofun.analysis.pac_learning import pac_learn_sparse_fourier

f = bf.parity(4)  # Has only one non-zero coefficient

hypothesis, support = pac_learn_sparse_fourier(
    f,
    sparsity=5,
    epsilon=0.1,
    delta=0.05
)

print(f"Learned Fourier support: {support}")

Learning Monotone Functions

from boofun.analysis.pac_learning import pac_learn_monotone

f = bf.AND(4)  # Monotone function

hypothesis = pac_learn_monotone(
    f,
    epsilon=0.1,
    delta=0.05
)

PACLearner Class

General-purpose PAC learner:

from boofun.analysis.pac_learning import PACLearner

learner = PACLearner(epsilon=0.1, delta=0.05)

# Learn from samples
hypothesis = learner.learn(f, num_samples=10000)

# Evaluate
error = learner.evaluate(hypothesis, f, num_test=1000)
print(f"Test error: {error:.4f}")

# Get statistics
print(f"Samples used: {learner.sample_count}")

Sample Complexity

Function Class	Sample Complexity	Algorithm
k-juntas	O(2^k log n / ε)	Influence-based
Degree-d	O(n^d / ε²)	Low-degree learning
s-sparse Fourier	O(s log n / ε²)	Sparse recovery
Decision trees (size s)	O(s log n / ε²)	LMN
Monotone	O(n / ε²)	Monotone learning

Connections to Other Topics

Influences and Learning

High-influence variables are likely relevant:

# Find influential variables for junta learning
infs = f.influences()
top_vars = sorted(range(len(infs)), key=lambda i: infs[i], reverse=True)[:k]

Noise Stability and Learning

Noise-stable functions are easier to learn:

stability = f.noise_stability(0.99)
# High stability → few high-degree coefficients → easier to learn

Hypercontractivity and Learning

KKL theorem bounds help with junta identification:

from boofun import max_influence_bound
max_inf, kkl_bound, total = max_influence_bound(f)
# If max_inf >> kkl_bound, function might be close to a junta

Mathematical Background

PAC Model

A concept class C is PAC-learnable if there exists an algorithm A such that:

For any f ∈ C and distribution D
Given m = poly(n, 1/ε, 1/δ) samples from D
A outputs h with Pr_D[h(x) ≠ f(x)] ≤ ε
With probability at least 1 - δ

Goldreich-Levin Theorem

Theorem: Given query access to f, we can find all S with |f̂(S)| ≥ τ using O(n/τ²) queries.

LMN Theorem

Theorem (Linial-Mansour-Nisan): Decision trees of size s can be ε-approximated by degree O(log(s/ε)) polynomials.