# Learning Theory Guide

Algorithms for learning Boolean functions from queries or samples.

## Overview

BooFun provides implementations of key learning algorithms from computational learning theory:

- **Goldreich-Levin**: Find heavy Fourier coefficients with query access
- **PAC Learning**: Probably Approximately Correct learning framework
- **Junta Learning**: Learn functions that depend on few variables
- **LMN Algorithm**: Learn decision trees from uniform samples
- **Sparse Fourier Learning**: Learn functions with few Fourier coefficients

## Goldreich-Levin Algorithm

The Goldreich-Levin algorithm finds all "heavy" Fourier coefficients (those with |f̂(S)| ≥ τ) using only poly(n, 1/τ) queries.

```python
import boofun as bf
from boofun.analysis.learning import goldreich_levin, find_heavy_coefficients

f = bf.majority(7)

# Find all coefficients with |f̂(S)| ≥ 0.1
heavy = goldreich_levin(f, threshold=0.1, num_samples=5000)

print(f"Found {len(heavy)} heavy coefficients")
for S, coeff in heavy:
    print(f"  f̂({S}) ≈ {coeff:.4f}")
```

### Estimating Single Coefficients

```python
import boofun as bf
from boofun.analysis.learning import estimate_fourier_coefficient

f = bf.parity(5)

# Estimate f̂({0,1,2,3,4}) - should be ±1 for parity
S = 0b11111  # all variables
estimate, std_err = estimate_fourier_coefficient(f, S, num_samples=10000)

print(f"Estimated f̂(S) = {estimate:.4f} ± {std_err:.4f}")
```

### GoldreichLevinLearner Class

For more control over the learning process:

```python
from boofun.analysis.learning import GoldreichLevinLearner

learner = GoldreichLevinLearner(threshold=0.1, confidence=0.95)

# Learn from query access
heavy_coeffs = learner.learn(f, num_samples=10000)

# Get learning statistics
print(f"Queries used: {learner.query_count}")
print(f"Coefficients found: {len(heavy_coeffs)}")
```

## PAC Learning

The PAC (Probably Approximately Correct) framework:

- **Given:** sample access to f (can draw (x, f(x)) for random x)
- **Goal:** output a hypothesis h such that Pr[h(x) ≠ f(x)] ≤ ε
- **With probability:** at least 1 − δ

### Learning Low-Degree Functions

For functions whose Fourier spectrum is concentrated on low degrees:

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_low_degree

f = bf.majority(5)

# Learn a degree-2 approximation
hypothesis, error = pac_learn_low_degree(
    f, degree=2, epsilon=0.1, delta=0.05, num_samples=10000
)

print(f"Approximation error: {error:.4f}")
```

### Learning Juntas

A k-junta depends on at most k variables:

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_junta

# Create a 3-junta (depends on variables 0, 2, 4 only)
def junta_func(x):
    return x[0] ^ x[2] ^ x[4]

f = bf.create(junta_func, n=7)

# Learn the junta
hypothesis, relevant_vars = pac_learn_junta(
    f, k=3, epsilon=0.1, delta=0.05
)

print(f"Relevant variables: {relevant_vars}")
```

### LMN Algorithm

Learn decision trees from uniform samples (Linial-Mansour-Nisan):

```python
import boofun as bf
from boofun.analysis.pac_learning import lmn_algorithm, sample_function

f = bf.tribes(2, 3)  # a read-once DNF

# Learn using LMN
hypothesis = lmn_algorithm(f, epsilon=0.1, delta=0.05)

# Test accuracy
test_samples = sample_function(f, 1000)
correct = sum(1 for x, y in test_samples if hypothesis(x) == y)
print(f"Accuracy: {correct / 1000:.1%}")
```
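For intuition, here is a from-scratch sketch of the estimate-and-round idea behind LMN-style low-degree learning: estimate every Fourier coefficient of degree ≤ d from uniform samples, then predict with the sign of the truncated expansion. This is a minimal illustration in plain NumPy; the names `lmn_sketch` and `chi` are ours, not part of BooFun's API:

```python
# Illustrative sketch of LMN-style low-degree learning (not BooFun's code).
import itertools
import numpy as np

def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i for i in S), x in {0,1}^n."""
    return (-1) ** sum(x[i] for i in S)

def lmn_sketch(f, n, d, num_samples=10000, rng=None):
    rng = rng or np.random.default_rng(0)
    xs = rng.integers(0, 2, size=(num_samples, n))
    ys = np.array([1 - 2 * f(x) for x in xs])  # map {0,1} output to {+1,-1}
    subsets = [S for k in range(d + 1)
               for S in itertools.combinations(range(n), k)]
    # Empirical estimate of each coefficient f_hat(S) = E[f(x) * chi_S(x)]
    coeffs = {S: np.mean(ys * np.array([chi(S, x) for x in xs]))
              for S in subsets}

    def hypothesis(x):
        # Sign of the degree-d truncation of the Fourier expansion
        val = sum(c * chi(S, x) for S, c in coeffs.items())
        return 0 if val >= 0 else 1  # back to {0,1} output
    return hypothesis

# Example: a degree-3 approximation of majority on 5 bits
maj5 = lambda x: int(sum(x) > 2)
h = lmn_sketch(maj5, n=5, d=3)
```

By Parseval, if f's Fourier weight above degree d is at most ε and each of the M = Σ_{k≤d} C(n, k) estimated coefficients is accurate to roughly √(ε/M), the rounded hypothesis disagrees with f on only an O(ε) fraction of inputs; this is where the n^d factor in the sample complexity comes from.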
### Learning Sparse Fourier Functions

For functions with few non-zero Fourier coefficients:

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_sparse_fourier

f = bf.parity(4)  # has only one non-zero coefficient

hypothesis, support = pac_learn_sparse_fourier(
    f, sparsity=5, epsilon=0.1, delta=0.05
)

print(f"Learned Fourier support: {support}")
```

### Learning Monotone Functions

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_monotone

f = bf.AND(4)  # monotone function

hypothesis = pac_learn_monotone(f, epsilon=0.1, delta=0.05)
```

## PACLearner Class

A general-purpose PAC learner:

```python
import boofun as bf
from boofun.analysis.pac_learning import PACLearner

f = bf.majority(5)  # function to learn

learner = PACLearner(epsilon=0.1, delta=0.05)

# Learn from samples
hypothesis = learner.learn(f, num_samples=10000)

# Evaluate
error = learner.evaluate(hypothesis, f, num_test=1000)
print(f"Test error: {error:.4f}")

# Get statistics
print(f"Samples used: {learner.sample_count}")
```

## Sample Complexity

| Function Class | Sample Complexity | Algorithm |
|--------------------------|-------------------|----------------------|
| k-juntas | O(2^k log n / ε) | Influence-based |
| Degree-d | O(n^d / ε²) | Low-degree learning |
| s-sparse Fourier | O(s log n / ε²) | Sparse recovery |
| Decision trees (size s) | O(s log n / ε²) | LMN |
| Monotone | O(n / ε²) | Monotone learning |

## Connections to Other Topics

### Influences and Learning

High-influence variables are likely to be relevant:

```python
# Find influential variables for junta learning (f as in earlier examples)
k = 3  # number of candidate variables to keep
infs = f.influences()
top_vars = sorted(range(len(infs)), key=lambda i: infs[i], reverse=True)[:k]
```

### Noise Stability and Learning

Noise-stable functions are easier to learn:

```python
stability = f.noise_stability(0.99)
# High stability → few high-degree coefficients → easier to learn
```

### Hypercontractivity and Learning

KKL theorem bounds help with junta identification:

```python
from boofun import max_influence_bound

max_inf, kkl_bound, total = max_influence_bound(f)
# If max_inf >> kkl_bound, the function might be close to a junta
```

## Mathematical Background

### PAC Model

A concept class C is PAC-learnable if there exists an algorithm A such that:

- for any f ∈ C and any distribution D,
- given m = poly(n, 1/ε, 1/δ) samples from D,
- A outputs h with Pr_D[h(x) ≠ f(x)] ≤ ε,
- with probability at least 1 − δ.

### Goldreich-Levin Theorem

**Theorem**: Given query access to f, we can find all S with |f̂(S)| ≥ τ using poly(n, 1/τ) queries.

### LMN Theorem

**Theorem** (Linial-Mansour-Nisan): Decision trees of size s can be ε-approximated by polynomials of degree O(log(s/ε)).

## See Also

- [Spectral Analysis Guide](spectral_analysis.md): Fourier coefficients
- [Query Complexity Guide](query_complexity.md): Query models
- `notebooks/lecture7_goldreich_levin.ipynb`: Goldreich-Levin tutorial
- `notebooks/lecture8_learning_juntas.ipynb`: Junta learning tutorial
- O'Donnell, *Analysis of Boolean Functions*, Chapter 3
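## Appendix: A Toy Goldreich-Levin

To complement the theorem statement above, here is a minimal, self-contained sketch of the bucketing recursion behind Goldreich-Levin: fix the membership of variables 0, …, j−1 one at a time, and recurse into a bucket only if its total Fourier weight could still contain a coefficient of magnitude ≥ τ. For clarity this toy version computes bucket weights exactly from a truth table rather than estimating them from queries, so it shows the recursion but not the query efficiency; none of these names are BooFun API:

```python
# Toy Goldreich-Levin bucket recursion (illustrative, not BooFun's code).
import itertools

def fourier_coeffs(truth_table, n):
    """All coefficients f_hat(S) of f: {0,1}^n -> {+1,-1} (exhaustive)."""
    coeffs = {}
    all_sets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    for S in all_sets:
        total = 0.0
        for idx, fx in enumerate(truth_table):
            x = [(idx >> i) & 1 for i in range(n)]
            total += fx * (-1) ** sum(x[i] for i in S)
        coeffs[S] = total / len(truth_table)
    return coeffs

def heavy_sets(truth_table, n, tau):
    coeffs = fourier_coeffs(truth_table, n)

    def recurse(j, prefix):
        # Bucket = all S whose intersection with {0, ..., j-1} equals prefix.
        weight = sum(c * c for S, c in coeffs.items()
                     if tuple(i for i in S if i < j) == prefix)
        if weight < tau * tau:
            return []  # no coefficient in this bucket can reach magnitude tau
        if j == n:
            return [prefix]  # singleton bucket with |f_hat(prefix)| >= tau
        return recurse(j + 1, prefix) + recurse(j + 1, prefix + (j,))

    return recurse(0, ())

# Example: parity on 3 bits has all Fourier weight on S = (0, 1, 2).
tt = [(-1) ** bin(i).count("1") for i in range(8)]
print(heavy_sets(tt, 3, tau=0.5))  # -> [(0, 1, 2)]
```

The real algorithm replaces the exact bucket weights with query-based estimates; since the total Fourier weight is 1, at most 1/τ² buckets survive at each of the n levels, which is what keeps the query count polynomial in n and 1/τ.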