Open In Colab

Problem Set 2: LTFs, Decision Trees, and Influences¶

CS 294-92: Analysis of Boolean Functions - Spring 2025
Due: Monday, February 24, 11:59PM Notebook by: Gabriel Taboada

This notebook provides computational exploration of the concepts in HW2:

  1. Majority as the unique symmetric, monotone, odd voting rule
  2. Linear Threshold Functions (LTFs) and influence ordering
  3. Decision tree Fourier concentration
  4. Influence characterization via resampling
  5. Goldreich-Levin and pseudorandom generators

In [1]:
# Install/upgrade boofun (required for Colab)
# This ensures you have the latest version with all features
!pip install --upgrade boofun -q

import boofun as bf
print(f"BooFun version: {bf.__version__}")
[notice] A new release of pip is available: 25.2 -> 26.0
[notice] To update, run: /Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip
/Users/gabrieltaboada/dev/Boofun/boofun/src/boofun/core/errormodels.py:21: UserWarning: uncertainties library not available - some error models disabled
  warnings.warn("uncertainties library not available - some error models disabled")
/Users/gabrieltaboada/dev/Boofun/boofun/src/boofun/quantum/__init__.py:22: UserWarning: Qiskit not available - quantum features limited
  warnings.warn("Qiskit not available - quantum features limited")
BooFun version: 1.1.1
In [2]:
import numpy as np
import boofun as bf
from boofun.analysis import PropertyTester, ltf_analysis
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)

Problem 1: Majority is Unique¶

Statement: For odd $n$, the only voting rule that is symmetric, monotone, and odd is Majority.

  • Symmetric: $f(x_{\pi(1)}, \ldots, x_{\pi(n)}) = f(x_1, \ldots, x_n)$ for all permutations $\pi$
  • Monotone: $x \leq y \Rightarrow f(x) \leq f(y)$
  • Odd: $f(-x) = -f(x)$

Let's verify computationally that Majority satisfies all three, and that other candidates don't:

In [ ]:
def is_symmetric(f):
    """Check if f is symmetric (invariant under permutations)."""
    n = f.n_vars
    tt = np.asarray(f.get_representation("truth_table"))
    
    # For symmetric functions, output depends only on Hamming weight
    for i in range(2**n):
        bits = [int(b) for b in format(i, f'0{n}b')]
        weight = sum(bits)
        # Find another input with same weight
        for j in range(i+1, 2**n):
            bits_j = [int(b) for b in format(j, f'0{n}b')]
            if sum(bits_j) == weight and tt[i] != tt[j]:
                return False
    return True

def is_odd(f):
    """Check if f(-x) = -f(x), i.e., f is balanced and antiself-dual."""
    n = f.n_vars
    tt = np.asarray(f.get_representation("truth_table"))
    
    for i in range(2**n):
        neg_i = 2**n - 1 - i  # Flip all bits
        # In {0,1}: f(x) ≠ f(~x) means odd in ±1
        if tt[i] == tt[neg_i]:
            return False
    return True

# Test various n=5 functions
n = 5
functions = {
    "Majority₅": bf.majority(n),
    "Parity₅": bf.parity(n),
    "AND₅": bf.AND(n),
    "OR₅": bf.OR(n),
    "Dictator₅": bf.dictator(n, 0),
    "Threshold(3)": bf.threshold(n, 3),  # At least 3 of 5 true
}

print("Testing symmetric, monotone, odd properties for n=5:")
print(f"{'Function':<15} | {'Symmetric':<10} | {'Monotone':<10} | {'Odd':<10}")
print("-" * 55)

for name, f in functions.items():
    sym = is_symmetric(f)
    mono = f.is_monotone()
    odd = is_odd(f)
    
    all_three = "[ALL THREE]" if (sym and mono and odd) else ""
    print(f"{name:<15} | {str(sym):<10} | {str(mono):<10} | {str(odd):<10} {all_three}")
Testing symmetric, monotone, odd properties for n=5:
Function        | Symmetric  | Monotone   | Odd       
-------------------------------------------------------
Majority₅       | True       | True       | True       [ALL THREE]
Parity₅         | True       | False      | True       
AND₅            | True       | True       | False      
OR₅             | True       | True       | False      
Dictator₅       | False      | True       | True       
Threshold(3)    | True       | True       | True       [ALL THREE]

Problem 2: Linear Threshold Functions and Influences¶

Part (a): Larger weight → Larger influence¶

Statement: For a weighted majority $f(x) = \text{sign}(a_0 + a_1 x_1 + \cdots + a_n x_n)$: $$|a_i| \geq |a_j| \Rightarrow \mathbf{Inf}_i[f] \geq \mathbf{Inf}_j[f]$$

Let's verify this empirically:

In [4]:
# Now we can use bf.weighted_majority() directly!
# Example: weights [5, 3, 1]
weights = [5, 3, 1]
f = bf.weighted_majority(weights)

# Direct API: f.influences()
influences = f.influences()

print("LTF with weights [5, 3, 1]:")
print(f"{'Variable':<10} | {'|Weight|':<10} | {'Influence':<10}")
print("-" * 35)

for i, (w, inf) in enumerate(zip(weights, influences)):
    print(f"x_{i+1:<8} | {abs(w):<10} | {inf:<10.4f}")

# Verify ordering
sorted_by_weight = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
sorted_by_inf = sorted(range(len(influences)), key=lambda i: -influences[i])
print(f"\nOrdering by |weight|: {sorted_by_weight}")
print(f"Ordering by influence: {sorted_by_inf}")
print(f"Same ordering: {'yes' if sorted_by_weight == sorted_by_inf else 'no'}")
LTF with weights [5, 3, 1]:
Variable   | |Weight|   | Influence 
-----------------------------------
x_1        | 5          | 1.0000    
x_2        | 3          | 0.0000    
x_3        | 1          | 0.0000    

Ordering by |weight|: [0, 1, 2]
Ordering by influence: [0, 1, 2]
Same ordering: yes

Part (b): Nassau County Voting System¶

The Nassau County Board has 6 districts with voting weights:

District Votes
Hempstead #1 31
Hempstead #2 31
Oyster Bay 28
North Hempstead 21
Long Beach 2
Glen Cove 2

The function is $f(x) = \text{sign}(31x_1 + 31x_2 + 28x_3 + 21x_4 + 2x_5 + 2x_6)$

Let's compute the influences!

In [5]:
# Nassau County voting system - now using bf.weighted_majority()
weights = [31, 31, 28, 21, 2, 2]
districts = ["Hempstead #1", "Hempstead #2", "Oyster Bay", 
             "North Hempstead", "Long Beach", "Glen Cove"]

nassau = bf.weighted_majority(weights)

# Full LTF analysis using the new module
analysis = ltf_analysis.analyze_ltf(nassau)
print(analysis.summary())

print("\n" + "-" * 40)
print("\nInfluences by District:")
# Direct API: f.influences()
influences = nassau.influences()

for district, votes, inf in zip(districts, weights, influences):
    status = "[ok]" if inf > 0.001 else "[DUMMY]"
    print(f"  {district:<16} | {votes:>3} votes | Inf={inf:.3f} {status}")

# Use library function to find dummy voters
print("\n⚠️ DUMMY VOTERS (zero influence):")
dummies = ltf_analysis.dummy_voters(nassau)
print(f"   {[districts[i] for i in dummies]}")
print("   These voters can NEVER change the outcome!")
LTF Analysis
========================================
Is LTF: True
Weights: [-0. -0. -0. -2. -2. -2.]
Threshold: -3.0000
Chow params: [ 0.   0.   0.   0.  -0.5 -0.5 -0.5]
Critical index: 2
Regularity τ: 0.5774

----------------------------------------

Influences by District:
  Hempstead #1     |  31 votes | Inf=0.500 [ok]
  Hempstead #2     |  31 votes | Inf=0.500 [ok]
  Oyster Bay       |  28 votes | Inf=0.500 [ok]
  North Hempstead  |  21 votes | Inf=0.000 [DUMMY]
  Long Beach       |   2 votes | Inf=0.000 [DUMMY]
  Glen Cove        |   2 votes | Inf=0.000 [DUMMY]

⚠️ DUMMY VOTERS (zero influence):
   ['North Hempstead', 'Long Beach', 'Glen Cove']
   These voters can NEVER change the outcome!

LTF Theory: Not All Functions are LTFs!¶

Key insight: LTFs are exactly the linearly separable functions.

XOR/Parity is famously NOT an LTF - there's no hyperplane that separates inputs with even parity from those with odd parity.

In [6]:
# Test which functions are LTFs
test_fns = {
    "AND₃": bf.AND(3),
    "OR₃": bf.OR(3),
    "Majority₃": bf.majority(3),
    "Parity₃ (XOR)": bf.parity(3),
    "Threshold(4,2)": bf.threshold(4, 2),
    "Dictator": bf.dictator(3, 0),
}

print("Which functions are LTFs?")
print("-" * 40)

for name, f in test_fns.items():
    is_ltf = ltf_analysis.is_ltf(f)
    status = "LTF" if is_ltf else "not LTF"
    print(f"  {name:<18}: {status}")
Which functions are LTFs?
----------------------------------------
  AND₃              : LTF
  OR₃               : LTF
  Majority₃         : LTF
  Parity₃ (XOR)     : not LTF
  Threshold(4,2)    : LTF
  Dictator          : LTF

Problem 3: Decision Tree Fourier Concentration¶

Key results about decision trees $T$ of depth $d$ and size $s$:

(a) At most $4^d$ non-zero Fourier coefficients

(b) $\sum_{S} |\hat{T}(S)| \leq s$ (spectral norm ≤ size)

(c) Fourier spectrum is $\varepsilon$-concentrated on at most $(s/\varepsilon)^2$ coefficients

(d) Fourier spectrum is $4\varepsilon$-concentrated up to degree $\log_2(s/\varepsilon)$

Let's verify these with examples:

In [7]:
# Functions that can be computed by small decision trees
# AND has depth n, size 2
# Majority has depth n, size ≈ 2^n for exact computation

# Create functions and analyze
test_fns = {
    "AND₃ (depth 3, size 2)": (bf.AND(3), 3, 2),
    "OR₃ (depth 3, size 2)": (bf.OR(3), 3, 2),
    "Parity₃ (depth 3, size 2³)": (bf.parity(3), 3, 8),
    "Majority₃ (depth 3, size ≤8)": (bf.majority(3), 3, 8),
}

print("Decision Tree Fourier Analysis:")
print(f"{'Function':<25} | {'Depth d':<8} | {'Size s':<8} | {'#Non-zero':<10} | {'≤4^d?':<7} | {'Σ|f̂|':<8} | {'≤s?'}")
print("-" * 90)

for name, (f, depth, size) in test_fns.items():
    # Direct API: f.fourier()
    fourier = f.fourier()
    
    # Count non-zero coefficients
    nonzero = sum(1 for c in fourier if abs(c) > 1e-10)
    bound_4d = 4**depth
    
    # Sum of absolute values
    l1_norm = sum(abs(c) for c in fourier)
    
    check_4d = "ok" if nonzero <= bound_4d else "fail"
    check_s = "ok" if l1_norm <= size + 0.01 else "fail"
    
    print(f"{name:<25} | {depth:<8} | {size:<8} | {nonzero:<10} | {check_4d:<7} | {l1_norm:<8.3f} | {check_s}")
Decision Tree Fourier Analysis:
Function                  | Depth d  | Size s   | #Non-zero  | ≤4^d?   | Σ|f̂|    | ≤s?
------------------------------------------------------------------------------------------
AND₃ (depth 3, size 2)    | 3        | 2        | 8          | ok      | 2.500    | fail
OR₃ (depth 3, size 2)     | 3        | 2        | 8          | ok      | 2.500    | fail
Parity₃ (depth 3, size 2³) | 3        | 8        | 1          | ok      | 1.000    | ok
Majority₃ (depth 3, size ≤8) | 3        | 8        | 4          | ok      | 2.000    | ok

Problem 4: Influence via Resampling¶

Alternative characterizations of influence and variance:

(a) $\mathbf{Inf}_i[f] = 2 \cdot \Pr_{X, X^i}[f(X) \neq f(X^i)]$
where $X^i$ is $X$ with the $i$-th coordinate resampled independently.

(b) $\mathbf{Var}[f] = 2 \cdot \Pr_{X,Y}[f(X) \neq f(Y)]$
where $X, Y$ are independent uniform.

(c) Poincaré inequality via hybrids: $\mathbf{Var}[f] \leq \mathbf{I}[f]$

In [8]:
def influence_via_resampling(f, i, trials=10000):
    """Compute Inf_i[f] = 2 * Pr[f(X) ≠ f(X^i)]."""
    n = f.n_vars
    disagree = 0
    
    for _ in range(trials):
        x = np.random.randint(0, 2, n)
        x_i = x.copy()
        x_i[i] = np.random.randint(0, 2)  # Resample i-th coordinate
        
        if f.evaluate(x) != f.evaluate(x_i):
            disagree += 1
    
    return 2 * disagree / trials

def variance_via_disagreement(f, trials=10000):
    """Compute Var[f] = 2 * Pr[f(X) ≠ f(Y)]."""
    n = f.n_vars
    disagree = 0
    
    for _ in range(trials):
        x = np.random.randint(0, 2, n)
        y = np.random.randint(0, 2, n)
        
        if f.evaluate(x) != f.evaluate(y):
            disagree += 1
    
    return 2 * disagree / trials

# Test on Majority
f = bf.majority(5)
# Direct API: f.fourier() and f.influences()
fourier = f.fourier()
true_influences = f.influences()
true_var = 1 - fourier[0]**2

print("Influence via Resampling (Majority₅):")
print(f"{'Variable':<10} | {'True Inf':<12} | {'Via Resample':<12}")
print("-" * 40)

for i in range(f.n_vars):
    resample_inf = influence_via_resampling(f, i)
    print(f"x_{i+1:<8} | {true_influences[i]:<12.4f} | {resample_inf:<12.4f}")

print(f"\nVariance:")
print(f"  True: {true_var:.4f}")
print(f"  Via disagreement: {variance_via_disagreement(f):.4f}")

# Poincaré inequality
total_inf = sum(true_influences)
print(f"\nPoincaré inequality: Var ≤ I[f]")
print(f"  {true_var:.4f} <= {total_inf:.4f} ? {'yes' if true_var <= total_inf + 0.01 else 'no'}")
Influence via Resampling (Majority₅):
Variable   | True Inf     | Via Resample
----------------------------------------
x_1        | 0.3750       | 0.3708      
x_2        | 0.3750       | 0.3522      
x_3        | 0.3750       | 0.3816      
x_4        | 0.3750       | 0.3690      
x_5        | 0.3750       | 0.3788      

Variance:
  True: 1.0000
  Via disagreement: 1.0026

Poincaré inequality: Var ≤ I[f]
  1.0000 <= 1.8750 ? yes

Problem 5: Goldreich-Levin and Pseudorandom Generators¶

The Goldreich-Levin Theorem shows how to construct pseudorandom generators from one-way permutations.

Given:

  • One-way permutation $f: \mathbb{F}_2^n \to \mathbb{F}_2^n$
  • PRG construction: $g(r, s) = (r, f(s), r \cdot s)$

Key insight: The inner product $r \cdot s = \sum_i r_i s_i \mod 2$ (a.k.a. parity) serves as a hardcore predicate.

If an adversary can predict $r \cdot s$ from $(r, f(s))$ with advantage $\gamma$, then Goldreich-Levin can invert $f$ with non-negligible probability!

In [9]:
# Demonstrate connection to Fourier analysis
# For fixed s, the function A_y(x) = A(x, y) where A predicts r·s
# The Goldreich-Levin algorithm finds s by looking for heavy Fourier coefficients

from boofun.analysis.learning import goldreich_levin

# Create a "secret" s
n = 6
s = np.array([1, 0, 1, 1, 0, 1])  # Secret vector

# The inner product function χ_s(r) = (-1)^{r·s}
def inner_product_with_s(r):
    return sum(r * s) % 2

chi_s = bf.create(inner_product_with_s, n=n)

# This is exactly χ_S in Fourier terms!
# The only heavy coefficient should be at S = {indices where s_i = 1}
# Direct API: f.fourier()
fourier = chi_s.fourier()

print(f"Secret s = {s}")
print(f"S = {{{', '.join(str(i) for i, v in enumerate(s) if v == 1)}}}")

# Find heavy Fourier coefficients
heavy = [(i, c) for i, c in enumerate(fourier) if abs(c) > 0.5]
print(f"\nHeavy Fourier coefficient at index: {heavy}")

# Use Goldreich-Levin to find heavy coefficients
print("\nUsing Goldreich-Levin algorithm:")
recovered = goldreich_levin(chi_s, threshold=0.1)
print(f"Recovered coefficient: {recovered}")
Secret s = [1 0 1 1 0 1]
S = {0, 2, 3, 5}

Heavy Fourier coefficient at index: [(45, 1.0)]

Using Goldreich-Levin algorithm:
Recovered coefficient: [(45, 1.0)]

Summary¶

Key Takeaways from HW2:¶

  1. Majority is Unique: The only symmetric, monotone, odd voting rule

  2. LTF Influences:

    • Larger weight → larger influence
    • Nassau County: Some voters have zero influence (dummy voters!)
    • Not all functions are LTFs (XOR is NOT an LTF!)
  3. Decision Tree Fourier:

    • Depth $d$ → at most $4^d$ non-zero coefficients
    • Size $s$ → $\sum_S |\hat{T}(S)| \leq s$
    • Strong spectral concentration
  4. Influence via Resampling:

    • $\mathbf{Inf}_i[f] = 2 \cdot \Pr[f(X) \neq f(X^i)]$
    • Leads to Poincaré inequality: $\mathbf{Var}[f] \leq \mathbf{I}[f]$
  5. Goldreich-Levin:

    • Heavy Fourier coefficients reveal secrets
    • Connection to PRGs and cryptographic hardness

Using boofun:¶

import boofun as bf
from boofun.analysis import ltf_analysis

# Create LTF directly with library function
nassau = bf.weighted_majority([31, 31, 28, 21, 2, 2])
at_least_3 = bf.threshold(5, 3)

# Full LTF analysis
analysis = ltf_analysis.analyze_ltf(nassau)
print(analysis.summary())

# Find dummy voters
dummies = ltf_analysis.dummy_voters(nassau)

# Test if a function is an LTF
is_ltf = ltf_analysis.is_ltf(bf.parity(3))  # False!

# Chow parameters uniquely identify LTFs
chow = ltf_analysis.chow_parameters(nassau)

# Goldreich-Levin
from boofun.analysis.learning import goldreich_levin
heavy = goldreich_levin(f, threshold=0.1)