boofun.analysis.sampling

Boolean functions as random variables: sampling and estimation.

This module provides a probabilistic view of Boolean functions, enabling: - Sampling from the hypercube under uniform and p-biased distributions - Spectral sampling (sampling subsets proportional to Fourier weight) - Monte Carlo estimation of Fourier coefficients and influences - Statistical analysis (expectation, variance, covariance)

This aligns with O’Donnell’s “Analysis of Boolean Functions” Chapters 1-3, which treat Boolean functions as random variables on the discrete cube.

Key concepts: - f: {-1,+1}^n → R as a random variable with E[f] = f̂(∅) - Fourier weight W^k[f] = Σ_{|S|=k} f̂(S)² forms a probability distribution - Spectral sampling: draw S with Pr[S] ∝ f̂(S)² - Monte Carlo: estimate f̂(S) = E[f(x)χ_S(x)] via sampling

References: - O’Donnell Chapter 1: Fourier expansion and basic identities - O’Donnell Chapter 2: Influences and their estimation - O’Donnell Chapter 3: Learning and sampling algorithms - Goldreich-Levin algorithm for finding heavy Fourier coefficients

Functions

estimate_expectation(f, n_samples[, p, rng])

Estimate E[f] under the (p-biased) distribution.

estimate_fourier_adaptive(f, S[, ...])

Adaptively estimate f_hat(S) until standard error is below target.

estimate_fourier_coefficient(f, S, n_samples)

Estimate Fourier coefficient f̂(S) via Monte Carlo sampling.

estimate_influence(f, i, n_samples[, rng, ...])

Estimate influence of variable i via Monte Carlo sampling.

estimate_total_influence(f, n_samples[, rng])

Estimate total influence I[f] = Σ_i Inf_i[f] via sampling.

estimate_variance(f, n_samples[, p, rng])

Estimate Var[f] under the (p-biased) distribution.

sample_biased(n, p, n_samples[, rng])

Sample from the p-biased distribution μ_p on {0,1}^n.

sample_input_output_pairs(f, n_samples[, p, rng])

Sample (input, output) pairs from a Boolean function.

sample_spectral(f, n_samples[, rng])

Sample subsets S with probability proportional to f̂(S)².

sample_uniform(n, n_samples[, rng])

Sample uniformly from {0,1}^n.

sample_uniform_bits(n, n_samples[, rng])

Sample uniformly from {0,1}^n, returning bit arrays.

Classes

RandomVariableView(f[, p])

View a Boolean function as a random variable on the hypercube.

SpectralDistribution(weights, probabilities, ...)

Represents the spectral (Fourier weight) distribution of a Boolean function.

boofun.analysis.sampling.sample_uniform(n: int, n_samples: int, rng: Generator | None = None) ndarray[source]

Sample uniformly from {0,1}^n.

Parameters:
  • n – Number of variables

  • n_samples – Number of samples to draw

  • rng – Random number generator (default: numpy default)

Returns:

Array of shape (n_samples,) with integer inputs in [0, 2^n)

Raises:

ValueError – If n >= 64 (integer overflow limitation)

Note

For n >= 64, the range [0, 2^n) exceeds int64 capacity. For such large functions, consider: - Using sample_uniform_bits() which returns bit arrays - Oracle-based evaluation patterns - Streaming/lazy evaluation approaches

boofun.analysis.sampling.sample_biased(n: int, p: float, n_samples: int, rng: Generator | None = None) ndarray[source]

Sample from the p-biased distribution μ_p on {0,1}^n.

Each coordinate is independently 1 with probability p, 0 with probability 1-p.

Parameters:
  • n – Number of variables

  • p – Bias parameter (Pr[bit = 1])

  • n_samples – Number of samples

  • rng – Random number generator

Returns:

Array of shape (n_samples,) with integer inputs

boofun.analysis.sampling.sample_spectral(f: BooleanFunction, n_samples: int, rng: np.random.Generator | None = None) np.ndarray[source]

Sample subsets S with probability proportional to f̂(S)².

This is “spectral sampling” - drawing from the Fourier weight distribution. By Parseval’s identity, this is a valid probability distribution for non-constant functions (sum of weights = 1 for ±1-valued functions).

Useful for: - Finding heavy Fourier coefficients - Learning algorithms (Goldreich-Levin) - Analyzing spectral structure

Parameters:
  • f – BooleanFunction to sample from

  • n_samples – Number of samples

  • rng – Random number generator

Returns:

Array of shape (n_samples,) with subset masks (integers)

boofun.analysis.sampling.sample_input_output_pairs(f: BooleanFunction, n_samples: int, p: float = 0.5, rng: np.random.Generator | None = None) Tuple[np.ndarray, np.ndarray][source]

Sample (input, output) pairs from a Boolean function.

Draws inputs from the p-biased distribution and evaluates f.

Parameters:
  • f – BooleanFunction to sample from

  • n_samples – Number of samples

  • p – Bias parameter (0.5 = uniform)

  • rng – Random number generator

Returns:

  • inputs: array of shape (n_samples,) with integer inputs

  • outputs: array of shape (n_samples,) with function values in {0,1}

Return type:

Tuple of (inputs, outputs) where

boofun.analysis.sampling.estimate_fourier_coefficient(f: BooleanFunction, S: int, n_samples: int, p: float = 0.5, rng: np.random.Generator | None = None, return_confidence: bool = False) float | Tuple[float, float][source]

Estimate Fourier coefficient f̂(S) via Monte Carlo sampling.

Uses the identity f̂(S) = E_x[f(x) χ_S(x)] where χ_S(x) = (-1)^{⟨x,S⟩}.

For p-biased sampling, uses the adjusted formula from O’Donnell Chapter 8.

Parameters:
  • f – BooleanFunction to analyze

  • S – Subset mask (integer)

  • n_samples – Number of samples

  • p – Bias parameter (0.5 = uniform)

  • rng – Random number generator

  • return_confidence – If True, return (estimate, std_error)

Returns:

Estimated f̂(S), or (estimate, std_error) if return_confidence=True

Note

Error scales as O(1/√n_samples). For 1% relative error, need ~10000 samples.

boofun.analysis.sampling.estimate_influence(f: BooleanFunction, i: int, n_samples: int, rng: np.random.Generator | None = None, return_confidence: bool = False) float | Tuple[float, float][source]

Estimate influence of variable i via Monte Carlo sampling.

Uses the identity Inf_i[f] = Pr_x[f(x) != f(x^{(i)})].

Parameters:
  • f – BooleanFunction to analyze

  • i – Variable index (0-indexed)

  • n_samples – Number of samples

  • rng – Random number generator

  • return_confidence – If True, return (estimate, std_error)

Returns:

Estimated Inf_i[f], or (estimate, std_error) if return_confidence=True

boofun.analysis.sampling.estimate_expectation(f: BooleanFunction, n_samples: int, p: float = 0.5, rng: np.random.Generator | None = None) float[source]

Estimate E[f] under the (p-biased) distribution.

For uniform distribution, E[f] = f̂(∅) = bias of f in ±1 representation.

Parameters:
  • f – BooleanFunction (treated as ±1-valued)

  • n_samples – Number of samples

  • p – Bias parameter

  • rng – Random number generator

Returns:

Estimated E[f]

boofun.analysis.sampling.estimate_variance(f: BooleanFunction, n_samples: int, p: float = 0.5, rng: np.random.Generator | None = None) float[source]

Estimate Var[f] under the (p-biased) distribution.

For ±1-valued functions, Var[f] = 1 - E[f]² = Σ_{S≠∅} f̂(S)².

Parameters:
  • f – BooleanFunction

  • n_samples – Number of samples

  • p – Bias parameter

  • rng – Random number generator

Returns:

Estimated Var[f]

boofun.analysis.sampling.estimate_total_influence(f: BooleanFunction, n_samples: int, rng: np.random.Generator | None = None) float[source]

Estimate total influence I[f] = Σ_i Inf_i[f] via sampling.

Uses the identity I[f] = E[s(f,x)] where s(f,x) is the sensitivity at x.

Parameters:
  • f – BooleanFunction

  • n_samples – Number of samples

  • rng – Random number generator

Returns:

Estimated total influence

class boofun.analysis.sampling.RandomVariableView(f: BooleanFunction, p: float = 0.5)[source]

View a Boolean function as a random variable on the hypercube.

This class provides a probabilistic interface to Boolean functions, supporting operations like expectation, variance, and sampling.

In O’Donnell’s framework: - f: {-1,+1}^n → R is a random variable - E[f] = f̂(∅) - Var[f] = Σ_{S≠∅} f̂(S)² - f = Σ_S f̂(S) χ_S is an orthonormal decomposition

function

The underlying BooleanFunction

p

Bias parameter for sampling (default 0.5 = uniform)

__init__(f: BooleanFunction, p: float = 0.5)[source]

Initialize random variable view.

Parameters:
  • f – BooleanFunction to view as random variable

  • p – Bias parameter (default 0.5 = uniform distribution)

seed(seed: int) RandomVariableView[source]

Set random seed for reproducibility.

property n_vars: int

Number of variables.

property spectral_distribution: SpectralDistribution

Get the spectral distribution (cached).

expectation() float[source]

Compute exact E[f] = f̂(∅).

variance() float[source]

Compute exact Var[f].

fourier_coefficient(S: int) float[source]

Get exact Fourier coefficient f̂(S).

influence(i: int) float[source]

Get exact influence of variable i.

total_influence() float[source]

Get exact total influence.

sample(n_samples: int) Tuple[ndarray, ndarray][source]

Sample (input, output) pairs.

Parameters:

n_samples – Number of samples

Returns:

Tuple of (inputs, outputs) arrays

sample_inputs(n_samples: int) ndarray[source]

Sample inputs from the (p-biased) distribution.

sample_spectral(n_samples: int) ndarray[source]

Sample subsets from the Fourier weight distribution.

estimate_expectation(n_samples: int) float[source]

Estimate E[f] via sampling.

estimate_variance(n_samples: int) float[source]

Estimate Var[f] via sampling.

estimate_fourier_coefficient(S: int, n_samples: int, return_confidence: bool = False) float | Tuple[float, float][source]

Estimate f̂(S) via sampling.

estimate_influence(i: int, n_samples: int, return_confidence: bool = False) float | Tuple[float, float][source]

Estimate Inf_i[f] via sampling.

estimate_total_influence(n_samples: int) float[source]

Estimate I[f] via sampling.

validate_estimates(n_samples: int = 10000, tolerance: float = 0.1) Dict[str, bool][source]

Validate that estimates are close to exact values.

Parameters:
  • n_samples – Number of samples for estimation

  • tolerance – Maximum allowed relative error

Returns:

Dictionary of validation results

summary() str[source]

Return human-readable summary.

class boofun.analysis.sampling.SpectralDistribution(weights: ndarray, probabilities: ndarray, n_vars: int)[source]

Represents the spectral (Fourier weight) distribution of a Boolean function.

The spectral distribution has: - Support: all subsets S ⊆ [n] - Probabilities: Pr[S] = f̂(S)² / Σ_T f̂(T)²

For ±1-valued functions, Σ_S f̂(S)² = 1 by Parseval.

weights

Array of f̂(S)² values indexed by subset

Type:

numpy.ndarray

probabilities

Normalized probabilities

Type:

numpy.ndarray

n_vars

Number of variables

Type:

int

weights: ndarray
probabilities: ndarray
n_vars: int
classmethod from_function(f: BooleanFunction) SpectralDistribution[source]

Create spectral distribution from a Boolean function.

sample(n_samples: int, rng: Generator | None = None) ndarray[source]

Sample subsets from the spectral distribution.

weight_at_degree(k: int) float[source]

Total weight at degree k: W^k[f] = Σ_{|S|=k} f̂(S)².

entropy() float[source]

Shannon entropy of the spectral distribution.

effective_support_size(threshold: float = 0.01) int[source]

Count subsets with probability > threshold.

__init__(weights: ndarray, probabilities: ndarray, n_vars: int) None