GMMParameterEstimation.jl Documentation

GMMParameterEstimation.jl is a package for estimating the parameters of Gaussian k-mixture models using the method of moments. It works for general k when the mixing coefficients are known, and for k = 2, 3, 4 when the mixing coefficients are unknown.

Example

The following code snippet generates a 3-dimensional Gaussian 2-mixture, takes a sample, computes the necessary moments, and then returns an estimate of the parameters using the method of moments.

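using GMMParameterEstimation
d = 3               # dimension
k = 2               # number of mixture components
diagonal = true     # use diagonal covariance matrices
num_samples = 10^4
# Generate random mixture parameters and draw a sample from the mixture
w, true_means, true_covariances = generateGaussians(d, k, diagonal)
sample = getSample(num_samples, w, true_means, true_covariances)
# Compute the sample moments needed by the estimator
first_moms, diagonal_moms, off_diagonals = sampleMoments(sample, k)
# Recover the parameters from the moments
pass, (mixing_coefficients, means, covariances) = estimate_parameters(d, k, first_moms, diagonal_moms, off_diagonals, diagonal)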


Parameter estimation

The main functionality of this package stems from the following function:

GMMParameterEstimation.estimate_parameters - Function
estimate_parameters(d::Integer, k::Integer, first::Vector{Float64}, second::Matrix{Float64}, last::Union{Dict{Vector{Int64}, Expression}, Nothing}, diagonal::Bool)
estimate_parameters(d::Integer, k::Integer, w::Array{Float64}, first::Vector{Float64}, second::Matrix{Float64}, last::Union{Dict{Vector{Int64}, Expression}, Nothing}, diagonal::Bool)

Compute an estimate for the parameters of a d-dimensional Gaussian k-mixture model from the moments.

If w is provided, it is taken as the mixing coefficients; otherwise the mixing coefficients are estimated as well. first should be a vector of moments 0 through 3k for the first dimension, second should be a matrix of moments 1 through 2k+1 for the remaining dimensions, and last should be a dictionary mapping index vectors (vectors of integers) to the corresponding moments, or nothing if the covariance matrices are diagonal.

estimate_parameters performs the parameter recovery using Algorithm 1 from Estimating Gaussian Mixtures Using Sparse Polynomial Moment Systems.

In one dimension, for a random variable $X$ with density $f$ we define the $i$th moment as $m_i=E[X^i]=\int x^i f(x)\,dx$. For a Gaussian mixture model, each moment is a polynomial in the parameters. For a sample $\{y_1,y_2,\dots,y_N\}$, we define the $i$th sample moment as $\overline{m}_i=\frac{1}{N}\sum_{j=1}^N y_j^i$. The sample moments approach the true moments as $N\rightarrow\infty$, so by setting the polynomials equal to the sample moments, we can solve the resulting polynomial system to recover the parameters.
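As an illustration of the polynomials involved, the first three moments of a univariate Gaussian k-mixture can be written out explicitly. The symbols $a_\ell$, $\mu_\ell$, and $\sigma_\ell^2$ (the mixing coefficient, mean, and variance of component $\ell$) are notation assumed here for the example and are not taken from the package.

$$
\begin{aligned}
m_1 &= \sum_{\ell=1}^{k} a_\ell\,\mu_\ell,\\
m_2 &= \sum_{\ell=1}^{k} a_\ell\,(\mu_\ell^2+\sigma_\ell^2),\\
m_3 &= \sum_{\ell=1}^{k} a_\ell\,(\mu_\ell^3+3\mu_\ell\sigma_\ell^2).
\end{aligned}
$$

Replacing each $m_i$ on the left by the sample moment $\overline{m}_i$ gives polynomial equations in the unknowns $a_\ell$, $\mu_\ell$, and $\sigma_\ell^2$.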

For a multivariate random variable $X$ with density $f_X$ we define the moments as $m_{i_1,\dots,i_n} = E[X_1^{i_1}\cdots X_n^{i_n}] = \int\cdots\int x_1^{i_1}\cdots x_n^{i_n}f_X(x_1,\dots,x_n)\,dx_1\cdots dx_n$ and the empirical moments as $\overline{m}_{i_1,\dots,i_n} = \frac{1}{N}\sum_{j=1}^N y_{j,1}^{i_1}\cdots y_{j,n}^{i_n}$, where $y_{j,c}$ is the $c$th coordinate of the $j$th sample point. Again, by setting the polynomials equal to the empirical moments, we can solve the system of polynomials to recover the parameters. However, choosing which moments to use becomes more complicated.
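The empirical mixed moment defined above can be computed directly from a sample. The following is a minimal sketch in plain Julia; the function name mixed_sample_moment is hypothetical, and storing one observation per row is an assumption made for this example (the package may lay out its samples differently).

```julia
# Empirical mixed moment of a sample stored as an N × n matrix
# (assumption: one observation per row). `exponents` holds (i_1, ..., i_n).
function mixed_sample_moment(sample::AbstractMatrix{<:Real},
                             exponents::AbstractVector{<:Integer})
    N, n = size(sample)
    length(exponents) == n || throw(ArgumentError("need one exponent per coordinate"))
    total = 0.0
    for j in 1:N
        term = 1.0
        for c in 1:n
            term *= sample[j, c]^exponents[c]   # y_{j,c}^{i_c}
        end
        total += term
    end
    return total / N
end

y = [1.0 2.0; 3.0 4.0]
m11 = mixed_sample_moment(y, [1, 1])   # (1*2 + 3*4) / 2
m20 = mixed_sample_moment(y, [2, 0])   # (1^2 + 3^2) / 2
```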


Generate and sample from Gaussian Mixture Models

Note that the entries of the resulting covariance matrices are generated from a normal distribution centered at 0 with variance 1.


GMMParameterEstimation.generateGaussians - Function
generateGaussians(d::Integer, k::Integer, diagonal::Bool)

Generate means and covariances for k Gaussians with dimension d.

diagonal should be true for diagonal covariance matrices, and false for dense covariance matrices.

The parameters are returned as a tuple, with weights in a 1D vector, means as a k x d array, and covariances as a k x d x d array. Note that each entry of each parameter is generated from a normal distribution centered at 0 with variance 1.


GMMParameterEstimation.getSample - Function
getSample(numb::Integer, w::Vector{Float64}, means::Matrix{Float64}, covariances::Array{Float64, 3})

Generate a sample of numb points from a Gaussian mixture model whose mixing coefficients are given by w, whose means are given by means, and whose covariance matrices are given by covariances.

This relies on the Distributions package.
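To make the sampling step concrete, here is a minimal sketch of drawing from a Gaussian mixture by hand. It does not use the Distributions package and is not the package's implementation: it picks a component according to the weights and then applies a Cholesky factor of that component's covariance to standard normal noise. The function name sample_mixture and the one-observation-per-row output layout are assumptions made for this example.

```julia
using LinearAlgebra, Random

# Draw `n` points from a Gaussian mixture.
# w: length-k weights, means: k × d, covariances: k × d × d.
# Assumption: the result has one observation per row.
function sample_mixture(rng::AbstractRNG, n::Integer, w::Vector{Float64},
                        means::Matrix{Float64}, covariances::Array{Float64,3})
    k, d = size(means)
    cumw = cumsum(w)
    # Lower Cholesky factor of each (positive definite) covariance matrix
    factors = [cholesky(Symmetric(covariances[i, :, :])).L for i in 1:k]
    out = Matrix{Float64}(undef, n, d)
    for j in 1:n
        u = rand(rng) * cumw[end]
        comp = min(searchsortedfirst(cumw, u), k)   # component chosen by weight
        out[j, :] = means[comp, :] + factors[comp] * randn(rng, d)
    end
    return out
end

rng = MersenneTwister(1)
w = [0.3, 0.7]
means = [0.0 0.0; 10.0 10.0]
covs = zeros(2, 2, 2)
covs[1, :, :] = [1e-6 0.0; 0.0 1e-6]
covs[2, :, :] = [1e-6 0.0; 0.0 1e-6]
x = sample_mixture(rng, 200, w, means, covs)
```

With such small covariances, every row of x lies very close to one of the two means, which makes the component selection easy to see.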


GMMParameterEstimation.sampleMoments - Function
sampleMoments(sample::Matrix{Float64}, k; diagonal = false)

Use the sample to compute the moments necessary for parameter estimation using the method of moments.

Returns moments 0 to 3k for the first dimension, moments 1 through 2k+1 for the other dimensions as a matrix, and a dictionary with indices and moments for the off-diagonal system if diagonal is false.
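The moment counts described above can be illustrated with a small sketch in plain Julia. This is not the package's code: the helper sample_moments is hypothetical, and both the one-observation-per-row sample layout and the one-row-per-coordinate layout of the second result are assumptions made for the example. The off-diagonal dictionary is not shown.

```julia
# 1D sample moments of the given orders for a vector of observations.
sample_moments(x::AbstractVector{<:Real}, orders) =
    [sum(x .^ i) / length(x) for i in orders]

k = 2
# Hypothetical sample with d = 3 coordinates, one observation per row.
sample = [1.0 2.0 0.0;
          2.0 0.0 1.0;
          3.0 1.0 2.0;
          4.0 1.0 1.0]

# Moments 0 through 3k of the first coordinate: 3k + 1 values.
first_moms = sample_moments(sample[:, 1], 0:3k)

# Moments 1 through 2k+1 of each remaining coordinate
# (assumed layout: one row per coordinate, one column per moment order).
other_moms = [sum(sample[:, c] .^ i) / size(sample, 1)
              for c in 2:size(sample, 2), i in 1:2k+1]
```

For k = 2 this gives 7 moments for the first coordinate and a 2 x 5 matrix for the remaining two coordinates.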

GMMParameterEstimation.perfectMoments - Function
perfectMoments(d, k, w, true_means, true_covariances)

Use the given parameters to compute the exact moments necessary for parameter estimation.

Returns moments 0 to 3k for the first dimension, moments 1 through 2k+1 for the other dimensions as a matrix, and a dictionary with indices and moments for the off-diagonal system.

Both getSample and perfectMoments expect parameters to be given with weights in a 1D vector, means as a k x d array, and covariances as a k x d x d array.


Build the polynomial systems

GMMParameterEstimation.build1DSystem - Function
build1DSystem(k::Integer, m::Integer)
build1DSystem(k::Integer, m::Integer, a::Union{Vector{Float64}, Vector{Variable}})

Build the polynomial system for a mixture of k 1D Gaussians, where m is the highest desired moment.

If a is given, use a as the mixing coefficients; otherwise leave them as unknowns.
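The polynomials in such a system come from the moments of the individual Gaussians, which satisfy the standard recursion $M_0 = 1$, $M_1 = \mu$, $M_i = \mu M_{i-1} + (i-1)\sigma^2 M_{i-2}$. The sketch below evaluates these moments numerically in plain Julia rather than building symbolic polynomials, so it is only an illustration of the underlying formulas, not of how build1DSystem is implemented. The function names are hypothetical.

```julia
# Moments M_0, ..., M_m of a single Gaussian N(μ, σ2), using
# M_0 = 1, M_1 = μ, M_i = μ*M_{i-1} + (i-1)*σ2*M_{i-2}.
function gaussian_moments(μ::Real, σ2::Real, m::Integer)
    M = zeros(Float64, m + 1)      # M[i + 1] stores M_i
    M[1] = 1.0
    m >= 1 && (M[2] = μ)
    for i in 2:m
        M[i + 1] = μ * M[i] + (i - 1) * σ2 * M[i - 1]
    end
    return M
end

# Moments 0..m of a 1D mixture: weighted sums of the component moments.
mixture_moments(a, μs, σ2s, m) =
    sum(a[l] .* gaussian_moments(μs[l], σ2s[l], m) for l in eachindex(a))

standard = gaussian_moments(0.0, 1.0, 4)
mixed = mixture_moments([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], 3)
```

Setting each mixture moment equal to a target moment, with the means and variances (and optionally the mixing coefficients) left as unknowns, gives one polynomial equation per moment order.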

GMMParameterEstimation.selectSol - Function
selectSol(k::Integer, solution::Result, polynomial::Expression, moment::Number)

Select a k-mixture solution from solution, accounting for polynomial and moment.

Sort out the statistically significant k-mixture solutions from solution, and return the one for which polynomial, evaluated at that solution, is closest to moment.
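The selection idea can be sketched without any polynomial solver: discard candidates that are not meaningful, then keep the one whose predicted value is closest to the target moment. Everything in the block below is hypothetical (the name closest_candidate, the valid predicate, and the toy data); it stands in for the idea only and does not reproduce how selectSol works with Result and Expression objects.

```julia
# Among the candidates accepted by `valid`, return the one for which
# `f(candidate)` is closest to `moment`.
function closest_candidate(candidates, f, moment::Number; valid = c -> true)
    kept = filter(valid, candidates)
    isempty(kept) && throw(ArgumentError("no valid candidates to choose from"))
    errors = [abs(f(c) - moment) for c in kept]
    return kept[argmin(errors)]
end

# Toy example: candidates are (μ, σ2) pairs and f is the second moment μ^2 + σ2.
candidates = [(0.0, 1.0), (1.0, 2.0), (2.0, -0.9), (2.0, 0.5)]
second_moment(c) = c[1]^2 + c[2]
positive_variance(c) = c[2] > 0

best = closest_candidate(candidates, second_moment, 3.1; valid = positive_variance)
```

Here the candidate (2.0, -0.9) matches the target moment exactly but has a negative variance, so it is filtered out and the next closest valid candidate is returned.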

GMMParameterEstimation.mixedMomentSystem - Function
mixedMomentSystem(d, k, mixing, ms, vs)

Build a linear system for finding the off-diagonal covariance entries.

The system is built for a d-dimensional Gaussian k-mixture model with mixing coefficients mixing, means ms, and covariances vs, where the diagonal entries have been filled in and the off-diagonal entries are variables.
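To see why such a system is linear, consider two low-order mixed moments of a Gaussian mixture. The notation is assumed for this illustration: $a_\ell$ is the mixing coefficient, $\mu_{\ell,i}$ the $i$th mean coordinate, and $\sigma_{\ell,ij}$ the $(i,j)$ covariance entry of component $\ell$. Which mixed moments the package actually uses is not specified here.

$$
\begin{aligned}
E[X_i X_j] &= \sum_{\ell=1}^{k} a_\ell\left(\mu_{\ell,i}\,\mu_{\ell,j} + \sigma_{\ell,ij}\right),\\
E[X_i^2 X_j] &= \sum_{\ell=1}^{k} a_\ell\left(\mu_{\ell,i}^2\,\mu_{\ell,j} + \sigma_{\ell,ii}\,\mu_{\ell,j} + 2\mu_{\ell,i}\,\sigma_{\ell,ij}\right).
\end{aligned}
$$

Once the mixing coefficients, the means, and the diagonal entries $\sigma_{\ell,ii}$ are known, each equation is linear in the remaining unknowns $\sigma_{\ell,ij}$.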
