MixFit.jl Documentation

Model estimation

MixFit.MixModel - Type

Structure of mixture model parameters.

Variables

  • α::Vector{Float32}: Weights of the components
  • μ::Vector{Float32}: Means of the components
  • σ::Vector{Float32}: SDs of the components

MixFit.mixfit - Function
mixfit(x::Vector{<:Real},
        m::Int;
        rtol::AbstractFloat = 0.00001,
        α::Vector{<:Real} = fill(1/m, m),
        μ::Vector{<:Real} = quantile!(x, (1:m)/m),
        σ::Vector{<:Real} = fill(std(x) / √(m), m),
        maxswap::Int = 5 * m^2,
        maxiter_inner::Int = 0,
        maxiter::Int = 0,
        silent::Bool = false,
        kernel::Function = dnorm)

Get the maximum likelihood estimate of an m-component mixture model using random-swap EM. Random-swap EM avoids local optima by randomly replacing components and keeping the result with the highest likelihood [1]. The component distributions are given by kernel. The maximum number of swaps is given by maxswap (default $5m^2$); if results vary across runs, maxswap is too low. maxiter_inner is the maximum number of iterations for the estimates that go through the swapping process, and maxiter is the maximum for the final estimate. Similarly, rtol is the tolerance for the final estimate, while the inner estimates use 0.1. To suppress output, set silent to true. Starting values can be provided via the α, μ, and σ arguments, but this should not be necessary because of the random swapping.
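
The default starting values in the signature can be reproduced with the Statistics standard library. The following is a minimal sketch, not MixFit's code: default_start is a hypothetical helper, and it uses quantile rather than quantile! so that x is not reordered in place.

```julia
using Statistics  # standard library: quantile, std

# Hypothetical helper mirroring the default α, μ, σ in the mixfit signature.
function default_start(x::Vector{<:Real}, m::Int)
    α = fill(1 / m, m)            # equal weights that sum to one
    μ = quantile(x, (1:m) / m)    # evenly spaced quantiles; the last one is maximum(x)
    σ = fill(std(x) / √(m), m)    # one common SD, scaled down by √m
    return α, μ, σ
end

x = collect(1.0:100.0)
α, μ, σ = default_start(x, 2)
```

With these defaults the last starting mean always sits at the largest observation, which is one reason the text above leans on random swapping rather than on the starting values.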

See also: densfit, em_run!

MixFit.densfit - Function
densfit(x::Vector{<:Real};
        wait::Int = 3,
        rtol_em::AbstractFloat = 0.00001,
        criterion::Function = AIC,
        silent::Bool = false,
        maxiter::Int = 0,
        maxiter_inner::Int = 0,
        maxswap::Int = 0,
        kernel::Function = dnorm)

Estimate the density of x by a mixture model. Successive mixture models are estimated via random-swap EM with an increasing number of components until criterion decreases for wait iterations (3 by default). By default, criterion is AIC, which performs well for density estimation [2]. However, AIC3 should be used instead if the goal is to estimate the number of clusters in a true mixture model. The relative tolerance for EM convergence is given by rtol_em. To disable output, set silent to true.
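
The stopping rule can be sketched in plain Julia. This is an illustration under stated assumptions, not MixFit's implementation: select_components and criterion_at are hypothetical names, the criterion is treated as "larger is better", and "decreases for wait iterations" is read as wait consecutive decreases.

```julia
# Hypothetical sketch: grow the number of components m until the criterion
# has decreased `wait` times in a row, then return the best m seen.
function select_components(criterion_at::Function; wait::Int = 3)
    best_m = 1
    best_val = criterion_at(1)
    prev = best_val
    declines = 0
    m = 1
    while declines < wait
        m += 1
        val = criterion_at(m)
        # Count consecutive decreases; any increase resets the counter.
        declines = val < prev ? declines + 1 : 0
        if val > best_val
            best_m, best_val = m, val
        end
        prev = val
    end
    return best_m
end

# Toy criterion values for m = 1, 2, ..., 6 (made up for illustration).
toy_scores = [-300.0, -260.0, -250.0, -252.0, -255.0, -259.0]
best = select_components(m -> toy_scores[m]; wait = 3)
```

In the toy sequence the criterion peaks at three components and then falls three times in a row, so the search stops after fitting six models.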

See also: mixfit, em_run!

MixFit.em_run! - Function
em_run!(est::MixModel, x::Vector{<:Real}, rtol::AbstractFloat = 0.00001, maxiter::Int = 0)

Run EM steps until the relative increase in log-likelihood is below rtol or the number of iterations exceeds maxiter. If maxiter is zero, EM steps continue until the increase is below rtol, regardless of the number of iterations.
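
The two stopping conditions can be illustrated with a small loop. This is a sketch only: iterate_until_converged and toy_step are hypothetical, the relative increase is assumed to be (new - old) / |old|, and the exact behavior at the iteration cap is an assumption (here, at most maxiter steps are taken).

```julia
# Hypothetical helper illustrating the stopping rule; not MixFit's implementation.
# `step` maps the current log-likelihood to the next one (a stand-in for one EM step).
function iterate_until_converged(step::Function, ll0::Real;
                                 rtol::AbstractFloat = 0.00001, maxiter::Int = 0)
    ll = float(ll0)
    iter = 0
    while true
        ll_new = step(ll)
        iter += 1
        # Assumed definition of the relative increase in log-likelihood.
        rel_increase = (ll_new - ll) / abs(ll)
        ll = ll_new
        rel_increase < rtol && break
        # maxiter == 0 means there is no cap on the number of iterations.
        maxiter > 0 && iter >= maxiter && break
    end
    return ll, iter
end

# Toy "EM step": the log-likelihood moves halfway toward -100 on each call.
toy_step(ll) = ll + 0.5 * (-100.0 - ll)

ll_free, n_free = iterate_until_converged(toy_step, -200.0)
ll_capped, n_capped = iterate_until_converged(toy_step, -200.0; maxiter = 3)
```

The uncapped run stops only once the gain falls below rtol, while the capped run returns after three steps even though the log-likelihood is still improving.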

See also: densfit, mixfit


Model methods

MixFit.LL - Function
LL(x::Vector{<:Real}, est::MixModel)

Get the log-likelihood of the mixture model described by est, using the data x.

See also: AIC, AIC3, BIC

MixFit.AIC - Function
AIC(x::Vector{<:Real}, est::MixModel)

Get the AIC of a mixture model, using the data x. The AIC for a model $M$ is given by $AIC(M) = 2 l(M) - 2k$, where $l(M)$ is the log-likelihood and $k$ is the number of parameters.

See also: AIC3, BIC, LL

MixFit.AIC3 - Function
AIC3(x::Vector{<:Real}, est::MixModel)

Get the modified AIC, "AIC3", of a mixture model, using the data x. The AIC3 for a model $M$ is given by [3] $AIC3(M) = 2 l(M) - 3k$, where $l(M)$ is the log-likelihood and $k$ is the number of parameters.
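
Both criteria are simple functions of the log-likelihood and the parameter count. The sketch below uses hypothetical lowercase helpers (aic, aic3) that take those two numbers directly, unlike the package functions, which take the data and a MixModel. Note the sign: as defined here, larger values indicate a better model, the reverse of the more common $-2l + 2k$ form. The numbers are made up for illustration.

```julia
# Hypothetical helpers following the formulas in this section.
aic(ll::Real, k::Integer) = 2 * ll - 2 * k
aic3(ll::Real, k::Integer) = 2 * ll - 3 * k

# Two made-up models: the larger one fits slightly better but uses more parameters.
ll_small, k_small = -130.0, 5
ll_large, k_large = -128.0, 8

aic_small, aic_large = aic(ll_small, k_small), aic(ll_large, k_large)
aic3_small, aic3_large = aic3(ll_small, k_small), aic3(ll_large, k_large)
```

Here both criteria prefer the smaller model, and AIC3 does so by a wider margin because each extra parameter costs 3 rather than 2.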

See also: AIC, BIC, LL

MixFit.BIC - Function
BIC(x::Vector{<:Real}, est::MixModel)

Get the BIC of a mixture model using the data x.

See also: AIC, AIC3, LL

MixFit.describe - Function
describe(est::MixModel; data::Vector{<:Real})

Pretty-print the parameters and fit indices for the mixture model est. If data is not specified, fit indices will not be printed.


Distributions

MixFit.dnorm - Function
dnorm(x::Real, μ::Real, σ::Real)

Normal density, parameterized by mean (μ) and SD (σ).

MixFit.dgumbel - Function
dgumbel(x::Real, μ::Real, σ::Real)

Gumbel density, parameterized by mean (μ) and SD (σ) of x.
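
One common way to express a Gumbel density in terms of the mean and SD of x is moment matching. The sketch below assumes the maximum-type (right-skewed) Gumbel distribution, whose mean is loc + γβ and whose SD is πβ/√6, with γ the Euler-Mascheroni constant. Whether dgumbel uses this variant or this conversion is an assumption, and the helper names are hypothetical.

```julia
# Euler-Mascheroni constant from Base.
const eulergamma = Base.MathConstants.eulergamma

# Hypothetical helper: location and scale implied by a given mean and SD.
function gumbel_loc_scale(μ::Real, σ::Real)
    β = σ * sqrt(6) / π          # scale, from SD = πβ/√6
    loc = μ - eulergamma * β     # location, from mean = loc + γβ
    return loc, β
end

# Maximum-type Gumbel density written in terms of the mean and SD of x.
function gumbel_density(x::Real, μ::Real, σ::Real)
    loc, β = gumbel_loc_scale(μ, σ)
    z = (x - loc) / β
    return exp(-(z + exp(-z))) / β
end

loc, β = gumbel_loc_scale(10.0, 2.0)
```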

MixFit.dgamma - Function
dgamma(x::Real, μ::Real, σ::Real)

Gamma density, parameterized by mean (μ) and SD (σ) of x.
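
For readers used to the shape and scale form of the gamma distribution, the conversion from a mean and SD follows from mean = shape × scale and variance = shape × scale². The helper below is a hypothetical illustration of that conversion; it is not taken from MixFit's source.

```julia
# Hypothetical helper: shape and scale implied by a given mean and SD.
function gamma_shape_scale(μ::Real, σ::Real)
    shape = (μ / σ)^2    # from mean^2 / variance
    scale = σ^2 / μ      # from variance / mean
    return shape, scale
end

shape, scale = gamma_shape_scale(3.0, 1.5)   # shape = 4.0, scale = 0.75
```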

MixFit.dlognorm - Function
dlognorm(x::Real, μ::Real, σ::Real)

Lognormal density, parameterized by mean (μ) and SD (σ) of x, NOT of log(x).
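
The distinction matters because the usual lognormal parameters are the mean and SD of log(x). A minimal sketch of the standard conversion between the two parameterizations follows; the helper names are hypothetical, and dlognorm's internals may differ.

```julia
# Hypothetical helper: convert the mean and SD of x into the mean and SD of log(x).
function lognormal_log_params(μ::Real, σ::Real)
    s2 = log(1 + (σ / μ)^2)    # variance of log(x)
    m = log(μ) - s2 / 2        # mean of log(x)
    return m, sqrt(s2)
end

# Lognormal density written in terms of the mean and SD of x itself.
function lognormal_density(x::Real, μ::Real, σ::Real)
    m, s = lognormal_log_params(μ, σ)
    return exp(-(log(x) - m)^2 / (2 * s^2)) / (x * s * sqrt(2π))
end

# Round trip: recover the mean and SD of x from the log-scale parameters.
m, s = lognormal_log_params(2.0, 0.5)
mean_back = exp(m + s^2 / 2)
sd_back = sqrt((exp(s^2) - 1) * exp(2m + s^2))
```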

  • [1] Zhao, Q., Hautamäki, V., Kärkkäinen, I., & Fränti, P. (2012). Random swap EM algorithm for Gaussian mixture models. Pattern Recognition Letters, 33(16), 2120-2126.
  • [2] Wang, Y., & Chee, C. S. (2012). Density estimation using non-parametric and semi-parametric mixtures. Statistical Modelling, 12(1), 67-92.
  • [3] Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In Proceedings of the first US/Japan conference on the frontiers of statistical modeling: An informational approach (pp. 69-113). Springer, Dordrecht.