MixFit.jl Documentation
Model estimation
MixFit.MixModel — Type
Structure of mixture model parameters.
Variables
- α::Vector{Float32}: Weights of the components
- μ::Vector{Float32}: Means of the components
- σ::Vector{Float32}: SDs of the components
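A MixModel-like value is three parallel vectors, with one entry per component. The struct below is a minimal sketch for illustration only. The name MixModelSketch, the helper ncomponents, and the validation checks in the constructor are assumptions of this example and are not part of MixFit.

```julia
# Hypothetical stand-in for MixFit.MixModel: one entry per component in each vector.
struct MixModelSketch
    α::Vector{Float32}  # component weights (checked to sum to 1)
    μ::Vector{Float32}  # component means
    σ::Vector{Float32}  # component SDs (checked to be positive)
    function MixModelSketch(α, μ, σ)
        length(α) == length(μ) == length(σ) ||
            throw(ArgumentError("α, μ and σ must have the same length"))
        all(>(0), σ) || throw(ArgumentError("σ must be positive"))
        isapprox(sum(α), 1; atol = 1e-5) || throw(ArgumentError("α must sum to 1"))
        new(α, μ, σ)  # `new` converts the inputs to Vector{Float32}
    end
end

# Number of mixture components.
ncomponents(est::MixModelSketch) = length(est.α)

# Example: a two-component model.
est = MixModelSketch([0.3, 0.7], [-1.0, 2.0], [0.5, 1.0])
```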
MixFit.mixfit — Function
mixfit(x::Vector{<:Real},
m::Int;
rtol::AbstractFloat = 0.00001,
α::Vector{<:Real} = fill(1/m, m),
μ::Vector{<:Real} = quantile!(x, (1:m)/m),
σ::Vector{<:Real} = fill(std(x) / √(m), m),
maxswap::Int = 5 * m^2,
maxiter_inner::Int = 0,
maxiter::Int = 0,
silent::Bool = false,
kernel::Function = dnorm)
Get the maximum likelihood estimate of an m-component mixture model using random-swap EM. Random-swap EM helps avoid local optima by randomly replacing components and keeping the result with the highest likelihood [1]. The component distributions are given by kernel. The maximum number of swaps is given by maxswap, which defaults to $5m^2$. If results vary across runs, maxswap is too low. maxiter_inner is the maximum number of iterations for the estimates that go through the swapping process, and maxiter is the maximum for the final estimate. Similarly, rtol is the tolerance for the final estimate, while the inner estimates use 0.1. To suppress output, set silent to true. Starting values can be provided via the α, μ, and σ arguments, but this should not be necessary because of the random swapping.
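The sketch below shows the random-swap idea for a univariate normal mixture. It is written from the description above and from [1], and it does not call MixFit. The names normpdf, loglik, emstep, emfit, and random_swap_em are hypothetical. The swap rule is an assumption of this sketch and may differ from MixFit's. It moves one random component's mean to a random data point and resets that component's SD and weight. The starting means at quantiles (1:m)/(m+1) are also a choice of this sketch. The inner EM uses rtol = 0.1, as documented for mixfit.

```julia
using Random, Statistics

# Normal density (stand-in for MixFit's dnorm kernel).
normpdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# Log-likelihood of a univariate normal mixture.
loglik(x, α, μ, σ) =
    sum(log(sum(α[j] * normpdf(xi, μ[j], σ[j]) for j in eachindex(α))) for xi in x)

# One EM step: E-step responsibilities, then weighted M-step updates.
function emstep(x, α, μ, σ)
    r = [α[j] * normpdf(xi, μ[j], σ[j]) for xi in x, j in eachindex(α)]
    r ./= sum(r; dims = 2)                        # each row now sums to 1
    nj = vec(sum(r; dims = 1))                    # effective size of each component
    α′ = nj ./ length(x)
    μ′ = vec(sum(r .* x; dims = 1)) ./ nj
    σ′ = sqrt.(vec(sum(r .* (x .- μ′') .^ 2; dims = 1)) ./ nj)
    return α′, μ′, max.(σ′, 1e-6)                 # floor σ so no component collapses to zero
end

# Iterate EM until the relative change in log-likelihood is at most rtol.
function emfit(x, α, μ, σ; rtol = 1e-5, maxiter = 1000)
    ll = loglik(x, α, μ, σ)
    for _ in 1:maxiter
        α, μ, σ = emstep(x, α, μ, σ)
        ll_new = loglik(x, α, μ, σ)
        converged = abs(ll_new - ll) <= rtol * abs(ll)
        ll = ll_new
        converged && break
    end
    return (α = α, μ = μ, σ = σ, ll = ll)
end

# Random-swap EM: move one random component to a random data point, refine
# with a loose EM (rtol = 0.1), and keep the swap only if the LL improves.
function random_swap_em(x, m; maxswap = 5m^2, rtol = 1e-5, rng = Random.default_rng())
    s0 = std(x) / sqrt(m)
    best = emfit(x, fill(1 / m, m), quantile(x, (1:m) ./ (m + 1)), fill(s0, m); rtol = 0.1)
    for _ in 1:maxswap
        α, μ, σ = copy(best.α), copy(best.μ), copy(best.σ)
        j = rand(rng, 1:m)
        μ[j], σ[j], α[j] = rand(rng, x), s0, 1 / m
        α ./= sum(α)
        cand = emfit(x, α, μ, σ; rtol = 0.1)
        if cand.ll > best.ll                      # NaN log-likelihoods are never accepted
            best = cand
        end
    end
    return emfit(x, best.α, best.μ, best.σ; rtol = rtol)   # final, tighter fit
end

# Example: two well-separated normal clusters.
rng = MersenneTwister(1)
x = vcat(randn(rng, 300) .- 4, randn(rng, 300) .+ 4)
fit = random_swap_em(x, 2; rng = rng)
```

A swap is accepted only when the loosely refined candidate has a higher log-likelihood. This is what lets the search escape a poor local optimum. Raising maxswap trades run time for more chances to find a better one.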
MixFit.densfit — Function
densfit(x::Vector{<:Real};
wait::Int = 3,
rtol_em::AbstractFloat = 0.00001,
criterion::Function = AIC,
silent::Bool = false,
maxiter::Int = 0,
maxiter_inner::Int = 0,
maxswap::Int = 0,
kernel::Function = dnorm)
Estimate the density of x by a mixture model. Mixture models with an increasing number of clusters are fitted successively via random-swap EM until criterion decreases for wait iterations (3 by default). By default, criterion is AIC, which performs well for density estimation [2]. However, AIC3 should be used instead if the goal is to estimate the number of clusters in a true mixture model. The relative tolerance for EM convergence is given by rtol_em. To disable output, set silent to true.
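The stopping rule can be sketched independently of the fitting itself. In this sketch, fitcrit is a hypothetical callback that fits an m-component mixture and returns its criterion value, with larger values being better, as in the AIC formula in these docs. Two details are assumptions, because the documentation does not say how "decreases for wait iterations" is measured. First, a miss is counted whenever a fit fails to beat the best value seen so far. Second, the maxm cap is a safety limit added for this example.

```julia
# Sketch of densfit-style selection of the number of components.
# `fitcrit(m)` is a hypothetical callback returning the criterion for m components.
# Stops once `wait` successive values of m fail to beat the best value so far.
function select_m(fitcrit; wait = 3, maxm = 100)
    best_m, best_val = 1, fitcrit(1)
    m, misses = 1, 0
    while misses < wait && m < maxm
        m += 1
        val = fitcrit(m)
        if val > best_val
            best_m, best_val, misses = m, val, 0   # new best: reset the patience counter
        else
            misses += 1
        end
    end
    return best_m, best_val
end

# Example with a toy criterion that peaks at m = 4.
select_m(m -> 10 - abs2(m - 4))   # returns (4, 10)
```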
MixFit.em_run! — Function
em_run!(est::MixModel, x::Vector{<:Real}, rtol::AbstractFloat = 0.00001, maxiter::Int = 0)
Run EM steps until the percent increase in log-likelihood is below rtol or the number of iterations is greater than maxiter. If maxiter is set to zero, EM steps will continue until the increase is below rtol, regardless of the number of iterations.
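The stopping rule above can be sketched as a loop around a generic step function. Here step! is a hypothetical zero-argument callback that performs one EM step and returns the new log-likelihood. This sketch makes two assumptions. It measures the increase as a fraction of the previous log-likelihood, and it caps the loop at exactly maxiter steps when maxiter is positive.

```julia
# Sketch of the em_run! stopping rule. `step!` performs one EM step and returns
# the new log-likelihood; `ll0` is the log-likelihood before the first step.
function run_until_converged(step!, ll0; rtol = 1e-5, maxiter = 0)
    ll, iter = ll0, 0
    while true
        iter += 1
        ll_new = step!()
        rel = (ll_new - ll) / abs(ll)   # relative (fractional) increase in LL
        ll = ll_new
        # maxiter == 0 means "no iteration cap": stop only on convergence
        if rel < rtol || (maxiter > 0 && iter >= maxiter)
            return ll, iter
        end
    end
end

# Example: a fake EM run whose log-likelihood climbs toward -100.
k = Ref(0)
fake_step!() = (k[] += 1; -100 - 50 * 0.5^k[])
ll, iters = run_until_converged(fake_step!, -150.0; rtol = 1e-3)   # iters == 9
```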
MixFit.em_step! — Function
Model methods
MixFit.LL — Function
LL(x::Vector{<:Real}, est::MixModel)
Get the log-likelihood of the mixture model detailed in est, using the data x.
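For a mixture with weights $\alpha_j$, means $\mu_j$, and SDs $\sigma_j$, the log-likelihood is $l(M) = \sum_i \log \sum_j \alpha_j f(x_i; \mu_j, \sigma_j)$. The sketch below computes it directly. The kernel keyword mirrors the role of MixFit's kernel argument. Its default here is a hand-written normal density, not MixFit.dnorm, and the names are hypothetical.

```julia
# Hand-written normal density used as the default kernel in this sketch.
normal_density(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# l(M) = Σ_i log(Σ_j α_j f(x_i; μ_j, σ_j))
function mixture_loglik(x, α, μ, σ; kernel = normal_density)
    ll = 0.0
    for xi in x
        # density of one observation under the whole mixture
        ll += log(sum(α[j] * kernel(xi, μ[j], σ[j]) for j in eachindex(α)))
    end
    return ll
end
```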
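MixFit.AIC — Function
AIC(x::Vector{<:Real}, est::MixModel)
Get the AIC of the mixture model est, using the data x. The AIC for a model $M$ is given by $\mathrm{AIC}(M) = 2l(M) - 2k$, where $l(M)$ is the log-likelihood of $M$ and $k$ is the number of parameters. With this sign convention, larger values are better.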
MixFit.AIC3 — Function
AIC3(x::Vector{<:Real}, est::MixModel)
Get the modified AIC, "AIC3", of the mixture model est, using the data x. The AIC3 for a model $M$ is given by [3]: $\mathrm{AIC3}(M) = 2l(M) - 3k$, where $k$ is the number of parameters.
MixFit.BIC — Function
BIC(x::Vector{<:Real}, est::MixModel)
Get the BIC of the mixture model est, using the data x.
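The three criteria differ only in how strongly they penalize the parameter count $k$. The sketch below works from a log-likelihood value. It assumes $k = 3m - 1$ for an m-component univariate mixture: m - 1 free weights, m means, and m SDs. The documentation does not state MixFit's count or its BIC formula. The BIC line assumes the usual $\log(n)$ penalty in the same larger-is-better form as the AIC formulas. All function names are hypothetical.

```julia
# Free parameters of an m-component univariate mixture (assumption):
# m - 1 weights (they must sum to 1), m means and m SDs.
nparams(m) = 3m - 1

# Criteria in the larger-is-better form of the AIC and AIC3 formulas.
aic(ll, m) = 2ll - 2nparams(m)
aic3(ll, m) = 2ll - 3nparams(m)
# BIC with the usual log(n) penalty; this sign convention is assumed, not documented.
bic(ll, m, n) = 2ll - log(n) * nparams(m)

# Example: a 2-component fit with log-likelihood -1000.
aic(-1000.0, 2), aic3(-1000.0, 2)   # (-2010.0, -2015.0)
```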
MixFit.describe — Function
describe(est::MixModel; data::Vector{<:Real})
Pretty-print the parameters and fit indices of the mixture model est. If data is not specified, fit indices are not printed.
Distributions
MixFit.dnorm — Function
dnorm(x::Real, μ::Real, σ::Real)
Normal density, parameterized by mean (μ) and SD (σ) of x.
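For reference, this is the standard normal density with mean $\mu$ and SD $\sigma$, written in the same symbols as dnorm's arguments:

```math
f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```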
MixFit.dgumbel — Function
dgumbel(x::Real, μ::Real, σ::Real)
Gumbel density, parameterized by mean (μ) and SD (σ) of x.
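Parameterizing by mean and SD means first converting to the usual location $u$ and scale $\beta$. This sketch assumes the right-skewed ("maximum") Gumbel, for which the mean is $u + \gamma\beta$ and the SD is $\pi\beta/\sqrt{6}$. The documentation does not say which orientation dgumbel uses, and gumbel_pdf is a hypothetical name.

```julia
using Base.MathConstants: eulergamma

# Gumbel (maximum) density parameterized by the mean μ and SD σ of x.
function gumbel_pdf(x, μ, σ)
    β = σ * sqrt(6) / π        # scale recovered from SD = πβ/√6
    u = μ - eulergamma * β     # location recovered from mean = u + γβ
    z = (x - u) / β
    return exp(-(z + exp(-z))) / β
end
```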
MixFit.dgamma — Function
dgamma(x::Real, μ::Real, σ::Real)
Gamma density, parameterized by mean (μ) and SD (σ) of x.
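A gamma distribution with shape $k$ and scale $\theta$ has mean $k\theta$ and variance $k\theta^2$. The mean/SD parameterization therefore maps back as shown below. This sketch, with the hypothetical name gamma_shape_scale, only performs the conversion. The density $x^{k-1}e^{-x/\theta}/(\Gamma(k)\theta^k)$ additionally needs a gamma function, such as the one in SpecialFunctions.jl, and is omitted here.

```julia
# Gamma shape k and scale θ from mean μ and SD σ:
#   mean = kθ, variance = kθ²  ⇒  k = μ²/σ², θ = σ²/μ.
function gamma_shape_scale(μ, σ)
    μ > 0 && σ > 0 || throw(DomainError((μ, σ), "gamma needs μ > 0 and σ > 0"))
    return μ^2 / σ^2, σ^2 / μ
end
```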
MixFit.dlognorm — Function
dlognorm(x::Real, μ::Real, σ::Real)
Lognormal density, parameterized by mean (μ) and SD (σ) of x, NOT of log(x).
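Because μ and σ describe x rather than log(x), the parameters of log(x) must be recovered first. If log(x) has mean $m$ and variance $s^2$, then $\mathrm{E}[x] = e^{m + s^2/2}$ and $\mathrm{Var}[x] = (e^{s^2} - 1)\,\mathrm{E}[x]^2$. This gives $s^2 = \log(1 + \sigma^2/\mu^2)$ and $m = \log\mu - s^2/2$. The function below is a minimal sketch of that conversion. The name lognormal_pdf is hypothetical, and it is not MixFit's implementation.

```julia
# Lognormal density parameterized by the mean μ and SD σ of x itself.
function lognormal_pdf(x, μ, σ)
    x <= 0 && return 0.0         # no mass at or below zero
    s2 = log1p((σ / μ)^2)        # variance of log(x)
    m = log(μ) - s2 / 2          # mean of log(x)
    return exp(-abs2(log(x) - m) / (2s2)) / (x * sqrt(2π * s2))
end
```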
- [1] Zhao, Q., Hautamäki, V., Kärkkäinen, I., & Fränti, P. (2012). Random swap EM algorithm for Gaussian mixture models. Pattern Recognition Letters, 33(16), 2120-2126.
- [2] Wang, Y., & Chee, C. S. (2012). Density estimation using non-parametric and semi-parametric mixtures. Statistical Modelling, 12(1), 67-92.
- [3] Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and a new informational measure of complexity. In Proceedings of the first US/Japan conference on the frontiers of statistical modeling: An informational approach (pp. 69-113). Springer, Dordrecht.