Mixture of Gaussians (MOG)
Statistics Module

Classes and functions for modelling multivariate data as a Mixture of Gaussians.

Classes

class  itpp::MOG_diag
 Diagonal Mixture of Gaussians (MOG) class.
 
class  itpp::MOG_generic
 Generic Mixture of Gaussians (MOG) class. Used as a base for other MOG classes.
 

Functions

void itpp::MOG_diag_ML (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in, double var_floor_in, double weight_floor_in, bool verbose_in)
 
void itpp::MOG_diag_kmeans (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in, double trust_in, bool normalise_in, bool verbose_in)
 

Detailed Description

Classes and functions for modelling multivariate data as a Mixture of Gaussians.

Author
Conrad Sanderson

The following example shows how to model data:

Array<vec> X;
// ... fill X with vectors ...
int K = 3; // specify the number of Gaussians
int D = 10; // specify the dimensionality of vectors
MOG_diag model(K,D);
MOG_diag_kmeans(model, X, 10, 0.5, true, true); // initial optimisation using 10 iterations of k-means
MOG_diag_ML(model, X, 10, 0.0, 0.0, true); // final optimisation using 10 iterations of ML version of EM
double avg = model.avg_log_lhood(X); // find the average log likelihood of X

See also the tutorial section for a more elaborate example.
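
To make the quantity returned in the last line of the example more concrete, the following is a minimal sketch of an average log likelihood under a diagonal-covariance mixture. It uses only the C++ standard library; the `DiagMixture` struct and the free functions are hypothetical stand-ins written for illustration and are not the IT++ `MOG_diag` implementation, which may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a diagonal-covariance mixture (not itpp::MOG_diag).
struct DiagMixture {
  std::vector<double> weights;             // K mixture weights, summing to 1
  std::vector<std::vector<double>> means;  // K mean vectors of length D
  std::vector<std::vector<double>> vars;   // K vectors of D diagonal variances
};

// log N(x; mean, diag(var)) for a single Gaussian with diagonal covariance.
double log_gauss_diag(const std::vector<double>& x,
                      const std::vector<double>& mean,
                      const std::vector<double>& var) {
  const double log_2pi = std::log(2.0 * std::acos(-1.0));
  double acc = 0.0;
  for (std::size_t d = 0; d < x.size(); ++d) {
    const double diff = x[d] - mean[d];
    acc += log_2pi + std::log(var[d]) + diff * diff / var[d];
  }
  return -0.5 * acc;
}

// log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with the
// log-sum-exp trick to avoid underflow.
double log_lhood(const DiagMixture& m, const std::vector<double>& x) {
  assert(!m.weights.empty());
  std::vector<double> terms(m.weights.size());
  double max_term = -std::numeric_limits<double>::infinity();
  for (std::size_t k = 0; k < m.weights.size(); ++k) {
    terms[k] = std::log(m.weights[k]) + log_gauss_diag(x, m.means[k], m.vars[k]);
    if (terms[k] > max_term) max_term = terms[k];
  }
  double sum = 0.0;
  for (double t : terms) sum += std::exp(t - max_term);
  return max_term + std::log(sum);
}

// Mean of log p(x) over all vectors in X.
double avg_log_lhood(const DiagMixture& m,
                     const std::vector<std::vector<double>>& X) {
  assert(!X.empty());
  double total = 0.0;
  for (const auto& x : X) total += log_lhood(m, x);
  return total / static_cast<double>(X.size());
}
```

A higher average log likelihood on held-out data indicates a better fit, which is why this value is commonly used to compare models with different numbers of Gaussians.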

Function Documentation

void itpp::MOG_diag_ML (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in = 10, double var_floor_in = 0.0, double weight_floor_in = 0.0, bool verbose_in = false)
Author
Conrad Sanderson

Maximum Likelihood Expectation Maximisation based optimisation of the parameters of an instance of the MOG_diag class. The seed values (starting points) are typically first obtained via MOG_diag_kmeans(). See [CSB06] and the references therein for detailed mathematical descriptions.

Parameters
model_in: The model to optimise (MOG_diag).
X_in: The training data (array of vectors).
max_iter_in: Maximum number of iterations. Default is 10.
var_floor_in: Variance floor (lowest allowable variance). Default is 0.0 (but see the note below).
weight_floor_in: Weight floor (lowest allowable weight). Default is 0.0 (but see the note below).
verbose_in: Whether progress is printed. Default is false.
Note
The variance and weight floors are set to std::numeric_limits<double>::min() if they are below that value. As such, they are machine dependent. The largest allowable weight floor is 1/K, where K is the number of Gaussians.
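
The floor rule in the note can be sketched as follows. Note that std::numeric_limits<double>::min() is the smallest positive normalised double, not the most negative value, so a requested floor of 0.0 is raised to a tiny positive number. The helper names below are hypothetical, and throwing an exception for a weight floor above 1/K is an assumption of this sketch; the note only states the limit, not how a violation is handled.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not part of IT++): raise a requested floor to the
// smallest positive normalised double when it lies below that value.
double effective_floor(double requested) {
  return std::max(requested, std::numeric_limits<double>::min());
}

// Hypothetical helper: apply the same rule to a weight floor, and reject
// values above 1/K. Throwing is an assumption made for this sketch.
double effective_weight_floor(double requested, int K) {
  assert(K > 0);
  if (requested > 1.0 / K) {
    throw std::invalid_argument("weight floor exceeds 1/K");
  }
  return effective_floor(requested);
}
```

With K = 4, for instance, a weight floor of 0.2 would be accepted unchanged, while 0.3 would exceed the 1/K = 0.25 limit.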

Definition at line 316 of file mog_diag_em.cpp.

References itpp::MOG_diag_EM_sup::ml().

void itpp::MOG_diag_kmeans (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in = 10, double trust_in = 0.5, bool normalise_in = true, bool verbose_in = false)
Author
Conrad Sanderson

K-means based optimisation (training) of the parameters of an instance of the MOG_diag class. The obtained parameters are typically used as a seed by MOG_diag_ML().

Parameters
model_in: The model to optimise.
X_in: The training data.
max_iter_in: Maximum number of iterations. Default is 10.
trust_in: The trust factor, where 0 <= trust_in <= 1. Default is 0.5.
normalise_in: Use normalised distance measure (in effect). Default is true.
verbose_in: Whether to print progress. Default is false.
Note
The higher the trust factor, the more we trust the estimates of covariance matrices and weights. Set this to 1.0 only if you have plenty of training data. One rule of thumb is to have 10*D vectors per Gaussian, where D is the dimensionality of the vectors. For smaller amounts of data, a lower trust factor reduces (but does not eliminate) the risk of the EM algorithm (used in MOG_diag_ML()) getting stuck in a local optimum.
Setting normalise_in to true causes the training data to be normalised to zero mean and unit variance prior to running the k-means algorithm. The data is unnormalised before returning. The normalisation helps clustering when the range of values varies greatly between dimensions. For example, dimension 1 may have values in the [-1,+1] interval, while dimension 2 may have values in the [-100,+100] interval. Without normalisation, the distance between vectors is dominated by dimension 2.
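
The effect of normalisation on distances can be illustrated with a small standard-library sketch. The names `normalise`, `unnormalise` and `sq_dist` are hypothetical and are not the routines used inside MOG_diag_kmeans(); dividing by N for the variance and leaving a constant dimension unscaled are assumptions made here for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Per-dimension statistics needed to undo the normalisation.
struct NormStats {
  Vec mean;
  Vec stddev;
};

// Normalise every dimension of X in place to zero mean and unit variance.
// Assumption: population variance (divide by N); a dimension with zero
// variance keeps a scale of 1 to avoid division by zero.
NormStats normalise(std::vector<Vec>& X) {
  assert(!X.empty());
  const std::size_t D = X[0].size();
  const double N = static_cast<double>(X.size());
  NormStats s{Vec(D, 0.0), Vec(D, 1.0)};
  for (std::size_t d = 0; d < D; ++d) {
    double sum = 0.0;
    for (const Vec& x : X) sum += x[d];
    s.mean[d] = sum / N;
    double sq = 0.0;
    for (const Vec& x : X) sq += (x[d] - s.mean[d]) * (x[d] - s.mean[d]);
    const double sd = std::sqrt(sq / N);
    if (sd > 0.0) s.stddev[d] = sd;
    for (Vec& x : X) x[d] = (x[d] - s.mean[d]) / s.stddev[d];
  }
  return s;
}

// Restore the original scale and offset of X in place.
void unnormalise(std::vector<Vec>& X, const NormStats& s) {
  for (Vec& x : X)
    for (std::size_t d = 0; d < x.size(); ++d)
      x[d] = x[d] * s.stddev[d] + s.mean[d];
}

// Squared Euclidean distance between two vectors of equal length.
double sq_dist(const Vec& a, const Vec& b) {
  double acc = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d) acc += (a[d] - b[d]) * (a[d] - b[d]);
  return acc;
}
```

For the two vectors (-1, -100) and (1, 100), the raw squared distance is 4 + 40000, almost entirely due to dimension 2. After normalisation both dimensions contribute 4 each, so neither dominates the clustering.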

Definition at line 347 of file mog_diag_kmeans.cpp.

References itpp::MOG_diag_kmeans_sup::run().

Generated on Sat May 25 2013 16:32:28 for IT++ by Doxygen 1.8.2