Classes and functions for modelling multivariate data as a Mixture of Gaussians.

Classes
class	itpp::MOG_diag
	Diagonal Mixture of Gaussians (MOG) class.

class	itpp::MOG_generic
	Generic Mixture of Gaussians (MOG) class. Used as a base for other MOG classes.

Functions
void	itpp::MOG_diag_ML (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in, double var_floor_in, double weight_floor_in, bool verbose_in)

void	itpp::MOG_diag_kmeans (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in, double trust_in, bool normalise_in, bool verbose_in)

Detailed Description

Classes and functions for modelling multivariate data as a Mixture of Gaussians.

Author: Conrad Sanderson

The following example shows how to model data:

Array<vec> X;
// ... fill X with vectors ...
int K = 3;     // specify the number of Gaussians
int D = 10;    // specify the dimensionality of vectors
MOG_diag model(K,D);
MOG_diag_kmeans(model, X, 10, 0.5, true, true); // initial optimisation using 10 iterations of k-means
MOG_diag_ML(model, X, 10, 0.0, 0.0, true);      // final optimisation using 10 iterations of ML version of EM
double avg = model.avg_log_lhood(X);            // find the average log likelihood of X

See also the tutorial section for a more elaborate example.
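
The quantity returned by avg_log_lhood() in the example above can be illustrated without the library. The following is a minimal sketch, assuming the average log likelihood is the mean, over all vectors in X, of the log of the weighted sum of diagonal-covariance Gaussian densities. The names DiagMogParams, log_gauss_diag and avg_log_lhood_sketch are hypothetical and are not part of IT++; the actual implementation in MOG_diag may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for the parameters of a diagonal MOG
// (not the layout used by itpp::MOG_diag).
struct DiagMogParams {
  Vec weights;             // K mixture weights, summing to 1
  std::vector<Vec> means;  // K mean vectors of length D
  std::vector<Vec> vars;   // K vectors of per-dimension variances
};

// Log density of x under one Gaussian with a diagonal covariance matrix.
double log_gauss_diag(const Vec &x, const Vec &mean, const Vec &var) {
  assert(x.size() == mean.size() && x.size() == var.size());
  const double two_pi = 2.0 * std::acos(-1.0);
  double sum = 0.0;
  for (std::size_t d = 0; d < x.size(); ++d) {
    const double diff = x[d] - mean[d];
    sum += std::log(two_pi * var[d]) + diff * diff / var[d];
  }
  return -0.5 * sum;
}

// Mean over all vectors in X of log( sum_k w_k * N(x | mean_k, var_k) ).
double avg_log_lhood_sketch(const DiagMogParams &m, const std::vector<Vec> &X) {
  assert(!X.empty() && !m.weights.empty());
  const std::size_t K = m.weights.size();
  double total = 0.0;
  for (const Vec &x : X) {
    Vec log_terms(K);
    for (std::size_t k = 0; k < K; ++k)
      log_terms[k] = std::log(m.weights[k]) +
                     log_gauss_diag(x, m.means[k], m.vars[k]);
    // log-sum-exp: factor out the largest term to avoid underflow
    const double top = *std::max_element(log_terms.begin(), log_terms.end());
    double acc = 0.0;
    for (double t : log_terms) acc += std::exp(t - top);
    total += top + std::log(acc);
  }
  return total / static_cast<double>(X.size());
}
```

Working in the log domain keeps the sum stable when D is large and the individual densities are very small.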

Function Documentation

void itpp::MOG_diag_ML	(	MOG_diag &	model_in,
		Array< vec > &	X_in,
		int	max_iter_in = `10`,
		double	var_floor_in = `0.0`,
		double	weight_floor_in = `0.0`,
		bool	verbose_in = `false`
	)

Author: Conrad Sanderson

Maximum Likelihood Expectation Maximisation based optimisation of the parameters of an instance of the MOG_diag class. The seed values (starting points) are typically first obtained via MOG_diag_kmeans(). See [CSB06] and the references therein for detailed mathematical descriptions.

[CSB06] F. Cardinaux, C. Sanderson and S. Bengio, "User authentication via adapted statistical models of face images", IEEE Transactions on Signal Processing, Vol 54, No. 1, 2006, pp. 361-373.

Parameters

model_in	The model to optimise (MOG_diag)
X_in	The training data (array of vectors)
max_iter_in	Maximum number of iterations. Default is 10.
var_floor_in	Variance floor (lowest allowable variance). Default is 0.0 (but see the note below).
weight_floor_in	Weight floor (lowest allowable weight). Default is 0.0 (but see the note below).
verbose_in	Whether progress is printed. Default is false.

Note: The variance and weight floors are set to std::numeric_limits<double>::min() if they are below that value. As such, they are machine dependent. The largest allowable weight floor is 1/K, where K is the number of Gaussians.
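
The flooring rule in the note above can be sketched as follows. This is an illustration only: clamp_floor and clamp_weight_floor are hypothetical helpers, not IT++ functions. The note does not say how the library reacts to a weight floor larger than 1/K; capping it at 1/K is an assumption made here for the sake of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper (not part of IT++): raise a floor to the smallest
// positive normalised double if it is below that value.
double clamp_floor(double floor_in) {
  return std::max(floor_in, std::numeric_limits<double>::min());
}

// Hypothetical helper: apply the same lower bound to a weight floor and,
// as an assumption, cap it at 1/K, where K is the number of Gaussians.
double clamp_weight_floor(double weight_floor_in, int K) {
  assert(K > 0);
  const double highest = 1.0 / K;
  return std::min(clamp_floor(weight_floor_in), highest);
}
```

With the default arguments of 0.0, both floors therefore end up at std::numeric_limits<double>::min(), a tiny positive value, rather than at exactly zero.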

Definition at line 316 of file mog_diag_em.cpp.

References itpp::MOG_diag_EM_sup::ml().

void itpp::MOG_diag_kmeans	(	MOG_diag &	model_in,
		Array< vec > &	X_in,
		int	max_iter_in = `10`,
		double	trust_in = `0.5`,
		bool	normalise_in = `true`,
		bool	verbose_in = `false`
	)

Author: Conrad Sanderson

K-means based optimisation (training) of the parameters of an instance of the MOG_diag class. The obtained parameters are typically used as a seed by MOG_diag_ML().

Parameters

model_in	The model to optimise
X_in	The training data
max_iter_in	Maximum number of iterations. Default is 10.
trust_in	The trust factor, where 0 <= `trust_in` <= 1. Default is 0.5.
normalise_in	Whether to normalise the training data before clustering (in effect, use a normalised distance measure). Default is true.
verbose_in	Whether to print progress. Default is false.

Note: The higher the trust factor, the more we trust the estimates of covariance matrices and weights. Set this to 1.0 only if you have plenty of training data. One rule of thumb is to have 10*D vectors per Gaussian, where D is the dimensionality of the vectors. For smaller amounts of data, a lower trust factor reduces (but does not completely remove) the risk of the EM algorithm (used in MOG_diag_ML()) getting stuck in a local minimum.

Note: Setting normalise_in to true causes the training data to be normalised to zero mean and unit variance prior to running the k-means algorithm. The data is unnormalised before returning. The normalisation helps clustering when the range of values varies greatly between dimensions. For example, dimension 1 may have values in the [-1,+1] interval, while dimension 2 may have values in the [-100,+100] interval. Without normalisation, the distance between vectors is dominated by dimension 2.
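
The effect of normalisation on the distance measure can be illustrated with a small standalone sketch. The helpers normalise and sq_dist are hypothetical and are not the IT++ implementation; the sketch assumes a squared Euclidean distance and per-dimension scaling by the population standard deviation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Data = std::vector<std::vector<double>>;

// Hypothetical helper: normalise each dimension of X in place
// to zero mean and unit variance.
void normalise(Data &X) {
  assert(!X.empty());
  const std::size_t N = X.size();
  const std::size_t D = X[0].size();
  for (std::size_t d = 0; d < D; ++d) {
    double mean = 0.0;
    for (const auto &x : X) mean += x[d];
    mean /= static_cast<double>(N);
    double var = 0.0;
    for (const auto &x : X) var += (x[d] - mean) * (x[d] - mean);
    var /= static_cast<double>(N);  // population variance
    // Leave a constant dimension unscaled to avoid dividing by zero.
    const double sd = (var > 0.0) ? std::sqrt(var) : 1.0;
    for (auto &x : X) x[d] = (x[d] - mean) / sd;
  }
}

// Squared Euclidean distance between two vectors of equal length.
double sq_dist(const std::vector<double> &a, const std::vector<double> &b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
    sum += (a[d] - b[d]) * (a[d] - b[d]);
  return sum;
}
```

For the two vectors (-1, -100) and (1, 100), the squared distance before normalisation is 4 + 40000, so dimension 2 accounts for almost all of it. After normalise() both dimensions contribute equally.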

Definition at line 347 of file mog_diag_kmeans.cpp.

References itpp::MOG_diag_kmeans_sup::run().
