The **EM algorithm** is used to obtain maximum likelihood estimates of parameters when some of the data are missing. More generally, however, the EM algorithm can also be applied when there are latent, i.e. unobserved, variables that were never intended to be observed in the first place.

In that case, we simply treat the latent data as missing and proceed to apply the EM algorithm. The EM algorithm has many applications throughout statistics. It is often used, for example, in machine learning and data mining, and in Bayesian statistics, where it is often used to obtain the mode of the posterior marginal distributions of parameters.

**The Classical EM Algorithm**

We begin by assuming that the complete data-set consists of Z = (X , Y) but that only X is observed. The complete-data log likelihood is then denoted by l(θ; X , Y) where θ is the unknown parameter vector for which we wish to find the MLE.
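As a concrete illustration of this setup (a standard textbook example, not taken from the text above): suppose the observed data X = (x_1, …, x_n) are draws from a two-component Gaussian mixture and the latent data Y = (y_1, …, y_n) are the unobserved component labels, y_i ∈ {1, 2}. The complete-data log likelihood then splits into a sum over the observations:

```latex
l(\theta;\, X, Y)
  = \sum_{i=1}^{n} \Big[ \log \pi_{y_i}
      + \log \phi\!\left(x_i;\, \mu_{y_i}, \sigma_{y_i}^2\right) \Big],
\qquad
\theta = (\pi_1, \pi_2, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2),
```

where φ(·; μ, σ²) denotes the normal density and π_k the mixture weights. If the labels y_i were observed, this would be easy to maximize directly; it is their absence that motivates the E-step below.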

E-Step: The E-step of the EM algorithm computes the expected value of l(θ; X , Y) given the observed data, X , and the current parameter estimate, θold say. In particular, we define

Q(θ; θold) := E [l(θ; X , Y) | X , θold] = ∫ l(θ; X , y) p(y | X , θold) dy

where p(· | X , θold) is the conditional density of Y given the observed data, X , and assuming θ = θold.
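The E-step can be sketched in code for the two-component Gaussian mixture case, where p(y | X, θold) reduces to per-point "responsibilities" for each component. This is a minimal illustration under that assumed model, not the general algorithm; the function and variable names are my own.

```python
import numpy as np

def e_step(x, pi, mu, sigma):
    """E-step for a two-component Gaussian mixture.

    The latent data Y are the component labels. Given the observed data x
    and current parameters theta_old = (pi, mu, sigma), this computes the
    conditional distribution p(y | x, theta_old), i.e. the responsibility
    of each component for each observation.
    """
    # Weighted normal densities pi_k * phi(x; mu_k, sigma_k^2), one column per component
    dens = np.stack([
        pi[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
        / (sigma[k] * np.sqrt(2 * np.pi))
        for k in range(2)
    ], axis=1)
    # Normalize rows so each observation's responsibilities sum to one
    return dens / dens.sum(axis=1, keepdims=True)

def q_function(theta, resp, x):
    """Q(theta; theta_old): the expected complete-data log likelihood,
    averaging l(theta; x, y) over the responsibilities from the E-step."""
    pi, mu, sigma = theta
    # Complete-data log likelihood contribution of each (point, label) pair
    ll = np.stack([
        np.log(pi[k]) - np.log(sigma[k]) - 0.5 * np.log(2 * np.pi)
        - 0.5 * ((x - mu[k]) / sigma[k]) ** 2
        for k in range(2)
    ], axis=1)
    return float((resp * ll).sum())
```

The subsequent M-step would maximize `q_function` over `theta` with the responsibilities held fixed; for the Gaussian mixture this maximization has a closed form.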