Given the intractable sample complexity of learning Bayesian classifiers, we must look for ways to reduce this complexity. The **Naive Bayes** classifier does this by making a conditional independence assumption that dramatically reduces the number of parameters to be estimated when modeling $P(X|Y)$: from our original $2(2^n - 1)$ down to just $2n$.
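To make the savings concrete, the two parameter counts can be compared directly. This is an illustrative sketch (the function names are mine, not from the text), assuming $n$ boolean features and a boolean label $Y$:

```python
def full_joint_params(n: int) -> int:
    # For each of the 2 values of Y, a distribution over the 2^n possible
    # feature vectors needs 2^n - 1 independent probabilities.
    return 2 * (2 ** n - 1)

def naive_bayes_params(n: int) -> int:
    # Under conditional independence, each of the n features needs only
    # one parameter P(X_i = 1 | Y = y) for each of the 2 values of Y.
    return 2 * n

for n in (2, 10, 30):
    print(n, full_joint_params(n), naive_bayes_params(n))
```

Already at $n = 30$ boolean features, the full model needs over two billion parameters while Naive Bayes needs sixty.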

**Conditional Independence**

*Definition*: Given three sets of random variables $X$, $Y$, and $Z$, we say $X$ is conditionally independent of $Y$ given $Z$ if and only if the probability distribution governing $X$ is independent of the value of $Y$ given $Z$; that is,

$$(\forall i, j, k)\quad P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$$
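The definition can be checked numerically on a toy distribution. The sketch below (a made-up example, not from the text) builds a joint $P(X, Y, Z)$ over boolean variables in which $X$ and $Y$ are independent given $Z$ by construction, then verifies that $P(X \mid Y, Z) = P(X \mid Z)$ for every value combination:

```python
import itertools

# Construct the joint as P(Z) * P(X|Z) * P(Y|Z): X and Y are conditionally
# independent given Z by construction. All numbers are arbitrary choices.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}  # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}  # p_y_given_z[z][y]

def joint(x, y, z):
    return p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]

def p_x_given_yz(x, y, z):
    # P(X=x | Y=y, Z=z) computed from the joint by conditioning.
    denom = sum(joint(xv, y, z) for xv in (0, 1))
    return joint(x, y, z) / denom

def p_x_given_z_marginal(x, z):
    # P(X=x | Z=z) computed from the joint by marginalizing out Y.
    num = sum(joint(x, yv, z) for yv in (0, 1))
    denom = sum(joint(xv, yv, z) for xv in (0, 1) for yv in (0, 1))
    return num / denom

# Verify the defining equality for all (x, y, z).
for x, y, z in itertools.product((0, 1), repeat=3):
    assert abs(p_x_given_yz(x, y, z) - p_x_given_z_marginal(x, z)) < 1e-12
```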

**Derivation of Naive Bayes Algorithm**

The Naive Bayes algorithm is a classification algorithm based on Bayes' rule and a set of conditional independence assumptions. Given the goal of learning $P(Y|X)$ where $X = \langle X_1, \ldots, X_n \rangle$, the **Naive Bayes** algorithm makes the assumption that each $X_i$ is conditionally independent of each of the other $X_k$'s given $Y$, and also independent of each subset of the other $X_k$'s given $Y$.

The value of this assumption is that it dramatically simplifies the representation of $P(X|Y)$, and the problem of estimating it from the training data. Consider, for example, the case where $X = \langle X_1, X_2 \rangle$. In this case

$$\begin{aligned}
P(X|Y) &= P(X_1, X_2 \mid Y) \\
&= P(X_1 \mid X_2, Y)\, P(X_2 \mid Y) \\
&= P(X_1 \mid Y)\, P(X_2 \mid Y)
\end{aligned}$$

where the second line follows from the chain rule of probability, and the third from the conditional independence assumption: given $Y$, knowing $X_2$ tells us nothing further about $X_1$.
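Once $P(X|Y)$ factors this way, classification reduces to computing $P(Y = y)\prod_i P(X_i = x_i \mid Y = y)$ for each label and picking the largest. A minimal sketch, assuming two boolean features and boolean $Y$ with made-up parameter values:

```python
# Hypothetical Naive Bayes parameters (all numbers are illustrative).
p_y = {0: 0.5, 1: 0.5}
# p_xi_given_y[y][i] = P(X_i = 1 | Y = y) for two boolean features.
p_xi_given_y = {0: [0.2, 0.6], 1: [0.9, 0.3]}

def unnormalized_posterior(y, x):
    # P(Y=y) * product over i of P(X_i = x_i | Y=y) -- the factorization
    # licensed by the conditional independence assumption.
    score = p_y[y]
    for i, xi in enumerate(x):
        p1 = p_xi_given_y[y][i]
        score *= p1 if xi == 1 else 1 - p1
    return score

def classify(x):
    # The normalizing constant P(X=x) is the same for every y,
    # so the argmax over unnormalized scores suffices.
    return max(p_y, key=lambda y: unnormalized_posterior(y, x))

print(classify([1, 0]))  # prints 1: X_1 = 1 is far more likely under Y = 1
```

Note that the normalizer $P(X)$ never needs to be computed for classification, since it is shared by all candidate labels.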