Given the intractable sample complexity for learning Bayesian classifiers, we must look for ways to reduce this complexity. The Naive Bayes classifier does this by making a conditional independence assumption that dramatically reduces the number of parameters to be estimated when modeling P(X|Y), from our original 2(2 −1) to just 2n
Definition: Given three sets of random variables X,Y and Z, we say X is conditionally independent of Y given Z, if and only if the probability distribution governing X is independent of the value of Y given Z; that is
(∀i, j, k)P(X = xi |Y = y j ,Z = zk) = P(X = xi |Z = zk)
Derivation of Naive Bayes Algorithm
The Naive Bayes algorithm is a classification algorithm based on Bayes rule and a set of conditional independence assumptions. Given the goal of learning P(Y|X) where X = hX1 …,Xni, the Naive Bayes algorithm makes the assumption that each Xi is conditionally independent of each of the other Xk’s given Y, and also independent of each subset of the other Xk’s given Y.
The value of this assumption is that it dramatically simplifies the representation of P(X|Y), and the problem of estimating it from the training data. Consider, for example, the case where X = hX1,X2i. In this case
P(X|Y) = P(X1,X2|Y)