Let’s talk about Bayes: his theorem, Bayes classifiers, and Bayesian inference.
The conditional probabilities P( B | A ) and P( A | B ) satisfy:

P( A, B ) = P( B ) P( A | B ) = P( A ) P( B | A )

meaning that the probability of seeing A and B is the probability of seeing B, times the probability of seeing A given that we have already seen B. Or, the other way round, of seeing A, then B given A.

This leads to Bayes’ theorem, which gives the relation between P( B | A ) and P( A | B ):

P( A | B ) = P( B | A ) P( A ) / P( B )
Let’s see what it means in practice.
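As a quick sanity check, here is a tiny numerical example (the joint probabilities are invented for illustration) verifying that the two ways of computing P( A | B ) agree:

```python
# Toy joint distribution P(A, B) over two binary events (made-up numbers).
p_joint = {
    (True, True): 0.12,
    (True, False): 0.28,
    (False, True): 0.18,
    (False, False): 0.42,
}

# Marginals P(A) and P(B).
p_a = p_joint[(True, True)] + p_joint[(True, False)]   # 0.40
p_b = p_joint[(True, True)] + p_joint[(False, True)]   # 0.30

# Conditionals straight from their definitions.
p_a_given_b = p_joint[(True, True)] / p_b
p_b_given_a = p_joint[(True, True)] / p_a

# Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B).
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)  # 0.4
```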
Bayes updating and inference
Let’s say y is the class whose distribution we want to know, and we make an observation x. Then:

P( y | x ) P( x ) = P( x | y ) P( y )

which is rewritten:

P( y | x ) = P( x | y ) P( y ) / P( x ), i.e. posterior = likelihood × prior / evidence.
The prior P( y ) is the probability of the class when we have not observed x: it is simply the distribution of the classes.
Bayes’ theorem tells us that once we have made the observation x, the probability of the class (the posterior) has changed, and it gives us an update rule.
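The update rule can be seen on a classic illustrative example (the numbers here are invented): a rare condition with a 1% prior, and a test with 90% sensitivity and a 5% false-positive rate.

```python
# Bayesian update on a made-up diagnostic-test example.
prior = 0.01                 # P(y = sick), before seeing the test result
p_pos_given_sick = 0.90      # P(x = positive | y = sick), the likelihood
p_pos_given_healthy = 0.05   # P(x = positive | y = healthy)

# Evidence P(x = positive), summing over both classes.
evidence = p_pos_given_sick * prior + p_pos_given_healthy * (1 - prior)

# Posterior P(y = sick | x = positive) via Bayes' theorem.
posterior = p_pos_given_sick * prior / evidence
print(round(posterior, 3))  # 0.154: the observation moved 1% up to ~15%
```

Observing a positive test does not make the condition certain; it just updates the 1% prior to a roughly 15% posterior.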
Naive Bayes classifier
To build a classifier out of these probabilities, we suppose that x is represented by n features x_1, …, x_n, independent conditionally on the class:

P( x_1, …, x_n | y ) = P( x_1 | y ) … P( x_n | y )

so that

P( y | x_1, …, x_n ) ∝ P( y ) P( x_1 | y ) … P( x_n | y )

because P( x_1, …, x_n ) is a constant once the features are known.
The naive Bayes classifier consists in using the maximum a posteriori (MAP) rule:

ŷ = argmax_y P( y ) P( x_1 | y ) … P( x_n | y )
The class prior is easy to estimate from the training set:

P( y = c ) ≈ (number of training samples of class c) / (total number of training samples)
To estimate the distribution of the features for each class, modelling assumptions are made (for instance Gaussian, multinomial, or Bernoulli models for the features).
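The whole pipeline can be sketched from scratch. Below is a minimal Gaussian naive Bayes under the assumption that each feature is normally distributed within a class; all names and the toy data are our own, and in practice a library implementation such as scikit-learn’s `GaussianNB` would be used instead:

```python
import math

def fit(X, y):
    """Estimate class priors and per-class, per-feature mean/variance."""
    stats = {}
    n = len(y)
    for c in set(y):
        rows = [x for x, yc in zip(X, y) if yc == c]
        prior = len(rows) / n  # class prior = relative frequency
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (prior, means, vars_)
    return stats

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian, our model for P(x_i | y)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict(stats, x):
    """MAP rule: argmax over classes of log P(y) + sum_i log P(x_i | y)."""
    def score(c):
        prior, means, vars_ = stats[c]
        return math.log(prior) + sum(
            log_gauss(xi, m, v) for xi, m, v in zip(x, means, vars_))
    return max(stats, key=score)

# Tiny toy data set: two well-separated 2-D classes.
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
     [5.0, 5.1], [4.9, 5.0], [5.1, 4.9]]
y = [0, 0, 0, 1, 1, 1]
model = fit(X, y)
print(predict(model, [1.0, 1.0]))  # 0
print(predict(model, [5.0, 5.0]))  # 1
```

Note that the products of per-feature probabilities are computed as sums of logarithms, which avoids numerical underflow when n is large.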
Under a uniform prior
Also, we see that under a uniform prior, P( y ) is the same for every class, so

argmax_y P( y | x ) = argmax_y P( x | y )

and maximising the posterior is equivalent to maximising the likelihood.
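This equivalence is immediate to check numerically; the toy likelihood values below are invented for illustration:

```python
# With a uniform prior, P(y) is the same constant for every class, so it
# cancels out of the argmax: MAP coincides with maximum likelihood.
likelihood = {"a": 0.20, "b": 0.50, "c": 0.30}  # toy P(x | y) values
uniform_prior = 1 / 3

map_class = max(likelihood, key=lambda c: uniform_prior * likelihood[c])
mle_class = max(likelihood, key=lambda c: likelihood[c])
assert map_class == mle_class
print(map_class)  # b
```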