Skip to main content

Naïve Bayes Classifier

Naïve Bayes Classifier

Classfication base review(有一说一,感觉又可以略过..

Response values

  1. Nominal Categorical
    • Binary
    • Multi-class
  2. Ordinal Categorical
  3. Discrete Numerical

output

  1. Single class value(类别标签,如SVM, kNN, Decision Tree)
  2. Score for positive class label.(正类的概率值(需要设定threshold的那种),如Logistic Regression, ANN)
  3. Score for all class labels(所有类别的概率值(最后选择最大的),如Multi-class SVM, ANN)

logic

  1. Response can be calculated by adding and/or rescaling input feature values(基于函数学习的模型,如Linear Regression, ANN)
  2. Response is assumed to be same/similar among “similar” samples(基于相似性的模型,如kNN, Decision Tree)

Bayes Classifier

Calculate probability from data

y=argmaxyP(yX)y = argmax_y P(y | X)

in which:

  • yy is the class label
  • XX is the feature vector
  • P(yX)P(y | X) is the posterior probability of class yy given feature vector XX(给定特征的条件概率..?)

我们希望的,是找到一个 yy 使得 P(yX)P(y | X) 最大化。

Bayes' theorem

P(yX)=P(Xy)P(y)P(X)P(y | X) = \frac{P(X | y) P(y)}{P(X)}

而由于 P(X)P(X) 是常数,对于每个 yy 来说都是一样的,所以我们可以忽略掉它:

P(yX)P(y)P(Xy)P(y | X) \propto P(y) P(X | y)

所以我们最终得到了(Naïve) assumption of conditional independence:

P(yX)P(y)i=1kP(xiy)P(y | X) \propto P(y) \prod_{i=1}^k P(x_i | y)

重新整理一下,Full Bayes Classifier就是:

y=argmaxyP(y)i=1kP(xiy)y = argmax_y P(y) \prod_{i=1}^k P(x_i | y)

Laplace smoothing

比如某个类别下,某个特征的值没有出现过,那么 P(xiy)P(x_i | y) 就会是0,这样就会导致整个乘积为0。 为了解决这个问题,我们可以使用Laplace smoothing:
即给每个特征的值加上一个小的常数,比如 α=1\alpha = 1

举个例子: Laplace Smoothing
就比如说这个 P(CatYellow)=0P(Cat | Yellow) = 0, 我们就 +1,变成了 P(CatYellow)=0+13+4=17P(Cat | Yellow) = \frac{0+1}{3+4} = \frac{1}{7} -- 因为Fur color这个特征有4个类别,各都要跟着 +1

Diff input data types -> Diff Naïve Bayes

  1. Gaussian Naïve Bayes:
    连续值,且服从正态分布。P(xiy)P(x_i | y) 是用高斯概率密度函数来计算的。(懒得管具体是啥了..
  2. Multinomial Naïve Bayes: 离散值。
  3. Categorical Naïve Bayes: Categorical值。
  • Gaussian Naïve Bayes 常用于numrical inputs的分类: Gaussian Naïve Bayes

  • Multinomial Naïve Bayes 常用于文本分类 -- 文本被转化为bag of words.

Eva (可以参考kNN & Decision Tree的Comparision部分)

  1. Can predict the response / label for a new data?
    Y

  2. Easy to interpret? (Using intuitive logic)
    Y

  3. Defines real relationships b/w features and response?
    Maybe
    Cuz we assume the features are independent, but in reality they might be not, which will make the ability of its "Relationship Modeling" weak.

  4. Model is fast / easy to store / share / optimize/ apply?
    Y. 只存了概率表

  5. Few hyperparameters?
    Y

  6. Use data with minimal preprocessing?
    Y

  7. Potentially useful?
    Y

  8. Used to validate a hypothesis?
    Y/N?
    毕竟朴素贝叶斯本身只是个模型,不是个假设验证的工具。

  9. Find novel interpretation of data?
    N !!!
    朴素贝叶斯的结构很扁平, 相对于kNN和Decision Tree来说。所以不具备自动发现复杂数据结构的能力。