5  Single-layer networks: classification

In regression problems the output is modeled as a linear function of the model parameters. In classification problems the target is discrete, so an extra step is needed to convert the continuous output of the linear function into a class label. This chapter covers that extra step and the complications it introduces: because the linear function's output is continuous while the target is discrete, we have to decide how to partition the continuous space into discrete, disjoint decision regions.

There are three broad approaches, based on how we achieve this (a minimal sketch contrasting them follows the list):

  1. discriminative learning learns a function f: input -> label.
  2. probabilistic discriminative learning learns a distribution p(label|input).
  3. probabilistic generative learning learns a joint distribution p(input, label).
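To make the distinction concrete, here is a minimal NumPy/SciPy sketch of the three approaches for a two-class problem with two-dimensional inputs. All weights, means, and priors below are arbitrary placeholders chosen for illustration, not fitted values.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Shared linear score used by approaches 1 and 2
    # (weights are placeholders, not fitted values).
    w = np.array([1.0, -2.0])
    w0 = 0.5

    def discriminant(x):
        # Approach 1: a discriminant function maps input -> label directly
        # by thresholding the linear score w.x + w0 at zero.
        return 1 if w @ x + w0 >= 0 else 0

    def discriminative_posterior(x):
        # Approach 2: a probabilistic discriminative model returns
        # p(label = 1 | input) by passing the same linear score
        # through a logistic sigmoid.
        return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

    # Approach 3: a generative model fits p(input | label) and p(label),
    # then applies Bayes' theorem to obtain p(label | input).
    # Here: two Gaussian class-conditionals with placeholder means.
    mu = {0: np.array([-1.0, 0.0]), 1: np.array([1.0, 0.0])}
    prior = {0: 0.5, 1: 0.5}

    def generative_posterior(x):
        joint = {k: multivariate_normal.pdf(x, mean=mu[k], cov=np.eye(2)) * prior[k]
                 for k in (0, 1)}
        return joint[1] / (joint[0] + joint[1])

    x = np.array([0.3, -0.7])
    print(discriminant(x))              # hard decision: 0 or 1
    print(discriminative_posterior(x))  # probability in (0, 1)
    print(generative_posterior(x))      # probability in (0, 1), via Bayes' theorem

The first two approaches model the decision boundary (or the class posterior) directly, while the third models how the inputs are generated within each class and recovers the posterior through Bayes' theorem.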

5.1 Discriminant functions

5.1.1 Two classes

5.1.2 Multiple classes

5.1.3 1-of-K encoding

5.1.4 Least squares for classification

5.2 Decision theory

5.2.1 Misclassification rate

5.2.2 Expected loss

5.2.3 The reject option

5.2.4 Inference and decision

5.2.5 Classification accuracy

5.2.6 ROC curve

5.3 Generative probabilistic classifiers

5.3.1 Continuous inputs

5.3.2 Maximum likelihood solution

5.3.3 Discrete features

5.3.4 Exponential family

5.4 Discriminative probabilistic classifiers