There are five types of learning problem that have received the preponderance of attention in machine learning. The first four are all cases of function estimation, grouped along two dimensions: whether the learning task is supervised or unsupervised, and whether the variable to be predicted is nominal or real-valued.
Classification involves supervised learning of a function f(x) whose value is nominal, that is, drawn from a finite set of possible values. The learned function is called a classifier. It is given instances x of one or another class, and it must determine which class each instance belongs to; the value f(x) is the classifier’s prediction regarding the class of the instance. For example, an instance might be a particular word in context, and the classification task is to determine its part of speech. The learner is given labeled data consisting of a collection of instances along with the correct answer, that is, the correct class label, for each instance.
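The classification setting can be sketched in a few lines of code. The instances, features, and labels below are purely illustrative (a toy version of the part-of-speech example, with a made-up one-word context feature), not a real tagging corpus or algorithm from the literature; the point is only the shape of the problem: labeled (instance, class) pairs in, a predicting function f out.

```python
# A minimal sketch of the classification setting: labeled training data
# as (instance, class) pairs, and a learned function f(x) predicting a class.
# Instances and labels here are illustrative, not a real tagging corpus.

from collections import Counter

# Labeled data: each instance is a (word, previous_word) pair,
# and the label is the word's part of speech in that context.
labeled_data = [
    (("runs", "he"), "VERB"),
    (("runs", "the"), "NOUN"),
    (("walks", "she"), "VERB"),
    (("walks", "long"), "NOUN"),
]

def train(data):
    """Learn f(x): memorize the most frequent label for each context word."""
    by_context = {}
    for (word, prev), label in data:
        by_context.setdefault(prev, Counter())[label] += 1
    # Fall back to a default label for unseen contexts.
    return lambda x: by_context.get(x[1], Counter({"NOUN": 1})).most_common(1)[0][0]

f = train(labeled_data)
print(f(("jumps", "he")))   # "he" preceded verbs in the training data
```

Even this trivial learner exhibits the defining property of the setting: every training instance arrives with its correct class label.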
The unsupervised counterpart to classification is clustering. The goal in clustering is also to assign instances to classes, but the clustering algorithm is given only the instances, not the correct answers for any of them. (In clustering, the instances are usually called data points and the classes are called clusters.) The primary difference between classification and clustering is not the task to be performed, but the sort of data that is given to the learner as input; in particular, whether the data is labeled or not. The remaining two function estimation tasks involve estimation of a function that takes on real values, instead of values from a finite range. The supervised version is called regression; it differs from classification only in that the function to be learned takes on real values. Unsupervised learning of a real-valued function can be viewed as density estimation. The learner is given an unlabeled set of training data, consisting of a finite sample of data points from a multi-dimensional space, and the goal is to learn a function f(x) assigning a real value to every point in the space; the function is interpreted as (proportional to) a probability density.
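The clustering side of the contrast can be sketched the same way. The data values below are made up, and the algorithm is a deliberately stripped-down two-cluster k-means over one-dimensional points; what matters is the input: bare data points, with no labels anywhere.

```python
# A sketch of clustering: the learner sees only data points, no labels.
# Deliberately minimal two-cluster 1-D k-means; the values are illustrative.

data_points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]   # unlabeled input

def kmeans_2(points, iters=10):
    c0, c1 = min(points), max(points)           # initialize two centroids
    for _ in range(iters):
        # Assign each point to its nearer centroid, then re-estimate centroids.
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

low, high = kmeans_2(data_points)
print(sorted(low), sorted(high))   # the two clusters the algorithm discovered
```

The algorithm recovers the same kind of output a classifier would produce, a partition of the instances into classes, but it must invent the classes itself from the unlabeled input.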
Finally, we mentioned a fifth setting that does not fall under function estimation. This fifth setting is reinforcement learning. In reinforcement learning, the learner receives a stream of data from sensors, and its “answers” consist in actions, in the form of commands sent to actuators. There is, additionally, a reward signal that is to be maximized (over the long run). There are at least two significant ways this differs from the four function estimation settings. First is the sequential nature of the inputs. Even if we assume discrete time, there are temporal dependencies that cannot be ignored: in particular, actions have time-delayed effects on sensors and reward. Second is the indirect nature of the supervision. The reward signal provides information about the relative value of different actions, but it is much less direct than simply providing the correct answer, as in classification.
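The indirectness of the reward signal can be illustrated with a deliberately simplified case: a two-armed bandit with a made-up reward rule. This sketch omits the temporal dependencies just discussed (actions here have no delayed effects), so it captures only the second difference, that the learner is told how good its action was, never which action was correct.

```python
# A sketch of reward-driven learning: the learner chooses actions and sees
# only a scalar reward, never the "correct" action. Two-armed bandit with
# an invented reward rule; no time-delayed effects, unlike full RL.

import random

random.seed(0)
values = {"a": 0.0, "b": 0.0}   # running estimate of each action's reward
counts = {"a": 0, "b": 0}

def reward(action):
    # Hidden environment: action "b" is better on average.
    return random.gauss(1.0 if action == "b" else 0.2, 0.1)

for step in range(500):
    # Epsilon-greedy: mostly exploit the current best estimate, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(["a", "b"])
    else:
        action = max(values, key=values.get)
    r = reward(action)
    counts[action] += 1
    values[action] += (r - values[action]) / counts[action]  # incremental mean

print(max(values, key=values.get))   # the learner settles on "b"
```

Note that no training instance ever says "the answer was b"; the learner must infer the relative value of its actions from the rewards alone.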
Semisupervised learning generalizes supervised and unsupervised learning. The generalization is easiest to see with classification and clustering. As already mentioned, classification and clustering involve essentially the same task and the same inputs; they differ primarily in whether the training data is labeled or not. (They also differ in the way they are evaluated, but the difference in evaluation is a consequence of the difference in the kind of training data; more on that later.) The obvious generalization is to give the learner labels for some of the training data. At one extreme, all of the data is labeled and the task is classification; at the other extreme, none of the data is labeled and the task is clustering. The mixed labeled/unlabeled setting is the canonical case for semisupervised learning, and it will be our main interest.
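The mixed setting can be sketched concretely. The data below is invented (one-dimensional points, with None marking unlabeled instances), and the method shown is a single round of the simple "self-training" idea: label the unlabeled points with the current classifier's own predictions and add them to the labeled pool. It is meant only to show the shape of the input, not any particular algorithm from the literature.

```python
# A sketch of the canonical semisupervised setting: the same data points as
# in classification, but labels for only some of them (None = unlabeled).
# One round of self-training with a nearest-labeled-point rule; the data
# and the one-feature representation are illustrative.

data = [
    (0.9, "A"), (1.1, "A"),        # labeled
    (5.0, "B"), (5.2, "B"),        # labeled
    (1.0, None), (4.9, None),      # unlabeled
]

labeled = [(x, y) for x, y in data if y is not None]

def predict(x):
    # 1-nearest-neighbor over the labeled portion of the data.
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

# Label the unlabeled points with the classifier's own predictions,
# then add them to the labeled pool (one self-training iteration).
newly_labeled = [(x, predict(x)) for x, y in data if y is None]
labeled.extend(newly_labeled)
print(newly_labeled)
```

Setting every label to None recovers clustering (modulo the need to invent classes), and labeling everything recovers classification, which is exactly the sense in which the mixed setting interpolates between the two extremes.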
At the same time, a mix of labeled and unlabeled data is only one way of providing a learner with partial information about the labels for training data. Many semisupervised learning methods work with other kinds of partial information, such as a handful of reliable rules for labeling instances, or constraints limiting the candidate labels for particular instances. We will also consider these extensions of the canonical setting. In principle, the kind of indirect information about labels found in reinforcement learning qualifies it as a kind of semisupervised learning, but the indirect-information aspect of reinforcement learning is difficult to disentangle from the temporal dependencies, and the connection between reinforcement learning and other semisupervised approaches remains obscure; it lies beyond the scope of the present work.