Saturday, August 30, 2014

Semi-supervised learning : Major varieties of learning problem

There are five types of learning problem that have received the preponderance of attention in machine learning. The first four are all cases of function estimation, grouped along two dimensions: whether the learning task is supervised or unsupervised, and whether the variable to be predicted is nominal or real-valued.

Classification involves supervised learning of a function f (x) whose value is nominal, that is, drawn from a finite set of possible values. The learned function is called a classifier. It is given instances x of one or another class, and it must determine which class each instance belongs to; the value f (x) is the classifier’s prediction regarding the class of the instance. For example, an instance might be a particular word in context, and the classification task is to determine its part of speech. The learner is given labeled data consisting of a collection of instances along with the correct answer, that is, the correct class label, for each instance.
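
To make the labeled-data setting concrete, here is a minimal sketch in Python (not an example from the text): a toy classifier for a part-of-speech-style task, trained on a handful of labeled instances. The instances, features, and the choice of scikit-learn estimators are all illustrative assumptions.

```python
# A minimal sketch of supervised classification, assuming scikit-learn is available.
# The instances, features, and labels below are toy values, not data from the text.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each instance is a word in context, represented by a few simple features;
# the label is its (correct) part of speech.
instances = [
    {"word": "run", "prev": "they"},
    {"word": "run", "prev": "a"},
    {"word": "walks", "prev": "she"},
    {"word": "walk", "prev": "the"},
]
labels = ["VERB", "NOUN", "VERB", "NOUN"]

vectorizer = DictVectorizer()
X = vectorizer.fit_transform(instances)           # instances -> feature vectors
classifier = LogisticRegression().fit(X, labels)  # supervised learning from labeled data

# f(x): the classifier's prediction for a new instance
x_new = vectorizer.transform([{"word": "run", "prev": "to"}])
print(classifier.predict(x_new))
```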

The unsupervised counterpart to classification is clustering. The goal in clustering is also to assign instances to classes, but the clustering algorithm is given only the instances, not the correct answers for any of them. (In clustering, the instances are usually called data points and the classes are called clusters.) The primary difference between classification and clustering is not the task to be performed, but the sort of data that is given to the learner as input; in particular, whether the data is labeled or not.

The remaining two function estimation tasks involve estimation of a function that takes on real values, instead of values from a finite range. The supervised version is called regression; it differs from classification only in that the function to be learned takes on real values. Unsupervised learning of a real-valued function can be viewed as density estimation. The learner is given an unlabeled set of training data, consisting of a finite sample of data points from a multi-dimensional space, and the goal is to learn a function f (x) assigning a real value to every point in the space; the function is interpreted as (proportional to) a probability density.
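
A companion sketch to the classification example above, again purely illustrative: the learner receives data points with no labels, and a clustering algorithm and a density estimator are applied. KMeans and KernelDensity are choices made here for illustration, not methods prescribed by the text.

```python
# A minimal sketch of the unsupervised counterparts, assuming scikit-learn and NumPy.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KernelDensity

# Unlabeled data points in a two-dimensional space (toy values).
points = np.array([[0.0, 0.1], [0.2, 0.0], [3.9, 4.1], [4.0, 3.8], [4.2, 4.0]])

# Clustering: assign each point to a cluster, with no correct answers given.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(points)
print("cluster assignments:", clusters)

# Density estimation: learn a real-valued function over the space,
# interpreted as (proportional to) a probability density.
kde = KernelDensity(bandwidth=1.0).fit(points)
print("log densities:", kde.score_samples(points))
```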

Finally, we mentioned a fifth setting that does not fall under function estimation. This fifth setting is reinforcement learning. In reinforcement learning, the learner receives a stream of data from sensors, and its “answers” consist in actions, in the form of commands sent to actuators. There is, additionally, a reward signal that is to be maximized (over the long run). There are at least two significant ways this differs from the four function estimation settings. First is the sequential nature of the inputs. Even if we assume discrete time, there are temporal dependencies that cannot be ignored: in particular, actions have time-delayed effects on sensors and reward. Second is the indirect nature of the supervision. The reward signal provides information about the relative value of different actions, but it is much less direct than simply providing the correct answer, as in classification.

Semisupervised learning generalizes supervised and unsupervised learning. The generalization is easiest to see with classification and clustering. As already mentioned, classification and clustering involve essentially the same task and the same inputs; they differ primarily in whether the training data is labeled or not. (They also differ in the way they are evaluated, but the difference in evaluation is a consequence of the difference in the kind of training data – more on that later.) The obvious generalization is to give the learner labels for some of the training data. At one extreme, all of the data is labeled and the task is classification; at the other extreme, none of the data is labeled and the task is clustering. The mixed labeled/unlabeled setting is indeed the canonical case for semisupervised learning, and it will be our main interest.
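
As a rough sketch of how a mixed labeled/unlabeled dataset might be exploited, the following Python function implements a plain self-training loop (self-training itself is discussed below). The function name, the confidence threshold, and the use of a logistic-regression base classifier are hypothetical choices, not details taken from the text.

```python
# A rough sketch of learning from a mix of labeled and unlabeled data via self-training.
# All names below (train_semisupervised, threshold, etc.) are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_semisupervised(X_labeled, y_labeled, X_unlabeled, threshold=0.95, rounds=10):
    """Repeatedly train on the labeled pool, then absorb confidently
    self-labeled unlabeled examples into that pool."""
    X_pool, y_pool = X_labeled.copy(), list(y_labeled)
    X_rest = X_unlabeled.copy()
    for _ in range(rounds):
        clf = LogisticRegression().fit(X_pool, y_pool)
        if len(X_rest) == 0:
            break
        probs = clf.predict_proba(X_rest)
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing left that the classifier is sure about
        new_labels = clf.classes_[probs[confident].argmax(axis=1)]
        X_pool = np.vstack([X_pool, X_rest[confident]])
        y_pool = y_pool + list(new_labels)
        X_rest = X_rest[~confident]
    return clf
```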

At the same time, a mix of labeled and unlabeled information is only one way of providing a learner with partial information about the labels for training data. Many semisupervised learning methods work with alternate kinds of partial information, such as a handful of reliable rules for labeling instances, or constraints limiting the candidate labels for particular instances. We will also consider these extensions of the canonical setting. In principle, the kind of indirect information about labels found in reinforcement learning qualifies it as a kind of semisupervised learning, but the indirect-information aspect of reinforcement learning is difficult to disentangle from the temporal dependencies, and the connection between reinforcement learning and other semisupervised approaches remains obscure; it lies beyond the scope of the present work.

Introduction to Semi-supervised Learning for Computational Linguistics

Creating sufficient labeled data can be very time-consuming. Obtaining the output sequences is not difficult: English texts are available in great quantity. What is time-consuming is creating labeled text, in which each word is annotated with its correct part of speech.

Subsequent work in computational linguistics led to the development of alternative algorithms for semisupervised learning, the algorithm of Yarowsky being a prominent example. These algorithms were developed specifically for the sorts of problems that arise frequently in computational linguistics: problems in which there is a linguistically correct answer, and large amounts of unlabeled data, but very little labeled data. Unlike in the example of acoustic modeling, classic unsupervised learning is inappropriate, because not just any way of assigning classes will do. The learning method is largely unsupervised, because most of the data is unlabeled, but the labeled data is indispensable, because it provides the only characterization of the linguistically correct classes.

The algorithms just mentioned turn out to be very similar to an older learning method known as self-training that was unknown in computational linguistics at the time. For this reason, it is more accurate to say that they were rediscovered, rather than invented, by computational linguists. Until very recently, most prior work on semisupervised learning has been little known even among researchers in the area of machine learning. One goal of the present volume is to make the prior and also the more recent work on semisupervised learning more accessible to computational linguists.

Shortly after the rediscovery of self-training in computational linguistics, a method called co-training was invented by Blum and Mitchell, machine-learning researchers working on text classification. Self-training and co-training have become popular and widely employed in computational linguistics; together they account for all but a fraction of the work on semisupervised learning in the field. We will discuss them in the next chapter. In the remainder of this chapter, we give a broader perspective on semisupervised learning, and lay out the plan of the rest of the book.

Motivation for Semi-supervised Learning


For most learning tasks of interest, it is easy to obtain samples of unlabeled data. For many language learning tasks, for example, the World Wide Web can be seen as a large collection of unlabeled data. By contrast, in most cases, the only practical way to obtain labeled data is to have subject-matter experts manually annotate the data, an expensive and time-consuming process.

The great advantage of unsupervised learning, such as clustering, is that it requires no labeled training data. The disadvantage has already been mentioned: under the best of circumstances, one might hope that the learner would recover the correct clusters, but hardly that it could correctly label the clusters. In many cases, even the correct clusters are too much to hope for. To say it another way, unsupervised learning methods rarely perform well if evaluated by the same yardstick used for supervised learners. If we expect a clustering algorithm to predict the labels in a labeled test set, without the advantage of labeled training data, we are sure to be disappointed.

The advantage of supervised learning algorithms is that they do well at the harder task: predicting the true labels for test data. The disadvantage is that they only do well if they are given enough labeled training data, but producing sufficient quantities of labeled data can be very expensive in manual effort. The aim of semisupervised learning is to have our cake and eat it, too. Semisupervised learners take as input unlabeled data and a limited source of label information, and, if successful, achieve performance comparable to that of supervised learners at significantly reduced cost in manual production of training data.

We intentionally used the vague phrase “a limited source of label information.” One source of label information is obviously labeled data, but there are alternatives. We will consider at least the following sources of label information:
  • labeled data
  • a seed classifier (see the sketch after this list)
  • limiting the possible labels for instances without determining a unique label
  • constraining pairs of instances to have the same, but unknown, label (co-training)
  • intrinsic label definitions
  • a budget for labeling instances selected by the learner (active learning)
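
As a small illustration of the seed-classifier case, a seed can be as simple as a handful of reliable labeling rules that abstain on everything else. The rules below are invented purely for illustration; the partially labeled set they produce could then be extended, for example by a self-training loop like the one sketched earlier.

```python
# A small sketch of a seed classifier as the source of label information.
# The seed rules below are invented; in practice they would be a handful of
# reliable, manually written rules.
def seed_classifier(word):
    """Label only the instances the rules are confident about; return None otherwise."""
    if word.endswith("ly"):
        return "ADV"
    if word in {"the", "a", "an"}:
        return "DET"
    return None  # abstain: no label information for this instance

# Applying the seed to unlabeled data yields a partially labeled set,
# which a semisupervised learner can then extend.
unlabeled_words = ["quickly", "the", "dog", "ran", "happily"]
seed_labels = {w: seed_classifier(w) for w in unlabeled_words}
labeled_subset = {w: y for w, y in seed_labels.items() if y is not None}
print(labeled_subset)  # {'quickly': 'ADV', 'the': 'DET', 'happily': 'ADV'}
```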

One of the grand aims of computational linguistics is unsupervised learning of natural language. From a psychological perspective, it is widely accepted that explicit instruction plays little part in human language learning, and from a technological perspective, a completely autonomous system is more useful than one that requires manual guidance. Yet, in contradiction to the characterization sometimes given of the goal of unsupervised learning, the goal of unsupervised language learning is not the recovery of arbitrary “interesting” structure, but rather the acquisition of the correct target language. On the face of it, learning a target classification – much less an entire natural language – without labeled data hardly seems possible.

Semisupervised learning may provide the beginning of an account. If a kernel of labeled data can be acquired through unsupervised learning, semisupervised learning might be used to extend it to a complete solution. Something along these lines appears to characterize human language acquisition: in the psycholinguistic literature, bootstrapping refers to the process by which an initial kernel of language is acquired by explicit instruction, in the form, for example, of naming an object while drawing a child’s attention to it. The processes by which that kernel is extended to the entirety of the language are thought to be different; distributional regularities of linguistic forms, rather than direct connections to the physical world, seem to play a large role. Semisupervised learning methods provide possible characterizations of the process of extending the initial kernel.

Supervised and unsupervised training with Hidden Markov Models

The probabilistic models used by Church and DeRose in the papers just cited were Hidden Markov Models (HMMs), imported from the speech recognition community. An HMM describes a probabilistic process or automaton that generates sequences of states and parallel sequences of output symbols. Commonly, a sequence of output symbols represents a sentence of English or of some other natural language. An HMM, or any model, that defines probabilities of word sequences (that is, sentences) of a natural language is known as a language model.

The probabilistic automaton defined by an HMM may be in some number of distinct states. The automaton begins by choosing a state at random. Then it chooses a symbol to emit, the choice being sensitive to the state. Next it chooses a new state, emits a symbol from that state, and the process repeats. Each choice is stochastic – that is, probabilistic. At each step, the automaton makes its choice at random from a distribution over output symbols or next states, as the case may be.
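
The following Python sketch mirrors the generative process just described. The states, output symbols, and all probabilities are invented for illustration; the states are labeled as parts of speech only in anticipation of the tagging application discussed below.

```python
# A sketch of the generative process of an HMM, with invented toy probabilities.
import random

initial = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
transition = {
    "DET":  {"NOUN": 0.9, "VERB": 0.05, "DET": 0.05},
    "NOUN": {"VERB": 0.6, "NOUN": 0.2,  "DET": 0.2},
    "VERB": {"DET": 0.7,  "NOUN": 0.2,  "VERB": 0.1},
}
emission = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.5, "cat": 0.5},
    "VERB": {"runs": 0.5, "sleeps": 0.5},
}

def draw(dist):
    """Make one stochastic choice from a distribution given as {outcome: probability}."""
    outcomes, weights = zip(*dist.items())
    return random.choices(outcomes, weights=weights)[0]

def generate(length=5):
    """Generate a state sequence and a parallel output sequence."""
    states, words = [], []
    state = draw(initial)                      # choose a starting state at random
    for _ in range(length):
        states.append(state)
        words.append(draw(emission[state]))    # emit a symbol from this state
        state = draw(transition[state])        # move to a new state
    return states, words

print(generate())
```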

Which distribution it uses at any point is completely determined by the kind of choice, either emission of an output symbol or transition to a new state, and the identity of the current state. The actual model consists in a collection of numeric values, one for each possible transition or emission, representing the probability that the automaton chooses that particular transition or emission when making one of its stochastic choices.

Learning an HMM is straightforward if one is provided with labeled data, meaning state sequences paired with output sequences. Each sequence pair is a record of the stochastic choices made by the automaton. To estimate the probability that the automaton will choose a particular value x when faced with a stochastic choice of type T, one can simply count how often the automaton actually chose x when making a choice of type T in the record of previous computations, that is, in the labeled data. If sufficient labeled data is available, the model can be estimated accurately in this way.
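
The counting procedure just described can be sketched as follows. The tiny tagged corpus and the helper names are invented for illustration; the point is only that each probability is estimated as a relative frequency over the labeled data (initial-state probabilities, smoothing, and other practical details are omitted).

```python
# A sketch of supervised HMM estimation by counting: each probability is the
# relative frequency of the corresponding choice in the labeled data.
from collections import Counter, defaultdict

# Labeled data: output sequences (words) paired with state sequences (tags); toy corpus.
tagged_sentences = [
    (["the", "dog", "runs"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
]

transition_counts = defaultdict(Counter)  # counts of state -> next state
emission_counts = defaultdict(Counter)    # counts of state -> emitted word

for words, tags in tagged_sentences:
    for i, (word, tag) in enumerate(zip(words, tags)):
        emission_counts[tag][word] += 1
        if i + 1 < len(tags):
            transition_counts[tag][tags[i + 1]] += 1

def normalize(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {outcome: count / total for outcome, count in counter.items()}

transition_probs = {state: normalize(c) for state, c in transition_counts.items()}
emission_probs = {state: normalize(c) for state, c in emission_counts.items()}
print(emission_probs["NOUN"])  # e.g. {'dog': 0.5, 'cat': 0.5}
```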

Church and DeRose applied HMMs to the problem of part-of-speech tagging by identifying the states of the automaton with parts of speech. The automaton generates a sequence of parts of speech, and emits a word for each part of speech. The result is a tagged text, which is a text in which each word is annotated with its part of speech. Supervised learning of an HMM for part-of-speech tagging is quite effective; HMM taggers for English generally have an error rate of 3.5 to 4 percent. Their effectiveness was what brought probabilistic models to the attention of computational linguists, as already mentioned.

Probabilistic methods in computational linguistics

Computational linguistics seeks to describe methods for natural language processing, that is, for processing human languages by automatic means. Since the advent of electronic computers in the late 1940s, human language processing has been an area of active research; machine translation in particular attracted early interest. Indeed, the inspiration for computing machines was the creation of a thinking automaton, a machina sapiens, and language is perhaps the most distinctively human cognitive capacity. In early work on artificial intelligence, there was something of a competition between discrete, “symbolic” reasoning and stochastic systems, particularly neural nets. But the indispensability of a firm probabilistic basis for dealing with uncertainty was soon recognized.

In computational linguistics, by contrast, the presumption of the sufficiency of grammatical and logical constraints, supplemented perhaps by ad hoc heuristics, was much more tenacious. When the field recognized the need for probabilistic methods, the shift was sudden and dramatic. It is probably fair to identify the birth of awareness with the appearance in 1988 of two papers on statistical part-of-speech tagging, one by Church and one by DeRose.

These were not the first papers that proposed stochastic methods for part of speech disambiguation, but they were the first in prominent venues in computational linguistics, and it is no exaggeration to say that the field was reshaped within a decade.

The main barrier to progress in natural language processing at the time was the brittleness of manually constructed systems. The dominant issues were encapsulated under the rubrics of ambiguity resolution, portability, and robustness. The primary method for ambiguity resolution was the use of semantic constraints, but they were often either too loose, leaving a large number of viable analyses, or else too strict, ruling out the correct analysis. Well-founded and automatic means for softening constraints and resolving ambiguities were needed. Portability meant in particular automatic means for adapting to variability across application domains. Robustness covers both the fact that input to natural language systems is frequently errorful, and also the fact that, in Sapir's terms, "all grammars leak". No manually constructed description of language is complete.

Together, these issues point to the need for automatic learning methods, and explain why the penetration of probabilistic methods, and machine learning in particular, was so rapid. Computational linguistics has now become inseparable from machine learning.

Wednesday, August 6, 2014

Harmoni Cinta : an album by Gita Gutawa

Harmoni Cinta

Harmoni Cinta is an album by Gita Gutawa. It was released in 2009 by Sony Music Indonesia, with a part of the sales used to send poor students to school. Produced over a period of nine months, it was a collaboration between Gutawa and numerous Indonesian musicians, including her father Erwin, Melly Goeslaw, and Glenn Fredly.
Production of Harmoni Cinta required nine months, from June 2008 to March 2009. It involved numerous Indonesian musicians, including Gita Gutawa's father Erwin, as well as Glenn Fredly, Yovie Widyanto, and Melly Goeslaw. Singaporean songwriter Dick Lee also contributed the song "Remember", while "Aku Cinta Dia", a cover of the title song of Chrisye's album Aku Cinta Dia, was also included.
The vocals were recorded in Aluna Studio, Jakarta, and the City of Prague Philharmonic Orchestra and Sofia Symphonic Orchestra recorded their pieces in their respective cities. Six of the songs were mixed at 301 Studio in Sydney, while the remaining six were mixed at Aluna Studio; "Aku Cinta Dia" was later mastered at Sterling Sound Mastering in New York. As such, work on the album took place on four different continents: Asia, Europe, Oceania, and North America.
Gita Gutawa played a greater role in the recording of Harmoni Cinta than she did on her self-titled debut album. She assisted in deciding the concepts behind the album, choosing the songs included, and writing five of them.
Gita Gutawa described the album as combining light, enjoyable, teen pop with orchestrated classic pop. She stated that, similar to her debut album Gita Gutawa, Harmoni Cinta dealt with themes of young love, friendship, family ties, and worldliness.
Parasit" tells of the puppy love between two pre-teens, using terms indicative of biology, physics, and geography, including the Sahara Desert, Antarctica, and outer space. The following love song, "Harmoni Cinta", has "extravagant"[C] orchestral backing, while "Mau Tapi Malu" has Gita singing "coquettishly"[D] with Mey Chan and Maia Estianty. "Remember" featured English and Indonesian-language lyrics with traditional instruments. Meanwhile, "Selamat Datang Cinta", "Meraih Mimpi", "Lullaby" and "When You Wish Upon a Star" are slower and more minimalistic.


1996 Thomas Cup

The 1996 Thomas & Uber Cup was the 19th tournament of the Thomas Cup, and the 16th tournament of the Uber Cup, which are the major international team competitions in world badminton.
The 1996 Thomas Cup press conference was held in Bank Rakyat Indonesia's building in the Sentra BRI complex in Sudirman, Central Jakarta. The press conference was led by Putera Sampoerna, the chairman of PT HM Sampoerna Tbk, manufacturer of A Mild, the fifth-largest Indonesian cigarette brand. A Mild was also the main sponsor of the 1996 TUC.
The opening and closing ceremonies of the 1996 TUC were likewise led by Putera Sampoerna, because A Mild was the main sponsor of the tournament.
Indonesia's Thomas & Uber Cup Squads unite the title champion in Thomas Cup and Uber Cup (third title).
The Final Stage included Hong Kong, as the host team, and Indonesia, as the defending champion.
