part-of-speech tagging

the task of part-of-speech tagging consists of taking a sequence of words and assigning each word a part of speech like NOUN or VERB. the related task of named entity recognition (NER) assigns words or phrases tags like PERSON, LOCATION, or ORGANIZATION.

HMM tagging as decoding

for any model, such as an HMM, that contains hidden variables, the task of determining the sequence of hidden variables that corresponds to a sequence of observations is called decoding. more formally,

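given as input an HMM $\lambda = (A, B)$, with transition probabilities $A$ and emission probabilities $B$, and a sequence of observations $O = o_1, o_2, \ldots, o_T$, find the most probable sequence of states $Q = q_1 q_2 \ldots q_T$.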

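for part-of-speech tagging, the goal of HMM decoding is to choose the tag sequence $t_1 \ldots t_n$ that is most probable given the observation sequence of $n$ words $w_1 \ldots w_n$:

\begin{equation}
\hat{t}_{1:n} = \operatorname*{argmax}_{t_1 \ldots t_n} P(t_1 \ldots t_n \mid w_1 \ldots w_n)
\label{eq-pos-tag-1}
\end{equation}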

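the way we'll do this in the HMM is to use Bayes' rule to instead compute:

\begin{equation}
\hat{t}_{1:n} = \operatorname*{argmax}_{t_1 \ldots t_n} \frac{P(w_1 \ldots w_n \mid t_1 \ldots t_n) \, P(t_1 \ldots t_n)}{P(w_1 \ldots w_n)}
\label{eq-pos-tag-2}
\end{equation}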

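furthermore, we simplify eq-pos-tag-2 by dropping the denominator $P(w_1 \ldots w_n)$, which is the same for every candidate tag sequence and so does not affect the argmax:

\begin{equation}
\hat{t}_{1:n} = \operatorname*{argmax}_{t_1 \ldots t_n} P(w_1 \ldots w_n \mid t_1 \ldots t_n) \, P(t_1 \ldots t_n)
\label{eq-pos-tag-3}
\end{equation}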
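a small numeric check can make the "drop the denominator" step concrete. the sketch below is only an illustration: the three candidate tag sequences and every probability in it are made-up numbers, and it assumes these three candidates are the only tag sequences with non-zero probability, so that summing over them gives $P(w_1 \ldots w_n)$.

```python
# hypothetical values of P(w | t) and P(t) for three candidate tag sequences
# of one fixed two-word sentence (all numbers invented for illustration)
candidates = {
    ("DET", "NOUN"): {"likelihood": 0.020, "prior": 0.30},
    ("DET", "VERB"): {"likelihood": 0.001, "prior": 0.05},
    ("NOUN", "NOUN"): {"likelihood": 0.004, "prior": 0.10},
}


def unnormalized(entry):
    """Numerator of Bayes' rule: P(w | t) * P(t)."""
    return entry["likelihood"] * entry["prior"]


# P(w): the same constant for every candidate, obtained here by summing the
# numerator over all candidates (assumed to be the only possible sequences)
evidence = sum(unnormalized(entry) for entry in candidates.values())

# full posterior P(t | w), with the denominator kept
posterior = {tags: unnormalized(entry) / evidence for tags, entry in candidates.items()}

# argmax with and without the denominator
best_with_denominator = max(posterior, key=posterior.get)
best_without_denominator = max(candidates, key=lambda tags: unnormalized(candidates[tags]))

print(best_with_denominator, best_without_denominator)
```

dividing every score by the same positive constant rescales the scores but cannot reorder them, which is why both searches pick the same tag sequence.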

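HMM taggers make two further simplifying assumptions. the first is that the probability of a word appearing depends only on its own tag and is independent of neighboring words and tags:

\begin{equation}
P(w_1 \ldots w_n \mid t_1 \ldots t_n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)
\label{eq-pos-tag-4}
\end{equation}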

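the second assumption, the bigram assumption, is that the probability of a tag depends only on the previous tag, rather than on the entire tag sequence:

\begin{equation}
P(t_1 \ldots t_n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})
\label{eq-pos-tag-5}
\end{equation}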
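as a worked illustration of the two assumptions, take the three-word sentence "the dog barks" with the candidate tag sequence DET NOUN VERB. the sentence, the tag names, and the start-of-sentence pseudo-tag $\langle s \rangle$ standing in for $t_0$ are choices made for this example, not part of the notes above.

\begin{align*}
P(\text{the dog barks} \mid \text{DET NOUN VERB}) &\approx P(\text{the} \mid \text{DET}) \, P(\text{dog} \mid \text{NOUN}) \, P(\text{barks} \mid \text{VERB}) \\
P(\text{DET NOUN VERB}) &\approx P(\text{DET} \mid \langle s \rangle) \, P(\text{NOUN} \mid \text{DET}) \, P(\text{VERB} \mid \text{NOUN})
\end{align*}

each factor on the right involves at most one word and two adjacent tags, so the model only needs a table of word-given-tag probabilities and a table of tag-given-previous-tag probabilities.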

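plugging the simplifying assumptions from eq-pos-tag-4 and eq-pos-tag-5 into eq-pos-tag-3 results in the following equation for the most probable tag sequence from a bigram tagger:

\begin{equation}
\hat{t}_{1:n} = \operatorname*{argmax}_{t_1 \ldots t_n} P(t_1 \ldots t_n \mid w_1 \ldots w_n) \approx \operatorname*{argmax}_{t_1 \ldots t_n} \prod_{i=1}^{n} \overbrace{P(w_i \mid t_i)}^{\text{emission}} \, \overbrace{P(t_i \mid t_{i-1})}^{\text{transition}}
\label{eq-pos-tag-6}
\end{equation}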

the two factors of eq-pos-tag-6 correspond neatly to the HMM's emission probabilities, $P(w_i \mid t_i)$, and transition probabilities, $P(t_i \mid t_{i-1})$.
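the bigram-tagger objective can be sketched directly in code. the example below is a minimal brute-force decoder, not an efficient one: it enumerates every possible tag sequence and keeps the one with the highest product of emission and transition probabilities. the tag set, the `<s>` start pseudo-tag, the helper names `log_score` and `decode_brute_force`, and all probability values are hypothetical and chosen only for illustration.

```python
import itertools
import math

START = "<s>"  # assumed start-of-sentence pseudo-tag playing the role of t_0
TAGS = ("DET", "NOUN", "VERB")  # hypothetical toy tag set

# hypothetical transition probabilities P(t_i | t_{i-1}), keyed by (previous tag, tag)
transition = {
    (START, "DET"): 0.6, (START, "NOUN"): 0.3, (START, "VERB"): 0.1,
    ("DET", "DET"): 0.05, ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05,
    ("NOUN", "DET"): 0.1, ("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.6,
    ("VERB", "DET"): 0.5, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1,
}

# hypothetical emission probabilities P(w_i | t_i), keyed by (tag, word);
# pairs that are not listed are treated as probability 0
emission = {
    ("DET", "the"): 0.7,
    ("NOUN", "dog"): 0.1, ("NOUN", "barks"): 0.01,
    ("VERB", "dog"): 0.005, ("VERB", "barks"): 0.05,
}


def log_score(words, tags):
    """Sum of log P(w_i | t_i) + log P(t_i | t_{i-1}) over the sentence."""
    total = 0.0
    previous = START
    for word, tag in zip(words, tags):
        p = emission.get((tag, word), 0.0) * transition.get((previous, tag), 0.0)
        if p == 0.0:
            return -math.inf  # impossible under the toy model
        total += math.log(p)
        previous = tag
    return total


def decode_brute_force(words):
    """Return the highest-scoring tag sequence by trying all len(TAGS)**n of them."""
    all_sequences = itertools.product(TAGS, repeat=len(words))
    return max(all_sequences, key=lambda tags: log_score(words, tags))


sentence = ["the", "dog", "barks"]
best = decode_brute_force(sentence)
print(best, math.exp(log_score(sentence, best)))
```

working in log space turns the product into a sum and avoids underflow on longer sentences. the enumeration grows exponentially with sentence length, so it is only usable on toy inputs; practical HMM taggers compute the same argmax with dynamic programming (the Viterbi algorithm).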