NLPA Statistical Machine Translation

1. other approaches to SMT

1.1. bag translation

1.1.1. original SMT

1.1.2. translate each word

1.1.3. reassemble using language model in target language

1.2. syntax-based SMT

1.2.1. respect syntactic structures

1.2.2. partial parse trees

1.3. hierarchical phrase-based approaches

1.3.1. like phrase-based, but phrases may contain sub-phrases (hierarchy)

1.4. notably absent

1.4.1. statistical tree-to-tree transformations

1.4.2. neural network based approaches

2. The Alignment Template Approach to Statistical Machine Translation (Och and Ney)

2.1. phrase-based statistical MT

2.2. alignment template approach

2.3. Och is at Google

2.4. log-linear models

2.4.1. translation = argmax_e P(e | f)

2.4.2. HMM = source-channel approach (generative model)

2.4.3. Och & Ney = discriminative approach (maximum entropy / logistic regression)

2.4.4. set of feature functions, combined log-linearly (logistic-regression style; see the sketch below)
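A minimal sketch of the log-linear (maximum-entropy) model behind 2.4.4, with M feature functions h_m(e, f) and weights λ_m; the source-channel approach is the special case with just the two features log P(f|e) and log P(e):

```latex
% Log-linear translation model over feature functions h_m with weights \lambda_m
P(e \mid f) = \frac{\exp\bigl(\sum_{m=1}^{M} \lambda_m h_m(e, f)\bigr)}
                   {\sum_{e'} \exp\bigl(\sum_{m=1}^{M} \lambda_m h_m(e', f)\bigr)}

% Decision rule: the normalizer does not depend on e, so it drops out
\hat{e} = \operatorname*{argmax}_{e} \; \sum_{m=1}^{M} \lambda_m h_m(e, f)
```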

2.5. statistical alignment

2.5.1. given an English and a French sentence, find corresponding words

2.5.2. Viterbi alignment: highest probability alignment

2.5.3. mapping from source position to target position

2.5.4. build a statistical model p(f,a|e)

2.5.5. given a parallel corpus, optimize the parameters of p(f,a|e) to maximize the likelihood (via EM; see the sketch after this list)

2.5.6. special hacks for dealing with unaligned words (NULL alignments) and for making the model symmetric

2.5.7. example
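A toy sketch of how p(f, a | e) can be fit by EM and a Viterbi alignment read off, using IBM Model 1 (the simplest member of the model family; Och and Ney's pipeline uses richer alignment models, and the corpus, function names, and NULL handling here are illustrative assumptions):

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """EM training of IBM Model 1 word-translation probabilities t(f | e).

    corpus: list of (english_words, french_words) sentence pairs.
    A NULL token on the English side can absorb otherwise-unaligned French words.
    """
    e_vocab = {e for es, _ in corpus for e in es} | {"NULL"}
    t = defaultdict(lambda: 1.0 / len(e_vocab))   # uniform initialisation of t[(f, e)]

    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(f, e)
        total = defaultdict(float)                # expected counts c(e)
        for es, fs in corpus:
            es = ["NULL"] + list(es)
            for f in fs:
                # E-step: spread each French word's unit count over the English words
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f | e) from the expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def viterbi_alignment(t, e_words, f_words):
    """Highest-probability alignment under Model 1:
    each French word picks its best English position (0 = NULL)."""
    es = ["NULL"] + list(e_words)
    return [max(range(len(es)), key=lambda i: t[(f, es[i])]) for f in f_words]

# Toy usage
corpus = [("the house".split(), "la maison".split()),
          ("the book".split(),  "le livre".split()),
          ("a book".split(),    "un livre".split())]
t = train_ibm_model1(corpus)
print(viterbi_alignment(t, "the house".split(), "la maison".split()))
```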

2.6. phrase extraction

2.6.1. phrase = consecutive sequence of words (not linguistic phrase)

2.6.2. finding word sequences that align across the two languages (consecutive words in one language go to consecutive words in the other; see the extraction sketch below)

2.6.3. sample phrases

2.6.4. alignment templates

2.6.4.1. add generalization capability to the phrase lexicon

2.6.4.2. replace words with word classes
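A minimal sketch of consistent phrase-pair extraction from a word alignment. The consistency test is the standard one (no alignment link may leave the candidate pair on either side); handling of unaligned boundary words is omitted, and all names here are my own:

```python
def extract_phrases(e_words, f_words, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (i, j) links, i indexing e_words, j indexing f_words.
    A span pair is kept only if no alignment link crosses its boundary.
    """
    phrases = set()
    for i1 in range(len(e_words)):
        for i2 in range(i1, min(i1 + max_len, len(e_words))):
            # French positions linked to anything in the English span [i1, i2]
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no link from inside the French span to outside the English span
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            phrases.add((tuple(e_words[i1:i2 + 1]), tuple(f_words[j1:j2 + 1])))
    return phrases

# Toy usage: "the house" / "la maison" with a one-to-one alignment
print(extract_phrases("the house".split(), "la maison".split(), {(0, 0), (1, 1)}))
```

Applying the same extraction to word-class sequences instead of word sequences gives the alignment templates of 2.6.4.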

2.7. translation model

2.7.1. decompose the source and target sentences into a sequence of phrases

2.7.2. allow permutations of phrases

2.7.3. features

2.7.3.1. scores of individual templates

2.7.3.2. scores of word correspondences

2.7.3.3. amount of non-monotonicity

2.7.3.3.1. (this is very Eurocentric)

2.7.3.4. language model features

2.7.3.4.1. trigram with back-off (formula sketched after the feature list)

2.7.3.4.2. 5-gram class-based language model

2.7.3.5. word penalty

2.7.3.6. word pair presence in "conventional lexicon"

2.7.4. train a logistic regression model so that the best translation gets the best score

2.7.5. after the model has been trained, search for the best translation using pruned breadth-first search
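A highly simplified sketch of the search in 2.7.5: a monotone beam (pruned breadth-first) search over a log-linear score combining a translation-model feature, a trigram language-model feature, and a word penalty. The phrase_table layout, the feature set and weights, and the lm_logprob helper are illustrative assumptions; the real decoder also permutes phrases and uses the full feature set listed above.

```python
import heapq

def decode(f_words, phrase_table, lm_logprob, weights, beam=10, max_phrase=4):
    """Monotone beam-search decoder over a log-linear model (illustrative only).

    phrase_table: dict mapping a source phrase (tuple of words) to a list of
                  (target_phrase_tuple, translation_logprob) options.
    lm_logprob:   assumed helper returning log P(last word | up to two preceding words)
                  for a tuple of words.
    weights:      log-linear weights, e.g. {"tm": 1.0, "lm": 1.0, "wp": -0.5}.
    """
    # A hypothesis covers a source prefix: (score, #source words covered, target words so far)
    hyps = [(0.0, 0, ())]
    for _ in range(len(f_words)):
        expanded = []
        for score, covered, e_out in hyps:
            if covered == len(f_words):            # already complete, carry along
                expanded.append((score, covered, e_out))
                continue
            for k in range(1, max_phrase + 1):     # extend by the next source phrase
                if covered + k > len(f_words):
                    break
                f_phrase = tuple(f_words[covered:covered + k])
                for e_phrase, tm_logprob in phrase_table.get(f_phrase, []):
                    new_e = e_out + e_phrase
                    # LM feature: score each newly added word given its (<= 2-word) history
                    lm = sum(lm_logprob(new_e[max(0, t - 2):t + 1])
                             for t in range(len(e_out), len(new_e)))
                    new_score = (score
                                 + weights["tm"] * tm_logprob
                                 + weights["lm"] * lm
                                 + weights["wp"] * len(e_phrase))   # word penalty
                    expanded.append((new_score, covered + k, new_e))
        # Pruning: keep only the `beam` best partial hypotheses
        hyps = heapq.nlargest(beam, expanded, key=lambda h: h[0])
    finished = [h for h in hyps if h[1] == len(f_words)]
    return max(finished, key=lambda h: h[0])[2] if finished else None
```

The weights would be tuned on held-out data so that the best translation gets the best score (2.7.4).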

2.8. results

2.8.1. Chinese-English

2.8.2. examples

2.9. discussion

2.9.1. completely data driven, completely automatic

2.9.2. (well, not so completely automatic, since there is lots of tuning of the feature functions)

2.9.3. absolutely no semantics

2.9.4. no information is carried between sentences

2.9.5. what do you expect it can't do?

2.9.6. can you make it fail?

3. overview

3.1. many different approaches over the years

3.2. really took off with IBM in 1993

3.3. Ney's group in Aachen also a major force

3.4. we'll just look at a fairly recent approach