NLPA Statistical Machine Translation

other approaches to SMT

bag translation

original SMT

translate each word

reassemble using language model in target language
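
A minimal sketch of this word-by-word pipeline, assuming a toy lexicon and a toy bigram language model (all words and probabilities below are invented for illustration):

```python
import itertools
import math

# Toy word-for-word lexicon P(english | french) -- invented values.
LEXICON = {
    "la": {"the": 0.9, "it": 0.1},
    "maison": {"house": 0.8, "home": 0.2},
    "bleue": {"blue": 1.0},
}

# Toy bigram language model P(w2 | w1) -- invented values.
BIGRAM = {
    ("<s>", "the"): 0.4, ("the", "blue"): 0.2, ("blue", "house"): 0.5,
    ("the", "house"): 0.3, ("house", "blue"): 0.01,
}

def lm_logprob(words, floor=1e-6):
    """Score a word sequence with the bigram model (floored when unseen)."""
    return sum(math.log(BIGRAM.get((w1, w2), floor))
               for w1, w2 in zip(["<s>"] + words, words))

def bag_translate(source):
    """Translate each word independently, then let the target language
    model pick the best reassembly of the resulting bag of words."""
    best, best_score = None, -math.inf
    for picks in itertools.product(*(LEXICON[w].items() for w in source)):
        lex_score = sum(math.log(p) for _, p in picks)
        for perm in itertools.permutations([w for w, _ in picks]):
            score = lex_score + lm_logprob(list(perm))
            if score > best_score:
                best, best_score = list(perm), score
    return best

print(bag_translate(["la", "maison", "bleue"]))  # -> ['the', 'blue', 'house']
```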

syntax-based SMT

respect syntactic structures

partial parse trees

hierarchical phrase-based approaches

like phrase-based, but taking hierarchical structure into account

notably absent

statistical tree-to-tree transformations

neural network based approaches

The Alignment Template Approach to Statistical Machine Translation (Och and Ney)

phrase-based statistical MT

alignment template approach

Och is at Google

log-linear models

translation: Ê = argmax_E P(E | S)

HMM = source-channel approach (generative model)

Och & Ney = discriminative approach, maximum entropy, logistic regression

set of feature functions, combined with logistic regression
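
A sketch of that log-linear combination: each candidate translation e of a source f gets the score sum_m lambda_m * h_m(e, f), and the highest-scoring candidate wins. The feature functions and weights below are illustrative stand-ins, not the paper's actual feature set:

```python
# Invented word-pair list used by the stand-in lexical feature.
LEX_PAIRS = {("la", "the"), ("maison", "house")}

def h_length(e, f):
    return -abs(len(e) - len(f))            # penalize length mismatch

def h_lexicon(e, f):
    return sum((fw, ew) in LEX_PAIRS for fw in f for ew in e)

def h_word_penalty(e, f):
    return len(e)                           # controls output length

FEATURES = [h_length, h_lexicon, h_word_penalty]
LAMBDAS = [1.0, 2.0, -0.5]                  # weights, tuned on held-out data

def score(e, f):
    """log p(e | f) up to a constant: sum_m lambda_m * h_m(e, f)."""
    return sum(lam * h(e, f) for lam, h in zip(LAMBDAS, FEATURES))

def best_translation(f, candidates):
    """translation = argmax_e score(e, f) over a candidate list."""
    return max(candidates, key=lambda e: score(e, f))

print(best_translation(["la", "maison"],
                       [["the", "house"], ["it", "home", "now"]]))
```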

statistical alignment

given an English and a French sentence, find corresponding words

Viterbi alignment: highest probability alignment

mapping from source position to target position

build a statistical model p(f,a|e)

given a corpus, optimize the parameters of p(f,a|e) to maximize the likelihood

special hacks for dealing with missing words and for making the model symmetric
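
A sketch of this likelihood maximization for the simplest such model (IBM Model 1); the real systems use a family of increasingly refined models, and the parallel corpus below is an invented toy:

```python
from collections import defaultdict

# Tiny invented parallel corpus of (English, French) sentence pairs.
CORPUS = [
    (["the", "house"], ["la", "maison"]),
    (["the", "blue", "house"], ["la", "maison", "bleue"]),
    (["the", "flower"], ["la", "fleur"]),
]

t = defaultdict(lambda: 1.0)  # t[(f, e)] = P(f | e), initialized uniformly

for _ in range(20):  # EM iterations, each one increases corpus likelihood
    count = defaultdict(float)
    total = defaultdict(float)
    for e_sent, f_sent in CORPUS:
        for f in f_sent:
            # E-step: spread each French word's count over the English
            # words of the sentence, in proportion to the current t.
            z = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                count[(f, e)] += t[(f, e)] / z
                total[e] += t[(f, e)] / z
    # M-step: re-estimate the translation probabilities.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

def viterbi_alignment(e_sent, f_sent):
    """Viterbi alignment: map each French position to the English
    position whose word explains it best under t."""
    return [max(range(len(e_sent)), key=lambda i: t[(f, e_sent[i])])
            for f in f_sent]

# -> [0, 2, 1], i.e. la->the, maison->house, bleue->blue
print(viterbi_alignment(["the", "blue", "house"], ["la", "maison", "bleue"]))
```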

phrase extraction

phrase = consecutive sequence of words (not a linguistic phrase)

finding word sequences that align in the two languages (consecutive words in one language go to consecutive words in the other)

sample phrases
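
A sketch of the standard consistency check behind this extraction: a source span and a target span form a phrase pair exactly when no alignment link crosses the box they define (the sentences and alignment below are invented):

```python
def extract_phrases(e_sent, f_sent, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.
    alignment: set of (e_index, f_index) links."""
    pairs = []
    for e1 in range(len(e_sent)):
        for e2 in range(e1, min(e1 + max_len, len(e_sent))):
            # Target positions linked to the source span [e1, e2].
            f_pos = [j for (i, j) in alignment if e1 <= i <= e2]
            if not f_pos:
                continue
            f1, f2 = min(f_pos), max(f_pos)
            # Consistency: no link may leave the box [e1,e2] x [f1,f2].
            if all(e1 <= i <= e2 for (i, j) in alignment if f1 <= j <= f2):
                pairs.append((" ".join(e_sent[e1:e2 + 1]),
                              " ".join(f_sent[f1:f2 + 1])))
    return pairs

e = ["the", "blue", "house"]
f = ["la", "maison", "bleue"]
links = {(0, 0), (1, 2), (2, 1)}  # the-la, blue-bleue, house-maison
for pair in extract_phrases(e, f, links):
    print(pair)  # e.g. ('blue house', 'maison bleue')
```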

alignment templates: add generalization capability to the phrase lexicon by replacing words with word classes
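
A sketch of that generalization step, assuming word classes that in practice are learned automatically by clustering (the classes and phrases below are invented):

```python
# Invented word classes; the real system learns them from data.
WORD_CLASS = {
    "am": "C_PREP", "on": "C_PREP",
    "Montag": "C_DAY", "Dienstag": "C_DAY", "Freitag": "C_DAY",
    "Monday": "C_DAY", "Tuesday": "C_DAY", "Friday": "C_DAY",
}

def class_sequence(phrase):
    """Map each word of a phrase to its class (unknown words stay as-is)."""
    return tuple(WORD_CLASS.get(w, w) for w in phrase.split())

# One template now covers every "am <day>" -> "on <day>" phrase pair,
# even pairs never seen verbatim in the training data.
TEMPLATE = (("C_PREP", "C_DAY"), ("C_PREP", "C_DAY"))

def matches(src_phrase, tgt_phrase):
    return (class_sequence(src_phrase), class_sequence(tgt_phrase)) == TEMPLATE

print(matches("am Dienstag", "on Tuesday"))  # True
print(matches("am Freitag", "on Friday"))    # True
```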

translation model

decompose the source and target sentences into a sequence of phrases

allow permutations of phrases

features

scores of individual templates

scores of word correspondences

amount of non-monotonicity (this is very Eurocentric)

language model features: trigram with back-off (sketched below), 5-gram class-based language model

word penalty

word pair presence in a "conventional lexicon"
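
A sketch of the back-off idea behind such language models: use the trigram estimate when the trigram was observed, otherwise fall back to shorter contexts with a penalty. This simplified "stupid back-off" stands in for the properly normalized Katz or Kneser-Ney schemes real systems use, and the tables below are invented:

```python
def backoff_prob(w1, w2, w3, trigram, bigram, unigram, alpha=0.4):
    """P(w3 | w1, w2): trigram estimate if seen, else back off to the
    bigram, then the unigram, each time paying a fixed penalty alpha."""
    if (w1, w2, w3) in trigram:
        return trigram[(w1, w2, w3)]
    if (w2, w3) in bigram:
        return alpha * bigram[(w2, w3)]
    return alpha * alpha * unigram.get(w3, 1e-7)

# Toy tables with invented probabilities.
trigram = {("the", "blue", "house"): 0.6}
bigram = {("blue", "house"): 0.3, ("blue", "car"): 0.2}
unigram = {"house": 0.01, "car": 0.01}

print(backoff_prob("the", "blue", "house", trigram, bigram, unigram))  # 0.6
print(backoff_prob("a", "blue", "car", trigram, bigram, unigram))      # ~0.08
```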

train a logistic regression model so that the best translation gets the best score

after the model has been trained, search for the best translation using pruned breadth-first search
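
A sketch of that search as a beam (pruned breadth-first) decoder over phrases; for brevity it translates monotonically, whereas the full decoder also permutes the phrases, and the phrase table and scores below are invented:

```python
# Invented phrase table: source phrase -> [(target phrase, log score)].
PHRASES = {
    ("la",): [("the", -0.2), ("it", -1.5)],
    ("maison",): [("house", -0.3)],
    ("la", "maison"): [("the house", -0.4)],
    ("bleue",): [("blue", -0.1)],
}

def beam_decode(source, beam_size=3):
    """Breadth-first search over source positions, pruning to the
    beam_size best partial hypotheses at each step (monotone order)."""
    # Hypothesis: (source positions covered, output phrases, log score).
    beam = [(0, [], 0.0)]
    for _ in range(len(source)):
        expanded = []
        for pos, out, score in beam:
            if pos == len(source):
                expanded.append((pos, out, score))  # already complete
                continue
            for k in range(1, len(source) - pos + 1):
                src = tuple(source[pos:pos + k])
                for tgt, s in PHRASES.get(src, []):
                    expanded.append((pos + k, out + [tgt], score + s))
        # Prune: keep only the beam_size highest-scoring hypotheses.
        beam = sorted(expanded, key=lambda h: -h[2])[:beam_size]
    finished = [h for h in beam if h[0] == len(source)]
    return " ".join(max(finished, key=lambda h: h[2])[1])

print(beam_decode(["la", "maison", "bleue"]))  # -> "the house blue"
```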

completely data-driven, completely automatic

(well, not so completely automatic, since there is lots of tuning of the feature functions)

absolutely no semantics

no information is carried between sentences

what do you expect it can't do?

can you make it fail?

many different approaches over the years

really took off with IBM in 1993

Ney's group in Aachen also a major force

we'll just look at a fairly recent approach