NLPA Machine Translation


controlled language

phrase-based statistical translation

software and projects


Google Translate (statistical)

Systran (rule-based)

UI: Android and browser extensions


Giza++, open source statistical machine translation, Och, IBM Models 1-5, HMM word alignment

Jane, Aachen (Ney), phrase-based statistical machine translation

Joshua, Johns Hopkins, synchronous context-free grammars, chart parsing, n-gram language models, beam / cube pruning, k-best extraction, suffix-array grammar extraction, minimum error rate training

MOSES, EU project, phrase-based translation

GALE, DARPA program (military, intelligence applications)


BLEU score

BLEU = bilingual evaluation understudy

high correlation with human judgments of quality

BLEU scores are between 0 and 1

calculate, compute precision for words/n-grams between machine translation and human reference translations, modified precision score (each n-gram count is clipped to the maximum count found in any reference), calculate modified precision for n-grams, n=4 correlates well with human quality judgments
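The clipped ("modified") n-gram precision described above can be sketched in a few lines. This is a minimal illustration, not a full BLEU implementation: the function names are made up for this example, tokenization is assumed to be plain whitespace splitting, and the complete score additionally combines the precisions for several n-gram orders and applies a brevity penalty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference translations.

    Each candidate n-gram count is clipped to the largest count of that
    n-gram in any single reference, so repeating a correct word many
    times is not rewarded.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0

    # Largest count of each n-gram observed in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(),
              "there is a cat on the mat".split()]
print(modified_precision(candidate, references, 1))
```

Without clipping, the degenerate candidate in the usage lines would score a unigram precision of 7/7, since "the" occurs in the references; with clipping it only gets credit for the two occurrences of "the" in the first reference, i.e. 2/7.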

data sets

LDC (linguistic data consortium)

EUROPARL, European parliament translations, aligned sentences

should you work on it?

+, interesting and easy to understand problem, great test case for machine learning, lots of ideas that haven't been explored/tested, fundamental AI questions about learning language, semantics, etc.

-, problem is ill-defined, practical solutions may depend less on actual translation quality than on using translation well (controlled language, etc.), competing against people with lots of resources, current approaches have nothing to do with AI, scoring / evaluation in the community is arbitrary and wouldn't catch the interesting improvements

=, may be better to work on NLU and text generation separately, pick an interesting application like gaming (dormant for years, but imagine being able to talk to your NPCs), or pick a well-defined sub-topic: better tools (including statistical learning) for controlled languages, camera-based translation, etc., lots of smaller topics: POS tagging as feature functions, topic modeling, neural network language modeling, etc., E-prime enforcer?



rule-based machine translation

approach, write down rules for mapping source sentences into target sentences, use a dictionary to translate specific words, use a morphological analyzer to translate word forms

properties, creation requires a combination of programming and linguistic expertise, but little data, efficient and easy to understand, doesn't work very well, unfortunately, doesn't scale up, hard to adapt to new languages and domains
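The three ingredients listed in the approach (a mapping rule, a dictionary, a morphological analyzer) can be illustrated with a toy English-to-Spanish phrase translator. Everything here is an assumption made for illustration: the five-word lexicon, the "strip a final s" plural analysis, and the single reordering rule (English adjective + noun becomes Spanish noun + adjective, with the adjective agreeing in number). It also shows why such systems are hard to scale: every new word and construction needs hand-written entries.

```python
# Toy lexicon: English lemma -> (part of speech, Spanish singular, Spanish plural).
LEXICON = {
    "house": ("NOUN", "casa", "casas"),
    "car": ("NOUN", "coche", "coches"),
    "dog": ("NOUN", "perro", "perros"),
    "big": ("ADJ", "grande", "grandes"),
    "green": ("ADJ", "verde", "verdes"),
}


def analyze(word):
    """Tiny morphological analyzer: map a word form to (lemma, is_plural)."""
    if word in LEXICON:
        return word, False
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], True
    raise KeyError(f"unknown word: {word}")


def generate(lemma, plural):
    """Produce the Spanish word form for a lemma and a number feature."""
    _, singular_form, plural_form = LEXICON[lemma]
    return plural_form if plural else singular_form


def translate_phrase(phrase):
    """Translate a short English phrase using the lexicon and one rule."""
    items = [analyze(word) for word in phrase.lower().split()]
    output = []
    i = 0
    while i < len(items):
        lemma, plural = items[i]
        next_is_noun = (i + 1 < len(items)
                        and LEXICON[items[i + 1][0]][0] == "NOUN")
        if LEXICON[lemma][0] == "ADJ" and next_is_noun:
            # Rule: ADJ NOUN -> NOUN ADJ, adjective takes the noun's number.
            noun, noun_plural = items[i + 1]
            output.append(generate(noun, noun_plural))
            output.append(generate(lemma, noun_plural))
            i += 2
        else:
            output.append(generate(lemma, plural))
            i += 1
    return " ".join(output)


print(translate_phrase("big houses"))  # casas grandes
```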


statistical machine translation

approach, treat language as a sequence of words with statistical relationships (HMMs, etc.), model the statistical relationship between input and output languages, generally a Bayesian approach: P(English sentence | German sentence), try to factor this probability in some way, build models by learning from large corpora

properties, requires some kind of performance measure, can take advantage of large amounts of online text, requires large amounts of training data, can be retargeted to different languages with new training data, has difficulties with syntactically dissimilar languages, has no deep understanding of what it is translating
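One common way to factor P(English sentence | German sentence) is the noisy-channel decomposition via Bayes' rule. The symbols below are notation chosen for this sketch: e stands for a candidate English sentence and g for the given German sentence.

```latex
\begin{aligned}
\hat{e} &= \arg\max_{e} P(e \mid g) \\
        &= \arg\max_{e} \frac{P(g \mid e)\, P(e)}{P(g)} \\
        &= \arg\max_{e} P(g \mid e)\, P(e)
\end{aligned}
```

P(g) can be dropped because it is constant for a fixed input sentence. The remaining factors are usually estimated separately: P(g | e), the translation model, from aligned bilingual corpora, and P(e), the language model, from monolingual English text.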

interlingual machine translation

approach, analyze the source text and translate it into an intermediate language, translate the intermediate language into the target, the intermediate is an actual natural language (possibly constructed)

properties, reduces N^2 problem to N+1 problem, seemingly attractive but not widely used, people love designing the intermediate language, you now have two synset mismatches


transfer-based machine translation

approach, analyze the source text and translate it into an intermediate representation, translate the intermediate representation into an output representation, the intermediate representation can be shallow / syntactic (subject / object / verb / indirect object) or deep / semantic (agents, relations, meaning)

ultimately, natural language understanding, text generation

properties, usually rule-based (could be statistical, but that hasn't been explored much), statistical parsers, word sense disambiguation, tagging, etc. slowly being incorporated


word ambiguities

e.g. "borrow / lend" vs. "leihen"

syntactic ambiguities

Mary cuts the card with the code.

Mary cuts the card with the scissors.


referential ambiguities

John sees Jack. His glasses really help.

John sees Jack. He appears to be walking down the street.

subtle meaning

John is a figurehead for the organization.

John is a representative for the organization.


replace human translators


text-to-text, technical documents, manuals, web pages, education, medical

speech-to-speech, travel, military, conferences, medical, business

image-to-text, travel, military

speech-to-signed, accessibility

speech-to-neural, Star Trek universal translator

current practice

simultaneous interpretation, originated at the Nuremberg trials, widely used for speech-to-speech

offline translation

artistic recreation (literary translation)

existing automatic systems

image-to-text apps for phones

speech-to-speech app for phones

word translation popups for browsers

text-to-text for web pages (browser plug-in or bookmarklet)

text-to-text using computer/human combo

full pages

splitting up and recombining

utility and simplifications


foreign scripts are hard even to look up if you don't know them

just words and translations are very useful

word-by-word translations

can get a sense of what something is about just from the words, without syntax

fairly easy, but still runs into the problem of ambiguities
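A word-by-word gloss is little more than a dictionary lookup, and the ambiguity problem shows up immediately. The sketch below uses a made-up five-entry German-to-English dictionary keyed on surface forms (an assumption, since a real system would also need morphology) and reuses the "leihen" example: without syntax or context there is no way to choose between "borrow" and "lend", so both are kept.

```python
# Hypothetical mini dictionary: German word form -> possible English glosses.
DICTIONARY = {
    "kannst": ["can"],
    "du": ["you"],
    "mir": ["me"],
    "geld": ["money"],
    "leihen": ["borrow", "lend"],  # ambiguous without context
}


def gloss(sentence):
    """Return (source word, candidate translations) pairs, one per word.

    Unknown words get an empty candidate list instead of raising an error.
    """
    words = sentence.lower().strip("?.!").split()
    return [(word, DICTIONARY.get(word, [])) for word in words]


def format_gloss(pairs):
    """Render a gloss, joining alternatives with '/' and echoing unknown words."""
    return " ".join("/".join(options) if options else f"[{word}]"
                    for word, options in pairs)


print(format_gloss(gloss("Kannst du mir Geld leihen?")))
# can you me money borrow/lend
```

Even this crude output conveys what the sentence is about, which matches the point above that words alone, without syntax, are often enough to get the gist.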

translations in an interactive context (e.g., travel)

travel translation and other interactive settings often allow for feedback

yes/no questions, please point to...

pictionaries, visual feedback

translations of legal and technical documents

high accuracy required

highly specialized and knowledgeable translators needed

may benefit from controlled language

artistic translations

often more recreations of the original work "in the spirit of"

"Now is the winter of our discontent / Made glorious summer by this son of York" (Shakespeare, Richard the Third)

"Nun ward der Winter unsers Mißvergnügens / Glorreicher Sommer durch die Sonne Yorks" (Schlegel)

sun / son: the German "Sonne" means "sun", so the translation keeps only one side of the English pun