Google Translate (statistical)
UI: Android and browser extensions
GIZA++, open source word alignment toolkit for statistical machine translation, Och, IBM Models 1-5, HMM word alignment
Jane, Aachen (Ney), phrase-based statistical machine translation
Joshua, Johns Hopkins, synchronous context free grammars, chart parsing, n-gram language models, beam / cube pruning, k-best extraction, suffix-array grammar extraction, minimum error rate training
Moses, EU project, phrase-based translation
GALE, DARPA program (military, intelligence applications)
BLEU = bilingual evaluation understudy
high correlation with human judgments of quality
BLEU scores are between 0 and 1
calculation, compute precision for words/n-grams between machine translation and human translation, modified precision score (clipped etc.), calculate modified precision for n-grams, n=4 correlates well with human quality judgments
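The clipped ("modified") precision above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are made up for illustration, tokenization is plain whitespace splitting, and the geometric mean over n=1..4 with a brevity penalty follows the standard BLEU definition rather than anything spelled out in these notes.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it occurs in the single reference where it is most frequent."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified precisions
    for n = 1..max_n, times a brevity penalty. No smoothing (assumption)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)

refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision("the the the the the the the".split(), refs, 1))
print(bleu("the cat is on the mat".split(), refs))
```

The degenerate candidate "the the the the the the the" shows why clipping matters: plain unigram precision would be 7/7, while the clipped score only credits "the" twice, the most it occurs in any single reference.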
LDC (linguistic data consortium)
EUROPARL, European parliament translations, aligned sentences
+, interesting and easy to understand problem, great test case for machine learning, lots of ideas that haven't been explored/tested, fundamental AI questions about learning language, semantics, etc.
-, problem is ill-defined, practical solutions may depend less on actual translation quality than on using translation well (controlled language, etc.), competing against people with lots of resources, current approaches have nothing to do with AI, scoring / evaluation in the community is arbitrary and wouldn't catch the interesting improvements
=, may be better to work on NLU and text generation separately, pick an interesting application like gaming (dormant for years, but imagine being able to talk to your NPCs), or pick a well-defined sub-topic: better tools (including statistical learning) for controlled languages, camera-based translation, etc., lots of smaller topics: POS tagging as feature functions, topic modeling, neural network language modeling, etc., E-prime enforcer?
approach, write down rules for mapping source sentences into target sentences, use a dictionary to translate specific words, use a morphological analyzer to translate word forms
properties, creation requires a combination of programming and linguistic expertise, but little data, efficient and easy to understand, doesn't work very well, unfortunately, doesn't scale up, hard to adapt to new languages and domains
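A toy illustration of the rule-based recipe (dictionary, morphological analyzer, reordering rule), German to English. Everything here is hypothetical and invented for the sketch: the tiny lexicon, the "-t means third person singular" analyzer, and the single word-order rule. It mainly shows why such systems are easy to understand and hard to scale.

```python
# Hypothetical toy lexicon: German lemma -> English lemma.
LEXICON = {
    "der": "the", "die": "the", "das": "the",
    "frau": "woman", "buch": "book",
    "kauf": "buy", "hat": "has",
}
# Hypothetical list of past participles, translated as whole forms.
PARTICIPLES = {"gekauft": "bought"}

def analyze(word):
    """Toy morphological analyzer: return (lemma, feature)."""
    w = word.lower()
    if w in PARTICIPLES:
        return w, "participle"
    if w in LEXICON:
        return w, None
    # Rule: verb stem + "t" is third person singular (kauft -> kauf + 3sg).
    if w.endswith("t") and w[:-1] in LEXICON:
        return w[:-1], "3sg"
    return w, "unknown"

def translate_word(word):
    """Dictionary lookup plus regeneration of the English word form."""
    lemma, feature = analyze(word)
    if feature == "participle":
        return PARTICIPLES[lemma]
    if feature == "unknown":
        return word  # no rule applies: pass the source word through
    english = LEXICON[lemma]
    return english + "s" if feature == "3sg" else english

def reorder(words):
    """Rule: German puts the participle clause-final; English wants it
    directly after the auxiliary ("hat ... gekauft" -> "has bought ...")."""
    lowered = [w.lower() for w in words]
    if "hat" in lowered and lowered[-1] in PARTICIPLES:
        i = lowered.index("hat")
        return words[:i + 1] + [words[-1]] + words[i + 1:-1]
    return words

def translate(sentence):
    return " ".join(translate_word(w) for w in reorder(sentence.split()))

print(translate("die Frau kauft das Buch"))
print(translate("die Frau hat das Buch gekauft"))
print(translate("die Frau kauft das Auto"))
```

The third sentence contains a word outside the lexicon, which simply passes through untranslated. Every new word, inflection pattern, or construction needs another hand-written entry or rule.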
approach, treat language as a sequence of words with statistical relationships (HMMs, etc.), model the statistical relationship between input and output languages, generally, Bayesian approach, P( English sentence | German sentence), try to factor this probability in some way, build models by learning from large corpora
properties, requires some kind of performance measure, can take advantage of large amounts of online text, requires large amounts of training data, can be retargeted to different languages with new training data, has difficulties with syntactically dissimilar languages, has no deep understanding of what it is translating
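One common way to factor P( English sentence | German sentence) is the noisy-channel decomposition via Bayes' rule. The notes do not say which factorization is meant, so this is shown only as a typical example; the symbols E (English sentence), G (German sentence) and \hat{E} (chosen translation) are notation introduced here.

```latex
\hat{E} \;=\; \arg\max_{E} P(E \mid G)
        \;=\; \arg\max_{E} \frac{P(G \mid E)\, P(E)}{P(G)}
        \;=\; \arg\max_{E} \underbrace{P(G \mid E)}_{\text{translation model}}
                           \; \underbrace{P(E)}_{\text{language model}}
```

P(G) can be dropped because it does not depend on E. The translation model is learned from aligned bilingual corpora, the language model from monolingual English text, which is one way to take advantage of large amounts of online text.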
approach, analyze the source text and translate it into an intermediate language, translate the intermediate language into the target, the intermediate is an actual natural language (possibly constructed)
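properties, reduces the N^2 problem (one translator per language pair) to an order-N problem (one translator to and from the intermediate language per language), seemingly attractive but not widely used, people love designing the intermediate language, you now have two synset mismatches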
approach, analyze the source text and translate it into an intermediate representation, translate the intermediate representation into an output representation, intermediate representation, shallow / syntactic, subject / object / verb / indirect object, deep / semantic, agents, relations, meaning
ultimately, natural language understanding, text generation
properties, usually rule-based (could be statistical, but that hasn't been explored much), statistical parsers, word sense disambiguation, tagging, etc. slowly being incorporated
e.g. "borrow / lend" vs. "leihen"
Mary cuts the card with the code.
Mary cuts the card with the scissors.
John sees Jack. His glasses really help.
John sees Jack. He appears to be walking down the street.
John is a figurehead for the organization.
John is a representative for the organization.
text-to-text, technical documents, manuals, web pages, education, medical
speech-to-speech, travel, military, conferences, medical, business
image-to-text, travel, military
speech-to-neural, Star Trek universal translator
simultaneous interpretation, originated at the Nuremberg trials, widely used speech-to-speech
artistic recreation (literary translation)
splitting up and recombining
foreign scripts are hard even to look up if you don't know them
just words and translations are very useful
can get a sense of what something is about just from the words, without syntax
fairly easy, but still runs into the problem of ambiguities
travel translation, others often allow for feedback
yes/no questions, please point to...
pictionaries, visual feedback
high accuracy required
highly specialized and knowledgeable translators needed
may benefit from controlled language
often more recreations of the original work "in the spirit of", "Now is the winter of our discontent / Made glorious summer by this son of York", Shakespeare (Richard the Third), "Nun ward der Winter unsers Mißvergnügens / Glorreicher Sommer durch die Sonne Yorks", Schlegel, sun / son