NLPA Machine Translation

1. more

1.1. controlled language

1.2. phrase-based statistical translation

2. software and projects

2.1. online

2.1.1. Google Translate (statistical)

2.1.2. Systran (rule-based)

2.1.3. UI: Android and browser extensions

2.2. systems

2.2.1. Giza++: open source statistical machine translation (Och); IBM Models 1-5, HMM word alignment

2.2.2. Jane: Aachen (Ney); phrase-based statistical machine translation

2.2.3. Joshua: Johns Hopkins; synchronous context-free grammars, chart parsing, n-gram language models, beam / cube pruning, k-best extraction, suffix-array grammar extraction, minimum error rate training

2.2.4. MOSES: EU project; phrase-based translation

2.2.5. GALE: DARPA program (military, intelligence applications)
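
The IBM Models mentioned under Giza++ learn word translation probabilities from sentence-aligned text. As a rough illustration only, here is a minimal sketch of EM training for IBM Model 1. It is not Giza++'s implementation; the function name `train_ibm1`, the toy sentence pairs, and the omission of the NULL word are all simplifying assumptions made for this example.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(source word f | target word e) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Simplification (assumption): no NULL word on the target side.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        for fs, es in pairs:
            for f in fs:
                # E-step: spread one unit of count for f over the target words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts per target word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Hypothetical toy corpus (German source, English target).
toy_pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm1(toy_pairs)
```

On this toy corpus the probability mass for "the" concentrates on "das", because "das" is the only source word that co-occurs with "the" in both sentence pairs.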

2.3. evaluation

2.3.1. BLEU score

2.3.2. BLEU = bilingual evaluation understudy

2.3.3. high correlation with human judgments of quality

2.3.4. BLEU scores are between 0 and 1

2.3.5. calculation: compute precision for words/n-grams between the machine translation and a human translation; use the modified precision score (clipped etc.); modified precision for n-grams up to n=4 correlates well with human quality judgments
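
The clipped precision described above can be sketched as follows. This is a simplified sentence-level illustration, not a reference implementation: the function names are made up for this example, tokenization is plain whitespace splitting, no smoothing is applied, and the brevity penalty is taken from the standard BLEU definition rather than from these notes.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    often as it occurs in the single reference where it is most frequent."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4):
    """Geometric mean of modified precisions for n = 1..max_n,
    multiplied by a brevity penalty. Returns a value between 0 and 1."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: the shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)

reference = "the cat is on the mat".split()
degenerate = "the the the the the the the".split()
unigram_precision = modified_precision(degenerate, [reference], 1)  # 2/7
```

Without clipping, the degenerate candidate would score a unigram precision of 7/7, since every "the" appears somewhere in the reference. Clipping limits the credit to the two occurrences of "the" in the reference, giving 2/7.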

2.4. data sets

2.4.1. LDC (linguistic data consortium)

2.4.2. EUROPARL European parliament translations aligned sentences

2.5. should you work on it?

2.5.1. +: interesting and easy to understand problem; great test case for machine learning; lots of ideas that haven't been explored/tested; fundamental AI questions about learning language, semantics, etc.

2.5.2. -: problem is ill-defined; practical solutions may depend less on actual translation quality than on using translation well, controlled language, etc.; competing against people with lots of resources; current approaches have nothing to do with AI; scoring / evaluation in the community is arbitrary and wouldn't catch the interesting improvements

2.5.3. =: may be better to work on NLU and text generation separately; pick an interesting application like gaming (dormant for years, but imagine being able to talk to your NPCs); or pick a well-defined sub-topic: better tools (including statistical learning) for controlled languages, camera-based translation, etc.; lots of smaller topics: POS tagging as feature functions, topic modeling, neural network language modeling, etc.; E-prime enforcer?

3. approaches

3.1. rule-based

3.1.1. approach: write down rules for mapping source sentences into target sentences; use a dictionary to translate specific words; use a morphological analyzer to translate word forms

3.1.2. properties: creation requires a combination of programming and linguistic expertise, but little data; efficient and easy to understand; doesn't work very well, unfortunately; doesn't scale up, hard to adapt to new languages and domains

3.2. statistical

3.2.1. approach: treat language as a sequence of words with statistical relationships (HMMs, etc.); model the statistical relationship between input and output languages; generally a Bayesian approach: P(English sentence | German sentence); try to factor this probability in some way; build models by learning from large corpora

3.2.2. properties: requires some kind of performance measure; can take advantage of large amounts of online text; requires large amounts of training data; can be retargeted to different languages with new training data; has difficulties with syntactically dissimilar languages; has no deep understanding of what it is translating
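
One common way to factor P(English sentence | German sentence) is the noisy-channel decomposition via Bayes' rule. The notes do not say which factorization is meant, so this is only one possibility; the symbols e (English sentence) and g (German sentence) are introduced here for illustration.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid g)
        \;=\; \arg\max_{e} \frac{P(g \mid e)\, P(e)}{P(g)}
        \;=\; \arg\max_{e} P(g \mid e)\, P(e)
```

P(g) can be dropped because it does not depend on e. This leaves a translation model P(g | e), learned from aligned bilingual text, and a language model P(e), learned from monolingual English text.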

3.3. interlingual machine translation

3.3.1. approach: analyze the source text and translate it into an intermediate language; translate the intermediate language into the target; the intermediate is an actual natural language (possibly constructed)

3.3.2. properties: reduces N^2 problem to N+1 problem; seemingly attractive but not widely used; people love designing the intermediate language; you now have two synset mismatches

3.4. transfer-based

3.4.1. approach: analyze the source text and translate it into an intermediate representation; translate the intermediate representation into an output representation; intermediate representation: shallow / syntactic or deep / semantic

3.4.2. ultimately: natural language understanding, text generation

3.4.3. properties: usually rule-based (could be statistical, but that hasn't been explored much); statistical parsers, word sense disambiguation, tagging, etc. slowly being incorporated

4. challenges

4.1. word ambiguities

4.1.1. e.g. "borrow / lend" vs. "leihen"

4.2. syntactic ambiguities

4.2.1. Mary cuts the card with the code.

4.2.2. Mary cuts the card with the scissors.

4.3. anaphora

4.3.1. John sees Jack. His glasses really help.

4.3.2. John sees Jack. He appears to be walking down the street.

4.4. subtle meaning

4.4.1. John is a figurehead for the organization.

4.4.2. John is a representative for the organization.

5. vision

5.1. replace human translators

5.2. kinds

5.2.1. text-to-text: technical documents, manuals, web pages, education, medical

5.2.2. speech-to-speech: travel, military, conferences, medical, business

5.2.3. image-to-text: travel, military

5.2.4. speech-to-signed: accessibility

5.2.5. speech-to-neural: Star Trek universal translator

5.3. current practice

5.3.1. simultaneous interpretation: origin at the Nuremberg trials; widely used; speech-to-speech

5.3.2. offline translation

5.3.3. artistic recreation (literary translation)

6. existing automatic systems

6.1. image-to-text apps for phones

6.2. speech-to-speech app for phones

6.3. word translation popups for browsers

6.4. text-to-text for web pages (browser plug-in or bookmarklet)

6.5. text-to-text using computer/human combo

6.5.1. full pages

6.5.2. splitting up and recombining

7. utility and simplifications

7.1. image-to-text

7.1.1. foreign scripts are hard even to look up if you don't know them

7.1.2. just words and translations are very useful

7.2. word-by-word translations

7.2.1. can get a sense of what something is about just from the words, without syntax

7.2.2. fairly easy, but still runs into the problem of ambiguities
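
A word-by-word gloss and the ambiguity problem it runs into can be sketched as below. The lexicon is a hypothetical toy dictionary invented for this example (it reuses the "leihen" vs. "borrow / lend" case from the challenges section), and the function names are likewise made up.

```python
# Hypothetical toy lexicon: German word -> candidate English translations.
LEXICON = {
    "ich": ["I"],
    "leihe": ["borrow", "lend"],  # "leihen" covers both directions
    "das": ["the", "that"],
    "buch": ["book"],
}

def gloss(sentence, lexicon=LEXICON):
    """Return (source word, candidate translations) for each word.
    Unknown words are passed through unchanged."""
    return [(word, lexicon.get(word.lower(), [word])) for word in sentence.split()]

def ambiguous_words(glossed):
    """List the source words that have more than one candidate translation."""
    return [word for word, options in glossed if len(options) > 1]

glossed = gloss("Ich leihe das Buch")
needs_context = ambiguous_words(glossed)
```

The gloss is enough to see that the sentence is about a book and a loan, but choosing between "borrow" and "lend" needs information that a plain dictionary lookup does not have.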

7.3. translations in an interactive context (e.g., travel)

7.3.1. travel translation and other interactive settings often allow for feedback

7.3.2. yes/no questions, please point to...

7.3.3. pictionaries, visual feedback

7.4. translations of legal and technical documents

7.4.1. high accuracy required

7.4.2. highly specialized and knowledgeable translators needed

7.4.3. may benefit from controlled language

7.5. artistic translations

7.5.1. often more recreations of the original work "in the spirit of"; Shakespeare (Richard the Third): "Now is the winter of our discontent / Made glorious summer by this son of York"; Schlegel: "Nun ward der Winter unsers Mißvergnügens / Glorreicher Sommer durch die Sonne Yorks"; sun / son (Sonne = sun)