Technical side of Watson, David Boloker
by Udi h Bauman
1. CTO, Emerging Technologies, IBM
2. About
3. Motivation
3.1. Offload more of the decision maker's tasks to the engine
4. Strategy
4.1. read large amounts of text
4.2. analyze subject-verb-object triples
4.3. build semantic network
4.4. find patterns
4.4.1. officials submit resignations (0.8)
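The notes do not say how the 0.8 attached to "officials submit resignations" is computed. One plausible reading is a relative frequency over extracted subject-verb-object triples. The sketch below assumes the triples have already been extracted (real SVO extraction needs a parser) and uses a hypothetical toy corpus chosen so the pattern gets 0.8. It illustrates the read, analyze, network, pattern steps, not Watson's actual code.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, already reduced to (subject, verb, object) triples.
TRIPLES = [
    ("official", "submit", "resignation"),
    ("official", "submit", "resignation"),
    ("official", "submit", "resignation"),
    ("official", "submit", "resignation"),
    ("official", "submit", "report"),
    ("student", "submit", "essay"),
]


def build_network(triples):
    """Semantic network as nested maps: subject -> verb -> Counter of objects."""
    net = defaultdict(lambda: defaultdict(Counter))
    for s, v, o in triples:
        net[s][v][o] += 1
    return net


def pattern_confidence(net, subject, verb, obj):
    """Share of (subject, verb, *) observations whose object is obj."""
    objects = net[subject][verb]
    total = sum(objects.values())
    return objects[obj] / total if total else 0.0


net = build_network(TRIPLES)
print(pattern_confidence(net, "official", "submit", "resignation"))  # 0.8
```

A frequency ratio is the simplest possible confidence; a real system would also weight sources and smooth rare patterns.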
5. Keyword matching isn't enough
5.1. need
5.1.1. temporal reasoning
5.1.2. statistical paraphrasing
5.1.3. geospatial reasoning
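A toy illustration of why keyword matching alone falls short and temporal reasoning helps: two passages match the same question words, so keyword overlap cannot separate them, while a date constraint can. The passages, candidate names, and pre-extracted years are hypothetical, and mapping "1990s" to the range 1990 to 1999 is done by hand here rather than parsed.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    candidate: str
    year: int  # hypothetical: year of the event, assumed already extracted


def keyword_score(question: str, passage: Passage) -> int:
    """Number of distinct question words that also occur in the passage."""
    return len(set(question.lower().split()) & set(passage.text.lower().split()))


def temporal_filter(passages, start: int, end: int):
    """Keep passages whose event year lies in [start, end]."""
    return [p for p in passages if start <= p.year <= end]


# Hypothetical toy data: both passages match the same keywords.
QUESTION = "who was mayor of springfield in the 1990s"
PASSAGES = [
    Passage("candidate a was mayor of springfield", "Candidate A", 1985),
    Passage("candidate b was mayor of springfield", "Candidate B", 1994),
]

for p in PASSAGES:
    print(p.candidate, keyword_score(QUESTION, p))  # a tie: 4 and 4
print([p.candidate for p in temporal_filter(PASSAGES, 1990, 1999)])  # ['Candidate B']
```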
6. Architecture
6.1. decompose the question
6.2. run many many searches
6.2.1. our data
6.2.1.1. 2 full-text search (FTS) engines
6.2.1.1.1. Lucene
6.2.1.1.2. Indri
6.3. generate hypotheses
6.4. evidence sourcing
6.4.1. get confidence on hypothesis
6.5. deep evidence scoring
6.6. synthesis
6.7. apply machine learning model to get final confidence
6.7.1. learning from its mistakes
6.7.2. training data is the archive of all historic Jeopardy! games
6.8. output answer & confidence
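A minimal end-to-end sketch of the flow in section 6, assuming a hypothetical three-document corpus in place of the real sources, a plain word-membership search in place of Lucene/Indri, and two toy evidence features. The final-confidence step is a logistic regression trained by stochastic gradient descent on hypothetical labeled examples standing in for the historic-games archive; the notes only say "a machine learning model", so logistic regression is an assumed stand-in, and the deep-scoring and synthesis steps are collapsed into one feature function.

```python
import math

# Hypothetical toy data standing in for Watson's sources and answer space.
CORPUS = {
    "doc1": "paris is the capital of france",
    "doc2": "lyon is a large city in france",
    "doc3": "paris hosts the louvre museum",
}
STOPWORDS = {"what", "is", "the", "of", "a", "in"}
CANDIDATES = {"paris", "lyon"}

# Hypothetical "historic games": (features, 1 if the hypothesis was correct).
TRAINING = [([2, 1.0], 1), ([1, 1.0], 1), ([1, 0.5], 0), ([2, 0.5], 0), ([1, 0.0], 0)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(examples, epochs=2000, lr=0.1):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def decompose(question):
    """Question decomposition: keep content words only."""
    return [t for t in question.lower().rstrip("?").split() if t not in STOPWORDS]


def search(terms):
    """Primary search: every document containing at least one term."""
    return {d: text for d, text in CORPUS.items() if any(t in text.split() for t in terms)}


def generate_hypotheses(hits):
    """Each known candidate mentioned in a hit becomes a hypothesis."""
    return {tok for text in hits.values() for tok in text.split() if tok in CANDIDATES}


def evidence_features(hyp, terms, hits):
    """Toy evidence scores: number of supporting passages, best term overlap."""
    support = [text.split() for text in hits.values() if hyp in text.split()]
    overlap = max(sum(t in toks for t in terms) for toks in support)
    return [len(support), overlap / len(terms)]


def rank_answers(question, w, b):
    """Score every hypothesis with the learned model; best first."""
    terms = decompose(question)
    hits = search(terms)
    scored = []
    for hyp in generate_hypotheses(hits):
        x = evidence_features(hyp, terms, hits)
        scored.append((sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b), hyp))
    return sorted(scored, reverse=True)


w, b = train_logistic(TRAINING)
ranked = rank_answers("What is the capital of France?", w, b)
for conf, hyp in ranked:
    print(f"{hyp}: {conf:.2f}")
```

Keeping the final model separate from the scorers means new evidence features can be added and the weights simply retrained on the archive, which fits the "learning from its mistakes" note.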
7. The architecture went through multiple iterations, each improving accuracy
7.1. until reaching 87%
8. HW
8.1. 2880 cores
8.1.1. highly parallel architecture
8.2. 90 servers
8.3. 16TB RAM
8.4. 20TB disk
9. Challenges
9.1. Real language is full of slang & metaphors
9.1.1. e.g., puns in the question
10. History
10.1. PIQUANT
10.2. OpenEphyra
11. Uses UIMA (Unstructured Information Management Architecture) for SOA
11.1. 100 annotators
11.1.1. written in
11.1.1.1. Java
11.1.1.2. Prolog
11.2. each reveals different kinds of features
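UIMA itself is a Java framework, and the sketch below is not its API. It is a Python illustration of the idea behind the notes: many independent annotators each add a different kind of feature to one shared analysis object (loosely analogous to UIMA's CAS). The two annotators and their regular expressions are hypothetical examples.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str
    begin: int
    end: int
    covered: str


@dataclass
class Document:  # loosely analogous to a UIMA CAS; not the real API
    text: str
    annotations: list = field(default_factory=list)


def year_annotator(doc):
    """Hypothetical annotator: marks four-digit years from 1000 to 2099."""
    for m in re.finditer(r"\b1[0-9]{3}\b|\b20[0-9]{2}\b", doc.text):
        doc.annotations.append(Annotation("Year", m.start(), m.end(), m.group()))


def capitalized_annotator(doc):
    """Hypothetical annotator: marks capitalized words as proper-noun candidates."""
    for m in re.finditer(r"\b[A-Z][a-z]+\b", doc.text):
        doc.annotations.append(Annotation("Capitalized", m.start(), m.end(), m.group()))


# Each annotator only reads the text and appends its own kind of feature.
PIPELINE = [year_annotator, capitalized_annotator]


def run(text):
    doc = Document(text)
    for annotator in PIPELINE:
        annotator(doc)
    return doc


doc = run("Lincoln was shot in Washington in 1865")
for a in doc.annotations:
    print(a.kind, a.covered)
```

Because annotators share only the document object, new ones (in any language, in UIMA's case) can be added without touching the others.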
12. High-level architecture
12.1. learn
12.1.1. ingest many sources
12.1.1.1. Wikipedia
12.1.1.2. YAGO2
12.1.1.3. DBpedia
12.1.1.4. WordNet
12.1.1.5. many more
12.1.2. store everything in an RDF store
12.2. Question analysis
12.3. Primary search
12.4. Shallow & deep scoring
12.5. Merging & ranking
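The "store in an RDF store" step can be pictured as a set of (subject, predicate, object) triples queried by pattern. This is a minimal in-memory sketch, assuming hypothetical `ex:` identifiers rather than real Wikipedia/YAGO2/DBpedia URIs; a production RDF store would add indexes and a query language such as SPARQL, whereas this handles a single triple pattern with `None` as a wildcard.

```python
class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples: set[tuple[str, str, str]] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Hypothetical facts, as if ingested from several sources.
store = TripleStore()
store.add("ex:Paris", "ex:capitalOf", "ex:France")
store.add("ex:Lyon", "ex:locatedIn", "ex:France")
store.add("ex:France", "ex:type", "ex:Country")

print(store.query(p="ex:capitalOf", o="ex:France"))
print(store.query(o="ex:France"))
```

Storing every source in one triple format is what lets a single query pattern pull evidence from all of them at once.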
13. Demo
13.1. Audience question
13.1.1. Country that has the largest solar dish?
13.1.1.1. Israel 24%
13.1.1.2. Negev <10%
13.2. Demo machine uses only 7 servers
13.2.1. takes about 20 seconds