
1. Untitled
2. first stage: naive implementation
2.1. single threaded
2.1.1. XML parser
2.1.1.1. parse wiki XML dump into pages
2.1.2. XML consumer
2.1.2.1. parse pages
2.1.2.2. put into hash map
2.2. roughly 105s
3. second stage: enhanced with producer-consumer design pattern
3.1. multiple consumer/producer?
3.1.1. scenario
3.1.1.1. parser takes roughly 10s
3.1.1.2. counter takes roughly 95 s
3.1.2. conclusion
3.1.2.1. single parser, multiple counter
3.2. implementation
3.2.1. parser/counter queue with ArrayBlockingQueue
3.2.2. synchronized
3.2.2.1. naive implementation
3.2.2.1.1. multiple counter shared counter locked by ReentrantLock
3.2.2.2. problem of naive implementation
3.2.2.2.1. Untitled
3.2.2.3. improved implementation
3.2.2.3.1. ConcurrentHashMap
3.2.2.3.2. mechanism
3.2.2.3.3. Untitled
4. third stage: batch ConcurrentHashMap merge
4.1. strategy
4.1.1. each counter has a local counter
4.1.2. when counting frequencies with in a page, update local counter
4.1.3. after counting frequencies for a page, merge it into global counter
4.2. result
4.2.1. Untitled