Solr

Overview of Solr 1.4

1. Indexing

1.1. CSV, XML, JDBC

1.2. Streaming data from file system

1.3. Other document formats

1.3.1. Solr Cell

1.3.1.1. Thin Layer

1.3.1.2. Uses Apache Tika

1.3.1.2.1. Delegates to 3rd Party Libraries

1.3.1.2.2. Generates SAX events

1.3.1.2.3. Media Files (MP3, etc.)

1.4. POJOs

1.4.1. Embedded Solr?

2. Text Analysis

2.1. Extracting terms

2.1.1. Documents

2.1.2. Queries

2.1.3. Analyzer Chains for each

2.2. Tokenizer

2.3. Stemming

2.3.1. Reducing variants to one representation

2.4. Synonyms

2.4.1. Query Time

2.5. Index Time

2.5.1. Stems are indexed

2.5.2. Better scores for rare terms

2.5.3. Adding synonyms requires re-indexing

2.6. Stop Words

2.7. Phonetic Search

2.8. Wild Card

2.8.1. N-Gram

2.8.1.1. Minimum, Maximum

2.8.1.2. To, Ton, Toni, Tonig ...

2.8.1.3. Resource Intensive

2.8.2. Only Trailing
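
The N-Gram item above (minimum and maximum size, "To, Ton, Toni, Tonig ...") can be illustrated with a small sketch. This is plain Python, not Solr's own filter; the function name `edge_ngrams` and the sample term are assumptions chosen for illustration.

```python
def edge_ngrams(term, min_size, max_size):
    """Return the leading substrings of term with lengths min_size..max_size."""
    # Never ask for a gram longer than the term itself.
    upper = min(max_size, len(term))
    return [term[:n] for n in range(min_size, upper + 1)]


# Each gram is indexed as its own term, so a trailing-wildcard style
# lookup becomes an exact term match, at the cost of a larger index.
grams = edge_ngrams("tonight", 2, 5)
print(grams)  # ['to', 'ton', 'toni', 'tonig']
```

The extra terms per word are why the outline calls this approach resource intensive.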

3. Searching

3.1. Query Expressions

3.1.1. + Mandatory

3.1.2. - Prohibited

3.1.3. Optional

3.1.4. fieldname:xxx Field Qualifier

3.1.5. "xxx yyy" Phrases

3.1.6. "xxx yyy"~3 Term Proximity

3.1.7. Range Queries
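
As a rough sketch of how the query expressions above combine, the helper below assembles a query string from mandatory, prohibited, and optional clauses plus an optional phrase with proximity. The helper name `build_query` and the field names (`artist`, `year`) are hypothetical, and no escaping of special characters is attempted.

```python
def build_query(mandatory=(), prohibited=(), optional=(), phrase=None, slop=None):
    """Join clauses using the +, -, and "..."~N operators from the outline."""
    clauses = ["+" + clause for clause in mandatory]    # must match
    clauses += ["-" + clause for clause in prohibited]  # must not match
    clauses += list(optional)                           # may match
    if phrase is not None:
        quoted = '"' + phrase + '"'
        if slop is not None:
            quoted += "~" + str(slop)                   # term proximity
        clauses.append(quoted)
    return " ".join(clauses)


q = build_query(
    mandatory=["artist:pumpkins", "year:[1990 TO 1999]"],  # field qualifier, range
    prohibited=["live"],
    optional=["remastered"],
    phrase="smashing pumpkins",
    slop=3,
)
print(q)
```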

3.2. Boosting

3.2.1. < 1 Reduce

3.2.2. > 1 Increase

3.2.3. fieldname:xxx^2

3.2.4. Phrase Boosting

3.2.4.1. "Billy Joel" is higher

3.2.5. Boost Queries

3.2.5.1. Recent albums are higher

3.3. Filtering

3.3.1. Limit what the user sees

3.3.2. fq=...

3.3.3. Filter Queries are cached

3.3.4. Does not affect the score
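
A minimal sketch of the filtering items above: the user's text goes into `q`, while each restriction goes into its own `fq` parameter so it can be cached separately and leaves the score untouched. The function name and the field names (`type`, `in_stock`) are assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qs


def search_params(user_query, filters):
    """Build a URL query string with one q and one fq per filter."""
    params = [("q", user_query)]
    # Repeating fq keeps each filter independently cacheable.
    params += [("fq", f) for f in filters]
    return urlencode(params)


qs = search_params("billy joel", ["type:album", "in_stock:true"])
print(qs)
print(parse_qs(qs))
```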

3.4. Scoring

3.4.1. Relative, not Absolute measure

3.4.2. Term Frequency - tf

3.4.2.1. More frequent higher score

3.4.3. Inverse Document Frequency - idf

3.4.3.1. Rare term higher score

3.4.4. Coordination Factor

3.4.4.1. More terms match higher score

3.4.5. Field Length

3.4.5.1. The shorter the matched field, the higher the score

3.4.5.2. Smashing > Smashing Pumpkin
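
The scoring factors above can be sketched numerically. The formulas below follow the ones documented for Lucene's classic default similarity (square root of term frequency, logarithmic idf, inverse square root of field length). The full score also involves boosts and query normalization, which this sketch leaves out, so treat it as an approximation rather than the exact computation.

```python
import math


def tf(freq):
    """Term frequency factor: more occurrences, higher score."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms score higher."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def coord(matched_terms, query_terms):
    """Coordination factor: fraction of query terms that matched."""
    return matched_terms / query_terms


def length_norm(num_terms):
    """Field length factor: shorter fields score higher."""
    return 1.0 / math.sqrt(num_terms)


# "Smashing" (1 term) outranks "Smashing Pumpkin" (2 terms) for the query "smashing".
print(length_norm(1), length_norm(2))
# A term found in 1 of 1000 documents outweighs one found in 500 of 1000.
print(idf(1, 1000), idf(500, 1000))
```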

3.5. Faceting

3.5.1. Text

3.5.2. Dates

3.5.3. Queries

3.5.3.1. Create Ranges

3.5.3.1.1. [0-2], [2-4], [>4] ..

3.5.4. Sort by Count

3.5.5. Limit to x values

3.5.6. Power auto suggest
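
The faceting options above map onto request parameters roughly as follows. This sketch uses Solr's `facet`, `facet.field`, `facet.query`, `facet.limit`, and `facet.sort` parameters; the function name and the field names (`genre`, `rating`) are hypothetical. Note that square-bracket ranges are inclusive, so adjacent buckets such as `[0 TO 2]` and `[2 TO 4]` both count documents that sit exactly on the boundary.

```python
from urllib.parse import urlencode, parse_qs


def facet_params(text_field, range_field, ranges, limit):
    """Return request parameters for a field facet plus range facet queries."""
    params = [
        ("q", "*:*"),                 # match all documents
        ("facet", "true"),
        ("facet.field", text_field),  # text facet
        ("facet.limit", str(limit)),  # limit to x values
        ("facet.sort", "count"),      # sort by count
    ]
    for low, high in ranges:
        # One facet query per range bucket.
        params.append(("facet.query", "%s:[%s TO %s]" % (range_field, low, high)))
    return params


params = facet_params("genre", "rating", [(0, 2), (2, 4), (4, "*")], 10)
print(urlencode(params))
```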

4. Deployment

4.1. Search Handlers

4.1.1. For different user categories

4.1.1.1. Paying / Non-Paying

4.1.1.2. Regular / Admin

4.2. Cores

4.2.1. Rebuilding Indexes

4.2.1.1. Bulk updates of separate index

4.2.1.2. Searchers not impacted

4.2.1.3. SWAP to activate the new index

4.2.2. Testing Configuration Changes

4.2.3. Operations

4.2.3.1. SWAP

4.2.3.2. MERGE

4.2.3.3. RENAME
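
The rebuild-then-SWAP workflow above ends with a single CoreAdmin request. The sketch below only builds the URL. The host, port, and core names (`live`, `rebuild`) are assumptions, and the `/admin/cores` path assumes the admin path commonly configured in `solr.xml`.

```python
from urllib.parse import urlencode


def core_admin_url(base_url, action, **params):
    """Build a CoreAdmin URL such as ...?action=SWAP&core=...&other=..."""
    query = urlencode([("action", action)] + sorted(params.items()))
    return base_url.rstrip("/") + "/admin/cores?" + query


# After bulk-indexing into the "rebuild" core, swap it with the "live" core.
url = core_admin_url("http://localhost:8983/solr", "SWAP", core="live", other="rebuild")
print(url)
```

Searchers keep using the old index until the swap, which is why the rebuild does not impact them.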

4.3. Security

4.3.1. Standard - behind the firewall

4.3.2. WAR file - web.xml controls

4.3.3. AJAX requests

4.3.3.1. Search Handlers

4.3.4. Document Access

4.3.4.1. Use Faceting

4.3.4.1.1. Multi valued Facet for Role

4.3.4.1.2. Index document with appropriate Role values

4.3.4.1.3. Add facet to queries
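
One way to read the document access steps above is as a restriction on a multi-valued role field that the application appends to every query. The sketch below builds such a restriction as a filter expression. The field name `role`, the role values, and the choice to send it as a filter are all assumptions, and role names are assumed to be simple tokens that need no escaping.

```python
def role_filter(roles, field="role"):
    """Return an expression matching documents tagged with any of the user's roles."""
    if not roles:
        # Refuse to build an unrestricted query for a user with no roles.
        raise ValueError("user has no roles")
    return "%s:(%s)" % (field, " OR ".join(sorted(roles)))


restriction = role_filter(["editor", "admin"])
print(restriction)  # role:(admin OR editor)
```

The restriction has to be added on the server side; a browser issuing AJAX requests directly could simply leave it out.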

5. Integration

5.1. Embedded Solr

5.1.1. Streaming Local Content

5.1.2. Upgrading existing Lucene to Solr

5.2. JSON Support

5.2.1. Return results to Javascript
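
A short sketch of consuming a JSON response (requested with `wt=json`). The sample payload is hand-written for illustration and follows the usual `responseHeader` / `response.docs` layout; the document fields (`id`, `title`) are hypothetical. Browser-side JavaScript would receive the same structure.

```python
import json

# Hand-written sample payload, not captured from a real server.
sample_response = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "title": "Piano Man"},
      {"id": "2", "title": "The Stranger"}
    ]
  }
}
"""


def titles(raw):
    """Extract the title of every returned document."""
    body = json.loads(raw)
    return [doc["title"] for doc in body["response"]["docs"]]


print(titles(sample_response))
```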

6. Lucene

6.1. Library, not Server

6.2. Not a Web Crawler

6.3. Inverted Index

6.4. Scoring Algorithm

6.5. Text Analyzers

6.6. Query Syntax

6.7. Highlighter
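
The inverted index listed above is the core data structure: a map from each term to the documents that contain it. The toy version below uses whitespace tokenization and lowercasing only. It is a conceptual sketch, not how Lucene stores its index on disk, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {1: "Smashing Pumpkins", 2: "Smashing hits", 3: "Billy Joel"}
index = build_inverted_index(docs)

# Looking up a term is a single dictionary access, not a scan of every document.
print(sorted(index["smashing"]))  # [1, 2]
```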

7. Other Features

7.1. Serverization of Lucene

7.2. HTTP based API

7.3. Caching of results

7.4. Spell Checker

7.5. More Like This

7.6. Highlighting

8. Schema Design

8.1. Single vs. Multiple Index

8.1.1. Multiple entities

8.1.2. Pros

8.1.2.1. Easier to search across entities

8.1.2.2. Easy to maintain

8.1.3. Cons

8.1.3.1. Namespace Collision

8.1.3.1.1. Add prefix to each field

8.1.3.2. Lower Quality for rare terms

8.1.3.2.1. Term may not be rare across entities

8.1.3.3. Slower performance

8.1.3.3.1. Fuzzy, prefix, Wildcard Searches

8.1.3.3.2. Frequent commits to a single entity invalidate the entire cache

8.2. Design

8.2.1. Determine common searches

8.2.2. Identify the key entities

8.2.3. Denormalize related entities

8.2.3.1. Use multi-valued fields

8.2.4. Omit fields that are only used to display

8.2.4.1. Optional - Performance Hit

9. Scaling

9.1. Number of Users

9.1.1. Multiple Servers

9.1.1.1. Master Slave support

9.1.1.1.1. Index into the Master

9.1.1.1.2. Solr Replicates

9.1.1.1.3. External Load Balancer

9.1.1.2. New node

9.1.2. Single Server

9.1.2.1. Caching

9.1.2.1.1. Leverage HTTP Caching

9.1.2.1.2. External Cache (e.g. Squid)

9.1.2.1.3. Optimization

9.1.2.2. Schema Design

9.1.2.2.1. Index only essential fields

9.1.2.2.2. Store only essential fields

9.1.2.3. Indexing Strategy

9.1.2.3.1. Tune the Batch Size

9.1.2.4. Enable Term Vectors

9.1.2.4.1. List of terms extracted from a field

9.1.2.4.2. Faster Retrieval - precomputation

9.1.2.4.3. More Like This

9.1.2.4.4. Highlighting

9.1.2.5. Shingling

9.1.2.5.1. Faster Phrase Search

9.1.2.5.2. "quick brown", "lazy dog" ...

9.1.2.5.3. Fields where phrase search is common
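
Shingling, as listed above, indexes adjacent word pairs as single terms so that a phrase search becomes a term lookup. A minimal sketch, with a hypothetical `shingles` helper and simple whitespace tokenization:

```python
def shingles(tokens, size=2):
    """Return every run of `size` adjacent tokens joined into one term."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = "the quick brown fox".split()
print(shingles(tokens))  # ['the quick', 'quick brown', 'brown fox']
```

The index grows with every extra shingle, which is why the outline limits this to fields where phrase search is common.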

9.2. Amount of Data

9.2.1. Sharding Support

9.2.1.1. Solr aggregates results from shards

9.2.1.2. Need to manage document to shard assignment
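
Because the application has to decide which shard receives each document, a stable hash of the document id is one common approach. The sketch below uses CRC32 for that purpose and also builds the comma-separated `shards` parameter for a distributed query. The host names are placeholders, and the hashing scheme is an assumption rather than something the outline prescribes.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard index from a stable hash of the document id."""
    # crc32 gives the same value on every run, unlike Python's built-in hash().
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shards_param(hosts):
    """Build the value of the shards request parameter."""
    return ",".join(hosts)


hosts = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]
target = hosts[shard_for("doc-42", len(hosts))]
print(target)
print(shards_param(hosts))
```

Changing the number of shards changes the assignment for most ids, so a scheme like this implies re-indexing when shards are added.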