Solr

Indexing

CSV, XML, JDBC

Streaming data from file system

Other document formats

Solr Cell, Thin Layer, Uses Apache Tika, Delegates to 3rd Party Libraries, Generates SAX events, Media Files - mp3 etc,

POJOs

Embedded Solr?

Text Analysis

Extracting terms

Documents

Queries

Analyzer Chains for each

Tokenizer

Stemming

Reducing variants to one representation

Synonyms

Query Time

Index Time

Stems are indexed

Better scores for rare terms

Adding synonyms requires re-indexing

Stop Words
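
The analyzer chain described above (tokenizer, stop words, stemming) can be sketched in a few lines. This is a toy illustration, not Solr's implementation: the stop-word list and the suffix-stripping `toy_stem` function are assumptions made for the example, and a real chain would use a proper stemmer.

```python
import re

# Illustrative stop-word list (an assumption, not Solr's default list).
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}

def tokenize(text):
    """Split on non-alphanumeric characters and lowercase each token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def remove_stop_words(tokens):
    """Drop tokens that carry little meaning for search."""
    return [t for t in tokens if t not in STOP_WORDS]

def toy_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Run the full chain: tokenize, remove stop words, stem."""
    return [toy_stem(t) for t in remove_stop_words(tokenize(text))]

print(analyze("The Smashing Pumpkins"))  # ['smash', 'pumpkin']
```

Running the same chain over documents and queries is what lets a query for "pumpkin" match a document containing "Pumpkins".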

Phonetic Search

Wild Card

N-Gram, Minimum, Maximum, To, Ton, Toni, Tonig ..., Resource Intensive

Only Trailing
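
The N-Gram entry above lists terms that look like prefix (edge) n-grams of a word such as "Tonight". A minimal sketch of how those terms could be generated follows; the function name and the minimum/maximum values are chosen for illustration only.

```python
def edge_ngrams(term, min_gram, max_gram):
    """Return the prefixes of `term` whose length is between min_gram and max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("Tonight", 2, 5))  # ['To', 'Ton', 'Toni', 'Tonig']
```

Every prefix becomes its own indexed term, which is why the approach is resource intensive: the index grows with the number of prefixes per word.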

Lucene

Library, not Server

Not a Web Crawler

Inverted Index

Scoring Algorithm

Text Analyzers

Query Syntax

Highlighter
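
The inverted index at the heart of Lucene maps each term to the documents that contain it. The sketch below shows the idea with plain Python dictionaries; it is a simplified illustration (whitespace tokenization, no positions or scoring), and the sample documents are made up.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "Piano Man", 2: "The Piano Lesson", 3: "Rocket Man"}
index = build_inverted_index(docs)
print(search_all(index, "piano man"))  # {1}
```

Looking up a term is a dictionary access rather than a scan over every document, which is what makes full-text search fast.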

Other Features

Serverization of Solr

HTTP based API

Caching of results

Spell Checker

More Like This

Highlighting

Schema Design

Single vs. Multiple Index

Multiple entities

Pros, Easier to search across entities, Easy to maintain

Cons, Namespace Collision, Add prefix to each field, Lower Quality for rare terms, Term may not be rare across entities, Slower performance, Fuzzy, prefix, Wildcard Searches, Frequent commits to a single entity invalidate the entire cache

Design

Determine common searches

Identify the key entities

Denormalize related entities, Use multi-valued fields

Omit fields that are only used to display, Optional - Performance Hit
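
Denormalizing related entities into one document with multi-valued fields might look like the sketch below. The entity shapes and field names (`artist_name`, `album_title`, `album_year`) are hypothetical; the prefixes follow the "add prefix to each field" advice for avoiding namespace collisions.

```python
def denormalize_artist(artist, albums):
    """Flatten an artist and its albums into one index document.

    Album attributes become multi-valued fields (lists) on the artist document.
    """
    return {
        "id": f"artist-{artist['id']}",
        "type": "artist",
        "artist_name": artist["name"],
        "album_title": [a["title"] for a in albums],
        "album_year": [a["year"] for a in albums],
    }

doc = denormalize_artist(
    {"id": 7, "name": "Billy Joel"},
    [{"title": "The Stranger", "year": 1977}, {"title": "52nd Street", "year": 1978}],
)
print(doc["album_title"])  # ['The Stranger', '52nd Street']
```

The trade-off is that the pairing between a given title and its year is lost once both are flattened into separate lists.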

Searching

Query Expressions

+ Mandatory

- Prohibited

Optional

fieldname:xxx Field Qualifier

"xxx yyy" Phrases

"xxx yyy"~3 Term Proximity

Range Queries
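
The query operators listed above can be combined into a single query string. The helpers below are a small sketch for composing such strings; the helper names and field names are invented for the example, and no escaping of special characters is attempted.

```python
def mandatory(expr):
    return f"+{expr}"

def prohibited(expr):
    return f"-{expr}"

def field(name, expr):
    return f"{name}:{expr}"

def phrase(text):
    return f'"{text}"'

def proximity(text, slop):
    """Phrase whose terms may be up to `slop` positions apart."""
    return f'"{text}"~{slop}'

def value_range(name, low, high):
    """Inclusive range query on a field."""
    return f"{name}:[{low} TO {high}]"

query = " ".join([
    mandatory(field("artist_name", phrase("Billy Joel"))),
    prohibited(field("type", "single")),
    value_range("album_year", 1975, 1985),
    proximity("piano man", 3),
])
print(query)
# +artist_name:"Billy Joel" -type:single album_year:[1975 TO 1985] "piano man"~3
```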

Boosting

< 1 Reduce

> 1 Increase

fieldname:xxx^2

Phrase Boosting, "Billy Joel" is higher

Boost Queries, Recent albums are higher
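
With the DisMax query parser, phrase boosting and boost queries are usually expressed through the `pf` and `bq` request parameters. The sketch below assembles such a request; the field names, boost factors, and the cut-off year are assumptions picked for illustration.

```python
from urllib.parse import urlencode

def dismax_params(user_query):
    """Request parameters for a DisMax search with phrase and recency boosts."""
    return [
        ("defType", "dismax"),
        ("q", user_query),
        # Fields to search, with the artist name weighted more heavily.
        ("qf", "artist_name^2 album_title"),
        # Phrase boost: documents containing the whole phrase rank higher.
        ("pf", "artist_name^3"),
        # Boost query: recent albums rank higher (hypothetical cut-off year).
        ("bq", "album_year:[2008 TO *]^1.5"),
    ]

query_string = urlencode(dismax_params("Billy Joel"))
print(query_string)
```

A boost above 1 raises the contribution of a clause and a boost below 1 lowers it, matching the rule of thumb in the list above.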

Filtering

Limit what the user sees

fq=...

Filter Queries are cached

Does not affect the score
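
Filter queries travel as separate `fq` parameters next to the main `q` parameter. A minimal sketch of building such a request URL follows; the host, core name, and field names are placeholders, not values from the original.

```python
from urllib.parse import urlencode

def select_url(base_url, core, query, filters=(), rows=10):
    """Build a select URL with one fq parameter per filter."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"

url = select_url(
    "http://localhost:8983",      # placeholder host
    "music",                      # hypothetical core name
    "piano",
    filters=["type:album", "in_stock:true"],
)
print(url)
```

Because each filter is cached on its own and does not affect the score, it tends to pay off to keep reusable restrictions in `fq` rather than folding them into `q`.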

Scoring

Relative, not Absolute measure

Term Frequency - tf, More frequent higher score

Inverse Document Frequency - idf, Rare term higher score

Coordination Factor, More terms match higher score

Field Length, Shorter matching field higher score, Smashing > Smashing Pumpkins
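
The four scoring factors above can be combined in a toy scoring function. The formulas are loosely modeled on Lucene's classic TF-IDF similarity but are simplified (no query normalization, no boosts), so treat this as a sketch of the idea rather than the actual scoring code. The document frequencies are made up.

```python
import math

def score(query_terms, field_terms, doc_freq, num_docs):
    """Toy relevance score for one field of one document."""
    if not query_terms or not field_terms:
        return 0.0
    # Field length: shorter fields get a larger norm.
    length_norm = 1.0 / math.sqrt(len(field_terms))
    matched = 0
    total = 0.0
    for term in query_terms:
        freq = field_terms.count(term)
        if freq == 0:
            continue
        matched += 1
        tf = math.sqrt(freq)  # term frequency
        idf = 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))  # rarity
        total += tf * idf * idf * length_norm
    # Coordination factor: fraction of query terms that matched.
    coord = matched / len(query_terms)
    return coord * total

doc_freq = {"smashing": 5, "pumpkins": 2, "the": 90}
short = score(["smashing"], ["smashing"], doc_freq, 100)
longer = score(["smashing"], ["smashing", "pumpkins"], doc_freq, 100)
print(short > longer)  # True
```

The numbers only make sense relative to each other within one query, which is the "relative, not absolute" point above.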

Faceting

Text

Dates

Queries, Create Ranges, [0-2], [2-4], [>4] ..

Sort by Count

Limit to x values

Power auto suggest
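
Faceting is switched on through request parameters. The sketch below builds a parameter list with a field facet, range buckets expressed as `facet.query` entries, sorting by count, and a value limit; the field names (`genre`, `rating`) and bucket boundaries are illustrative assumptions.

```python
def facet_params(field, ranges, limit=10):
    """Parameters for a facet-only request (rows=0 returns no documents)."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", "count"),
        ("facet.limit", str(limit)),
    ]
    # One facet.query per range bucket.
    for range_field, low, high in ranges:
        params.append(("facet.query", f"{range_field}:[{low} TO {high}]"))
    return params

params = facet_params(
    "genre",
    [("rating", 0, 2), ("rating", 2, 4), ("rating", 4, "*")],
    limit=5,
)
for name, value in params:
    print(f"{name}={value}")
```

Square brackets make both bounds inclusive, so a rating of exactly 2 would be counted in the first two buckets here; adjust the boundaries if the buckets need to be disjoint.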

Deployment

Search Handlers

For different user categories, Paying / Non-Paying, Regular / Admin

Cores

Rebuilding Indexes, Bulk updates of separate index, Searchers not impacted, SWAP to activate the new index

Testing Configuration Changes

Operations, SWAP, MERGE, RENAME
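
Core operations such as SWAP are issued as HTTP requests to the core admin handler. The sketch below only builds the URL; the host and the core names `live` and `rebuild` are placeholders for this example.

```python
from urllib.parse import urlencode

def core_admin_url(base_url, action, **params):
    """Build a core admin URL for the given action and its parameters."""
    query = urlencode([("action", action)] + sorted(params.items()))
    return f"{base_url}/solr/admin/cores?{query}"

# Rebuild into a separate core, then swap it with the live one.
swap_url = core_admin_url("http://localhost:8983", "SWAP", core="live", other="rebuild")
print(swap_url)
# http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild
```

Searchers keep hitting the old index while the bulk update runs in the other core, and the swap makes the new index active in one step.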

Security

Standard - behind the firewall

WAR file - web.xml controls

AJAX requests, Search Handlers

Document Access, Use Faceting, Multi valued Facet for Role, Index document with appropriate Role values, Add facet to queries
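
One way to apply the role-based document access described above is to index each document with a multi-valued role field and add a restriction on that field to every query. The sketch below builds such a restriction; the field name `role` and the choice to pass it as a filter query are assumptions, and role names are not escaped.

```python
def role_filter(user_roles, field="role"):
    """Build a filter matching documents tagged with any of the user's roles."""
    if not user_roles:
        raise ValueError("user has no roles; refuse to build an open filter")
    quoted = " OR ".join(f'"{r}"' for r in sorted(user_roles))
    return f"{field}:({quoted})"

print(role_filter({"editor", "admin"}))  # role:("admin" OR "editor")
```

The filter must be added on the server side, since a client that can edit its own request could otherwise drop it.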

Integration

Embedded Solr

Streaming Local Content

Upgrading existing Lucene to Solr

JSON Support

Return results to Javascript
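
A JSON response carries a header plus a `response` object with the hit count and the matching documents. The sketch below parses a hand-written sample of that shape; the documents in it are illustrative, not real query output.

```python
import json

# Hand-written sample in the shape of a JSON search response.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "album-1", "album_title": "The Stranger"},
      {"id": "album-2", "album_title": "52nd Street"}
    ]
  }
}
"""

def extract_docs(raw_json):
    """Return (total hit count, list of documents) from a JSON response."""
    body = json.loads(raw_json)
    if body["responseHeader"]["status"] != 0:
        raise RuntimeError("search server reported an error")
    return body["response"]["numFound"], body["response"]["docs"]

total, docs = extract_docs(SAMPLE_RESPONSE)
print(total, [d["album_title"] for d in docs])  # 2 ['The Stranger', '52nd Street']
```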

Scaling

# Users

Multiple Servers, Master Slave support, Index into the Master, Solr Replicates, External Load Balancer, New node

Single Server

Caching, Leverage HTTP Caching, External Cache (e.g. Squid), Optimization, Increase Hit Ratio, Lower Eviction Rate

Schema Design, Index only essential fields, Store only essential fields

Indexing Strategy, Tune the Batch Size, 10 for Large Docs, 100 for Small Docs

Enable Term Vectors, List of terms extracted from a field, Faster Retrieval - precomputation, More Like This, Highlighting

Shingling, Faster Phrase Search, "quick brown", "lazy dog" ..., Fields where phrase search is common, Title
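
Two of the single-server techniques above, batched indexing and shingling, are easy to illustrate. The sketch below is a simplified stand-in for what an indexing client and a shingle filter would do; the function names are invented for the example.

```python
def batches(docs, size):
    """Yield consecutive slices of `docs`, each holding at most `size` items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

def shingles(text, size=2):
    """Return word n-grams (shingles) so a phrase can be matched as one term."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

print([len(b) for b in batches(list(range(250)), 100)])  # [100, 100, 50]
print(shingles("The quick brown fox"))  # ['the quick', 'quick brown', 'brown fox']
```

Shingles trade index size for speed, which is why the list above restricts them to fields such as the title where phrase search is common.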

Amount of Data

Sharding Support, Solr aggregates results from shards, Need to manage document to shard assignment
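
Since the document-to-shard assignment has to be managed by the indexing side, one common approach is to hash the document id. The sketch below shows that approach under the assumption of a fixed number of shards; it is one possible scheme, not something the original prescribes.

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Map a document id to a shard number in the range 0..num_shards-1."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

for doc_id in ("album-1", "album-2", "album-3"):
    print(doc_id, "-> shard", shard_for(doc_id, 4))
```

The same id always lands on the same shard, so updates and deletes reach the right index. Changing the shard count reassigns most documents, so it generally means re-indexing.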