
Intro to NoSQL & Cassandra, Ran Tavori





NoSQL means "Not Only SQL", not "No SQL"

SQL is Good

SQL DBs implement ACID


In Internet Scale

SQL gets too expensive

usually Hardware costs

Internet changed the scale of applications & introduced the need, previous corporate apps didn't have such scale

Social Apps - not banks

high read/write rates

frequent schema changes

growth is usually non-linear

no need for the same level of ACID, e.g., a Twitter feed is not always consistent


Facebook still has sharded MySQL for main storage, but also Cassandra

Twitter too, but also Cassandra

Google's main storage is BigTable, but also some sharded MySQL

Scaling solutions


Master & slaves, scales Reads

problem in consistency, due to replica synchronization, similar problem in caching


scales also Writes

makes you "sharding slaves", resharding all the time, causes pain - lots of maintenance

you lose some of SQL's features, joins, sorting, grouping &c
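
Why resharding hurts can be sketched in a few lines. This is only an illustration, assuming the simplest scheme (a stable hash of the key modulo the shard count); the function names and the `user:` key format are made up for the example, and real deployments may place data differently.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard: stable md5 hash of the key, modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that land on a different shard after the count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical keys, for illustration only.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys move when going from 4 to 5 shards")
```

With plain modulo placement, adding a single shard relocates most of the keys, which is the maintenance pain noted above.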

Brewer's CAP theorem

Berkeley professor & co-founder of Inktomi

needed to deal with



partition tolerance, tolerating disconnection between nodes

& found that

you can only choose 2

as apps get larger, partition tolerance is a must

mean time between failures is large

so you need to choose between consistency & availability

A + C (no partition tolerance)

Master server, MySQL

C + P

not always available, unavailable some of the time

A + P

tolerate inconsistencies

e.g., BigTable, Dynamo, Cassandra

Consistency levels




levels, Causal, Read your writes, Monotonic

eventually the data will be in all replicas properly

NoSQL examples


to some extent not A





scaling is awful, but nice features


similar to Cassandra

much more difficult to manage


Developed at Facebook

Data model of BigTable

Network mgmt modelled on Dynamo

Eventual Consistency

2 developers that worked on Amazon's Dynamo, & didn't like it there

moved to Facebook

Implemented in Java

beautiful implementation


tens to thousands of nodes

useful for lots of nodes, not few

"Down to earth" Consistency


N, number of replicas, for any data item

W, number of nodes a write operation blocks on

R, number of nodes a read operation blocks on

typical values

W=1, blocks until the 1st node is written successfully, W=N, blocks until all nodes are written successfully, W=0, async writes

R=1, blocks until the 1st node returns an answer, R=N, blocks until all nodes return an answer, R=0, doesn't make sense

Quorum, R=N/2+1, W=N/2+1, you write to all replicas but wait for just W acks, fully consistent

N defined on server

different R/W setups on different column-families, API calls, clients &c
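
The N/W/R settings above can be checked with a little arithmetic. The sketch below assumes the usual Dynamo-style overlap rule, R + W > N (every read set shares at least one replica with every write set), which these notes do not state explicitly; the function names are made up for illustration.

```python
def quorum(n: int) -> int:
    """Quorum size as given in the notes: N/2 + 1, using integer division."""
    return n // 2 + 1


def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when every set of R replicas shares a node with every set of W replicas."""
    return r + w > n


n = 3
for r, w in [(1, 1), (quorum(n), quorum(n)), (1, n), (n, 1)]:
    verdict = "consistent reads" if read_overlaps_write(n, r, w) else "may read stale data"
    print(f"N={n} R={r} W={w}: {verdict}")
```

Under this rule, quorum reads with quorum writes always overlap, as do R=1 with W=N and R=N with W=1, while R=1 with W=1 trades consistency for latency.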

Data model

Forget SQL

or any query language; you can integrate with text search (Lucandra): you search by text, get an id & then continue; in BigTable they added indices as secondary keys, which will be supported in Cassandra soon

or any grouping/aggregation


modelled after BigTable

difference from Key-Value, you can get/set just some columns

scales really well

denormalization is a must

client responsibility to write all, not transactional

works well with their disk usage


Keyspace, like namespaces for unique keys, schema

Column Family, very much like a table (rows & columns), but not quite: a sparse array

Key, a key that represents a row (of columns); search is always by key, so you must model your data according to predicted usage; if use-cases change, you're stuck, but there are some solutions

Column, represents a value with: column name, value, timestamp

Super Column, column that holds list of columns inside
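
The nesting described above (keyspace, column family, row key, columns) can be pictured as nested dictionaries. This is a toy in-memory sketch, not Cassandra code: the `Column` class, the `insert` helper and the sample rows are all made up for illustration, and keeping the column with the newer timestamp on conflict is an assumption about how timestamps are used that goes beyond these notes.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the writer


# keyspace -> column family -> row key -> {column name: Column}
keyspace = {"Users": {}}


def insert(column_family: str, row_key: str, column: Column) -> None:
    """Store a column; on conflict keep the one with the newer timestamp."""
    row = keyspace[column_family].setdefault(row_key, {})
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


insert("Users", "ran", Column("email", "ran@example.com", 2))
insert("Users", "ran", Column("city", "Tel Aviv", 2))
insert("Users", "dana", Column("email", "dana@example.com", 2))  # sparse: no "city" column
insert("Users", "ran", Column("email", "old@example.com", 1))    # older timestamp, ignored
```

Rows in the same column family need not share columns, which is what makes it a sparse array rather than a table.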


programmatic not declarative

supports many languages



get_slice, some columns

multi_get, saves round-trip



get_range_slice, slice over rows & columns
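
What these calls return can be pictured with a toy in-memory version. The function names are borrowed from the notes, but the signatures and sample data below are invented for illustration and are not the real Thrift API; the range example also assumes rows can be walked in sorted key order.

```python
# Hypothetical rows of one column family: row key -> {column name: value}
rows = {
    "alice": {"age": "31", "city": "Haifa", "email": "alice@example.com"},
    "bob": {"email": "bob@example.com"},
    "carol": {"age": "27", "email": "carol@example.com"},
}


def get_slice(key, column_names):
    """Return only the requested columns of one row; missing columns are skipped."""
    row = rows.get(key, {})
    return {name: row[name] for name in column_names if name in row}


def multi_get(keys, column_names):
    """Fetch the same columns for several rows in one call, saving round-trips."""
    return {key: get_slice(key, column_names) for key in keys}


def get_range_slice(start_key, end_key, column_names):
    """Slice over rows start_key..end_key (inclusive, sorted order) and over columns."""
    return {
        key: get_slice(key, column_names)
        for key in sorted(rows)
        if start_key <= key <= end_key
    }


print(get_slice("bob", ["age", "email"]))
print(multi_get(["alice", "carol"], ["age"]))
print(get_range_slice("alice", "bob", ["email"]))
```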






also meta-api

e.g., describe ring peers

not convenient

to say the least

written in Thrift

language/compiler developed at Facebook

takes .idl & generates implementation in different languages, even Erlang

Map/Reduce also supported, in an extension, not part of the API

using Hadoop

which takes the data from Cassandra

You usually

If Key-Value works, use it

Else, if Column-based works, use it

Else, if Document-oriented works, use it

Else, use SQL
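
The rule of thumb above reads as a simple fall-through. A minimal sketch, with a made-up function name and boolean inputs standing in for your own judgment of what "works":

```python
def choose_store(key_value_works: bool, column_works: bool, document_works: bool) -> str:
    """Walk the preference order from the notes and return the first store type that fits."""
    if key_value_works:
        return "key-value"
    if column_works:
        return "column-based"
    if document_works:
        return "document-oriented"
    return "SQL"


print(choose_store(key_value_works=False, column_works=True, document_works=True))
```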

Your main consideration

How much Data will you have

Unfortunately, it's hard to know in advance.

SQL over NoSQL



also SQL vendors offering scaling similar to NoSQL

e.g., Sybase IQ, and others

Eventually data-store solutions will merge

RDBMS will offer NoSQL features

to enable scale-out

NoSQL will offer RDBMS features

to enable better features