Entity Resolution in Plnnr, Imri Goldberg

about

founder @ Plnnr.com

agenda

What is it

Possible solutions

...

The problem

They collect lots of data

Need to resolve many representations of the same entity

Plnnr needs to aggregate info on tourist attractions

Complications

Languages

Duplications

Missing information in some sources

Need to allow manual correction

Process must be repeatable & deterministic

They harvest data all the time

Solution

DB holds a table of POIs

Point of Interest, their entity class

Each source POI points to its combined version

Representations

Algorithm

Create a graph of the entity & its representations
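
The notes do not say how the graph is processed. One common reading, sketched here as an assumption and not as Plnnr's actual algorithm, is to treat each representation as a node, each accepted match as an edge, and each connected component as one entity. The ids and the `resolve_entities` helper are hypothetical.

```python
from collections import defaultdict

def resolve_entities(representations, matches):
    """Group representation ids into entities via connected components.

    representations: iterable of hashable, sortable ids (hypothetical)
    matches: iterable of (id_a, id_b) pairs judged to be the same entity
    """
    parent = {r: r for r in representations}

    def find(x):
        # Path halving: shorten the chain to the root while walking it.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so the grouping does not depend
            # on the order in which matches arrive.
            small, large = sorted((root_a, root_b))
            parent[large] = small

    groups = defaultdict(list)
    for r in sorted(parent):
        groups[find(r)].append(r)
    return sorted(groups.values())

reps = ["src_a:1", "src_b:7", "src_c:3", "src_a:2"]
edges = [("src_a:1", "src_b:7"), ("src_b:7", "src_c:3")]
entities = resolve_entities(reps, edges)
```

Sorting ids before grouping is one way to keep the output repeatable between harvesting runs.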

Entity resolution in general

Many use cases

When working with TV series, many formats & representations, & no identity standard

eLibrary, OS project

Delver, needed to resolve people

Properties of the problem

Single or multiple entity types

Are there standard/strong identifiers

Do entities have relations between them

Entity resolution across

data sources

time

Conflicting versions (across sources or time)

Show all or just most common version

Because of these properties, there is no single silver-bullet solution

Design goals

Quality results

Repeatable & deterministic

Reasonably fast

DRY code

Possible design decisions

Relational or Schema-less DB

Fragments (source entities) in the same table as combined entity

Combine only source entities, or use past results
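
As an illustration of the "same table" option, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are made up for the example; the notes only say that fragments and combined entities can share a table and that each source POI points to its combined version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE poi (
        slug     TEXT PRIMARY KEY,          -- stable key, not an autoincrement
        source   TEXT NOT NULL,             -- 'combined' or a data-source name
        name     TEXT,
        combined TEXT REFERENCES poi(slug)  -- NULL for combined rows
    )
""")

# Hypothetical sample data: one combined entity and two source fragments.
rows = [
    ("combined/eiffel-tower", "combined", "Eiffel Tower", None),
    ("source_a/eiffel-tower", "source_a", "Eiffel Tower", "combined/eiffel-tower"),
    ("source_b/tour-eiffel", "source_b", "Tour Eiffel", "combined/eiffel-tower"),
]
conn.executemany("INSERT INTO poi VALUES (?, ?, ?, ?)", rows)

def fragments_of(conn, combined_slug):
    """Return the slugs of all source fragments linked to a combined entity."""
    cur = conn.execute(
        "SELECT slug FROM poi WHERE combined = ? ORDER BY slug",
        (combined_slug,),
    )
    return [row[0] for row in cur]
```

Keeping both kinds of row in one table means the same queries and code paths serve fragments and combined entities, which fits the DRY goal; a separate fragments table would be the alternative decision.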

Relevant design patterns

Actual data needs to be explicit

priorities between data-sources, to resolve conflicts
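
A minimal sketch of explicit source priorities, assuming each fragment is a dict with a `source` name and a `data` dict. The source names and the `merge_fragments` helper are hypothetical; the notes only state that priorities between data-sources resolve conflicts.

```python
# Hypothetical priority order: earlier names win conflicts.
SOURCE_PRIORITY = ["editor", "source_a", "source_b"]

def merge_fragments(fragments, priority=SOURCE_PRIORITY):
    """Combine fragments field by field; the highest-priority source wins.

    Missing (None) values never win, so a lower-priority source can still
    fill a gap left by a higher-priority one.
    """
    rank = {name: i for i, name in enumerate(priority)}
    # Unknown sources sort last; the source name breaks ties deterministically.
    ordered = sorted(
        fragments,
        key=lambda f: (rank.get(f["source"], len(rank)), f["source"]),
    )
    combined = {}
    for frag in ordered:
        for field, value in frag["data"].items():
            if value is not None and field not in combined:
                combined[field] = value
    return combined

fragments = [
    {"source": "source_b", "data": {"name": "Tour Eiffel", "phone": "123"}},
    {"source": "source_a", "data": {"name": "Eiffel Tower", "phone": None}},
]
merged = merge_fragments(fragments)
```

Here `name` comes from `source_a` because it ranks higher, while `phone` falls through to `source_b` because `source_a` has no value for it.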

Always use slugs, or UUIDs at the least

Auto-incremented IDs as foreign keys are meaningless & problematic when regenerating the DB
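
One way to get keys that survive a DB regeneration, sketched under assumptions: derive a slug from the name, or a name-based UUID (version 5) from the source and the source's own key. The namespace string and both helpers are hypothetical. A random UUID (version 4) would not be reproduced on a rebuild, which is why the sketch uses the name-based variant.

```python
import re
import unicodedata
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "poi.example.invalid")

def slugify(text):
    """Lowercase ASCII slug: strip accents, collapse other characters to '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def stable_id(source, source_key):
    """Same (source, key) pair always yields the same UUID."""
    return uuid.uuid5(NAMESPACE, f"{source}/{source_key}")
```

Because both functions depend only on their inputs, regenerating the database from the same source data yields the same keys, so foreign keys stay valid.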

Data-sources for everything

Avoid n^2 algorithms by taking advantage of data properties

hurts when there are 10K such operations

e.g., use coordinates to distinguish entities
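
A sketch of how coordinates can cut down the number of comparisons, assuming each POI is an `(id, lat, lon)` tuple. POIs are bucketed into grid cells and only POIs in the same or neighbouring cells become candidate pairs. The cell size and the `candidate_pairs` helper are assumptions for illustration, not taken from the talk.

```python
import math
from collections import defaultdict

def candidate_pairs(pois, cell_deg=0.01):
    """Return sorted (id_a, id_b) pairs whose POIs share or border a grid cell.

    pois: list of (id, lat, lon) tuples.
    cell_deg: cell size in degrees; 0.01 degrees of latitude is about 1.1 km.
    """
    grid = defaultdict(list)
    for poi_id, lat, lon in pois:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[cell].append(poi_id)

    pairs = set()
    for (cx, cy), ids in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get() does not create missing cells, so the dict is not
                # modified while it is being iterated.
                neighbours = grid.get((cx + dx, cy + dy))
                if neighbours is None:
                    continue
                for a in ids:
                    for b in neighbours:
                        if a < b:
                            pairs.add((a, b))
    return sorted(pairs)

pois = [
    ("a", 48.8584, 2.2945),
    ("b", 48.8585, 2.2946),
    ("c", 40.6892, -74.0445),
]
pairs = candidate_pairs(pois)
```

Only the candidate pairs then go through the expensive name or attribute comparison; distant POIs are never compared at all.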

Don't allow new data to ruin old matches easily

Content editor already corrected data
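
A minimal sketch of protecting editor corrections during a new harvest, assuming matches are stored as mappings from source id to combined id. The three-mapping layout and the `apply_harvest` helper are hypothetical.

```python
def apply_harvest(current_links, proposed_links, editor_links):
    """Merge newly proposed matches without overriding editor decisions.

    Each argument maps a source id to a combined-entity id.
    current_links:  matches stored from earlier runs
    proposed_links: matches suggested by the latest harvest
    editor_links:   matches fixed manually by a content editor
    """
    result = dict(current_links)
    for src, combined in sorted(proposed_links.items()):
        if src in editor_links:
            # The editor already decided this one; ignore the new proposal.
            continue
        result[src] = combined
    # Editor decisions are applied last so they always win.
    result.update(editor_links)
    return result

current = {"a:1": "x", "b:2": "x"}
proposed = {"b:2": "y", "c:3": "y"}
editor = {"b:2": "x"}
links = apply_harvest(current, proposed, editor)
```

Treating editor corrections as just another data-source with top priority would achieve the same effect and fits the "data-sources for everything" pattern.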

Tools

Google Translate

to deal with languages

DBpedia

Semantinet

gype

example of a good RESTful API

Google Refine