Mind Map: Big Data Lab

1. Research Fields

1.1. Operations

1.1.1. Cluster provisioning

1.1.2. Installation

1.1.3. Maintenance

1.1.4. Tools

1.1.4.1. Vagrant

1.2. Data Integration

1.2.1. Tools

1.2.1.1. Apache Flume

1.2.1.2. Pentaho Data Integration

1.2.1.3. Informatica PowerCenter

1.2.1.4. Sqoop

1.3. Data visualization

1.4. Data science

1.4.1. Open Data

1.4.2. Tools

1.4.2.1. Spark

1.4.2.2. Pig

1.4.2.3. Hive

1.4.2.4. Impala

1.4.2.5. Gephi

1.4.2.6. H2O

1.4.3. Machine learning

1.5. Architecture

1.5.1. Security

1.5.1.1. Kerberos

1.5.1.2. Active Directory

1.5.2. Data Organization

1.5.3. Resource management

1.5.4. Backup

1.6. Search - navigation

1.6.1. Tools

1.6.1.1. Solr

1.6.1.2. Elasticsearch

2. Experiments

2.1. Itec data hub

2.1.1. Data

2.1.1.1. Logs

2.1.1.2. Documents

2.1.2. Applications

2.1.2.1. Google-like search on logs

2.1.2.2. Google-like search on all documents

2.2. Machine learning

2.2.1. Recommendation algorithm with Spark

2.2.2. Email

2.2.2.1. Clustering

2.2.2.2. Graph algorithms

3. Objectives

3.1. Defining practices and guidelines

3.2. Experimenting with tools

3.3. Building knowledge

3.4. Experimenting with architectures

4. Team

4.1. Technology Group

4.1.1. System Administrator - System Architect

4.1.2. System Administrator

4.2. Data Science Group

4.2.1. Data scientist

4.2.2. Data scientist

5. Assets

5.1. 6 machines - 2 quad-core CPUs - 32 GB RAM

5.1.1. 5-node Cloudera 5 cluster on physical machines

5.1.2. 4-node Kerberized Cloudera 5 cluster on virtual machines

5.1.3. Informatica PowerCenter Big Data Edition

6. Challenges

6.1. Kaggle

6.1.1. Criteo