Data warehousing & Business Intelligence

Get Started. It's Free
or sign up with your email address
Data warehousing & Business Intelligence by Mind Map: Data warehousing & Business Intelligence

1. Project Planning

1.1. Project Team

1.1.1. Business

1.1.1.1. Business Sponsor

1.1.1.1.1. Justifies and sponsors the Project

1.1.1.1.2. Ensures Commitment from end users

1.1.1.1.3. Handles investment decision

1.1.1.1.4. Top level or SBU head level executive

1.1.1.1.5. Understands business pain points

1.1.1.1.6. Understands DW Concepts

1.1.1.2. Business Analyst

1.1.1.2.1. Subject matter Expert

1.1.1.2.2. Understands Business

1.1.1.2.3. Conducts and documents business requirements

1.1.1.2.4. Skills

1.1.1.3. Business Champion

1.1.1.3.1. Represents interest and concerns of business user community

1.1.1.3.2. Strong understanding of business pain points

1.1.1.3.3. Strong understanding of business process

1.1.1.3.4. Good communicator

1.1.2. Project Management

1.1.2.1. Program / Project Manager

1.1.2.1.1. has overall responsibility for the delivery

1.1.2.1.2. works with the project sponsor, user community, and development team

1.1.2.1.3. Develops and manages the project plan

1.1.2.1.4. Manages project scope, timeline and budget

1.1.2.1.5. Resolves budget issues

1.1.2.1.6. Communicates project progress to Project Sponsor(s)

1.1.2.1.7. Responsible for delivering on the commitments

1.1.3. Technical

1.1.3.1. DBA

1.1.3.1.1. management and maintenance of the physical tables of the Data Warehouse, Staging Area

1.1.3.1.2. Implements Backup and recovery procedures

1.1.3.2. Data Warehouse Architect / Data Modeller

1.1.3.2.1. responsible for developing the logical data models

1.1.3.2.2. documenting the models in Entity-Relationship Diagrams (ERD)

1.1.3.2.3. must work with the Business Analysts and ETL Developers to translate the business requirements into a logical model

1.1.3.2.4. develops the overall Data Warehouse logical architecture e.g. configuration of the data warehouse, data marts, and an operational data store or staging area

1.1.3.3. OLAP Modeler

1.1.3.3.1. Responsible for developing OLAP models

1.1.3.4. ETL Developer

1.1.3.4.1. Expert in ETL tools like Informatica, SSIS, Talend

1.1.3.4.2. Develops ETL routines

1.1.3.5. Report developer

1.1.3.5.1. Develops canned reports

1.1.3.5.2. Skills: Crystal reports, Cognos report studio, Jasper reports

1.1.4. Training

1.1.4.1. Trainers

1.1.5. Testing

1.1.5.1. QA Project Manager

1.1.5.2. Testing Analyst

1.1.5.3. UAT Analyst

1.1.5.4. Tester

1.2. Peroject Plan

1.3. Documentation

2. Testing Framework

3. Business Intelligence

3.1. Pyramid of Decision-making

3.1.1. Strategic

3.1.2. Tactical

3.1.3. Operational

3.2. Definition

3.2.1. Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.

3.3. Benefits

3.3.1. improves decision making

3.3.2. identify cost-cutting ideas

3.3.3. uncover business opportunities

3.3.4. improves business operations

3.4. OLAP

3.4.1. Types

3.4.1.1. ROLAP

3.4.1.2. MOLAP

3.4.1.3. HOLAP

3.4.2. Operations

3.4.2.1. Drill-up

3.4.2.2. Drill-down

3.4.2.3. Drill thru

3.4.2.4. Slice & Dice

3.4.2.5. Pivot

3.4.3. Structure

3.4.3.1. Dimensions

3.4.3.1.1. Hierarchies

3.4.3.1.2. Levels

3.4.3.1.3. Members

3.4.3.2. Measures

3.4.3.3. Cube

3.4.4. Demonstration

3.4.5. MDX

4. Data Warehousing

4.1. database that is maintained separately from organization's operational database(s)

4.2. Key personalities

4.2.1. Bill Inmon (father of data warehousing, Coined the term in 1970)

4.2.1.1. Data warehouse is a part of the overall business intelligence system

4.2.1.2. An enterprise has one data warehouse, and several data marts source their information from the data warehouse

4.2.1.3. Information is stored in 3NF

4.2.1.4. Definition

4.2.1.4.1. subject-oriented

4.2.1.4.2. integrated

4.2.1.4.3. time-variant

4.2.1.4.4. non-volatile

4.2.1.4.5. collection of data in support of mgmt. decision-making process

4.2.2. Ralph Kimball

4.2.2.1. conglomerate of all data marts within the enterprise

4.2.2.2. Information is always stored in the dimensional model

4.2.3. Dan Linstedt (Data Vault)

4.3. Kimball vs Inmon

4.3.1. Requirements

4.3.1.1. Strategic (I)

4.3.1.2. Tactical (K)

4.3.2. Data integration

4.3.2.1. Enterprise wide (I)

4.3.2.2. Departmental (K)

4.3.3. Data Structure

4.3.3.1. Meets multiple and varied information requirements including non-metric data (I)

4.3.3.2. Primarily metric data, business performance, scorecards, multiple dimensions (K)

4.3.4. Skill set requirements

4.3.4.1. Bigger team of specialists (I)

4.3.4.2. Relatively smaller team of generalists (K)

4.3.5. Time to market

4.3.5.1. High (I)

4.3.5.2. Low, incremental

4.3.6. Start-up cost

4.3.6.1. High (I)

4.3.6.2. Low (K)

4.4. Technology

4.4.1. Hardware considerations

4.4.1.1. Scalability

4.4.1.2. Parallel Processing Support

4.4.1.3. CPU Speed

4.4.1.4. Disk Capacity

4.4.1.5. Memory Capacity (RAM)

4.4.1.6. I/O Capacity

4.4.2. Software

4.4.2.1. Databases

4.4.2.1.1. Oracle

4.4.2.1.2. Microsoft SQL Server

4.4.2.1.3. IBM DB2

4.4.2.1.4. Teradata

4.4.2.1.5. Sybase

4.4.2.1.6. MYSQL

4.4.2.1.7. Infobright

4.4.2.1.8. HP Vertica

4.4.2.1.9. SAP Hana

4.4.2.1.10. IBM Dash DB

4.4.2.2. ETL

4.4.2.2.1. SQL Server Integration Services

4.4.2.2.2. Informatica

4.4.2.2.3. Ab Initio

4.4.2.2.4. IBM DataStage

4.4.2.2.5. Talend

4.4.2.2.6. Pentaho Kettle

4.4.2.3. Reporting

4.4.2.3.1. Cognos

4.4.2.3.2. SAP Business Objects

4.4.2.3.3. Microstrategy

4.4.2.3.4. Crystal Reports

4.4.2.3.5. SQL Server Reporting Services

4.4.2.3.6. OBIEE

4.4.2.3.7. Pentaho

4.4.2.3.8. Jaspersoft

4.4.2.3.9. Spago BI

4.4.2.4. OLAP

4.4.2.4.1. Cognos

4.4.2.4.2. Business Objects

4.4.2.4.3. Microstrategy

4.4.2.4.4. Pentaho Mondrian

4.4.2.4.5. SQL Server Analysis Services

4.4.2.5. In-memory Analytics

4.4.2.5.1. Tableau

4.4.2.5.2. Qlikview

4.4.2.6. Predictive Analytics / Data Mining

4.4.2.6.1. SAS

4.4.2.6.2. R

4.4.2.6.3. SSAS

4.4.2.7. Metadata

4.4.2.7.1. ERWIN

4.4.2.8. DW Life cycle management / Automation

4.4.2.8.1. Kalido

4.4.2.8.2. BI Ready

4.4.2.8.3. WhereScape

4.4.2.9. Open Source BI

4.4.2.9.1. Jaspersoft

4.4.2.9.2. Pentaho

4.4.2.9.3. Spago BI

4.4.3. DW Appliance

4.4.3.1. Netezza

4.4.3.2. Teradata

4.4.3.3. DATAllegro

4.4.4. Standards

4.4.4.1. Common Warehouse Metamodel (CWM)

4.5. OLTP vs Data Warehousing

4.6. Datamart

4.6.1. A data mart is a subset or an aggregation of the data stored to a primary data warehouse. It includes a set of information pieces relevant to a specific business area, corporate department, or category of users.

4.6.2. Dependent

4.6.3. Independent

4.7. Architecture

4.7.1. Layers

4.7.1.1. Data Source layer

4.7.1.2. Data Extraction layer

4.7.1.3. Staging Layer

4.7.1.4. Data Transformation and Load

4.7.1.5. Data storage (DW layer)

4.7.1.6. Aggregate layer

4.7.1.7. Presentation Layer

4.7.1.8. Metadata Layer

4.7.1.8.1. Business Metadata

4.7.1.8.2. Technical Metadata

4.7.1.8.3. Operational Metadata

4.7.1.9. Operation layer

4.8. Critical Success Factors

4.8.1. Sponsorship and Involvement

4.8.2. Business Requirements

4.8.3. Enterprise Architecture

4.8.4. DW Architecture and Design

4.8.5. DW Technology

4.8.6. Information / Data Quality

4.8.7. Development Environment

4.9. Dimensional Modelling

4.9.1. Overview

4.9.1.1. set of techniques and concepts used in data warehouse design

4.9.1.2. Does not necessarily involve a relational database and can be used with

4.9.1.2.1. RDBMS

4.9.1.2.2. Column Stores

4.9.1.2.3. Multi-dimensional Databases

4.9.1.2.4. or even a flat files

4.9.1.2.5. json

4.9.1.3. intended to support end user queries

4.9.1.4. oriented around understandability & performance

4.9.1.5. always use concept of facts and dimensions

4.9.2. 4 Step dimensional design process

4.9.2.1. select a business process

4.9.2.1.1. business processes are the operational activities performed by an organization

4.9.2.2. declare grain

4.9.2.2.1. examples

4.9.2.2.2. grain is level of details we want to capture in datawarehouse

4.9.2.2.3. declaring grain is the pivotal step in dimensional modelling

4.9.2.2.4. grain must be declared before choosing dimensions or facts because every candidate dimension or fact must be consistent with the grain

4.9.2.3. identify dimensions

4.9.2.3.1. Dimension provides "who, what, where, when, why, and how" context surrounding a business process event

4.9.2.3.2. Dimension provides descriptive attributes used by BI systems. Used for filtering and grouping of facts

4.9.2.3.3. A disproportionate amount of effort is put into the modelling of dimensions because they are the drivers of the user’s BI experience

4.9.2.4. identify facts

4.9.2.4.1. Facts are the measurements that result from a business process event and are almost always numeric

4.9.3. Bus Architecture

4.9.3.1. key element of incremental DW development approach

4.9.3.2. it is technology and database independent

4.9.3.3. it decomposes DW planning process into manageable pieces by focusing on organization's core processes

4.9.4. Schemas

4.9.4.1. Star Schema

4.9.4.2. Snowflake Schema

4.9.4.3. Fact Constellation

4.9.5. Development Approaches

4.9.5.1. Top Down

4.9.5.1.1. Driven by user requirements

4.9.5.2. Bottom Up

4.9.5.2.1. Driven by existing source systems

4.9.5.3. Hybrid

4.9.5.3.1. Combination of Top Down / Bottom Up

4.9.6. Techniques

4.9.6.1. Dimension Tables

4.9.6.1.1. Structure

4.9.6.1.2. Surrogate Keys

4.9.6.1.3. Multiple hierarchies

4.9.6.1.4. Role playing dimensions

4.9.6.1.5. Slowly Changing Dimensions

4.9.6.1.6. Conformed Dimensions

4.9.6.1.7. Junk Dimension

4.9.6.1.8. Degenerated dimensions

4.9.6.1.9. Time Dimension

4.9.6.2. Dimension Hierarchies

4.9.6.2.1. Regular (fixed depth) hierarchy

4.9.6.2.2. Ragged hierarchy

4.9.6.3. Fact Tables

4.9.6.3.1. Structure

4.9.6.3.2. Types of meaasures

4.9.6.3.3. Types of Fact Tables

5. Other Related fields

5.1. Master data Management

5.2. Corporate Performance Management

5.3. Complex Event Processing