The Ultimate Data Classification Mind Map

1. by Fundamental Data Classes

1.1. Metadata

1.1.1. Data that describes other data

1.1.2. Can include

1.1.2.1. Names

1.1.2.2. Descriptions

1.1.2.3. Definitions

1.1.2.4. Meaning & purpose

1.1.2.5. Classifications

1.1.2.6. Ownership / role / access information

1.1.2.7. Relationships to other data

1.1.2.8. Version

1.1.2.9. Location

1.1.2.10. Technical attributes, like format

1.1.2.11. Quality scores & attributes

1.1.3. Metadata examples

1.1.3.1. Metadata example

1.1.3.2. Data.world screen
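
As a rough illustration of the attributes listed under "Can include", a metadata entry for one data asset could be sketched as a small record type. The class name, field names, and sample values below are hypothetical and do not come from any particular catalog tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataRecord:
    """Hypothetical metadata entry: data that describes one data asset."""
    name: str
    description: str
    classification: str                 # e.g. a sensitivity level
    owner: str                          # ownership / role information
    location: str                       # where the asset lives
    data_format: str                    # technical attribute
    version: str = "1"
    related_to: List[str] = field(default_factory=list)  # relationships to other data
    quality_score: Optional[float] = None


# Illustrative values only.
orders_meta = MetadataRecord(
    name="orders",
    description="Customer orders placed through the web shop",
    classification="Internal",
    owner="sales-ops",
    location="warehouse.sales.orders",
    data_format="table",
    related_to=["customers", "products"],
    quality_score=0.97,
)
```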

1.1.4. At the enterprise level, metadata should be managed as a knowledge graph

1.1.5. How well a company handles metadata determines how well it "knows" its data

1.1.6. Technical metadata should be scraped directly from the systems of origin

1.1.7. 5 Levels of Metadata Management Maturity

1.2. Master Data

1.2.1. Consistent and uniform set of identifiers and extended attributes that describe the core entities of the business.

1.2.2. Typically, master data represents the nouns that the business operates with.

1.2.3. Master IDs must not be deleted, so that referential integrity is preserved

1.2.3.1. Data removal can be achieved by wiping the record's attributes while keeping its ID
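
A minimal sketch of wiping, assuming it means clearing a record's attributes while the master ID stays in place. The `customers` and `orders` structures and the `wipe_customer` helper are hypothetical, used only to show that existing references keep resolving.

```python
# Hypothetical in-memory master data and transactions that reference it.
customers = {
    101: {"name": "Ada Example", "email": "ada@example.com", "wiped": False},
}
orders = [{"order_id": 1, "customer_id": 101, "amount": 40.0}]


def wipe_customer(customer_id):
    """Clear the attributes of a master record but keep its ID."""
    record = customers[customer_id]
    for key in list(record):
        if key != "wiped":
            record[key] = None
    record["wiped"] = True


wipe_customer(101)

# The ID still exists, so every order still points at a valid master record.
dangling = [o for o in orders if o["customer_id"] not in customers]
```

After the wipe, `dangling` is empty: the personal attributes are gone, but referential integrity is intact.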

1.2.4. Most important data in the organization - data quality is paramount

1.2.5. There should be only ONE authoritative source for master data in the organization

1.2.6. Examples

1.2.6.1. Master data examples

1.2.7. Reference data

1.2.7.1. A relatively static subset of master data, usually pre-defined.

1.2.7.2. Reference data is itself master data. It is typically produced manually and changes slowly.

1.2.7.3. Reference data example

1.3. Transactional Data

1.3.1. Transactions are events, they are a change of state.

1.3.2. Transactions are atomic and immutable (they should not be updated)

1.3.3. Typically, they have an ID/sequence number + timestamp

1.3.4. Often transactions reference master data entities
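
The properties above (immutability, a sequence number plus timestamp, a reference to master data) can be sketched with a frozen dataclass. The `Transaction` type and `record_payment` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from itertools import count

_sequence = count(1)  # simple in-process sequence generator


@dataclass(frozen=True)
class Transaction:
    """Hypothetical immutable event record."""
    sequence_no: int
    occurred_at: datetime
    customer_id: int   # reference to a master data entity
    amount: float


def record_payment(customer_id, amount):
    """Create a new transaction with the next sequence number and a UTC timestamp."""
    return Transaction(next(_sequence), datetime.now(timezone.utc), customer_id, amount)


payment = record_payment(101, 40.0)

# A correction is recorded as a new transaction rather than an update.
refund = record_payment(101, -40.0)

try:
    payment.amount = 0.0          # attempting to update an existing transaction
except FrozenInstanceError:
    update_rejected = True
else:
    update_rejected = False
```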

1.4. Analytical Data

1.4.1. Data that is intended to be consumed outside of primary operational use, typically for analytical, compliance or legal purposes.

1.4.2. Historical Transactional Data

1.4.3. Snapshots / time series

1.5. Temporary Data

1.5.1. Data that appears and disappears on a temporary basis, with no need to keep its state.

1.5.2. As temporary data is stateless, there should not be any persisted references to temporary data.

1.5.3. Examples: data in transit, staging data, caches, temporary copies of data, data being migrated, etc.

1.6. Relationship between classes

1.6.1. Relationships

2. by Sensitivity

2.1. Public

2.1.1. Data freely available to the public

2.1.2. Can be freely used and redistributed without any repercussions.

2.1.3. Examples: data published by governments, data openly available on the web, open datasets

2.2. Internal

2.2.1. Data accessible only within the organization

2.2.2. Impact if leaked: minimal

2.2.3. It might circulate freely within the organization, but cannot leave its boundaries.

2.2.4. Examples: internal communications, policies, business plans, organizational information

2.3. Confidential

2.3.1. Data accessible only within a specific organizational unit or function

2.3.2. Impact if leaked: medium to severe

2.3.3. Requires specific authorization or clearance to access

2.3.4. Examples: pricing and contract information, certain performance indicators, financial information, trade secrets, etc.

2.3.5. If disclosed this data can negatively affect your business.

2.3.6. Information at this level can be protected by NDAs and other agreements

2.4. Restricted

2.4.1. Highly sensitive data accessible only on a need-to-know basis

2.4.2. Impact if leaked: severe

2.4.3. This information is typically protected by privacy regulations, such as GDPR, CCPA, HIPAA, etc.

2.4.4. Examples: personal information, personally identifiable information (PII), credit card details, etc.
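
The four sensitivity levels form an ordered scale, which can be sketched as an integer enum. The rule that a combined dataset inherits the strictest level of its parts is an assumption added for illustration, not something stated in this mind map, and the function names are hypothetical.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def may_leave_organization(level):
    """Only public data can be redistributed outside the organization."""
    return level == Sensitivity.PUBLIC


def strictest(levels):
    """Assumed rule: a combination of data takes the strictest level involved."""
    return max(levels)


# Hypothetical columns of one table, each with its own level.
column_levels = {
    "product_name": Sensitivity.PUBLIC,
    "contract_price": Sensitivity.CONFIDENTIAL,
    "card_number": Sensitivity.RESTRICTED,
}
table_level = strictest(column_levels.values())
```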

3. by Structure

3.1. Structured

3.1.1. Data that is organized in a pre-defined model and is therefore the easiest to access and analyze

3.1.2. Typically tabular data (rows and columns), stored in tabular file formats (CSV, Excel, etc.), spreadsheets, relational databases and other tabular stores

3.1.3. Structured data example

3.1.3.1. Tabular data

3.1.4. Every entity is represented as a table

3.1.5. Entities can be linked together using identifiers (see relational modeling)

3.1.5.1. Using IDs in a relational model
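
Linking entities through identifiers can be sketched with Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their columns are hypothetical; each entity is one table, and `purchase.customer_id` points at `customer.id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# One table per entity.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE purchase (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(id),
           amount REAL NOT NULL
       )"""
)

conn.execute("INSERT INTO customer VALUES (1, 'Ada Example')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 40.0)")

# The shared identifier is what lets the two tables be joined.
rows = conn.execute(
    "SELECT c.name, p.amount FROM purchase p JOIN customer c ON c.id = p.customer_id"
).fetchall()
```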

3.2. Semi-structured

3.2.1. Allows marking specific data elements or relationships / hierarchy, but otherwise doesn't require a rigid structure.

3.2.2. Typically represented as hierarchical formats

3.2.2.1. JSON

3.2.2.1.1. Format description

3.2.2.1.2. JSON example

3.2.2.2. XML

3.2.2.2.1. Format description

3.2.2.2.2. XML example
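
To make the two hierarchical formats concrete, here is a small sketch that encodes the same hypothetical customer record as JSON and as XML, then parses both with the Python standard library. The record layout and element names are illustrative assumptions.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both formats.
json_text = '{"customer": {"id": 1, "name": "Ada Example", "tags": ["vip", "newsletter"]}}'
xml_text = (
    '<customer id="1">'
    "<name>Ada Example</name>"
    "<tags><tag>vip</tag><tag>newsletter</tag></tags>"
    "</customer>"
)

from_json = json.loads(json_text)["customer"]

root = ET.fromstring(xml_text)
from_xml = {
    "id": int(root.get("id")),                      # XML attributes are strings
    "name": root.findtext("name"),
    "tags": [tag.text for tag in root.iter("tag")],
}

same_content = from_json == from_xml
```

Both documents mark up elements and hierarchy, yet neither requires a schema to be declared up front, which is what separates them from strictly tabular data.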

3.2.3. Includes graph databases and graph data formats

3.2.3.1. Graph data example
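
Graph data is often written as subject-predicate-object triples. The sketch below uses plain tuples with hypothetical node and relationship names; it is not tied to any specific graph database or format.

```python
# Each triple is (subject, predicate, object).
triples = [
    ("orders", "references", "customers"),
    ("orders", "references", "products"),
    ("customers", "owned_by", "sales-ops"),
]


def outgoing(node):
    """Return (predicate, object) pairs for edges that start at the given node."""
    return [(p, o) for s, p, o in triples if s == node]


orders_edges = outgoing("orders")
```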

3.2.4. Some formats are well positioned for a self-describing structure.

3.3. Unstructured

3.3.1. Data that cannot be captured using a pre-defined model

3.3.2. Examples: photo, video, audio, text, documents, emails, presentations, etc.

3.3.3. Estimated at about 80% of enterprise data (figure attributed to Gartner)

4. Why classify data?

4.1. Short version

4.1.1. Awareness and application of data classification saves time and money

4.2. Long version

4.2.1. Being data-centric requires determining the life cycles of specific data assets.

4.2.2. To determine appropriate life cycles, it is necessary to classify data.

4.2.3. Data strategies that employ data classification are easier to understand and execute

4.2.4. Classifying data makes compliance and security work much easier

4.2.5. There are core principles associated with the fundamental data classes; bad things start to happen (companies lose money) when those principles are violated. Awareness of those classes is crucial.

4.2.6. Using the Cynefin framework as an example, classifying data allows decision making to move from the Complex domain to the Complicated, and from the Complicated to the Obvious

4.2.6.1. Cynefin framework (wikipedia)

4.2.6.2. Cynefin framework (youtube)

5. About this mind map

5.1. By Ruben Sardaryan

5.2. Infocratic

5.3. https://infocratic.io

5.4. Data Strategy & Data-Centric Enterprise Consulting