ETL Process
by Zineb Hachlaf
1. SOURCES
What is ETL with a clear example - Data Engineering Concepts: https://www.youtube.com/watch?v=wDTzxdShbd8
What is ETL for Beginners | ETL Non-Technical Explanation: https://www.youtube.com/watch?v=wyn-PkJB3Lk
What is the difference between ETL and ELT?: https://aws.amazon.com/de/compare/the-difference-between-etl-and-elt/
https://airflow.apache.org/docs/apache-airflow/1.10.10/start.html
2. Process:
EXTRACT structured and unstructured data from a source into a buffer (staging) area.
TRANSFORM the data by cleaning and organising it to improve data quality.
LOAD the data into a database, in batches or all at once (batch vs. stream).
2.1. Avoids having to extract, transform and load the data every time it is needed, which saves time and keeps the data accessible. Also, insights cannot be extracted directly from transactional data.
2.1.1. ETL
2.1.1.1. Loads only the structured, aggregated, transformed data because storage is limited, so only this historical data is available for analysis and reporting. What if you need different data? You have to redo the process, but the automated ETL rules that run periodically are difficult to change. ELT is the alternative.
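The ETL flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `RAW_ORDERS` records, the `customer_totals` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the target database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical transactional records standing in for a real source system.
RAW_ORDERS = [
    {"order_id": 1, "customer": " alice ", "amount": "20.50"},
    {"order_id": 2, "customer": "Bob", "amount": "5.00"},
    {"order_id": 3, "customer": "alice", "amount": None},  # incomplete row
    {"order_id": 4, "customer": "ALICE", "amount": "4.50"},
]


def extract(source):
    """EXTRACT: copy rows from the source into a staging (buffer) list."""
    return [dict(row) for row in source]


def transform(staged):
    """TRANSFORM: drop incomplete rows, normalise names, aggregate per customer."""
    totals = defaultdict(float)
    for row in staged:
        if row["amount"] is None:
            continue  # cleaning step: skip rows without an amount
        customer = row["customer"].strip().lower()
        totals[customer] += float(row["amount"])
    return sorted(totals.items())


def load(rows, conn, batch_size=2):
    """LOAD: write only the aggregated rows to the database, in batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", batch)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_ORDERS)), conn)
result = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(result)
```

Only the aggregated totals reach the database here, so a question about individual orders could not be answered without changing `transform` and rerunning the pipeline. That is the limitation noted above.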
2.1.2. ELT
2.1.2.1. Loads all the raw data (structured and unstructured) into a data lake (DL) via the Hadoop ecosystem, then transforms the data as needed. ETL vs. ELT? It depends on business agility and the type of data (volume, velocity, variety).
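For contrast, the ELT order of operations might be sketched like this. It is a toy illustration under stated assumptions: an in-memory SQLite table of JSON payloads stands in for a data lake (no Hadoop involved), and the `RAW_EVENTS` records, the `raw_events` table and the function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical mix of structured and free-text records.
RAW_EVENTS = [
    {"type": "order", "customer": "alice", "amount": 20.5},
    {"type": "review", "customer": "bob", "text": "Great service"},
    {"type": "order", "customer": "bob", "amount": 5.0},
]


def load_raw(events, conn):
    """LOAD first: store every record untouched, as a JSON payload."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?)",
        [(json.dumps(event),) for event in events],
    )
    conn.commit()


def read_raw(conn):
    """Read all raw payloads back in insertion order."""
    rows = conn.execute("SELECT payload FROM raw_events ORDER BY rowid")
    return [json.loads(payload) for (payload,) in rows]


def total_order_amount(conn):
    """TRANSFORM on demand: one question asked of the raw data."""
    return sum(e["amount"] for e in read_raw(conn) if e["type"] == "order")


def review_texts(conn):
    """TRANSFORM on demand: a different question, with no reload needed."""
    return [e["text"] for e in read_raw(conn) if e["type"] == "review"]


conn = sqlite3.connect(":memory:")
load_raw(RAW_EVENTS, conn)
order_total = total_order_amount(conn)
reviews = review_texts(conn)
print(order_total, reviews)
```

Because the raw records are kept, a new question only needs a new transform function. The trade-off is that all the data has to be stored, which is where volume, velocity and variety come into the choice.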