Data Lake Architecture vs. Traditional Data Warehouse Architecture
Data lake architecture
A data lake is a repository for storing large volumes of structured, semi-structured and unstructured data. Data is kept in its native format, and the lake does not restrict the size or format of the files it stores.
Data lake architectural components
- Data ingestion - Connectors extract data from multiple data sources (databases, servers, emails, etc.) in a variety of formats (structured, semi-structured and unstructured). This component also provides data curation options.
- Data storage - This component stores raw and curated data in any format, and supports compression and encryption of data.
- Security - Security is enforced at every stage of the information flow in the data lake: data ingestion, data storage, data consumption and data discovery.
- Data quality management - A data lake implementation provides options for setting data quality rules, reporting on data quality and remediating problems.
- Metadata management - The data lake has mechanisms for data audits, data lineage checks, data lifecycle management and policy enforcement.
- Data auditing - The data lake provides options for complete data auditing and for recording data transformations, from the perspective of risk and compliance. It helps establish who changed a data element, how it was changed and when.
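The data quality and auditing components above can be illustrated with a minimal Python sketch. It is not tied to any particular data lake product; the names `QualityRule`, `AuditEntry`, `failed_rules` and `update_field`, and the two example rules, are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class QualityRule:
    """A named data quality rule (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


@dataclass
class AuditEntry:
    """One audit record: who changed a data element, how, and when."""
    record_id: str
    who: str
    how: str
    when: datetime


# Example rules, assumed for illustration only.
RULES = [
    QualityRule("email_present", lambda r: bool(r.get("email"))),
    QualityRule(
        "age_non_negative",
        lambda r: isinstance(r.get("age"), int) and r["age"] >= 0,
    ),
]


def failed_rules(record: dict) -> list[str]:
    """Data quality reporting: names of the rules this record violates."""
    return [rule.name for rule in RULES if not rule.check(record)]


def update_field(record: dict, key: str, value, who: str, audit_log: list) -> dict:
    """Remediation: return an updated copy and log the change for auditing."""
    updated = {**record, key: value}
    audit_log.append(
        AuditEntry(
            record_id=record["id"],
            who=who,
            how=f"set {key} to {value!r}",
            when=datetime.now(timezone.utc),
        )
    )
    return updated
```

In this sketch the original record is never modified in place, so the raw data and the audit trail of each correction can be kept side by side.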
Flow of information in a data lake
Data moves through multiple tiers in this architecture:
- Ingestion tier - This tier ingests data in various formats.
- Storage tier - This tier stores the raw data.
- Insights tier - This tier provides insights into the input data.
- Distillation tier - This tier consumes data from storage and converts it into a structured format for easier analysis.
- Processing tier - This tier runs algorithms and processes user queries.
- Presentation tier - This tier presents the results and analysis.
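As a rough illustration of this flow, the sketch below models five of the tiers as plain Python functions (the insights tier is left out for brevity). The function names, the in-memory list standing in for storage, and the `name` and `city` fields are all assumptions for the example, not part of any specific platform.

```python
import csv
import io
import json


def ingest(payload: str, fmt: str) -> dict:
    """Ingestion tier: accept data in any format and tag it, unchanged."""
    return {"format": fmt, "raw": payload}


def store(lake: list, item: dict) -> None:
    """Storage tier: keep the raw data exactly as it arrived."""
    lake.append(item)


def distill(lake: list) -> list[dict]:
    """Distillation tier: convert raw items into structured rows."""
    rows = []
    for item in lake:
        if item["format"] == "json":
            rows.append(json.loads(item["raw"]))
        elif item["format"] == "csv":
            rows.extend(csv.DictReader(io.StringIO(item["raw"])))
        # Items in other formats stay in raw storage until a parser exists.
    return rows


def process(rows: list[dict], city: str) -> list[dict]:
    """Processing tier: answer a user query, here a filter by city."""
    return [row for row in rows if row.get("city") == city]


def present(rows: list[dict]) -> str:
    """Presentation tier: render the query results as text."""
    return "\n".join(f"{row['name']} ({row['city']})" for row in rows)


lake = []
store(lake, ingest('{"name": "Ana", "city": "Lima"}', "json"))
store(lake, ingest("name,city\nBo,Oslo\nCy,Lima\n", "csv"))
store(lake, ingest("free text note", "text"))
print(present(process(distill(lake), "Lima")))
```

Note that structure is applied only at the distillation step: the unparsed text note remains in storage untouched, which is the main contrast with a warehouse that transforms data before loading it.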
Traditional data warehouse architecture
It consists of three tiers:
- 1st Tier (Bottom Tier) - It contains the database server, which extracts data from the data sources.
- 2nd Tier (ETL Tier or Middle Tier) - The data is extracted, transformed and loaded into the enterprise data warehouse and then into data marts.
- 3rd Tier (Client Tier) - The data prepared for analysis is analysed by high-level data analytics tools and presented as reports.
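The three tiers can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. This is only a toy model under stated assumptions: the `sales` table, its columns and the cleaning rules are hypothetical, and a real warehouse would use dedicated ETL and reporting tools.

```python
import sqlite3


def extract(source_rows):
    """Bottom tier: pull rows from a data source (here, an in-memory list)."""
    return list(source_rows)


def transform(rows):
    """Middle tier: clean region names, cast amounts, drop invalid rows."""
    cleaned = []
    for region, amount in rows:
        try:
            cleaned.append((region.strip().title(), float(amount)))
        except ValueError:
            continue  # skip rows whose amount is not numeric
    return cleaned


def load(conn, rows):
    """Middle tier: load the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def report(conn):
    """Client tier: analyse the prepared data and return a summary report."""
    query = (
        "SELECT region, SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
source = [(" north ", "10.5"), ("south", "4"), ("NORTH", "1.5"), ("east", "n/a")]
load(conn, transform(extract(source)))
print(report(conn))
```

Unlike the data lake flow, the transformation happens before loading, so only rows that fit the table's schema ever reach the warehouse.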