Traditional Data Warehouse vs. Data Lake

August 16, 2021

Data Lake Architecture vs. Traditional Data Warehouse Architecture

Data lake architecture

A data lake is a repository for storing huge volumes of structured, semi-structured and unstructured data. It places no restriction on the size or format of the files that can be stored.

Data lake architectural components

Data ingestion- This component provides connectors that extract data from multiple sources (databases, servers, emails, etc.) in a variety of formats (structured, semi-structured and unstructured). It also provides data curation options.
Data storage- This component stores raw and curated data in any format and supports compression and encryption of the data.
Security- Security is enforced at every stage of the information flow in the data lake: data ingestion, data storage, data consumption and data discovery.

Data quality management- A data lake implementation provides options for setting data quality rules, reporting on data quality and remediating issues.
Metadata management- The data lake has mechanisms for data audits, data lineage checks, data lifecycle management and policy enforcement.
Data auditing- The data lake provides options for complete data auditing and for recording data transformations for risk and compliance purposes. This helps audit who changed a data element, how and when.
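
The data quality and auditing components above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the names QualityRule, AuditLog and validate, the two example rules and the sample records are all hypothetical and do not come from any particular data lake product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class QualityRule:
    """A named data quality rule (hypothetical helper for this sketch)."""
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


@dataclass
class AuditLog:
    """Records who touched a data element, how and when."""
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, record_id: str) -> None:
        self.entries.append({
            "who": user,
            "how": action,
            "record_id": record_id,
            "when": datetime.now(timezone.utc).isoformat(),
        })


def validate(records, rules):
    """Return (passed, report); report maps each rule name to failing record ids."""
    report = {rule.name: [] for rule in rules}
    passed = []
    for rec in records:
        failed = [rule.name for rule in rules if not rule.check(rec)]
        for name in failed:
            report[name].append(rec["id"])
        if not failed:
            passed.append(rec)
    return passed, report


# Assumed example rules and records, for illustration only.
rules = [
    QualityRule("email_present", lambda r: bool(r.get("email"))),
    QualityRule("age_non_negative", lambda r: r.get("age", 0) >= 0),
]
records = [
    {"id": "a1", "email": "x@example.com", "age": 30},
    {"id": "a2", "email": "", "age": 41},
    {"id": "a3", "email": "y@example.com", "age": -5},
]

passed, quality_report = validate(records, rules)

# Log the quality check against each failing record for later audit.
audit = AuditLog()
for rule_name, failing_ids in quality_report.items():
    for record_id in failing_ids:
        audit.record("quality_job", f"failed:{rule_name}", record_id)
```

In this sketch the quality report feeds remediation (the failing record ids per rule), while the audit log keeps the who/how/when trail separately, mirroring the split between the two components.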

Flow of information in a data lake

This architecture has multiple layers:

Ingestion Tier- This layer ingests data in various formats.
Storage Tier- This layer stores the raw data.
Insights Tier- This layer provides insights into the ingested data.

Distillation Tier- This layer takes data from storage and converts it into a structured format for easier analysis.
Processing Tier- This layer runs analytical algorithms and processes user queries.
Presentation Layer- This layer presents the results and analysis.
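
The flow through these layers can be sketched in a few lines of Python. This is a toy illustration, not a real data lake: the function names (ingest, store, distill, process, present), the in-memory list standing in for storage and the sample sales data are all assumptions made for the example.

```python
import csv
import io
import json


def ingest(sources):
    """Ingestion tier: accept payloads in whatever format they arrive."""
    return [{"format": fmt, "payload": payload} for fmt, payload in sources]


def store(raw_zone, items):
    """Storage tier: keep the raw data unchanged (a list stands in for storage)."""
    raw_zone.extend(items)
    return raw_zone


def distill(raw_zone):
    """Distillation tier: convert raw payloads into one structured shape."""
    rows = []
    for item in raw_zone:
        if item["format"] == "json":
            rows.append(json.loads(item["payload"]))
        elif item["format"] == "csv":
            rows.extend(csv.DictReader(io.StringIO(item["payload"])))
    # Normalise types so that every row looks the same downstream.
    return [{"region": r["region"], "sales": float(r["sales"])} for r in rows]


def process(rows, region):
    """Processing tier: answer a user query (total sales for one region)."""
    return sum(r["sales"] for r in rows if r["region"] == region)


def present(region, total):
    """Presentation layer: format the result for the reader."""
    return f"Total sales for {region}: {total:.2f}"


# Assumed sample inputs: one JSON document and one CSV file.
sources = [
    ("json", '{"region": "north", "sales": 120.5}'),
    ("csv", "region,sales\nnorth,79.5\nsouth,40\n"),
]

raw_zone = store([], ingest(sources))
rows = distill(raw_zone)
summary = present("north", process(rows, "north"))
print(summary)  # Total sales for north: 200.00
```

Note that the raw payloads are stored untouched and structure is applied only at the distillation step, which is the main contrast with the warehouse approach described next.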

Traditional data warehouse architecture

A traditional data warehouse consists of three tiers:

1st Tier (Bottom Tier)- It contains the database server, which extracts data from the data sources.
2nd Tier (ETL Tier or Middle Tier)- The data is extracted, transformed and loaded into the enterprise data warehouse and then into data marts.
3rd Tier (Client Layer)- The data prepared for analysis is analysed with high-level data analytics tools and presented as reports.
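
The three tiers above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumptions: the in-memory SQLite database stands in for the enterprise data warehouse, a SQL view stands in for a data mart, and the table name, view name and sample rows are hypothetical.

```python
import sqlite3

# Assumed source data: every value arrives as text and needs cleaning.
source_rows = [
    ("2021-08-01", "north", "120.50"),
    ("2021-08-01", " south ", "40"),
    ("2021-08-02", "north", "79.50"),
]


def extract():
    """Bottom tier: pull rows from the data source."""
    return list(source_rows)


def transform(rows):
    """Middle tier: clean and type the data before it is loaded."""
    return [
        (day, region.strip().upper(), float(amount))
        for day, region, amount in rows
    ]


def load(conn, rows):
    """Middle tier: load into the warehouse table, then expose a data mart."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (day TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # A view stands in for a data mart in this sketch.
    conn.execute(
        "CREATE VIEW IF NOT EXISTS mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# Client layer: a reporting tool queries the data mart.
report_rows = conn.execute(
    "SELECT region, total FROM mart_sales_by_region ORDER BY region"
).fetchall()
print(report_rows)  # [('NORTH', 200.0), ('SOUTH', 40.0)]
```

Here the data is transformed before it is loaded, so the schema is fixed up front; in the data lake flow, by contrast, raw data is stored first and structured later.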