Traditional Data Warehouse vs. Data Lake

Data Lake Architecture vs. Traditional Data Warehouse Architecture

Data lake architecture

A data lake is a repository for storing large volumes of structured, semi-structured and unstructured data. In principle, it places no restriction on the size or format of the files it can hold.

Data lake architectural components 

  • Data ingestion: Provides connectors that extract data from multiple sources (databases, servers, emails, etc.) in a variety of formats (structured, semi-structured and unstructured). It also offers data curation options.
  • Data storage: Stores both raw and curated data in any format. This component supports compression and encryption of data.
  • Security: Security is enforced at every stage of the information flow in the data lake: data ingestion, data storage, data discovery and data consumption.
  • Data quality management: A data lake implementation offers options for setting data quality rules, reporting on data quality and remediating issues.
  • Metadata management: The data lake has mechanisms for data audits, data lineage checks, data lifecycle management and policy enforcement.
  • Data auditing: The data lake provides options for complete data auditing and for recording data transformations from a risk and compliance perspective. It helps audit who changed a data element, how and when.
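The data quality and auditing components above can be pictured with a small Python sketch. It is only an illustration under assumptions: `QualityRule`, `AuditEntry`, `failed_rules` and `apply_change` are hypothetical names invented for this example, not part of any particular data lake product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class QualityRule:
    """Hypothetical data quality rule: a name plus a pass/fail check."""
    name: str
    check: Callable[[Dict[str, Any]], bool]


@dataclass
class AuditEntry:
    """Hypothetical audit record: who changed a data element, how and when."""
    record_id: str
    changed_by: str
    action: str
    changed_at: datetime


def failed_rules(record: Dict[str, Any], rules: List[QualityRule]) -> List[str]:
    """Data quality reporting: return the names of the rules the record violates."""
    return [rule.name for rule in rules if not rule.check(record)]


def apply_change(record: Dict[str, Any], field_name: str, new_value: Any,
                 user: str, audit_log: List[AuditEntry]) -> Dict[str, Any]:
    """Remediation: return an updated copy of the record and log the change."""
    updated = {**record, field_name: new_value}  # the original record is left untouched
    audit_log.append(AuditEntry(
        record_id=str(record["id"]),
        changed_by=user,
        action=f"set {field_name}",
        changed_at=datetime.now(timezone.utc),
    ))
    return updated
```

A rule such as `QualityRule("email_present", lambda r: bool(r.get("email")))` flags records with a missing email, and every fix made through `apply_change` leaves an entry in the audit log.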

Flow of information in a data lake 

Information passes through several tiers in this architecture:

  • Ingestion Tier: This layer ingests data in various formats.
  • Storage Tier: This layer stores the raw data.
  • Insights Tier: This layer derives insights from the ingested data.
  • Distillation Tier: This layer takes data from storage and converts it into a structured format for easier analysis.
  • Processing Tier: This layer runs analytical algorithms and processes user queries.
  • Presentation Tier: This layer presents the results and analysis.
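As a rough illustration of how data might move through these tiers, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names, the in-memory list standing in for the storage tier, and the `customer` and `amount` fields. For brevity, the insights tier is folded into the processing step.

```python
import csv
import io
import json


def ingest(source_name, payload, fmt):
    """Ingestion tier: accept data in its original format, untransformed."""
    return {"source": source_name, "format": fmt, "payload": payload}


def store(raw_zone, obj):
    """Storage tier: keep the raw object as it arrived."""
    raw_zone.append(obj)


def distill(raw_zone):
    """Distillation tier: convert raw objects into structured rows."""
    rows = []
    for obj in raw_zone:
        if obj["format"] == "json":  # one JSON document per line
            for line in obj["payload"].splitlines():
                if line.strip():
                    rows.append(json.loads(line))
        elif obj["format"] == "csv":
            rows.extend(csv.DictReader(io.StringIO(obj["payload"])))
    return rows


def process(rows, customer):
    """Processing tier: answer a user query (total amount for one customer)."""
    return sum(float(r["amount"]) for r in rows if r["customer"] == customer)


def present(customer, total):
    """Presentation tier: format the result for the end user."""
    return f"{customer}: {total:.2f}"
```

The point of the sketch is that the raw zone keeps JSON and CSV side by side, and structure is applied only when the data is read for analysis.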

Traditional data warehouse architecture 

A traditional data warehouse consists of three tiers:

  • 1st Tier (Bottom Tier): Contains the database server, which extracts data from the data sources.
  • 2nd Tier (ETL Tier or Middle Tier): The data is extracted, transformed and loaded into the enterprise data warehouse and then into data marts.
  • 3rd Tier (Client Tier): The data prepared for analysis is examined with high-level data analytics tools and presented as reports.
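The extract, transform and load flow across these three tiers can be sketched in Python with the standard `sqlite3` module standing in for the warehouse. This is a toy example under assumptions: the source rows, the `sales` table and the `sales_by_customer` view (playing the role of a data mart) are all made up for illustration.

```python
import sqlite3


def extract():
    """Bottom tier: pull rows from a (hypothetical) source system."""
    return [
        ("2024-01-05", " Acme ", "120.50"),
        ("2024-01-06", "acme", "79.50"),
        ("2024-01-06", "Zen", "40"),
    ]


def transform(rows):
    """Middle tier: clean names and convert amounts before loading."""
    return [(day, name.strip().lower(), float(amount)) for day, name, amount in rows]


def load(conn, rows):
    """Middle tier: load into the warehouse table, then expose a data mart view."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE VIEW IF NOT EXISTS sales_by_customer AS "
        "SELECT customer, SUM(amount) AS total FROM sales GROUP BY customer"
    )
    conn.commit()


def report(conn):
    """Client tier: query the prepared data for reporting."""
    return conn.execute(
        "SELECT customer, total FROM sales_by_customer ORDER BY customer"
    ).fetchall()
```

Unlike the data lake flow, the data here is cleaned and given a fixed schema before it is stored, so the reporting query can rely on that structure.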

 
