Carleton University - School of Computer Science Honours Project
Fall 2019
Evaluating and Assessing ETL tools for Data Analysis and Visualization
Lama Elnaggar
ABSTRACT
Data integration is one of the key components in developing a business intelligence framework. It combines data from different sources and integrates it before it is stored in a data warehouse. The migration of data from its sources to its final destination happens in several stages. First, data from the various sources enters a data ingestion pipeline, which transports it into a data lake. A data lake is a centralized repository that stores structured and unstructured data at any scale, keeping the data as is, in its raw format. During the data integration process, the data must then be transformed into a compatible, standard format so that it can be stored in a data warehouse for later analysis. In this stage, the data warehouse is populated through the process of Extract, Transform and Load (ETL). ETL tools help business intelligence frameworks obtain clean, consistent and comprehensive data. The scope of this project consists of evaluating five open-source ETL tools, proposing a specific ETL tool for data warehousing, and finally applying data analysis and visualization to that data using Microsoft Power BI.
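As an illustration of the Extract, Transform and Load stages described above, the following is a minimal sketch using only the Python standard library. It is not one of the ETL tools evaluated in the project. The sample records, the `sales` table, and the function names (`extract`, `transform`, `load`, `run_etl`) are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw records, stored "as is" the way they might sit in a data lake.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,BOB,
3,carol,7
"""


def extract(text):
    """Extract: read the raw rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize names, cast types, skip rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, left out of the warehouse
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order."""
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # assumption: SQLite as the warehouse
    run_etl(RAW_CSV, warehouse)
    for record in warehouse.execute("SELECT id, name, amount FROM sales ORDER BY id"):
        print(record)
```

Keeping the three stages as separate functions mirrors the separation the abstract describes: the raw data is left untouched at the source, and only the transformed, consistent records reach the warehouse.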