A data warehouse is a relational database, hosted in the cloud or on an enterprise mainframe server, that collects data related to a certain domain, such as a company. It is organized to support decision making by feeding reports and data analysis directly, and it is considered a fundamental component of business intelligence. For an organization, a data warehouse is a comprehensive record that goes beyond the transactional and operational information that day-to-day systems normally keep: it stores data in a database designed to favor analysis and the efficient dissemination of data. Because data warehouses may contain large amounts of information, they are sometimes subdivided into smaller logical units by subsystem when necessary.
A data warehouse is not a particular type of technology. Data warehouses are often built on traditional relational databases such as MySQL. However, there are also specialized databases, such as Amazon Redshift, HP Vertica, and Teradata, that are optimized for analyzing massive datasets.
To build a data warehouse, we need to copy the raw data from each data source, cleanse it, and optimize it for analysis. This process of getting data into a data warehouse is called ETL: Extract, Transform, Load.
---
Extract: Some data sources may contain millions of records. Copying every record on every sync is wasteful; instead, the system should skip records that have not changed since the previous sync.
Transform: The data must be cleansed and prepared for analysis. This includes steps such as denormalization, which improves read performance in the warehouse by reducing the number of joins that queries need.
Load: The transformed data is loaded into the warehouse, where it is ready for analysis.
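The three steps above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not a production pipeline: the `orders` and `customers` source tables, the `updated_at` column used to detect changed rows, and the `fact_orders` warehouse table are all hypothetical. Extraction keeps a timestamp watermark so unmodified rows are skipped, transformation drops incomplete rows and denormalizes each customer's region into the order row, and loading writes the result into the warehouse.

```python
import sqlite3

# Hypothetical source schema: orders(order_id, customer_id, amount, updated_at)
# and customers(customer_id, region). We assume the source sets updated_at on
# every change, as ISO-8601 strings so that text comparison orders them correctly.

def extract(source, last_sync):
    """Extract: fetch only orders modified after the previous sync's watermark."""
    return source.execute(
        "SELECT order_id, customer_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()

def transform(source, rows):
    """Transform: cleanse rows and denormalize the customer's region into each order."""
    regions = dict(source.execute("SELECT customer_id, region FROM customers"))
    clean = []
    for order_id, customer_id, amount, updated_at in rows:
        if amount is None:  # cleansing: skip incomplete records
            continue
        region = (regions.get(customer_id) or "unknown").strip().upper()
        clean.append((order_id, customer_id, region, round(amount, 2), updated_at))
    return clean

def load(warehouse, rows):
    """Load: insert into a wide fact table, replacing earlier copies of updated orders."""
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?, ?)", rows
    )
    warehouse.commit()

def run_etl(source, warehouse, last_sync):
    """Run one sync and return the new watermark for the next run."""
    extracted = extract(source, last_sync)
    load(warehouse, transform(source, extracted))
    return max((row[3] for row in extracted), default=last_sync)

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             amount REAL, updated_at TEXT);
        INSERT INTO customers VALUES (1, ' east '), (2, 'west');
        INSERT INTO orders VALUES (10, 1, 19.999, '2024-01-01'),
                                  (11, 2, NULL,   '2024-01-02'),
                                  (12, 2, 5.0,    '2024-01-03');
    """)
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, customer_id "
               "INTEGER, region TEXT, amount REAL, updated_at TEXT)")
    mark = run_etl(src, dw, "1970-01-01")
    print(mark, dw.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
    print(run_etl(src, dw, mark))  # second sync: nothing changed, watermark unchanged
```

A watermark only works if the source reliably updates its change column; sources without one typically need change data capture or a full comparison instead.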
Companies move, products change, and even attributes of an item that look static can change over time. Slowly changing dimensions acknowledge this by adding the idea of effective dates, which record the period during which each version of a row was valid.
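One common way to apply effective dates is what is usually called a Type 2 slowly changing dimension: when a tracked attribute changes, the current row is closed by setting its end date and a new row is added. The sketch below assumes a hypothetical `dim_customer` table that tracks a customer's city, uses `9999-12-31` as a conventional "still current" end date, and treats each validity period as half-open (`valid_from` inclusive, `valid_to` exclusive).

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed sentinel meaning "this version is still current"

def create_dim(conn):
    conn.execute("""CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
        customer_id  INTEGER NOT NULL,                   -- natural (business) key
        city         TEXT NOT NULL,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT NOT NULL)""")

def upsert_customer(conn, customer_id, city, as_of):
    """Type 2 SCD: close the current version and add a new one if the city changed."""
    current = conn.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND valid_to = ?",
        (customer_id, OPEN_END),
    ).fetchone()
    if current and current[1] == city:
        return  # nothing changed; keep the current version
    if current:
        conn.execute("UPDATE dim_customer SET valid_to = ? WHERE customer_key = ?",
                     (as_of, current[0]))
    conn.execute("INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
                 "VALUES (?, ?, ?, ?)", (customer_id, city, as_of, OPEN_END))
    conn.commit()

def city_on(conn, customer_id, date):
    """Return the city that was in effect for the customer on the given date."""
    row = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? "
        "AND valid_from <= ? AND ? < valid_to",
        (customer_id, date, date),
    ).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_dim(db)
    upsert_customer(db, 42, "Boston", "2020-01-01")
    upsert_customer(db, 42, "Denver", "2023-06-01")  # the company moved
    print(city_on(db, 42, "2021-03-15"), city_on(db, 42, "2024-01-01"))
    # Boston Denver
```

Because old versions are kept rather than overwritten, facts recorded before the move can still be reported against the city that applied at the time.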
A data warehouse should contain the data that is necessary or useful to an organization, serving as a repository from which that data is later transformed into useful information for users. It must deliver the correct information to the right people at the optimal time and in the appropriate format. The data warehouse serves the needs of expert users through Decision Support Systems, Executive Information Systems, or query and reporting tools. End users can easily query the data stores without touching or affecting the operational systems.
Data marts are subsets of a data warehouse's data that serve specific areas, such as a single department or business function.
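As an illustration, a data mart can be materialized by copying one area's slice of the warehouse into its own database. This sketch uses SQLite's `ATTACH DATABASE` to hold the mart alongside the warehouse; the `fact_orders` table, its `business_area` column, and the `mart.orders` table are hypothetical. A view over the warehouse is a lighter alternative when a separate physical copy is not needed.

```python
import sqlite3

def build_data_mart(warehouse, area):
    """Copy one business area's rows into a separate attached 'mart' database."""
    warehouse.commit()  # SQLite cannot ATTACH inside an open transaction
    warehouse.execute("ATTACH DATABASE ':memory:' AS mart")
    warehouse.execute(
        "CREATE TABLE mart.orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, order_date TEXT)"
    )
    # Keep only this area's rows and only the columns its analysts need.
    warehouse.execute(
        "INSERT INTO mart.orders SELECT order_id, amount, order_date "
        "FROM main.fact_orders WHERE business_area = ?",
        (area,),
    )
    warehouse.commit()

if __name__ == "__main__":
    dw = sqlite3.connect(":memory:")
    dw.executescript("""
        CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, business_area TEXT,
                                  amount REAL, order_date TEXT);
        INSERT INTO fact_orders VALUES (1, 'retail',    100.0, '2024-01-05'),
                                       (2, 'wholesale', 900.0, '2024-01-06'),
                                       (3, 'retail',     60.0, '2024-02-01');
    """)
    build_data_mart(dw, "retail")
    print(dw.execute("SELECT COUNT(*), SUM(amount) FROM mart.orders").fetchone())
    # (2, 160.0)
```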