Data Warehouse Architecture
The effective, efficient, and economic management of data is essential for an organization’s success. Data supports expert opinions of people and provides input into decisions of emerging technologies such as machine learning, business intelligence, and artificial intelligence solutions.
The practice of managing historical and cumulative data can be difficult. Data collected from numerous sources in various formats and following different naming conventions overall present a challenging situation for the organization. This makes it difficult to give data a consistent meaning and to provide accessibility to people and applications that use the data to make decisions. A data warehouse architecture can help with these complexities.
What is Data Warehouse Architecture?
Data warehouse architecture consists of planning, designing, constructing, and managing daily operational processes for how data is used for organizational intelligence and decision support. A data warehouse architecture helps create a single source of truth for large volumes of data derived from various and different data sources. Data is then transformed into information and information is transformed into knowledge for analytics within the data warehouse architecture.
The data lifecycle includes data collection from identified sources, data integrity management and reconciliation, data storage, data transfer, and continuous improvement of data relative to organizational maturity, analytics, and decision needs. The data warehouse architecture must support these activities and other aspects of data lifecycle management.
Data warehouse architectures are usually designed to be stakeholder-oriented such as for sales, marketing, and others. Although using common data, each stakeholder has different modeling and data analysis needs for their decisions. This includes people using various tools as well as how technologies or applications consume data for translating the data to information and decisions.
Type of Data Warehouse Architectures
It is not a good practice to support analytical processing with a transactional database because of performance challenges. Transactional databases are optimized for processing huge volumes of transactions in real-time while analytical databases are optimized for long-running, resource-intensive queries. For this reason, transactional data should be an input to the data warehouse database rather than supporting both transactional and analytic needs.
There are different data warehouse models such as:
Basic data warehouse architecture – Single Tier
This architecture minimizes the amount of data stored and data redundancies. It is not commonly used but may meet the needs of some small organizations that do not require enterprise access to data. Performance issues often occur when analytical and transaction processing are not separate.
Data warehouse architecture with a centralized repository – Two Tier
This architecture uses staging to extract specific data, transform data for usage, and load it into a data warehouse. This process is called Extraction, Transformation, and Loading (ETL). One of the extraction sources can be from a transactional database. Information is saved to one logically centralized individual repository, a data warehouse, that is paired with analytical tools. Data marts may be included in a two-tier data warehouse architecture to deliver focused business user applications.
Data warehouse architecture with a centralized repository, and an OLAP Server – Three Tier
This architecture adds an On-Line Analytical Processing (OLAP) Server to the two-tier design. This middle tier provides an abstracted view of the database for the end-user and helps with system scalability and performance.
In each data warehouse architecture listed, there is always room for additional optimization, such as using clusters to decentralize how data is managed and processed. This could be useful for challenges relative to data governance, locally or internationally. Data warehouse architectures could include bus, hub- and-spoke, and federated models to solve specific needs.
The following diagram shows a three-tier data warehouse architecture. The data warehouse structure can be modified at each level to fit more like components, such as an increase in the number of data marts to support additional functional units in the organization.
Data Warehouse Infrastructure Diagram
The main components of a data warehouse architecture are:
- Data Sources – databases and other files, including a transactional database
- The Data Warehouse itself
- Data Marts – for specific stakeholder analytical capabilities
- OLAP Server – enables fast, flexible multidimensional data analysis
- Tools that stakeholder uses to access analytics (applications)
One of the values of architecture in data warehousing is simplicity. An organization can start with a basic structure using few components and add more later into various parts of the architecture as the data strategy evolves. Basically, keep the design structure and expand the specific elements such as data sources to add depth and breadth to the solution.
Properties of Data Warehouse Architectures
Data warehouse architectures should focus on analytical processing. Transactional processing should be done separately using a different database. A transactional processing database should be a data source for the more extensive data warehouse.
Other properties of the data warehouse should include:
- The ability to scale the use of data for analytics quickly. This can be an essential factor for the prevalence of derived analytics incorporating the most recent data for the specific decisions that need to be made.
- The architecture should easily support additional data without redesigning the entire system.
- The data must be adequately secured. The data warehouse contains data about the entire organization. Compromise here is risky and could be very costly.
- Extraction, Transformation, and Loading tools should support different data sources.
- The architecture management should not be overly complicated and should be simplified for ease of use and better analytical outcomes.
- The data warehouse architecture in data mining applications should use trusted data that has been adequately extracted, transformed, and loaded into the data warehouse. Data mining tools that do not have good data will only return inaccurate results.
- As the organization matures and understands how to use data, the data warehouse solution should have the ability to transform quickly to accommodate changes.
Data warehouse architectures should also deliver a level of warranty relative to availability, security, capacity, and continuity of usage. These elements of service warranty for the data warehouse should also include usability and performance.
The data warehouse should easily support tools and applications such as reporting, data mining, and application development tools.
Traditional Data Warehouse vs Cloud Data Warehouses
As mentioned, a data warehouse is a collection of data from various sources, reconciled to form a more extensive data warehouse for primary analytical processing to support decisions for multiple stakeholders within the organization. The difference between a traditional data warehouse vs a cloud data warehouse is related to the general power of using cloud-based computing.
Cloud data warehouses allow the organization to:
- Take advantage of unlimited storage, rapid elasticity, and scalability
- Improve flexibility for supporting different architectures
- Improve mobility and access to data
- Support Big Data analytics better than typical on-premise solutions
- Deploy more quickly than on-premise solutions
- Gain more full-proof disaster recovery
- Pool IT resources more efficiently
Organizations can also be creative and use a hybrid solution leveraging the best of on-premise and cloud architectures to support their data warehouse outcomes for various stakeholders.
OLAP solutions can be leveraged for either architectural solution. OLAP allows multidimensional analysis of data warehouse data, information, and knowledge to support complex modeling and trend analysis of the data warehouse solution. Business Intelligence (BI) and decision making across all functional areas in the organization that utilize data warehouses can leverage OLAP for quick, fast, effective, and responsive analytics.
Success with data warehouse solutions relies on understanding organizational decision needs. Each stakeholder should be treated differently since how and when they make decisions varies. When possible, enable end-user self-service to make configuration changes in what and how data is accessed with their applications. The stakeholders will have to give feedback for ETL processing to make sure the data is understandable and meets their needs. Stakeholders and data warehouse support must work together collaboratively and in a coordinated way to manage, evolve and transform the data and the data warehouse into an effective, efficient, and economical solution for the organization.