Back in the 1990s, I was an industry analyst covering data warehousing and business intelligence at Giga Information Group (subsequently acquired by Forrester Research). It continues to surprise me how little these technologies have changed since that time. Organizations today still struggle with many of the same complexities when building a data warehouse, and they continue to be drawn to shortcuts like data virtualization.
What is Data Virtualization?
What data virtualization is trying to accomplish is not new. Just like the virtual data warehouse concept that emerged around thirty years ago, data virtualization enables you to run queries directly against the source system(s) instead of moving the data into a physical data warehouse.
Software makes data virtualization development, operationalization, and management easier than in the past by providing at least some of the following capabilities:
- Abstraction – Abstract the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
- Virtualized Data Access – Connect to different data sources and make them accessible from a common logical data access point.
- Transformation – Transform, improve quality of, reformat, aggregate, etc. source data for consumer use.
- Data Federation – Combine result sets from multiple source systems.
- Data Delivery – Publish result sets as views and/or data services that client applications or users execute on request.
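To make the federation and delivery capabilities above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular data virtualization product works: two in-memory SQLite databases stand in for separate source systems, and the table names, column names, and the revenue_by_customer function are all hypothetical.

```python
import sqlite3


def make_orders_source():
    """Hypothetical order-entry system (in-memory SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (2, 80.0), (1, 40.0)],
    )
    return conn


def make_crm_source():
    """Hypothetical CRM system (a second, independent in-memory database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def revenue_by_customer(orders_conn, crm_conn):
    """A 'virtual view': nothing is copied ahead of time.

    Each source is queried when the view is requested, and the two
    result sets are combined in memory (data federation).
    """
    totals = dict(orders_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ))
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    return sorted(
        (names[cid], total) for cid, total in totals.items() if cid in names
    )


if __name__ == "__main__":
    print(revenue_by_customer(make_orders_source(), make_crm_source()))
```

Note that every call to revenue_by_customer runs queries against both sources, which is exactly the property that becomes a concern in the next section.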
Why You Still Need a Data Warehouse
Although software solutions have made data virtualization simpler than it used to be, it runs up against some of the very same issues as the virtual data warehouses of old, particularly if you’re attempting to use it to replace a data warehouse rather than to complement one for certain use cases.
Here are some of the top reasons why you really do need a data warehouse:
- In a data warehouse, reporting and analytics can occur without negatively impacting the performance of your operational systems. As just one example, imagine how slowing down your electronic commerce applications might impact sales.
- A data warehouse provides better performance for analytical queries than transactional databases, which are designed to read and write individual rows efficiently. In addition, it is nearly impossible to achieve acceptable performance when queries involve complex, high-cardinality joins and aggregations across source systems.
- A data warehouse supports long-term archival of transactional data. This provides two benefits: First, source systems can be purged of old data to ensure continued high performance; second, historical data serves as a foundation for many analytical requirements, particularly in artificial intelligence and machine learning scenarios where historical data is often required for valid results.
- Unlike data virtualization, a data warehouse can still deliver insights when source systems are offline or unavailable. Data virtualization attempts to overcome the problem of source system unavailability through data caching mechanisms, but it’s simply not feasible to cache everything that may be required.
- Extract, Transform, and Load (ETL) and data quality tools used in data warehousing help handle complex transformation requirements and resolve data quality issues that data virtualization doesn’t address.
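The last two points can be sketched in a few lines of Python. This is a simplified illustration, assuming in-memory SQLite databases as stand-ins for a source system and a warehouse; the sales and fact_sales tables and the cleanup rules are hypothetical, and real ETL and data quality tools do far more.

```python
import sqlite3


def extract(source_conn):
    """Pull raw rows out of the (hypothetical) source system."""
    return source_conn.execute("SELECT customer, amount FROM sales").fetchall()


def transform(rows):
    """Basic data quality step: normalize names, drop rows missing an amount."""
    return [
        (name.strip().title(), float(amount))
        for name, amount in rows
        if amount is not None
    ]


def load(warehouse_conn, rows):
    """Persist the cleaned rows in the warehouse's own storage."""
    warehouse_conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, amount REAL)"
    )
    warehouse_conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    warehouse_conn.commit()


def run_demo():
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(" acme ", 100.0), ("ACME", 50.0), ("globex", None)],
    )

    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(source)))

    # Simulate the source system going offline after the load.
    source.close()

    # The warehouse still answers the analytical query from its own copy.
    return warehouse.execute(
        "SELECT customer, SUM(amount) FROM fact_sales "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_demo())
```

Because the cleaned data lives in the warehouse, the final query neither touches the source system nor depends on it being available.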
The Role of Data Virtualization
Although not a replacement for a data warehouse, data virtualization is a valuable addition to help address many obstacles, including situations when:
- You can’t move your data into a data warehouse due to compliance restrictions
- You have too much data at the edge to move into the data warehouse
- You need to accommodate unplanned queries that require access to data not stored in the data warehouse
- You require multiple passes over in-memory data to support iterative processing requirements
Data virtualization has its uses as an adjunct to a true data warehouse, but it’s no replacement for one. To learn more about the benefits of the data warehouse, I suggest that you read “Data Warehouse vs Database – Which Should you Choose?” While shortcuts won’t cut it when it comes to data warehousing, these best practices will help.