Data Virtualization: Virtual Data Warehouse Déjà Vu?

Back in the 1990s, I was an industry analyst covering data warehousing and business intelligence at Giga Information Group (subsequently acquired by Forrester Research). It continues to surprise me how little these technologies have changed since then. Organizations today still struggle with some of the very same complexities when building a data warehouse, and they are still drawn to shortcuts like data virtualization.

What is Data Virtualization?

What data virtualization is trying to accomplish is not new. Just like the virtual data warehouse concept that emerged around thirty years ago, data virtualization enables you to run queries directly against the source system(s) instead of moving the data into a physical data warehouse.

Software makes data virtualization development, operationalization, and management easier than in the past by providing at least some of the following capabilities:

  • Abstraction – Abstract the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
  • Virtualized Data Access – Connect to different data sources and make them accessible from a common logical data access point.
  • Transformation – Transform, reformat, aggregate, and improve the quality of source data for consumer use.
  • Data Federation – Combine result sets from multiple source systems.
  • Data Delivery – Publish result sets as views and/or data services that client applications or users execute on request.
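
To make the federation and delivery capabilities above concrete, here is a minimal sketch in Python. It is not how any particular data virtualization product works. The two in-memory SQLite databases stand in for separate source systems, and the names `make_source`, `revenue_by_customer`, `crm`, and `orders` are hypothetical, chosen only for illustration.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two hypothetical source systems: a CRM and an order-entry system.
crm = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
orders = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)


def revenue_by_customer(crm_conn, orders_conn):
    """Federated 'view': query each source at request time and combine
    the result sets in the virtualization layer. No data is copied
    into a separate physical store."""
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    totals = orders_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    return {names[customer_id]: total for customer_id, total in totals}


result = revenue_by_customer(crm, orders)
print(result)
```

Every call to `revenue_by_customer` goes back to both sources, so the caller always sees current data, but the sources also carry the full query load each time.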

Why You Still Need a Data Warehouse

Although software solutions have made data virtualization simpler than it used to be, it runs into some of the very same issues as the virtual data warehouses of old, particularly if you’re attempting to use it to replace a data warehouse rather than to complement one for certain use cases.

Here are some of the top reasons why you really do need a data warehouse:

  • In a data warehouse, reporting and analytics can occur without negatively impacting the performance of your operational systems. As just one example, imagine how slowing down your electronic commerce applications might impact sales.
  • A data warehouse provides better performance for analytical queries than transactional databases, which are designed to read and write individual rows efficiently. In addition, it is nearly impossible to achieve acceptable performance when queries involve complex, high-cardinality joins and aggregations across source systems.
  • A data warehouse supports long-term archival of transactional data. This provides two benefits: First, source systems can be purged of old data to ensure continued high performance; second, historical data serves as a foundation for many analytical requirements, particularly in artificial intelligence and machine learning scenarios where historical data is often required for valid results.
  • Unlike data virtualization, a data warehouse can still deliver insights when source systems are offline or unavailable. Data virtualization attempts to overcome the problem of source system unavailability through data caching mechanisms, but it’s simply not feasible to cache everything that may be required.
  • Extract, Transform, and Load (ETL) and data quality tools used in data warehousing help handle complex transformation requirements and resolve data quality issues that data virtualization doesn’t address.
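
The availability point above can be illustrated with a small sketch, again using in-memory SQLite databases as stand-ins. The table names and the helper functions `load_warehouse`, `virtual_total`, and `warehouse_total` are assumptions made for this example, and closing the connection is only a crude simulation of a source system outage.

```python
import sqlite3

# Hypothetical operational source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (3, 50.0)],
)

# Hypothetical warehouse holding a physical copy of the data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders_history (id INTEGER, amount REAL)")


def load_warehouse(src, dst):
    """Minimal ETL step: extract rows from the source and load them
    into the warehouse (no transformation in this sketch)."""
    rows = src.execute("SELECT id, amount FROM orders").fetchall()
    dst.executemany("INSERT INTO orders_history VALUES (?, ?)", rows)
    dst.commit()
    return len(rows)


def virtual_total(src):
    """Virtualized query: runs against the live source at request time."""
    return src.execute("SELECT SUM(amount) FROM orders").fetchone()[0]


def warehouse_total(dst):
    """Warehouse query: runs against the physical copy."""
    return dst.execute("SELECT SUM(amount) FROM orders_history").fetchone()[0]


loaded = load_warehouse(source, warehouse)
source.close()  # simulate the source system going offline

try:
    virtual_result = virtual_total(source)
except sqlite3.ProgrammingError:
    # The virtualized query has nothing to fall back on without a cache.
    virtual_result = None

warehouse_result = warehouse_total(warehouse)
print(loaded, virtual_result, warehouse_result)
```

Once the source is closed, the virtualized query fails while the warehouse query still returns the total from its own copy. The same copy is what lets the source be purged of old rows without losing history.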

The Role of Data Virtualization

Although not a replacement for a data warehouse, data virtualization is a valuable addition to help address many obstacles, including situations when: 

  • You can’t move your data into a data warehouse due to compliance restrictions.
  • You have too much data at the edge to move into the data warehouse.
  • You need to accommodate unplanned queries that require access to data not stored in the data warehouse.
  • You require multiple passes over in-memory data to support iterative processing requirements.

Summary

Data virtualization has its uses as an adjunct to a true data warehouse, but it’s no replacement for one. To learn more about the benefits of the data warehouse, I suggest that you read “Data Warehouse vs Database – Which Should you Choose?” While shortcuts won’t cut it when it comes to data warehousing, these best practices will help.

About Teresa Wingfield

As the Director of Product Marketing at Actian, Teresa Wingfield focuses on hybrid cloud data solutions. Prior to joining Actian, Teresa managed cloud and security product marketing at industry leaders such as Cisco, VMware, and McAfee. She was also Datameer’s first Vice President of Marketing where she led all marketing functions for the company’s big data analytics solution built on Hadoop. Before this, Teresa was VP of Research at Giga Information Group, acquired by Forrester, providing strategic advisory services for data warehousing and analytics. Teresa holds graduate degrees in management from MIT’s Sloan School and software engineering from Harvard University.
