What is data virtualization (DV)?

The guiding principle of Data Virtualization is that data consumers, tools and applications are independent of the physical and logical data structures.
The traditional Business Intelligence, Data Warehouse and ETL approach has always been plagued by excessive data growth, data silos, moving data from one pile to another, the complexity of the data and of the IT development process, and data latency (in most cases meaning daily DWH loads).

Data virtualization, on the other hand, is oriented towards real-time (or near real-time) 'light' processing, big data techniques and getting the most out of external data sources, which in many cases means data in the cloud. The DV development process is lightweight and not very resource- or cost-intensive. A Data Virtualization platform is a bridge between IT and the business users and, in the ideal scenario, can be used in self-service mode by the business users.

What does data virtualization mean in practice?

The key features of a data virtualization platform:

  • Canonical business views of the data - a meaningful business representation of the data: high-level entities and the relationships between them.
  • Pre-integrated information for discovery and self-service - fast, near real-time, flexible and accurate access.
  • Consistent performance regardless of source system type, location or technology.
  • Easy handling of multiple data sources and systems with different query languages, data models and security approaches.
  • A quick answer to dynamically changing business conditions such as acquisitions, mergers, etc.
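The first feature above - a canonical business view assembled on the fly from heterogeneous sources - can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API; the source names (`crm_rows`, `billing_rows`) and field mappings are hypothetical.

```python
# Two hypothetical source systems with different schemas; in a real DV
# platform these would be remote databases, web services, files, etc.
crm_rows = [
    {"cust_id": 1, "name": "Acme Corp", "segment": "enterprise"},
    {"cust_id": 2, "name": "Widgets Ltd", "segment": "smb"},
]
billing_rows = [
    {"customer": 1, "balance": 1200.0},
    {"customer": 2, "balance": 350.5},
]

def customer_view():
    """Canonical 'Customer' entity, built at query time.

    The data stays in the sources; only this view joins and renames
    fields into business-friendly terms.
    """
    balances = {r["customer"]: r["balance"] for r in billing_rows}
    for r in crm_rows:
        yield {
            "id": r["cust_id"],
            "name": r["name"],
            "segment": r["segment"],
            "outstanding_balance": balances.get(r["cust_id"], 0.0),
        }

rows = list(customer_view())
```

Nothing is copied or materialized: each call to `customer_view()` re-reads the sources, which is exactly the trade-off DV makes against a loaded warehouse.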

Benefits of data virtualization

The key DV benefits:

  • Information quality - data silos are integrated easily, including social, unstructured, cloud, web and big data sources
  • More agile and flexible operation - the platform adapts easily to a changing environment, which lowers maintenance costs
  • First results visible quickly - a POC can be delivered in days, projects in weeks, and full ROI is usually reached within half a year

Data virtualization architecture

A data virtualization architecture most commonly consists of three layers: a data source connection layer, an intermediate processing layer that brings the data into a canonical form, and a publishing layer that makes the data available for reporting.

[Figure: Data virtualization architecture - layers]
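The three layers can be sketched as small Python classes. All class and field names here are illustrative assumptions, not a real product's interface; the point is only how responsibilities split across the layers.

```python
class SourceConnector:
    """Connection layer: wraps access to one physical source."""
    def __init__(self, fetch):
        self.fetch = fetch  # callable returning raw rows from the source

class CanonicalLayer:
    """Intermediate layer: maps raw source fields to a canonical schema."""
    def __init__(self, connector, mapping):
        self.connector = connector
        self.mapping = mapping  # canonical name -> source field name
    def rows(self):
        for raw in self.connector.fetch():
            yield {canon: raw[src] for canon, src in self.mapping.items()}

class PublishingLayer:
    """Publishing layer: exposes named canonical views to consumers."""
    def __init__(self):
        self.views = {}
    def publish(self, name, view):
        self.views[name] = view
    def query(self, name):
        return list(self.views[name].rows())

# Wiring the layers together for one hypothetical ERP source:
erp = SourceConnector(lambda: [{"PRD_ID": 7, "PRD_NM": "bolt"}])
canonical = CanonicalLayer(erp, {"product_id": "PRD_ID",
                                 "product_name": "PRD_NM"})
catalog = PublishingLayer()
catalog.publish("Product", canonical)
result = catalog.query("Product")
```

A consumer only ever sees the published `Product` view; the cryptic `PRD_*` field names and the physical location of the ERP data are hidden behind the lower layers.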

Data virtualization vs ETL

Extract, Transform, Load (ETL):

  • ETL is still the preferred option when materialization of the data is needed
  • ETL processes mostly move data from one database or data warehouse layer to another (staging, cleanse, core, data marts, etc.); grouping, sorting and other transformations are performed within the ETL tool
  • Rigid structure; adjusting it requires considerable effort
  • Upfront project planning; changes are problematic
  • Suits applications performing data mining or historical analysis in support of strategic planning or long-term performance management

Data Virtualization (DV):

  • Data virtualization shows its strength when data can be accessed virtually (on the fly)
  • Data virtualization leaves the source data where it is and delegates queries down to the source systems; the key to success is to push sorting, aggregation and join operations down to the source DB before the rows are returned
  • The logical data model can be modified quite easily
  • Flexible and dynamic project planning; frequent changes are accepted and normal
  • Suits operational decision-support applications, such as managing inventory levels, or applications that need to report near real-time updates of the data

Although the two concepts are different, in real-life scenarios ETL and Data Virtualization are complementary, and most organizations will benefit from having both solutions in house.
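The "push operations down to the source" point is worth making concrete. In the sketch below, sqlite3 from the Python standard library stands in for a remote source system; the aggregation is delegated to the source database in SQL, so only the already-aggregated rows travel back, instead of every detail row. The table and function names are hypothetical.

```python
import sqlite3

# An in-memory SQLite DB stands in for a remote source system.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (region TEXT, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EU", 100.0), ("EU", 50.0), ("US", 80.0)])

def sales_by_region(conn):
    # Pushed-down query: SUM and GROUP BY execute inside the source DB,
    # not in the virtualization layer, so only one row per region is
    # transferred back.
    return dict(conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"))

totals = sales_by_region(src)
```

The anti-pattern would be `SELECT region, amount FROM sales` followed by aggregation in the DV layer - correct, but it moves every detail row across the network, which is precisely what query delegation avoids.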

Data Virtualization software