Moving data has always been an expensive process for any organization. Most projects involve moving data from one system to another, usually from external systems. Companies are therefore exploring approaches that reduce physical data movement while still giving end users the consolidated data they need.
Every company has to adapt as its sources change. Businesses and governments spend billions of dollars annually to maintain integration across different applications, with ever-changing needs and continuous upgrades to keep pace with changes in the sources.
What is Data Federation?
One approach that is gaining traction is to consolidate data from disparate sources without physically moving it, which significantly reduces the costs incurred by enterprises. This approach, known as Data Federation, should be considered as an alternative to ETL for Data Integration. Combined with canonical modelling, Data Federation can help organizations solve data management problems related to information governance.
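To make the idea of canonical modelling concrete, here is a minimal sketch in Python. The record layouts, field names, and the `CanonicalCustomer` class are hypothetical and chosen only for illustration. The sketch shows two source systems that describe the same customer differently, each mapped into one shared, source-independent shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical (source-independent) customer record."""
    customer_id: str
    full_name: str
    country: str


def from_crm(row: dict) -> CanonicalCustomer:
    # Assumed CRM layout: numeric key, separate first and last name.
    return CanonicalCustomer(
        customer_id=str(row["cust_no"]),
        full_name=f'{row["first"]} {row["last"]}',
        country=row["country_iso"].upper(),
    )


def from_billing(row: dict) -> CanonicalCustomer:
    # Assumed billing layout: string key, single padded name field.
    return CanonicalCustomer(
        customer_id=str(row["id"]),
        full_name=row["name"].strip(),
        country=row["ctry"].upper(),
    )


crm_row = {"cust_no": 42, "first": "Ada", "last": "Lovelace", "country_iso": "gb"}
billing_row = {"id": "42", "name": " Ada Lovelace ", "ctry": "GB"}

# Both sources resolve to the same canonical record.
same_customer = from_crm(crm_row) == from_billing(billing_row)
```

Because consumers only ever see the canonical shape, a change in one source's layout is absorbed by that source's mapping function rather than by every report that uses the data.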
Data Federation vs Data Virtualization vs Data Lake
Data Federation is a method of providing access to data that is spread across disparate sources. It provides a data entity that enables meaningful data usage while eliminating multiple staging areas and their complex data integrations. By contrast, Data Virtualization is a technique in which the end user accesses all data through a metadata layer of abstraction, so that end users can work with the data without knowing where it resides.
These days the two terms, Data Federation and Data Virtualization, are often used interchangeably. They are also treated as synonymous with Enterprise Information Integration (EII).
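The following Python sketch illustrates the federation idea under simplifying assumptions: two independent in-memory SQLite databases stand in for disparate source systems, and the table names, columns, and sample rows are invented for the example. Nothing is copied into a staging area. Each source is queried on demand and the results are combined in the federation function.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database that acts as one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two separate "systems" (hypothetical schemas and data).
orders_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, 11, 75.5), (3, 10, 20.0)],
)
customers_db = make_source(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(10, "Acme"), (11, "Globex")],
)


def total_by_customer():
    """Federated view: read each source at query time and join in memory."""
    names = dict(customers_db.execute("SELECT customer_id, name FROM customers"))
    totals = {}
    for customer_id, amount in orders_db.execute(
        "SELECT customer_id, amount FROM orders"
    ):
        name = names.get(customer_id, "unknown")
        totals[name] = totals.get(name, 0.0) + amount
    return totals


result = total_by_customer()
```

Since the view reads the sources at query time, a row added to `orders_db` shows up in the next call to `total_by_customer()` with no reload step. The trade-off is that every call pays the cost of querying the sources.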
A third approach to data consolidation is known as the Data Lake. In this approach, all the data is dumped into a single data store, without reconciling or rationalizing the relationships, so that it can be used later. However, issues can arise when the data is used, because bad data was allowed to enter the Data Lake without anyone knowing whether it is fit for use.
Advantages of Data Federation
- No movement of data.
- Multiple copies of data storage can be reduced or eliminated.
- Storing data in multiple databases across the enterprise is expensive, and federation avoids that cost.
- Security issues can be resolved with a single view of data and the application of controls at one single data federation layer.
- Data can be orchestrated in the federation layer with real-time access.
- Easy to build and develop (Agile).
- Data can be published in multiple formats based on end user needs.
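Two of the points above, controls applied at a single federation layer and publishing in multiple formats, can be sketched together in Python. The roles, column policy, and sample rows here are hypothetical. The sketch only shows the shape of the idea and is not how any particular product implements it.

```python
import csv
import io
import json

# Sample rows as they might come back from a federated query (made-up data).
ROWS = [
    {"customer": "Acme", "email": "ops@acme.example", "total": 270.0},
    {"customer": "Globex", "email": "it@globex.example", "total": 75.5},
]

# Hypothetical policy: the columns each role is allowed to see.
VISIBLE_COLUMNS = {
    "analyst": ["customer", "total"],
    "admin": ["customer", "email", "total"],
}


def federated_view(role):
    """Apply the access policy once, in the federation layer."""
    columns = VISIBLE_COLUMNS[role]
    return [{c: row[c] for c in columns} for row in ROWS]


def publish(role, fmt):
    """Return the same governed view as JSON or CSV text."""
    rows = federated_view(role)
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=VISIBLE_COLUMNS[role], lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


analyst_csv = publish("analyst", "csv")
admin_json = publish("admin", "json")
```

Every output format passes through `federated_view`, so the column policy is enforced in one place regardless of how the data is delivered.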
Disadvantages of Data Federation
- Performance can be worse than with traditional DB systems.
- Performance bottlenecks can occur on the Data Federation application server.
- Only the data currently available in the source systems can be queried.
Data Federation is a good idea when data needs to be accessed from multiple sources, e.g., for Business Intelligence, reporting, or real-time analysis applications. However, it is not recommended for systems where heavy transaction processing happens. Not all scenarios can be solved using Data Federation technology.
We can use one of Cisco’s Data Integration Strategy Decision Tools to help decide when to use Data Federation, a Data Lake, or a combination of both.
The Cisco® Data Virtualization Tool has many excellent features, such as:
- Data Integration Technology
- Query Optimization Engine
- Data abstraction
- Business Directory
Below is a representation of how a federation strategy combined with the canonical model implementation is envisioned.