In today's digital economy, organizations demand new, finer grains and wider varieties of data to enable business growth and to rapidly develop and deploy new data science products.
At the same time, technology is changing faster than ever before, and so are the data migration approaches used to define and house legacy data alongside newly discovered unstructured and high-velocity data. Organizations are continually challenged to access these new data lakes and repositories efficiently enough to benefit their business.
Data migration for modernization is about retaining and extending the value of legacy data assets, reusing the intellectual property buried in these data layers, transforming them into more modern data architectures, and correlating the legacy data assets with new unstructured, semi-structured and high-velocity data assets.
Numerous drivers compel an organization to explore and act on data migration for platform modernization, and numerous areas both influence that journey and are affected by it. Some are mentioned below:
Modernizing a platform from a data migration perspective, especially when multiple data platforms reside in the ecosystem, typically involves a number of adjustments to the overall data and data integration architecture. Initially, an assessment needs to be performed, preferably using an industry-standard framework such as Zachman, to understand the business need, scenario or use case(s) driving the change, especially from the perspective of advanced analytics and newly required digitized insights.
On top of this, specialty assessments need to be performed to discover the as-is and to-be grains of data residing in specific databases and technology stacks, including ODS, DWH, marts, files, MDM (including PIM and CDI), DI and BI. This also includes analyzing the logical and canonical business models and the physical data models, and how they align with the overall enterprise to-be data strategy and roadmap. This assessment serves as a stepping stone for the blueprinting phase of the project, where the more granular functional requirements, code analysis and migration scope are drafted.
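As a rough illustration of the as-is discovery step for physical data models, the sketch below inventories the tables, columns and row counts of one database. It is only a sketch under stated assumptions: SQLite stands in for the legacy store, and the names `inventory_schema`, `run_demo`, `customer` and `orders` are hypothetical. A real assessment would read the catalog of each platform in scope.

```python
import sqlite3


def inventory_schema(conn):
    """Return {table: {"columns": [(name, declared_type), ...], "rows": count}}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    inventory = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [
            (row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        inventory[table] = {"columns": columns, "rows": row_count}
    return inventory


def run_demo():
    """Build a tiny hypothetical legacy schema and inventory it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.execute("INSERT INTO orders VALUES (1, 1, 9.5)")
    result = inventory_schema(conn)
    conn.close()
    return result


if __name__ == "__main__":
    for table, info in run_demo().items():
        print(table, info)
```

An inventory of this kind can then be compared against the to-be canonical model to scope what has to be migrated, remodeled or retired.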
During solutioning, the data migration approach and the modern data architecture design need to emphasize how to extend the capacity and capability of a data lake so that data grains can be distributed, redistributed and correlated across the multiple data technology platforms in the ecosystem, depending on processing and workload requirements.
Key areas of focus include: a) identification of new data assets and how they will be ingested to cater to specific use cases, b) future data scalability, c) guiding principles for the technology platforms involved, d) data backup, restore and replication approach, e) in-memory database processing requirements and capabilities, f) data auditing, archival and security guidelines, g) data monitoring and query optimization scenarios, h) development plan, including deployment, scheduling and implementation, i) decommissioning of legacy data platforms and integration processes, and j) operational maintenance.
The solutioning phase also covers defining the components of the data lake, the use cases and migration guidelines for the analytics and MPP engine and the Hadoop cluster, and the changes to canonical models, data landing and staging layers, and legacy physical database designs.
Deep dives are needed during this phase, from a data integration perspective, into how the new batch, streaming event processing and near-real-time DI design patterns will be defined, including: 1) their topologies, 2) how the legacy patterns can be forklifted or improved, 3) which use cases require historical migrations from an RDBMS into Hadoop and the analytics engine, 4) the load and unload strategies, 5) which new business and IT policies need to be adopted from a data profiling, quality and governance perspective, and 6) the overall approach for metadata management and data lineage.
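To make the historical migration and load/unload ideas more concrete, here is a minimal sketch of a batched unload from a relational source into date-partitioned files, followed by a row-count reconciliation. Everything in it is an assumption for illustration: SQLite stands in for the legacy RDBMS, a local temporary directory stands in for the Hadoop file system, and the names `unload_in_batches`, `load_partitioned`, `run_demo` and the `sales` table are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile
from collections import defaultdict


def unload_in_batches(conn, query, batch_size):
    """Yield rows from the source in fixed-size batches to bound memory use."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def load_partitioned(batches, target_dir):
    """Append rows to year-partitioned CSV files; return rows written per year."""
    written = defaultdict(int)
    for batch in batches:
        for sale_id, sale_date, amount in batch:
            year = sale_date[:4]  # partition key taken from the ISO date string
            part_dir = os.path.join(target_dir, f"year={year}")
            os.makedirs(part_dir, exist_ok=True)
            path = os.path.join(part_dir, "part-0000.csv")
            with open(path, "a", newline="") as handle:
                csv.writer(handle).writerow([sale_id, sale_date, amount])
            written[year] += 1
    return dict(written)


def run_demo():
    """Migrate a tiny hypothetical sales history and reconcile row counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_id INTEGER, sale_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            (1, "2019-03-01", 10.0),
            (2, "2019-07-15", 20.0),
            (3, "2020-01-09", 5.0),
            (4, "2020-02-11", 7.0),
            (5, "2020-12-31", 1.0),
        ],
    )
    source_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    query = "SELECT sale_id, sale_date, amount FROM sales ORDER BY sale_id"
    with tempfile.TemporaryDirectory() as target:
        written = load_partitioned(unload_in_batches(conn, query, 2), target)
    conn.close()
    # Reconciliation: every source row must have landed in some partition.
    if sum(written.values()) != source_count:
        raise RuntimeError("row count mismatch between source and target")
    return written


if __name__ == "__main__":
    print(run_demo())
```

The batch size and the partition key are the two knobs worth discussing per use case. The reconciliation check at the end is a simple instance of the data quality policies mentioned above.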
Similar steps need to be performed from an advanced analytics and BI perspective to determine the top data mining techniques for the business (text, semantic and social analytics), visualization layer guidelines, architectural changes to the legacy BI layer, statistical and predictive modeling approaches, and tool and technology reference guidelines.
Continued advances are needed in the area of data federation, which allows organizations to extend their view of the data architecture to permit access to external data sources on an as-needed basis. This federated portion of the data architecture gives an organization quick access to seldom-used data sources without going through the process of moving that data into the existing data lake. In this way, data outside the existing lake can still be accessed and analyzed.
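The federation idea can be sketched on a small scale as a query that joins data in the lake with an external source in place, without copying the external data first. This is only an illustration under stated assumptions: two SQLite files stand in for the lake and the external source, SQLite's `ATTACH DATABASE` stands in for a federation layer, and the names `create_db`, `federated_order_totals`, `run_demo`, `orders` and `customers` are hypothetical.

```python
import os
import sqlite3
import tempfile


def create_db(path, ddl, insert_sql, rows):
    """Create a small SQLite file standing in for one data store."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


def federated_order_totals(lake_path, external_path):
    """Join lake data with an external source without copying it into the lake."""
    conn = sqlite3.connect(lake_path)
    try:
        # ATTACH exposes the external file under the schema name "ext";
        # its rows are read on demand and never written to the lake file.
        conn.execute("ATTACH DATABASE ? AS ext", (external_path,))
        return conn.execute(
            """
            SELECT c.name, SUM(o.amount) AS total
            FROM main.orders AS o
            JOIN ext.customers AS c ON c.customer_id = o.customer_id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
    finally:
        conn.close()


def run_demo():
    """Build a hypothetical lake and external source, then query across both."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = os.path.join(tmp, "lake.db")
        external = os.path.join(tmp, "external.db")
        create_db(
            lake,
            "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 10.0), (1, 5.0), (2, 7.5)],
        )
        create_db(
            external,
            "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "Acme"), (2, "Globex")],
        )
        return federated_order_totals(lake, external)


if __name__ == "__main__":
    print(run_demo())
```

The trade-off is the usual one for federation: no ingestion effort for seldom-used sources, at the cost of query performance that depends on the external system each time the data is read.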