By Rhubesh Goomiah, Regional Leader Victoria
Introduction
At Altis, I work with a number of large enterprise clients who are at different stages of their data transformation journeys. Most of them have started by migrating from a legacy data warehouse to a modern cloud data platform. On rare occasions, there is no legacy centralised data solution and building the cloud data platform is a greenfield initiative. Often, in these large organisations, obtaining raw data from core source systems onto the cloud data platform can be a challenging undertaking in its own right. In this article, I will discuss why it may be worth having a dedicated stream focused on source system integrations and data ingestion into the cloud data platform.
Skill and resource requirements
Integrating a data platform with source systems usually involves a range of considerations for each source, such as:
- Where the source system is hosted (on-premise, public cloud, private cloud) and how the data can be accessed (direct access to backend database, API layer, hot / cold replica, access via other middleware etc.)
- Preferred data integration pattern (batch, micro-batch, real-time streaming, transactional replication)
- Ability to obtain the required data (identifying which source objects provide specific data, how to capture deltas including hard deletes, and whether source data quality meets expectations)
- Security (how to securely connect to the source and move the required data, what is the classification level of the data being ingested and what data security rules need to apply on the data platform if sensitive or personal data is ingested)
- Performance (what is the performance impact on the source, how long does it take to extract and transfer the data)
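To make one of the trickier points above concrete, the sketch below shows a watermark-based approach to capturing deltas that also detects hard deletes by comparing primary-key sets between two snapshots. This is a minimal illustration under assumed structures (in-memory dictionaries keyed by primary key, an integer `updated_at` watermark); it is not tied to any particular source system or platform.

```python
# Illustrative sketch: detect deltas between two snapshots of a source table.
# A watermark column ("updated_at") finds inserts and updates, but it cannot
# see hard deletes, so deleted rows are found by comparing primary-key sets.
# All names and structures here are assumptions for the example.

def detect_deltas(previous, current, last_watermark):
    """Return (upserts, hard_deletes) between two snapshots.

    previous / current: dicts mapping primary key -> row dict with 'updated_at'.
    last_watermark: the highest 'updated_at' value already ingested.
    """
    # Rows that are new or changed since the last extraction
    upserts = [row for row in current.values()
               if row["updated_at"] > last_watermark]
    # Keys present in the previous snapshot but missing now: hard deletes
    # that the watermark alone would silently miss
    hard_deletes = sorted(previous.keys() - current.keys())
    return upserts, hard_deletes


# Example usage with two small snapshots
prev = {1: {"id": 1, "updated_at": 10}, 2: {"id": 2, "updated_at": 12}}
curr = {1: {"id": 1, "updated_at": 15}, 3: {"id": 3, "updated_at": 16}}
upserts, deletes = detect_deltas(prev, curr, last_watermark=12)
# upserts contains rows 1 and 3; deletes is [2]
```

The trade-off this sketch glosses over is that key-set comparison requires holding (or querying) the full key list from the source, which is exactly the kind of performance and access question listed above that needs to be agreed with the source application team.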
Based on the above, key skills in architecture, integration and security are required to successfully integrate and ingest data from core sources. This often requires the data & analytics resources working on these activities to collaborate actively with other parts of the IT organisation, such as source application teams, as well as architecture, infrastructure and cybersecurity teams.
Reducing the time to business value
Most data transformation programs are driven by one or more business cases seeking to achieve specific business outcomes and a return on investment. These outcomes often rely on use cases being implemented on the data platform, rather than simply having raw data land on the platform. However, many data initiatives in large organisations fail to anticipate the effort and cross-team collaboration required just to get the raw data onto the platform, as outlined in the previous section.
Therefore, by decoupling the implementation of foundational components, such as integrating and ingesting core sources, from use case delivery, it may be easier to secure a portion of the overall required funding and get started with onboarding data sources onto the cloud platform. Separate funding requests can then be made for use cases while the foundational components are being delivered. Other potential benefits of this approach include:
- It can enable a bi-modal delivery model, where quick wins are tactically delivered from the raw data via Mode 2 (exploratory) initiatives.
- Use case delivery can be accelerated as multiple use cases may rely on the same underlying data sources.
- It can also enable a decentralised delivery model, whereby integrations and ingestion are tackled by a central team while use cases are delivered by supplementary teams.
Conclusion
If any of the arguments that I have put forward in this article resonate with you, then you should seriously consider having a dedicated integration & ingestion stream/squad as part of your data transformation program. At Altis, we also have a set of proven frameworks that can assist this stream in fast-tracking the onboarding of data sources onto the data platform.
For a more detailed conversation, please connect with us or reach out to Rhubesh for more information.