Among the obstacles is creating a data architecture that works across legacy systems, new digital solutions and third-party data providers. As data management increasingly moves into the cloud and tech stacks become more vertically integrated by vendors, tension is growing between the flexibility required for digital-first processes and the lock-in that can be a consequence of proprietary vendor data models.
Added to this are the potential loss of third-party cookies and the growing power of digital walled gardens, all of which make it difficult to map the end-to-end customer journey or gain a unified view of the customer.
This white paper considers the challenges of enterprise-wide data integration across the data estate. It draws on a DataIQ Leaders roundtable discussion that was held in early 2020.
Five principles for data integration projects
A common point of agreement around data integration is that it should not be viewed as a technology project. If it is, then the solution will be led by technology rather than by the business. This is not always easy to avoid, given the way many businesses are organised. For example, at one digital services business, data engineering sits under the chief technology officer, with everything else under the chief data officer. As the member from that organisation observed, “that creates issues and shouldn’t happen.”
So the first principle of data integration is:
1 – Identify the data needs of the business and map technology against them.
The outcome will differ from company to company: some work to reduce the technology stack as a specific goal, while others simply adopt whichever solutions will deliver against those needs. In either case, the key is to avoid being sold a solution before understanding what processes it will support.
In parallel to this, the second principle of data integration is:
2 – Create a data strategy that is adopted enterprise-wide.
At one insurance provider, the arrival of a new CEO saw an engagement with a strategic consultancy which included the creation of a data strategy. For the first time, senior stakeholders, including the CIO, recognised that this was core to delivery of the CEO’s vision and an enabler of critical business processes.
This will require cross-functional co-operation, which is a third principle of data integration:
3 – Partner to ensure production.
One of the biggest obstacles facing all data integration projects, but especially any innovative data-driven or data science-based ones, is moving them into the production environment. Often, data scientists lack the experience to code for this, while IT is highly sceptical and wary of anything that might interrupt core operating processes. Close partnership, up to and including secondment of technology leads into the data office, can narrow the gap that needs to be bridged between model and live delivery.
This does not presume that IT has sole ownership of deliverable solutions. In one example, two models were built for the same project – one entirely by the data science team, the other in partnership with IT. As it turned out, the data science version performed better and was adopted, introducing Python into a production environment for the first time.
Building cross-functional bridges will also enable the fourth principle of data integration:
4 – Share data, share data ownership.
A critical goal for any data integration project is to release data sources from restrictive ownership by any single function, which may view them as a source of power. Integration has a multiplying effect on the value of each piece of data, which is a significant upside. But the project must avoid the risk of the data office taking sole responsibility for all the problems inherent in data. It should govern from the centre, but rely on each function to support standards and quality. As one member put it, it is a form of data socialism: “The new data principle is for functions to give what data they have and take what data they need.”
The final principle for data integration links back to the first principle:
5 – Build for business benefit.
All businesses have an extensive range of projects seeking funding. Even where a specific investment budget has been set aside for data-driven initiatives, there will still be more project outlines than can be attempted in any given cycle. To prioritise, some understanding of the benefits must be arrived at, even where these are foundational rather than financial.
This is especially true for innovative projects that can struggle to specify a return on investment in advance. As one organisation noted about data science, “it is difficult to talk money with an unproven solution. There is a risk that you over-promise and disappoint or under-promise and don’t get the project prioritised.” Where stakeholders have been engaged from the outset and their data requirements listened to and built into a project, their support will help to overcome this hurdle.
It is also worth noting that data integration should be something that benefits the business primarily and is not just done because a vendor has proposed it. What is needed is a true partnership so that the solution brings the new data-driven processes alive.
Case study: Data integration in a digital services company
The company provides price comparison services and is marketing-led with a strong focus on personalisation and next best action. While these are not complex requirements, it is also exploring more cutting-edge technologies, such as machine learning. It has a vision to deliver a customer experience that is personalised and consistent across devices and channels.
To achieve this, it has worked with a digital marketing consultancy to review and benchmark the organisation, its technology and ways of working. As a heavy investor in online advertising, certain solutions are essential to its operation, but the review risked increasing its tech stack, rather than consolidating it. It was also keen to avoid vendor lock-in.
Although the business is already 20 years old, its operating model is relatively simple, with partner brands receiving an offline file of leads generated by its online marketing. Ensuring consistent tracking and reporting of the customer journey to the point of conversion is critical: the revenue impact of even a single tag failing for one day is significant. Implementing new tags, such as adding a pixel for a new publisher, was lengthy, requiring as much as two months to reach production. This reflected the fact that each product has its own team with a technical lead, solution architect, developer and so on. Co-ordinating across all 25 delivery managers contributed to those delays and acted as a blocker on creativity and the adoption of new channels. It also meant reports were not fully aligned, with some data sources in walled gardens limiting visibility of performance.
Off the back of its technology review, the company decided to reduce its data tech stack from three platforms down to two by migrating out of SQL into an on-premises Cloudera Hadoop platform. But after only 10% of its data had been migrated, the capacity of that platform was exceeded, so a second migration into AWS was undertaken.
Although the data integration did not simplify the underlying data estate, it did lead to greater alignment of core metrics. Landmark events have been a focus, and control of the business rules that define them has been achieved by the data function. If those steps in the customer journey are working, it is a good indication that the whole journey is functioning correctly.
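To illustrate what central ownership of those business rules can look like in practice, the sketch below shows one possible pattern: the data function defines the landmark events once, and every team checks journeys against the same definition. The event names, fields and health check are hypothetical, for illustration only, not the company’s actual rules.

```python
# Illustrative sketch only: centrally owned rules for "landmark" journey events.
# Event names and fields are hypothetical, not the company's actual definitions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str          # e.g. "quote_viewed", "lead_submitted"
    user_id: str
    timestamp: float   # UNIX epoch seconds


# Single source of truth, owned by the data function rather than individual product teams.
LANDMARK_EVENTS = {"quote_viewed", "quote_compared", "lead_submitted"}


def is_landmark(event: Event) -> bool:
    """Return True if the event is one of the agreed landmark steps in the journey."""
    return event.name in LANDMARK_EVENTS


def journey_is_healthy(events: list[Event]) -> bool:
    """Treat a journey as healthy if every landmark step was recorded at least once."""
    seen = {e.name for e in events if is_landmark(e)}
    return seen == LANDMARK_EVENTS
```

Because the definition lives in one place, product teams can report against landmark events without each maintaining their own, slightly different, version of the rules.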
The BI team has established rules for reporting and has identified where they vary from historical KPIs, or where incorrect metrics were being produced or not tracked at all, such as when product teams added call-to-action buttons that were not picked up. All of these have been documented and discussed with the relevant product teams.
Alongside the migration into AWS, the company also adopted Tealium as its tag management solution. This was because it is vendor agnostic and simple to deploy, with data portable to other solutions, such as segment tags, and also supports event streaming. This has brought implementation time down to an average of two hours.
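A minimal sketch of what a standardised, vendor-agnostic event record flowing into that event stream might look like is shown below. All field names and values are illustrative assumptions, not the company’s actual schema or its Tealium configuration.

```python
# Illustrative sketch only: the kind of standardised record a vendor-agnostic
# data layer might emit into an event stream. Field names are hypothetical.
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class DataLayerEvent:
    event: str                                       # e.g. "cta_clicked"
    page: str                                        # page or screen identifier
    product: str                                     # product vertical, e.g. "car_insurance"
    attributes: dict = field(default_factory=dict)   # free-form tag payload
    timestamp: float = field(default_factory=time.time)


def to_stream_payload(evt: DataLayerEvent) -> str:
    """Serialise an event to JSON so it can be forwarded to any downstream vendor."""
    return json.dumps(asdict(evt))


# Because the schema is owned in one place, adding a new publisher pixel or analytics
# destination means adding a new consumer of this payload, not new page-level code.
payload = to_stream_payload(
    DataLayerEvent(event="cta_clicked", page="/compare/car", product="car_insurance")
)
```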
A critical next step is to create an API-based “build and serve” approach to models. The company has identified that building is not the problem; integrating data sources and productionising the model can be. By gaining sign-off from product teams when they adopt a new model, it can be served to other teams, such as when CRM built a price prediction model that could also be used in acquisition, thereby removing duplicate effort.
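As an illustration of the “build and serve” idea, the hedged sketch below shows how a model such as the price prediction example might be exposed behind an API so that other teams consume it rather than rebuild it. The framework (FastAPI), endpoint path and feature names are assumptions made for illustration, not the company’s implementation.

```python
# Illustrative sketch only: a "build and serve" scoring API, so a model built by one
# team (e.g. CRM's price prediction) can be consumed by others (e.g. acquisition).
# The framework, endpoint, fields and model logic are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class QuoteFeatures(BaseModel):
    age: int
    vehicle_value: float
    years_no_claims: int


class Prediction(BaseModel):
    predicted_price: float
    model_version: str


def predict_price(features: QuoteFeatures) -> float:
    # Placeholder for the trained model; in production this would load a versioned artefact.
    return 200.0 + 0.01 * features.vehicle_value - 5.0 * features.years_no_claims


@app.post("/models/price-prediction/score", response_model=Prediction)
def score(features: QuoteFeatures) -> Prediction:
    """Any signed-off team can call this endpoint instead of rebuilding the model."""
    return Prediction(predicted_price=predict_price(features), model_version="v1")
```

Serving models behind a stable contract like this keeps sign-off with the owning product team while letting other teams integrate without duplicating the build.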
Throughout, product teams were asked about their requirements around functionality, coding and reporting, which helped to accelerate implementation as they were already supportive. Communication was constant, with particular emphasis on how the new data layer was beneficial, since it meant there was only one new thing for the teams to learn.