Research carried out by Nesta in 2018, “Making sense of skills,” identified 143 skills clusters in the digital economy. Within these, data engineering had the highest annual median salary at £55,000, with a range of £43,000 to £73,000, and also the strongest growth in demand.
The Royal Society report, “Dynamics of data science skills,” published in May 2019, identified a 452% increase in demand for data engineers between 2013 and 2018, rising from 1,213 to 6,699, based on analysis of 9.2 million UK-based job postings by Burning Glass Technologies. Over this period, average salaries rose by 38% from £53,449 to £73,559 (see Figure 1).
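The growth figures quoted above can be checked with simple percentage-change arithmetic. The short Python sketch below is illustrative only; the function name `percent_change` is an assumption introduced here, and the inputs are the figures cited from the Royal Society report.

```python
def percent_change(start: float, end: float) -> float:
    """Return the percentage change from start to end."""
    return (end - start) / start * 100


# Figures as quoted from the Royal Society report (2013 vs 2018).
postings_growth = percent_change(1213, 6699)
salary_growth = percent_change(53449, 73559)

print(f"Demand for data engineers: {postings_growth:.0f}%")  # 452%
print(f"Average salary: {salary_growth:.0f}%")               # 38%
```

Both results match the rounded percentages in the report.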
One caveat around these findings is that there is no commonly agreed definition of what data engineering involves. Nesta found data engineering appearing under alternative job titles such as developer, Java developer and DevOps engineer. Technical skills typically demanded of data engineers may include NoSQL, Python and Ruby, big data technologies like Scala and Spark, as well as an understanding of machine learning principles and data ecosystems.
Legacy skills such as COBOL are also still in demand, since many organisations operate mainframe systems that are unlikely to be transitioned into new processing environments. This strengthens the case for a data integration layer, although that can be hard to sell to the business. The volume of data will determine the depth of skills required: streaming data, for example, is very challenging to handle.
But soft skills are equally required due to the number of interactions with data owners and business functions involved. A consequence of this is that there is a risk of creating a new unicorn job specification that requires coding, analytics and business relationship skills.
The need for data engineers is often a reflection of maturity in the data and analytics functions, as a recent discussion on the subject revealed (see below). Initially, data integration tasks will likely be handled by individuals as part of their work, such as data scientists, or by the technology function. As demand grows from business partners, however, more data sources will come into play and it will become necessary to build out the foundational data layer or create an integrated data platform. Growth in the number of users for the same data sources will also drive a need to develop a shared resource.
Data engineering in the real world
A recent DataIQ Leaders roundtable discussed whether organisations now see data engineers as a new permanent hire and, if so, how best to find them. Experiences were very diverse and heavily dependent on organisational size, structure and level of maturity in the use of data and analytics. A key driver is the merger or acquisition of businesses, more than internal demand from business functions for more access to data.
Highly-mature data organisations
UK bank – advanced level of data maturity (between levels 4 and 5) originating in a data-driven, customer-facing app development project five years ago. It has created a data lake with data governance applied at the point of entry – data definitions are required for every column before a data source will be accepted. While time-consuming for data owners initially, this significantly reduces the need for data engineering and data governance in daily use. The data lake now supports 380 live apps and is supported by a team of 120 which includes data engineers. Over its five-year lifespan to date, the bank has onboarded some 300 data engineers, but many of these have been let go as projects reached completion.
Global retailer – has a data operations function within its global analytics team led by a head engineer. Is now developing a “guild” for data engineers based on the Spotify model to create direction and authority for data across the organisation and its business users.
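The UK bank's rule that every column must have a data definition before a source is accepted into the data lake can be pictured as a simple check at the point of entry. The following is a minimal sketch in Python, not the bank's actual implementation; the `DataSource` class and the `undefined_columns` and `accept_source` functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """A candidate source: its columns and the definitions supplied by the data owner."""
    name: str
    columns: list[str]
    definitions: dict[str, str] = field(default_factory=dict)


def undefined_columns(source: DataSource) -> list[str]:
    """Return the columns that have no definition, or only a blank one."""
    return [c for c in source.columns if not source.definitions.get(c, "").strip()]


def accept_source(source: DataSource) -> bool:
    """Accept the source only if every column has a definition; otherwise reject it."""
    missing = undefined_columns(source)
    if missing:
        raise ValueError(
            f"Source '{source.name}' rejected; undefined columns: {', '.join(missing)}"
        )
    return True
```

A source with a definition for every column passes the check, while one with any gap is turned back to its owner. This reflects the trade-off described above: more effort for data owners at the outset, less data engineering and governance work in daily use.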
Mid-level data maturity
Retail group – hired a group CDO six months ago to focus on resolving data silos. Currently has a head of data responsible for the “what” and a head of technology responsible for the “how”.
Landline and broadband network operator – created a centralised data team following a business merger one-and-a-half years ago. Data has been integrated into a single warehouse and will move to a cloud environment.
Mobile network operator – created a new digital team three years ago to build new revenue streams from big data following an acquisition. A digital transformation is in hand which will lead to data-as-a-service and the insourcing of data engineering skills which are currently largely supplied by its systems vendors.
Emerging data maturity
Sports rights owner – is building a global insights function following a business merger. Currently has no data engineering capacity – this is mainly managed by the CTO, with one data scientist handling their own needs.
Media owner – is undertaking a data re-platforming project following a number of business acquisitions. As part of this, common data models and standards will be applied to eliminate data silos and make data transfers easier.
Travel business – created a new data team one-and-a-half years ago which has grown from two to 25 people, but has data silos and no data engineering capability. Is planning a data lake as part of a digital transformation.
B2B publisher – has no foundational data stack with extensive silos across each brand leading to significant levels of lifting and shifting. Has identified the need for data engineering as part of a business divestment strategy and reduction in its tech stack.
Recruitment challenges around data engineers
1. Job specs are speculative
Writing job specifications can be a challenge because the projects that data engineers will be deployed against are often a “best guess” as to what the solution will ultimately be. Typically, the final solution will not be completely clear on day one. As a result, employers can default to pick lists or wish lists of technical skills, often over-specifying to try to reduce risk.
2. Garage-land engineers are not always the hot-shots they appear
Among more junior candidates, especially those who have developed their skills on home projects, experience is often limited to entry-level technology and solutions. This means they will not have worked at enterprise scale or on distributed platforms, or had to meet the need for highly efficient processing. Many Computer Science courses are now theory-based and students may never have written a line of code. One leader has never recruited a Computer Science graduate because of this gap in their knowledge.
3. Pre-loading can be a solution
One solution to the recruitment bottleneck is to work with “pre-loaded” solution providers, such as Kubrick Group or Eden Smith. These consultancies fund training programmes for data scientists and data engineers and then assign individuals to work for clients, recouping their investment via the fees charged. At the end of a two-year period, these trained consultants are free to be employed directly by the client, stay with the consultancy, or find work elsewhere. This can be valuable because few, if any, employers offer on-the-job training for data engineers, and they expect external candidates (especially contractors) to be up to speed with techniques.
4. Retention is limited, but so are projects
A number of data leaders reported a high attrition rate from their graduate recruitment and development schemes with candidates working for one to one-and-a-half years, then leaving for a higher salary at a start-up. Creating the right kind of environment can be a mitigating factor, even down to the level of dress code within the data function. As one leader put it, “you can’t act like Google and dress like a banker.”
But this throughput may be desirable and have the effect of bringing a constant stream of fresh skills into the business. A major role for data engineers is in ending the “data bus” effect where data is continuously being moved around the organisation. Instead, the end state will be an integrated data asset in one place. That does build in obsolescence for data engineers, since they will no longer be required once this type of project is realised, but new requirements may spring up or they may have to be redeployed. Automation is also increasing in the ETL and data integration space, which may reduce the need for data engineers, especially if artificial intelligence comes on stream.
5. Diplomat or mechanic?
Just as happened with data scientists, the early rush to onboard data engineers with highly-technical abilities has started to be tempered by the reality that they also need to be able to communicate. Most major data projects, such as the integration of sources across the organisation or a data re-platforming, involve detailed discussions with data owners and stakeholders within business functions. For this, data engineers often need the skills of a diplomat to negotiate access, controls and standards as much as those of a mechanic capable of coding these into a system.
Conclusion
Data volumes continue to rise and the complexity of the uses for that data also grows, not least with the increasing use of artificial intelligence and machine learning. As a result, organisations need to move and integrate their data at speed or, preferably, create data lakes and data platforms which remove this requirement.
Data engineers sit at the heart of these projects as critical enablers. This is one of the main reasons for the current overheated recruitment market and wage inflation for these roles. Unlike data science, which academia has recognised and responded to as a new educational domain, data engineering sits awkwardly across data science and computer science, but without the clear parameters of either. With no standard definition or skill set attached, creating a steady flow of candidates is difficult. In the short term, employers seem likely to remain reliant on internal promotion or redeployment as well as external contractors or vendors.