Avoiding lock-in
When it comes to technology and new tools, it is easy to be swept up in the excitement and hype cycle of new developments, but careful consideration is needed when selecting a platform. For example, some tools are so complex, and demand so many specialist skills, that they cannot be used effectively across an organisation.
“A challenge with selecting a commercial platform is the risk of getting locked in to a vendor,” explained Simon. “Often there is no easy way to transfer models across platforms, and the needs of the business will invariably evolve, which risks outgrowing or moving away from the specialisms of the selected tool.” Participants of the roundtable noted that the use of AutoML capabilities made the lock-in risk even higher.
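One common way teams reduce this kind of lock-in is to keep trained models in an open, portable format rather than a platform-specific one. The snippet below is a minimal sketch of that idea, assuming scikit-learn and the skl2onnx package are available; the model, feature shape and file name are purely illustrative, not anything the roundtable prescribed.

# A minimal sketch of reducing platform lock-in by exporting a model to the
# open ONNX format. Assumes scikit-learn and skl2onnx are installed; the
# data set and file name are illustrative.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Convert to ONNX so the model can be served outside the training platform.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

An exported artefact like this can then be loaded by any ONNX-compatible runtime, which loosens the dependency on the platform that produced it.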
A few roundtable participants explained that they had found tooling that worked for them, but the real challenge lay in the gap between a proof of concept and something fit for production. Businesses at different stages of their data maturity journey struggled with different aspects of the production lifecycle, partly because of the gaps left by different data platforms and tools. Most of the table agreed that data platforms are far better suited to proof-of-concept work than to the production portion.
Sourcing the data needed
Part of the challenge of implementing ML ops in a tech stack is finding and using the data sets required for success. This was a common hurdle for members of the roundtable, as the task usually falls at the start of a project, when there is minimal capacity and an eagerness to prove the concept to decision makers.
There were calls from the roundtable to create data catalogues and dictionaries for internal use, as well as to improve the storytelling and data literacy capabilities of the team. Some felt they already knew where their data was located, with one business representative explaining how their organisation had been actively migrating from an on-premises system to a cloud platform. The transformation had taken over two years and was difficult, but because the data team was so involved in the process, it knew exactly where the data sets it required were and understood their lineage.
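To illustrate what a lightweight internal data catalogue entry might capture, the sketch below uses a plain Python dataclass; the field names and example values are assumptions made for illustration, not the schema of any particular catalogue product the roundtable discussed.

# An illustrative sketch of a minimal data catalogue entry for internal use.
# Field names and example values are assumptions, not a product schema.
from dataclasses import dataclass, field

@dataclass
class CatalogueEntry:
    name: str                       # human-readable data set name
    owner: str                      # team or person accountable for the data
    location: str                   # where the data lives (table, bucket, path)
    description: str                # plain-language summary for data literacy
    lineage: list[str] = field(default_factory=list)  # upstream sources

customers = CatalogueEntry(
    name="customer_profiles",
    owner="data-platform-team",
    location="warehouse.analytics.customer_profiles",
    description="One row per customer, refreshed nightly from CRM exports.",
    lineage=["crm.raw_contacts", "billing.invoices"],
)

Even a simple record like this, kept up to date, shortens the search for the right data at the start of a project and preserves lineage knowledge when people move on.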
A smaller, but by no means insignificant, hurdle faced by the group was team member churn. When an established member of the team left, the knowledge gap in their absence often made tasks such as collating the right data slower. Higher churn also slows response times to problems and reduces the ability to fix them before they grow, something ML arguably cannot address without the human skills behind it.
The shape of an ML team
Some participants of the roundtable had only just begun their ML ops journey and were eager to learn from those who had been there before. One pointed out that ML ops is a development function and there is always a risk of getting lost in semantics, which is why the scope of roles needs to be defined and adhered to.
“One question was ‘what is the right team shape for creating and deploying ML models?’,” said Simon. “Another asked ‘how many data engineers do I need?’ and ‘what should the ratio of scientists to engineers be?’ After some discussion, the consensus rule of thumb was at least one engineer per scientist to get a model working in the business. There are a lot of variables to this journey and no hard-and-fast rules.” It should become good practice for a data function to hire a data engineer and an ML ops engineer at the same time.
One member noted that organisations are frequently hampered by having to create models that work within their existing organisational structures, which means the shape and scope of the team often needs to be flexible depending on factors such as maturity, size and project remits. Elsewhere, it is still common to see data scientists sitting in a different part of the business from the data engineers, with a degree of disconnect between the two roles. “The roundtable agreed that there must be a cross-functional team to deliver greatness, but many had obstructive internal organisational structures that made this ambition hard to achieve,” said Simon.
The future of ML in a world of AI
The year 2023 will go down as the year of gen AI discussions, and the topic surfaced across the conference, with roundtable attendees weighing up the relationship between gen AI and more classic ML. “The general feeling across the group was that gen AI and classic ML address different challenges,” said Simon. “Take market segmentation: the segmentation task itself is a classic data analysis/ML activity. Gen AI can be used in this context for some of the surrounding activities, such as analysing free-text survey responses, and would certainly be the tool of choice for helping craft marketing activities for each segment. It can also rewrite marketing copy for different segments to achieve greater levels of success.”
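To make that division of labour concrete, the sketch below pairs a classic clustering step with a placeholder where a generative model would summarise free-text responses for each segment; the data, cluster count and the summarise_feedback helper are hypothetical stand-ins, not a real gen AI API.

# A rough sketch of classic ML handling segmentation while gen AI (represented
# here by a hypothetical placeholder) handles the free-text survey responses.
import numpy as np
from sklearn.cluster import KMeans

def summarise_feedback(texts: list[str]) -> str:
    # Placeholder for a call to a generative model; in practice this would
    # send the segment's responses to an LLM and return a written summary.
    return f"{len(texts)} responses to summarise"

survey_features = np.random.rand(200, 5)            # numeric survey answers (illustrative)
free_text = [f"response {i}" for i in range(200)]   # free-text comments (illustrative)

# Classic ML: cluster respondents into market segments.
segments = KMeans(n_clusters=4, random_state=0).fit_predict(survey_features)

# Gen AI (placeholder): summarise the free-text responses within each segment.
for segment_id in range(4):
    texts = [t for t, s in zip(free_text, segments) if s == segment_id]
    print(segment_id, summarise_feedback(texts))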
The consensus was that data engineers, rather than data scientists, should operate gen AI tools, further highlighting the need for collaboration between the different roles. The creative aspects of data science must be properly understood and allowed to flourish alongside ML ops and the burgeoning gen AI era.