DataIQ (DIQ): Firstly, can you outline the context in which annotation is applied at Bloomberg?
Tina Tseng (TT): Bloomberg applies annotation to complex data sets in a variety of areas relevant to our clients, such as finance, news, law and government. Annotation is used to enrich information that is directly accessed by our clients (such as adding metadata to make search functionalities more robust) or to provide training and/or evaluation data for automation solutions like machine learning.
DIQ: Is this typical of where the need for annotation is likely to arise in other organisations? Do you see common issues for this type of work?
TT: Yes. Machine learning, in particular, is being used by many organisations to present information intelligently and efficiently in ways that are not possible at scale through human curation alone. Machine learning models need annotations that are both accurate and consistent – accurate annotations ensure that the downstream product meets users’ expectations, while consistent annotations are required for pattern recognition.
Failing to take the right strategic steps during the annotation process can affect both annotation accuracy and consistency. For example, if the annotation requirements are not fully described, or are subject to significant interpretation by the workforce producing the annotations, the resulting annotations will likely vary greatly and may not conform to the desired end goal. Establishing clear, written guidelines that explain how to handle both common and edge cases in the data set helps ensure that the annotation workforce shares a correct, consistent understanding of what is required.
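By way of illustration only – this example is not from Bloomberg’s guide – written guidelines can even be encoded so that basic conformance is machine-checkable. The Python sketch below uses an invented label schema and a simple validation check; the label names and rules are hypothetical.

```python
# Hypothetical sketch: encoding part of an annotation guideline as a
# machine-checkable schema so basic mistakes are caught automatically.
from dataclasses import dataclass

# Allowed labels with a one-line definition for each (invented for illustration).
LABEL_DEFINITIONS = {
    "RELEVANT": "The passage directly answers the research question.",
    "PARTIALLY_RELEVANT": "The passage touches the topic but needs more context.",
    "NOT_RELEVANT": "The passage does not address the topic at all.",
}

@dataclass
class Annotation:
    document_id: str
    label: str
    annotator: str

def validate(annotation: Annotation) -> list[str]:
    """Return a list of guideline violations for a single annotation."""
    problems = []
    if annotation.label not in LABEL_DEFINITIONS:
        problems.append(f"Unknown label '{annotation.label}'; see guideline definitions.")
    if not annotation.document_id:
        problems.append("Missing document_id; every annotation must reference a document.")
    return problems

# Example usage:
issues = validate(Annotation(document_id="doc-42", label="RELEVANT", annotator="a1"))
print(issues)  # [] -> conforms to the written guidelines
```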
DIQ: Do data scientists anticipate the level of effort and resourcing that will be required by annotation when embarking on a machine learning project? Does it need to be better understood/taught?
TT: Annotation is usually learned through practical experience rather than formal training. For our “Best practices for managing data annotation projects” guide, we collected feedback from more than 30 experienced annotation project managers who work in Bloomberg’s Global Data department. This allowed us to determine recommended strategies based on successfully executed projects across a wide variety of contexts and products.
Bloomberg’s Global Data analysts collaborate with data scientists to plan and implement annotation projects because they often have the subject matter knowledge needed to fully understand the complex data sets that the data scientists must leverage for their machine learning models.
The level of effort and resourcing required for an annotation project can differ greatly depending on the data set and desired end goal – there is no one-size-fits-all approach to an annotation project, but there are common milestones and best practices. To understand the scope of an annotation project, it is important to take the time at the outset to explore the relevant data set and understand its features and limitations, to draft an initial set of annotation guidelines, and then to run a limited pilot period in which a small sample of the data is annotated using those guidelines.
These initial steps are critical for project managers to ascertain the complexity of the task and the type of resources needed. Reviewing the results from the pilot period and debriefing with the testers to get their feedback on the task and any areas of confusion can help project managers refine the project guidelines, revisit their choice of workforce, and anticipate/address pain points.
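As a rough sketch of how pilot results might be reviewed quantitatively – the data and labels below are invented, and this is one common approach rather than the guide’s prescribed method – two pilot annotators’ labels on the same sample can be compared using observed agreement and Cohen’s kappa; a low score flags guideline sections that need clarification.

```python
# Minimal sketch: measure how consistently two pilot annotators applied the
# draft guidelines. Low agreement on a label suggests its definition in the
# guidelines needs clarification.
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same pilot items."""
    assert len(labels_a) == len(labels_b), "annotators must label the same items"
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical pilot: two annotators label the same ten sampled documents.
annotator_1 = ["RELEVANT", "RELEVANT", "NOT_RELEVANT", "RELEVANT", "PARTIALLY_RELEVANT",
               "NOT_RELEVANT", "RELEVANT", "NOT_RELEVANT", "RELEVANT", "RELEVANT"]
annotator_2 = ["RELEVANT", "NOT_RELEVANT", "NOT_RELEVANT", "RELEVANT", "RELEVANT",
               "NOT_RELEVANT", "RELEVANT", "NOT_RELEVANT", "RELEVANT", "PARTIALLY_RELEVANT"]
print(f"Cohen's kappa: {cohens_kappa(annotator_1, annotator_2):.2f}")
```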
DIQ: How is this part of data science viewed by practitioners – is it just considered as “grunt work” that has to be got through, or accepted as a key foundational activity for machine learning?
TT: As machine learning becomes more prevalent, the importance of accurate and consistent annotations has become more widely recognised. For supervised learning, the accuracy and consistency of the annotations used for training data are key to ensuring the model works and the desired data patterns are recognised. For unsupervised learning, the accuracy and consistency of annotations are just as important for evaluation purposes – being able to use annotations to drive quantitative evaluation and track trends in the model’s performance is critical.
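For a concrete, purely illustrative sense of what annotation-driven evaluation can look like, the sketch below computes precision and recall for a hypothetical label against human-annotated gold data; tracking these numbers across model releases is one common way to monitor such trends.

```python
# Sketch under invented assumptions (not Bloomberg's tooling): use an annotated
# evaluation set to track a model's precision and recall over time, so that
# regressions show up as a trend rather than as user complaints.
def precision_recall(gold: list[str], predicted: list[str], positive: str):
    """Precision and recall for one label against human-annotated gold data."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical gold annotations versus a model's predictions.
gold = ["RELEVANT", "NOT_RELEVANT", "RELEVANT", "RELEVANT", "NOT_RELEVANT"]
pred = ["RELEVANT", "RELEVANT", "RELEVANT", "NOT_RELEVANT", "NOT_RELEVANT"]
p, r = precision_recall(gold, pred, positive="RELEVANT")
print(f"precision={p:.2f} recall={r:.2f}")  # logged per model release to spot trends
```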
Quite often, machine learning researchers do not possess expertise in annotation best practices. But those who do – or those who develop it – can achieve recognition for providing high-quality annotated data for other researchers.
DIQ: How important is domain expertise in the data subject area, ie, do stakeholders/SMEs need to be involved to contribute to and assess the quality of labels being developed?
TT: Like the required effort and resourcing for an annotation project, the domain knowledge required depends on the data set and desired end goal. If the data set is esoteric and the use case is highly specific, subject matter experts may be required. For example, I work on projects that support Bloomberg Law, which is Bloomberg’s research platform for delivering news, content and analytics to attorneys and other legal professionals.
Because lawyers like myself understand the nuances of legal text and what language may be legally significant, we work together with data scientists and other machine learning engineers to design and implement annotation projects that extract specific information from documents like court opinions. We also understand what legal professionals are looking for when they are doing research and how they would want relevant information to be surfaced.
For these projects, the data set and use case dictates the need for a fairly high level of domain expertise to determine what labels are needed and how they should be applied.
DIQ: What impact do you see from approaching this in the right way – are there examples of positive impact at Bloomberg?
TT: Annotation is the essential driving force of all of our machine learning initiatives, and that is why we thought it was important to highlight the strategies that have worked for us in our best practice document – so that others within our company and beyond could benefit from our experience and expertise.
We have successfully leveraged machine learning in so many different areas at Bloomberg, from automating some of our internal processes in order to make them more efficient and scalable to developing new user-facing features that enable clients to find the right information they are seeking more quickly. On our Bloomberg Law platform, for example, we use machine learning to help our customers retrieve valuable documents that are buried in court dockets, to understand trends in how judges have ruled on specific motion types, and to quickly find support for arguments they want to present to courts on behalf of their clients, among many other things.
One of our products, Points of Law, received the 2018 New Product Award from the American Association of Law Libraries because it presents legal information in a way that had never before been possible without human curation: the tool automatically aggregates legally significant language from millions of court opinions and makes the relevant text searchable.
Without good annotations, none of these machine learning initiatives would have been possible. As we say in our guide, “garbage in, garbage out”.
DIQ: When creating the best practice guide, did you draw on general principles of project management and quality control that might apply to any data and analytics project, or were you mostly guided by the specific requirements of annotation?
TT: Even though we focused specifically on the context of annotation projects, I think many of the themes in our guide apply to any data project, such as engaging with all stakeholders, establishing clear project requirements and milestones that align with the desired end goal, implementing a sound quality control methodology, and keeping resources allocated for ongoing monitoring and quality assurance, because data changes over time and needs to be updated.
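As a loose illustration of that ongoing quality assurance – the sampling rate, threshold and data below are invented rather than taken from the guide – one common pattern is to re-review a random sample of recent annotations and raise a flag when the disagreement rate drifts too high.

```python
# Hypothetical monitoring sketch: periodically sample recent annotations for
# expert re-review and flag the batch if the error rate exceeds a threshold.
import random

def audit_sample(annotations: list[dict], sample_rate: float = 0.05, seed: int = 0) -> list[dict]:
    """Randomly select a subset of annotations for re-review."""
    rng = random.Random(seed)
    k = max(1, int(len(annotations) * sample_rate))
    return rng.sample(annotations, k)

def error_rate(reviewed: list[dict]) -> float:
    """Share of re-reviewed annotations the expert reviewer disagreed with."""
    disagreements = sum(item["label"] != item["reviewer_label"] for item in reviewed)
    return disagreements / len(reviewed)

# Invented example: flag the period if more than 5% of audited annotations
# were overturned on review.
reviewed_batch = [
    {"label": "RELEVANT", "reviewer_label": "RELEVANT"},
    {"label": "NOT_RELEVANT", "reviewer_label": "RELEVANT"},
    {"label": "RELEVANT", "reviewer_label": "RELEVANT"},
]
if error_rate(reviewed_batch) > 0.05:
    print("Quality alert: annotation guidelines or training may need revisiting.")
```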
DIQ: Communication is a core aspect of your guide – are data scientists generally adept at explaining and responding to queries/feedback? Do you hope the guide will impact this?
TT: I don’t think there has been enough emphasis on soft skills when it comes to annotation and machine learning. I hope that our best practice guide conveys the importance of effective collaboration and clear communication as keys to the success of any annotation project.
Understanding different stakeholders’ perspectives helps facilitate communication, and that is why one of the initial steps we recommend is to identify and meet with all key stakeholders to define project parameters and goals, ensuring everyone is aligned from the very beginning of the project.
We also recommend adopting methods like Scrum, Kanban, or the Dynamic Systems Development Method (DSDM) to maintain ongoing communication throughout the project, so that strategic decisions are made with input and buy-in from all stakeholders. Having structure around communication, like a set meeting cadence, encourages the exchange of information and the regular incorporation of feedback.
DIQ: Is the need for human intervention to label data in this way a temporary requirement – could it be automated via machine learning itself at some point?
TT: There are certainly ways to bootstrap annotations, as well as unsupervised and active learning techniques. However, I think there will always be tasks where humans are needed to provide the ground truth for supervised learning.
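As a minimal sketch of the active learning idea mentioned above – uncertainty sampling, with invented document IDs and confidence scores – the model’s least confident predictions can be routed to human annotators so that labelling effort goes where it is most informative.

```python
# Uncertainty sampling sketch (invented data): send the items the model is
# least sure about to human annotators for ground-truth labels.
def least_confident(predictions: dict[str, float], budget: int) -> list[str]:
    """Pick the items whose top predicted-class probability is lowest."""
    return sorted(predictions, key=lambda item: predictions[item])[:budget]

# Hypothetical document IDs mapped to the model's confidence in its own label.
confidence = {"doc-1": 0.98, "doc-2": 0.51, "doc-3": 0.87, "doc-4": 0.55, "doc-5": 0.99}
to_annotate = least_confident(confidence, budget=2)
print(to_annotate)  # ['doc-2', 'doc-4'] -> sent to human annotators for labelling
```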
Furthermore, as long as the end-users are human, annotators will often be needed to evaluate machine learning models and ensure they are delivering on their downstream objective and meeting end-users’ expectations.