Olivier Thereaux, head of technology at the Open Data Institute, knows a lot about synthetic data. He and his team have been working on a UK government-funded research and development project on risk and data, looking closely at how to increase access to data while managing risk and maintaining trust. This led the team to the twin topics of anonymisation and synthetic data, which they have been studying for the past six months.
Toni Sekinah: What is synthetic data?
Olivier Thereaux: It is the idea that you can create data that resembles the real thing. By the real thing, I mean data that is gathered by observing the real world or taking measurements of it. The idea of synthetic data is: what if you had the ability to generate or create data, through technical or non-technical means, in such a way that it resembles the real thing? You look at what characteristics there are in the real data and you try to make sure that the data you create has similar characteristics.
TS: Can you give an example?
OT: Say you’ve got a list of people with their height and weight. It is sensitive data, so you don’t want to share it with others, but you do want to give them a rough idea of what that group’s typical measurements look like. You can create another data set whose entries are made up, but where the average height and the distribution of weight are very similar to the real thing. That is what you would call synthetic data.
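The height-and-weight example can be sketched in a few lines of Python. This is only a minimal illustration, not a method described in the interview: the records, the function name `synthesise` and the choice of a normal distribution are all assumptions made for the sketch. It also treats height and weight as independent, so it reproduces each column's average and spread but not the relationship between the two.

```python
import random
import statistics


def synthesise(real_rows, n, seed=0):
    """Return n made-up (height_cm, weight_kg) rows.

    Hypothetical helper: it matches the mean and standard deviation of
    each column in real_rows by sampling from a normal distribution.
    Height and weight are sampled independently (a simplifying assumption).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    heights = [h for h, _ in real_rows]
    weights = [w for _, w in real_rows]
    h_mu, h_sd = statistics.mean(heights), statistics.stdev(heights)
    w_mu, w_sd = statistics.mean(weights), statistics.stdev(weights)
    return [(rng.gauss(h_mu, h_sd), rng.gauss(w_mu, w_sd)) for _ in range(n)]


# Invented stand-in for the "sensitive" list of people (not real data).
real = [
    (162.0, 58.5),
    (175.5, 72.0),
    (181.0, 85.2),
    (158.3, 54.1),
    (169.8, 66.7),
    (190.2, 92.4),
]

synthetic = synthesise(real, n=1000)

print("real mean height:     ", round(statistics.mean(h for h, _ in real), 1))
print("synthetic mean height:", round(statistics.mean(h for h, _ in synthetic), 1))
```

None of the synthetic rows corresponds to a person in the original list, yet the two data sets have similar averages. Matching summary statistics does not by itself show that a synthetic data set is safe to share.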
DataIQ is a trading name of IQ Data Group Limited
Copyright © IQ Data Group Limited 2024