New approaches to data science

A team of EPSRC-funded mathematicians, statisticians and computer scientists is driving the development and application of topological data analysis to improve existing data science techniques and solve real-world problems.

Data is everywhere now; huge volumes of high-quality data are being generated every day, and the pace is accelerating. However, because data is often complex and high-dimensional, and may include temporal and spatial information, extracting information from it and interpreting it using standard machine learning or statistical techniques is not always easy, or even possible.

A multidisciplinary team of mathematicians, statisticians and computer scientists is working on a research project funded by the Engineering and Physical Sciences Research Council (EPSRC) that aims to improve the understanding, interpretation and application of data. The project is developing new mathematics and algorithms in order to explore the shape of data – the manner in which data falls into groups – and build on existing data science techniques.

Until recently, this has been difficult to do. But advances in computation and algorithms have enabled topological data analysis – a field of mathematics that uses methods of topology and geometry to study shapes – to grow.
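
As a rough illustration of what studying "the manner in which data falls into groups" can mean in practice, the Python sketch below tracks how points merge into groups as the distance scale grows. This is the simplest case of a standard topological data analysis idea (0-dimensional persistence), not the project's own method. The function names and the toy points are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import dist


def component_merge_scales(points):
    """Return the distances at which groups merge as the scale grows.

    Every point starts as its own group. Two groups merge at the length
    of the shortest edge joining them (tracked with a union-find structure).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    merges = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # join the two groups
            merges.append(length)    # record the scale of this merge
    return merges


def groups_at_scale(points, scale):
    """Count the groups left when points at most `scale` apart are linked."""
    merges = component_merge_scales(points)
    return len(points) - sum(1 for m in merges if m <= scale)


# Hypothetical toy data: two tight clusters that sit far apart.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
merge_scales = component_merge_scales(points)
```

With this toy data, `groups_at_scale(points, 1.0)` reports two groups, and the large gap between the small merge distances and the final one in `merge_scales` is the kind of signal that suggests the grouping reflects real structure rather than noise.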

Heather Harrington, professor of mathematics at the University of Oxford and co-principal investigator of the research project, explained:

“People are generating such great data, but the data is ever more complex and often noisy. We are working with scientists and practitioners in academia and industry to help them make sense of their data and unlock hidden patterns and insights. We are getting into the problems they are grappling with and helping them look at their systems in entirely new ways.”

The researchers, a team of 50 experts across sites in Oxford, Liverpool and Swansea, are working at The Centre for Topological Data Analysis, a new facility that is supported by the EPSRC grant New Approaches to Data Science: Application Driven Topological Data Analysis. The Centre has partnered with leading practitioners in academia and industry and is already researching urgent, topical real-world problems, such as the COVID-19 pandemic.

Harrington said: “We have been involved in a large consortium of researchers and clinicians, looking at molecular data from COVID-19 patients. There are 17 different types of experiments they measure for each patient, including gene expression, protein levels and immune cell populations. There is so much data and we are hoping to answer many questions; for example, can we identify molecular signatures that predict disease severity?”

Biomedical science is one of the main areas of focus for the research team. They are currently using topological data analysis to study immune response, oesophageal cancer, and the growth of blood vessels near tumours, for example.

There is a small but rapidly growing community of people working in topological data analysis in the UK. Harrington hopes the project will enable that community to flourish and establish an international reputation. The project team hosted an international conference soon after the project started and plans to hold a larger one in 2022. Harrington wants to raise the profile of topological data analysis and integrate it with other data science approaches.

Harrington said: “We are extending the toolkit of data analysis and modelling techniques with topological data analysis. We believe these techniques will be enormously useful to the study of heterogeneous data and complicated models, so the impact will reach scientific communities beyond the mathematical sciences.”

Image credit: © AZ Goriely