Mathematical foundations of data science and AI
Data science is the study of theory, methods, algorithms, and applications centred on data. It is a highly interdisciplinary subject that relies on solid mathematical and statistical foundations.
Broadly speaking, data science covers mathematical, statistical and computational methods for learning from data. This includes approaches from machine learning, such as neural networks, reinforcement learning, natural language processing, recommender systems and graph-based models; computational statistics, such as Markov chain Monte Carlo, sequential Monte Carlo and Gaussian processes; and applied mathematics, such as numerical approximation, optimization and high-performance computing. This group sits alongside the interest groups in Statistics, Inverse Problems and Uncertainty Quantification.
Our research projects
The problems of interest within the department span the spectrum from theory to application, and include supervised, semi-supervised, unsupervised and reinforcement learning, as well as general inference and calibration problems. The scope of applications is similarly broad, covering medical imaging, materials science, geosciences, fluid dynamics, chemistry, biology and finance.
Some examples of specific projects and sub-areas include:
- Applying ideas from the calculus of variations and partial differential equations to analyse large data limits in graph-based learning and answer questions such as "how much labeled data does one need?" and "when is the methodology well-posed or ill-posed?".
- Using the dynamical interpretation of neural networks to design and analyse improved architectures.
- Developing the Bayesian approach, both to identify when graph-based models have consistent posteriors and as a tool for uncertainty quantification.
- Multilevel Monte Carlo methods for Bayesian computation when the likelihood itself is intractable. Multi-index versions (more than one distinct axis of approximation) and randomized versions (random index sets) are of particular current interest, as is the multi-fidelity setting (where there is no clear convergence structure).
- Methods for low-regularity and heteroskedastic data.
- Probabilistic machine learning, including the Bayesian approach to machine learning.
- Applications to data-centric science and engineering, and to medical imaging.
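As a small illustration of the dynamical interpretation of neural networks mentioned in the list above, a stack of residual layers x_{k+1} = x_k + h f(x_k) can be read as forward Euler time-stepping for the differential equation dx/dt = f(x). The sketch below is only a toy under stated assumptions and is not taken from any of the group's projects: the function name residual_forward, the weights W and b (shared across all layers), the time horizon T and the tanh activation are all hypothetical choices made for illustration.

```python
import numpy as np


def residual_forward(x0, W, b, depth, T=1.0, activation=np.tanh):
    """Apply `depth` residual layers x <- x + h * activation(W x + b).

    With step size h = T / depth this is exactly forward Euler for
    dx/dt = activation(W x + b) on the interval [0, T].
    Sharing W and b across layers is a simplifying assumption.
    """
    h = T / depth
    x = np.array(x0, dtype=float)
    for _ in range(depth):
        x = x + h * activation(W @ x + b)
    return x


# Hypothetical toy weights: a rotation-like matrix and a small bias.
W = np.array([[0.0, -1.0], [1.0, 0.0]])
b = np.array([0.5, -0.5])
x0 = np.array([1.0, 0.0])

# A very deep network stands in for the continuous-time limit.
reference = residual_forward(x0, W, b, depth=20000)

# Distance of a shallow and a deeper network from that limit.
err_coarse = np.linalg.norm(residual_forward(x0, W, b, depth=10) - reference)
err_fine = np.linalg.norm(residual_forward(x0, W, b, depth=100) - reference)

print(f"depth 10 error:  {err_coarse:.2e}")
print(f"depth 100 error: {err_fine:.2e}")
```

In this toy setting, the deeper network should sit closer to the continuous-time limit than the shallow one, which is the sense in which depth plays the role of time discretization. Viewing an architecture this way suggests borrowing ideas from numerical analysis, such as stability or higher-order schemes, when designing layers.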
More information about our research outputs and research-related activities can be found by browsing the webpages of the staff listed on the right. Potential PhD students may email academic staff directly to discuss possible projects.
Research seminars
Recommended research seminars: