Uncertainty Quantification in the Classification of High Dimensional Data

Andrew Stuart, California Institute of Technology
April 26th, 2017 at 3:30PM–4:30PM in 891 Evans Hall

We provide a unified framework for graph-based semi-supervised learning that brings together a variety of methods introduced in different communities within the mathematical sciences; the unification is through an inverse problems formulation. We study probit classification, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting. We also show that the probit and level-set approaches are natural relaxations of the harmonic function approach introduced in machine learning.
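For readers unfamiliar with the harmonic function approach mentioned above, the following is a minimal sketch and not the speaker's code. It assumes binary labels of +1 and -1 and a symmetric weighted graph stored as nested dictionaries. The function name `harmonic_labels` and the toy path graph are hypothetical, chosen only for illustration. Each unlabeled node is repeatedly set to the weighted average of its neighbors while labeled nodes stay fixed, and the sign of the resulting score gives the predicted class.

```python
def harmonic_labels(weights, labels, sweeps=2000):
    """Gauss-Seidel iteration for a harmonic function classifier.

    weights: dict mapping node -> {neighbor: edge weight} (symmetric graph)
    labels:  dict mapping labeled node -> +1 or -1
    Returns a dict node -> real-valued score; its sign is the predicted class.
    """
    # Start from the known labels and zero on every unlabeled node.
    u = {i: float(labels.get(i, 0.0)) for i in weights}
    for _ in range(sweeps):
        for i, nbrs in weights.items():
            if i in labels:
                continue  # labeled nodes stay clamped to their labels
            degree = sum(nbrs.values())
            # Harmonic condition: the value equals the weighted neighbor average.
            u[i] = sum(w * u[j] for j, w in nbrs.items()) / degree
    return u


# Hypothetical toy graph: a path 0-1-2-3-4 with unit edge weights.
path = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
scores = harmonic_labels(path, {0: +1, 4: -1})
```

On this path graph the scores interpolate linearly between the two labeled endpoints, so the middle node sits exactly on the decision boundary. A hard-constrained solution of this kind gives a single score per node; the probit and level-set formulations in the abstract replace the hard constraint with a likelihood, which is what makes a posterior distribution, and hence uncertainty quantification, available.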

We introduce efficient numerical methods, suited to large datasets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms. Finally, we study continuum limits of the problem formulations and algorithms, arising in the infinite-data limit.
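To make gradient-based MAP estimation concrete, here is a small sketch under stated assumptions, not the method presented in the talk. It assumes a Gaussian prior with precision L + tau^2 I, where L is the unnormalized graph Laplacian, and a probit likelihood with noise scale gamma, so that the MAP estimate minimizes J(u) = (1/2) u^T (L + tau^2 I) u - sum over labeled j of log Psi(y_j u_j / gamma), with Psi the standard normal CDF. The function name `probit_map`, the parameter values, the fixed step size, and the toy graph are all hypothetical. Plain gradient descent is used for simplicity.

```python
import math


def probit_map(weights, labels, gamma=0.5, tau=0.1, step=0.1, iters=2000):
    """Gradient descent on an assumed probit MAP objective on a graph.

    weights: dict mapping node -> {neighbor: edge weight} (symmetric graph)
    labels:  dict mapping labeled node -> +1 or -1
    Returns a dict node -> real-valued score; its sign is the predicted class.
    """
    nodes = list(weights)
    u = {i: 0.0 for i in nodes}
    for _ in range(iters):
        grad = {}
        for i in nodes:
            nbrs = weights[i]
            # Prior term: ((L + tau^2 I) u)_i with L the unnormalized Laplacian.
            lap = sum(nbrs.values()) * u[i] - sum(w * u[j] for j, w in nbrs.items())
            grad[i] = lap + tau ** 2 * u[i]
        for j, y in labels.items():
            # Likelihood term: derivative of -log Psi(y * u_j / gamma).
            z = y * u[j] / gamma
            pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
            grad[j] -= (y / gamma) * pdf / cdf
        for i in nodes:
            u[i] -= step * grad[i]
    return u


# Hypothetical toy graph: a path 0-1-2-3-4 with unit edge weights.
path = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
u_map = probit_map(path, {0: +1, 4: -1})
```

The fixed step size is chosen small enough for this toy problem; a large graph would call for sparse matrix operations and a line search or a second-order method. The ratio `pdf / cdf` can also lose accuracy when `z` is very negative, so a production implementation would evaluate it in a numerically stable way.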

Collaboration with AL Bertozzi, X Luo (UCLA), and KC Zygalakis (Edinburgh).