Research

Simultaneous outlier-exclusion and distributionally robust learning through partial optimal transport

10/2025

Teach robust models to separate truly misleading data from unusual but useful surprises.

Introduction:

Real industrial data often has two problems at the same time: a few samples may be genuinely misleading, and even the remaining samples may not perfectly represent future conditions. This paper brings those two issues into one learning framework. Partial optimal transport is used to choose the subset of data that the model should trust most, while distributionally robust optimization prepares the model for uncertainty that still remains after this filtering step. The result is a training method that does not simply panic when it sees strange data, but also does not blindly trust every point. The framework is developed for both regression and classification, and is tested on synthetic examples and chemical process datasets.
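To make the two-part idea concrete, here is a minimal sketch of how "choose what to trust" and "stay robust to what remains" can be alternated in a regression setting. It is not the paper's algorithm. Simple residual-based trimming stands in for the partial optimal transport selection step, and a ridge penalty stands in for the distributionally robust objective. The function name `trimmed_ridge` and the parameters `keep_frac`, `reg`, and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def trimmed_ridge(X, y, keep_frac=0.9, reg=0.1, n_iter=20):
    """Illustrative stand-in only, not the method from the paper.

    Alternates between:
      1. fitting a ridge regression on the currently trusted samples
         (the penalty is a crude surrogate for a robust objective), and
      2. re-selecting the keep_frac share of samples with the smallest
         absolute residuals (a crude surrogate for partial-transport
         sample selection).
    """
    n, d = X.shape
    k = int(np.ceil(keep_frac * n))   # number of samples to trust
    trusted = np.arange(n)            # start by trusting every sample
    w = np.zeros(d)
    for _ in range(n_iter):
        Xs, ys = X[trusted], y[trusted]
        # Closed-form ridge solution on the trusted subset.
        w = np.linalg.solve(Xs.T @ Xs + reg * np.eye(d), Xs.T @ ys)
        # Keep the k samples that the current model fits best.
        residuals = np.abs(X @ w - y)
        new_trusted = np.sort(np.argsort(residuals)[:k])
        if np.array_equal(new_trusted, trusted):
            break                     # selection has stabilized
        trusted = new_trusted
    return w, trusted
```

Calling `trimmed_ridge(X, y, keep_frac=0.9)` returns the fitted weights and the indices of the samples the model ended up trusting. In the framework described above, both steps are handled jointly inside one optimization problem rather than by this kind of alternating heuristic.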

Zhongyu Zhang