Research
Simultaneous outlier-exclusion and distributionally robust learning through partial optimal transport
10/2025
Teach robust models to separate truly misleading data from unusual but useful surprises.
Paper URL: https://www.sciencedirect.com/science/article/pii/S0098135425004119
Introduction:
Real industrial data often has two problems at the same time: a few samples may be genuinely misleading, and even the remaining samples may not perfectly represent future conditions. This paper brings those two issues into one learning framework. Partial optimal transport selects the subset of data that the model should trust most, while distributionally robust optimization prepares the model for the uncertainty that remains after this filtering step. The result is a training method that neither overreacts to unusual data nor blindly trusts every point. The framework is developed for both regression and classification, and is tested on synthetic examples and chemical process datasets.
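To make the "filter, then hedge" idea concrete, here is a minimal sketch in Python. It is a simplified stand-in and not the paper's algorithm. Two assumptions are made purely for illustration. First, the partial-transport selection is approximated by keeping the fraction of samples with the smallest current loss. Second, the distributionally robust step is approximated by a ridge (squared-norm) penalty, a common surrogate for robustness in linear models. The names `trimmed_robust_fit`, `keep_frac`, and `eps` are hypothetical.

```python
import numpy as np


def trimmed_robust_fit(X, y, keep_frac=0.9, eps=0.01, n_iter=20):
    """Alternate between (a) fitting a penalized linear model on the trusted
    subset and (b) re-selecting the trusted subset as the lowest-loss samples.

    Illustrative assumptions (not taken from the paper):
    - keep_frac plays the role of the transported mass fraction;
    - eps plays the role of a robustness radius, applied as a ridge penalty.
    """
    n, d = X.shape
    keep = int(np.ceil(keep_frac * n))
    idx = np.arange(n)  # start by trusting every sample
    w = np.zeros(d)
    for _ in range(n_iter):
        Xs, ys = X[idx], y[idx]
        # (a) ridge-regularized least squares on the currently trusted subset
        A = Xs.T @ Xs + eps * len(idx) * np.eye(d)
        w = np.linalg.solve(A, Xs.T @ ys)
        # (b) keep the samples the current model explains best
        losses = (X @ w - y) ** 2
        new_idx = np.sort(np.argsort(losses)[:keep])
        if np.array_equal(new_idx, idx):
            break  # the trusted subset has stopped changing
        idx = new_idx
    return w, idx


if __name__ == "__main__":
    # Toy data: y = 2x, with five grossly corrupted samples.
    x = np.linspace(-1.0, 1.0, 50)
    y = 2.0 * x
    y[20:25] += 50.0
    w, trusted = trimmed_robust_fit(x[:, None], y)
    print("estimated slope:", w[0])
    print("excluded samples:", sorted(set(range(50)) - set(trusted.tolist())))
```

In this toy setup the corrupted samples have by far the largest residuals, so they drop out of the trusted subset, while the penalty keeps the fit conservative on the data that remain. The paper couples the two steps in a single optimization problem; the alternating loop above only illustrates how they interact.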