Bilevel problems are optimization problems characterized by a hierarchical structure: one minimizes an outer function under the constraint that some variables minimize an inner function. These problems are gaining popularity in the machine learning community due to their wide range of applications, such as hyperparameter optimization. In this thesis, we explore approximate implicit-differentiation-based algorithms for bilevel problems in which both the outer and inner functions are empirical means over potentially large sample sets; this empirical risk minimization framework is classical in many machine learning tasks. First, we introduce a general algorithmic framework that adapts any first-order stochastic solver, initially designed for single-level problems, to bilevel problems. We provide and analyze two instantiations of this framework: adaptations of stochastic gradient descent and of the SAGA algorithm to bilevel problems. Our analysis demonstrates that these algorithms have complexities comparable to those of their single-level counterparts. Second, we study the complexity of bilevel optimization in the nonconvex/strongly convex setting. We propose a class of algorithms that includes several stochastic approximate implicit-differentiation-based methods and establish a lower bound on the number of oracle calls required to reach an approximate stationary point. We then propose an algorithm whose complexity matches this lower bound. The performance of the proposed methods is evaluated numerically in a benchmark comparing them to other bilevel algorithms on quadratic functions, hyperparameter selection problems, and a data cleaning task.
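For concreteness, the hierarchical structure described above can be sketched in standard notation; the symbols ($F$, $G$, $f_i$, $g_j$, $x$, $y$, $n$, $m$) are illustrative and not fixed by the abstract:

```latex
\min_{x \in \mathbb{R}^d} \; F\bigl(x, y^*(x)\bigr)
\quad \text{subject to} \quad
y^*(x) \in \operatorname*{arg\,min}_{y \in \mathbb{R}^p} G(x, y),
```

where, in the empirical risk minimization setting considered here, both levels are empirical means over sample sets:

```latex
F(x, y) = \frac{1}{n} \sum_{i=1}^{n} f_i(x, y),
\qquad
G(x, y) = \frac{1}{m} \sum_{j=1}^{m} g_j(x, y).
```

In hyperparameter optimization, for instance, $x$ would play the role of the hyperparameters, $y$ the model parameters, $G$ the training loss, and $F$ the validation loss.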