PhD Manuscript

2025-05-16

Authors:   Alexandre Louvet

Supervised by Victor-Emmanuel Brunel and Nabil Mustafa

Abstract

High-dimensional data are common in fields like genomics, image processing, and social network analysis. They present significant challenges in computational and statistical analysis. Datasets of larger and larger size in these different fields increases the need for efficient data approximation tools. Even if the dimension of data might be large, their intrinsic dimension, that is, the minimal dimension of the information they bear might be small. This is represented through concepts such as VC-dimension that we discuss in this work. We focus our work on finding efficient algorithm for large datasets of fixed intrinsic dimension.

In this thesis, we present algorithms to compute some fundamental structures of combinatorial data reduction: \(\varepsilon\)-approximations, \(\delta\)-coverings, low-discrepancy colorings and low-crossing partitions. In particular we study the following three problems.

In the first part, we present a new two-player game and show the existence of an almost optimal strategy using the celebrated Lovett-Meka discrepancy algorithm. We then present a multiplicative weights algorithms to compute a family of small average discrepancy colorings that uses our game's value in its analysis.

In the second part, we present a new low-crossing partition algorithm for general set systems. This is the first instance of a practically fast algorithm to compute low-crossing partitions for general set systems, as previous results are limited to set systems spanned by halfspaces in low dimension.

Finally, we present new algorithms to compute near-minimal \(\delta\)-coverings of finite VC-dimension set systems. These algorithms are the first non-trivial algorithms to obtain \(\delta\)-coverings of minimal size. We present applications of it to low-discrepancy coloring and \(\epsilon\)-approximation, deriving faster algorithms matching the optimal discrepancy of finite VC-dimension set systems: \(O(n^{1 / 2-1 / (2d)})\).

Manuscript

Based on the following publications

Other publications

Committee

NameAffiliationRole
Frédérique BassinoUniversité Sorbonne Paris NordExaminatrice
Victor-Emmanuel BrunelÉcole Nationale de la Statistique et de l'Administration EconomiqueDirecteur
Sergio CabelloUniversity of LjubljanaRapporteur
Mónika CsikósUniversité Paris CitéExaminatrice
Yan GérardUniversité Clermont AuvergneExaminateur
Guillaume LecuéÉcole Supérieure des Sciences Economiques et CommercialesExaminateur
Nabil MustafaUniversité Sorbonne Paris NordDirecteur
János PachRényi Institute of MathematicsExaminateur
Jeff M. PhillipsUniversity of UtahRapporteur