Publications
Talks
Teaching
Software
Misc
2025-05-16
Supervised by Victor-Emmanuel Brunel and Nabil Mustafa
High-dimensional data are common in fields like genomics, image processing, and social network analysis. They present significant challenges in computational and statistical analysis. Datasets of larger and larger size in these different fields increases the need for efficient data approximation tools. Even if the dimension of data might be large, their intrinsic dimension, that is, the minimal dimension of the information they bear might be small. This is represented through concepts such as VC-dimension that we discuss in this work. We focus our work on finding efficient algorithm for large datasets of fixed intrinsic dimension.
In this thesis, we present algorithms to compute some fundamental structures of combinatorial data reduction: \(\varepsilon\)-approximations, \(\delta\)-coverings, low-discrepancy colorings and low-crossing partitions. In particular we study the following three problems.
In the first part, we present a new two-player game and show the existence of an almost optimal strategy using the celebrated Lovett-Meka discrepancy algorithm. We then present a multiplicative weights algorithms to compute a family of small average discrepancy colorings that uses our game's value in its analysis.
In the second part, we present a new low-crossing partition algorithm for general set systems. This is the first instance of a practically fast algorithm to compute low-crossing partitions for general set systems, as previous results are limited to set systems spanned by halfspaces in low dimension.
Finally, we present new algorithms to compute near-minimal \(\delta\)-coverings of finite VC-dimension set systems. These algorithms are the first non-trivial algorithms to obtain \(\delta\)-coverings of minimal size. We present applications of it to low-discrepancy coloring and \(\epsilon\)-approximation, deriving faster algorithms matching the optimal discrepancy of finite VC-dimension set systems: \(O(n^{1 / 2-1 / (2d)})\).
A New Discrepancy Game with Victor-Emmanuel Brunel and Nabil Mustafa. https://hal.science/hal-05064760 [pdf]
A Greedy Algorithm for Low-Crossing Partitions for General Set Systems with Mónika Csikós and Nabil Mustafa. In 2025 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX) (pp. 209-220). Society for Industrial and Applied Mathematics. [pdf]
Name | Affiliation | Role |
---|---|---|
Frédérique Bassino | Université Sorbonne Paris Nord | Examinatrice |
Victor-Emmanuel Brunel | École Nationale de la Statistique et de l'Administration Economique | Directeur |
Sergio Cabello | University of Ljubljana | Rapporteur |
Mónika Csikós | Université Paris Cité | Examinatrice |
Yan Gérard | Université Clermont Auvergne | Examinateur |
Guillaume Lecué | École Supérieure des Sciences Economiques et Commerciales | Examinateur |
Nabil Mustafa | Université Sorbonne Paris Nord | Directeur |
János Pach | Rényi Institute of Mathematics | Examinateur |
Jeff M. Phillips | University of Utah | Rapporteur |