Publications
2026
Bandit Optimal Transport
Lorenzo Croissant
Feb 2026
Linear bandits have long been a central topic in online learning, with applications ranging from recommendation systems to adaptive clinical trials. Their general learnability has been established when the objective is to minimise the inner product between a cost parameter and the decision variable. While this is highly general, this reliance on an inner product structure belies the name of linear bandits, and fails to account for problems such as Optimal Transport. Using the Kantorovich formulation of Optimal Transport as an example, we show that an inner product structure is not necessary to achieve efficient learning in linear bandits. We propose a refinement of the classical OFUL algorithm that operates by embedding the action set into a Hilbertian subspace, where confidence sets can be built via least-squares estimation. Actions are then constrained to this subspace by penalising optimism. The analysis is completed by leveraging convergence results from penalised (entropic) transport to the Kantorovich problem. Up to this approximation term, the resulting algorithm achieves the same trajectorial regret upper bounds as the OFUL algorithm, which we turn into worst-case regret using functional regression techniques. Its regret interpolates between \(\tilde{O}(\sqrt{T})\) and \(O(T)\), depending on the regularity of the cost function, and recovers the parametric rate \(\tilde{O}(\sqrt{dT})\) in finite-dimensional settings.
@misc{Croissant_BOT_25, title = {Bandit {Optimal} {Transport}}, author = {Croissant, Lorenzo}, month = feb, year = {2026}, keywords = {Bandit Algorithms, Optimal transport}, }
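The abstract builds on the classical OFUL algorithm, whose core ingredient is an ellipsoidal confidence set around a regularised least-squares estimate. A minimal sketch of that ingredient, assuming a reward-maximisation convention and a finite set of candidate actions (illustrative simplifications, not the paper's Hilbertian setting):

```python
import numpy as np

def oful_step(A, b, actions, beta):
    """One optimistic action choice from ridge-regression statistics.

    A       : d x d regularised Gram matrix, lam*I + sum_s x_s x_s^T
    b       : d-vector, sum_s r_s x_s
    actions : candidate arms, shape (n, d)
    beta    : radius of the confidence ellipsoid
    Returns the index of the optimistic arm and the ridge estimate.
    """
    theta_hat = np.linalg.solve(A, b)      # regularised least-squares estimate
    A_inv = np.linalg.inv(A)
    # Optimistic (UCB) index: <theta_hat, x> + beta * ||x||_{A^{-1}}
    widths = np.sqrt(np.einsum("nd,de,ne->n", actions, A_inv, actions))
    return int(np.argmax(actions @ theta_hat + beta * widths)), theta_hat
```

After playing the chosen arm x and observing reward r, the statistics update in rank-one fashion: A += x x^T and b += r x.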
FeatHawkes: Scalable Feature-Based Attribution For Temporal Event Data
Remi Verronneau, Lorenzo Croissant, Bartholome Vieille, and Vianney Perchet
Feb 2026
Attribution, the problem of assigning proportional responsibility for an outcome to each event in a temporal sequence, is central to diverse applications ranging from marketing and seismology to sports analytics. While incorporating exogenous features substantially enhances the expressiveness of attribution models, existing approaches lack the scalability required to integrate modern machine learning. We introduce FeatHawkes, a feature-augmented Hawkes process framework for event-level attribution in continuous time. Our core contribution is a novel first-order optimization routine for Hawkes processes that leverages stochastic gradient methods, scaling favorably with both dataset size and feature dimensionality. This gradient-based formulation enables compatibility with automatic differentiation and end-to-end machine learning pipelines. We release FeatHawkes as an open-source Python library and demonstrate its effectiveness through synthetic experiments and a case study on professional football data, where the framework supports scenario-based analyses, such as evaluating the modelled effect of player substitutions in a lineup.
@unpublished{verronneau25, title = {FeatHawkes: Scalable Feature-Based Attribution For Temporal Event Data}, author = {Verronneau, Remi and Croissant, Lorenzo and Vieille, Bartholome and Perchet, Vianney}, url = {https://hal.science/hal-05284437}, note = {working paper or preprint}, year = {2026}, month = feb, hal_id = {hal-05284437}, hal_version = {v2}, }
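FeatHawkes augments Hawkes processes with exogenous features and fits them by stochastic gradients; as background for what "event-level attribution" means, here is a sketch for a plain univariate Hawkes process with exponential kernel. The parametrisation \(\mu,\alpha,\beta\) is the textbook one, not taken from the library's API:

```python
import numpy as np

def hawkes_attribution(times, mu, alpha, beta):
    """Event-level attribution for a univariate exponential-kernel Hawkes
    process with intensity
        lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)).

    Returns an (n, n+1) matrix P where P[j, i] (i < j) is the probability
    that event j was triggered by event i, and P[j, n] the probability
    that it came from the background rate mu.
    """
    t = np.asarray(times, dtype=float)
    n = len(t)
    P = np.zeros((n, n + 1))
    for j in range(n):
        kernel = alpha * beta * np.exp(-beta * (t[j] - t[:j]))  # earlier events only
        lam = mu + kernel.sum()                                 # intensity at t_j
        P[j, :j] = kernel / lam                                 # triggered shares
        P[j, n] = mu / lam                                      # background share
    return P
```

Each row of P sums to one, so the matrix distributes the full responsibility for each event across its possible causes.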
2025
Diffusive limit approximation of pure jump optimal ergodic control problems
Marc Abeille, Bruno Bouchard, and Lorenzo Croissant
Stochastic Processes and their Applications, Mar 2025
Motivated by the design of fast reinforcement learning algorithms, see (Croissant et al., 2024), we study the diffusive limit of a class of pure jump ergodic stochastic control problems. We show that, whenever the intensity of jumps \(\varepsilon\) is large enough, the approximation error is governed by the Hölder regularity of the Hessian matrix of the solution to the limit ergodic partial differential equation and is, indeed, of order \(\varepsilon^{-\gamma/2}\) for all \(\gamma\in(0,1)\). This extends to this context the results of Abeille et al. (2023) obtained for finite horizon problems. Using the limit as an approximation, instead of directly solving the pre-limit problem, allows for a very significant reduction in the numerical resolution cost of the control problem. Additionally, we explain how error correction terms of this approximation can be constructed under appropriate smoothness assumptions. Finally, we quantify the error induced by the use of the Markov control policy constructed from the numerical finite difference scheme associated to the limit diffusive problem, which seems to be new in the literature and of independent interest.
@article{ABC25, title = {Diffusive limit approximation of pure jump optimal ergodic control problems}, volume = {181}, copyright = {All rights reserved}, issn = {0304-4149}, doi = {10.1016/j.spa.2024.104536}, language = {en}, urldate = {2024-12-05}, journal = {Stochastic Processes and their Applications}, author = {Abeille, Marc and Bouchard, Bruno and Croissant, Lorenzo}, month = mar, year = {2025}, pages = {104536}, }
Pareto-Optimality, Smoothness, and Stochasticity in Learning-Augmented One-Max-Search
Ziyad Benomar, Lorenzo Croissant, Vianney Perchet, and Spyros Angelopoulos
In Proceedings of the 42nd International Conference on Machine Learning, 13–19 Jul 2025
One-max search is a classic problem in online decision-making, in which a trader acts on a sequence of revealed prices and accepts one of them irrevocably to maximise its profit. The problem has been studied both in probabilistic and in worst-case settings, notably through competitive analysis, and more recently in learning-augmented settings in which the trader has access to a prediction on the sequence. However, existing approaches either lack smoothness, or do not achieve optimal worst-case guarantees: they do not attain the best possible trade-off between the consistency and the robustness of the algorithm. We close this gap by presenting the first algorithm that simultaneously achieves both of these important objectives. Furthermore, we show how to leverage the obtained smoothness to provide an analysis of one-max search in stochastic learning-augmented settings which capture randomness in both the observed prices and the prediction.
@inproceedings{Benomar25, title = {Pareto-Optimality, Smoothness, and Stochasticity in Learning-Augmented One-Max-Search}, author = {Benomar, Ziyad and Croissant, Lorenzo and Perchet, Vianney and Angelopoulos, Spyros}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {3776--3805}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, }
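The paper's Pareto-optimal algorithm refines the classical threshold rule for one-max search; for context, that competitive-analysis baseline (assuming prices lie in a known range \([m, M]\)) is simply:

```python
import math

def one_max_search(prices, m, M):
    """Classical threshold rule for one-max search with prices in [m, M].

    Accept the first price >= sqrt(m*M); if none appears, the trader is
    forced to take the last price. This guarantees a competitive ratio of
    sqrt(M/m) against the best price in the sequence.
    """
    threshold = math.sqrt(m * M)
    for p in prices[:-1]:
        if p >= threshold:
            return p          # accept irrevocably
    return prices[-1]         # forced acceptance at the end
```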
2024
Near-continuous time Reinforcement Learning for continuous state-action spaces
Lorenzo Croissant, Marc Abeille, and Bruno Bouchard
In Proceedings of the 35th International Conference on Algorithmic Learning Theory, 25–28 Feb 2024
We consider the reinforcement learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for systems in which interactions occur at a high frequency, if not in continuous time, or those whose state spaces are large if not inherently continuous. Perhaps the only exception is the linear quadratic framework, for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency \(\varepsilon^{-1}\), which captures arbitrary time scales from discrete (\(\varepsilon=1\)) to continuous time (\(\varepsilon\downarrow0\)). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on \(\mathbb{R}^d\). We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning by extending the eluder dimension framework and propose an approximate planning method based on a diffusive limit (\(\varepsilon\downarrow0\)) approximation of the jump process. Overall, our algorithm enjoys a regret of order \(\tilde{O}(\sqrt{T})\), or \(\tilde{O}(\varepsilon^{1/2} T+\sqrt{T})\) with the approximate planning. As the frequency of interactions blows up, the approximation error \(\varepsilon^{1/2} T\) vanishes, showing that \(\tilde{O}(\sqrt{T})\) is attainable in near-continuous time.
@inproceedings{CAB23, author = {Croissant, Lorenzo and Abeille, Marc and Bouchard, Bruno}, title = {Near-continuous time {Reinforcement} {Learning} for continuous state-action spaces}, pages = {444--498}, booktitle = {Proceedings of the {35th} International Conference on Algorithmic Learning Theory}, year = {2024}, series = {Proceedings of Machine Learning Research}, month = {25--28 Feb}, publisher = {PMLR}, }
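The Poisson-clock model above can be illustrated with a toy simulation: interactions arrive at rate \(\varepsilon^{-1}\), and if each jump is scaled as \(b(x)\varepsilon + \sigma\sqrt{\varepsilon}\,\xi\), the process approaches the diffusion \(\mathrm{d}X = b(X)\,\mathrm{d}t + \sigma\,\mathrm{d}W\) as \(\varepsilon\downarrow0\). The drift \(b(x)=-x\) and all parameter values below are illustrative choices, not taken from the paper:

```python
import numpy as np

def simulate_jump(x0, T, eps, b, sigma, rng):
    """Simulate a pure-jump process on [0, T]: jump times follow a Poisson
    clock of frequency 1/eps, and each jump displaces the state by
    b(x)*eps + sigma*sqrt(eps)*xi with xi ~ N(0, 1)."""
    x, t = x0, 0.0
    while True:
        t += rng.exponential(eps)   # next interaction time (mean eps)
        if t > T:
            return x                # terminal state X_T
        x += b(x) * eps + sigma * np.sqrt(eps) * rng.normal()
```

For the mean-reverting drift b(x) = -x, the limiting diffusion is an Ornstein-Uhlenbeck process with \(\mathbb{E}[X_T] = x_0 e^{-T}\), and averaging simulated endpoints recovers this value even at moderate \(\varepsilon\).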
2023
Diffusion limit control and reinforcement learning, with applications to online auctions
Lorenzo Croissant
PhD thesis, Université Paris-Dauphine, Université PSL, 2023
We consider the diffusive limit of a generic pure-jump Markov control problem as the intensity of the driving Poisson process tends to infinity. We quantify the convergence speed in terms of the Hölder exponent of the Hessian of the limit problem. This approximation provides an efficient alternative method for the numerical resolution of these problems when the intensity of jumps is large. Considering the control of unknown systems, we extend this approach to the context of online Reinforcement Learning. Under the OFU paradigm, we leverage the eluder dimension framework for learning and the diffusive limit for approximate resolution of the planning sub-problem. Our algorithm extends existing theory from discrete processes to continuous states and actions. Our study of diffusion limit systems is motivated and illustrated by the bidding problem in a high-frequency online auction against a revenue-maximising seller.
@phdthesis{thesis, author = {Croissant, Lorenzo}, school = {Université Paris-Dauphine, Université PSL}, title = {Diffusion limit control and reinforcement learning, with applications to online auctions}, year = {2023}, url = {https://theses.hal.science/tel-04356751/}, }
Reinforcement Learning in near-continuous time for continuous state-action spaces
Lorenzo Croissant, Marc Abeille, and Bruno Bouchard
In Sixteenth European Workshop on Reinforcement Learning, Sep 2023
We consider the reinforcement learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for systems in which interactions occur at a high frequency, if not in continuous time, or those whose state spaces are large if not inherently continuous. Perhaps the only exception is the linear quadratic framework, for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency \(\varepsilon^{-1}\), which captures arbitrary time scales from discrete (\(\varepsilon=1\)) to continuous time (\(\varepsilon\downarrow0\)). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on \(\mathbb{R}^d\). We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning by extending the eluder dimension framework and propose an approximate planning method based on a diffusive limit (\(\varepsilon\downarrow0\)) approximation of the jump process. Overall, our algorithm enjoys a regret of order \(\tilde{O}(\sqrt{T})\), or \(\tilde{O}(\varepsilon^{1/2} T+\sqrt{T})\) with the approximate planning. As the frequency of interactions blows up, the approximation error \(\varepsilon^{1/2} T\) vanishes, showing that \(\tilde{O}(\sqrt{T})\) is attainable in near-continuous time.
@inproceedings{CAB23EWRL, title = {Reinforcement Learning in near-continuous time for continuous state-action spaces}, author = {Croissant, Lorenzo and Abeille, Marc and Bouchard, Bruno}, booktitle = {Sixteenth European Workshop on Reinforcement Learning}, year = {2023}, month = sep, url = {https://openreview.net/forum?id=bqiRIs3fJNk}, }
2022
Diffusive limit approximation of pure-jump optimal stochastic control problems
Marc Abeille, Bruno Bouchard, and Lorenzo Croissant
Journal of Optimization Theory and Applications, 2022
We consider the diffusive limit of a typical pure-jump Markovian control problem as the intensity of the driving Poisson process tends to infinity. We show that the convergence speed is provided by the Hölder constant of the Hessian of the limit problem, and explain how correction terms can be constructed. This provides an alternative efficient method for the numerical approximation of the optimal control of a pure-jump problem in situations with a very high intensity of jumps. We illustrate this approach in the context of a display advertising auction problem.
@article{ABC21, title = {Diffusive limit approximation of pure-jump optimal stochastic control problems}, author = {Abeille, Marc and Bouchard, Bruno and Croissant, Lorenzo}, journal = {Journal of Optimization Theory and Applications}, pages = {1--30}, year = {2022}, publisher = {Springer}, doi = {10.1007/s10957-022-02135-7}, url = {https://link.springer.com/article/10.1007/s10957-022-02135-7}, keywords = {Optimization and Control (math.OC), FOS: Mathematics}, }
2020
Real-Time Optimisation for Online Learning in Auctions
Lorenzo Croissant, Marc Abeille, and Clement Calauzenes
In Proceedings of the 37th International Conference on Machine Learning, 13–18 Jul 2020
In display advertising, a small group of sellers and bidders face each other in up to \(10^{12}\) auctions a day. In this context, revenue maximisation via monopoly price learning is a high-value problem for sellers. By nature, these auctions are online and produce a very high-frequency stream of data. This results in a computational strain that requires algorithms to be real-time. Unfortunately, existing methods inherited from the batch setting suffer \(O(\sqrt{t})\) time/memory complexity at each update, prohibiting their use. In this paper, we provide the first algorithm for online learning of monopoly prices in online auctions whose update is constant in time and memory.
@inproceedings{CAC20, title = {Real-Time Optimisation for Online Learning in Auctions}, author = {Croissant, Lorenzo and Abeille, Marc and Calauzenes, Clement}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {2217--2226}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, }
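The flavour of a constant-time update can be illustrated under a strong, purely illustrative parametric assumption that the paper does not make: bids are i.i.d. Exponential. Then the monopoly price maximising \(p\,\mathbb{P}(\text{bid}\ge p) = p e^{-\lambda p}\) is \(1/\lambda\), i.e. the mean bid, which a running mean tracks in O(1) time and memory per auction:

```python
class RunningMonopolyPrice:
    """O(1)-per-auction monopoly price tracker under an illustrative
    assumption of i.i.d. Exponential(rate) bids, for which the revenue
    p * P(bid >= p) = p * exp(-rate * p) is maximised at p = 1/rate,
    the mean bid. The paper's algorithm handles general distributions."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, bid):
        # Constant time and memory: only a count and a running sum.
        self.n += 1
        self.total += bid
        return self.price()

    def price(self):
        return self.total / self.n if self.n else 0.0
```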