Enno Mammen (Heidelberg University)
Strong Approximations for Robbins-Monro Procedures
The Robbins-Monro algorithm is a recursive, simulation-based stochastic procedure to approximate the zeros of a function that can be written as an expectation. It is known that under some technical assumptions, Gaussian limit theorems approximate the stochastic performance of the algorithm. Here, we are interested in strong approximations for Robbins-Monro procedures. The main tools for obtaining them are local limit theorems, that is, results on the convergence of the density of the algorithm. The analysis relies on a version of parametrix techniques for Markov chains converging to diffusions. The main difficulty that arises here is the fact that the drift is unbounded. The talk is based on joint work with Valentin Konakov, Moscow, and Lorick Huang, Toulouse.
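For readers unfamiliar with the recursion mentioned in this abstract, here is a minimal sketch of a Robbins-Monro iteration. It is only an illustration and not the setting analysed in the talk: the step sizes c/n, the toy function h(theta) = theta - E[X], and the names `robbins_monro`, `noisy_h` and `demo` are all assumptions made for this example.

```python
import random


def robbins_monro(noisy_h, theta0, n_steps, c=1.0):
    """Iterate theta_{n} = theta_{n-1} - (c / n) * H_n, where H_n is a noisy
    evaluation of h at the current iterate (step sizes c / n are an assumption)."""
    theta = theta0
    for n in range(1, n_steps + 1):
        theta -= (c / n) * noisy_h(theta)
    return theta


def demo(seed=0, n_steps=10_000, target=2.0):
    """Toy problem: h(theta) = theta - E[X] with X ~ N(target, 1), so the zero is E[X]."""
    rng = random.Random(seed)
    draws = []

    def noisy_h(theta):
        x = rng.gauss(target, 1.0)  # one fresh simulated observation per step
        draws.append(x)
        return theta - x            # unbiased noisy evaluation of h(theta)

    theta = robbins_monro(noisy_h, theta0=0.0, n_steps=n_steps)
    return theta, sum(draws) / len(draws)
```

In this toy case with c = 1, the iterate coincides with the running sample mean of the simulated draws, which makes the recursion easy to check by hand.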
Claudia Strauch (Heidelberg University)
Statistical Guarantees for Denoising Reflected Diffusion Models
Diffusion-based generative models offer a highly flexible approach to modelling and generating complex data distributions, yet their theoretical properties on bounded domains remain only partially understood. We study a class of denoising reflected diffusion models on bounded domains, which address the practical limitations of conventional designs operating in unbounded state spaces. A primary mathematical challenge in this setting is the absence of Gaussian transition kernels. To overcome this, we employ infinite series expansions and spectral methods, combined with a rigorous analysis of sparse neural networks, to approximate the score function and control the resulting approximation error. For target distributions under Sobolev smoothness assumptions, we establish near-minimax optimal convergence rates in total variation and Wasserstein distances, demonstrating full adaptivity to the intrinsic dimension of the underlying subspace. These results confirm that incorporating reflecting boundaries preserves the statistical efficiency of the underlying diffusion processes, matching the convergence behaviour known for unconstrained settings.
Joint work with Asbjørn Holk Thomsen and Lukas Trottner.
Shixuan Wang (University of Reading)
Multiscale Change Point Detection for Functional Time Series
We study the problem of detecting and localizing multiple changes in the mean parameter of a Banach space–valued time series. The goal is to construct a collection of narrow confidence intervals, each containing at least one (or exactly one) change, with globally controlled error probability. Our approach relies on a new class of weighted scan statistics, called Hölder-type statistics, which allow a smooth trade-off between efficiency (enabling the detection of closely spaced, small changes) and robustness (against heavier tails and stronger dependence). For Gaussian noise, maximum weighting can be applied, leading to a generalization of optimality results known for scalar, independent data. Even for scalar time series, our approach is advantageous, as it accommodates broad classes of dependency structures and non-stationarity. Its primary advantage, however, lies in its applicability to functional time series, where few methods exist and established procedures impose strong restrictions on the spacing and magnitude of changes. We obtain general results by employing new Gaussian approximations for the partial sum process in Hölder spaces. As an application of our general theory, we consider the detection of distributional changes in a data panel. The finite-sample properties and applications to real-world datasets further highlight the merits of our method.
Joint work with Tim Kutta and Holger Dette.
Martin Bladt (University of Copenhagen)
Recent Nonparametric Advances in Jump Process Estimation
This work addresses recent advances in nonparametric inference for finite-state jump processes in both Markov and non-Markov settings. We provide flexible methods to estimate state occupation probabilities and transition mechanisms under minimal assumptions. By focusing on conditioning, we show how internal and external information can sharpen individualized and subgroup-specific predictions. To handle high-dimensional data, we introduce adaptive tree- and forest-based learning strategies. Finally, we discuss procedures for transition-rate estimation.
Axel Bücher (Ruhr University Bochum)
Latent linear factor models for tail dependence in high dimensions
A common object to describe the extremal dependence of a d-variate random vector X is the stable tail dependence function L. Various parametric models have emerged, with a popular sub-class consisting of those stable tail dependence functions that arise for linear and max-linear factor models with heavy-tailed factors. We study such models under the assumption that the factors are possibly dependent, which results in a model for L that depends on a (d × K) loading matrix A (with K << d the number of factors) and the lower-dimensional spectral measure of the K factors. We suggest algorithms to estimate K and A under an additional assumption on A called the ‘pure variable assumption’. The results are illustrated with numerical experiments and a case study.
Joint work with Alexis Boulin.
Martina Scolamiero (KTH Stockholm)
p-norms, matchings and functional summaries in persistence
Topological Data Analysis, and in particular the method of persistent homology, makes it possible to summarise higher-order patterns in metric spaces and networks. In this talk I will review some functional summaries of persistence, present their stability properties, and discuss their computation and the opportunities they offer for data analysis. On the way I will also highlight the importance of comparing spaces of data and how algebraic comparisons can be approximated by combinatorial matchings while minimizing appropriate p-norm costs.
Alexei Onatski (University of Cambridge)
Extreme singular values of random rectangular Toeplitz matrices
We study extreme singular values of large rectangular random Toeplitz and circulant matrices with independent entries. For Toeplitz matrices, the largest singular value converges to the $2 \to 2$ norm of a bilinear operator built from two scaled sine-kernel operators. For rectangular circulant matrices, it converges to 1. The smallest singular value for both Toeplitz and circulant rectangular matrices converges to zero independently of the aspect ratio. We establish a lower bound on the rate of this convergence, showing that it is faster than any polylogarithmic rate yet slower than any polynomial rate.
Johannes Heiny (KTH Royal Institute of Technology)
Phase transitions for high-dimensional U-statistics and their applications
The asymptotic behavior of U-statistics based on n i.i.d. observations is well understood when the data dimension d is fixed. In this classical setting, nondegenerate U-statistics converge to a Gaussian limit, while degenerate ones converge to a weighted sum of chi-squared variables. However, in many contemporary applications, the data dimension d is comparable to or even larger than the sample size n, necessitating a re-examination of these asymptotic results.
In this talk, we explore the setting where d is allowed to scale with the sample size. Recent works have shown that new limit distributions can emerge in such a high-dimensional setting. We provide some new examples and study phase transitions depending on d and the tail of the data distribution. As an application, we study the linear spectral statistics of sample correlation matrices.
Joint work with Xuechun Hu.
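As a reminder of the classical fixed-dimension object this abstract starts from, the following minimal sketch computes an order-2 U-statistic by averaging a symmetric kernel over all pairs of observations. The helper names `u_statistic` and `variance_kernel` are hypothetical and chosen for this example; the kernel h(x, y) = (x - y)^2 / 2 is the textbook choice that reproduces the unbiased sample variance, not one of the high-dimensional examples from the talk.

```python
from itertools import combinations


def u_statistic(sample, kernel):
    """Average a symmetric kernel over all unordered pairs of observations."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def variance_kernel(x, y):
    """Kernel h(x, y) = (x - y)^2 / 2, whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2
```

For instance, `u_statistic([1.0, 2.0, 4.0, 7.0], variance_kernel)` agrees with the usual sample variance of those four numbers.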
Marcela Mandarić (University of Split)
Application of TDA for random sets
We present a methodology for detecting outliers and testing the goodness-of-fit of random sets using topological data analysis. We construct a filtration from the sublevel sets of the signed distance function and consider various summary functions of the persistence diagrams derived from the obtained persistent homology. Outliers are detected using functional depths for the summary functions. Global envelope tests, employing these summary statistics as test statistics, are used to construct the goodness-of-fit test. We also turn to the asymptotic properties of persistence diagrams obtained from random set models. Specifically, we establish central limit theorems (CLTs) for persistence diagrams associated with germ-grain random set models. The procedures are validated by a simulation study using germ-grain random set models and by an application to real data concerning histological images of mastopathic and mammary cancer breast tissue.
Lujia Bai (Ruhr-University Bochum)
Uniform variance reduced simultaneous inference of time-varying correlation networks
This paper proposes a unified framework for inferring large-scale time-varying correlation networks via data-driven time-varying thresholds that can control uncertainty simultaneously. The framework allows the dimension of time series vectors to be fixed or diverging at a high polynomial rate of the sample size. It also allows the time series to exhibit changing temporal characteristics beyond stationarity without specific structural assumptions. Motivated by the practical issue that the confidence band of non-parametric estimators of correlations can exceed their natural domain [−1, 1], we propose a simple uniform variance reduction technique. When applied to the construction of a correlation network, the new device yields more accurate thresholds, which enhance the probability of recovering the time-varying network structures. We broaden the applicability of our method by developing difference-based estimators of cross-correlations that are robust to structural breaks in the time-varying mean functions, and by allowing both a fixed and a diverging number of lags in the correlation functions. We prove the asymptotic validity of the proposed method, especially in achieving accurate family-wise error control when disclosing flexible time-varying network structures. The effectiveness of our method in finite samples is demonstrated through simulation studies and data analysis.
Rikkert Hindriks (VU Amsterdam)
Implications of elliptical symmetry for four-point correlations in EEG brain oscillations
Electroencephalography (EEG) studies on brain dynamics have largely focused on second-order statistical dependencies. Although higher-order dependencies have been observed in local neural circuits, it is not known if they can be observed in EEG data. Furthermore, because the number of dependencies scales exponentially with the order, a simplifying principle is needed. We address this by exploring the implications of elliptical symmetry. In particular, elliptical symmetry implies that fourth-order complex cumulants are proportional to sums of products of second-order complex cumulants and, furthermore, that the proportionality constant equals excess kurtosis. We assess to what extent these predictions are corroborated in frequency-domain EEG data and discuss the implications for brain dynamics.
The first afternoon will close with a poster session, where the participants present their research topics. Knowing about the broad research interests of their peers from the start of the workshop will facilitate discussions and interactions. All conference participants will be able to vote for the prize for the best poster presentation. This will provide additional incentives for excellent presentations and help the development of early-career researchers.
Hans Reimann (Heidelberg University)
Data-Driven Impulse Control in Multiple Dimensions via Non-Parametric Estimation of the Optimal Stopping Rule
We investigate optimal stopping in multiple dimensions and corresponding non-parametric approaches for data-driven impulse control strategies. The analysis can be separated into two steps: understanding the underlying optimal stopping problem in higher dimensions for diffusion processes with known components, and constructing as well as evaluating non-parametric approaches for the case of unknown system components.
The key insights are as follows: Optimal stopping in multiple dimensions can be formulated via an operator for constructing a value function substitute with desirable properties regarding error stability in characterizing quantities. We can reliably estimate such quantities by estimating the unknown components therein. Based on these results, we propose a data-driven strategy and evaluate its proficiency.
Huixiaqing Liu (Vrije Universiteit Amsterdam)
Semiparametric Estimation of Elliptical Copula Generators for Non-Gaussian Dependency Analysis in Intracranial EEG
Understanding higher-order interactions among brain regions requires estimating multivariate probability densities, which is a task that becomes intractable in high dimensions. For functional magnetic resonance imaging (fMRI) data, the Gaussian assumption provides a convenient shortcut, but intracranial electroencephalography (iEEG) signals exhibit heavy-tailed dependencies that violate this assumption.
We address this challenge using elliptical copulas, a flexible family of dependence models that includes the Gaussian as a special case. A key property of elliptical copulas is that their entire high-dimensional structure is determined by a single one-dimensional function, the density generator, effectively reducing a high-dimensional estimation problem to a univariate one. We estimate this generator using a semiparametric kernel method and validate its accuracy on synthetic data with known ground truth. Applied to real iEEG recordings, surrogate-based hypothesis testing confirms that the observed dependencies are significantly non-Gaussian, while formal ellipticity diagnostics support the validity of the elliptical model.
This framework provides a principled, computationally tractable route to information-theoretic measures of neural interaction beyond Gaussian assumptions.
Patrick Bastian (Aarhus University)
TWIN: Two window inspection for online change point detection
We propose a new class of sequential change point tests, both for changes in the mean parameter and in the overall distribution function. The methodology builds on a two-window inspection scheme (TWIN), which aggregates data into symmetric samples and applies strong weighting to enhance statistical performance. The detector yields logarithmic rather than polynomial detection delays, representing a substantial reduction compared to state-of-the-art alternatives. Delays remain short, even for late changes, where existing methods perform worst. Moreover, the new procedure also attains higher power than current methods across broad classes of local alternatives. For mean changes, we further introduce a self-normalized version of the detector that automatically cancels out temporal dependence, eliminating the need to estimate nuisance parameters. The advantages of our approach are supported by asymptotic theory, simulations and an application to monitoring COVID-19 data. Here, structural breaks associated with new virus variants are detected almost immediately by our new procedures. This indicates potential value for the real-time monitoring of future epidemics. Mathematically, our approach is underpinned by new exponential moment bounds for the global modulus of continuity of the partial sum process, which may be of independent interest beyond change point testing.
Albertas Dvirnas (Umeå University)
Bridging Matrix Profiles and Empirical Dynamic Modelling in the Search for Patterns and Predictions in Environmental Data
Empirical dynamic modelling (EDM) and matrix profiles offer complementary ways to discover structure in complex time series. EDM reconstructs low-dimensional attractors from high-dimensional observations, enabling local analogue forecasting and causal inference, while matrix profiles provide a scalable, domain-agnostic mechanism for fast motif discovery, anomaly detection, and nearest-neighbour search. This poster explores how these two perspectives can be combined to analyse high-dimensional environmental data, such as multi-species environmental DNA (eDNA) time series.
By interpreting matrix profile subsequences as embedded states in EDM’s reconstructed phase space, we obtain a unified framework for identifying recurrent dynamical patterns and constructing local, interpretable forecasts. The approach naturally extends to streaming settings, where incremental updates to the matrix profile support real-time pattern tracking and prediction as new observations arrive. We illustrate this with examples in seasonal environmental monitoring, highlighting how the joint use of matrix profiles and EDM can reveal candidate mechanisms, regime shifts, and nonlinear dependencies that are obscured by purely statistical or purely mechanistic models. The goal is to position this synergy as a practical toolkit for exploratory analysis and prediction in modern, high-dimensional environmental datasets.
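To make the matrix profile idea in this abstract concrete, here is a minimal brute-force sketch: for every length-m subsequence it records the z-normalized Euclidean distance to its nearest non-overlapping neighbour. This is an illustration only and not the poster's implementation; the quadratic-time loop, the exclusion zone of m // 2, and the names `znorm` and `naive_matrix_profile` are assumptions made for the example (practical tools use much faster algorithms).

```python
import math


def znorm(window):
    """Z-normalize one subsequence; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def naive_matrix_profile(series, m):
    """Distance from each length-m subsequence to its nearest non-trivial match."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = max(1, m // 2)  # skip near-identical overlapping windows (assumed width)
    profile = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) < exclusion:
                continue
            best = min(best, math.dist(a, b))
        profile.append(best)
    return profile
```

Low values in the returned profile indicate recurring patterns (motifs), while high values flag subsequences without a close analogue elsewhere in the series (candidate anomalies).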
Maximilian Rücker (Ulm University)
Estimation and Inference in High-Dimensional Panel Data Models with Interactive Fixed Effects
We develop new econometric methods for estimation and inference in high-dimensional panel data models with interactive fixed effects. Our approach can be regarded as a non-trivial extension of the very popular common correlated effects (CCE) approach. Roughly speaking, we proceed as follows: We first construct a projection device to eliminate the unobserved factors from the model by applying a dimensionality reduction transform to the matrix of cross-sectionally averaged covariates. The unknown parameters are then estimated by applying lasso techniques to the projected model. For inference purposes, we derive a desparsified version of our lasso-type estimator. While the original CCE approach is restricted to the low-dimensional case where the number of regressors is small and fixed, our methods can deal with both low- and high-dimensional situations where the number of regressors is large and may even exceed the overall sample size. We derive theory for our estimation and inference methods both in the large-T case, where the time series length T tends to infinity, and in the small-T case, where T is a fixed natural number. Specifically, we derive the convergence rate of our estimator and show that its desparsified version is asymptotically normal under suitable regularity conditions. The theoretical analysis of the paper is complemented by a simulation study and an empirical application to characteristic-based asset pricing.
Daniel Peer (University of Vienna)
On the Edgeworth expansion of the maxima and the blessings of dimensionality
Let $X_1,\ldots, X_n \in \mathbb{R}^d$ be a sequence of i.i.d. random vectors, where $d$ may be much larger than $n$. A fundamental problem in high-dimensional statistics concerns normal approximations and convergence properties of the maximum statistic $$M_n=\max_{1\leq k\leq d} \frac{1}{\sqrt{n}}\sum_{i=1}^n X_{i,k},$$ whose study was initiated in seminal works by Chernozhukov, Chetverikov and Kato. A next step in understanding the asymptotic properties of $M_n$ and accompanying quantile approximations is the development of Edgeworth-type expansions and corresponding bootstrap methods. A very recent result in this direction was established by Koike, developing an Edgeworth expansion for $\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i$ based on Stein kernels, subject to some regularity conditions. In our project, we view the problem through the lens of Poisson approximations to directly construct an Edgeworth expansion for $M_n$. Our main assumptions are a Cramér-type condition for all pairs of components of $X_i$ and a notion of weak dependence across the dimension. Utilizing this expansion, we obtain second-order approximations for $\mathbb{P}(M_n\leq x)$ and the quantiles of $M_n$. Under suitable uniformity assumptions on the moments across components, we improve these convergence rates to third and higher orders. Furthermore, we extend our results to the studentized case, that is, to the statistic $\max_{1\leq k\leq d} T_{n,k}$, where $T_{n,k}$ are the component-wise Student-t statistics.
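The maximum statistic $M_n$ defined in this abstract can be computed directly from a data matrix. The sketch below is only illustrative: the function names `max_statistic` and `simulate`, the row-per-observation layout, and the standard Gaussian toy data are assumptions made for this example, not part of the project.

```python
import math
import random


def max_statistic(rows):
    """M_n = max over coordinates k of n^{-1/2} * sum_i X_{i,k}.

    `rows` holds n observations, each a list of d coordinates."""
    n = len(rows)
    d = len(rows[0])
    col_sums = [0.0] * d
    for row in rows:
        for k, value in enumerate(row):
            col_sums[k] += value
    return max(s / math.sqrt(n) for s in col_sums)


def simulate(n=50, d=200, seed=1):
    """Toy draw of M_n with i.i.d. standard Gaussian entries and d larger than n."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    return max_statistic(rows)
```

Repeating `simulate` over many seeds gives an empirical distribution of $M_n$ against which a quantile approximation could be compared.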
Daria Tieplova (Aarhus University)
Testing approximate sphericity for high-dimensional covariance matrices
Exact testing of model assumptions is often of limited relevance, especially in high-dimensional settings. Structural assumptions on large-dimensional covariance matrices such as sphericity are rarely expected to hold exactly for real data and practitioners are often primarily interested in whether such model assumptions are approximately satisfied. In this work, we propose a test for approximate sphericity of high-dimensional covariance matrices, where the tolerated level of deviation from sphericity can be chosen by the user. Our test statistic is based on estimators of the largest and smallest eigenvalue of the population covariance matrix in a high-dimensional regime, where the corresponding sample eigenvalues are not consistent. We derive theoretical guarantees showing that the test keeps the prescribed asymptotic level under the null hypothesis and is power consistent under the alternative. Our key theoretical contribution is a joint central limit theorem for the estimators of the extreme eigenvalues of the population covariance matrix, provided the corresponding eigenvalues exceed the critical phase transition threshold.
Thomas Stark (Aarhus University)
Implicit vs. Explicit Regularization for High-Dimensional Gradient Descent
In this paper, we investigate the generalization error of gradient descent (GD) applied to an L2-regularized ordinary least squares (OLS) objective in the linear model. Based on our analysis, we develop new methodology for computationally tractable and statistically efficient linear prediction in a high-dimensional, massive-data setting (large n, large p).
Our results are based on the surprising observation that the generalization error of optimally tuned regularized gradient descent approaches that of an optimal benchmark procedure monotonically as the number of iterations increases. By contrast, standard GD for OLS without explicit regularization achieves the benchmark only in degenerate cases. This shows that optimal explicit regularization can be nearly statistically efficient, whereas implicit regularization through early stopping cannot.
To complete our methodology, we provide a fully data-driven and computationally tractable choice of the L2 regularization parameter that is cheaper to compute than cross-validation. In doing so, we follow and extend ideas of Dicker (2014) to the non-Gaussian case, which requires new results on high-dimensional sample covariance matrices that may be of independent interest.
Dicker, L. H. (2014). “Variance estimation in high-dimensional linear models.” Biometrika 101: 269–284.
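The procedure studied in the abstract above, gradient descent on an L2-regularized least squares objective, can be sketched as follows. This is a minimal illustration under stated assumptions: the objective scaling (1/(2n))·||y − Xβ||² + (λ/2)·||β||², the constant step size, the zero initialization, and the name `ridge_gd` are choices made for this example and are not taken from the paper.

```python
def ridge_gd(X, y, lam, step, n_iter):
    """Gradient descent on (1/(2n)) * ||y - X beta||^2 + (lam/2) * ||beta||^2.

    X is a list of n rows with p entries each; y is a list of n responses."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residuals y - X beta at the current iterate.
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient: -(1/n) X^T residual + lam * beta.
        grad = [
            -sum(X[i][j] * residual[i] for i in range(n)) / n + lam * beta[j]
            for j in range(p)
        ]
        beta = [beta[j] - step * grad[j] for j in range(p)]
    return beta
```

Setting `lam=0.0` and stopping after a few iterations corresponds to the implicit regularization through early stopping that the abstract contrasts with explicit L2 regularization.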
Nikolaj Nyvold Lundbye (Aarhus University)
Poisson approximation of large-lifetime cycles
In topological data analysis, the notions of persistent homology, birthtime, lifetime, and deathtime are used to assign and capture relevant cycles (i.e., topological features) of a point cloud, such as loops and cavities. In particular, cycles with a large lifetime are of special interest. In this paper, we study such large-lifetime cycles when the point cloud is modeled as a Poisson point process. First, we consider the case with no bound on the deathtime, where we establish Poisson convergence of the centers of large-lifetime cycles on the 2-dimensional flat torus. Afterwards, by imposing a bound on the deathtime, we enter a sparse connectivity regime, and we prove joint Poisson convergence of the centers, lifetimes, and deathtimes under suitable model conditions.