Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure

Gilles Blanchard; Jean-Baptiste Fermanian

doi:10.1007/978-3-031-30114-8_3

Book chapter, Year: 2023


Gilles Blanchard

Role: Author
PersonId : 738034
IdHAL : gilles-blanchard
ORCID : 0000-0003-2125-933X
IdRef : 190973250

Laboratoire de Mathématiques d'Orsay

Understanding the Shape of Data

Jean-Baptiste Fermanian

Role: Author
PersonId : 748045
IdHAL : jean-baptiste-fermanian
ORCID : 0000-0001-7750-8337

École normale supérieure - Rennes

Laboratoire de Mathématiques d'Orsay

Abstract

Let $\mathbf{X} = (X_i)_{1\leq i \leq n}$ be an i.i.d. sample of square-integrable variables in $\mathbb{R}^d$, with common expectation $\mu$ and covariance matrix $\Sigma$, both unknown. We consider the problem of testing whether $\mu$ is $\eta$-close to zero, i.e. $\|\mu\| \leq \eta$, against $\|\mu\| \geq \eta + \delta$; we also tackle the more general two-sample mean closeness (also known as *relevant difference*) testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance $\delta$ such that we can control both the Type I and Type II errors at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of $\|\mu\|^2$ used as a test statistic, and second for estimators of the operator and Frobenius norms of $\Sigma$, which enter the quantiles of said test statistic. These properties are obtained for Gaussian and bounded distributions. Particular attention is given to the dependence on the pseudo-dimension $d_*$ of the distribution, defined as $d_* := \|\Sigma\|_2^2/\|\Sigma\|_\infty^2$. In particular, for $\eta=0$, the minimum separation distance is ${\Theta}( d_*^{\frac{1}{4}}\sqrt{\|\Sigma\|_\infty/n})$, in contrast with the minimax estimation distance for $\mu$, which is ${\Theta}(d_e^{\frac{1}{2}}\sqrt{\|\Sigma\|_\infty/n})$ (where $d_e:=\|\Sigma\|_1/\|\Sigma\|_\infty$). This generalizes a phenomenon spelled out in particular by Baraud (2002).
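To make the quantities in the abstract concrete, the following Python sketch computes $d_*$ and $d_e$ from a known covariance matrix and compares the two rates with all constants dropped. It assumes that $\|\Sigma\|_1$, $\|\Sigma\|_2$ and $\|\Sigma\|_\infty$ denote the trace, Frobenius and operator norms, which matches the abstract's mention of "operator and Frobenius norms" but is not spelled out there. The abstract also does not specify the estimator of $\|\mu\|^2$; the U-statistic below is one standard unbiased choice and is shown only as an illustration, not as the authors' construction. All function names are hypothetical.

```python
import numpy as np

def effective_dimensions(sigma):
    """Return (d_star, d_e) for a symmetric positive semidefinite matrix."""
    eig = np.linalg.eigvalsh(sigma)   # eigenvalues in ascending order
    op = eig[-1]                      # operator norm (assumed ||Sigma||_inf)
    fro_sq = np.sum(eig ** 2)         # squared Frobenius norm (assumed ||Sigma||_2^2)
    trace = np.sum(eig)               # trace norm (assumed ||Sigma||_1)
    return fro_sq / op ** 2, trace / op

def rates(sigma, n):
    """Testing and estimation rates from the abstract, constants dropped."""
    d_star, d_e = effective_dimensions(sigma)
    base = np.sqrt(np.linalg.eigvalsh(sigma)[-1] / n)
    return d_star ** 0.25 * base, d_e ** 0.5 * base

def mu_norm_sq_ustat(x):
    """Unbiased U-statistic estimate of ||mu||^2 from the rows of x (n >= 2).

    Assumption: an illustrative estimator, not taken from the abstract.
    """
    n = x.shape[0]
    s = x.sum(axis=0)
    # sum over i != j of <X_i, X_j> = ||sum_i X_i||^2 - sum_i ||X_i||^2
    return (s @ s - np.sum(x * x)) / (n * (n - 1))

if __name__ == "__main__":
    sigma = np.eye(16)                # isotropic case: d_star = d_e = d
    print(effective_dimensions(sigma))
    print(rates(sigma, n=4))
```

In the isotropic case $\Sigma = I_d$ both dimensions equal $d$, so the sketch reproduces the familiar gap between a testing rate of order $d^{1/4}/\sqrt{n}$ and an estimation rate of order $\sqrt{d/n}$.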

Keywords

Effective dimensionality; Minimax testing separation distance; Two-sample test; Signal detection; Relevant hypotheses

Domains

Statistics [math.ST]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

Blanchard_Fermanian_clean_HAL_v2.pdf (469.35 KB)

Origin: Files produced by the author(s)


https://universite-paris-saclay.hal.science/hal-03329848

Submitted on: Thursday, October 7, 2021, 11:49:50

Last modified on: Thursday, March 14, 2024, 03:14:43

Dates and versions

hal-03329848, version 1 (31-08-2021)

hal-03329848, version 2 (07-10-2021)

Identifiers

HAL Id: hal-03329848, version 2
DOI: 10.1007/978-3-031-30114-8_3

Cite

Gilles Blanchard, Jean-Baptiste Fermanian. Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure. Denis Belomestny; Cristina Butucea; Enno Mammen; Eric Moulines; Markus Reiß; Vladimir Ulyanov. Foundations of Modern Statistics : Festschrift in Honor of Vladimir Spokoiny, Berlin, Germany, November 6–8, 2019, Moscow, Russia, November 30, 2019, PROMS. 425, Springer International Publishing, pp.121-162, 2023, Springer Proceedings in Mathematics & Statistics, ⟨10.1007/978-3-031-30114-8_3⟩. ⟨hal-03329848v2⟩


Collections

CNRS INRIA LM-ORSAY INRIA2 UNIV-PARIS-SACLAY UNIV-COTEDAZUR UNIV-RENNES ANR GS-MATHEMATIQUES GS-COMPUTER-SCIENCE

