Mathematical Statistics

Organizers: Florencia Leonardi (florencia@usp.br), Pamela Llop (lloppamela@gmail.com), Daniela Rodriguez (drodrig@dm.uba.ar)

  • Tuesday 14

    15:00 - 15:45

    Differentially private inference via noisy optimization

    Marco Avella Medina (Columbia University, United States)

    We propose a general optimization-based framework for computing differentially private M-estimators and a new method for the construction of differentially private confidence regions. Firstly, we show that robust statistics can be used in conjunction with noisy gradient descent and noisy Newton methods in order to obtain optimal private estimators with global linear or quadratic convergence, respectively. We establish global convergence guarantees, under both local strong convexity and self-concordance, showing that our private estimators converge with high probability to a neighborhood of the non-private M-estimators. The radius of this neighborhood is nearly optimal in the sense that it corresponds to the statistical minimax cost of differential privacy up to a logarithmic term. Secondly, we tackle the problem of parametric inference by constructing differentially private estimators of the asymptotic variance of our private M-estimators. This naturally leads to the use of approximate pivotal statistics for the construction of confidence regions and hypothesis testing. We demonstrate the effectiveness of a bias correction that leads to enhanced small-sample empirical performance in simulations.
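    As a rough illustration of the noisy gradient descent idea mentioned in the abstract, the sketch below estimates a location parameter with the Huber loss and adds Gaussian noise to each gradient step. It is not the speaker's algorithm: the function names, the fixed starting point, and the `noise_scale` parameter are assumptions made for this example, and calibrating the noise to a target privacy budget over all iterations is deliberately left out. The only point shown is that a bounded score function gives a bounded gradient sensitivity, which is what the noise is scaled to.

```python
import numpy as np


def huber_psi(r, c):
    """Derivative of the Huber loss: the residual clipped to [-c, c]."""
    return np.clip(r, -c, c)


def noisy_gd_location(x, c=1.345, step=1.0, n_iter=50, noise_scale=0.05, seed=0):
    """Noisy gradient descent for a Huber location estimate (illustrative only).

    noise_scale is a hypothetical knob; it is NOT calibrated to any
    (epsilon, delta) privacy budget in this sketch.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta = 0.0  # data-independent starting point
    for _ in range(n_iter):
        # Each summand lies in [-c, c], so replacing one observation
        # changes the averaged gradient by at most 2c / n.
        grad = -np.mean(huber_psi(x - theta, c))
        sensitivity = 2.0 * c / n
        noise = rng.normal(0.0, noise_scale * sensitivity)
        theta -= step * (grad + noise)
    return theta
```

    With a sample such as `np.random.default_rng(1).normal(3.0, 1.0, size=2000)`, the iterates settle close to the non-private Huber estimate, since the injected noise is of order `c / n`.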

    15:45 - 16:30

    Adjusting ROC curves for covariates: a robust approach

    Ana M. Bianco (Universidad de Buenos Aires, Argentina)

    ROC curves are a popular tool to describe the discriminating power of a binary classifier based on a continuous marker as the threshold is varied. They become an interesting strategy to evaluate how well an assignment rule based on a diagnostic test distinguishes one population from the other. Under certain circumstances the marker's discriminatory ability may be affected by certain covariates. In this situation, it seems sensible to include this information in the ROC analysis. This task can be accomplished either by the induced or the direct method.

    In this talk we will focus on ROC curves in the presence of covariates. We will show the impact of outliers on the conditional ROC curves and we will introduce a robust proposal. We follow a semiparametric approach where we combine robust parametric estimators with weighted empirical distribution estimators based on an adaptive procedure that downweights outliers.

    We will discuss some aspects concerning consistency and, through a Monte Carlo study, we will compare the performance of the proposed estimators with the classical ones in both clean and contaminated samples.

    16:45 - 17:30

    Adaptive regression with Brownian path covariate

    Karine Bertin (Universidad de Valparaíso, Chile)

    In this talk, we will study how to obtain optimal estimators in problems of non-parametric estimation. More specifically, we will present the Goldenshluger-Lepski (2011) method that allows one to obtain estimators that adapt to the smoothness of the function to be estimated. We will show how to extend this statistical procedure in regression with functional data when the regressor variable is a Wiener process \(W\). Using the Wiener-Itô decomposition of \(m(W)\), where \(m\) is the regression function, we will define a family of estimators that satisfy an oracle inequality, are proved to be adaptive and converge at polynomial rates over specific classes of functions.

    17:30 - 18:15

    A non-asymptotic analysis of certain high-dimensional estimators for the mean

    Roberto Imbuzeiro Oliveira (Instituto de Matemática Pura e Aplicada, Brazil)

    Recent work in Statistics and Computer Science has considered the following problem. Given a distribution \(P\) over \(\mathbf{R}^d\) and a fixed sample size \(n\), how well can one estimate the mean \(\mu = \mathbf{E}_{X\sim P}\, X\) from a sample \(X_1,\dots,X_n\stackrel{i.i.d.}{\sim}P\) while only requiring finite second moments and allowing for sample contamination? It turns out that the best estimators are not related to the sample mean. In this talk we present a new analysis of certain approaches to this problem, and reproduce or improve previous results by several authors.

  • Wednesday 15

    15:00 - 15:45

    Least trimmed squares estimators for functional principal component analysis

    Holger Cevallos-Valdiviezo (Escuela Superior Politécnica del Litoral, Ecuador, and Ghent University, Belgium)

    Classical functional principal component analysis can yield erroneous approximations in the presence of outliers. To reduce the influence of atypical data we propose two methods based on trimming: a multivariate least trimmed squares (LTS) estimator and its coordinatewise variant. The multivariate LTS minimizes the multivariate scale corresponding to \(h\)-subsets of curves, while the coordinatewise version uses univariate LTS scale estimators. Consider a general setup in which observations are realizations of a random element on a separable Hilbert space \(\mathcal{H}\). For a fixed dimension \(q\), we aim to robustly estimate the \(q\)-dimensional linear space in \(\mathcal{H}\) that gives the best approximation to the functional data. Our estimators use smoothing to first represent irregularly spaced curves in a high-dimensional space and then calculate the LTS solution on these multivariate data. The solution for the multivariate data is subsequently mapped back onto \(\mathcal{H}\). Poorly fitted observations can therefore be flagged as outliers. Simulations and real data applications show that our estimators yield competitive results compared to existing methods when a minority of observations is contaminated. When a majority of the curves are contaminated at some positions along their trajectories, coordinatewise methods such as coordinatewise LTS are preferred over multivariate LTS and other multivariate methods, which break down in this case.
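    To make the trimming idea behind \(h\)-subsets concrete, here is a minimal sketch of the simplest case only: a univariate LTS location estimate, which picks the \(h\) observations with the smallest sum of squared deviations around their own mean. It is not the functional or multivariate estimator of the talk. The function name `lts_location` is an assumption for this example. The sketch relies on the standard fact that, in one dimension, the optimal \(h\)-subset consists of \(h\) consecutive order statistics, so scanning sliding windows of the sorted data is exact.

```python
import numpy as np


def lts_location(x, h):
    """Univariate least trimmed squares location (illustrative sketch).

    Returns the mean of the best h-subset and its sum of squared deviations.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    if not 1 <= h <= n:
        raise ValueError("h must satisfy 1 <= h <= len(x)")
    # Prefix sums give the sum and sum of squares of every window of length h.
    c1 = np.concatenate(([0.0], np.cumsum(xs)))
    c2 = np.concatenate(([0.0], np.cumsum(xs ** 2)))
    s1 = c1[h:] - c1[:-h]
    s2 = c2[h:] - c2[:-h]
    # Sum of squared deviations of each window around its own mean.
    ss = s2 - s1 ** 2 / h
    j = int(np.argmin(ss))
    return s1[j] / h, ss[j]
```

    For example, `lts_location([0.9, 1.0, 1.1, 1.0, 50.0, 60.0], h=4)` ignores the two large values, whereas the sample mean would be pulled far away from the bulk of the data.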

    15:45 - 16:30

    Stick-breaking priors via dependent length variables

    Ramsés Mena Chávez (Universidad Nacional Autónoma de México, Mexico)

    In this talk, we present new classes of Bayesian nonparametric prior distributions. By allowing the length random variables in stick-breaking constructions to be exchangeable or Markovian, appealing models for discrete random probability measures appear. In particular, tuning the stochastic dependence among such length variables allows one to recover two extreme families of random probability measures, namely the Dirichlet and Geometric processes. As a byproduct, the ordering of the weights in the species sampling representation can be controlled and thus tuned for efficient MCMC implementations in density estimation or unsupervised classification problems. Various theoretical properties and illustrations will be presented.
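    The two extreme cases named in the abstract can be sketched with a few lines of code. In a stick-breaking construction the weights are \(w_1 = v_1\) and \(w_j = v_j \prod_{i<j}(1 - v_i)\). Taking the length variables \(v_i\) independent \(\mathrm{Beta}(1,\theta)\) gives Dirichlet process weights, while taking them all equal to a single \(v\) gives the Geometric process weights \(w_j = v(1-v)^{j-1}\). The function names and the truncation level `k` below are assumptions for this illustration, and the exchangeable and Markovian models of the talk, which sit between these extremes, are not implemented.

```python
import numpy as np


def stick_breaking_weights(v):
    """Weights w_j = v_j * prod_{i<j} (1 - v_i) from length variables v."""
    v = np.asarray(v, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def dirichlet_weights(theta, k, rng):
    """First k Dirichlet process weights: independent Beta(1, theta) lengths."""
    return stick_breaking_weights(rng.beta(1.0, theta, size=k))


def geometric_weights(a, b, k, rng):
    """First k Geometric process weights: one Beta(a, b) length, repeated."""
    v = rng.beta(a, b)
    return stick_breaking_weights(np.full(k, v))
```

    In the fully dependent case the weights are decreasing by construction, which is one simple instance of the ordering control mentioned in the abstract.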

    16:45 - 17:30

    Modelling in pandemic times: using smart watch data for early detection of COVID-19

    Mayte Suarez-Farinas (Icahn School of Medicine at Mount Sinai, United States)

    The COVID-19 pandemic brought many challenges to statisticians and modelers across all quantitative disciplines, from accelerated clinical trials and the modelling of epidemiological interventions at a feverish pace to the study of the impact of the pandemic on mental health outcomes and racial disparities. In this talk, we would like to share our experience using classical statistical modelling and machine learning to turn data obtained from wearable devices into digital biomarkers of COVID-19 infection.

    Early in the pandemic, health care workers in the Mount Sinai Health System (New York City) were prospectively followed in an observational study using the custom Warrior Watch Study app to collect weekly information about stress, symptoms and COVID-19 infection. Participants wore an Apple Watch for the duration of the study, measuring heart rate variability (HRV), a digital biomarker previously associated with infection in other settings, throughout the follow-up period.

    The HRV data collected through the Apple Watch were characterized by a circadian pattern, with sparse sampling over a 24-hour period and non-uniform timing across days and participants. These characteristics preclude us from using easily derived features (e.g. mean, maximum, CV) with machine learning methods to develop a diagnostic tool. As such, suitable modelling of the non-uniform, sparsely sampled circadian rhythm data derived from wearable devices is an important step to advance the use of integrated wearable data for prediction of health outcomes.

    To circumvent such limitations, we introduced the mixed-effects COSINOR model, where the daily circadian rhythm is expressed as a non-linear function with three rhythm characteristics: the rhythm-adjusted mean (MESOR), half the extent of variation within a cycle (amplitude), and an angle relating to the time at which peak values recur in each cycle (acrophase). The longitudinal changes in the circadian patterns can then be evaluated by extending the COSINOR model to a mixed-effects model framework, allowing for random effects and interactions between COSINOR parameters and time-varying covariates. In this talk, we will discuss our model framework, bootstrap-based hypothesis testing and prediction approaches, as well as our evaluation of HRV measures as early biomarkers of COVID-19 diagnosis. To facilitate the future use of the mixed-effects COSINOR model, we implemented it in an R package, cosinoRmixedeffects.
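    The three rhythm characteristics can be illustrated with a minimal single-subject sketch, assuming the common single-component form \(y(t) = M + A\cos(2\pi t/24 + \varphi) + \varepsilon\). Writing \(\beta = A\cos\varphi\) and \(\gamma = -A\sin\varphi\) turns this into the linear model \(y = M + \beta\cos(\omega t) + \gamma\sin(\omega t) + \varepsilon\), which ordinary least squares can fit even when the sampling times are sparse and irregular. The function name `fit_cosinor` and the sign convention for the acrophase are assumptions for this example. It does not use or reproduce the cosinoRmixedeffects package, and the random effects and covariate interactions described in the abstract are omitted.

```python
import numpy as np


def fit_cosinor(t_hours, y, period=24.0):
    """Least-squares fit of y = M + A*cos(2*pi*t/period + phi) (fixed effects only).

    Returns (mesor, amplitude, acrophase) with acrophase in radians.
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period
    # Linearized design: intercept, cosine and sine terms.
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = np.hypot(beta, gamma)      # A = sqrt(beta^2 + gamma^2)
    acrophase = np.arctan2(-gamma, beta)   # phi, since gamma = -A*sin(phi)
    return mesor, amplitude, acrophase
```

    Because the fit only needs the observation times, it does not require the same sampling schedule on every day, which is the property that motivates this type of model for wearable data.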