Mathematical Statistics
Organizers: Florencia Leonardi (florencia@usp.br), Pamela Llop (lloppamela@gmail.com), Daniela Rodriguez (drodrig@dm.uba.ar)

Tuesday 14
15:00-15:45 Differentially private inference via noisy optimization
Marco Avella Medina (Columbia University, United States)
We propose a general optimization-based framework for computing differentially private M-estimators and a new method for the construction of differentially private confidence regions. First, we show that robust statistics can be used in conjunction with noisy gradient descent and noisy Newton methods in order to obtain optimal private estimators with global linear or quadratic convergence, respectively. We establish global convergence guarantees, under both local strong convexity and self-concordance, showing that our private estimators converge with high probability to a neighborhood of the non-private M-estimators. The radius of this neighborhood is nearly optimal in the sense that it corresponds to the statistical minimax cost of differential privacy up to a logarithmic term. Second, we tackle the problem of parametric inference by constructing differentially private estimators of the asymptotic variance of our private M-estimators. This naturally leads to the use of approximate pivotal statistics for the construction of confidence regions and hypothesis testing. We demonstrate the effectiveness of a bias correction that leads to enhanced small-sample empirical performance in simulations.
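As a rough illustration of the noisy gradient descent idea in the abstract above, the following Python sketch estimates a location parameter with a Huber-type M-estimator and adds Gaussian noise to every gradient step. It is not the authors' algorithm: the function names, the step size, and the constant `noise_scale` are assumptions made for this example, and the noise is not calibrated to any formal privacy budget.

```python
import numpy as np

def huber_psi(r, c=1.345):
    """Derivative of the Huber loss; bounded by c in absolute value."""
    return np.clip(r, -c, c)

def noisy_gd_location(x, n_iter=50, step=0.5, noise_scale=0.05, c=1.345, seed=0):
    """Noisy gradient descent for a Huber location M-estimator.

    noise_scale is a hypothetical tuning constant; calibrating it to a
    formal (epsilon, delta) privacy guarantee is outside this sketch.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    theta = 0.0
    for _ in range(n_iter):
        # Gradient of the averaged Huber loss at the current iterate.
        grad = -np.mean(huber_psi(x - theta, c))
        # Replacing one observation changes the averaged gradient by at
        # most 2c/n, so the noise is scaled relative to that bound.
        noise = rng.normal(0.0, noise_scale * 2.0 * c / n)
        theta -= step * (grad + noise)
    return theta
```

The bounded score function is what makes the per-observation influence on the gradient finite, which is the reason robust losses pair naturally with noise injection.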
15:45-16:30 Adjusting ROC curves for covariates: a robust approach
Ana M. Bianco (Universidad de Buenos Aires, Argentina)
ROC curves are a popular tool to describe the discriminating power of a binary classifier based on a continuous marker as the threshold is varied. They become an interesting strategy to evaluate how well an assignment rule based on a diagnostic test distinguishes one population from the other. Under certain circumstances the marker's discriminatory ability may be affected by certain covariates. In this situation, it seems sensible to include this information in the ROC analysis. This task can be accomplished either by the induced or the direct method.
In this talk we will focus on ROC curves in the presence of covariates. We will show the impact of outliers on the conditional ROC curves and we will introduce a robust proposal. We follow a semiparametric approach where we combine robust parametric estimators with weighted empirical distribution estimators based on an adaptive procedure that downweights outliers.
We will discuss some aspects concerning consistency and, through a Monte Carlo study, we will compare the performance of the proposed estimators with the classical ones, in both clean and contaminated samples.
16:45-17:30 Adaptive regression with Brownian path covariate
Karine Bertin (Universidad de Valparaíso, Chile)
In this talk, we will study how to obtain optimal estimators in problems of nonparametric estimation. More specifically, we will present the Goldenshluger-Lepski (2011) method that allows one to obtain estimators that adapt to the smoothness of the function to be estimated. We will show how to extend this statistical procedure in regression with functional data when the regressor variable is a Wiener process \(W\). Using the Wiener-Itô decomposition of \(m(W)\), where \(m\) is the regression function, we will define a family of estimators that satisfy an oracle inequality, are proved to be adaptive and converge at polynomial rates over specific classes of functions.
17:30-18:15 A non-asymptotic analysis of certain high-dimensional estimators for the mean
Roberto Imbuzeiro Oliveira (Instituto de Matemática Pura e Aplicada, Brazil)
Recent work in Statistics and Computer Science has considered the following problem. Given a distribution \(P\) over \(\mathbf{R}^d\) and a fixed sample size \(n\), how well can one estimate the mean \(\mu = \mathbf{E}_{X\sim P}\, X\) from a sample \(X_1,\dots,X_n\stackrel{i.i.d.}{\sim}P\) while only requiring finite second moments and allowing for sample contamination? It turns out that the best estimators are not related to the sample mean. In this talk we present a new analysis of certain approaches to this problem, and reproduce or improve previous results by several authors.
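To make the contrast with the sample mean concrete, here is a minimal Python sketch of the median-of-means estimator in one dimension. It is only one well-known alternative to the sample mean; the abstract does not say which estimators the talk analyzes, so the choice of estimator, the function name and the number of blocks are assumptions made for illustration.

```python
import numpy as np

def median_of_means(x, n_blocks=10, seed=0):
    """Median-of-means: shuffle, split into blocks, take the median of block means."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    perm = rng.permutation(len(x))
    blocks = np.array_split(x[perm], n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```

A few grossly contaminated observations can spoil at most as many blocks, so the median of the block means stays close to the true mean as long as most blocks are clean, whereas the sample mean can be moved arbitrarily far.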

Wednesday 15
15:00-15:45 Least trimmed squares estimators for functional principal component analysis
Holger Cevallos-Valdiviezo (Escuela Superior Politécnica del Litoral, Ecuador, and Ghent University, Belgium)
Classical functional principal component analysis can yield erroneous approximations in the presence of outliers. To reduce the influence of atypical data we propose two methods based on trimming: a multivariate least trimmed squares (LTS) estimator and its coordinatewise variant. The multivariate LTS minimizes the multivariate scale corresponding to \(h\)-subsets of curves while the coordinatewise version uses univariate LTS scale estimators. Consider a general setup in which observations are realizations of a random element on a separable Hilbert space \(\mathcal{H}\). For a fixed dimension \(q\), we aim to robustly estimate the \(q\)-dimensional linear space in \(\mathcal{H}\) that gives the best approximation to the functional data. Our estimators use smoothing to first represent irregularly spaced curves in a high-dimensional space and then calculate the LTS solution on these multivariate data. The solution for the multivariate data is subsequently mapped back onto \(\mathcal{H}\). Poorly fitted observations can therefore be flagged as outliers. Simulations and real data applications show that our estimators yield competitive results when compared to existing methods when a minority of observations is contaminated. When a majority of the curves are contaminated at some positions along their trajectories, coordinatewise methods such as coordinatewise LTS are preferred over multivariate LTS and other multivariate methods, since the latter break down in this case.
15:45-16:30 Stick-breaking priors via dependent length variables
Ramsés Mena Chávez (Universidad Nacional Autónoma de México, Mexico)
In this talk, we present new classes of Bayesian nonparametric prior distributions. By allowing the length random variables in stick-breaking constructions to be exchangeable or Markovian, appealing models for discrete random probability measures appear. In particular, tuning the stochastic dependence among these length variables allows one to recover the extreme families of random probability measures, i.e. Dirichlet and Geometric processes. As a byproduct, the ordering of the weights in the species sampling representation can be controlled and thus tuned for efficient MCMC implementations in density estimation or unsupervised classification problems. Various theoretical properties and illustrations will be presented.
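The two extreme cases mentioned in the abstract can be sketched in Python under standard assumptions that the abstract itself does not spell out: independent Beta(1, theta) length variables for the Dirichlet process, and a single Beta-distributed length shared by all sticks for the Geometric process. The truncation level `k`, the value of `theta` and the function name are choices made for this illustration only.

```python
import numpy as np

def stick_breaking_weights(lengths):
    """Weights w_j = v_j * prod_{i<j} (1 - v_i) from length variables v_j."""
    v = np.asarray(lengths, dtype=float)
    # Fraction of the stick still unbroken before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
k = 200          # truncation level (assumption for this sketch)
theta = 2.0      # concentration parameter (assumption for this sketch)

# Independent lengths: truncated Dirichlet process weights.
w_dp = stick_breaking_weights(rng.beta(1.0, theta, size=k))

# Fully dependent (identical) lengths: truncated Geometric process weights.
w_geo = stick_breaking_weights(np.full(k, rng.beta(1.0, theta)))
```

With identical lengths the weights are v(1 - v)^(j-1), so they are automatically decreasing, which is one way to see how the dependence structure controls the ordering of the weights.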
16:45-17:30 Modelling in pandemic times: using smart watch data for early detection of COVID-19
Mayte Suarez-Farinas (Icahn School of Medicine at Mount Sinai, United States)
The COVID-19 pandemic brought many challenges to statisticians and modelers across all quantitative disciplines, from accelerated clinical trials and the modelling of epidemiological interventions at a feverish pace to the study of the impact of the pandemic on mental health outcomes and racial disparities. In this talk, we would like to share our experience using classical statistical modelling and machine learning to turn data obtained from wearable devices into digital biomarkers of COVID-19 infection.
Early in the pandemic, health care workers in the Mount Sinai Health System (New York City) were prospectively followed in an observational study using the custom Warrior Watch Study app, to collect weekly information about stress, symptoms and COVID-19 infection. Participants wore an Apple Watch for the duration of the study, measuring heart rate variability (HRV), a digital biomarker previously associated with infection in other settings, throughout the follow-up period.
The HRV data collected through the Apple Watch were characterized by a circadian pattern, with sparse sampling over a 24-hour period and non-uniform timing across days and participants. These characteristics preclude us from using easily derived features (e.g. mean, maximum, CV) with machine learning methods to develop a diagnostic tool. As such, suitable modelling of the non-uniform, sparsely sampled circadian rhythm data derived from wearable devices is an important step to advance the use of integrated wearable data for prediction of health outcomes.
To circumvent such limitations, we introduced the mixed-effects COSINOR model, where the daily circadian rhythm is expressed as a nonlinear function with three rhythm characteristics: the rhythm-adjusted mean (MESOR), half the extent of variation within a cycle (amplitude), and an angle relating to the time at which peak values recur in each cycle (acrophase). The longitudinal changes in the circadian patterns can then be evaluated by extending the COSINOR model to a mixed-effects model framework, allowing for random effects and interaction between COSINOR parameters and time-varying covariates. In this talk, we will discuss our model framework, bootstrap-based hypothesis testing and prediction approaches, as well as our evaluation of HRV measures as early biomarkers of COVID-19 diagnosis. To facilitate the future use of the mixed-effects COSINOR model, we implemented it in the R package cosinoRmixedeffects.
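For a single series with a known 24-hour period, the three rhythm characteristics can be recovered by ordinary least squares, because M + A cos(wt + phi) can be rewritten as M + b cos(wt) + g sin(wt) with b = A cos(phi) and g = -A sin(phi). The Python sketch below shows only this fixed-effects building block under those assumptions; it does not include the random effects or covariate interactions of the mixed-effects model, and it is not the interface of the cosinoRmixedeffects package. The function name and arguments are hypothetical.

```python
import numpy as np

def fit_cosinor(t_hours, y, period=24.0):
    """Least-squares fit of y = MESOR + A * cos(2*pi*t/period + acrophase)."""
    t = np.asarray(t_hours, dtype=float)
    omega = 2.0 * np.pi / period
    # Linearized design: intercept, cosine and sine terms.
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    amplitude = np.hypot(beta, gamma)      # A = sqrt(b^2 + g^2)
    acrophase = np.arctan2(-gamma, beta)   # phi, in radians
    return float(mesor), float(amplitude), float(acrophase)
```

Because the fit uses the actual measurement times, it does not require evenly spaced observations, which matters for sparsely and irregularly sampled wearable data.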