Abstract: This talk focuses on my recent research on grouped multiple hypothesis testing in an online setting. Classical multiple testing procedures are offline in nature, meaning that the entire collection of hypotheses and corresponding test statistics is available before the testing procedure begins. This setting allows for efficient use of available resources, such as the overall error budget and auxiliary structural information about the hypotheses, leading to procedures with high statistical power while maintaining control of a global error measure.
In contrast, online multiple testing procedures are a relatively recent development in the literature. In the online framework, hypotheses arrive sequentially over time, and decisions must be made in real time based only on past information, before future test statistics are observed. The lack of knowledge about future test statistics makes the task of controlling an overall error measure substantially more challenging than in the offline setting.
The talk introduces the ‘Grouped Online Testing Algorithm (GOTA)’, which integrates ideas from both online and offline multiple testing to address settings in which hypotheses arrive in groups over a potentially infinite sequence. Unlike most existing multiple testing procedures that rely on p-values, GOTA is built using the local false discovery rate as its fundamental building block. I will discuss the theoretical properties of the algorithm, including its guarantees for controlling an overall error measure, as well as its practical performance. Simulation studies demonstrate that the proposed method achieves substantially higher power than a comparable p-value–based procedure.
Given the current lack of multiple testing methods tailored to such grouped online settings, this work aims to fill an important methodological gap. The talk will also briefly review foundational concepts in multiple hypothesis testing and highlight my related recent research. No prior background in multiple testing is assumed, and the talk is intended to be accessible to everyone interested.
Abstract: The study of statistics of random permutations is arguably among the earliest topics in probability. These statistics bring out deep connections with fields like combinatorics, number theory, and representation theory. The probability that a uniform random permutation has $k$ orbits/cycles is log-concave (in $k$). In fact, it was observed by Levy that the number of orbits of a random permutation has the same distribution as a sum of independent Bernoulli random variables. This allows one to deduce a central limit theorem for the number of orbits of a uniform random permutation. The situation is more delicate for a random pair of commuting permutations. Consider a pair of commuting permutations drawn uniformly at random from the set of all commuting pairs of permutations. It was conjectured by (Nekrasov--Okounkov) Heim-Neuhauser that the probability that it has $k$ orbits is (unimodal) log-concave. This problem remains wide open. In this talk, we will discuss some recent partial progress on this problem. In particular, we prove a CLT for the number of orbits of a random pair of commuting permutations.
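As a small illustration of the single-permutation facts quoted above (not of the commuting-pairs result, which is the subject of the talk), the following sketch checks them exactly for small $n$. It assumes the standard form of the Bernoulli representation, in which the number of cycles of a uniform permutation of $n$ elements is distributed as $B_1 + \dots + B_n$ with independent $B_i \sim \mathrm{Bernoulli}(1/i)$. All function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple with i -> perm[i]."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:  # walk the cycle containing `start`
                seen[j] = True
                j = perm[j]
    return cycles


def cycle_distribution_bruteforce(n):
    """Exact P(k cycles) for k = 0..n, by enumerating all n! permutations."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        counts[cycle_count(perm)] += 1
    return [Fraction(c, factorial(n)) for c in counts]


def bernoulli_sum_distribution(n):
    """Exact law of B_1 + ... + B_n, independent B_i ~ Bernoulli(1/i) (assumed form)."""
    dist = [Fraction(1)]  # law of the empty sum: all mass at 0
    for i in range(1, n + 1):
        p = Fraction(1, i)
        new = [Fraction(0)] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # B_i = 0
            new[k + 1] += mass * p     # B_i = 1
        dist = new
    return dist


def is_log_concave(seq):
    """Check seq[k]^2 >= seq[k-1] * seq[k+1] at every interior index k."""
    return all(seq[k] ** 2 >= seq[k - 1] * seq[k + 1]
               for k in range(1, len(seq) - 1))


if __name__ == "__main__":
    for n in range(1, 7):
        exact = cycle_distribution_bruteforce(n)
        assert exact == bernoulli_sum_distribution(n)
        assert is_log_concave(exact)
    print("distributions agree and are log-concave for n = 1..6")
```

Under this representation the mean is $\sum_{i \le n} 1/i$ and the variance is $\sum_{i \le n} (1/i)(1 - 1/i)$, both of order $\log n$, which is what makes a central limit theorem for the single-permutation case accessible.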
Abstract: We describe a framework for Bayesian analysis of vector-valued time series of counts. The approach consists of a flexible level correlated model (LCM) framework for building hierarchical models that incorporate correlated latent level effects and temporal effects to model the multivariate data. This allows for faster computation than using the multivariate Poisson distribution, whose likelihood calculation can be slow as the vector dimension increases. The LCM framework is versatile and allows us to model many types of multivariate time series, such as counts and positive-valued observations. For count time series, this framework allows us to combine univariate distributions for counts (Poisson, negative binomial, ZIP, etc.) for each component series, while accounting for association among the components via an unobserved (latent) Gaussian random vector. We also allow for univariate autoregression (AR) or vector autoregression (VAR) evolution of the latent states. We employ the integrated nested Laplace approximation (INLA) setup for fast approximate Bayesian modeling via the R-INLA package, building custom functions to handle the VAR evolution. We illustrate our approach with an application to intra-day financial data streams. This flexible framework can be easily extended to other scenarios such as modeling multivariate positive-valued time series, with applications in several domains including ecology, marketing, and transportation safety.