Estimating the proportion of signals in high-dimensional data via integral equations

Primary author: Xiongzhi Chen

Primary college/unit: Arts and Sciences
Campus: Pullman


In scientific endeavors such as identifying genes that may be associated with a disease, a researcher often tests many null hypotheses simultaneously (e.g., as many null hypotheses as the number of genes under investigation) using high-dimensional data. This makes the proportion of signals, i.e., “the proportion of false null hypotheses” (e.g., the proportion of genes that are associated with a disease), a very important quantity. In particular, accurate information on the proportion increases the accuracy and power of the decisions to be made. However, the proportion is unknown in practice and needs to be estimated. Although several major methods exist for estimating the proportion, they are restrictive in that they are either statistically inconsistent or require stringent modeling assumptions. To overcome the shortcomings of these existing estimators, uniformly consistent estimators of the proportion are constructed as solutions to Lebesgue-Stieltjes integral equations. Their excellent performance is verified by simulation studies.