Mathematical Sciences Department PhD Dissertation Defense - Xiaohui Chen "Novel Statistical Methods for Aggregating Correlated and Missing Data with Applications to Chronic Disease Research" (UH 420)
12:00 p.m. to 2:00 p.m.

Mathematical Sciences Department
PhD Dissertation Defense
Xiaohui Chen, PhD Candidate
Friday, August 4, 2023
12:00 pm - 2:00 pm
Unity Hall 420
Title: Novel Statistical Methods for Aggregating Correlated and Missing
Data with Applications to Chronic Disease Research
Abstract:
Information-aggregation methods are crucial for identifying risk
factors for chronic diseases, analyzing treatment effects, and handling missing
data problems. However, significant gaps persist in the literature, including
the lack of signal-adaptive methods for summary statistics, inadequate study
of the correlation-robustness properties of hypothesis-testing methods, and
insufficient methods for large missing rates. In this dissertation, we aim to
address these gaps and advance relevant statistical methodology.
In the first part, we propose a new signal-adaptive analysis pipeline to
address unknown signal patterns using the omnibus thresholding Fisher’s
method (oTFisher). The oTFisher remains robustly powerful over various
patterns of genetic effects. Its adaptive thresholding can be applied to estimate
important single nucleotide polymorphisms (SNPs) contributing to the overall
significance of the given SNP set. Efficient calculation algorithms that
account for the linkage disequilibrium among SNPs are developed to control
the type I error rate. Extensive simulations show that the oTFisher
has robustly high power and provides higher balanced accuracy in screening
SNPs than the traditional Bonferroni and FDR procedures. We apply the
oTFisher to study the genetic association of genes and haplotype blocks with
bone density-related traits using the GWAS summary data of the Genetic
Factors for Osteoporosis Consortium. The oTFisher identifies more novel and
literature-reported genetic factors than existing p-value combination methods.
Next, we provide theoretical analyses examining the correlation-
robustness properties of hypothesis-testing methods in analyzing correlated
data. We focus specifically on two classical tests: the minimum P-value
(minP) test and the Simes test. Our investigation delves into the tail probabilities
of the minP and the Simes tests under the Gaussian mean model, considering
an arbitrary correlation matrix. Our study reveals that both tests demonstrate
asymptotic robustness to any non-perfect correlations. These findings hold
significant practical implications, particularly when calculating extreme tail
probabilities, as seen in scenarios requiring stringent type I error control in
large-scale data analysis. Approximating these tail probabilities by their
values under independence could significantly expedite computation for
analyzing large datasets.
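As a rough illustration of the independence calculation mentioned above, the sketch below computes the minP and Simes p-values from a list of individual p-values. It assumes independent, uniformly distributed null p-values; the function names are hypothetical and are not taken from the dissertation, and the code does not implement the dissertation's theoretical results for correlated data.

```python
import math


def minp_pvalue_independent(pvalues):
    """P-value of the minP test, ASSUMING independent uniform null p-values."""
    n = len(pvalues)
    p_min = min(pvalues)
    if p_min >= 1.0:
        return 1.0
    # P(min p <= p_min) = 1 - (1 - p_min)^n under independence.
    # expm1/log1p keep precision when p_min is extremely small.
    return -math.expm1(n * math.log1p(-p_min))


def simes_pvalue(pvalues):
    """Simes combined p-value: min over i of n * p_(i) / i (sorted ascending)."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))


example = [0.01, 0.04, 0.03, 0.5]  # made-up p-values for illustration only
minp_result = minp_pvalue_independent(example)
simes_result = simes_pvalue(example)
print(minp_result, simes_result)
```

Both functions run in at most O(n log n) time and need no correlation matrix, which is why an independence-based approximation is attractive when very small tail probabilities must be evaluated many times.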
In the third part of this research, we study the missing data problems with
high missing rates across different time points in pulmonary arterial
hypertension. The COVID-19 pandemic introduced new challenges, such as
high missing rates and unverifiable missing assumptions, that affect the
measurement of drug effects. To address the high missing rate, multiple
imputation methods based on remotely collected data (e.g., actigraphy data)
are systematically compared in a simulation study. Four missingness scenarios
are considered in the simulation: missing at random, missing due to adverse
events, missing due to lack of efficacy, and a mixture case. We demonstrate
that traditional parametric methods in the Bayesian framework have a high
relative bias at a 40% missing rate. However, adding remotely available data
related to the primary outcome and imputing the missing values according to
the best guess of the reasons for missingness can lead to smaller relative biases.
Dissertation Committee:
Dr. Zheyang Wu, WPI (Advisor)
Dr. Qingshuo Song, WPI
Dr. Fangfang Wang, WPI
Dr. Dali Zhou, U.S. Food and Drug Administration
Dr. Jian Zou, WPI