Mathematical Sciences - PhD Dissertation Proposal "Hierarchical Bayesian Models for Binary Data from Sub-Areas" by Lu Chen

Monday, April 23, 2018
1:30 pm to 3:30 pm

Lu Chen PhD Dissertation Proposal Presentation

Title: Hierarchical Bayesian Models for Binary Data from Sub-Areas

Abstract: Many population-based surveys collect binary responses from a large number of individuals, who are grouped into households within small areas. An example is the Nepal Living Standards Survey (NLSS II), in which binary health status data (good versus poor) are available for each individual in sampled households (sub-areas) within sampled wards (small areas). To make inference about the finite population proportion of individuals in each household, we develop two hierarchical Bayesian models. The first is the sub-area Beta-Binomial model without covariates. We use an approximation method with random sampling to fit the model efficiently. We apply the model to NLSS II data and show that the approximation method provides estimates about as good as those from the exact method. The second is the sub-area logistic regression model with reliable auxiliary information. The contribution of this model is twofold. First, we extend an area-level model to a sub-area-level model. Second, because there are numerous sub-areas, standard Markov chain Monte Carlo (MCMC) methods for fitting the joint posterior density are very time-consuming. We therefore provide a sampling-based method, the integrated nested normal approximation (INNA), which permits fast computation. Our main goal is to describe this twofold hierarchical Bayesian logistic regression model and to show that its computation is much faster than the exact MCMC method while remaining reasonably accurate. We study the performance of the method using NLSS II data. We further compare this model with the one-fold logistic regression model and show, again using NLSS II data, that the twofold model is preferred over the one-fold model, which ignores the sub-areas within areas. Our models borrow strength from both areas and sub-areas to obtain more efficient and precise estimates. The hierarchical structure of our models captures the variation in the binary data reasonably well.
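
To illustrate what "borrowing strength" means for household-level binary data, the following is a minimal sketch and not the model from the proposal. It assumes a single ward with a known ward-level mean `mu` and a known concentration `tau`, and a prior of Beta(mu*tau, (1-mu)*tau) on each household proportion. The full hierarchical model would place priors on these quantities instead of fixing them. The function names, the parameter values, and the simulated data are all hypothetical.

```python
import random


def simulate_ward(n_households, mu, tau, rng, max_size=8):
    """Simulate (y, n) pairs for one ward (illustrative data only).

    Each household draws its own proportion p ~ Beta(mu*tau, (1-mu)*tau),
    then y of its n members report "good" health.
    """
    data = []
    for _ in range(n_households):
        p = rng.betavariate(mu * tau, (1 - mu) * tau)
        n = rng.randint(1, max_size)  # household size (assumed range)
        y = sum(rng.random() < p for _ in range(n))
        data.append((y, n))
    return data


def posterior_mean(y, n, mu, tau):
    """Posterior mean of p when y ~ Binomial(n, p) and
    p ~ Beta(mu*tau, (1-mu)*tau), with mu and tau treated as known."""
    return (y + mu * tau) / (n + tau)


# Assumed hyperparameters; a full analysis would estimate these.
MU, TAU = 0.7, 4.0
rng = random.Random(0)
ward = simulate_ward(5, MU, TAU, rng)

for y, n in ward:
    direct = y / n
    shrunk = posterior_mean(y, n, MU, TAU)
    print(f"y={y} n={n} direct={direct:.2f} shrunk={shrunk:.2f}")
```

Under these assumptions, each household estimate is a weighted average of the direct proportion y/n and the ward-level mean, with weight n/(n + tau) on the direct proportion. Small households are therefore pulled more strongly toward the ward mean.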