Congratulations to Our Newest PhD Graduate, Dr. Jaeyoung Lee!

The Department of Statistics at Virginia Tech is proud to announce and celebrate the successful completion of Dr. Jaeyoung Lee's doctoral degree requirements.

Dr. Jaeyoung Lee successfully defended his dissertation titled "Supervised Variational Autoencoders for Structural Learning and Statistical Inference with Heterogeneous Data" on May 26, 2026, a significant milestone representing years of rigorous work, dedication, and impactful research.

Title: Supervised Variational Autoencoders for Structural Learning and Statistical Inference with Heterogeneous Data

Abstract: 

Large-scale datasets, such as images, often exhibit heterogeneous structures caused by diverse subpopulations or complex experimental designs. Extracting meaningful low-dimensional representations from such data, while accounting for heterogeneity and enabling statistical inference, remains a significant challenge. This dissertation addresses two related challenges within variational autoencoder (VAE) frameworks. The first project introduces the Generalized Variational Autoencoder (GVAE), a unified deep generative model that integrates dimension reduction, structural learning, and adaptive prediction into a single framework. GVAE constructs a composite latent space consisting of a Gaussian component for feature extraction and a stick-breaking process component for capturing latent subpopulation structure. A mixture-of-experts predictor linked to this composite latent space produces predictions that adapt to the subgroups. Simulation studies and an application to brain tumor MRI data demonstrate that GVAE achieves improved predictive accuracy over competing models while effectively recovering the underlying latent heterogeneous structure. The conditional generative component further reveals how images vary with the response, providing an additional tool for understanding complex data. The second project extends the VAE framework to association testing by proposing a two-step M-estimation approach in which VAE encoder parameters serve as first-step nuisance estimators for testing the association between high-dimensional inputs and a scalar response variable. We establish two Wilks-type asymptotic results, depending on how the first-step nuisance parameters are obtained. When the nuisance parameters are trained with an unsupervised VAE, the classical Wilks' theorem holds, and the two-step likelihood ratio statistic converges to a chi-square distribution under the null hypothesis. When the parameters are trained with a supervised VAE, the classical approximation may fail, and we conjecture that the scaled statistic follows an approximate chi-square distribution. Through simulation studies, we investigate the asymptotic behavior of the proposed test statistic and evaluate its type I error and power.

Dr. Jaeyoung Lee will be joining the Indiana University School of Medicine as a Postdoctoral Researcher.