Tensor Data Analysis
- Tensor SVD: Statistical and computational limits (with Dong Xia), IEEE Transactions on Information Theory, 64, 1-28, 2018. [An R implementation]
- Cross: Efficient low-rank tensor completion (single author), The Annals of Statistics, 47, 936-964, 2019. [R package]
- Optimal sparse singular value decomposition for high-dimensional high-order data (with Rungang Han), Journal of the American Statistical Association, 114, 1708-1725, 2019. [R package]
- An optimal statistical and computational framework for generalized tensor estimation (with Rungang Han and Rebecca Willett), The Annals of Statistics, to appear, 2021.
- Open problem: Average-case hardness of hypergraphic planted clique detection (with Yuetian Luo), Conference on Learning Theory (COLT), 125, 3852-3856, 2020. [talk and slides]
- A sharp blockwise tensor perturbation bound for orthogonal iteration (with Yuetian Luo, Garvesh Raskutti, and Ming Yuan), Journal of Machine Learning Research, 22, 1-48, 2021.
- ISLET: fast and optimal low-rank tensor regression via importance sketchings (with Yuetian Luo, Garvesh Raskutti, and Ming Yuan), SIAM Journal on Mathematics of Data Science, 2, 444-479, 2020. [R package]
- Sparse and low-rank tensor estimation via cubic sketchings (with Botao Hao and Guang Cheng), IEEE Transactions on Information Theory, 66, 9, 2020.
- Denoising Atomic Resolution 4D Scanning Transmission Electron Microscopy Data with Tensor Singular Value Decomposition (with Chenyu Zhang, Rungang Han, and Paul Voyles), Ultramicroscopy, 219, 113123, 2020.
- Learning good state and action representations for Markov decision process via tensor decomposition (with Chengzhuo Ni, Yaqi Duan, Munther Dahleh, and Mengdi Wang), Journal of Machine Learning Research, to appear.
- Inference for Low-rank Tensors -- No Need to Debias (with Dong Xia and Yuchen Zhou), The Annals of Statistics, to appear, 2021.
- Exact clustering in tensor block model: Statistical optimality and computational limit (with Rungang Han, Yuetian Luo, and Miaoyan Wang), Journal of the Royal Statistical Society, Series B, to appear.
(Rungang Han received the Student Paper Award from the Statistical Learning and Data Science Section of the American Statistical Association, 2021, for this paper.)
- Tensor clustering with planted structures: Statistical optimality and computational limits (with Yuetian Luo), The Annals of Statistics, to appear, 2021.
- Optimal high-order tensor SVD via tensor-train orthogonal iteration (with Yuchen Zhou, Lili Zheng, and Yazhen Wang), IEEE Transactions on Information Theory, to appear. [R Package]
- Low-rank tensor estimation via Riemannian Gauss-Newton: Statistical optimality and second-order convergence (with Yuetian Luo), Journal of Machine Learning Research, to appear.
- Guaranteed functional tensor singular value decomposition (with Rungang Han and Pixu Shi), Journal of the American Statistical Association, to appear.
- Statistical and computational limits for tensor-on-tensor association detection (with Ilias Diakonikolas, Daniel Kane, and Yuetian Luo), Proceedings of the Thirty Sixth Conference on Learning Theory (COLT), 195, 5260-5310, 2023.
- Core shrinkage covariance estimation for matrix-variate data (with Peter Hoff and Andrew McCormack), Journal of the Royal Statistical Society, Series B, to appear.
- Learning polynomial transformations (with Sitan Chen, Jerry Li, and Yuanzhi Li)
- One-dimensional tensor network recovery (with Ziang Chen and Jianfeng Lu)
- Estimating higher-order mixed memberships via the l_{2,\infty} tensor perturbation bound (with Joshua Agterberg)
- Tensor-on-tensor regression: Riemannian optimization, over-parameterization, statistical-computational gap, and their interplay (with Yuetian Luo). [R Package]
- Jing Lei, Anru R. Zhang, Zihan Zhu (2023+), Computational and statistical thresholds in multi-layer stochastic block models.
Electronic Health Records Data
- Shiyi Jiang, Xin Gai, Miriam Treggiari, William Stead, Yuankang Zhao, David Page, Anru R. Zhang (2023), Soft phenotyping for sepsis via EHR time-aware soft clustering, Journal of Biomedical Informatics, to appear.
- Muhang Tian, Bernie Chen, Allan Guo, Shiyi Jiang, Anru R. Zhang (2023+), Fast and reliable generation of EHR time series via diffusion models.
- Timeline registration for electronic health records (with Shiyi Jiang, Rungang Han, Krishnendu Chakrabarty, David Page, and William Stead), AMIA Summits on Translational Science Proceedings, 2023, 291-299.
(This paper won the Data Science Distinguished Paper Award from the 2023 AMIA Informatics Summit. Only one paper receives this award.)
(Deep) Generative Models
- Muhang Tian, Bernie Chen, Allan Guo, Shiyi Jiang, Anru R. Zhang (2023+), Fast and reliable generation of EHR time series via diffusion models.
- Learning polynomial transformations (with Sitan Chen, Jerry Li, and Yuanzhi Li), 2023 Annual ACM Symposium on Theory of Computing (STOC), to appear.
- Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions (with Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, and Adil Salim), 2023 International Conference on Learning Representations (ICLR), accepted (notable-top-5%).
Nonconvex/Manifold Optimization
- On geometric connections of embedded and quotient geometries in Riemannian fixed-rank matrix optimization (with Yuetian Luo and Xudong Li), Mathematics of Operations Research, to appear.
- Nonconvex Factorization and Manifold Formulations are Almost Equivalent in Low-rank Matrix Optimization (with Yuetian Luo and Xudong Li).
- Recursive importance sketching for rank constrained least squares (with Yuetian Luo, Xudong Li, and Wen Huang), Operations Research, to appear.
- Low-rank tensor estimation via Riemannian Gauss-Newton: Statistical optimality and second-order convergence (with Yuetian Luo), Journal of Machine Learning Research, to appear.
- Tensor-on-tensor regression: Riemannian optimization, over-parameterization, statistical-computational gap, and their interplay (with Yuetian Luo)
Markov Process State Aggregation, Information-based Reinforcement Learning
- Learning Markov models via low-rank optimization (with Ziwei Zhu, Xudong Li and Mengdi Wang), Operations Research, to appear.
- Spectral state compression of Markov processes (with Mengdi Wang), IEEE Transactions on Information Theory, 66, 3202-3231, 2020.
- Estimation of Markov chain via rank-constrained likelihood (with Mengdi Wang and Xudong Li), International Conference on Machine Learning (ICML), PMLR 80:3033-3042, 2018.
- Learning good state and action representations via tractable tensor decomposition (with Chengzhuo Ni, Yaqi Duan and Mengdi Wang), Journal of Machine Learning Research, tentatively accepted with minor revision.
Microbiome Data Analysis
- Guaranteed functional tensor singular value decomposition (with Rungang Han and Pixu Shi), Journal of the American Statistical Association, to appear.
- High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis (with Pixu Shi and Yuchen Zhou), Biometrika, to appear, 2021.
- Multi-sample estimation of bacterial composition matrix in metagenomics data (with Yuanpei Cao and Hongzhe Li), Biometrika, 107, 75-92, 2020.
(This paper received the Biometrics Early-Stage Investigator Award from the Biometrics Section of the American Statistical Association, 2019.)
- Regression Analysis for Microbiome Compositional Data (with Pixu Shi and Hongzhe Li), The Annals of Applied Statistics, 10, 1019-1040, 2016. [Matlab package]
Matrix Estimation, Matrix Completion, Phase Retrieval
- Core shrinkage covariance estimation for matrix-variate data (with Peter Hoff and Andrew McCormack)
- Phase transition for detecting a small community in a large network (with Jiashun Jin, Tracy Ke, and Paxton Turner), 2023 International Conference on Learning Representations (ICLR), accepted.
- Recursive Importance Sketching for Rank Constrained Least Squares: Algorithms and High-order Convergence (with Yuetian Luo, Xudong Li, and Wen Huang).
- Structured matrix completion with applications in genomic data integration (with Tianxi Cai and Tony Cai), Journal of the American Statistical Association, 111, 621-633, 2016. [R package]
- ROP: matrix recovery via rank-one projections (with Tony Cai), The Annals of Statistics, 43, 102-138, 2015.
Matrix PCA/SVD
- Nonparametric covariance estimation for mixed longitudinal studies, with applications in midlife women's health (with Kehui Chen), Statistica Sinica, 32, 345-365, 2022.
- A Schatten-q low-rank matrix perturbation analysis via perturbation projection error bound (with Yuetian Luo and Rungang Han), Linear Algebra and Its Applications, to appear, 2021.
- Heteroskedastic PCA: Algorithm, optimality, and applications (with Tony Cai and Yihong Wu), The Annals of Statistics, to appear.
- Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics (with Tony Cai), The Annals of Statistics, 46, 60-89, 2018.
Semisupervised Inference
Compressed Sensing and High-dimensional Regression
- Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference (with Tony Cai and Yuchen Zhou), IEEE Transactions on Information Theory, to appear.
- Sparse representation of a polytope and recovery of sparse signals and low-rank matrices (with Tony Cai), IEEE Transactions on Information Theory, 60, 122-132, 2014.
- Sharp RIP bound for sparse signal and low-rank matrix recovery (with Tony Cai), Applied and Computational Harmonic Analysis, 35, 74-93, 2013.
- Compressed sensing and affine rank minimization under restricted isometry (with Tony Cai), IEEE Transactions on Signal Processing, 61, 3279-3290, 2013.
High-dimensional Covariance Matrix Estimation
Applied Probability
Miscellaneous
- Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization (with Hyunseung Kang, Tony Cai and Dylan Small), Journal of the American Statistical Association, 111, 132-144, 2016. [R Package]
- Sequential rerandomization (with Quan Zhou, Philip Ernst, Kari Lock Morgan, and Donald Rubin), Biometrika, 105, 745-752, 2018.
Collaborative Research and Other Manuscripts
- Ventriculomegaly and postoperative intraventricular blood predict cerebrospinal fluid diversion following posterior fossa tumor resection (Park, C., Liu, B., Harward, S., Zhang, A. R. et al.), Journal of Neurosurgery: Pediatrics, to appear.
- Denoising Atomic Resolution 4D Scanning Transmission Electron Microscopy Data with Tensor Singular Value Decomposition (with Chenyu Zhang, Rungang Han, and Paul Voyles), Ultramicroscopy, 219, 113123, 2020.
- LTMG: a novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data (Wan et al.), Nucleic Acids Research, 2019.
- High-dimensional statistical inference: from vector to matrix, PhD Thesis, 2015.
- Methods to calculate the upper bound of Gini coefficient based on grouped data and the result for China (with Pixu Shi), preprint of Institute of Mathematics, Peking University, 2010-20.