... at the intersection of machine learning, computer vision, numerical optimization, healthcare, physical science, and engineering. Our current focus is to build the theoretical and computational foundations of machine/deep learning and to apply it to tackle challenging scientific, engineering, and medical problems.
Trustworthy, Safe, and Deployable AI
Modern machine/deep-learning (MDL) tools perform impressively on standard datasets, but they break down quickly when applied to real-world data, i.e., they lack robustness, and they fall short on explainability (why they work), interpretability (whether they work in an understandable and plausible way), fairness (whether sub-populations are treated fairly), and many other societal metrics. Our current research focuses on robustness, fairness, and safety (so that AI does not cause liabilities), which are major roadblocks to deploying deep-learning tools in high-stakes environments such as autonomous vehicles and healthcare. A few highlights of our work:
- Robustness evaluation: We developed the first reliable algorithmic framework for practical robustness evaluation of MDL models, applicable to any differentiable attack model.
- Selective prediction: We derived the first selective classification method that can work under a broad family of distribution shifts.
- AI safety via watermarking:
- We proposed a simple evasion method for image watermarks that operates on a principle very different from that of state-of-the-art watermark evasion methods, and hence essentially complements them.
- We also cleared up the confusion around training-based watermarking methods, which are largely built on steganography, and proposed the first practical and rigorous training formulation and evaluation protocol for watermark detection and identification.
- Learning with imbalanced data: Funded by the National Cancer Institute of NIH, we are developing a comprehensive framework for learning with imbalanced data, which, if ignored, leads to highly biased MDL models:
- We showed that classical methods for handling data imbalance, such as reweighting and resampling, can be very suboptimal;
- We have derived the first principled and exact methods for learning problems such as optimizing recall with fixed precision, optimizing precision with fixed recall, and optimizing the F1 score and average precision; one of these formulations is sketched below.
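To make this problem class concrete, optimizing recall at a fixed precision level can be written generically as (the notation here is only illustrative; $\boldsymbol\theta$ denotes the model parameters and $\alpha$ the required precision level)

$$\max_{\boldsymbol\theta}\ \operatorname{recall}(\boldsymbol\theta) \quad \text{s.t.} \quad \operatorname{precision}(\boldsymbol\theta) \ge \alpha,$$

a constrained problem whose objective and constraint are count-based and non-differentiable, which is why principled and exact treatments are needed rather than loose surrogates.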
Computation for AI
When machine/deep-learning (MDL) tools are applied to science, engineering, and medical problems, domain knowledge such as physical laws, design targets, and medical conditions often maps to constraints in the resulting problem formulations. Standard practice to date turns these constraints into "soft" regularization terms, which can easily lead to infeasible solutions. We aim to develop principled computing methods and software frameworks for solving deep-learning problems with explicit constraints. A few highlights of our work:
- Machine/deep learning problems with nontrivial hard constraints:
- We developed NCVX, the first principled and user-friendly computing framework built on PyTorch that can handle general MDL problems with nontrivial constraints (a minimal illustration of this problem class follows this list);
- We wrote the first review paper to bridge the MDL and optimization communities, covering the state of the art and, more importantly, open research directions in this field.
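As a minimal sketch of what "nontrivial hard constraints" means here, the toy example below solves a constrained least-squares problem, $\min_{\boldsymbol\theta} L(\boldsymbol\theta)$ subject to $c(\boldsymbol\theta) \le 0$, in plain PyTorch with a crude quadratic-penalty loop. It is purely illustrative: the data, constraint, and penalty schedule are made up, and this is not the NCVX interface.

```python
# Illustrative only: a toy constrained learning problem
#   min_theta  L(theta)   subject to  c(theta) <= 0,
# handled with a simple quadratic-penalty loop in plain PyTorch.
# A fixed "soft" penalty weight, by contrast, gives no guarantee of feasibility.
import torch

torch.manual_seed(0)
X = torch.randn(200, 5)                       # made-up data
y = X @ torch.randn(5, 1) + 0.1 * torch.randn(200, 1)

theta = torch.zeros(5, 1, requires_grad=True)

def data_loss(theta):
    # Objective: ordinary least-squares fit.
    return ((X @ theta - y) ** 2).mean()

def constraint(theta):
    # Hard constraint c(theta) <= 0, here ||theta||_2 <= 1 (made up for illustration).
    return theta.norm() - 1.0

rho = 1.0                                     # penalty weight, increased each outer round
for outer in range(6):
    opt = torch.optim.Adam([theta], lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        violation = torch.clamp(constraint(theta), min=0.0)
        (data_loss(theta) + rho * violation ** 2).backward()
        opt.step()
    rho *= 10.0                               # push the constraint violation toward zero

print(f"loss={data_loss(theta).item():.4f}  constraint={constraint(theta).item():+.4f}")
```

The point of the sketch is only to show the problem class: an objective plus explicit constraints that must actually hold at the solution, which naive penalty weights do not guarantee.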
AI for Science and Engineering
People have mostly used machine/deep learning (MDL) to incrementally improve solutions to existing problems, but we firmly believe the true power of MDL lies in solving grand open problems in the hardest regimes. Our current focus is on tackling difficult scientific inverse and design problems, in collaboration with researchers in materials science, civil engineering, and beyond. A few highlights of our work:
- Foundational methods for inverse problems: Many scientific (e.g., microscopy) and engineering (e.g., 3D vision) problems take the form of estimating an object of interest $\mathbf x$ from measurements $\mathbf y \approx f(\mathbf x)$, where $f$ models the measurement process; these are inverse problems (a generic formulation is sketched after this list).
- For inverse problems with symmetries in $f$, we have shown how careless deployment of supervised MDL can be very suboptimal.
- We have developed some of the simplest and most effective methods for solving difficult (nonlinear) inverse problems using pretrained diffusion models.
- For untrained deep generative priors (e.g., deep image prior, implicit neural representation), we have developed the first general-purpose early-stopping strategy to prevent overfitting to noisy measurements, removing a substantial practical barrier facing these methods.
- By carefully customizing and optimizing the deep image prior, we have made major breakthroughs on decades-old nonlinear inverse problems, including blind image deblurring and Fourier phase retrieval, which impact numerous scientific and engineering fields.
- Materials discovery with physics-informed constraint modeling:
- MDL-enabled engineering design:
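For concreteness, the inverse problems above can be sketched in two generic forms (the specific losses, regularizers, and networks in our work differ; $R$, $\lambda$, $G_{\boldsymbol\theta}$, and $\mathbf z$ are illustrative symbols):

$$\min_{\mathbf x}\ \|\mathbf y - f(\mathbf x)\|_2^2 + \lambda R(\mathbf x) \qquad \text{vs.} \qquad \min_{\boldsymbol\theta}\ \|\mathbf y - f(G_{\boldsymbol\theta}(\mathbf z))\|_2^2,$$

where the left form uses a handcrafted regularizer $R$ with weight $\lambda$, and the right form replaces it with an untrained network $G_{\boldsymbol\theta}$ (with fixed input $\mathbf z$) whose architecture acts as an implicit prior, as in the deep image prior. Run to convergence, the right form eventually fits the noise in $\mathbf y$, which is exactly the overfitting that the early-stopping strategy mentioned above is designed to prevent.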
AI for Healthcare
General, trustworthy AI is a grand and remote goal. Healthcare is a field that is relatively narrow and well controlled (e.g., chest X-rays and CTs are taken in lab environments that are far less complicated than those encountered by autonomous vehicles), and hence we anticipate that modern AI is likely to produce concrete impacts on healthcare in the reasonably near term. We are actively collaborating with multiple research groups at the medical school and M Health Fairview to modernize healthcare using modern AI. A few highlights of our work:
- Quantification of tics in Tourette syndrome: Diagnosis and treatment of Tourette syndrome is currently a painstaking process for doctors, as they need to spend hours watching recordings of patients' behaviors to make decisions. Funded by the National Institute of Neurological Disorders and Stroke of NIH, we are developing novel computer vision and MDL tools to substantially expedite the process.