Research
My current research focuses on data-centric, resource-efficient, and scalable AI, which I believe are essential to advancing the adoption of AI in real-world applications.
My earlier work focused primarily on data management and AI techniques for time-series (TS) and spatiotemporal (ST) data. Since then, my interests have broadened to include relational, unstructured, and multimodal data.
I'm lucky to work and grow with an amazing group of passionate and talented young people at Zhejiang University.
Our research group is called Sustainable Data Intelligence and Data Systems (SuDIS).
Lately, we have been pushing the boundaries in two key areas:
- Data + AI:
  - data storage [HyperMR, SIGMOD'25], knowledge amalgamation [BoKA, KDD'24], data protection [PIECK, ICDE'24], data selection [CHASe, TKDE'25], and data augmentation and synthesis [CogSQL, AAAI'24] for AI model training and serving;
  - AI-empowered data cleansing [MPIN, PVLDB'24; BiSIM, ICDE'23], data discovery [nlcTables, SIGIR'25], and data ingestion [Hippo, SIGMOD'25].
- Efficient AI:
  - lightweight AI models in edge computing [LightCTS, SIGMOD'23; E2USD, WWW'24; ReCTSi, KDD'24; LightCTS*, TKDE'24] and federated learning [LightTR, ICDE'24; FedBFPT, IJCAI'23];
  - efficient LLM inference [HMI, VLDBJ'25; Draft&Verify, ACL'24] and fine-tuning [LoRAM, ICLR'25].
Quick Access
- Codebase "ReCTSi: Resource-efficient Correlated Time Series Imputation via Decoupled Pattern Learning and Completeness-aware Attentions" (KDD 2024).
- Codebase "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding" (ACL 2024).
- A systematic review on Tabular Data Augmentation for Machine Learning.
- Check out the champion solution to the SIGMOD 2024 Programming Contest on hybrid vector search, developed jointly with the SUSTech DB Group.
- Codebase "Missing value imputation for multi-attribute sensor data streams via message propagation" (VLDB 2024).
- Codebase "LightCTS: A lightweight framework for correlated time series forecasting" (SIGMOD 2023).
- SIGMOD 2022 tutorial "Spatial data quality in the IoT era: Management and exploitation" (with B. Tang, H. Lu, M.A. Cheema, and C.S. Jensen).
- Author version of "Spatial data quality in the Internet of Things: Management, exploitation, and prospects" (CSUR 2022).
- Benchmark "Indoor spatial queries: Modeling, indexing, and processing" (EDBT 2020).
- EU Marie Skłodowska-Curie Individual Fellowship "MALOT: Managing Mobility Data Quality for Location of Things".