About me
I am a third-year CS Ph.D. student at Stanford University. My advisor is Kunle Olukotun.
My research interest is broadly in computer systems and machine learning. In particular, I focus on addressing system challenges (e.g. performance and resource usage, security and reliability) in scaling AI models (e.g. large language models, video analytics models). Three selected projects:
- Caravan (OSDI 2024, leading): Keeping specialized in-network ML models up to date as network traffic dynamics change, using an LLM-based data labeling agent.
- CacheGen (SIGCOMM 2024): Compressing LLM KV cache into compact bitstreams for streaming and fast model inference serving.
- DDS (SIGCOMM 2020): Scaling video analytics models to cheap and compute-constrained edges with server-driven streaming of salient video regions.
Previously, I obtained my bachelor’s degree from the University of Chicago, with three majors in mathematics, computer science, and statistics. During my undergraduate years, I was fortunate to work with Junchen Jiang and Ravi Netravali on computer networking, with a focus on designing networked systems for video streaming and analytics.
The pronunciation of my first name (Qizheng) is very close to that of “keygen” in public key encryption. I also go by Alex.
Last updated: November 2024
You might be looking for…
Personal SF Bay Area boba/dining map: See here.
For Autumn 2024, the Stanford systems reading group meets every Tuesday, 2-3 pm. We read and discuss research papers in the general domain of systems. The webpage is here. Sign up for the mailing list here. We have free and high-quality boba for all participants, so please consider joining!
Publications
* indicates equal contribution
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
ACM European Conference on Computer Systems (EuroSys), 2025 [paper] [code]
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
ACM Special Interest Group on Data Communication (SIGCOMM), 2024 [paper] [code]
Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024 [paper] [code] [slides] [talk]
A shorter version of this work was presented at the Compound AI Systems Workshop and the PACMI’24 Workshop.
SRC JUMP 2.0 Best Paper Award
The Dataflow Abstract Machine Simulator Framework
Nathan Zhang, Rubens Lacouture, Gina Sohn, Paul Mure, Qizheng Zhang, Fredrik Kjolstad, Kunle Olukotun
ACM/IEEE International Symposium on Computer Architecture (ISCA), 2024 [paper] [code]
ISCA Distinguished Artifact Award
GRACE: Loss-Resilient Real-Time Video through Neural Codecs
Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, Yuhan Liu, Kuntai Du, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang
USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024 [website] [paper] [code]
OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation
Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang
ACM Symposium on Cloud Computing (SoCC), 2023 [paper] [code]
Optimizing Real-Time Video Experience with Data Scalable Codec
Hanchen Li*, Yihua Cheng*, Ziyi Zhang, Qizheng Zhang, Anton Arapin, Nick Feamster, Amrita Mazumdar
ACM SIGCOMM Workshop on Emerging Multimedia Systems (EMS), 2023 [paper]
AccMPEG: Optimizing Video Encoding for Video Analytics
Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang
Conference on Machine Learning and Systems (MLSys), 2022 [paper] [code]
Understanding the Potential of Server-Driven Edge Video Analytics
Qizheng Zhang, Kuntai Du, Neil Agarwal, Ravi Netravali, Junchen Jiang
ACM International Workshop on Mobile Computing Systems and Applications (HotMobile), 2022 [paper] [code] [slides] [talk]
Server-Driven Video Streaming for Deep Learning Inference
Kuntai Du*, Ahsan Pervaiz*, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, Junchen Jiang
ACM Special Interest Group on Data Communication (SIGCOMM), 2020 [paper] [code]
Teaching
Stanford CS 244: Advanced Topics in Networking
Course Assistant, Spring 2024
UChicago CMSC 23000: Operating Systems
Course Assistant, Autumn 2021
Service
- Program Committee: EuroSys 2025 (Shadow PC)
- Artifact Evaluation Committee: MLSys 2023, OSDI 2023, ATC 2023
- Conference Reviewer: NeurIPS 2024, ICLR 2025, AISTATS 2025
- Workshop Reviewer: Workshop on Machine Learning and Compression (NeurIPS 2024)
- Stanford Computer Science Undergraduate Mentoring Program: 2024
- Stanford Computer Science Student Applicant Support Program: 2022, 2023, 2024