About me

I am a third-year CS Ph.D. student at Stanford University. My advisor is Kunle Olukotun.

My research interests lie broadly in computer systems and machine learning. In particular, I focus on addressing systems challenges (e.g., performance and resource usage, security and reliability) in scaling AI models (e.g., large language models, video analytics models). Three selected projects:

  • Caravan (OSDI 2024, leading): Keeping specialized in-network ML models up to date with changing network traffic dynamics, using an LLM-based data labeling agent.
  • CacheGen (SIGCOMM 2024): Compressing LLM KV cache into compact bitstreams for streaming and fast model inference serving.
  • DDS (SIGCOMM 2020): Scaling video analytics models to cheap, compute-constrained edges with server-driven streaming of salient video regions.

Previously, I obtained my bachelor’s degree from the University of Chicago, with three majors in mathematics, computer science, and statistics. During my undergraduate years, I was fortunate to work with Junchen Jiang and Ravi Netravali on computer networking, with a focus on designing networked systems for video streaming and analytics.

The pronunciation of my first name (Qizheng) is very close to that of “keygen” in public key encryption. I also go by Alex.

Last updated: November 2024

You might be looking for…

Personal SF Bay Area boba/dining map: See here.

For Autumn 2024, the Stanford systems reading group meets every Tuesday from 2 to 3 pm. We read and discuss research papers in the general domain of systems. The webpage is here. Sign up for the mailing list here. We have free, high-quality boba for all participants, so please consider joining!

Publications

* indicates equal contribution

  • CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
    Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
    ACM European Conference on Computer Systems (EuroSys), 2025 [paper] [code]

  • CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
    Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
    ACM Special Interest Group on Data Communication (SIGCOMM), 2024 [paper] [code]

  • Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
    Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
    USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024 [paper] [code] [slides] [talk]
    A shorter version of this work was presented at the Compound AI Systems Workshop and the PACMI’24 Workshop.
    SRC JUMP 2.0 Best Paper Award

  • The Dataflow Abstract Machine Simulator Framework
    Nathan Zhang, Rubens Lacouture, Gina Sohn, Paul Mure, Qizheng Zhang, Fredrik Kjolstad, Kunle Olukotun
    ACM/IEEE International Symposium on Computer Architecture (ISCA), 2024 [paper] [code]
    ISCA Distinguished Artifact Award

  • GRACE: Loss-Resilient Real-Time Video through Neural Codecs
    Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, Yuhan Liu, Kuntai Du, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang
    USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024 [website] [paper] [code]

  • OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation
    Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang
    ACM Symposium on Cloud Computing (SoCC), 2023 [paper] [code]

  • Optimizing Real-Time Video Experience with Data Scalable Codec
    Hanchen Li*, Yihua Cheng*, Ziyi Zhang, Qizheng Zhang, Anton Arapin, Nick Feamster, Amrita Mazumdar
    ACM SIGCOMM Workshop on Emerging Multimedia Systems (EMS), 2023 [paper]

  • AccMPEG: Optimizing Video Encoding for Video Analytics
    Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang
    Conference on Machine Learning and Systems (MLSys), 2022 [paper] [code]

  • Understanding the Potential of Server-Driven Edge Video Analytics
    Qizheng Zhang, Kuntai Du, Neil Agarwal, Ravi Netravali, Junchen Jiang
    ACM International Workshop on Mobile Computing Systems and Applications (HotMobile), 2022 [paper] [code] [slides] [talk]

  • Server-Driven Video Streaming for Deep Learning Inference
    Kuntai Du*, Ahsan Pervaiz*, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, Junchen Jiang
    ACM Special Interest Group on Data Communication (SIGCOMM), 2020 [paper] [code]

Teaching

  • Stanford CS 244: Advanced Topics in Networking
    Course Assistant, Spring 2024

  • UChicago CMSC 23000: Operating Systems
    Course Assistant, Autumn 2021

Service

  • Program Committee: EuroSys 2025 (Shadow PC)
  • Artifact Evaluation Committee: MLSys 2023, OSDI 2023, ATC 2023
  • Conference Reviewer: NeurIPS 2024, ICLR 2025, AISTATS 2025
  • Workshop Reviewer: Workshop on Machine Learning and Compression (NeurIPS 2024)
  • Stanford Computer Science Undergraduate Mentoring Program: 2024
  • Stanford Computer Science Student Applicant Support Program: 2022, 2023, 2024