About me

I am a fourth-year CS Ph.D. student at Stanford University. I am a member of the Stanford Pervasive Parallelism Lab and LMCache. My advisor is Kunle Olukotun. I have also been working with James Zou, Junchen Jiang, and Muhammad Shahbaz.

I am interested in building AI systems that continuously learn from experience and improve over time, such as Caravan and ACE.

The pronunciation of my first name (Qizheng) is very close to that of “keygen” in public-key encryption. I also go by Alex.

Last updated: January 2026

You might be looking for…

Personal SF Bay Area boba/dining map: See here.

Publications

* indicates equal contribution

  • EvicPress: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
    Shaoting Feng, Yuhan Liu, Xiaokun Chen, Hanchen Li, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang
    arXiv preprint, 2025 [paper]

  • Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live
    Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Alvin Cheung, Joseph Gonzalez, Ion Stoica
    arXiv preprint, 2025 [paper]

  • Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
    Qizheng Zhang*, Changran Hu*, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun
    International Conference on Learning Representations (ICLR), 2026 [paper] [code]
    Media: VentureBeat, InfoQ, Discover AI, SambaNova AI, 机器之心, 量子位

  • FlowRL: Matching Reward Distributions for LLM Reasoning
    Xuekai Zhu, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin
    International Conference on Learning Representations (ICLR), 2026 [paper] [code]

  • Agentic Bridge Framework: Closing the Gap Between Agentic Capability and Performance Benchmarks
    Yun Du, Rubens Lacouture, Qizheng Zhang, Genghan Zhang, Tian Zhao, Kunle Olukotun
    NeurIPS Workshop on Machine Learning for Systems, 2025 [paper]

  • Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
    Qizheng Zhang, Michael Wornow, Kunle Olukotun
    Conference on Neural Information Processing Systems (NeurIPS), 2025 [paper]
    Short version: ICML 2025 ES-FoMo Workshop.
    Media: Discover AI

  • LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
    Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun
    International Conference on Machine Learning (ICML), 2025 [paper] [code]

  • CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
    Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
    ACM European Conference on Computer Systems (EuroSys), 2025 [paper] [code]
    EuroSys Best Paper Award

  • CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
    Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
    ACM Special Interest Group on Data Communication (SIGCOMM), 2024 [paper] [code]

  • Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
    Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
    USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024 [paper] [code] [slides] [talk]
    Short version: the Compound AI Systems Workshop, SOSP PACMI’24 Workshop.
    SRC JUMP 2.0 Best Paper Award

  • The Dataflow Abstract Machine Simulator Framework
    Nathan Zhang, Rubens Lacouture, Gina Sohn, Paul Mure, Qizheng Zhang, Fredrik Kjolstad, Kunle Olukotun
    ACM/IEEE International Symposium on Computer Architecture (ISCA), 2024 [paper] [code]
    ISCA Distinguished Artifact Award

  • GRACE: Loss-Resilient Real-Time Video through Neural Codecs
    Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, Yuhan Liu, Kuntai Du, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang
    USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024 [website] [paper] [code]

  • OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation
    Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang
    ACM Symposium on Cloud Computing (SoCC), 2023 [paper] [code]

  • Optimizing Real-Time Video Experience with Data Scalable Codec
    Hanchen Li*, Yihua Cheng*, Ziyi Zhang, Qizheng Zhang, Anton Arapin, Nick Feamster, Amrita Mazumdar
    ACM SIGCOMM Workshop on Emerging Multimedia Systems (EMS), 2023 [paper]

  • AccMPEG: Optimizing Video Encoding for Video Analytics
    Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang
    Conference on Machine Learning and Systems (MLSys), 2022 [paper] [code]

  • Understanding the Potential of Server-Driven Edge Video Analytics
    Qizheng Zhang, Kuntai Du, Neil Agarwal, Ravi Netravali, Junchen Jiang
    ACM International Workshop on Mobile Computing Systems and Applications (HotMobile), 2022 [paper] [code] [slides] [talk]

  • Server-Driven Video Streaming for Deep Learning Inference
    Kuntai Du*, Ahsan Pervaiz*, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, Junchen Jiang
    ACM Special Interest Group on Data Communication (SIGCOMM), 2020 [paper] [code]

Service

Reviewer for major AI and systems venues (NeurIPS, ICML, ICLR, AAAI, AISTATS, EuroSys).