About me

I am a fourth-year CS Ph.D. student at Stanford University. I am a member of the Stanford Pervasive Parallelism Lab and LMCache. My advisor is Kunle Olukotun. I have also been working with Junchen Jiang and Muhammad Shahbaz.

My research interests are broadly in computer systems (networking, operating/distributed systems, architecture) and machine learning. Some projects that I have been working on are:

Previously, I obtained my bachelor’s degree from the University of Chicago, with three majors in mathematics, computer science, and statistics. During my undergraduate years, I was fortunate to work with Junchen Jiang and Ravi Netravali on computer networking, with a focus on designing networked systems for video streaming and analytics. I also interned at the Mathematics and Computer Science Division (MCS) at Argonne National Laboratory. Past projects during college:

The pronunciation of my first name (Qizheng) is very close to that of “keygen” in public key encryption. I also go by Alex.

Last updated: October 2025

You might be looking for…

Personal SF Bay Area boba/dining map: See here.

Publications

* indicates equal contribution

  • Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
    Qizheng Zhang*, Changran Hu*, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun
    arXiv preprint, 2025 [paper]

  • Agentic Bridge Framework: Closing the Gap Between Agentic Capability and Performance Benchmarks
    Yun Du, Rubens Lacouture, Qizheng Zhang, Genghan Zhang, Tian Zhao, Kunle Olukotun
    NeurIPS Workshop on Machine Learning for Systems, 2025

  • FlowRL: Matching Reward Distributions for LLM Reasoning
    Xuekai Zhu, Daixuan Cheng, Dinghuai Zhang, Hengli Li, Kaiyan Zhang, Che Jiang, Youbang Sun, Ermo Hua, Yuxin Zuo, Xingtai Lv, Qizheng Zhang, Lin Chen, Fanghao Shao, Bo Xue, Yunchong Song, Zhenjie Yang, Ganqu Cui, Ning Ding, Jianfeng Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin
    arXiv preprint, 2025 [paper] [code]

  • Agentic Plan Caching: Test-Time Memory for Cost-Efficient LLM Agents
    Qizheng Zhang, Michael Wornow, Kunle Olukotun
    Conference on Neural Information Processing Systems (NeurIPS), 2025 [paper]
    A short version of this work was published at the ICML 2025 Workshop on Efficient Systems for Foundation Models (ES-FoMo).
    Thanks to the Discover AI YouTube channel for making a nice video about our work!

  • LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
    Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun
    International Conference on Machine Learning (ICML), 2025 [paper] [code]

  • CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
    Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
    ACM European Conference on Computer Systems (EuroSys), 2025 [paper] [code]
    EuroSys Best Paper Award

  • CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
    Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
    ACM Special Interest Group on Data Communication (SIGCOMM), 2024 [paper] [code]

  • Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
    Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
    USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2024 [paper] [code] [slides] [talk]
    A short version of this work was published at the Compound AI Systems Workshop and the SOSP PACMI’24 Workshop.
    SRC JUMP 2.0 Best Paper Award

  • The Dataflow Abstract Machine Simulator Framework
    Nathan Zhang, Rubens Lacouture, Gina Sohn, Paul Mure, Qizheng Zhang, Fredrik Kjolstad, Kunle Olukotun
    ACM/IEEE International Symposium on Computer Architecture (ISCA), 2024 [paper] [code]
    ISCA Distinguished Artifact Award

  • GRACE: Loss-Resilient Real-Time Video through Neural Codecs
    Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, Yuhan Liu, Kuntai Du, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang
    USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2024 [website] [paper] [code]

  • OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation
    Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang
    ACM Symposium on Cloud Computing (SoCC), 2023 [paper] [code]

  • Optimizing Real-Time Video Experience with Data Scalable Codec
    Hanchen Li*, Yihua Cheng*, Ziyi Zhang, Qizheng Zhang, Anton Arapin, Nick Feamster, Amrita Mazumdar
    ACM SIGCOMM Workshop on Emerging Multimedia Systems (EMS), 2023 [paper]

  • AccMPEG: Optimizing Video Encoding for Video Analytics
    Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang
    Conference on Machine Learning and Systems (MLSys), 2022 [paper] [code]

  • Understanding the Potential of Server-Driven Edge Video Analytics
    Qizheng Zhang, Kuntai Du, Neil Agarwal, Ravi Netravali, Junchen Jiang
    ACM International Workshop on Mobile Computing Systems and Applications (HotMobile), 2022 [paper] [code] [slides] [talk]

  • Server-Driven Video Streaming for Deep Learning Inference
    Kuntai Du*, Ahsan Pervaiz*, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, Junchen Jiang
    ACM Special Interest Group on Data Communication (SIGCOMM), 2020 [paper] [code]

Teaching

  • Stanford CS 244: Advanced Topics in Networking
    Course Assistant, Spring 2024

  • UChicago CMSC 23000: Operating Systems
    Course Assistant, Autumn 2021

Service

  • Program Committee: EuroSys 2025 (Shadow PC)
  • Area Chair: NeurIPS 2025 Workshop on Multi-Turn Interactions in Large Language Models (MTI-LLM)
  • Conference Reviewer: NeurIPS 2024, ICLR 2025, ICML 2025, NeurIPS 2025 Datasets and Benchmarks Track, AAAI 2026, ICLR 2026, AISTATS 2026
  • Workshop Reviewer: NeurIPS 2024 Workshop on Machine Learning and Compression, ICML 2025 Workshop on Efficient Systems for Foundation Models (ES-FoMo), NeurIPS 2025 Workshop on Continual and Compatible Foundation Model Updates (CCFM)
  • Artifact Evaluation Committee: MLSys 2023, OSDI 2023, ATC 2023
  • Stanford Computer Science Undergraduate Mentoring Program: 2024
  • Stanford Computer Science Student Applicant Support Program: 2022, 2023, 2024