Publications
Peer-reviewed Publications
- SchedFlow: Transparent and Flexible Intra-Device Parallelism via Programmable Operator Scheduling
Yi Pan, Yile Gu, Jinbin Luo, Yibo Wu, Ziren Wang, Hongtao Zhang, Ziyi Xu, Shengkai Lin, Baris Kasikci, Stephanie Wang.
(To Appear) MLSys 2026, Bellevue, WA, USA, May 2026.
- RagInfer: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Chien-Yu Lin*, Keisuke Kamahori*, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci.
(To Appear) MLSys 2026, Bellevue, WA, USA, May 2026. https://arxiv.org/abs/2502.20969
- Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Kan Zhu*, Tian Tang*, Qinyu Xu*, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci.
(To Appear) ICLR 2026, Rio de Janeiro, Brazil, April 2026. https://arxiv.org/abs/2502.12216
- Mitigating Application Resource Overload with Targeted Task Cancellation
Yigong Hu, Zeyin Zhang, Yicheng Liu, Yile Gu, Shuangyu Lei, Baris Kasikci, Peng Huang.
SOSP 2025, Seoul, Republic of Korea, November 2025. https://dl.acm.org/doi/10.1145/3731569.3764835
- Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing
Yile Gu*, Ian Neal*, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci.
OOPSLA 2025, Singapore, October 2025. https://arxiv.org/abs/2503.01390
- NanoFlow: Towards Optimal Large Language Model Serving Throughput
Kan Zhu, Yufei Gao, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Ziren Wang, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci.
OSDI 2025, Boston, MA, USA, July 2025. https://arxiv.org/abs/2408.12757
- Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, Baris Kasikci.
ICLR 2025, Singapore, May 2025. https://arxiv.org/abs/2402.07033
- Perseus: Removing Energy Bloat from Large Model Training
Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury.
SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902
Preprints
- ConsumerBench: Benchmarking Generative AI Applications on End-User Devices
Yile Gu*, Rohan Kadekodi*, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci.
https://arxiv.org/abs/2506.17538
- Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models
Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng.
https://arxiv.org/abs/2501.14170
- The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution
Frank Sifei Luan*, Ron Yifeng Wang*, Yile Gu, Ziming Mao, Charlotte Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, Stephanie Wang.
https://arxiv.org/abs/2501.12407
- Semantic Scheduling for LLM Inference
Wenyue Hua*, Dujian Ding*, Yile Gu, Yujie Ren, Kai Mei, Minghua Ma, William Yang Wang.
https://arxiv.org/abs/2506.12204
- AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems
Rohan Kadekodi*, Zhan Jin*, Keisuke Kamahori, Yile Gu, Sean Khatiri, Noah H Bayindirli, Sergey Gorbunov, Baris Kasikci.
https://arxiv.org/abs/2510.00229