Publications

Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing
Yile Gu^*, Ian Neal^*, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci
(To Appear) OOPSLA 2025, Singapore, October 2025. https://arxiv.org/abs/2503.01390
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori^*, Tian Tang^*, Yile Gu, Kan Zhu, Baris Kasikci
(To Appear) ICLR 2025, Singapore, May 2025. https://arxiv.org/abs/2402.07033
Perseus: Removing Energy Bloat from Large Model Training
Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury
SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902
Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models
Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng
https://arxiv.org/abs/2501.14170
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Kan Zhu^*, Tian Tang^*, Qinyu Xu^*, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci
https://arxiv.org/abs/2502.12216
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Chien-Yu Lin^*, Keisuke Kamahori^*, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci
https://arxiv.org/abs/2502.20969
NanoFlow: Towards Optimal Large Language Model Serving Throughput
Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci
https://arxiv.org/abs/2408.12757

Yile (Michael) Gu
