Publications
- Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing
Yile Gu*, Ian Neal*, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci
(To Appear) OOPSLA 2025, Singapore, October 2025. https://arxiv.org/abs/2503.01390
- Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, Baris Kasikci
(To Appear) ICLR 2025, Singapore, May 2025. https://arxiv.org/abs/2402.07033
- Perseus: Removing Energy Bloat from Large Model Training
Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury
SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902
- Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models
Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng.
https://arxiv.org/abs/2501.14170
- Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Kan Zhu*, Tian Tang*, Qinyu Xu*, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci.
https://arxiv.org/abs/2502.12216
- TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Chien-Yu Lin*, Keisuke Kamahori*, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci.
https://arxiv.org/abs/2502.20969
- NanoFlow: Towards Optimal Large Language Model Serving Throughput
Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci.
https://arxiv.org/abs/2408.12757