Publications
- Squint: Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing
Yile Gu*, Ian Neal*, Musa Haydar, Hossein Golestani, Ayman Said, Andrew Quinn, Baris Kasikci
Under submission.
- Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci
Practical ML for Low Resource Settings Workshop @ ICLR 2024, Vienna, Austria, May 2024. https://arxiv.org/abs/2402.07033
- Perseus: Removing Energy Bloat from Large Model Training
Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury
SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902
- NanoFlow: Towards Optimal Large Language Model Serving Throughput
Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci
Under submission. https://arxiv.org/abs/2408.12757