Oct 14, 2024 - MLSys @ WukLab - Can Scheduling Overhead Dominate LLM Inference Performance? A Study of CPU Scheduling Overhead on Two Popular LLM Inference Systems
May 14, 2024 - MLSys @ WukLab - Preble: Efficient Prompt Scheduling for Augmented Large Language Models