“Looking beyond GPUs for DNN Scheduling on Multi-Tenant Clusters” paper summary


The paper “Looking beyond GPUs for DNN Scheduling on Multi-Tenant Clusters” proposes a resource-sensitive scheduler for shared GPU clusters. Instead of handing out CPU and memory in fixed proportion to the GPUs a job requests, the scheduler uses offline profiling to measure how sensitive each job is to its CPU and memory allocation. The study shows that these workload-aware CPU and memory allocations can improve job completion time by up to 3.4× and increase cluster resource utilization. The authors also present two heuristic scheduling algorithms that apply these allocations when placing deep learning training jobs.
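To make the idea of workload-aware allocation concrete, here is a minimal Python sketch. It is not the paper's algorithm: it swaps in a simple greedy rule that gives each spare CPU to the job with the largest relative throughput gain, and compares the result with a GPU-proportional split. The `Job` class, the function names, and the throughput numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int
    # Hypothetical profiled throughput (samples/s), indexed by CPU count:
    # throughput[c] is the speed with c CPUs; index 0 is a placeholder.
    throughput: list


def gpu_proportional(jobs, total_cpus, total_gpus):
    """Baseline: every job gets the same number of CPUs per GPU."""
    per_gpu = total_cpus // total_gpus
    return {j.name: j.gpus * per_gpu for j in jobs}


def sensitivity_aware(jobs, total_cpus):
    """Greedy stand-in (an assumption, not the paper's method):
    start each job at 1 CPU, then give each remaining CPU to the job
    whose profiled throughput improves the most in relative terms."""
    alloc = {j.name: 1 for j in jobs}
    free = total_cpus - len(jobs)
    while free > 0:
        best, best_gain = None, 0.0
        for j in jobs:
            c = alloc[j.name]
            if c + 1 < len(j.throughput):
                gain = (j.throughput[c + 1] - j.throughput[c]) / j.throughput[c]
                if gain > best_gain:
                    best, best_gain = j, gain
        if best is None:  # no job benefits from another CPU
            break
        alloc[best.name] += 1
        free -= 1
    return alloc


# Made-up profiles: "speech" scales with CPUs, "lm" saturates at 2 CPUs.
jobs = [
    Job("speech", gpus=1, throughput=[0, 10, 20, 30, 40, 50, 60, 70, 80]),
    Job("lm", gpus=1, throughput=[0, 70, 100, 100, 100, 100, 100, 100, 100]),
]

baseline = gpu_proportional(jobs, total_cpus=8, total_gpus=2)
aware = sensitivity_aware(jobs, total_cpus=8)
print(baseline)  # {'speech': 4, 'lm': 4}
print(aware)     # {'speech': 6, 'lm': 2}
```

With these invented profiles, the CPU-hungry job goes from 40 to 60 samples/s while the insensitive job stays at 100 samples/s, which is the kind of gain a sensitivity-aware split is meant to capture.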
