Executive Summary: Modern applications such as autonomous driving and drones require the concurrent execution of multiple Deep Neural Networks (DNNs) on edge devices, typically realized using accelerators. These devices execute the entire sequence of operations, including sensing, filtering, processing, and post-processing/actuation. However, resource sharing during this end-to-end processing can degrade the performance of tasks on both the accelerators and the CPUs/GPUs, slowing DNN accelerators by up to 52%. To mitigate these adverse effects, the authors propose to study and identify efficient resource-sharing policies that account for the access behavior of the multiple DNN accelerators and processors in a system. Existing techniques, such as FR-FCFS, perform poorly for accelerator-rich systems in which individual tasks have deadlines. Recent works, such as DASH and FLOSS, address this problem for memory scheduling and report significant performance improvements.

The authors plan to use a Xilinx Zynq FPGA board to execute multiple instances of DNN accelerators alongside different tasks on the integrated CPU/GPU. They will study how configurable options for partitioning and scheduling access requests to the interconnect, last-level cache, and memory affect the overall performance of the DNNs and the CPU/GPU tasks. They will then develop a model for estimating contention, either analytical or machine-learning-based, and use it to explore and identify resource management policies that improve execution speed and/or energy consumption. The resulting insights and experimental infrastructure could foster further research in this direction and have a broader impact on improving computing efficiency.
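To illustrate why a baseline memory scheduler can penalize accelerators with deadlines, the minimal sketch below implements the core FR-FCFS selection rule: prefer row-buffer hits, then fall back to the oldest request. The `MemRequest` fields and requestor tags are hypothetical, chosen only to make the example self-contained; the point is that neither selection criterion considers a requestor's deadline.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MemRequest:
    arrival_time: int  # cycle at which the request entered the queue
    row: int           # DRAM row targeted by the request
    requestor: str     # hypothetical tag, e.g. "dnn_acc0" or "cpu"

def frfcfs_pick(queue: List[MemRequest], open_row: Optional[int]) -> Optional[MemRequest]:
    """FR-FCFS: among queued requests, first prefer row-buffer hits
    (requests to the currently open row), then pick the oldest request.
    A latency-critical accelerator request can therefore wait behind a
    long run of row hits generated by another agent."""
    if not queue:
        return None
    hits = [r for r in queue if r.row == open_row]
    candidates = hits if hits else queue
    return min(candidates, key=lambda r: r.arrival_time)
```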
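The proposal leaves the contention-estimation model open (analytical or machine-learning-based). As one possible shape, here is a minimal machine-learning sketch that fits a linear regression from per-interval memory-traffic counters to the measured slowdown of a DNN accelerator. The counter set and all numbers are placeholders invented for illustration, not measured data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder per-interval features gathered while tasks co-run:
# [accelerator memory requests, CPU/GPU memory requests, LLC miss rate]
X = np.array([
    [1000,  200, 0.10],
    [1000,  800, 0.25],
    [1000, 1600, 0.40],
    [ 500, 1600, 0.35],
])
# Placeholder targets: accelerator slowdown vs. running alone (1.0 = none).
y = np.array([1.05, 1.20, 1.50, 1.30])

model = LinearRegression().fit(X, y)

# Estimate the slowdown for a new co-running mix before launching it.
print(model.predict(np.array([[1000, 1200, 0.30]])))
```

A predictor of this kind could be queried by a runtime policy to choose partitioning or scheduling settings for a given task mix, which is one way the exploration described above might be automated.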