"Low-Latency Inference of Optimized AI-DSP Models for Hard Realtime Deadlines" by Christopher Johann Clarke (Singapore)

Digital audio processing is a hard real-time task. Each processing cycle must finish within a strict deadline set by the buffer size and sampling rate; if the deadline is missed, the result is an audible discontinuity. Unlike general-purpose computing, there is no allowance for variable execution time or occasional latency spikes.

Most current machine learning systems are designed for high throughput. They rely on GPUs and parallel scheduling, which are effective for batch processing but do not address the timing requirements of real-time audio, where the worst case matters more than the average case. Operations that are acceptable in offline inference, such as asynchronous scheduling or dynamic memory allocation, cannot be tolerated inside a real-time audio process.

This article describes strategies for running AI-DSP models under these conditions. It is divided into three parts: adjusting expectations to match fixed real-time limits, optimizing neural network structures to reduce computation and memory use, and implementing code with bounded execution time to support inference under the sub-millisecond deadlines common in current audio systems. The article provides generic constructions of the arguments the talk will focus on.
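To make the deadline concrete, a minimal sketch of the relationship between buffer size, sampling rate, and the time budget per processing cycle (the function name and the example settings are illustrative, not from the talk):

```python
def buffer_deadline_ms(buffer_size: int, sample_rate_hz: int) -> float:
    """Time available to process one audio buffer, in milliseconds.

    The audio callback must return before the hardware needs the next
    buffer, so the budget is simply buffer duration = frames / rate.
    """
    return 1000.0 * buffer_size / sample_rate_hz

# Typical low-latency settings: small buffers at standard rates.
for frames, rate in [(32, 48000), (64, 48000), (128, 44100)]:
    print(f"{frames} frames @ {rate} Hz -> "
          f"{buffer_deadline_ms(frames, rate):.3f} ms")
```

At 32 frames and 48 kHz the budget is already below a millisecond, which is where the sub-millisecond deadlines mentioned above come from; every operation in the inference path, worst case included, has to fit inside that window.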

Here is a plot of single-input inference times, comparing different models across different tasks.