Abstract:
Deep neural networks (DNNs) have grown increasingly large and complex, demanding effective optimization techniques to improve efficiency and scalability. Sparsity has become a primary and widely adopted optimization approach, enabling significant reductions in the computational demands of DNNs while preserving model performance. In particular, structured N:M sparsity, which retains at most N nonzero values in each group of M consecutive weights, has emerged as a promising approach due to its alignment with modern hardware architectures, allowing for efficient model compression and computation.
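To make the N:M pattern concrete, the following minimal sketch (an illustration only; the abstract does not prescribe a pruning method) applies magnitude-based 2:4 pruning, keeping the two largest-magnitude weights in every group of four. The function name `nm_prune` is hypothetical.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Zero all but the n largest-magnitude entries in each group of m weights."""
    w = weights.reshape(-1, m).copy()
    # Indices of the (m - n) smallest-magnitude entries per group of m.
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05],
              [0.2, -0.8, 0.3, 0.7]])
print(nm_prune(w))
# Each row keeps its 2 largest-magnitude entries:
# [[ 0.9  0.   0.4  0. ]
#  [ 0.  -0.8  0.   0.7]]
```

Hardware such as NVIDIA's Sparse Tensor Cores accelerates exactly this 2:4 layout, which is why the structured pattern maps well onto modern accelerators.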