Efficient Concertormer for Image Deblurring and Beyond

Abstract

The Transformer architecture has excelled in NLP and vision tasks, but its self-attention complexity grows quadratically with image size, making high-resolution tasks computationally expensive. We introduce Concertormer, featuring Concerto Self-Attention (CSA) for image deblurring. CSA splits self-attention into global and local components while retaining partial information in additional dimensions, achieving linear complexity. A Cross-Dimensional Communication module enhances expressiveness by linearly combining attention maps. Additionally, our gated-dconv MLP merges the two-staged Transformer design into a single stage. Extensive evaluations show our method performs favorably against state-of-the-art works in deblurring, deraining, and JPEG artifact removal.
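The gap between quadratic and linear scaling can be made concrete by counting query-key pairs. The sketch below is a generic illustration, not the cost model of Concerto Self-Attention: it compares global self-attention over every pixel with attention restricted to non-overlapping windows. The function names and the window size of 8 are assumptions chosen for the example.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs scored by global self-attention over all pixels."""
    tokens = height * width
    return tokens * tokens


def windowed_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention stays inside non-overlapping windows."""
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    # Each window scores its own tokens against each other, nothing more.
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Window size 8 is an illustrative choice, not a value from the paper.
    for side in (64, 128, 256):
        full = full_attention_pairs(side, side)
        local = windowed_attention_pairs(side, side, window=8)
        print(f"{side}x{side}: full={full:,} windowed={local:,}")
```

Doubling the image side quadruples the pixel count. The global count then grows 16 times, while the windowed count grows only 4 times, in step with the number of pixels.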

Overview

Directional Weight Score

We borrow the term “concerto” from music, where a concerto features contrasting elements: soloists (concertino) performing in dialogue with an accompanying orchestra (ripieno). Analogously, in our design, the concertino computes local self-attention, while the ripieno captures global (or shared) self-attention. The proposed Concerto Self-Attention can be applied in both the spatial and channel domains, resulting in four distinct self-attention mechanisms. All four self-attention mechanisms have linear complexity, enabling favorable computational efficiency and execution time.
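As a rough illustration of the local/shared split described above, the NumPy sketch below pairs a per-window attention branch with a single attention map that is reused by every window. This is not the formulation from the paper. In particular, the window-averaged queries and keys, the summation of the two branches, and the name `local_plus_shared_attention` are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_plus_shared_attention(q, k, v):
    """Hypothetical local + shared attention over pre-partitioned windows.

    q, k, v: arrays of shape (num_windows, tokens_per_window, dim).
    Returns an array of the same shape.
    """
    scale = q.shape[-1] ** -0.5

    # Local branch ("concertino"): every window attends only within itself.
    local_scores = (q @ k.transpose(0, 2, 1)) * scale  # (W, T, T)
    local_out = softmax(local_scores) @ v  # (W, T, dim)

    # Shared branch ("ripieno"), ASSUMPTION: one attention map built from
    # window-averaged queries and keys, then applied to every window.
    q_mean = q.mean(axis=0)  # (T, dim)
    k_mean = k.mean(axis=0)  # (T, dim)
    shared_map = softmax((q_mean @ k_mean.T) * scale)  # (T, T)
    shared_out = shared_map @ v  # broadcasts to (W, T, dim)

    # ASSUMPTION: the two branches are simply summed.
    return local_out + shared_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((4, 16, 8)) for _ in range(3))
    print(local_plus_shared_attention(q, k, v).shape)
```

For a fixed window size, both branches do work proportional to the number of windows, so the cost of this sketch grows linearly with the number of pixels.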

Receptive Field

LAM and DI [Gu and Dong, CVPR 2021] visualize and quantify the receptive field, respectively. Compared to W-MSA and SW-MSA [Liu et al., ICCV 2021], Concerto Self-Attention effectively expands the receptive field without incurring additional computational cost.

Model Size

Concertormer performs favorably against state-of-the-art self-attention-based methods.

Motion Deblurring

Quantitative Comparison

Visual Comparison

Deraining

Quantitative Comparison

Visual Comparison

BibTeX

@inproceedings{kuo2025efficient,
  title={Efficient Concertormer for Image Deblurring and Beyond},
  author={Kuo, Pin-Hung and Pan, Jinshan and Chien, Shao-Yi and Yang, Ming-Hsuan},
  booktitle={International Conference on Computer Vision},
  year={2025}
}