Efficient Concertormer for Image Deblurring and Beyond

1National Taiwan University
2Nanjing University of Science and Technology
3Google Research
4University of California, Merced
5Yonsei University

ICCV 2025

Motion Deblurring

Abstract

The Transformer architecture has excelled in NLP and vision tasks, but the cost of its self-attention grows quadratically with the number of pixels, making high-resolution tasks computationally expensive. We introduce Concertormer, featuring Concerto Self-Attention (CSA), for image deblurring. CSA splits self-attention into global and local components while retaining partial information in additional dimensions, achieving linear complexity. A Cross-Dimensional Communication module enhances expressiveness by linearly combining attention maps. Additionally, our gated-dconv MLP merges the two-stage Transformer design into a single stage. Extensive evaluations show that our method performs favorably against state-of-the-art methods in deblurring, deraining, and JPEG artifact removal.
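The abstract only states that the Cross-Dimensional Communication module linearly combines attention maps. One plausible reading, sketched below in NumPy under that assumption, is a learned H × H matrix that mixes the per-head attention maps, similar in spirit to talking-heads attention. The array layout, the `mix` parameter, and the function names are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mix_attention_maps(attn, mix):
    """Linearly combine per-head attention maps (hypothetical sketch).

    attn: (H, N, M) array, one attention map per head (assumed layout).
    mix:  (H, H) mixing weights (assumed learnable parameter).
    Returns an (H, N, M) array with out[g] = sum_h mix[g, h] * attn[h].
    """
    return np.einsum("gh,hnm->gnm", mix, attn)

rng = np.random.default_rng(0)
H, N, M = 4, 6, 6
attn = softmax(rng.standard_normal((H, N, M)), axis=-1)

same = mix_attention_maps(attn, np.eye(H))                     # identity mix: maps unchanged
mixed = mix_attention_maps(attn, rng.standard_normal((H, H)))  # unconstrained mix
convex = mix_attention_maps(attn, np.full((H, H), 1.0 / H))    # rows of `mix` sum to 1

print(np.allclose(same, attn))                # True
print(mixed.shape)                            # (4, 6, 6)
print(np.allclose(convex.sum(axis=-1), 1.0))  # True: still row-stochastic
```

With a convex mixing matrix (non-negative rows summing to 1), each mixed map stays row-stochastic and can be applied to the values directly; an unconstrained `mix` is more expressive but gives no such guarantee.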

Overview


We borrow the term “concerto” from music, where a concerto features contrasting elements: soloists (concertino) performing in dialogue with an accompanying orchestra (ripieno). Analogously, in our design, the concertino computes local self-attention, while the ripieno captures global (or shared) self-attention. Concerto Self-Attention can be applied in both the spatial and channel domains, yielding four self-attention mechanisms (local and global, each in the spatial and channel domains). All four mechanisms have linear complexity, which keeps both computational cost and execution time low.
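To make the linear-complexity claim concrete, the NumPy sketch below implements two generic building blocks whose cost is linear in the number of pixels N: self-attention restricted to non-overlapping w × w windows (local and spatial, loosely analogous to the concertino) and self-attention across channels, where a single C × C map is shared by every pixel (global, in the channel domain). These are standard stand-ins chosen for illustration, not the authors' Concerto Self-Attention; the function names, the identity query/key/value projections, and the window size are assumptions. The hypothetical `qk_macs` helper counts multiply-accumulates of the query-key product for each variant.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, w):
    """Local spatial self-attention inside non-overlapping w x w windows.

    x: (H, W, C) feature map with H and W divisible by w.
    Each pixel attends only to the w*w pixels of its own window.
    """
    h, wd, c = x.shape
    # Partition into windows: (h//w, w, wd//w, w, c) -> (num_windows, w*w, c).
    t = x.reshape(h // w, w, wd // w, w, c).transpose(0, 2, 1, 3, 4)
    t = t.reshape(-1, w * w, c)
    # Queries = keys = values = t here; a real layer would use learned projections.
    scores = t @ t.transpose(0, 2, 1) / np.sqrt(c)  # (num_windows, w*w, w*w)
    out = softmax(scores, axis=-1) @ t               # (num_windows, w*w, c)
    # Undo the partition back to (h, wd, c).
    out = out.reshape(h // w, wd // w, w, w, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, wd, c)

def channel_attention(x, eps=1e-6):
    """Global self-attention across channels: one c x c map shared by all pixels."""
    h, wd, c = x.shape
    t = x.reshape(-1, c)  # (N, c), N = h * wd
    # L2-normalize each channel over all pixels so the scores stay bounded.
    tn = t / (np.linalg.norm(t, axis=0, keepdims=True) + eps)
    attn = softmax(tn.T @ tn, axis=-1)  # (c, c)
    # out[:, i] = sum_j attn[i, j] * t[:, j]
    out = t @ attn.T  # (N, c)
    return out.reshape(h, wd, c)

def qk_macs(h, wd, c, w):
    """Multiply-accumulates of the query-key product for each variant."""
    n = h * wd
    return {
        "full_spatial": n * n * c,        # quadratic in N
        "window_spatial": n * w * w * c,  # linear in N
        "channel": n * c * c,             # linear in N
    }

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
out = window_attention(x, 8)
x2 = x.copy()
x2[0, 0] += 1.0  # perturb one pixel in the top-left window
out2 = window_attention(x2, 8)
# Other windows are unaffected by the perturbation.
print(np.allclose(out[8:], out2[8:]) and np.allclose(out[:, 8:], out2[:, 8:]))  # True
print(channel_attention(x).shape)  # (16, 16, 8)

small, large = qk_macs(256, 256, 48, 8), qk_macs(512, 512, 48, 8)
print({k: large[k] // small[k] for k in small})
# {'full_spatial': 16, 'window_spatial': 4, 'channel': 4}
```

Doubling the image side length quadruples N: the full spatial cost grows 16×, while the windowed and channel variants grow only 4×. The trade-off is that a plain window cannot see beyond its own borders, which is the gap that global components are meant to fill.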

Receptive Field


LAM (Local Attribution Maps) and DI (Diffusion Index) [Gu and Dong, CVPR 2021] visualize and quantify the receptive field, respectively. Compared with W-MSA and SW-MSA [Liu et al., ICCV 2021], Concerto Self-Attention expands the receptive field without incurring additional computational cost.
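For readers unfamiliar with DI: as we understand Gu and Dong's definition, DI is computed from the Gini coefficient G of the (absolute) LAM attribution map as DI = (1 − G) × 100, so attribution spread over many pixels gives a high DI and attribution concentrated on a few pixels gives a low DI. The sketch below implements that formula for an arbitrary attribution array; it does not compute LAM itself, the function names are hypothetical, and the exact normalization should be checked against the original paper before comparing with published DI values.

```python
import numpy as np

def gini(values):
    """Gini coefficient of |values| (0 = perfectly uniform, close to 1 = concentrated)."""
    g = np.sort(np.abs(np.ravel(values)).astype(float))  # ascending order
    n = g.size
    total = g.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Rank-based closed form, equivalent to sum_ij |g_i - g_j| / (2 * n * total).
    return float(2.0 * np.sum(ranks * g) / (n * total) - (n + 1) / n)

def diffusion_index(attribution_map):
    """DI = (1 - Gini) * 100: larger values mean attribution spread over more pixels."""
    return (1.0 - gini(attribution_map)) * 100.0

uniform = np.ones((16, 16))   # every pixel contributes equally
peaked = np.zeros((16, 16))
peaked[8, 8] = 1.0            # a single pixel carries all attribution

print(diffusion_index(uniform))  # 100.0
print(diffusion_index(peaked))   # 0.390625 (= 100 / 256)
```

A wider receptive field, in this sense, shows up as attribution mass spread across more input pixels, which pushes G down and DI up.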

Model Size


Concertormer performs favorably against state-of-the-art self-attention-based methods.

Motion Deblurring

Quantitative Comparison


Visual Comparison


Deraining

Quantitative Comparison


Visual Comparison


BibTeX

@inproceedings{kuo2025efficient,
  title={Efficient Concertormer for Image Deblurring and Beyond},
  author={Kuo, Pin-Hung and Pan, Jinshan and Chien, Shao-Yi and Yang, Ming-Hsuan},
  booktitle={International Conference on Computer Vision},
  year={2025},
}