In this paper, we address the problem of enhancing perceptual quality in video super-resolution (VSR) using Diffusion Models (DMs) while ensuring temporal consistency among frames. We present StableVSR, a VSR method based on DMs that can significantly enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details. We introduce the Temporal Conditioning Module (TCM) into a pre-trained DM for single image super-resolution to turn it into a VSR method. TCM uses the novel Temporal Texture Guidance, which provides it with spatially-aligned and detail-rich texture information synthesized in adjacent frames. This guides the generative process of the current frame toward high-quality and temporally-consistent results. In addition, we introduce the novel Frame-wise Bidirectional Sampling strategy to encourage the use of information from past to future and vice-versa. This strategy improves the perceptual quality of the results and the temporal consistency across frames. We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR. The project page is available at https://github.com/claudiom4sir/StableVSR.

Rota, C., Buzzelli, M., van de Weijer, J. (2025). Enhancing Perceptual Quality in Video Super-Resolution Through Temporally-Consistent Detail Synthesis Using Diffusion Models. In Computer Vision – ECCV 2024 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XII Conference proceedings (pp.36-53). Springer Cham [10.1007/978-3-031-73254-6_3].

Enhancing Perceptual Quality in Video Super-Resolution Through Temporally-Consistent Detail Synthesis Using Diffusion Models

Rota, Claudio; Buzzelli, Marco
2025

Type: poster + paper
Keywords: Video super-resolution, Perceptual quality, Temporal consistency, Diffusion models
Language: English
Conference: ECCV 2024 18th European Conference - September 29–October 4, 2024
Conference year: 2024
Editors: Leonardis, A; Ricci, E; Roth, S; Russakovsky, O; Sattler, T; Varol, G
Book title: Computer Vision – ECCV 2024 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XII Conference proceedings
ISBN: 9783031732539
Date: 28 Nov 2024
Year: 2025
Volume: 15070 LNCS
Pages: 36-53
URL: https://link.springer.com/chapter/10.1007/978-3-031-73254-6_3
Access: reserved
Files in this item:
Rota-2025-ECCV-VoR.pdf

Archive managers only

Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 9 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/527021