Layout-Corrector ECCV2024

TL;DR

We demonstrate that existing Discrete Diffusion Models (DDMs) struggle to correct errors made during layout generation, so those errors become stuck in the final layout (the Layout Sticking Phenomenon). To address this, we propose Layout-Corrector, which resets erroneous tokens and prompts the DDM to regenerate them.

Abstract

Layout generation is the task of synthesizing a harmonious layout from elements characterized by attributes such as category, position, and size. Human designers experiment with the placement and modification of elements to create aesthetic layouts; however, we observed that current discrete diffusion models (DDMs) struggle to correct inharmonious layouts after they have been generated. In this paper, we first provide novel insights into the layout sticking phenomenon in DDMs and then propose a simple yet effective layout-assessment module, Layout-Corrector, which works in conjunction with existing DDMs to address the layout sticking problem. We present a learning-based module capable of identifying inharmonious elements within layouts, considering overall layout harmony characterized by complex composition. During the generation process, Layout-Corrector evaluates the correctness of each token in the generated layout, reinitializing those with low scores to the ungenerated state. The DDM then uses the high-scored tokens as clues to regenerate the harmonized tokens. Layout-Corrector, tested on common benchmarks, consistently boosts layout-generation performance when used in conjunction with various state-of-the-art DDMs. Furthermore, our extensive analysis demonstrates that Layout-Corrector (1) successfully identifies erroneous tokens, (2) facilitates control over the fidelity-diversity trade-off, and (3) significantly mitigates the performance drop associated with fast sampling.

Proposed Method: Layout-Corrector

Overview

Given a layout generated by a Discrete Diffusion Model (DDM), Layout-Corrector evaluates the correctness of each token in the layout. Tokens with low scores are reinitialized (masked) to the ungenerated state. The DDM then uses high-scored tokens as clues to regenerate harmonized tokens.

Training

Layout-Corrector is trained as a binary classifier. Given a generated layout, it learns to predict whether each token aligns with the ground truth or not.
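The training objective described above can be illustrated with a minimal sketch. It only shows how per-token binary labels and a binary cross-entropy loss could be computed. The token values, the function names `correctness_labels` and `binary_cross_entropy`, and the way the generated layout is obtained are assumptions made for illustration and are not taken from the paper.

```python
import math

def correctness_labels(generated, ground_truth):
    """Return 1 where the generated token equals the ground-truth token, else 0."""
    if len(generated) != len(ground_truth):
        raise ValueError("token sequences must have equal length")
    return [int(g == t) for g, t in zip(generated, ground_truth)]

def binary_cross_entropy(scores, labels, eps=1e-7):
    """Mean binary cross-entropy between per-token scores in (0, 1) and 0/1 labels."""
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# Hypothetical token ids for one layout (category, position, and size tokens).
ground_truth = [3, 12, 40, 25, 60]
generated = [3, 12, 41, 25, 7]  # two tokens disagree with the ground truth

labels = correctness_labels(generated, ground_truth)
# An uninformative corrector that outputs 0.5 everywhere has a loss of ln(2).
loss_uninformed = binary_cross_entropy([0.5] * len(labels), labels)
```

In an actual model the scores would come from a network that sees the whole generated layout, so that each token is judged in the context of the others.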

Generation

The generation process starts with an initial state where all tokens are masked. At timestep t, a MASK-free sample is first obtained using the DDM. Layout-Corrector then evaluates the correctness of each token. Tokens with scores below a threshold are masked, and the process proceeds to the next timestep. To reduce computational overhead, Layout-Corrector is selectively applied at three specific timesteps.
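The loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `MASK`, `generate`, `toy_ddm`, `toy_corrector`, the threshold of 0.5, and the token values are all hypothetical. The toy DDM never changes a token once it is unmasked, which mimics the sticking behaviour, and the toy corrector is an oracle that knows the target layout.

```python
MASK = -1  # hypothetical id for the ungenerated (masked) state

def generate(ddm_sample, corrector_scores, num_tokens, num_steps,
             corrector_steps, threshold=0.5):
    """Run the reverse process, re-masking low-scored tokens at selected timesteps."""
    tokens = [MASK] * num_tokens  # start from the fully masked state
    for t in range(num_steps, 0, -1):
        tokens = ddm_sample(tokens, t)  # MASK-free sample from the DDM
        if t in corrector_steps:
            scores = corrector_scores(tokens)
            # Reset tokens scored below the threshold to the masked state.
            tokens = [tok if s >= threshold else MASK
                      for tok, s in zip(tokens, scores)]
    return tokens

TARGET = [3, 12, 40, 25, 60]  # made-up "harmonious" layout tokens

def toy_ddm(tokens, t):
    """Fill masked positions only; makes one mistake at the first timestep (t == 3)."""
    out = []
    for i, tok in enumerate(tokens):
        if tok != MASK:
            out.append(tok)  # unmasked tokens are never revised (sticking)
        elif t == 3 and i == 2:
            out.append(TARGET[i] + 1)  # injected generation error
        else:
            out.append(TARGET[i])
    return out

def toy_corrector(tokens):
    """Oracle stand-in for a learned corrector: high score for correct tokens."""
    return [0.9 if tok == g else 0.1 for tok, g in zip(tokens, TARGET)]

without_corrector = generate(toy_ddm, toy_corrector, 5, 3, corrector_steps=set())
with_corrector = generate(toy_ddm, toy_corrector, 5, 3, corrector_steps={3})
```

Without the corrector the injected error survives to the final layout, while applying the corrector at the first timestep lets the DDM regenerate that token. In this sketch the corrector should not be applied at the last timestep (t = 1), since nothing would refill the tokens it masks.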

Experimental Results

Layout-Corrector improves the performance of various baseline models

Layout-Corrector achieves better speed-quality trade-off

Qualitative Results (PubLayNet)

For more results and analysis, please refer to our paper!

BibTeX

@inproceedings{iwai2024layout,
    title={Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model},
    author={Shoma Iwai and Atsuki Osanai and Shunsuke Kitada and Shinichiro Omachi},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    year={2024},
}