CASteer concept erasure examples


Abstract

Diffusion models have transformed image generation, yet controlling their outputs to reliably erase undesired concepts remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for concept erasure in diffusion models that uses steering vectors to dynamically modify hidden representations.
CASteer precomputes concept-specific steering vectors by averaging neural activations from images generated for each target concept. During inference, it dynamically applies these vectors to suppress undesired concepts only when they appear, ensuring that unrelated regions remain unaffected. This selective activation enables precise, context-aware erasure without degrading overall image quality.
This approach achieves effective removal of harmful or unwanted content across a wide range of visual concepts, all without model retraining. CASteer outperforms state-of-the-art concept erasure techniques while preserving unrelated content and minimizing unintended effects.


Method

CASteer works by computing steering vectors from matched prompt pairs that differ only by the target concept. For each pair, we generate images and collect cross-attention (CA) outputs. The steering vector is the normalized difference between the mean CA outputs of positive (with concept) and negative (without concept) prompts.
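The vector construction above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function name, the array shapes, and the way CA outputs are batched are assumptions.

```python
import numpy as np

def steering_vector(pos_ca, neg_ca):
    """Build a concept steering vector from cross-attention (CA) outputs.

    pos_ca, neg_ca: arrays of shape (n_samples, d) holding CA outputs
    collected while generating images with / without the target concept.
    Returns the unit-norm difference of the per-group mean CA outputs.
    """
    diff = pos_ca.mean(axis=0) - neg_ca.mean(axis=0)
    return diff / np.linalg.norm(diff)
```

In practice one such vector would be computed per cross-attention layer (and possibly per denoising timestep), since different layers carry different aspects of the concept.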

During inference, CASteer projects the concept direction out of the CA outputs, suppressing the concept carried by the steering vector.
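The inference-time projection can be sketched as follows. Again a sketch under stated assumptions: the `threshold` gate below is a hypothetical stand-in for CASteer's dynamic, only-when-the-concept-appears application, not the paper's exact mechanism.

```python
import numpy as np

def project_out(ca_output, v, threshold=0.0):
    """Remove the component of a CA output along unit-norm steering vector v.

    The projection is applied only when the concept activation exceeds
    `threshold` (a hypothetical gate approximating CASteer's selective
    suppression, which leaves concept-free regions untouched).
    """
    coeff = float(ca_output @ v)
    if abs(coeff) <= threshold:
        return ca_output  # concept not detected: leave the output unchanged
    return ca_output - coeff * v
```

After the projection, the modified CA output is orthogonal to the steering vector, so the concept direction contributes nothing to the subsequent layers.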


CASteer method overview

Results

We evaluate CASteer on multiple tasks: concrete concept erasure (Snoopy, Mickey Mouse, etc.), abstract/safety concept removal (nudity, violence), and artistic style erasure (Van Gogh, Picasso, etc.). CASteer is evaluated on Stable Diffusion 1.4, SDXL, and SANA models.


Quantitative results
Quantitative results on nudity removal on the I2P dataset. Detection of nude body parts is done by NudeNet at a threshold of 0.6. F: Female, M: Male. The best results are highlighted in bold, second-best are underlined.

Quantitative results on inappropriate content removal on the I2P dataset. Detection of inappropriate content is done by the Q16 classifier. The best results are highlighted in bold, second-best are underlined.

Comparison of various methods on concrete concept erasure (removing “Snoopy”).

Citation


@article{gaintseva2025casteer,
title={CASteer: Cross-Attention Steering for Controllable Concept Erasure},
author={Tatiana Gaintseva and Andreea-Maria Oncescu and Chengcheng Ma and Ziquan Liu and Martin Benning and Gregory Slabaugh and Jiankang Deng and Ismail Elezi},
year={2025},
eprint={2503.09630},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.09630},
}