GenErase: Generalizable and Semantically-Aware Concept Erasure in Diffusion Models

Korada Sri Vardhana and Soma Biswas

Indian Institute of Science, Bengaluru

Figure 1. GenErase suppresses target concepts in text-to-image diffusion models while preserving unrelated content. It handles both direct prompts and paraphrased descriptions of the target, and leaves non-target generations intact.

Foundation text-to-image diffusion models such as Stable Diffusion are trained on large-scale, internet-scraped datasets like LAION-5B. While these models can generate high-quality images from natural language prompts, their training data may contain undesired concepts, including copyrighted content, personal likenesses, unsafe content, and IP-protected characters or styles. As a result, such models may reproduce restricted or sensitive concepts during generation.

To address this issue, a large body of work studies concept erasure, where the goal is to prevent a diffusion model from generating a specified target concept. However, most existing concept-erasure methods rely on parameter editing, fine-tuning, or training a modified checkpoint. These approaches can be expensive, time-consuming, and often irreversible. They are also less suitable for user-level customization, where different users or applications may want to suppress different concepts temporarily.

In this work, we introduce GenErase, a training-free and inference-time concept-erasure method. Instead of changing the model weights, GenErase modifies the diffusion process during generation. This allows undesired concepts to be suppressed temporarily, while keeping the original model unchanged and reusable.

GenErase operates in the cross-attention value space of the diffusion model. It removes the semantic direction corresponding to the target concept and replaces it with a neutral direction, while explicitly preserving important non-target concepts. This helps the model suppress the undesired concept without damaging unrelated image content.

To make the erasure reliable, GenErase uses three key components:

• Safe Semantic Subspace: protects non-target concepts so that unrelated content is not unintentionally erased or distorted.

• Hard Geometric Gate: activates the erasure operation only when the current representation is strongly aligned with the target concept.

• Orthogonal Erase-and-Replace: removes the target direction and redirects it toward a neutral anchor, helping preserve generation quality.
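The three components above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the QR-based safe basis, the cosine threshold `tau`, and the exact replacement rule are all assumptions chosen to show how such an edit could act on a batch of cross-attention value vectors.

```python
import numpy as np

def unit(x):
    # Normalize to unit length (assumes x is not the zero vector).
    return x / np.linalg.norm(x)

def safe_basis(safe_vectors):
    # Orthonormal basis (as columns) for the span of the non-target concept
    # vectors. Assumes they are linearly independent and fewer than the dimension.
    q, _ = np.linalg.qr(np.asarray(safe_vectors, dtype=float).T)
    return q

def erase_and_replace(values, target, anchor, safe_vectors, tau=0.5):
    """Hypothetical gated erase-and-replace on value vectors of shape (n, d)."""
    values = np.asarray(values, dtype=float)
    q = safe_basis(safe_vectors)

    # Safe semantic subspace (assumed form): strip from the target and anchor
    # directions any component lying in the safe subspace, so the edit below
    # cannot change a value's projection onto protected concepts.
    t = np.asarray(target, dtype=float)
    t = unit(t - q @ (q.T @ t))
    a = np.asarray(anchor, dtype=float)
    a = unit(a - q @ (q.T @ a))

    # Hard geometric gate (assumed form): edit only the rows whose cosine
    # similarity with the target direction exceeds tau.
    coeff = values @ t
    cos = coeff / np.maximum(np.linalg.norm(values, axis=1), 1e-12)
    gate = cos > tau

    # Orthogonal erase-and-replace (assumed form): remove the component along
    # the target direction and add the same magnitude along the neutral anchor.
    edited = values - coeff[:, None] * t + coeff[:, None] * a
    return np.where(gate[:, None], edited, values), gate
```

In this sketch, a row that fails the gate is returned unchanged, and a row that passes keeps its projection onto the safe subspace because both the erased and the added directions are orthogonal to it.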

A major challenge in concept erasure is paraphrase robustness. A user may not directly mention the target concept, but may describe it using aliases, roles, or contextual descriptions. For example, instead of using a person’s name, the prompt may refer to them through their profession, role, or a well-known association. GenErase addresses this by using geometric alignment in the model’s internal representation space, allowing it to suppress target semantics even when the prompt wording changes.
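A toy comparison illustrates why an alignment test in representation space can catch paraphrases that a surface-level filter misses. Everything here is hypothetical: the 3-dimensional embeddings are hand-made stand-ins rather than outputs of a real text encoder, "Jane Doe" is a placeholder name, and the threshold `tau` is an arbitrary choice.

```python
import numpy as np

def keyword_gate(prompt, blocked_name):
    # Surface-level filter: fires only if the name literally appears.
    return blocked_name.lower() in prompt.lower()

def geometric_gate(embedding, target_direction, tau=0.7):
    # Alignment filter: fires if the cosine similarity between the prompt's
    # representation and the target direction exceeds tau.
    e = np.asarray(embedding, dtype=float)
    t = np.asarray(target_direction, dtype=float)
    cos = e @ t / (np.linalg.norm(e) * np.linalg.norm(t))
    return bool(cos > tau)

# Hypothetical target direction and hand-made prompt embeddings.
target = np.array([1.0, 0.0, 0.0])
prompts = {
    "a portrait of Jane Doe": np.array([0.9, 0.1, 0.0]),
    "the lead actress of that famous space film": np.array([0.8, 0.3, 0.1]),
    "a bowl of fruit on a table": np.array([0.05, 0.2, 0.9]),
}

results = {
    text: (keyword_gate(text, "Jane Doe"), geometric_gate(vec, target))
    for text, vec in prompts.items()
}
```

With these toy vectors, the keyword filter fires only on the prompt that names the person, while the geometric gate also fires on the paraphrase and stays off for the unrelated prompt.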

Experiments show that GenErase performs effective erasure across identity, object, artistic-style, IP-character, and multi-concept settings. It also preserves unrelated content and maintains image quality better than many existing training-free guard-railing methods.

This work was published at CVPR 2026 as a Highlight paper.