ReMaskable: Controllable Facial Attribute Editing Using Segmentation-Guided Latent Diffusion
DOI: https://doi.org/10.47392/IRJAEH.2026.0171

Keywords:
Controllable generation; Diffusion models; Facial attribute editing; Identity preservation; Semantic segmentation

Abstract
Facial attribute editing demands both spatial precision and visual fidelity, yet existing approaches fall short on one or both counts. Generative Adversarial Networks achieve photorealistic synthesis but suffer from attribute entanglement, where modifying one feature inadvertently alters unrelated regions. Diffusion models produce high-quality text-guided edits but lack spatial control, causing changes to propagate beyond the intended area. This paper presents ReMaskable, a framework that decouples the spatial localization problem (where to edit) from the semantic generation problem (what to generate). ReMaskable combines a multi-source segmentation system integrating DeepLabv3+ for 19-class face parsing, SAM for promptable region selection, and DINOv2 for boundary refinement, with a CLIP-conditioned latent diffusion inpainting model that operates exclusively within the masked region. Identity preservation is enforced through ArcFace cosine embedding loss and LPIPS perceptual consistency on unmasked regions. We describe the complete architecture, mathematical formulation, and training methodology. Evaluation metrics are projected from published baselines of each component rather than from completed end-to-end experimental runs, and this distinction is stated throughout. The modular architecture is designed for extensibility to video editing and 3D avatar generation.
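To make the identity-preservation objective stated above concrete, the following PyTorch-style sketch combines an ArcFace cosine embedding term with an LPIPS term restricted to unmasked pixels. This is a minimal illustration, not the paper's reference implementation: the function name `identity_preservation_loss`, the pretrained callables `arcface` and `lpips_fn`, the loss weights, and the mask convention (1 marks the editable region) are all assumptions.

```python
import torch
import torch.nn.functional as F

def identity_preservation_loss(edited, original, mask, arcface, lpips_fn,
                               w_id=1.0, w_lpips=1.0):
    """Hedged sketch of the objective described in the abstract:
    an ArcFace cosine embedding loss between the edited and original
    faces, plus LPIPS perceptual consistency on the unmasked region.
    `arcface` and `lpips_fn` are assumed pretrained callables; the
    weights and mask convention are illustrative, not from the paper.
    """
    # Identity term: 1 - cosine similarity of L2-normalized ArcFace embeddings.
    emb_edit = F.normalize(arcface(edited), dim=-1)
    emb_orig = F.normalize(arcface(original), dim=-1)
    id_loss = 1.0 - (emb_edit * emb_orig).sum(dim=-1).mean()

    # Perceptual consistency restricted to the unmasked region
    # (mask == 1 marks the editable area, so 1 - mask keeps the rest).
    keep = 1.0 - mask
    lpips_loss = lpips_fn(edited * keep, original * keep).mean()

    return w_id * id_loss + w_lpips * lpips_loss
```

Restricting the LPIPS term to `1 - mask` is what confines the consistency penalty to regions the edit should leave untouched, while the ArcFace term constrains the whole face, including the edited region, to remain the same identity.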
License
Copyright (c) 2026 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.