Despite the recent success of multi-view diffusion models for text/image-based 3D asset generation, instruction-based editing of 3D assets lags surprisingly far behind the quality of generation models. The main reason is that recent approaches using 2D priors suffer from view-inconsistent editing signals.
Going beyond 2D prior distillation methods and multi-view editing strategies, we propose a training-free editing method that operates within the latent space of a native 3D diffusion model, allowing us to directly manipulate 3D geometry. We guide the edit synthesis by blending the 3D attention maps of the edit generation with those of the source object. Coupled with geometry-aware regularization guidance, a spectral modulation strategy in the Fourier domain, and a refinement step for 3D enhancement, our method outperforms previous 3D editing methods, enabling high-fidelity and precise edits across a wide range of shapes and semantic manipulations.
We operate in the latent space of a pre-trained 3D generative model. The source 3D object is represented as a multi-view Gaussian splat grid and inverted into its corresponding noise latent. Starting from this latent, we perform denoising guided by the edit prompt, while injecting 3D cross- and self-attention maps derived from the source object. A geometry regularization guidance term, a frequency modulation strategy, and a 3D enhancement module further refine the result. Region-specific edits are supported via masks generated using GroundingDINO and SAM2.
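For intuition, the inversion-and-injection loop can be sketched as below. This is a minimal PyTorch illustration, not our actual implementation: every API name (ddim_invert, denoise_step, decode_to_gaussians) is a hypothetical stand-in for the pre-trained 3D diffusion model's interface.

import torch

def edit_3d_asset(model, source_grid, source_prompt, edit_prompt,
                  num_steps=50, inject_until=0.6, region_mask=None):
    # Invert the source multi-view Gaussian splat grid to its noise latent,
    # caching the 3D cross-/self-attention maps at every step (hypothetical API).
    z, source_attn = model.ddim_invert(source_grid, source_prompt,
                                       num_steps=num_steps, cache_attention=True)
    for t, step in enumerate(model.timesteps(num_steps)):
        # During the early, structure-defining steps, override the edit branch's
        # attention with the cached source attention to preserve geometry.
        override = source_attn[t] if t < int(inject_until * num_steps) else None
        z = model.denoise_step(z, step, prompt=edit_prompt,
                               attn_override=override,
                               region_mask=region_mask)  # optional GroundingDINO+SAM2 mask
    return model.decode_to_gaussians(z)

In such a sketch, the geometry regularization guidance and frequency modulation would act inside denoise_step, while the 3D enhancement module would run as a post-processing stage on the decoded Gaussians.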
The spectral decomposition of the 3D self-attention graph reflects 3D scene composition and groups 3D Gaussians into distinct semantic parts.
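As a rough illustration of this observation, a minimal spectral-grouping sketch follows; the symmetrization, the normalized-Laplacian choice, and the number of clusters are assumptions made for the example, not the exact recipe.

import torch
from sklearn.cluster import KMeans

def spectral_parts(self_attn, k=6):
    # self_attn: (N, N) 3D self-attention over N Gaussians, used as graph edge weights.
    A = 0.5 * (self_attn + self_attn.T)            # symmetrize the attention graph
    d_inv_sqrt = A.sum(dim=1).clamp_min(1e-8).rsqrt()
    # Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}.
    L = torch.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = torch.linalg.eigh(L)              # eigenvalues in ascending order
    U = eigvecs[:, :k]                             # low-frequency spectral embedding
    U = U / U.norm(dim=1, keepdim=True).clamp_min(1e-8)
    # Clustering the rows of U assigns each Gaussian to a semantic part.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U.numpy())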
3D cross-attention weights define a token–3D Gaussian splat correspondence field, enabling accurate 3D localization of edit regions.
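Concretely, such a correspondence field yields an edit mask over Gaussians in a few lines; the token averaging and the quantile threshold below are illustrative assumptions.

import torch

def localize_edit_region(cross_attn, token_ids, q=0.9):
    # cross_attn: (N_gaussians, N_tokens) 3D cross-attention weights;
    # token_ids: indices of the edit-relevant prompt tokens.
    scores = cross_attn[:, token_ids].mean(dim=1)  # per-Gaussian relevance
    scores = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)
    return scores > torch.quantile(scores, q)      # boolean mask of the edit region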
@article{parelli2025latte,
  title   = {3D-LATTE: Latent Space 3D Editing from Textual Instructions},
  author  = {Parelli, Maria and Oechsle, Michael and Niemeyer, Michael and Tombari, Federico and Geiger, Andreas},
  journal = {arXiv preprint arXiv:2509.00269},
  year    = {2025}
}