OpenSplat3D:
Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting

CVPRW 2025 - OpenSUN3D
¹RWTH Aachen University, ²Robert Bosch GmbH

Overview

Figure panels: PCA Feature Rendering, Color Rendering

Abstract

3D Gaussian Splatting (3DGS) has emerged as a powerful representation for neural scene reconstruction, offering high-quality novel view synthesis while maintaining computational efficiency. In this paper, we extend the capabilities of 3DGS beyond pure scene representation by introducing an approach for open-vocabulary 3D instance segmentation without requiring manual labeling, termed OpenSplat3D. Our method leverages feature-splatting techniques to associate semantic information with individual Gaussians, enabling fine-grained scene understanding. We incorporate Segment Anything Model instance masks with a contrastive loss formulation as guidance for the instance features to achieve accurate instance-level segmentation. Furthermore, we utilize language embeddings of a vision-language model, allowing for flexible, text-driven instance identification. This combination enables our system to identify and segment arbitrary objects in 3D scenes based on natural language descriptions. We show results on LERF-mask and LERF-OVS as well as the full ScanNet++ validation set, demonstrating the effectiveness of our approach.

Method

Figure: Method Overview

Overview of our proposed pipeline. On the left are the training inputs: posed RGB images, a coarse SfM point cloud for initialization, and the extracted SAM masks. The middle section illustrates instance learning through optimization of a Gaussian feature field, followed by clustering to obtain coherent 3D instances. On the right, we show the language integration: the top-k informative views are identified per instance, hierarchical crops are constructed, and finally the language embedding per instance is computed.
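To make the instance-learning step more concrete, the following is a minimal sketch of a contrastive objective on rendered per-pixel features, assuming a simple pull/push form: features inside the same SAM mask are pulled toward their mask centroid, and centroids of different masks are pushed apart up to a margin. This is not the loss formulation of the paper; the function name `contrastive_mask_loss`, the `margin` parameter, and the toy inputs are hypothetical.

```python
import math
from collections import defaultdict


def contrastive_mask_loss(features, mask_ids, margin=1.0):
    """Toy pull/push loss over per-pixel features grouped by mask id.

    features: non-empty list of equal-length float lists (rendered pixel features).
    mask_ids: list of ints, one SAM mask id per pixel.
    margin:   hypothetical hinge margin between mask centroids.
    """
    groups = defaultdict(list)
    for feat, mid in zip(features, mask_ids):
        groups[mid].append(feat)

    # Centroid (mean feature) of every mask.
    centroids = {
        mid: [sum(col) / len(feats) for col in zip(*feats)]
        for mid, feats in groups.items()
    }

    # Pull term: mean squared distance of each pixel to its own centroid.
    pull = 0.0
    for feat, mid in zip(features, mask_ids):
        pull += sum((a - b) ** 2 for a, b in zip(feat, centroids[mid]))
    pull /= len(features)

    # Push term: squared hinge on the distance between different centroids.
    ids = sorted(centroids)
    push, pairs = 0.0, 0
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            dist = math.dist(centroids[a], centroids[b])
            push += max(0.0, margin - dist) ** 2
            pairs += 1
    if pairs:
        push /= pairs

    return pull + push


# Two well-separated masks give zero loss; mixing them up is penalized.
pixels = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]]
loss_consistent = contrastive_mask_loss(pixels, [0, 0, 1, 1])
loss_mixed = contrastive_mask_loss(pixels, [0, 1, 0, 1])
```

In this toy setup the consistent assignment yields a loss of 0.0, while the mixed assignment yields 13.5 (pull term 12.5 plus push term 1.0). In an actual pipeline the features would come from a differentiable renderer and the loss would be written in an autodiff framework.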

Experiments

Semantic Segmentation

Method                figurines       ramen           teatime         mean
                      mIoU    mBIoU   mIoU    mBIoU   mIoU    mBIoU   mIoU    mBIoU
LERF*                 33.5    30.6    28.3    14.7    49.7    42.6    37.2    29.3
LangSplat*            52.8    50.5    50.4    44.7    69.5    65.6    57.6    53.6
Gaussian Grouping*    69.7    67.9    77.0    68.7    71.7    66.1    72.8    67.6
CGC                   91.6    88.8    68.7    63.1    80.5    78.9    80.3    76.9
OpenSplat3D (Ours)    92.3    89.4    75.9    68.2    83.7    78.8    84.0    78.8

Semantic segmentation results on the LERF-mask dataset. We report the mean IoU (mIoU) and mean boundary IoU (mBIoU) for each scene and the overall average. Our method achieves the best mean mIoU and mBIoU. It is outperformed only on the ramen scene, where Gaussian Grouping is slightly better, and in mBIoU on teatime, where CGC leads by 0.1. *: Results as reported in the Gaussian Grouping paper.
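For reference, the mIoU figures above follow the usual intersection-over-union definition. The sketch below assumes binary masks given as flat 0/1 sequences and averages the IoU over (prediction, ground truth) pairs; the function names and the convention for two empty masks are assumptions, and the boundary variant (mBIoU), which restricts the comparison to a band around the mask contours, is not implemented here.

```python
def iou(pred, gt):
    """Intersection over union of two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over a non-empty list of (prediction, ground truth) pairs."""
    scores = [iou(pred, gt) for pred, gt in pairs]
    return sum(scores) / len(scores)


# Toy example with two 4-pixel masks per query.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # IoU = 1 / 2
    ([1, 1, 1, 0], [1, 1, 1, 0]),  # IoU = 3 / 3
]
score = mean_iou(pairs)
```

For the two toy queries the IoUs are 0.5 and 1.0, so `score` is 0.75.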

3D Object Selection

Method                figurines       ramen           teatime         waldo_kitchen   mean
                      mIoU    mAcc.   mIoU    mAcc.   mIoU    mAcc.   mIoU    mAcc.   mIoU    mAcc.
LangSplat             10.16   8.93    7.92    11.27   11.38   20.34   9.18    9.09    9.66    12.41
LEGaussians           17.99   23.21   15.79   26.76   19.27   27.12   11.78   18.18   16.21   23.82
OpenGaussian          39.29   55.36   31.01   42.25   60.44   76.27   22.70   31.82   38.36   51.43
OpenSplat3D (Ours)    60.71   85.71   49.20   76.06   73.27   88.14   55.63   77.27   59.70   81.79

LERF-OVS 3D object selection evaluation from textual queries. Following OpenGaussian, only the Gaussians responding to the query are rendered, so the rendering does not respect occlusion by other objects in the scene. Accuracy is reported as mAcc@0.25. Note that OpenGaussian fine-tunes parameters per scene for best results.
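The selection step and the accuracy metric can be sketched as follows, under stated assumptions: each 3D instance carries one language embedding, a text query is matched to the instance with the highest cosine similarity, and mAcc@0.25 is taken to be the fraction of queries whose IoU exceeds 0.25. All function names, the strict inequality, and the toy embeddings are hypothetical; a real system would obtain the embeddings from a vision-language model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_instance(text_embedding, instance_embeddings):
    """Return the id of the instance whose embedding best matches the query."""
    return max(
        instance_embeddings,
        key=lambda iid: cosine(text_embedding, instance_embeddings[iid]),
    )


def gaussians_for_query(text_embedding, instance_embeddings, gaussian_instance_ids):
    """Indices of the Gaussians that belong to the best-matching instance."""
    best = select_instance(text_embedding, instance_embeddings)
    return [i for i, iid in enumerate(gaussian_instance_ids) if iid == best]


def accuracy_at(ious, threshold=0.25):
    """Fraction of queries whose IoU exceeds the threshold (assumed strict)."""
    return sum(1 for v in ious if v > threshold) / len(ious)


# Toy data: two instances with 2-D embeddings and three Gaussians.
instance_embeddings = {"mug": [1.0, 0.0], "spoon": [0.0, 1.0]}
gaussian_instance_ids = ["mug", "spoon", "mug"]
selected = gaussians_for_query([0.9, 0.1], instance_embeddings, gaussian_instance_ids)
acc = accuracy_at([0.6, 0.2, 0.3, 0.1])
```

Here `selected` is `[0, 2]` (the Gaussians of the "mug" instance) and `acc` is 0.5. Rendering only the selected Gaussians is what makes the result independent of occluders, as noted in the caption.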

3D Instance Segmentation

Method                without post-processing    with post-processing
                      AP       AP50     AP25     AP       AP50     AP25
SAM3D                 3.9      9.3      22.1     8.4      16.1     30.0
Segment3D             13.0     23.8     38.3     20.2     30.9     42.7
Open3DIS*             -        -        -        20.7     38.6     47.1
OpenSplat3D (Ours)    19.2     37.3     56.2     24.5     41.7     57.1

Class-agnostic instance segmentation on the ScanNet++ validation split. *: Open3DIS uses superpoints produced by Felzenszwalb and Huttenlocher segmentation directly in its pipeline.
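As a rough illustration of what AP50 and AP25 measure, the sketch below computes class-agnostic average precision at a single IoU threshold: predictions are sorted by confidence, each is greedily matched to the best still-unmatched ground-truth instance above the threshold, and the area under the resulting precision-recall curve is returned. This is a simplified sketch and not the official ScanNet++ evaluation script; the function name and input layout are assumptions. In ScanNet-style benchmarks, the plain AP column is typically the average of this quantity over IoU thresholds from 0.5 to 0.95.

```python
def average_precision(predictions, num_gt, iou_threshold):
    """Class-agnostic AP at one IoU threshold.

    predictions: list of (confidence, [IoU with each ground-truth instance]).
    num_gt:      number of ground-truth instances.
    """
    if num_gt == 0:
        return 0.0

    matched = set()
    hits = []  # True for a true positive, in order of decreasing confidence
    for _, ious in sorted(predictions, key=lambda p: p[0], reverse=True):
        candidates = [
            (v, j) for j, v in enumerate(ious)
            if v >= iou_threshold and j not in matched
        ]
        if candidates:
            matched.add(max(candidates)[1])  # best-overlapping free instance
            hits.append(True)
        else:
            hits.append(False)

    # Precision and recall after each prediction.
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)

    # Make precision non-increasing, then integrate it over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Three predictions against two ground-truth instances.
preds = [(0.9, [0.8, 0.0]), (0.8, [0.1, 0.3]), (0.7, [0.0, 0.6])]
ap50 = average_precision(preds, num_gt=2, iou_threshold=0.5)
ap25 = average_precision(preds, num_gt=2, iou_threshold=0.25)
```

In this toy case the second prediction only counts as a match at the looser threshold, so `ap25` is 1.0 while `ap50` is 5/6. The same effect explains why AP25 is consistently higher than AP50 in the table.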

Method                      Setting             AP      AP50    AP25
SGIFormer                   fully-supervised    23.9    37.5    46.6
Mask3D (+ OpenMask3D)       open-vocabulary     -       15.0    -
Segment3D (+ OpenMask3D)    open-vocabulary     -       18.5    -
OpenSplat3D (Ours)          open-vocabulary     16.5    29.7    39.0

Instance segmentation on the ScanNet++ validation split. Our method not only outperforms the other open-vocabulary methods by a large margin in AP50, but also narrows the gap to the state-of-the-art fully-supervised SGIFormer approach.

BibTeX

@InProceedings{piekenbrinck2025opensplat3d,
  title     = {{OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting}},
  author    = {Piekenbrinck, Jens and Schmidt, Christian and Hermans, Alexander and Vaskevicius, Narunas and Linder, Timm and Leibe, Bastian},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages     = {5246--5255},
  year      = {2025}
}