OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting

Overview

[Figure panels: PCA Feature Rendering, Color Rendering]

Abstract

3D Gaussian Splatting (3DGS) has emerged as a powerful representation for neural scene reconstruction, offering high-quality novel view synthesis while maintaining computational efficiency. In this paper, we extend the capabilities of 3DGS beyond pure scene representation by introducing an approach for open-vocabulary 3D instance segmentation without requiring manual labeling, termed OpenSplat3D. Our method leverages feature-splatting techniques to associate semantic information with individual Gaussians, enabling fine-grained scene understanding. We incorporate Segment Anything Model instance masks with a contrastive loss formulation as guidance for the instance features to achieve accurate instance-level segmentation. Furthermore, we utilize language embeddings of a vision-language model, allowing for flexible, text-driven instance identification. This combination enables our system to identify and segment arbitrary objects in 3D scenes based on natural language descriptions. We show results on LERF-mask and LERF-OVS as well as the full ScanNet++ validation set, demonstrating the effectiveness of our approach.
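To make the idea of feature splatting concrete, the sketch below blends per-Gaussian feature vectors along a single ray with the standard front-to-back alpha compositing used for color in 3DGS. It is a minimal illustration, not the authors' renderer: the function name `composite_features` and the list-based inputs are assumptions, and the Gaussians are assumed to be already sorted by depth with their per-pixel opacities evaluated.

```python
def composite_features(features, alphas):
    """Front-to-back alpha blending of per-Gaussian feature vectors.

    features: list of feature vectors, sorted near to far (assumed input).
    alphas:   per-Gaussian opacity at this pixel, each in [0, 1].
    """
    dim = len(features[0])
    out = [0.0] * dim
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for feat, alpha in zip(features, alphas):
        weight = alpha * transmittance
        for d in range(dim):
            out[d] += weight * feat[d]
        transmittance *= 1.0 - alpha
    return out


# Two Gaussians with one-hot features and opacity 0.5 each.
pixel_feature = composite_features([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

The nearer Gaussian contributes more to the pixel feature because the second one is weighted by the remaining transmittance.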

Method

Overview of our proposed pipeline. On the left are the training inputs: posed RGB images, a coarse SfM point cloud for initialization, and the extracted SAM masks. The middle section illustrates instance learning through Gaussian feature field optimization, followed by clustering to obtain coherent 3D instances. On the right, we show the language integration: the top-k informative views are identified per instance, hierarchical crops are constructed, and finally one language embedding per instance is computed.
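The instance learning step can be illustrated with a generic pull/push contrastive loss on rendered pixel features, where the SAM mask of each pixel acts as the grouping signal. The exact loss used by OpenSplat3D is not given on this page, so the formulation below, the function name `contrastive_instance_loss`, and the `margin` parameter are assumptions chosen only to show the principle.

```python
import math


def contrastive_instance_loss(features, mask_ids, margin=1.0):
    """Generic pull/push loss (an assumed form, not the paper's exact loss).

    features: rendered feature vector per sampled pixel.
    mask_ids: SAM mask id per sampled pixel (same length as features).
    """
    # Group pixel features by their SAM mask id.
    groups = {}
    for feat, mid in zip(features, mask_ids):
        groups.setdefault(mid, []).append(feat)

    # Mean feature per mask.
    means = {
        mid: [sum(col) / len(feats) for col in zip(*feats)]
        for mid, feats in groups.items()
    }

    # Pull term: squared distance of each feature to its own mask mean.
    pull = sum(
        math.dist(feat, means[mid]) ** 2 for feat, mid in zip(features, mask_ids)
    ) / len(features)

    # Push term: hinge penalty when two mask means are closer than the margin.
    ids = sorted(means)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    push = 0.0
    if pairs:
        push = sum(
            max(0.0, margin - math.dist(means[a], means[b])) ** 2
            for a, b in pairs
        ) / len(pairs)
    return pull + push
```

Features that are tight within each mask and well separated across masks give a loss of zero. Once the features are trained this way, they can be clustered into 3D instances.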

Experiments

Semantic Segmentation

Method	figurines mIoU	figurines mBIoU	ramen mIoU	ramen mBIoU	teatime mIoU	teatime mBIoU	mean mIoU	mean mBIoU
LERF*	33.5	30.6	28.3	14.7	49.7	42.6	37.2	29.3
LangSplat*	52.8	50.5	50.4	44.7	69.5	65.6	57.6	53.6
Gaussian Grouping*	69.7	67.9	77.0	68.7	71.7	66.1	72.8	67.6
CGC	91.6	88.8	68.7	63.1	80.5	78.9	80.3	76.9
OpenSplat3D (Ours)	92.3	89.4	75.9	68.2	83.7	78.8	84.0	78.8

Semantic segmentation results on the LERF-mask dataset. We report the mean IoU (mIoU) and mean boundary IoU (mBIoU) for each scene and the overall average. Our method achieves the best mean mIoU and mBIoU. It is outperformed only by Gaussian Grouping on the ramen scene and, by 0.1 mBIoU, by CGC on teatime. *: Results as reported in the Gaussian Grouping paper.
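As a reminder of how the numbers in this table are formed, the sketch below computes the IoU of two binary masks and averages per-scene scores. The helper names `iou` and `mean` and the flat-list mask encoding are illustrative assumptions; the boundary variant (mBIoU) is not covered here. The mean column above is consistent with an unweighted average of the three per-scene values.

```python
def iou(pred, gt):
    """Intersection over union of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty masks are treated as a perfect match (a convention, assumed).
    return inter / union if union else 1.0


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Per-scene mIoU of OpenSplat3D from the table (figurines, ramen, teatime).
print(round(mean([92.3, 75.9, 83.7]), 1))  # 84.0
```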

3D Object Selection

Method	figurines mIoU	figurines mAcc.	ramen mIoU	ramen mAcc.	teatime mIoU	teatime mAcc.	waldo_kitchen mIoU	waldo_kitchen mAcc.	mean mIoU	mean mAcc.
LangSplat	10.16	8.93	7.92	11.27	11.38	20.34	9.18	9.09	9.66	12.41
LEGaussians	17.99	23.21	15.79	26.76	19.27	27.12	11.78	18.18	16.21	23.82
OpenGaussian	39.29	55.36	31.01	42.25	60.44	76.27	22.70	31.82	38.36	51.43
OpenSplat3D (Ours)	60.71	85.71	49.20	76.06	73.27	88.14	55.63	77.27	59.70	81.79

3D object selection from textual queries on LERF-OVS. Following OpenGaussian, only the Gaussians responding to the query are rendered, so the rendering does not account for occlusion by other objects in the scene. Accuracy is reported as mAcc@0.25. Note that OpenGaussian fine-tunes parameters per scene for best results.
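The protocol above can be sketched in three small steps: pick the instance whose language embedding best matches the text query, keep only the Gaussians of that instance for rendering, and count a query as correct when its IoU exceeds 0.25. Selecting by maximum cosine similarity, the strict `>` comparison, and all function names are assumptions made for illustration; the embeddings would come from a vision-language model, which is not included here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_instance(text_emb, instance_embs):
    """Return the id of the instance that best matches the text query."""
    return max(instance_embs, key=lambda k: cosine(text_emb, instance_embs[k]))


def gaussians_for_instance(instance_ids, chosen):
    """Indices of the Gaussians assigned to the chosen instance."""
    return [i for i, inst in enumerate(instance_ids) if inst == chosen]


def accuracy_at_iou(ious, threshold=0.25):
    """Fraction of queries whose IoU exceeds the threshold (mAcc@threshold)."""
    return sum(1 for v in ious if v > threshold) / len(ious)


# Toy example with made-up 2D embeddings.
chosen = select_instance([1.0, 0.0], {"mug": [0.9, 0.1], "spoon": [0.0, 1.0]})
kept = gaussians_for_instance(["mug", "spoon", "mug"], chosen)
```

Because only the selected Gaussians are rendered, an object hidden behind another one still produces a full mask, which is why the caption notes that occlusion is not respected.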

3D Instance Segmentation

Method	AP (w/o post-processing)	AP50 (w/o post-processing)	AP25 (w/o post-processing)	AP (with post-processing)	AP50 (with post-processing)	AP25 (with post-processing)
SAM3D	3.9	9.3	22.1	8.4	16.1	30.0
Segment3D	13.0	23.8	38.3	20.2	30.9	42.7
Open3DIS*	-	-	-	20.7	38.6	47.1
OpenSplat3D (Ours)	19.2	37.3	56.2	24.5	41.7	57.1

Class-agnostic instance segmentation on the ScanNet++ validation split. *: Open3DIS uses superpoints produced by Felzenszwalb and Huttenlocher segmentation directly in its pipeline.
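For readers unfamiliar with the AP columns, the following is a simplified sketch of class-agnostic average precision at a single IoU threshold (0.5 for AP50, 0.25 for AP25). It represents each instance as a set of point indices, matches predictions to ground truth greedily in order of confidence, and uses the non-interpolated area under the precision-recall curve. The official benchmark script differs in details, so the function names and matching rule here are assumptions for illustration only.

```python
def set_iou(a, b):
    """IoU of two instances given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(preds, gts, iou_thresh):
    """preds: list of (confidence, point_set); gts: list of point sets."""
    matched = set()
    tp_flags = []
    for _, pred in sorted(preds, key=lambda p: -p[0]):
        # Greedily match to the best still-unmatched ground-truth instance.
        best, best_iou = None, 0.0
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            value = set_iou(pred, gt)
            if value > best_iou:
                best, best_iou = j, value
        if best is not None and best_iou >= iou_thresh:
            matched.add(best)
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Sum the precision at each true positive, normalised by the GT count.
    ap, tp = 0.0, 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank
    return ap / len(gts) if gts else 0.0


gts = [{0, 1, 2, 3}, {4, 5, 6, 7}]
preds = [(0.9, {0, 1, 2, 3}), (0.8, {4, 5}), (0.7, {8, 9})]
ap50 = average_precision(preds, gts, 0.5)
ap75 = average_precision(preds, gts, 0.75)
```

In this toy case the second prediction covers only half of its ground-truth instance, so it counts as a match at the 0.5 threshold but not at 0.75.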

Method	Setting	AP	AP50	AP25
SGIFormer	fully-supervised	23.9	37.5	46.6
Mask3D (+ OpenMask3D)	open-vocabulary	-	15.0	-
Segment3D (+ OpenMask3D)	open-vocabulary	-	18.5	-
OpenSplat3D (Ours)	open-vocabulary	16.5	29.7	39.0

Instance segmentation on the ScanNet++ validation split. Our method not only outperforms the other open-vocabulary methods by a large margin but also narrows the gap to the state-of-the-art fully-supervised SGIFormer approach.

BibTeX

@InProceedings{piekenbrinck2025opensplat3d,
  title     = {{OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting}},
  author    = {Piekenbrinck, Jens and Schmidt, Christian and Hermans, Alexander and Vaskevicius, Narunas and Linder, Timm and Leibe, Bastian},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages     = {5246--5255},
  year      = {2025}
}