We propose OmniObject3D, a large-vocabulary 3D object dataset with a massive collection of high-quality, real-scanned 3D objects, built to facilitate the development of 3D perception, reconstruction, and generation in the real world.
OmniObject3D has several appealing properties:
1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories that share common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations.
2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multi-view rendered images, and multiple real-captured videos.
3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances.
With the vast exploration space offered by OmniObject3D, we carefully set up four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c) neural surface reconstruction, and d) 3D object generation.
Table 1. A comparison between OmniObject3D and other commonly used 3D object datasets. OmniObject3D is the largest among real-world scanned object datasets.
Figure 1. Semantic distribution of our dataset.
OmniObject3D enables robustness analysis of point cloud classification by disentangling two critical out-of-distribution (OOD) challenges introduced in the paper: OOD styles and OOD corruptions.
Figure 2. Analysis on robustness to OOD styles and OOD corruptions.
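As a minimal sketch (not the paper's exact corruption protocol), a typical OOD corruption for point clouds is clipped Gaussian jitter; the values of `sigma` and `clip` below are illustrative:

```python
import random

def jitter_point_cloud(points, sigma=0.01, clip=0.05):
    """Apply clipped Gaussian jitter, a common OOD corruption for point clouds.

    points: list of (x, y, z) tuples. sigma and clip are illustrative
    values following common benchmark conventions.
    """
    jittered = []
    for x, y, z in points:
        # Sample a per-axis Gaussian offset and clip it to [-clip, clip].
        dx = max(-clip, min(clip, random.gauss(0.0, sigma)))
        dy = max(-clip, min(clip, random.gauss(0.0, sigma)))
        dz = max(-clip, min(clip, random.gauss(0.0, sigma)))
        jittered.append((x + dx, y + dy, z + dz))
    return jittered
```

A robustness benchmark would evaluate a classifier on such corrupted clouds at several severity levels (e.g., increasing `sigma`).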
We study several representative methods on OmniObject3D for novel view synthesis (NVS) in two settings: 1) training on a single scene with densely captured images and 2) learning priors across scenes from the dataset to explore the generalization ability of NeRF-style models. We show examples of single-scene NVS by Mip-NeRF.
We show examples of cross-scene NVS by pixelNeRF, MVSNeRF, and IBRNet given 3 views (ft denotes fine-tuned with 10 views).
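The NeRF-style models used in both settings composite colors along each camera ray with volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)), where T_i is the transmittance accumulated before sample i. A minimal sketch of that weighting, with illustrative densities and segment lengths:

```python
import math

def volume_rendering_weights(sigmas, deltas):
    """Compute per-sample compositing weights along a ray.

    sigmas: volume densities at each sample.
    deltas: distances between consecutive samples.
    Returns w_i = T_i * (1 - exp(-sigma_i * delta_i)).
    """
    weights = []
    transmittance = 1.0  # T_i: probability the ray reaches sample i
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights
```

The rendered pixel color is then the weight-averaged sum of the per-sample colors; the weights always sum to at most one.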
Precise surface reconstruction from multi-view images enables a broad range of applications. We include representative methods for both dense-view and sparse-view surface reconstruction. We show examples of dense-view surface reconstruction by NeuS; more results on the sparse-view setting can be found in the paper.
State-of-the-art generative models can directly generate textured 3D meshes. We train GET3D on OmniObject3D and show examples of the generated shapes.
Some concurrent works also focus on building large-scale 3D object datasets: