OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

CVPR 2023 (Award Candidate)

Tong Wu1,2, Jiarui Zhang1,3, Xiao Fu1, Yuxin Wang1,4, Jiawei Ren5, Liang Pan5,
Wayne Wu1, Lei Yang1,3, Jiaqi Wang1, Chen Qian1, Dahua Lin1,2✉, Ziwei Liu5

1Shanghai Artificial Intelligence Laboratory, 2The Chinese University of Hong Kong, 3SenseTime Research,
4Hong Kong University of Science and Technology, 5S-Lab, Nanyang Technological University

Paper Code Dataset Challenge


We propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects, to facilitate the development of 3D perception, reconstruction, and generation in the real world.

OmniObject3D has several appealing properties:
1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations.
2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multi-view rendered images, and multiple real-captured videos.
3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances.
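As a rough illustration of how the per-object modalities listed above could be accessed, the sketch below builds the paths for one scanned object. This is not part of the official OmniObject3D toolkit, and the directory layout is a hypothetical assumption; consult the dataset release for the actual structure.

```python
from pathlib import Path

def object_paths(root: str, category: str, obj_id: str) -> dict:
    """Collect per-object modality paths for one scanned object.

    NOTE: the layout below is a hypothetical sketch, not the
    official OmniObject3D release structure.
    """
    base = Path(root) / category / obj_id
    return {
        "mesh": base / "mesh" / "model.obj",         # textured scanned mesh
        "point_cloud": base / "pcd" / "points.ply",  # sampled point cloud
        "renders": base / "renders",                 # multi-view rendered images
        "video": base / "video" / "capture.mp4",     # real-captured video
    }

paths = object_paths("OmniObject3D", "apple", "apple_001")
print(paths["mesh"])
```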

With the vast exploration space offered by OmniObject3D, we carefully set up four evaluation tracks: a) robust 3D perception, b) novel-view synthesis, c) neural surface reconstruction, and d) 3D object generation.

Statistics and Distribution
Dataset        #Objects   #Classes   R_LVIS (%)
ShapeNet         51k         55         4.1
ModelNet         12k         40         2.4
3D-Future        16k         34         1.3
ABO               8k         63         3.5
Toys4K            4k        105         7.7
CO3D             19k         50         4.2
DTU              124          -         0
ScanObjectNN     15k         15         1.3
GSO               1k         17         0.9
AKB-48            2k         48         1.8
Ours              6k        190        10.8

Table 1. A comparison between OmniObject3D and other commonly used 3D object datasets. OmniObject3D is the largest among all real-world scanned object datasets.

Figure 1. Semantic distribution of our dataset.


Robust 3D Perception

OmniObject3D enables robustness analysis of point cloud classification by disentangling two critical out-of-distribution (OOD) challenges introduced in the paper, i.e., OOD styles and OOD corruptions.
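OOD corruptions of the kind probed above can be simulated with simple point-cloud perturbations. The functions below (Gaussian jitter and random point dropout) are a hedged sketch of two common corruption types, not the specific corruption suite used in the paper.

```python
import numpy as np

def jitter(points: np.ndarray, sigma: float = 0.01, rng=None) -> np.ndarray:
    """Add Gaussian noise to every point (a common OOD corruption)."""
    rng = np.random.default_rng(rng)
    return points + rng.normal(scale=sigma, size=points.shape)

def dropout(points: np.ndarray, ratio: float = 0.3, rng=None) -> np.ndarray:
    """Randomly drop a fraction of the points."""
    rng = np.random.default_rng(rng)
    keep = rng.random(len(points)) >= ratio
    return points[keep]

# Corrupt a toy cloud of 1024 points before feeding it to a classifier.
cloud = np.random.default_rng(0).standard_normal((1024, 3))
corrupted = dropout(jitter(cloud, sigma=0.02, rng=1), ratio=0.3, rng=2)
print(corrupted.shape)
```

Evaluating a classifier trained on clean scans against such corrupted inputs isolates the OOD-corruption axis from the OOD-style axis.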

Figure 2. Analysis on robustness to OOD styles and OOD corruptions.

Novel View Synthesis

We study several representative methods on OmniObject3D for novel view synthesis (NVS) in two settings: 1) training on a single scene with densely captured images and 2) learning priors across scenes from the dataset to explore the generalization ability of NeRF-style models. We show examples of single-scene NVS by Mip-NeRF.

We show examples of cross-scene NVS by pixelNeRF, MVSNeRF, and IBRNet given 3 views (ft denotes fine-tuned with 13 views).
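Novel-view synthesis results like those above are typically scored by comparing each rendered view against the held-out ground-truth image. As a minimal sketch (assuming images normalized to [0, 1]; this is a standard metric, not code from the OmniObject3D benchmark), PSNR can be computed as:

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a rendered and a ground-truth view."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Toy example: a render that is uniformly off by 0.1 from the ground truth.
gt = np.zeros((4, 4, 3))
pred = np.full((4, 4, 3), 0.1)
print(round(psnr(pred, gt), 2))
```

Higher PSNR means a closer match; SSIM and LPIPS are usually reported alongside it.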

Neural Surface Reconstruction

Precise surface reconstruction from multi-view images enables a broad range of applications. We include representative methods for both dense-view and sparse-view surface reconstruction. We show examples of dense-view surface reconstruction by NeuS. More results on the sparse-view setting can be found in the paper.
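Reconstructed surfaces are commonly evaluated against the ground-truth scans by sampling points from both meshes and measuring the Chamfer distance. The brute-force sketch below illustrates the metric (a standard definition, not the benchmark's exact evaluation code, which may differ in sampling and normalization):

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Brute-force O(N*M) pairwise distances; fine for small evaluation sets,
    use a KD-tree for large point counts.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Toy example: two single-point "surfaces" one unit apart.
a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])
print(chamfer_distance(a, b))
```

A lower Chamfer distance indicates a reconstruction closer to the ground-truth scan.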

3D Object Generation

State-of-the-art generative models can directly generate textured 3D meshes. We train GET3D on OmniObject3D and show examples of the generated shapes.

Concurrent Works

Some concurrent works also focus on building large-scale 3D object datasets.

Citation

@inproceedings{wu2023omniobject3d,
  author    = {Wu, Tong and Zhang, Jiarui and Fu, Xiao and Wang, Yuxin and Ren, Jiawei and Pan, Liang and Wu, Wayne and Yang, Lei and Wang, Jiaqi and Qian, Chen and Lin, Dahua and Liu, Ziwei},
  title     = {OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023}
}