Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods in application domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational costs. In addition, many use uniform sampling to reduce ground truth data requirements for learning, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture, requiring fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To leverage the use of limited annotated data samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU, using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, based on a 2.3× reduction in model parameters and 641× fewer multiply-add operations, whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).
Our Less is More (LiM3D) based methodologies require less training data and less training computation whilst offering (more) improved accuracy over contemporary state-of-the-art approaches.
Our Sparse Depthwise Separable Convolution (SDSC) module reduces trainable network parameters and the likelihood of over-fitting, thereby facilitating a deeper network architecture.
SDSC is the composition of a Sparse Depthwise Convolution (SDC) and a Sparse Pointwise Convolution (SPC), namely SDSC(M, N, D_k, s = 1) = SDC ◦ SPC. Using a sparse voxelized input representation and a series of such SDSC sub-modules, we construct the popular Cylinder3D sub-architectures within our overall Mean Teacher architectural design.
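To illustrate the parameter saving behind this factorization, the sketch below is a minimal dense PyTorch analogue of the depthwise-then-pointwise decomposition. The class name is hypothetical and dense convolutions stand in for sparse ones; the actual SDSC module applies the same factorization on sparse voxel grids via a sparse convolution library.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Dense analogue of the SDSC factorisation (illustration only):
    a depthwise convolution (one spatial filter per input channel,
    via groups=M) followed by a pointwise 1x1x1 convolution that
    mixes the M channels into N output channels."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: int = 3, stride: int = 1):
        super().__init__()
        # Depthwise: spatial filtering applied channel-by-channel.
        self.depthwise = nn.Conv3d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_channels, bias=False)
        # Pointwise: 1x1x1 convolution combining channels.
        self.pointwise = nn.Conv3d(in_channels, out_channels, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Example: 32 -> 64 channels on a 16^3 grid.
block = DepthwiseSeparableConv3d(32, 64)
out = block(torch.randn(1, 32, 16, 16, 16))  # -> (1, 64, 16, 16, 16)
```

For M input channels, N output channels and a D_k³ kernel, this replaces the M·N·D_k³ weights of a standard convolution with M·D_k³ + M·N, which is the source of the overall parameter reduction.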
Our Spatio-Temporal Redundant Frame Downsampling (ST-RFD) strategy extracts a maximally diverse data subset for training by removing temporal redundancy and hence future annotation requirements.
Our proposed ST-RFD strategy considers the redundancy attributable to temporary periods of stationary capture (e.g. due to traffic) or multi-pass repetition (e.g. due to loop closure).
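As a simple illustration of the underlying idea, the sketch below keeps a frame only once the sensor has moved a minimum distance since the last kept frame, discarding near-stationary or repeated-pass captures. The pose-displacement criterion, function name, and threshold are assumptions for illustration, not the exact redundancy measure used by ST-RFD.

```python
import numpy as np

def subsample_redundant_frames(poses: np.ndarray,
                               min_translation: float = 0.5) -> list[int]:
    """Keep frame t only if the ego-sensor has translated at least
    `min_translation` metres since the previously kept frame.
    `poses` is a (T, 3) array of ego positions (illustrative sketch)."""
    kept = [0]
    for t in range(1, len(poses)):
        if np.linalg.norm(poses[t] - poses[kept[-1]]) >= min_translation:
            kept.append(t)
    return kept

# Example: a sequence that sits in traffic for 3 frames, then moves.
poses = np.array([[0, 0, 0], [0.01, 0, 0], [0.02, 0, 0], [1.5, 0, 0]])
print(subsample_redundant_frames(poses))  # [0, 3]
```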
We propose a novel soft pseudo-labeling method informed by LiDAR reflectivity as a proxy for in-scene object material properties, facilitating effective use of limited data annotation.
We append our point-wise reflectivity to the existing point features in order to enhance performance in the presence of false or non-existent pseudo-labels at the distillation stage.
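A minimal sketch of this feature augmentation, assuming per-point reflectivity has already been computed (function and array names are hypothetical):

```python
import numpy as np

def append_reflectivity(points_xyz: np.ndarray,
                        reflectivity: np.ndarray) -> np.ndarray:
    """Concatenate per-point reflectivity onto the xyz coordinates,
    turning (N, 3) geometric features into (N, 4) network inputs.
    Shapes and names are assumptions for this sketch."""
    return np.concatenate([points_xyz, reflectivity[:, None]], axis=1)

features = append_reflectivity(np.zeros((1024, 3)), np.ones(1024))
print(features.shape)  # (1024, 4)
```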
Unreliable pseudo-labels (i.e., voxels whose predictions have high entropy) can still offer discriminative information: voxels that correlate to unreliable predictions can alternatively be treated as negative samples for their improbable categories.
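The sketch below shows one way to realise this entropy-based partition (in the spirit of U2PL): the highest-entropy fraction of voxels is flagged as unreliable, to be used as negative samples for unlikely classes, while the remainder supply pseudo-labels. The percentile threshold and function name are illustrative assumptions.

```python
import torch

def split_pseudo_labels(logits: torch.Tensor, percent: float = 20.0):
    """Partition per-voxel predictions by entropy.
    logits: (V, C) raw class scores for V voxels.
    Returns (pseudo_labels, unreliable_mask); voxels in the top
    `percent` percentile of entropy are flagged as unreliable."""
    prob = torch.softmax(logits, dim=1)                               # (V, C)
    entropy = -(prob * torch.log(prob.clamp_min(1e-12))).sum(dim=1)   # (V,)
    thresh = torch.quantile(entropy, 1.0 - percent / 100.0)
    unreliable = entropy > thresh
    pseudo_labels = prob.argmax(dim=1)
    return pseudo_labels, unreliable
```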
If you make use of this work in any way, please reference the following paper in any report, publication, presentation, software release or other associated materials:
Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation (Li Li, Hubert P. H. Shum and Toby P. Breckon), In IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2023. [homepage] [pdf] [video] [poster]
@InProceedings{li23lim3d,
  title     = {Less Is {{More}}: {{Reducing Task}} and {{Model Complexity}} for {{3D Point Cloud Semantic Segmentation}}},
  author    = {Li, Li and Shum, Hubert P. H. and Breckon, Toby P.},
  keywords  = {point cloud, semantic segmentation, sparse convolution, depthwise separable convolution, autonomous driving},
  year      = {2023},
  month     = jun,
  publisher = {{IEEE}},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
We would additionally like to thank the authors of the open source codebases ScribbleKITTI, Cylinder3D, and U2PL. We also thank Neelanjan Bhowmik, Tanqiu Qiao, and Lingdong Kong for their discussions.