Less is More

Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Department of Computer Science · Department of Engineering

Overview of LiM3D: semi-supervised LiDAR semantic segmentation with unreliable pseudo-labels proceeds in three stages: training, pseudo-labeling, and distillation with unreliable learning.

Abstract

Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semi-supervised semantic segmentation methods with application to domains such as autonomous driving. Existing work very often employs relatively large segmentation backbone networks to improve segmentation accuracy, at the expense of computational costs. In addition, many use uniform sampling to reduce the quantity of ground-truth data required for learning, often resulting in sub-optimal performance. To address these issues, we propose a new pipeline that employs a smaller architecture, requiring fewer ground-truth annotations to achieve superior segmentation accuracy compared to contemporary approaches. This is facilitated via a novel Sparse Depthwise Separable Convolution module that significantly reduces the network parameter count while retaining overall task performance. To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. To leverage the use of limited annotated data samples, we further propose a soft pseudo-label method informed by LiDAR reflectivity. Our method outperforms contemporary semi-supervised work in terms of mIoU, using less labeled data, on the SemanticKITTI (59.5@5%) and ScribbleKITTI (58.1@5%) benchmark datasets, based on a 2.3× reduction in model parameters and 641× fewer multiply-add operations, whilst also demonstrating significant performance improvement on limited training data (i.e., Less is More).

Video

Highlights 

Less is More (LiM3D)

Our Less is More (LiM3D) based methodologies require less training data and less training computation whilst offering (more) improved accuracy over contemporary state-of-the-art approaches.

mIoU performance (%) against parameters and multiply-add operations on SemanticKITTI (fully annotated) and ScribbleKITTI (weakly annotated) under the 5% sampling protocol.

Sparse Depthwise Separable Convolution (SDSC)

Our Sparse Depthwise Separable Convolution (SDSC) module reduces the trainable network parameter count and the likelihood of over-fitting, which in turn facilitates a deeper network architecture.

Illustration of the SDSC convolution module: SDSC(M, N, D_k, s = 1) = SDC ◦ SPC.

SDSC is the composition of a Sparse Depthwise Convolution (SDC) and a Sparse Pointwise Convolution (SPC), namely SDSC(M, N, D_k, s = 1) = SDC ◦ SPC. Using a sparse voxelized input representation and a series of such SDSC sub-modules, we construct the popular Cylinder3D sub-architectures within our overall Mean Teacher architectural design.
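Below is a minimal dense-tensor sketch of this factorization in PyTorch. The actual module operates on sparse voxel tensors via sparse convolutions, so the nn.Conv3d-based class here is an illustrative stand-in, with class and argument names of our own choosing:

import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """Dense stand-in for SDSC(M, N, D_k, s): depthwise followed by pointwise."""
    def __init__(self, m, n, d_k, s=1):
        super().__init__()
        # Depthwise: one D_k^3 filter per input channel (groups=m), so the
        # spatial filtering costs m * D_k^3 parameters instead of m * n * D_k^3.
        self.depthwise = nn.Conv3d(m, m, d_k, stride=s, padding=d_k // 2,
                                   groups=m, bias=False)
        # Pointwise: 1x1x1 convolution mixing the m channels into n.
        self.pointwise = nn.Conv3d(m, n, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

For M input channels, N output channels and kernel size D_k, the factorization uses roughly M·D_k³ + M·N weights versus M·N·D_k³ for a standard 3D convolution, which is the source of the parameter reduction.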


Spatio-Temporal Redundant Frame Downsampling (ST-RFD) Strategy

Our Spatio-Temporal Redundant Frame Downsampling (ST-RFD) strategy extracts a maximally diverse data subset for training by removing temporal redundancy, and hence reduces future annotation requirements.

Overview of our proposed Spatio-Temporal Redundant Frame Downsampling approach.

Our proposed ST-RFD strategy targets the temporal redundancy attributable to periods of stationary capture (e.g. due to traffic) or multi-pass repetition (e.g. due to loop closure).

Illustration of LiDAR frame temporal correlation as [# frame ID] redundancy with 5% sampling on SemanticKITTI (sequence 00), using uniform sampling (selected frames circled in blue) and our ST-RFD strategy (circled in red).
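As a rough illustration of the idea, the sketch below greedily drops frames that have barely moved relative to the last kept frame, then subsamples the survivors uniformly. The pose-displacement redundancy test and the min_disp threshold are simplifying assumptions of ours, standing in for the inter-frame correlation measure used by ST-RFD:

import numpy as np

def st_rfd_downsample(poses, budget, min_disp=0.5):
    # poses: (T, 3) per-frame sensor positions from the dataset odometry.
    # Frames displaced less than min_disp metres from the last kept frame
    # are treated as temporally redundant (e.g. stationary capture).
    kept = [0]
    for t in range(1, len(poses)):
        if np.linalg.norm(poses[t] - poses[kept[-1]]) >= min_disp:
            kept.append(t)
    # Spread the remaining annotation budget evenly over the survivors.
    idx = np.linspace(0, len(kept) - 1, num=min(budget, len(kept)))
    return [kept[int(i)] for i in idx.round()]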

LiDAR Reflectivity & Unreliable Soft Pseudo-Labeling

We propose a novel soft pseudo-labeling method informed by LiDAR reflectivity as a proxy to in-scene object material properties, facilitating effective use of limited data annotation.

We append our point-wise reflectivity to the existing point features in order to enhance performance in the presence of false or non-existent pseudo-labels at the distillation stage.

Coarse histograms of Reflec-TTA bins (not to scale). We apply various sizes of reflec-bins in cylindrical coordinates to analyze the intrinsic point distribution of the LiDAR sensor at varying resolutions (shown in red, green and blue).
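The sketch below computes per-point multi-resolution reflec-bin features; the bin counts, the radius-only cylindrical binning, and the per-bin mean statistic are illustrative assumptions of ours rather than the paper's exact formulation:

import numpy as np

def reflec_bin_features(points, reflec, n_bins=(8, 16, 32)):
    # points: (P, 3) xyz; reflec: (P,) per-point reflectivity.
    # For each resolution, bucket points by cylindrical radius rho and
    # attach the containing bin's mean reflectivity back to each point.
    rho = np.hypot(points[:, 0], points[:, 1])
    feats = []
    for b in n_bins:
        edges = np.linspace(0.0, rho.max() + 1e-6, b + 1)
        bin_id = np.digitize(rho, edges) - 1          # in [0, b)
        sums = np.bincount(bin_id, weights=reflec, minlength=b)
        counts = np.bincount(bin_id, minlength=b).clip(min=1)
        feats.append((sums / counts)[bin_id])         # per-point bin mean
    return np.stack(feats, axis=1)                    # (P, len(n_bins))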

Unreliable pseudo-labels (i.e., voxels whose predictions have high entropy) can still offer discriminative information: voxels with unreliable predictions can instead be treated as negative samples for their improbable categories, as sketched below the figure.

Illustration of unreliable pseudo-labels. Left: entropy predicted from an unlabeled point cloud, with lower entropy shown in greener colors. Right: category-wise probability of an unreliable prediction; only the top-4 and last-4 probabilities are shown.
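One way to realize this is to partition voxels by prediction entropy and, for the unreliable ones, record their top-k categories so that everything outside them can serve as negatives. In the sketch below, the quantile split, keep_ratio and k = 4 are illustrative choices of ours:

import torch

def split_reliable_unreliable(logits, keep_ratio=0.8, k=4):
    # logits: (V, C) per-voxel class scores from the teacher network.
    probs = torch.softmax(logits, dim=1)                      # (V, C)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(1)  # (V,)

    # Low-entropy voxels keep their argmax as a (soft) pseudo-label.
    reliable = entropy <= torch.quantile(entropy, keep_ratio)
    pseudo_labels = probs.argmax(dim=1)

    # For unreliable voxels, categories outside the top-k are improbable
    # and can be used as negative samples in the distillation loss.
    topk = probs.topk(k, dim=1).indices                       # (V, k)
    return pseudo_labels, reliable, topk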

Citation

If you make use of this work in any way, please reference the following paper in any report, publication, presentation, software release or other associated materials:

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation (Li Li, Hubert P. H. Shum and Toby P. Breckon), In IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2023. [homepage] [pdf] [video] [poster]

@InProceedings{li23lim3d,
  title      =    {Less Is {{More}}: {{Reducing Task}} and {{Model Complexity}} for {{3D Point Cloud Semantic Segmentation}}},
  author     =    {Li, Li and Shum, Hubert P. H. and Breckon, Toby P.},
  keywords   =    {point cloud, semantic segmentation, sparse convolution, depthwise separable convolution, autonomous driving},
  year       =    {2023},
  month      =    jun,
  publisher  =    {{IEEE}},
  booktitle  =    {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
} 

Acknowledgements

We additionally thank the authors of the open-source codebases ScribbleKITTI, Cylinder3D, and U2PL. We also thank Neelanjan Bhowmik, Tanqiu Qiao, and Lingdong Kong for their discussions.