GitHub Link
The GitHub link is
https://github.com/darrenjkt/ms3d
Introduction
The GitHub repository "darrenjkt/MS3D" contains the code for the MS3D framework, which auto-labels point cloud sequences for 3D object detection. MS3D uses a self-training approach to generate high-quality labels for training 3D detectors for vehicles and pedestrians across lidars of varying density, and the generated labels are robust to different lidar types and detector architectures. Notable benefits include flexible ensemble customization, compatibility with existing 3D detectors, and real-time inference with no architectural modifications. The repository includes installation instructions, usage guidelines, model results for different target domains, and pre-trained source models. It is released under the Apache 2.0 license, which permits use and citation for research purposes, and the associated research paper is available for reference.
MS3D++ provides a straightforward approach to domain adaptation by generating high-quality pseudo-labels, enabling the adaptation of 3D detectors to a diverse range of lidar types, regardless of their density.
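The core idea above can be sketched in a few lines. This is an illustrative simplification, not MS3D's actual API: an ensemble of pre-trained detectors is run on unlabeled target-domain frames, and only high-confidence detections are kept as pseudo-labels for self-training. The function name and score threshold here are hypothetical.

```python
# Hypothetical sketch of pseudo-label generation for self-training.
# A real pipeline (as in MS3D) would also fuse boxes across detectors
# and frames; here we only illustrate the confidence-filtering step.

def generate_pseudo_labels(detections, score_thresh=0.6):
    """detections: list of (box, score) pairs from an ensemble of
    pre-trained detectors on one unlabeled point cloud frame, where
    box = (x, y, z, length, width, height, heading).
    Returns the boxes kept as pseudo-labels."""
    return [box for box, score in detections if score >= score_thresh]

# Two candidate detections: one confident, one not.
preds = [((0.0, 1.0, 2.0, 4.5, 1.9, 1.6, 0.1), 0.85),
         ((5.0, 2.0, 1.8, 4.2, 1.8, 1.5, 0.0), 0.40)]
labels = generate_pseudo_labels(preds)
# Only the 0.85-score box survives the 0.6 threshold.
```

The retained boxes would then serve as training targets for fine-tuning a detector on the target domain, taking the place of human annotations.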
Content
This is the official code release for MS3D, a simple self-training (a.k.a. auto-labeling) framework for vehicles and pedestrians that generates high-quality labels for training 3D detectors on a variety of lidars, regardless of their density. Simply training VoxelRCNN on the Waymo dataset with our generated labels achieves 70.3 BEV AP for vehicle detection on the official validation set, only 3.5 BEV AP below training with human-annotated labels. Read our papers to find out more.

Our box fusion method, KBF, can also be used for detector ensembling in a supervised setting and can outperform Weighted Box Fusion (WBF). See our first MS3D paper for comparison results and a simple demo here.

Please refer to INSTALL.md for the installation of MS3D.

For all tables below, "GT-FT" refers to fine-tuning the pre-trained detector using ground-truth labels from the target domain. Results are reported at IoU=0.7 evaluated at 40 recall levels (R40). Refer to our paper for detailed results.

Models for target-nuscenes can be downloaded here. We also provide MS3D results for fine-tuning with multi-frame detection, as is common for nuScenes models, to demonstrate that performance can be boosted further. All models below use SECOND-IoU.

Models for target-lyft can be downloaded here. As with nuScenes, we show multi-frame detection results for MS3D. All models below use SECOND-IoU.

Due to the Waymo Dataset License Agreement, we do not provide links to models trained on Waymo data. You can train your own model using our provided configs. If you want to download the models, please send me an email with your name, institute, a screenshot of the Waymo dataset registration confirmation email, and your intended usage. Please note that the Waymo Open Dataset is under a strict non-commercial license, so we are not allowed to share a model with you if it will be used for any profit-oriented activities.

We provide models trained on the source-domain data used in our experiments.
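KBF's exact formulation is given in the MS3D papers. As a rough illustration of what box fusion for detector ensembling involves, here is a simplified confidence-weighted average of overlapping boxes (a WBF-style reduction, not the real KBF), with hypothetical names and box layout:

```python
# Simplified confidence-weighted box fusion (illustration only, not KBF).
# A cluster is a group of overlapping detections of the same object,
# each box = (x, y, z, length, width, height, heading).

def fuse_boxes(cluster):
    """Fuse a cluster of (box, score) pairs into one box by averaging
    each box parameter, weighted by detection confidence. Returns the
    fused box and the maximum score in the cluster."""
    total = sum(score for _, score in cluster)
    fused = tuple(
        sum(box[i] * score for box, score in cluster) / total
        for i in range(7)
    )
    return fused, max(score for _, score in cluster)

# Two overlapping detections of the same vehicle from two detectors.
cluster = [((0.0, 0.0, 1.0, 4.0, 1.8, 1.5, 0.0), 0.9),
           ((0.2, 0.1, 1.0, 4.2, 1.8, 1.5, 0.0), 0.6)]
box, score = fuse_boxes(cluster)
```

Note that naively averaging headings breaks down when boxes disagree by ~180 degrees; handling such cases (and weighting boxes more carefully) is part of what a real fusion method like KBF or WBF must address.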
nuScenes pre-trained models can be downloaded here. Lyft pre-trained models can be downloaded here. For Waymo, please send me an email if you would like to download the source-trained models we used.

MS3D is released under the Apache 2.0 license. If you find this project useful in your research, please give us a star and consider citing:
