3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking

Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning.

Paper and LLMs Autonomous Vehicles

GitHub Link

The GitHub link is https://github.com/dsx0511/3dmotformer

Introduce

The repository "3DMOTFormer" is the official implementation of the ICCV2023 paper titled "3DMOTFormer Graph Transformer for Online 3D Multi-Object Tracking." The paper addresses the challenge of accurate 3D object tracking for autonomous vehicles. It introduces a learned framework called 3DMOTFormer that leverages the transformer architecture. The framework uses an Edge-Augmented Graph Transformer to handle frame-by-frame reasoning on track-detection graphs and performs data association through edge classification. To mitigate the gap between training and inference, an innovative online training strategy is proposed. The approach achieves state-of-the-art results on nuScenes validation and test data using CenterPoint detections. The repository provides installation instructions and data preparation steps for replication. Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning.

Content

Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning. Based on the substantial progress in object detection in recent years, the tracking-by-detection paradigm has become a popular choice due to its simplicity and efficiency. State-of-the-art 3D multi-object tracking (MOT) works typically rely on non-learned model-based algorithms such as Kalman Filter but require many manually tuned parameters. On the other hand, learning-based approaches face the problem of adapting the training to the online setting, leading to inevitable distribution mismatch between training and inference as well as suboptimal performance. In this work, we propose 3DMOTFormer, a learned geometry-based 3D MOT framework building upon the transformer architecture. We use an Edge-Augmented Graph Transformer to reason on the track-detection bipartite graph frame-by-frame and conduct data association via edge classification. To reduce the distribution mismatch between training and inference, we propose a novel online training strategy with autoregressive and recurrent forward pass as well as sequential batch optimization. Using CenterPoint detections, our approach achieves state-of-the-art 71.2% and 68.2% AMOTA on nuScenes validation and test split. In addition, a trained 3DMOTFormer model generalizes well across different object detectors.

Alternatives & Similar Tools

Free Google Gemini: the best largest and most capable AI model Free

Google Gemini, a multimodal AI by DeepMind, processes text, audio, images, and more. Gemini outperforms in AI benchmarks, is optimized for varied devices, and has been tested for safety and bias, adhering to responsible AI practices.

Visit →

Video ReTalking-focuses on audio-based lip synchronization for talking head video editing Open Source

Video ReTalking, advanced real-world talking head video according to input audio, producing a high-quality

Visit →

UniSim-Chat Control Video and Virtual simulation Open Source

Then transplant it to the real world to solve complex problems

Visit →

LongLLaMA-handle very long text contexts, up to 256,000 tokens Open Source

LongLLaMA is a large language model designed to handle very long text contexts, up to 256,000 tokens. It's based on OpenLLaMA and uses a technique called Focused Transformer (FoT) for training. The repository provides a smaller 3B version of LongLLaMA for free use. It can also be used as a replacement for LLaMA models with shorter contexts.

Visit →

LLaVA-LLMs designed to connect a vision encoder with a language model Open Source

Large Language and Vision Assistant

Visit →

Ntropy Insights- Save 80% on underwriting businesses everywhere Freemium

Use bank data and Ntropy's AI. Parse bank feeds and statements, extract revenue and COGs, automatically re-create a P&L within milliseconds. Any industry, any geo.

Visit →