VLSOT
Monocular-Video-Based 3D Visual Language Tracking
- Visual-Language Tracking (VLT) is an emerging paradigm that bridges the human-machine performance gap by integrating visual and linguistic cues, extending single-object tracking to text-driven video comprehension.
- However, existing VLT research remains confined to 2D spatial domains, lacking the capability for 3D tracking in monocular video—a task traditionally reliant on expensive sensors (e.g., point clouds, depth measurements, radar) without corresponding language descriptions for their outputs.
- The code is publicly available (https://github.com/hongkai-wei/Mono3DVLT), advancing low-cost monocular 3D tracking with language grounding.

Submission Guidelines
Mono3DVLT-V2X Dataset: After applying, you will receive the download link for the dataset's annotation file. To avoid infringement, please obtain the corresponding images from the V2X-Seq dataset: "V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting."
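3D Object Detection Prediction File Format Specification
1. File naming rules
Prediction files must be named in the following format:
{sample_id}_pred_box0_.txt
Examples:
sample001_pred_box0_.txt, car_003_pred_box0_.txt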
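The naming rule can be checked mechanically before submission. The following is a minimal sketch, assuming sample IDs consist only of letters, digits, and underscores (the specification does not state the allowed characters). `prediction_filename` and `parse_sample_id` are hypothetical helper names, not part of any official toolkit.

```python
import re

# Assumption: sample IDs contain only letters, digits, and underscores.
_PRED_NAME = re.compile(r"^(?P<sample_id>[A-Za-z0-9_]+)_pred_box0_\.txt$")


def prediction_filename(sample_id: str) -> str:
    """Build the file name required for one sample's predictions."""
    return f"{sample_id}_pred_box0_.txt"


def parse_sample_id(filename: str):
    """Return the sample ID embedded in a prediction file name, or None."""
    match = _PRED_NAME.match(filename)
    return match.group("sample_id") if match else None
```

A file name that drops the trailing underscore before `.txt` makes `parse_sample_id` return `None`, which is a common way to get this format wrong.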
2. File content format
Each file contains 30 lines, one per frame:
1 x_min y_min z_min x_max y_max z_max
2 x_min y_min z_min x_max y_max z_max
3 x_min y_min z_min x_max y_max z_max
...
30 x_min y_min z_min x_max y_max z_max
Format description
Column | Content | Description |
---|---|---|
1 | 1-30 | Frame index (from 1 to 30) |
2-7 | x_min y_min z_min x_max y_max z_max | 3D bounding box coordinates |
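The layout above (one frame per line, seven columns) can be read back with a short parser. This is a sketch only: it assumes columns are separated by whitespace, frames appear in ascending order from 1, and blank lines may be ignored. `parse_prediction_text` is a hypothetical name.

```python
def parse_prediction_text(text: str, num_frames: int = 30) -> dict:
    """Parse prediction file content into {frame_index: six-float tuple}.

    Assumptions (not stated in the specification): columns are separated
    by whitespace and frames are listed in ascending order starting at 1.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != num_frames:
        raise ValueError(f"expected {num_frames} lines, got {len(lines)}")

    boxes = {}
    for expected_index, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(
                f"line {expected_index}: expected 7 columns, got {len(fields)}"
            )
        frame = int(fields[0])
        if frame != expected_index:
            raise ValueError(
                f"line {expected_index}: unexpected frame index {frame}"
            )
        # Columns 2-7: x_min y_min z_min x_max y_max z_max
        boxes[frame] = tuple(float(v) for v in fields[1:])
    return boxes
```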
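3. Coordinate requirements
- Coordinate order: x_min ≤ x_max, y_min ≤ y_max, and z_min ≤ z_max must hold
- Data type: floating point; 4 decimal places recommended
- Unit: meters
- Coordinate system: 3D world coordinate system
Note: Coordinate values must use a period (.) as the decimal separator, not a comma (,).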
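One way to satisfy these requirements when writing a line is to take the per-axis minimum and maximum of two opposite box corners and format with a fixed precision. The sketch below assumes the tracker produces two opposite corners in world coordinates (meters). `format_box_line` is a hypothetical helper.

```python
def format_box_line(frame_index: int, corner_a, corner_b) -> str:
    """Format one frame as 'index x_min y_min z_min x_max y_max z_max'.

    corner_a and corner_b are any two opposite corners (x, y, z) of the
    axis-aligned box; taking per-axis min/max guarantees the ordering.
    """
    mins = [min(a, b) for a, b in zip(corner_a, corner_b)]
    maxs = [max(a, b) for a, b in zip(corner_a, corner_b)]
    # The '.4f' format spec is locale-independent, so the decimal
    # separator is always a period and there are always 4 decimals.
    values = " ".join(f"{v:.4f}" for v in mins + maxs)
    return f"{frame_index} {values}"
```

For example, `format_box_line(3, (2.5, 0.0, -1.0), (1.0, 0.5, 1.25))` returns `3 1.0000 0.0000 -1.0000 2.5000 0.5000 1.2500`.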
4. Example file content
1 0.0413 -0.0061 -0.3771 0.4055 -0.2095 0.2607
2 0.0421 -0.0060 -0.3745 0.4040 -0.2057 0.2586
3 0.0423 -0.0060 -0.3760 0.4028 -0.2031 0.2531
4 0.0434 -0.0060 -0.3736 0.4024 -0.2010 0.2514
5 0.0441 -0.0060 -0.3724 0.4011 -0.1982 0.2453
...
30 0.0605 -0.0056 -0.3588 0.3789 -0.1567 0.1814
5. Folder structure
Submitted predictions should be organized as follows:
pred_folder/
├── 1_pred_box0_.txt
├── 2_pred_box0_.txt
├── 3_pred_box0_.txt
└── ...
Recommendation: Compress the entire folder as a ZIP archive before submitting, with a file name such as submission_teamname.zip
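Packaging the folder can be scripted with Python's standard `zipfile` module. This is a minimal sketch, assuming the archive should unpack to a single top-level folder holding the prediction files (the guidelines only say to compress the whole folder). `zip_predictions` and its parameters are hypothetical names.

```python
import zipfile
from pathlib import Path


def zip_predictions(pred_folder, team_name: str, out_dir=".") -> Path:
    """Compress all prediction files into submission_<team_name>.zip."""
    pred_dir = Path(pred_folder).resolve()
    archive = Path(out_dir) / f"submission_{team_name}.zip"

    files = sorted(pred_dir.glob("*_pred_box0_.txt"))
    if not files:
        raise FileNotFoundError(f"no prediction files found in {pred_dir}")

    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Assumption: keep the folder name as the top-level entry so
            # the archive unpacks to pred_folder/<files>.
            zf.write(path, arcname=f"{pred_dir.name}/{path.name}")
    return archive
```

Calling `zip_predictions("pred_folder", "teamname")` writes `submission_teamname.zip` to the current directory. Files that do not match the naming pattern are left out of the archive.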
Method Leaderboard
7 methods, 6 metrics
This leaderboard shows methods that are online and have submitted results. Methods are ranked based on their performance metrics.
Method | Last submission | SR@0.5 (higher is better) | SR@0.7 (higher is better) | AOR (higher is better) | PR@1.0 (higher is better) | ACE (lower is better) | PR@0.5 (higher is better) |
---|---|---|---|---|---|---|---|
Mono3DVLT-MT | 2025-08-07 | 81.6300 | 68.9400 | 85.1200 | 81.5600 | 0.5210 | 62.3600 |
Mono3DVG-TR | 2025-08-31 | 71.7500 | 63.4700 | 79.1300 | 75.8900 | 0.5940 | 58.9100 |
JointNLT+backproj | 2025-08-31 | 61.4000 | 51.7000 | 68.3100 | 70.4300 | 0.6970 | 53.6300 |
TransVG+backproj | 2025-08-31 | 54.5000 | 41.6200 | 58.6200 | 66.7800 | 0.7830 | 47.5600 |
ReSC+backproj | 2025-08-31 | 50.3300 | 38.4100 | 53.4300 | 68.5200 | 0.7920 | 47.5600 |
ZSGNet+backproj | 2025-08-31 | 37.2900 | 25.3700 | 40.6900 | 37.4100 | 1.1320 | 16.3000 |
FAOA+backproj | 2025-08-31 | 35.4300 | 29.6200 | 41.1800 | 43.4000 | 1.0590 | 25.3200 |