This repository contains code and experiment outputs for a UAV/drone detection benchmark under synthetic adverse-weather surveillance conditions.
The project is based on a synthetic data augmentation pipeline for surveillance videos. Clear-weather scenes are transformed into adverse-weather conditions such as fog, rain, and snow, then evaluated with a fine-tuned RT-DETRv2 detector.
The main goal of this project is not to propose a new object detection architecture. Instead, it focuses on generating synthetic adverse-weather surveillance videos and evaluating how those conditions affect UAV detection reliability.
Pipeline:
Clear Weather Image
-> Drone Added with Image-to-Image Generation
-> Adverse Weather Added with Image-to-Image Generation
-> Image-to-Video Generation
-> Synthetic Weather Video Dataset
-> RT-DETRv2 Detector Evaluation
The detector used in this project is RT-DETRv2 fine-tuned on the DUT Anti-UAV dataset.
Model configuration:
Detector: RT-DETRv2
Backbone: PResNet / ResNet-18
Input size: 640 x 640
Number of classes: 1 (drone)
Decoder layers: 3
Number of queries: 300
Feature strides: [8, 16, 32]
Parameters: 20,083,028
Training epochs: 180
Best epoch: 123
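The parameter count can be checked directly from the exported checkpoint. The sketch below is a rough illustration, assuming a standard torch.load workflow; the "model" key is an assumption about the checkpoint layout (inspect ckpt.keys() first), and the tensor total may deviate slightly from the reported parameter count if buffers are stored alongside weights:

# Rough sketch: count tensor elements stored in the fine-tuned checkpoint.
# The "model" key below is an assumption, not a documented layout.
import torch

ckpt = torch.load("output/rtdetrv2_r18vd_180e_dut/best.pth", map_location="cpu")
state = ckpt["model"] if "model" in ckpt else ckpt  # fall back to raw state dict
total = sum(v.numel() for v in state.values() if torch.is_tensor(v))
print(f"tensor elements in checkpoint: {total:,}")  # expected near 20,083,028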
Validation performance on DUT Anti-UAV:
mAP@[0.50:0.95]: 69.97%
AP50: 95.76%
AP75: 80.34%
These validation metrics indicate that the detector is reasonably reliable on the DUT Anti-UAV validation set; within this project, however, the detector serves mainly as the evaluation model for the synthetic weather benchmark.
The detector was fine-tuned using the DUT Anti-UAV detection dataset converted to COCO detection format.
This repository also includes a small supplementary synthetic weather dataset:
dataset/
|-- fog/
|-- original image/
|-- rain/
`-- snow/
The supplementary dataset contains generated weather images and MP4 videos used for qualitative visualization and detector-behavior analysis. It does not include ground-truth bounding box annotations for the generated videos.
Expected dataset structure:
datasets/DUT_AntiUAV_Det/
|-- train/
| |-- img/
| `-- xml/
|-- val/
| |-- img/
| `-- xml/
`-- annotations/
|-- train.json
`-- val.json
Dataset configuration:
Config: submission_code/configs/dataset/dut_antiuav_detection.yml
Format: COCO detection
num_classes: 1
remap_mscoco_category: False
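Before training, it can help to sanity-check the converted annotations against this single-class configuration. A minimal sketch, assuming pycocotools is installed; the category id of 1 is an assumption about what fix_coco_labels.py produces:

# Sanity-check the converted COCO annotations (assumes pycocotools).
from pycocotools.coco import COCO

coco = COCO("datasets/DUT_AntiUAV_Det/annotations/val.json")
cats = coco.loadCats(coco.getCatIds())
assert len(cats) == 1, f"expected a single category, got {cats}"
print(cats)  # e.g. [{'id': 1, 'name': 'drone'}] (id is an assumption)
print(len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "annotations")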
submission_code/tools/infer_video.py
Runs inference on a single video and exports an annotated MP4 with bounding boxes.
submission_code/tools/infer_videos_to_json.py
Runs inference on multiple videos and exports frame-level JSON results.
submission_code/tools/compute_confidence_drop.py
Computes detection rate, mean confidence score, and confidence drop from exported JSON results.
submission_code/configs/rtdetrv2/rtdetrv2_r18vd_180e_dut.yml
Main training configuration for the 180-epoch DUT Anti-UAV experiment.
submission_code/dataset_utils/voc2coco.py
submission_code/dataset_utils/fix_coco_labels.py
Utilities for converting VOC XML annotations to COCO JSON and fixing class label indices for 1-class training.
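For reference, the core of such a conversion looks roughly like the sketch below. This is an illustrative re-implementation, not the actual voc2coco.py; the single "drone" category with id 1 is an assumption:

# Loose sketch of a VOC XML -> COCO JSON conversion for one class.
import json
import xml.etree.ElementTree as ET
from pathlib import Path

def voc_dir_to_coco(xml_dir, out_json):
    images, annotations = [], []
    ann_id = 1
    for img_id, xml_file in enumerate(sorted(Path(xml_dir).glob("*.xml")), 1):
        root = ET.parse(xml_file).getroot()
        size = root.find("size")
        images.append({
            "id": img_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            b = obj.find("bndbox")
            x1, y1 = float(b.findtext("xmin")), float(b.findtext("ymin"))
            x2, y2 = float(b.findtext("xmax")), float(b.findtext("ymax"))
            annotations.append({
                "id": ann_id, "image_id": img_id, "category_id": 1,
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses x, y, w, h
                "area": (x2 - x1) * (y2 - y1), "iscrowd": 0,
            })
            ann_id += 1
    coco = {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "drone"}]}
    with open(out_json, "w") as f:
        json.dump(coco, f)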
The full RT-DETR source tree and checkpoints are not included in this repository. The files under submission_code/ are the project-specific scripts and configuration files used for the experiment. To reproduce training or inference, place these files into a compatible RT-DETRv2 PyTorch checkout.
From the RT-DETRv2 PyTorch directory:
cd code/RT-DETR/rtdetrv2_pytorch
python tools/train.py \
-c configs/rtdetrv2/rtdetrv2_r18vd_180e_dut.yml \
-t ../../../checkpoints/rtdetrv2_r18vd_coco.pth \
--use-amp \
--device cuda:0
Main outputs:
output/rtdetrv2_r18vd_180e_dut/best.pth
output/rtdetrv2_r18vd_180e_dut/last.pth
output/rtdetrv2_r18vd_180e_dut/checkpointXXXX.pth
output/rtdetrv2_r18vd_180e_dut/log.txt
Export an annotated detection video:
cd code/RT-DETR/rtdetrv2_pytorch
python tools/infer_video.py \
-c configs/rtdetrv2/rtdetrv2_r18vd_180e_dut.yml \
-r output/rtdetrv2_r18vd_180e_dut/best.pth \
--input ../../../videos_in/video1.mp4 \
--output ../../../videos_out/video1_det_180e_best.mp4 \
--device cuda:0 \
--conf 0.5
Export frame-level JSON results for a folder of videos:
cd code/RT-DETR/rtdetrv2_pytorch
python tools/infer_videos_to_json.py \
-c configs/rtdetrv2/rtdetrv2_r18vd_180e_dut.yml \
-r output/rtdetrv2_r18vd_180e_dut/best.pth \
--input_dir ../../../videos_in \
--output_dir ../../../json_out \
--device cuda:0 \
--conf 0.5 \
--top1_only
Compute confidence drop from generated JSON outputs:
cd code/RT-DETR/rtdetrv2_pytorch
python tools/compute_confidence_drop.py \
--json_dir ../../../json_out \
--clear video1 \
--label video1=Clear \
--label video2=Fog \
--label video3=Snow \
--label video4=Rain \
--output_csv ../../../json_out/confidence_drop_summary.csv
For detector validation on the annotated DUT Anti-UAV validation set, the main metrics are:
mAP@[0.50:0.95]
AP50
AP75
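These are the standard COCO metrics and can be reproduced with pycocotools given the validation annotations and a detections file in COCO results format. The sketch below assumes a hypothetical val_predictions.json exported from the model:

# Standard COCO evaluation (assumes pycocotools and a COCO-format
# detections file; "val_predictions.json" is a hypothetical name).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("datasets/DUT_AntiUAV_Det/annotations/val.json")
dt = gt.loadRes("val_predictions.json")
ev = COCOeval(gt, dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()
# ev.stats[0] = mAP@[0.50:0.95], ev.stats[1] = AP50, ev.stats[2] = AP75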
For generated synthetic videos without ground-truth bounding boxes, the JSON output should not be interpreted as an accuracy measurement. The frame-level JSON files contain raw detector outputs, not comparisons against ground truth.
For synthetic video analysis, use:
Detection Rate = detected frames / total frames
Mean Confidence Score = average detector confidence over detected frames
Confidence Drop = clear-weather mean confidence - adverse-weather mean confidence
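A minimal sketch of these statistics, assuming each exported JSON file holds a list of per-frame records with a "detections" list of scored boxes; the field names here are assumptions, not the actual schema written by infer_videos_to_json.py:

# Illustrative sketch only: "detections" and "score" are assumed field
# names for the per-frame JSON records.
import json

def video_stats(json_path, conf_thresh=0.5):
    with open(json_path) as f:
        frames = json.load(f)  # assumed: one record per frame
    top_scores = []
    for frame in frames:
        scores = [d["score"] for d in frame.get("detections", [])]
        if scores and max(scores) >= conf_thresh:
            top_scores.append(max(scores))  # top-1 detection per frame
    detection_rate = len(top_scores) / len(frames)
    mean_conf = sum(top_scores) / len(top_scores) if top_scores else 0.0
    return detection_rate, mean_conf

clear_rate, clear_conf = video_stats("json_out/video1.json")
fog_rate, fog_conf = video_stats("json_out/video2.json")
print(f"confidence drop (fog): {clear_conf - fog_conf:.4f}")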
Important interpretation:
Detection rate is not the same as accuracy.
A 100% detection rate only means that the model produced at least one prediction above the confidence threshold in every frame. It does not prove that every prediction was correct unless ground-truth annotations are available.
The benchmark uses detection rate, mean confidence score, and confidence drop to compare clear and adverse-weather videos.
Condition | Detection Rate | Mean Confidence | Confidence Drop
Clear     | 100.0%         | 0.9026          | -
Fog       | 82.6%          | 0.7057          | 0.1969
Snow      | 98.3%          | 0.7302          | 0.1724
Rain      | 86.8%          | 0.7665          | 0.1361
Summary:
- Fog caused the largest confidence degradation.
- Snow maintained a high detection rate but reduced confidence.
- Rain showed the smallest confidence drop among the tested adverse-weather conditions.
- All adverse-weather conditions reduced confidence compared with clear weather.
uav/
|-- dataset/
| |-- fog/
| |-- original image/
| |-- rain/
| `-- snow/
|-- submission_code/
| |-- configs/
| |-- dataset_utils/
| `-- tools/
|-- README.md
`-- .gitignore
Large checkpoints, original DUT Anti-UAV training data, raw inference outputs, and full RT-DETR source files are not included in this repository.
- The detector is used as a benchmark model, not as the main methodological contribution.
- JSON outputs are raw detector predictions and confidence statistics.
- Ground-truth annotations are required to compute true accuracy, precision, recall, F1 score, or mAP on generated videos.
- The most reliable quantitative detector metrics in this repository are the DUT Anti-UAV validation results: mAP@[0.50:0.95] 69.97%, AP50 95.76%, and AP75 80.34%.