Yiyang Wang · Xi Chen · Xiaogang Xu · Yu Liu · Hengshuang Zhao

The University of Hong Kong | The Chinese University of Hong Kong | Tongyi Lab
- [✅] Release the Gradio demo code.
- [✅] Release the command-line inference code.
- [TODO] Release the checkpoint.
- [TODO] Release the code of the dataset simulation pipeline.
Install with pip:

    pip install -r requirements.txt

- Download the DiffCamera checkpoint from TODO and put it in `checkpoints/`.
- Download FLUX.1-Dev as the backbone model.
- [Optional] Download Depth-Anything-V2 (https://huggingface.co/depth-anything/Depth-Anything-V2-Large-hf) to estimate depth maps.
Tip: running the demo requires a GPU.

If you have prepared a depth map, run the following command:

    python gradio_demo.py --resume_from_checkpoint $DiffCamera_checkpoint_path --pretrained_model_name_or_path $FLUX_checkpoint_path

If you don't have a depth map, run the following command to enable the depth estimation model:

    python gradio_demo.py --resume_from_checkpoint $DiffCamera_checkpoint_path --pretrained_model_name_or_path $FLUX_checkpoint_path --depth_model_path $Depth_anything_v2_larg_hf_path

After that, you should see a message like `Running on local URL: http://127.0.0.1:8888`. Click the URL or paste it into your browser to open the Gradio demo interface.
Checklist before running the inference code:
- Download the DiffCamera checkpoint (`xxx.bin`) and place it in the `checkpoints/` folder.
- Modify `scripts/infer_flux.sh`. Specify the paths to FLUX.1-Dev, the DiffCamera checkpoint, the depth estimation model, and the input image. You can also modify the bokeh level (blur strength, ranging from 0 to 30) and the focus coordinates. The coordinates are normalized to the range 0 to 1: `focus_point_x=0` is the top edge, `focus_point_x=1` is the bottom edge, `focus_point_y=0` is the leftmost edge, and `focus_point_y=1` is the rightmost edge. (Tip: you can also try the Gradio demo locally to get an intuitive feel for how these parameters affect the refocus result.)
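
To make the coordinate convention above concrete, here is a minimal Python sketch that maps normalized focus coordinates to a pixel position and checks the bokeh level range. The helper names `focus_point_to_pixel` and `check_bokeh_level` are hypothetical and are not part of this repository; the sketch only assumes the convention stated above, where `focus_point_x` runs from top to bottom and `focus_point_y` runs from left to right.

```python
def focus_point_to_pixel(focus_point_x, focus_point_y, height, width):
    """Map normalized focus coordinates to a (row, col) pixel index.

    Hypothetical helper, following the README convention:
    focus_point_x = 0 is the top edge, 1 is the bottom edge;
    focus_point_y = 0 is the leftmost edge, 1 is the rightmost edge.
    """
    for name, value in (("focus_point_x", focus_point_x),
                        ("focus_point_y", focus_point_y)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Clamp so that a value of exactly 1.0 maps to the last valid index.
    row = min(int(focus_point_x * height), height - 1)
    col = min(int(focus_point_y * width), width - 1)
    return row, col


def check_bokeh_level(level):
    """Return the level unchanged if it lies in the documented 0 to 30 range."""
    if not 0 <= level <= 30:
        raise ValueError(f"bokeh level must be in [0, 30], got {level}")
    return level


if __name__ == "__main__":
    # Focus halfway down and a quarter of the way across a 1024x1024 image.
    print(focus_point_to_pixel(0.5, 0.25, height=1024, width=1024))
    print(check_bokeh_level(15))
```

A quick check like this before editing the script can help catch swapped axes, since `focus_point_x` controls the vertical position here rather than the horizontal one.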
After preparation, run the following command to refocus on a single image.
./scripts/infer.sh
- The current DiffCamera supports only two resolutions: 512x512 and 1024x1024.
- The current DiffCamera cannot take a reference image as input to provide a customized refocus result on a blurry subject.
- The current DiffCamera may produce artifacts on challenging scenes where complex details are blurry, i.e., where the information about the blurry objects is very limited.
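
Given the resolution limitation above, one option is to decide on a target size before running inference. The following is a minimal sketch, not part of this repository: `nearest_supported_size` is a hypothetical helper, and picking the size closest to the longer image side is an assumption about a reasonable preprocessing choice, not the behavior of the released scripts.

```python
# The two square resolutions listed in the limitations above.
SUPPORTED_SIZES = (512, 1024)


def nearest_supported_size(width, height):
    """Pick the supported square size closest to the longer image side.

    Hypothetical helper: the choice of "closest to the longer side" is an
    assumption for illustration only.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    return min(SUPPORTED_SIZES, key=lambda size: abs(size - longest))


if __name__ == "__main__":
    target = nearest_supported_size(1920, 1080)
    print(f"Resize the input to {target}x{target} before inference")
```

Resizing a non-square image to a square target changes its aspect ratio, so cropping or padding first may be preferable depending on the scene.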
If you find this codebase useful for your research, please cite it using the following entry.
@article{wang2025diffcamera,
title={DiffCamera: Arbitrary Refocusing on Images},
author={Wang, Yiyang and Chen, Xi and Xu, Xiaogang and Liu, Yu and Zhao, Hengshuang},
journal={SIGGRAPH Asia 2025 Conference Papers},
year={2025}
}