🔍 For the Chinese version, see README_zh.md
This is a small project I built back in my sophomore year: you take a photo of your meal, run detection/segmentation/depth estimation, and it returns a rough estimate of food weight (grams) and calories. It’s not a “one-click 100% accurate” product, but it’s a complete end-to-end pipeline that’s handy for learning, demos, and further tweaking.
If you find it useful, feel free to star the repo, fork it, and hack around. Issues/PRs are welcome.
- YOLOv8: detects tableware / containers / food targets
- SAM: segments containers and food regions (and produces overlays for visual inspection)
- MiDaS: estimates relative depth (mainly used to infer “height”)
- Classifier + calorie table: maps recognized dishes to an approximate kcal/g
- Output: weight/calories (point + range), confidence, warnings, and debug image paths
Use a “common reference object” as a ruler (by default: a bank card). First segment the container and food to get a reasonable footprint area; then use a depth model to estimate relative height and approximate volume; finally convert volume to weight using a dish-specific density and a calorie table.
In practice, the biggest sources of error are usually not the model itself, but: whether the mask includes bowl/background, whether the reference object and food are on the same plane, and whether the photo is blurry/reflective/dim. That’s why this project also saves debug overlays—so you can quickly see where the bias comes from.
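To make the last step concrete, here is a minimal sketch of the volume → weight → calories conversion (all names and constants below are illustrative placeholders, not the project's actual code, which applies more corrections):

```python
# Illustrative sketch of the volume -> weight -> calories conversion.
# The real pipeline derives footprint/height from SAM masks and MiDaS depth
# and applies additional corrections; these constants are placeholders.

def estimate_calories(footprint_cm2: float, height_cm: float,
                      density_g_per_cm3: float, kcal_per_g: float,
                      fill_factor: float = 0.6):
    """Convert a rough footprint + height into weight (g) and calories (kcal)."""
    # Food rarely fills its full bounding volume, so apply a fill factor.
    volume_cm3 = footprint_cm2 * height_cm * fill_factor
    weight_g = volume_cm3 * density_g_per_cm3
    calories = weight_g * kcal_per_g
    return weight_g, calories

# Example: ~80 cm^2 footprint, ~3 cm tall fried rice at ~0.8 g/cm^3 and ~1.9 kcal/g
weight, kcal = estimate_calories(80.0, 3.0, 0.8, 1.9)
print(f"~{weight:.0f} g, ~{kcal:.0f} kcal")
```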
This repo contains two FastAPI apps (one full pipeline and one lightweight demo):
- Full weight/calories analysis service (recommended)
  - Entrypoint: `model_training/caloriscan_api.py`
  - Port: `8000`
  - Endpoint: `POST /analyze`
  - Notes: returns weight/calories + saves debug images to `model_training/outputs/`
- Lightweight segmentation demo (segmentation only)
  - Entrypoint: `main.py`
  - Port: `8001`
  - Endpoint: `POST /estimate`
  - Notes: a minimal service to quickly validate segmentation/recognition flow

`model_training/main.py` is an older offline inference entry (using `predict.py`), mainly kept for my own past debugging. It's not recommended as the public API entry.
- Python: 3.9 / 3.10 recommended
- GPU: NVIDIA GPU is much faster; CPU-only also works (but slower)
- First run needs internet: MiDaS weights are downloaded from Hugging Face and then cached
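For reference, a DPT/MiDaS-style depth model is typically pulled from Hugging Face like this (a minimal sketch using the `transformers` pipeline; the model id and the project's actual loading code may differ):

```python
# Minimal sketch: relative depth estimation with a DPT model from Hugging Face.
# The exact model id and loading code used by this project may differ.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")  # downloads & caches weights
result = depth_estimator(Image.open("your_food.jpg"))
depth_map = result["predicted_depth"]  # relative depth, not metric
print(depth_map.shape)
```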
Windows PowerShell:

```powershell
python -m venv venv
.\venv\Scripts\activate
python -m pip install -U pip
pip install -r requirements.txt
```

If you hit issues installing `torch==...+cuXXX` (platform / GPU / driver differences can be painful), a practical approach is:

- Install `torch` / `torchvision` / `torchaudio` using the official PyTorch instructions for your machine (CPU or CUDA build); see the example below
- Install the rest (and if needed, remove/adjust the torch-related lines in `requirements.txt`)
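For example, the CPU-only build installs like this (check https://pytorch.org/get-started/locally/ for the exact command matching your OS, driver, and CUDA version):

```powershell
# CPU-only PyTorch wheels from the official index; pick a CUDA index instead if you have a GPU
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
```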
To keep the repository small, some large model weights are not committed. Place them as follows:
| Model | Purpose | File you need | Where to put it (relative to repo root) |
|---|---|---|---|
| SAM ViT-B | Segmentation | sam_vit_b_01ec64.pth | model_training/models/sam_vit_b_01ec64.pth |
| YOLOv8n | Detection | yolov8n.pt | model_training/yolov8n.pt |
| Cuisine classifier (optional) | Classification | cuisine_classifier_full.pt | model_training/cuisine_classifier_full.pt |
| MiDaS (DPT) | Depth | Auto-download | No manual placement |
Notes:
- If `model_training/models/` doesn't exist, create it.
- If your downloaded filenames differ, rename them to match the table to avoid editing code.
- If `yolov8n.pt` is missing, the first run will download it automatically.
- The cuisine classifier weights are not published by default. If you need them, please contact the author first (see the contact info in "Commercial Use / Commercial License"). If it's missing, the API still runs but returns `unknown` and uses a default calories factor.
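As a quick sanity check that the files landed in the right place, the checkpoints can be loaded roughly like this (a minimal sketch; the service itself wires these up inside `caloriscan_api.py` and may do it differently):

```python
# Minimal sketch: verify the local checkpoints load from the expected paths.
# Path layout follows the table above.
from ultralytics import YOLO
from segment_anything import sam_model_registry

yolo = YOLO("model_training/yolov8n.pt")  # auto-downloads if missing
sam = sam_model_registry["vit_b"](
    checkpoint="model_training/models/sam_vit_b_01ec64.pth"  # must exist locally
)
print("YOLOv8 and SAM checkpoints loaded")
```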
If you don’t want to use my trained classifier, you can train your own and drop the outputs back into model_training/.
- Prepare your dataset in `ImageFolder` format (one folder per class): `model_training/my_dataset/<class_name>/xxx.jpg`
- Run training from the `model_training/` directory:

```
python train_classifier.py
```

Outputs:

- `model_training/cuisine_classifier_full.pt`
- `model_training/cuisine_classifier.pth`
- `model_training/classes.txt`
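If you prefer to script it yourself, the dataset layout above maps directly onto torchvision's `ImageFolder` (a minimal sketch, not the actual `train_classifier.py`; the ResNet-18 backbone here is just an assumption for illustration):

```python
# Minimal sketch of loading the ImageFolder layout; train_classifier.py does more
# (augmentation, validation split, saving classes.txt, etc.).
import torch
from torchvision import datasets, transforms, models

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("model_training/my_dataset", transform=tfm)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, len(dataset.classes))
print(dataset.classes)  # class names come from the folder names
```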
Full analysis service, from the repository root:

```
python model_training\caloriscan_api.py
```

Swagger UI: http://localhost:8000/docs

Lightweight demo, from the repository root:

```
python main.py
```

Swagger UI: http://localhost:8001/docs
`POST /analyze` (multipart form):

- `file`: image file (jpg/png)

curl example:

```
curl -X POST "http://localhost:8000/analyze" ^
  -H "accept: application/json" ^
  -H "Content-Type: multipart/form-data" ^
  -F "file=@your_food.jpg"
```

The response is a list; each item corresponds to one detected dish/container:

- `name`: dish name (Chinese label)
- `weight` / `weight_low` / `weight_high`: weight estimate (g) and range
- `calories` / `calories_low` / `calories_high`: calories estimate and range
- `confidence`: `high` / `medium` / `low`
- `warnings`: quality/model/scale/depth warnings (for diagnosing errors)
- `card`: reference object detection info (not always used for scaling)
- `model_scores`: debug numeric signals (yolo/cls/sam scores, depth stats, area ratio, scaling method, etc.)
- `debug_files`: paths to debug images (overlay/mask)
- `image`: base64 overlay image (handy for frontends)
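The same request from Python, if you prefer scripting over curl (a minimal client sketch assuming the full service is running locally on port 8000):

```python
# Minimal client sketch: post an image to /analyze and print the key fields.
import requests

with open("your_food.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/analyze",
        files={"file": ("your_food.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()

for item in resp.json():  # one entry per detected dish/container
    print(item["name"], item["weight"], "g,", item["calories"], "kcal,", item["confidence"])
```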
Example (real output contains more fields):
```json
[
  {
    "name": "炒饭",
    "weight": 132,
    "calories": 250,
    "confidence": "low",
    "warnings": ["blurry", "card_low_confidence"],
    "debug_files": {
      "container": { "overlay": "...", "mask": "..." },
      "food": { "overlay": "...", "mask": "..." }
    }
  }
]
```

Each /analyze call saves debug images into `model_training/outputs/`.

Typical files:

- `..._food.jpg` / `..._food_mask.png`
- `..._container.jpg` / `..._container_mask.png`
Overlay convention:
- Blue: container/bowl mask
- Green: food mask
- Red: reference object (bank card) outline (if 4 corners are available, it draws a quadrilateral)
- Use good lighting and avoid blur (`blurry` significantly increases uncertainty)
- Keep the scene simple (one main food target, or clear separation between targets)
- If you want to use a bank card / reference object for scaling (see the sketch after this list):
  - Place it on the same plane as the food (same tabletop) for best results
  - A perfect top-down shot is not required; perspective/tilt is supported via 4-corner estimation
  - If the card and food are clearly not on the same plane, it triggers `card_plane_mismatch` and falls back to a default scale (better than a wrong calibration)
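For intuition, the card-based scale boils down to something like this (a minimal sketch assuming a standard ID-1 card, 85.60 × 53.98 mm; the actual calibration, including the 4-corner perspective handling and the fallback logic, lives in the API code):

```python
# Minimal sketch: derive a cm-per-pixel scale from a detected bank card.
# Assumes the card lies flat on the same plane as the food.
import numpy as np

CARD_LONG_CM = 8.56    # ID-1 card long edge
CARD_SHORT_CM = 5.398  # ID-1 card short edge

def cm_per_pixel_from_card(corners: np.ndarray) -> float:
    """corners: (4, 2) pixel coordinates of the card, in order around the quadrilateral."""
    # Side lengths between consecutive corners; opposite sides are averaged.
    sides = np.linalg.norm(np.roll(corners, -1, axis=0) - corners, axis=1)
    pair_a = (sides[0] + sides[2]) / 2
    pair_b = (sides[1] + sides[3]) / 2
    long_px, short_px = max(pair_a, pair_b), min(pair_a, pair_b)
    # Average the two independent estimates to dampen perspective error a bit.
    return (CARD_LONG_CM / long_px + CARD_SHORT_CM / short_px) / 2
```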
- Can't open `/docs` / connection refused
  - Usually the server hasn't fully started, the port is in use, or you opened the wrong port (8000 vs 8001)
- Missing `sam_vit_b_01ec64.pth`
  - Make sure it's under `model_training/models/` and the filename matches exactly
- First run takes a long time
  - MiDaS downloads weights from Hugging Face; it caches them for future runs
- Weight looks too high/too low and you want to locate the cause
  - Check whether the food/container masks in `model_training/outputs/` look reasonable
  - Then check `food_container_ratio`, `depth_height`, `cm_per_pixel_method`, and `warnings` in the API response (see the sketch below)
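To eyeball that ratio yourself from the saved masks, something like this works (a minimal sketch; replace the `xxx` placeholder with the actual output prefix from your run):

```python
# Minimal sketch: recompute the food/container area ratio from saved debug masks.
import cv2

food_mask = cv2.imread("model_training/outputs/xxx_food_mask.png", cv2.IMREAD_GRAYSCALE)
container_mask = cv2.imread("model_training/outputs/xxx_container_mask.png", cv2.IMREAD_GRAYSCALE)
assert food_mask is not None and container_mask is not None, "check the file paths"

food_px = int((food_mask > 0).sum())
container_px = int((container_mask > 0).sum())
print("food_container_ratio ≈", food_px / max(container_px, 1))
```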
Debug images are generated after you call the API and saved to model_training/outputs/ (this folder is not committed by default). Typical files look like:
- `..._food.jpg` / `..._food_mask.png`
- `..._container.jpg` / `..._container_mask.png`
- Want higher accuracy? Data is the most effective path (even a small set of self-collected photos with real weights helps). Contributions are welcome.
- Found a bug? Please open an issue, and ideally attach the overlay + mask images from `outputs/`.
- Want to add features? Fork it and go wild; PRs are also welcome.
This project is released under AGPL-3.0. If you want to use it commercially but integrate/deploy it as closed-source (or you cannot comply with AGPL obligations), please contact the author first for a separate commercial license.
- Email: 1374552774@qq.com
- WeChat: Akuri2133
AGPL-3.0. See LICENSE.