CalorieScan API — A Food Weight and Calorie Estimation Solution (FastAPI)

🔍 For the Chinese version, see README_zh.md

This is a small project I built back in my sophomore year: you take a photo of your meal, the pipeline runs detection, segmentation, and depth estimation, and it returns a rough estimate of the food’s weight (grams) and calories. It’s not a “one-click, 100% accurate” product, but it is a complete end-to-end pipeline that’s handy for learning, demos, and further tweaking.

If you find it useful, feel free to star the repo, fork it, and hack around. Issues/PRs are welcome.

What It Does

  • YOLOv8: detects tableware / containers / food targets
  • SAM: segments containers and food regions (and produces overlays for visual inspection)
  • MiDaS: estimates relative depth (mainly used to infer “height”)
  • Classifier + calorie table: maps recognized dishes to an approximate kcal/g
  • Output: weight/calories (point + range), confidence, warnings, and debug image paths

The Core Idea (Simple but Practical)

Use a “common reference object” as a ruler (by default: a bank card). First segment the container and food to get a reasonable footprint area; then use a depth model to estimate relative height and approximate volume; finally convert volume to weight using a dish-specific density and a calorie table.
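To make the conversion concrete, here is a minimal numeric sketch in Python. Every number, the heap-shape factor, and all variable names are illustrative assumptions, not values from this repo's code:

# Sketch of the volume -> weight -> calories conversion (illustrative numbers).
cm_per_pixel = 8.56 / 320                    # a bank card is 8.56 cm wide; 320 px measured in the image
food_area_cm2 = 52_000 * cm_per_pixel ** 2   # 52,000 px inside the food mask
depth_height_cm = 3.2                        # relative MiDaS height, scaled to cm
volume_cm3 = food_area_cm2 * depth_height_cm * 0.6   # 0.6: assumed heap-shape correction
weight_g = volume_cm3 * 0.9                  # assumed density for a rice dish (g/cm^3)
calories = weight_g * 1.9                    # assumed kcal/g from the calorie table
print(f"~{weight_g:.0f} g, ~{calories:.0f} kcal")    # -> ~64 g, ~122 kcal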

In practice, the biggest sources of error are usually not the model itself, but: whether the mask includes bowl/background, whether the reference object and food are on the same plane, and whether the photo is blurry/reflective/dim. That’s why this project also saves debug overlays—so you can quickly see where the bias comes from.

Project Layout & Entrypoints

This repo contains two FastAPI apps (one full pipeline and one lightweight demo):

  1. Full weight/calories analysis service (recommended)

    • Entrypoint: model_training/caloriscan_api.py
    • Port: 8000
    • Endpoint: POST /analyze
    • Notes: returns weight/calories + saves debug images to model_training/outputs/
  2. Lightweight segmentation demo (segmentation only)

    • Entrypoint: main.py
    • Port: 8001
    • Endpoint: POST /estimate
    • Notes: a minimal service to quickly validate segmentation/recognition flow

model_training/main.py is an older offline inference entry (using predict.py), mainly kept for my own past debugging. It’s not recommended as the public API entry.

Requirements

  • Python: 3.9 / 3.10 recommended
  • GPU: NVIDIA GPU is much faster; CPU-only also works (but slower)
  • First run needs internet: MiDaS weights are downloaded from Hugging Face and then cached

Installation

1) Create and activate a virtual environment (recommended)

Windows PowerShell:

python -m venv venv
.\venv\Scripts\activate
python -m pip install -U pip
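
macOS / Linux (the standard equivalents of the PowerShell commands above):

python3 -m venv venv
source venv/bin/activate
python -m pip install -U pip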

2) Install dependencies

pip install -r requirements.txt

If you hit issues installing torch==...+cuXXX (platform / GPU / driver differences can be painful), a practical approach is:

  1. Install torch/torchvision/torchaudio using the official PyTorch instructions for your machine (CPU or CUDA build; see the example below)
  2. Install the rest (and if needed, remove/adjust torch-related lines in requirements.txt)
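
For example, the official selector produces commands of this shape (the cu121 index URL below is just one CUDA variant; pick the one matching your driver, or the cpu index for CPU-only):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# CPU-only:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu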

Model Files (Required)

To keep the repository small, some large model weights are not committed. Place them as follows:

Model | Purpose | File you need | Where to put it (relative to repo root)
SAM ViT-B | Segmentation | sam_vit_b_01ec64.pth | model_training/models/sam_vit_b_01ec64.pth
YOLOv8n | Detection | yolov8n.pt | model_training/yolov8n.pt
Cuisine classifier (optional) | Classification | cuisine_classifier_full.pt | model_training/cuisine_classifier_full.pt
MiDaS (DPT) | Depth | auto-downloaded | no manual placement needed

Notes:

  • If model_training/models/ doesn’t exist, create it.
  • If your downloaded filenames differ, rename them to match the table to avoid editing code.
  • If yolov8n.pt is missing, the first run will download it automatically.
  • The cuisine classifier weights are not published by default. If you need them, please contact the author first (see the contact info under “Commercial Use / Commercial License”). If the file is missing, the API still runs, but classification returns unknown and a default calorie factor is used.

Train Your Own Cuisine Classifier (Optional, Recommended)

If you don’t want to use my trained classifier, you can train your own and drop the outputs back into model_training/.

  1. Prepare your dataset in ImageFolder format (one folder per class):
    • model_training/my_dataset/<class_name>/xxx.jpg
  2. Run training from the model_training/ directory (a generic sketch of what such a script involves is shown after the output list):
python train_classifier.py

Outputs:

  • model_training/cuisine_classifier_full.pt
  • model_training/cuisine_classifier.pth
  • model_training/classes.txt
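
For reference, a generic PyTorch sketch of such a script is below. This is not the repo's actual train_classifier.py (the ResNet backbone, hyperparameters, and epoch count are all assumptions), but it shows the ImageFolder layout in action and produces files matching the names above:

# Generic ImageFolder training sketch (NOT the repo's train_classifier.py).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
ds = datasets.ImageFolder("my_dataset", transform=tfm)   # one folder per class
loader = DataLoader(ds, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")         # assumed backbone
model.fc = nn.Linear(model.fc.in_features, len(ds.classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                                   # assumed epoch count
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Mirror the output file names this README expects:
torch.save(model, "cuisine_classifier_full.pt")            # full model object
torch.save(model.state_dict(), "cuisine_classifier.pth")   # weights only
with open("classes.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(ds.classes))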

Run the Service

A. Run the full analysis service (recommended)

From the repository root:

python model_training/caloriscan_api.py

Swagger UI: http://localhost:8000/docs

B. Run the lightweight demo service (optional)

From the repository root:

python main.py

Swagger UI: http://localhost:8001/docs

API Usage (Full Analysis Service)

Request

POST /analyze (multipart form):

  • file: image file (jpg/png)

curl example:

curl -X POST "http://localhost:8000/analyze" ^
  -H "accept: application/json" ^
  -H "Content-Type: multipart/form-data" ^
  -F "file=@your_food.jpg"

Response Fields (simplified)

The response is a list; each item corresponds to one detected dish/container:

  • name: dish name (Chinese label)
  • weight / weight_low / weight_high: weight estimate (g) and range
  • calories / calories_low / calories_high: calories estimate and range
  • confidence: high/medium/low
  • warnings: quality/model/scale/depth warnings (for diagnosing errors)
  • card: reference object detection info (not always used for scaling)
  • model_scores: debug numeric signals (yolo/cls/sam scores, depth stats, area ratio, scaling method, etc.)
  • debug_files: paths to debug images (overlay/mask)
  • image: base64 overlay image (handy for frontends)

Example (real output contains more fields):

[
  {
    "name": "炒饭",
    "weight": 132,
    "calories": 250,
    "confidence": "low",
    "warnings": ["blurry", "card_low_confidence"],
    "debug_files": {
      "container": { "overlay": "...", "mask": "..." },
      "food": { "overlay": "...", "mask": "..." }
    }
  }
]
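
Since each item carries a base64 overlay in image, you can inspect it without a frontend by decoding it to a file. This sketch assumes the field holds plain base64 of an encoded image with no data: URI prefix (if there is one, split it off first), and that you saved the /analyze output to response.json:

import base64, json

with open("response.json", encoding="utf-8") as f:   # JSON saved from /analyze
    items = json.load(f)

for i, item in enumerate(items):
    if item.get("image"):
        with open(f"overlay_{i}.jpg", "wb") as out:
            out.write(base64.b64decode(item["image"]))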

Debug Images (Highly Recommended)

Each /analyze call saves debug images into:

model_training/outputs/

Typical files:

  • ..._food.jpg / ..._food_mask.png
  • ..._container.jpg / ..._container_mask.png
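
A quick way to sanity-check a saved mask, assuming the *_mask.png files are single-channel images in which masked pixels are nonzero (an assumption about the output format; replace the placeholder file name with a real one):

import cv2   # pip install opencv-python

mask = cv2.imread("model_training/outputs/<run>_food_mask.png", cv2.IMREAD_GRAYSCALE)
assert mask is not None, "check the file path"
area_px = int((mask > 0).sum())
h, w = mask.shape
print(f"food mask covers {area_px} px ({100 * area_px / (h * w):.1f}% of the image)")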

Overlay convention:

  • Blue: container/bowl mask
  • Green: food mask
  • Red: reference object (bank card) outline (if 4 corners are available, it draws a quadrilateral)

Photography Tips (Accuracy Depends a Lot on the Photo)

  1. Use good lighting and avoid blur (a blurry photo significantly increases the estimate’s uncertainty)
  2. Keep the scene simple (one main food target, or clear separation between targets)
  3. If you want to use a bank card / reference object for scaling:
    • Place it on the same plane as the food (same tabletop) for best results
    • A perfect top-down shot is not required; perspective/tilt is supported via 4-corner estimation
    • If the card and food are clearly not on the same plane, it triggers card_plane_mismatch and falls back to a default scale (better than a wrong calibration)

Common Issues (Troubleshooting)

  • Can’t open /docs / connection refused

    • Usually the server hasn’t fully started, the port is in use, or you opened the wrong port (8000 vs 8001)
  • Missing sam_vit_b_01ec64.pth

    • Make sure it’s under model_training/models/ and the filename matches exactly
  • First run takes a long time

    • MiDaS downloads weights from Hugging Face; it caches them for future runs
  • Weight looks too high/too low and you want to locate the cause

    • Check whether food/container masks in model_training/outputs/ look reasonable
    • Then check food_container_ratio, depth_height, cm_per_pixel_method, and warnings in the API response

Example Outputs

Debug images are generated after you call the API and saved to model_training/outputs/ (this folder is not committed by default); see “Debug Images (Highly Recommended)” above for the typical file names and the overlay color convention.

Contributing / Support

  • Want higher accuracy? Data is the most effective path (a small set of self-collected photos paired with real weights). Contributions are welcome.
  • Found a bug? Please open an issue, and ideally attach the overlay + mask images from outputs/.
  • Want to add features? Fork it and go wild; PRs are also welcome.

Commercial Use / Commercial License

This project is released under AGPL-3.0. If you want to use it commercially but integrate/deploy it as closed-source (or you cannot comply with AGPL obligations), please contact the author first for a separate commercial license.

License

AGPL-3.0. See LICENSE.
