Description
I’ve been trying to convert MediaPipe Holistic pose estimations into the .pose format using pose-format, but I’m having some trouble understanding what exact JSON structure load_mediapipe_directory expects.
🧩 Context
I extracted the pose skeleton from a video using the following MediaPipe Holistic code:
import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import os
import json
signal = "/home/gsantm/store/data/aligned_yolo_cropped_how2sign/test/clips/-fZc293MpJk_0-1-rgb_front.mp4"
output_root = "/home/gsantm/store/pose_estimators/mediapipe_holistic/output"
model_path = os.path.expanduser("~/store/pose_estimators/mediapipe_holistic/pose_landmarker_lite.task")
sample_name = os.path.splitext(os.path.basename(signal))[0]
output_dir = os.path.join(output_root, sample_name)
os.makedirs(output_dir, exist_ok=True)
BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.IMAGE
)

with PoseLandmarker.create_from_options(options) as landmarker:
    cap = cv2.VideoCapture(signal)
    if not cap.isOpened():
        raise RuntimeError(f"Error opening video file: {signal}")

    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
        result = landmarker.detect(mp_image)

        if result.pose_landmarks:
            landmarks = [
                {"x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
                for lm in result.pose_landmarks[0]
            ]
        else:
            landmarks = []

        frame_json_path = os.path.join(output_dir, f"frame_{frame_idx}.json")
        with open(frame_json_path, "w") as f:
            json.dump({
                "frame": frame_idx,
                "landmarks": landmarks
            }, f, indent=2)

        frame_idx += 1

    cap.release()

This creates a directory like this:
/home/.../-fZc293MpJk_0-1-rgb_front/
├── frame_0.json
├── frame_1.json
├── frame_2.json
└── ...
Each JSON looks like this:
{
"frame": 0,
"landmarks": [
{
"x": 0.4936003088951111,
"y": 0.19043749570846558,
"z": -1.181049108505249,
"visibility": 0.9997064471244812
},
{
"x": 0.5202051401138306,
"y": 0.15045350790023804,
"z": -1.1277518272399902,
"visibility": 0.9993149042129517
},
(...)
{
"x": 0.4366815686225891,
"y": 1.9366867542266846,
"z": 0.12023007124662399,
"visibility": 0.04526037722826004
}
]
}

🧠 What I tried
Then I tried converting them into .pose format using:
from pose_format.utils.holistic import load_mediapipe_directory

signal = "/home/.../-fZc293MpJk_0-1-rgb_front"
pose = load_mediapipe_directory(signal, fps=50, width=674, height=588)

with open(f"{signal}/pose.pose", "wb") as f:
    pose.write(f)

But I get this error:
Traceback (most recent call last):
  File "...", line 16, in <module>
    pose = load_mediapipe_directory(output_path, fps=fps, width=width, height=height)
  File ".../pose_format/utils/holistic.py", line 397, in load_mediapipe_directory
    num_pose_points = first_frame["pose_landmarks"]["num_landmarks"]
KeyError: 'num_landmarks'
This makes sense because my JSON files do not have a num_landmarks key.
I couldn’t find where this value comes from, since it doesn’t seem to appear in the MediaPipe result object (after result = landmarker.detect(mp_image)).
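The only candidate value I can see is simply the length of the landmark list, e.g. inside the detection loop above (this is purely my guess at what num_landmarks is supposed to hold):

# Inside the detection loop above: my guess is that "num_landmarks" would just be
# the number of points MediaPipe returned for that component in this frame.
num_pose_landmarks = len(result.pose_landmarks[0]) if result.pose_landmarks else 0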
🔍 What I’ve checked
Reading the code here, it seems that num_landmarks and related metadata are expected to already be present in the JSON files, and it looks like your own implementation assigns these values internally when it writes them.
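To make the question concrete, here is a minimal sketch of how I imagine each per-frame file would have to be restructured for the loader, inferred only from the traceback above; the component keys other than pose_landmarks / num_landmarks, and the idea that the other components can be left empty, are pure guesses on my part:

import glob
import json
import os

signal = "/home/.../-fZc293MpJk_0-1-rgb_front"

for path in sorted(glob.glob(os.path.join(signal, "frame_*.json"))):
    with open(path) as f:
        frame = json.load(f)

    landmarks = frame["landmarks"]
    wrapped = {
        "frame": frame["frame"],
        # Guessed structure: one dict per component, carrying its landmark count.
        "pose_landmarks": {
            "num_landmarks": len(landmarks),  # 33 for MediaPipe Pose when a person is detected
            "landmarks": landmarks,
        },
        # Guess: the loader may also expect the other Holistic components,
        # left empty here because I only ran the pose landmarker.
        "face_landmarks": {"num_landmarks": 0, "landmarks": []},
        "left_hand_landmarks": {"num_landmarks": 0, "landmarks": []},
        "right_hand_landmarks": {"num_landmarks": 0, "landmarks": []},
    }

    with open(path, "w") as f:
        json.dump(wrapped, f, indent=2)

Is this roughly the structure load_mediapipe_directory expects, or is it something different?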
❓Questions
- Is there any example or script that shows how you extracted and formatted the MediaPipe Holistic outputs to make them compatible with load_mediapipe_directory?
- How can I know how many landmarks should be included for:
  - face_landmarks
  - pose_landmarks
  - left_hand_landmarks
  - right_hand_landmarks
I just want to add that, while I understand you expect people to use your internal code for pose estimation (which makes sense, and I have never encountered any issue using your code), my goal is simply to understand how to correctly map MediaPipe Holistic outputs to your .pose format, and potentially to generalize this process to other pose estimators in the future.