Description
I’ve been trying to convert MediaPipe Holistic pose estimations into the .pose format using pose-format, but I’m having some trouble understanding what exact JSON structure load_mediapipe_directory expects.
🧩 Context
I extracted the pose skeleton from a video using the following MediaPipe Holistic code:
import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import os
import json
signal = "/home/gsantm/store/data/aligned_yolo_cropped_how2sign/test/clips/-fZc293MpJk_0-1-rgb_front.mp4"
output_root = "/home/gsantm/store/pose_estimators/mediapipe_holistic/output"
model_path = os.path.expanduser("~/store/pose_estimators/mediapipe_holistic/pose_landmarker_lite.task")
sample_name = os.path.splitext(os.path.basename(signal))[0]
output_dir = os.path.join(output_root, sample_name)
os.makedirs(output_dir, exist_ok=True)
BaseOptions = mp.tasks.BaseOptions
PoseLandmarker = mp.tasks.vision.PoseLandmarker
PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.IMAGE
)

with PoseLandmarker.create_from_options(options) as landmarker:
    cap = cv2.VideoCapture(signal)
    if not cap.isOpened():
        raise RuntimeError(f"Error opening video file: {signal}")

    frame_idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
        result = landmarker.detect(mp_image)

        if result.pose_landmarks:
            landmarks = [
                {"x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
                for lm in result.pose_landmarks[0]
            ]
        else:
            landmarks = []

        frame_json_path = os.path.join(output_dir, f"frame_{frame_idx}.json")
        with open(frame_json_path, "w") as f:
            json.dump({
                "frame": frame_idx,
                "landmarks": landmarks
            }, f, indent=2)

        frame_idx += 1

    cap.release()

This creates a directory like this:
/home/.../-fZc293MpJk_0-1-rgb_front/
├── frame_0.json
├── frame_1.json
├── frame_2.json
└── ...
Each JSON looks like this:
{
"frame": 0,
"landmarks": [
{
"x": 0.4936003088951111,
"y": 0.19043749570846558,
"z": -1.181049108505249,
"visibility": 0.9997064471244812
},
{
"x": 0.5202051401138306,
"y": 0.15045350790023804,
"z": -1.1277518272399902,
"visibility": 0.9993149042129517
},
(...)
{
"x": 0.4366815686225891,
"y": 1.9366867542266846,
"z": 0.12023007124662399,
"visibility": 0.04526037722826004
}
]
}

🧠 What I tried
Then I tried converting them into .pose format using:
from pose_format.utils.holistic import load_mediapipe_directory

signal = "/home/.../-fZc293MpJk_0-1-rgb_front"
pose = load_mediapipe_directory(signal, fps=50, width=674, height=588)

with open(f"{signal}/pose.pose", "wb") as f:
    pose.write(f)

But I get this error:
Traceback (most recent call last):
  File "...", line 16, in <module>
    pose = load_mediapipe_directory(output_path, fps=fps, width=width, height=height)
  File ".../pose_format/utils/holistic.py", line 397, in load_mediapipe_directory
    num_pose_points = first_frame["pose_landmarks"]["num_landmarks"]
KeyError: 'num_landmarks'
This makes sense because my JSON files do not have a num_landmarks key.
I couldn’t find where this value comes from, since it doesn’t seem to appear in the MediaPipe result object (after result = landmarker.detect(mp_image)).
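The only candidate value I can see is simply the length of the landmark list, e.g. inside the detection loop above (this is purely my guess at what num_landmarks is supposed to hold):

# Inside the detection loop above: my guess is that "num_landmarks" would just be
# the number of points MediaPipe returned for that component in this frame.
num_pose_landmarks = len(result.pose_landmarks[0]) if result.pose_landmarks else 0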
🔍 What I’ve checked
Reading the code here, it seems that num_landmarks and related metadata are expected to already be present in the JSON files, and it looks like your own implementation assigns these values internally when it writes them.
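To make the question concrete, here is a minimal sketch of how I imagine each per-frame file would have to be restructured for the loader, inferred only from the traceback above; the component keys other than pose_landmarks / num_landmarks, and the idea that the other components can be left empty, are pure guesses on my part:

import glob
import json
import os

signal = "/home/.../-fZc293MpJk_0-1-rgb_front"

for path in sorted(glob.glob(os.path.join(signal, "frame_*.json"))):
    with open(path) as f:
        frame = json.load(f)

    landmarks = frame["landmarks"]
    wrapped = {
        "frame": frame["frame"],
        # Guessed structure: one dict per component, carrying its landmark count.
        "pose_landmarks": {
            "num_landmarks": len(landmarks),  # 33 for MediaPipe Pose when a person is detected
            "landmarks": landmarks,
        },
        # Guess: the loader may also expect the other Holistic components,
        # left empty here because I only ran the pose landmarker.
        "face_landmarks": {"num_landmarks": 0, "landmarks": []},
        "left_hand_landmarks": {"num_landmarks": 0, "landmarks": []},
        "right_hand_landmarks": {"num_landmarks": 0, "landmarks": []},
    }

    with open(path, "w") as f:
        json.dump(wrapped, f, indent=2)

Is this roughly the structure load_mediapipe_directory expects, or is it something different?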
❓Questions
- Is there any example or script that shows how you extracted and formatted the MediaPipe Holistic outputs to make them compatible with load_mediapipe_directory?
- How can I know how many landmarks should be included for:
  - face_landmarks
  - pose_landmarks
  - left_hand_landmarks
  - right_hand_landmarks
I just want to add that, while I understand you expect people to use your internal code for pose estimation (which makes sense, and I have never encountered any issue using your code), my goal is simply to understand how to correctly map MediaPipe Holistic outputs to your .pose format, and potentially to generalize this process to other pose estimators in the future.