Description
I am currently working on a VLA project and using VLABench for evaluation. I encountered unexpected behavior when testing a baseline policy.
To verify the control consistency, I modified the RandomPolicy to output the exact same values as the input ee_state (End-Effector state), effectively commanding the robot to remain stationary. However, upon reviewing the video playback of the test run, the robot still exhibits visible movement.
Steps to Reproduce:
- Use the RandomPolicy within the VLABench environment.
- Modify RandomPolicy.predict(self, obs, **kwargs) to set the predicted action equal to the input ee_state. My code is below:
    def predict(self, obs, **kwargs):
        delta_pos = np.random.uniform(-0.1, 0.1, 3)
        delta_euler = np.random.uniform(-0.1, 0.1, 3)
        gripper_open = np.random.uniform(0, 1, 1)
        current_ee_state = obs["ee_state"]
        if len(current_ee_state) == 8:
            pos, quat = current_ee_state[:3], current_ee_state[3:]
            euler = quaternion_to_euler(quat)
        elif len(current_ee_state) == 7:
            pos, euler = current_ee_state[:3], current_ee_state[3:]
        # target_pos = np.array(pos) + delta_pos
        # target_euler = euler + delta_euler
        # modify the above two lines to below two lines
        target_pos = np.array(pos)
        target_euler = euler
        gripper_state = np.ones(2)*0.04 if gripper_open >= 0.1 else np.zeros(2)
        return target_pos, target_euler, gripper_state

- Run the evaluation and observe the video output/execution. My eval script is below:
    from VLABench.evaluation.evaluator import Evaluator
    from VLABench.evaluation.model.policy.base import RandomPolicy

    demo_tasks = ["select_fruit"]
    save_dir = "./logs"
    evaluator = Evaluator(
        tasks=demo_tasks,
        n_episodes=2,
        max_substeps=10,
        save_dir=save_dir,
        visulization=True
    )
    random_policy = RandomPolicy(model=None)
    result = evaluator.evaluate(random_policy)

Observed Behavior:
The robot arm moves/drifts instead of remaining perfectly still as commanded.
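The movement could be quantified rather than judged from the video. The following is a minimal sketch, not part of VLABench: `pose_error` is a hypothetical helper, and it assumes each pose is given as an xyz position followed by a unit quaternion in the first seven entries (the actual layout of `obs["ee_state"]` would need to be confirmed). It returns the Euclidean position error and the rotation angle between the commanded and measured poses.

```python
import math


def pose_error(commanded, measured):
    """Return (position error, rotation angle in radians) between two poses.

    Assumption: entries 0..2 are the xyz position and entries 3..6 are a
    unit quaternion. The component order (wxyz or xyzw) does not affect the
    angle, as long as both poses use the same order.
    """
    pos_err = math.dist(commanded[:3], measured[:3])
    # |q1 . q2| = cos(theta / 2) for unit quaternions; abs() treats q and -q
    # as the same rotation, and min() guards against rounding above 1.0.
    dot = abs(sum(a * b for a, b in zip(commanded[3:7], measured[3:7])))
    ang_err = 2.0 * math.acos(min(1.0, dot))
    return pos_err, ang_err
```

Logging these two numbers at every step, with the `ee_state` fed to the policy as `commanded` and the next observed `ee_state` as `measured`, would show whether the drift is a constant offset, grows over time, or appears only in orientation.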
Questions:
- What is the underlying cause of this movement? (e.g., Is it due to controller latency, coordinate system mismatches, IK errors, or physics engine noise in the simulator?)
- To what extent will this execution error/drift affect the overall benchmark results for VLA models? I am concerned that this discrepancy might lead to an underestimation of model performance.
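One check related to the coordinate-mismatch possibility can be made from the posted code alone. The sketch below repeats the `[3:]` slicing used in `predict()` on placeholder lists of length 7 and 8; `rotation_slice_lengths` is a hypothetical helper written only for this check. The real layout of `obs["ee_state"]` is not stated here, so whether the resulting lengths are a problem is an assumption to verify against VLABench itself.

```python
def rotation_slice_lengths(ee_state):
    """Apply the same slicing as predict() and return the slice lengths."""
    pos = ee_state[:3]
    rot = ee_state[3:]  # everything after the position, as in predict()
    return len(pos), len(rot)


for n in (7, 8):
    print(n, rotation_slice_lengths([0.0] * n))
# prints:
# 7 (3, 4)
# 8 (3, 5)
```

With a length-7 state, the values assigned to `euler` have four components, and with a length-8 state, five values are passed to `quaternion_to_euler`. If the state is actually a position plus a quaternion (plus possibly a gripper value), the returned `target_euler` would not be a valid set of Euler angles, which could by itself produce a target pose different from the current one.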