I am trying to fine-tune an existing agent using Agent Lightning, specifically its system prompt (agent behavior).
My requirements:
- I must use OpenAI models only, such as gpt-4o base.
- I want to use VERL to optimize or update the agent’s prompt/behavior.
- I prefer a minimal, validated, single-file example (or as simple as possible).
My questions:
- Does Agent Lightning support fine-tuning or behavioral optimization using VERL when the underlying LLM is an OpenAI model?
- Is VERL compatible with Agent Lightning for updating prompts or performing reward-based optimization on an OpenAI-powered agent?
- Have you used VERL for prompt optimization of the LLM, and could we get a validated minimal example demonstrating how to integrate VERL + Agent Lightning + OpenAI (gpt-4o) for fine-tuning an agent’s system prompt?