Random things I do

I Got Claude to Optimize Agent Prompts (using agl library)

Automatic Prompt Optimization

I gave Claude the ability to automatically optimize prompts for LangChain/LangGraph agents using a new skill called training-apo-agents. Not just write optimization scripts, but to actually design reward functions, configure training pipelines, run experiments, and analyze results. This tutorial shows you how it works and how to use it yourself.

Claude Code can use "skills"β€”packaged instructions, scripts, and domain knowledgeβ€”to accomplish specialized tasks. The training-apo-agents skill teaches Claude everything it needs to know about APO (Automatic Prompt Optimization): how to wrap your agent for training, design multi-criteria reward functions, configure beam search parameters, and handle the dozens of decisions that go into successful prompt optimization.

With this skill, you can tell Claude things like:

Optimize my travel booking agent's prompts to improve routing accuracy

And Claude will:

  1. Analyze your existing agent code
  2. Create a factory function to rebuild agents with new prompts
  3. Design a reward function matching your evaluation criteria
  4. Wrap your agent in a LitAgent for APO training
  5. Run smoke tests to verify the pipeline works
  6. Execute full training with beam search optimization
  7. Save optimized prompts and compare before/after performance

The optimization runs locally while you do other things. When it's done, your optimized prompts are saved to a timestamped Python file, ready to drop into production.

This isn't a toy demo. The skill supports single agents and multi-agent swarms, custom reward functions with explicit weights, LangSmith dataset integration, and parallel training runners. You can optimize routing agents, specialist agents, or entire hierarchical swarm systems.

Table of Contents


Setup and Install

Before starting, you'll need:

Install the Skill

For Claude Code users, add the skill to your project:

# Create the skill directory
mkdir -p .claude/skills/training-apo-agents

# Copy the skill file (or create from template)
# The skill lives at: .claude/skills/training-apo-agents/SKILL.md

Install Dependencies

# Install Agent-Lightning with APO support
uv add "agentlightning[apo]"

# You'll also need these for LangChain agents
uv add langchain langchain-openai langgraph langgraph-swarm

Environment Setup

# Required environment variables
export OPENROUTER_API_KEY=your_key_here    # For APO algorithm
export LANGSMITH_API_KEY=your_key_here     # Optional: for dataset loading
export LANGSMITH_TRACING=true              # Enable tracing

Your First Training Run

Let's walk through a complete example. I'll optimize prompts for a multi-agent travel booking system.

Instruct Claude to Optimize

Start with a clear instruction describing your agent and optimization goals:

I have a multi-agent travel system with triage, routing, and specialist agents.
The triage agent often routes to the wrong specialist. Optimize the prompts
to improve routing accuracy. My evaluation dataset is in LangSmith as 
"multi_agent_evals_dataset".

Claude analyzes your request and prepares the training infrastructure:

I'll optimize your travel system prompts using APO. Here's the plan:

Configuration:
- Algorithm: APO with beam search
- Hardware: Local CPU with 4 parallel runners
- Beam width: 4 prompts, 3 optimization rounds
- Estimated time: ~20-30 minutes

Components to create:
1. Prompt templates for all 6 agents
2. Factory function to rebuild swarm with new prompts
3. Reward function with routing accuracy as primary metric
4. LitAgent wrapper for APO training

Should I proceed with the smoke test first?

Review Before Running

Before any training, you'll see the reward function configuration:

# Explicit reward weights - adjust based on your priorities
REWARD_WEIGHTS = {
    "term_matching": 0.30,      # Output contains expected terms
    "tool_calls": 0.25,         # Correct tools were called
    "agent_routing": 0.30,      # Correct agent sequence
    "efficiency": 0.15,         # Step count within bounds
}

This is your chance to adjust priorities. Want routing accuracy to matter more? Increase agent_routing weight. Care more about output quality? Boost term_matching. Once you approve, Claude runs the smoke test.

Smoke Test First

The skill always runs a smoke test before full training:

uv run train_multi_agent.py --config small

smoke_test_run_screenshot

Claude verifies:

Full Training Run

After smoke test passes, run full optimization:

uv run train_multi_agent.py --config full

full_train_run_screenshot

The training process:

  1. Runs initial validation with seed prompts
  2. Executes rollouts to collect performance data
  3. Generates critiques of failed trajectories
  4. Rewrites prompts based on critiques
  5. Keeps top performers via beam search
  6. Repeats for multiple rounds

Check Results

When training completes, optimized prompts are saved:

Training complete!
Best optimized prompt template:
----------------------------------------
You are a triage agent for a travel company. Your ONLY job is to
analyze user intent and IMMEDIATELY route to specialists using
transfer tools...
----------------------------------------
Optimized prompts saved to: optimized_prompts_20260116_223615.py

optimized_prompt_screenshot


How APO Works

APO (Automatic Prompt Optimization) uses "textual gradients" to improve prompts iteratively. Unlike traditional gradient descent that updates numeric weights, APO:

  1. Runs Rollouts: Execute your agent with current prompts
  2. Analyzes Failures: Identify where the agent went wrong
  3. Generates Critiques: Create natural language "gradients" describing what to improve
  4. Rewrites Prompts: Apply critiques to produce better prompts
  5. Beam Search: Keep the top-performing variants
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚ Seed Prompt β”‚
                    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
                           β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β–Ό            β–Ό            β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚Variant1β”‚  β”‚Variant2β”‚  β”‚Variant3β”‚  ← Branch Factor
         β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
             β”‚           β”‚           β”‚
         [Evaluate]  [Evaluate]  [Evaluate]
             β”‚           β”‚           β”‚
             β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
                   β–Ό           β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚ Top 2  β”‚  β”‚ Top 2  β”‚          ← Beam Width
              β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
                  β”‚           β”‚
                  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
                        β–Ό
                 [Next Round]                  ← Beam Rounds

Key Parameters:

Parameter What It Does
beam_width Number of top prompts to keep each round
branch_factor Variants generated per prompt
beam_rounds Number of optimization iterations
gradient_batch_size Rollouts sampled for critique generation

What the skill does under the hood: Implementation Deep Dive

Step 1: Create Prompt Templates

Define the prompts you want to optimize:

# training/prompts.py
import agentlightning as agl

TRIAGE_PROMPT = """You are a triage agent for a travel company. Your job is to 
route users to the correct specialist.
Based on user intent, IMMEDIATELY call the appropriate transfer tool:
- For flight or hotel BOOKINGS β†’ call transfer_to_booking_router_agent
- For flight delay/cancellation CLAIMS β†’ call transfer_to_claims_agent
- For general INFO/FAQ questions β†’ call transfer_to_faq_agent
IMPORTANT: You MUST use the transfer tools. Do NOT respond with text."""

def get_initial_resources() -> dict[str, agl.PromptTemplate]:
    return {
        "triage_prompt": agl.PromptTemplate(
            template=TRIAGE_PROMPT,
            engine="f-string"
        ),
        # Add more prompts for multi-agent systems...
    }

Step 2: Create Agent Factory

Build a factory that reconstructs your agent with new prompts:

# training/swarm_factory.py
from langchain.agents import create_agent
from langgraph_swarm import create_handoff_tool, create_swarm
from langgraph.checkpoint.memory import MemorySaver
import agentlightning as agl

def create_swarm_with_prompts(resources: agl.NamedResources):
    """Rebuild swarm with optimized prompts."""
    llm = get_llm()
    
    # Extract prompts from resources
    triage_prompt = resources["triage_prompt"].format()
    router_prompt = resources["router_prompt"].format()
    
    # Create agents with current prompts
    triage_agent = create_agent(
        name="triage_agent",
        model=llm,
        system_prompt=triage_prompt,
        tools=[
            create_handoff_tool("booking_router_agent", "For bookings"),
            create_handoff_tool("claims_agent", "For claims"),
            create_handoff_tool("faq_agent", "For questions"),
        ],
    )
    
    # ... create other agents ...
    
    workflow = create_swarm(
        [triage_agent, ...],
        default_active_agent="triage_agent"
    )
    return workflow.compile(checkpointer=MemorySaver())

Step 3: Design Your Reward Function

The reward function is crucialβ€”it defines what "good" means:

# training/reward.py
from langchain_core.messages import AIMessage

REWARD_WEIGHTS = {
    "term_matching": 0.30,      # Output contains expected terms
    "tool_calls": 0.25,         # Correct tools were called
    "agent_routing": 0.30,      # Correct agent sequence
    "efficiency": 0.15,         # Step count within bounds
}

def extract_trajectory_info(result: dict) -> dict:
    """Extract what happened during agent execution."""
    messages = result.get("messages", [])
    
    # Get final output
    final_output = ""
    for msg in reversed(messages):
        if isinstance(msg, AIMessage) and msg.content:
            final_output = msg.content
            break
    
    # Get tool calls
    tool_calls = []
    for msg in messages:
        if isinstance(msg, AIMessage) and msg.tool_calls:
            tool_calls.extend([tc["name"] for tc in msg.tool_calls])
    
    # Get agent routing sequence
    agent_sequence = ["Triage Agent"]
    agent_mapping = {
        "transfer_to_flight_booking_agent": "Flight Agent",
        "transfer_to_claims_agent": "Claims Agent",
        # ... etc
    }
    for msg in messages:
        if isinstance(msg, AIMessage) and msg.tool_calls:
            for tc in msg.tool_calls:
                if tc["name"] in agent_mapping:
                    agent_sequence.append(agent_mapping[tc["name"]])
    
    return {
        "final_output": final_output,
        "tool_calls": tool_calls,
        "agent_sequence": agent_sequence,
        "step_count": sum(1 for m in messages if isinstance(m, AIMessage)),
    }

def calculate_reward(outputs: dict, reference: dict) -> float:
    """Calculate weighted composite reward."""
    rewards = {}
    
    # 1. Term matching
    expected_terms = reference.get("expected_terms", [])
    if expected_terms:
        final_output = outputs.get("final_output", "").lower()
        matched = sum(1 for t in expected_terms if t.lower() in final_output)
        rewards["term_matching"] = matched / len(expected_terms)
    
    # 2. Tool calls
    expected_tools = reference.get("expected_tool_calls", [])
    if expected_tools:
        actual_tools = outputs.get("tool_calls", [])
        matched = sum(1 for t in expected_tools if t in actual_tools)
        rewards["tool_calls"] = matched / len(expected_tools)
    
    # 3. Agent routing (for multi-agent)
    expected_agents = reference.get("expected_agent_sequence", [])
    if expected_agents:
        agent_sequence = outputs.get("agent_sequence", [])
        # Subsequence matching
        exp_idx = 0
        for agent in agent_sequence:
            if exp_idx < len(expected_agents) and agent == expected_agents[exp_idx]:
                exp_idx += 1
        rewards["agent_routing"] = exp_idx / len(expected_agents)
    
    # 4. Efficiency
    step_count = outputs.get("step_count", 0)
    min_steps, max_steps = reference.get("min_steps", 1), reference.get("max_steps", 10)
    if min_steps <= step_count <= max_steps:
        rewards["efficiency"] = 1.0
    else:
        rewards["efficiency"] = max(0.0, 1.0 - abs(step_count - max_steps) / max_steps)
    
    # Weighted sum
    total = sum(rewards.get(k, 0) * v for k, v in REWARD_WEIGHTS.items())
    return total / sum(REWARD_WEIGHTS.values())

Step 4: Implement LitAgent

Wrap your agent for APO training:

# training/lit_agent.py
import uuid
import agentlightning as agl
from typing import TypedDict

class TravelTask(TypedDict):
    prompt: str
    expected_terms: list[str]
    expected_tool_calls: list[str]
    expected_agent_sequence: list[str]
    min_steps: int
    max_steps: int

class TravelSwarmLitAgent(agl.LitAgent[TravelTask]):
    def __init__(self):
        super().__init__()
        self._initial_resources = get_initial_resources()
    
    def rollout(
        self, 
        task: TravelTask, 
        resources: agl.NamedResources, 
        rollout: agl.Rollout
    ) -> float:
        # Merge optimized prompts with fixed prompts
        full_resources = dict(self._initial_resources)
        full_resources.update(resources)
        
        # Build and run agent
        app = create_swarm_with_prompts(full_resources)
        thread_id = f"training-{uuid.uuid4()}"
        
        try:
            result = app.invoke(
                {"messages": [{"role": "user", "content": task["prompt"]}]},
                config={"configurable": {"thread_id": thread_id}}
            )
        except Exception as e:
            print(f"Agent error: {e}")
            return 0.0
        
        outputs = extract_trajectory_info(result)
        return calculate_reward(outputs, task)

Step 5: Create Training Script

CRITICAL for Mac: Set multiprocessing start method first!

# train_multi_agent.py

# ===== MUST BE FIRST - before any other imports =====
import multiprocessing
try:
    multiprocessing.set_start_method("fork", force=True)
except RuntimeError:
    pass
# ====================================================

import os
import argparse
from dotenv import load_dotenv
from openai import AsyncOpenAI
import agentlightning as agl

from training import get_initial_resources, TravelSwarmLitAgent
from training.config import get_config

load_dotenv()
os.environ["LANGSMITH_TRACING"] = "true"

def main(config_name: str = "full"):
    config = get_config(config_name)
    
    # Configure OpenRouter client
    client = AsyncOpenAI(
        api_key=os.getenv("OPENROUTER_API_KEY"),
        base_url="https://openrouter.ai/api/v1"
    )
    
    # Configure APO
    algo = agl.APO(
        client,
        beam_width=config.beam_width,
        branch_factor=config.branch_factor,
        beam_rounds=config.beam_rounds,
    )
    
    # Load dataset
    train_data, val_data = load_dataset(config.dataset_name)
    
    # Configure Trainer
    trainer = agl.Trainer(
        algorithm=algo,
        n_runners=config.n_runners,
        initial_resources=get_initial_resources(),
        adapter=agl.TraceToMessages(),
    )
    
    # Run training
    trainer.fit(
        agent=TravelSwarmLitAgent(),
        train_dataset=train_data,
        val_dataset=val_data,
    )
    
    # Get optimized prompt
    best_prompt = algo.get_best_prompt()
    print(f"Optimized: {best_prompt.template}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", choices=["small", "full"], default="full")
    args = parser.parse_args()
    main(args.config)

Configuration Presets

The skill includes two presets optimized for different use cases:

Small Config (Smoke Test)

SMALL_CONFIG = TrainingConfig(
    beam_width=1,
    branch_factor=1,
    beam_rounds=1,
    n_runners=1,
    train_split=0.5,
)

Use for:

Full Config (Production)

FULL_CONFIG = TrainingConfig(
    beam_width=4,
    branch_factor=4,
    beam_rounds=3,
    n_runners=4,
    train_split=0.7,
)

Use for:

Config beam_width branch_factor beam_rounds n_runners Estimated Time
small 1 1 1 1 ~5 minutes
full 4 4 3 4 ~25 minutes

Monitoring and Debugging

LangSmith Tracing

All rollouts are traced to LangSmith automatically:

langsmith_trace_screenshot

Watch for:

Debugging Rewards

Add logging to understand reward breakdown:

def calculate_reward(outputs: dict, reference: dict) -> float:
    rewards = {}
    
    # ... calculate individual rewards ...
    
    # Debug logging
    print(f"  term_matching: {rewards.get('term_matching', 0):.2f}")
    print(f"  tool_calls: {rewards.get('tool_calls', 0):.2f}")
    print(f"  agent_routing: {rewards.get('agent_routing', 0):.2f}")
    print(f"  efficiency: {rewards.get('efficiency', 0):.2f}")
    
    total = ...
    print(f"  TOTAL: {total:.2f}")
    return total

reward_breakdown_screenshot


Multi-Agent Systems

The skill shines with complex multi-agent swarms. Here's how it handles a 6-agent travel system:

Architecture

                          User Query
                              β”‚
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  Triage Agent   β”‚ ← Prompt optimized
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
           β–Ό                 β–Ό                 β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚Booking Routerβ”‚  β”‚ Claims Agent β”‚  β”‚  FAQ Agent   β”‚
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚                    All prompts optimized
     β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”
     β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Flight  β”‚ β”‚  Hotel   β”‚
β”‚  Agent   β”‚ β”‚  Agent   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tracking Agent Sequences

For multi-agent systems, reward routing accuracy:

# In reward.py
expected_agents = reference.get("expected_agent_sequence", [])
# e.g., ["Triage Agent", "Booking Router Agent", "Flight Agent"]

actual_sequence = outputs.get("agent_sequence", [])
# e.g., ["Triage Agent", "Booking Router Agent", "Flight Agent"]

# Check subsequence matching (order matters)
exp_idx = 0
for agent in actual_sequence:
    if exp_idx < len(expected_agents) and agent == expected_agents[exp_idx]:
        exp_idx += 1
routing_score = exp_idx / len(expected_agents)

Dataset Format

Each training example specifies the expected agent path:

task = {
    "prompt": "I want to book a flight to Paris",
    "expected_terms": ["flight", "paris"],
    "expected_tool_calls": ["search_flights"],
    "expected_agent_sequence": [
        "Triage Agent", 
        "Booking Router Agent", 
        "Flight Booking Agent"
    ],
    "min_steps": 3,
    "max_steps": 8,
}

Real Results

Here's what prompt optimization achieved on our travel booking system:

Before Optimization

TRIAGE_PROMPT = """You are a triage agent for a travel company. 
Your job is to route users to the correct specialist.
Based on user intent, IMMEDIATELY call the appropriate transfer tool:
- For flight or hotel BOOKINGS β†’ call transfer_to_booking_router_agent
- For claims β†’ call transfer_to_claims_agent
- For general questions β†’ call transfer_to_faq_agent"""

Metrics:

After APO Optimization

OPTIMIZED_TRIAGE_PROMPT = """You are a triage agent for TravelCo. 
Your ONLY job is to analyze user intent and IMMEDIATELY route.

ROUTING RULES (apply strictly in order):
1. BOOKING keywords (book, reserve, find, search + flight/hotel) 
   β†’ transfer_to_booking_router_agent
2. CLAIM keywords (delay, cancel, refund, compensation, claim)
   β†’ transfer_to_claims_agent  
3. ALL OTHER queries (info, question, policy, how, what, when)
   β†’ transfer_to_faq_agent

CRITICAL: Call transfer tool immediately. Never respond with text.
If uncertain between booking and claim, prefer claims_agent."""

Metrics:


What's Next

I've shown that Claude can handle the full lifecycle of prompt optimization: analyzing agents, designing rewards, configuring training, running experiments, and comparing results. This turns what used to require ML expertise into something you can do through conversation.

Some things to try:

The skill is designed to be extended. You can customize reward functions, add new metrics, or integrate with different agent frameworks.


Resources

Agent-Lightning:

LangChain/LangGraph:

This Skill:


This skill was built on 17th January 2026 to make prompt optimization accessible to anyone working with LangChain agents. If you have feedback or want to contribute improvements, the source is in your this github repository.