Deep Dive: Automatic Prompt Optimization with AgentLightning

An exploration of Microsoft's AgentLightning library and how to save optimized prompts after training


Introduction

In this blog post, we'll explore AgentLightning, a framework by Microsoft for training and optimizing AI agents. Specifically, we'll deep-dive into the APO (Automatic Prompt Optimization) algorithm, understand its architecture, and learn how to persist the best-optimized prompt template after training completes.

This exploration came from a practical need: after running an expensive prompt optimization training loop, how do we save the best prompt for future use?


Table of Contents

  1. What is AgentLightning?
  2. The APO Algorithm
  3. Understanding the Trainer Architecture
  4. Execution Strategy: How State is Managed
  5. The PromptTemplate Resource
  6. Saving the Best Prompt After Training
  7. Complete Working Example
  8. Key Takeaways

What is AgentLightning?

AgentLightning is Microsoft's framework for training AI agents through various optimization algorithms.

The library follows a modular design where components can be swapped without changing the core training logic.


The APO Algorithm

Overview

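APO (Automatic Prompt Optimization) uses textual gradients and beam search to iteratively improve prompts. A textual gradient here is a natural-language critique of the current prompt, produced by one model; a second model then applies that critique as an edit to produce a new candidate prompt.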

How APO Works

The algorithm operates in rounds, where each round:

  1. Samples parent prompts from the current beam
  2. Generates new prompts by computing textual gradients and applying edits
  3. Evaluates all candidates on a validation set
  4. Selects top-k prompts for the next round
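
The four steps above can be sketched as a toy beam search. This is a minimal illustration under loud assumptions, not AgentLightning's implementation: `beam_search`, `toy_propose`, and `toy_score` are hypothetical names, the proposal step appends a keyword instead of asking an LLM for a critique and an edit, and every parent in the beam is expanded rather than sampled.

```python
from typing import Callable, List, Tuple

KEYWORDS = ("date", "time", "duration")


def toy_propose(prompt: str) -> List[str]:
    """Stand-in for 'textual gradient + edit': append one missing keyword."""
    return [f"{prompt} {kw}" for kw in KEYWORDS if kw not in prompt]


def toy_score(prompt: str) -> float:
    """Stand-in for validation: fraction of keywords the prompt mentions."""
    return sum(kw in prompt for kw in KEYWORDS) / len(KEYWORDS)


def beam_search(
    seed: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    beam_rounds: int = 3,
) -> Tuple[str, float]:
    beam = [seed]
    best_score, best_prompt = score(seed), seed
    for _ in range(beam_rounds):
        # Steps 1-2: expand each parent in the beam into new candidates.
        candidates = set(beam)
        for parent in beam:
            candidates.update(propose(parent))
        # Step 3: score every candidate (parents included).
        scored = sorted(((score(c), c) for c in candidates), reverse=True)
        # Step 4: keep the top-k candidates for the next round.
        beam = [c for _, c in scored[:beam_width]]
        # Track the best prompt seen across all rounds.
        if scored[0][0] > best_score:
            best_score, best_prompt = scored[0]
    return best_prompt, best_score


best_prompt, best_score = beam_search("Find a room.", toy_propose, toy_score)
print(best_prompt, best_score)
```

The part worth noticing is the separate best-so-far bookkeeping: the beam changes every round, so the best prompt has to be remembered outside it.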

Key Configuration Parameters

From the source code at agentlightning/algorithm/apo/apo.py:

class APO(Algorithm, Generic[T_task]):
    def __init__(
        self,
        async_openai_client: AsyncOpenAI,
        *,
        gradient_model: str = "gpt-5-mini",      # Model for computing critiques
        apply_edit_model: str = "gpt-4.1-mini",  # Model for applying edits
        diversity_temperature: float = 1.0,      # Temperature for diversity
        gradient_batch_size: int = 4,            # Samples for gradient computation
        val_batch_size: int = 16,                # Validation batch size
        beam_width: int = 4,                     # Top-k prompts to keep
        branch_factor: int = 4,                  # New candidates per parent
        beam_rounds: int = 3,                    # Number of optimization rounds
        rollout_batch_timeout: float = 3600.0,   # Timeout for rollouts
        run_initial_validation: bool = True,     # Establish baseline score
    ):
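
To get a feel for why a run is expensive, here is a back-of-the-envelope estimate of validation rollouts. It is a rough sketch, not a description of what the library does: it assumes a full beam in every round and that each new candidate is scored once on a full validation batch, and it ignores gradient rollouts, the initial validation pass, and any re-scoring of parents. `estimate_validation_rollouts` is a hypothetical helper.

```python
def estimate_validation_rollouts(
    beam_width: int = 4,
    branch_factor: int = 4,
    val_batch_size: int = 16,
    beam_rounds: int = 3,
) -> int:
    """Rough count of validation rollouts for one run (see assumptions above)."""
    # Assumption: every prompt in the beam produces branch_factor children.
    candidates_per_round = beam_width * branch_factor
    # Assumption: each child is evaluated on one full validation batch.
    rollouts_per_round = candidates_per_round * val_batch_size
    return rollouts_per_round * beam_rounds


print(estimate_validation_rollouts())  # 4 * 4 * 16 * 3 = 768
```

Even with the defaults this lands in the hundreds of agent executions, which is the practical reason to persist the result.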

Internal State Tracking

APO maintains its optimization history through instance variables:

self._history_best_prompt: Optional[PromptTemplate] = None
self._history_best_score: float = float("-inf")
self._history_best_version: Optional[str] = None

The get_best_prompt() Method

APO exposes a method to retrieve the optimized prompt:

def get_best_prompt(self) -> PromptTemplate:
    """
    Retrieve the best prompt discovered during optimization.

    Returns:
        The prompt template with the highest validation score found so far.

    Raises:
        ValueError: If no best prompt has been found yet (run() not called).
    """
    if self._history_best_prompt is None:
        raise ValueError("No best prompt found")
    return self._history_best_prompt
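
The bookkeeping pattern behind get_best_prompt() can be illustrated with a small stand-in class. `BestTracker` and its `record` method are hypothetical, and the strict-improvement rule is an assumption rather than something confirmed from the APO source; the point is only how a caller should expect the ValueError before any candidate has been scored.

```python
from typing import Optional


class BestTracker:
    """Hypothetical stand-in that mirrors a best-so-far bookkeeping pattern."""

    def __init__(self) -> None:
        self._best_prompt: Optional[str] = None
        self._best_score: float = float("-inf")

    def record(self, prompt: str, score: float) -> None:
        # Assumption: only a strictly higher score replaces the stored prompt.
        if score > self._best_score:
            self._best_prompt, self._best_score = prompt, score

    def get_best_prompt(self) -> str:
        if self._best_prompt is None:
            raise ValueError("No best prompt found")
        return self._best_prompt


tracker = BestTracker()
try:
    tracker.get_best_prompt()
except ValueError as exc:
    print(f"before any evaluation: {exc}")

tracker.record("v1 prompt", 0.60)
tracker.record("v2 prompt", 0.85)
tracker.record("v3 prompt", 0.70)  # lower than v2, so it is ignored
print(tracker.get_best_prompt())
```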

Understanding the Trainer Architecture

The Trainer Class

The Trainer class (agentlightning/trainer/trainer.py) is the high-level orchestration layer that wires together the algorithm, the store, the runners, and the execution strategy, as its attributes show.

Key Trainer Attributes

class Trainer(TrainerLegacy):
    algorithm: Optional[Algorithm]
    """Algorithm instance for training."""

    store: LightningStore
    """Store for tasks and traces."""

    runner: Runner[Any]
    """Runner for executing the agent."""

    initial_resources: Optional[NamedResources]
    """Bootstrap resources handed to the algorithm."""

    n_runners: int
    """Number of parallel agent runners."""

    strategy: ExecutionStrategy
    """Execution strategy for spawning algorithm and runners."""

The fit() Method

def fit(
    self,
    agent: LitAgent[T_co],
    train_dataset: Optional[Dataset[T_co]] = None,
    *,
    val_dataset: Optional[Dataset[T_co]] = None,
) -> None:
    """Execute the full algorithm/runner training loop."""
    agent.set_trainer(self)

    algorithm_bundle = functools.partial(
        self._algorithm_bundle,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        algorithm=self.algorithm,
    )
    runner_bundle = functools.partial(self._runner_bundle, agent=agent)

    self.strategy.execute(algorithm_bundle, runner_bundle, self.store)
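
The use of functools.partial in fit() is worth a closer look: it binds the datasets and the agent ahead of time, so the strategy can invoke both bundles without knowing about either. The sketch below shows the idea with toy functions. The bundle signatures here are simplified assumptions (the real ones take more arguments), and `execute` is a hypothetical stand-in for a strategy.

```python
import functools
from typing import Callable, Dict, List

Store = Dict[str, List[str]]


def algorithm_bundle(store: Store, *, train_dataset: List[str]) -> None:
    # The dataset was bound ahead of time; only the store arrives at call time.
    store["log"].append(f"algorithm saw {len(train_dataset)} tasks")


def runner_bundle(store: Store, *, agent: str) -> None:
    # The agent was bound ahead of time as well.
    store["log"].append(f"runner executed {agent}")


def execute(algorithm: Callable[[Store], None], runner: Callable[[Store], None], store: Store) -> None:
    """Toy strategy: it knows nothing about datasets or agents."""
    algorithm(store)
    runner(store)


store: Store = {"log": []}
execute(
    functools.partial(algorithm_bundle, train_dataset=["task-1", "task-2"]),
    functools.partial(runner_bundle, agent="room-finder"),
    store,
)
print(store["log"])
```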

Execution Strategy: How State is Managed

ClientServerExecutionStrategy

The default execution strategy is ClientServerExecutionStrategy (agentlightning/execution/client_server.py).

It supports three roles:

Why the Algorithm Object Remains Stateful

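This is critical for saving the best prompt. By default, main_process is set to "algorithm": the strategy spawns the runners as separate processes and runs the algorithm itself in the main process, as this excerpt from the strategy shows: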

if self.main_process == "algorithm":
    logger.info("Spawning runner processes...")
    processes = self._spawn_runners(runner, store, stop_evt, ctx=ctx)
    try:
        logger.info("Running algorithm...")
        asyncio.run(self._execute_algorithm(algorithm, store, stop_evt))  # Main process!
    finally:
        stop_evt.set()

The algorithm runs on the main process, so its state (including _history_best_prompt) is preserved after trainer.fit() completes.

⚠️ Important: If main_process="runner" were used, the algorithm would run in a child process, and state would be isolated:

"When main_process == "runner" the algorithm and HTTP server execute in a child process. Store mutations remain isolated inside that process, so the original store instance passed to execute() is not updated."
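
The difference between the two modes can be illustrated without spawning anything. The sketch below simulates the process boundary with copy.deepcopy, on the assumption that a child process effectively works on its own copy of the object; `ToyAlgorithm` is a hypothetical stand-in, not a library class.

```python
import copy
from typing import Optional


class ToyAlgorithm:
    """Hypothetical stand-in for an algorithm that records its best result."""

    def __init__(self) -> None:
        self.best_prompt: Optional[str] = None

    def run(self) -> None:
        self.best_prompt = "optimized prompt"


# Case 1: run in the same process. The caller's object is mutated directly.
in_process = ToyAlgorithm()
in_process.run()

# Case 2: simulate a child process. The child works on its own copy, so
# nothing it does reaches the caller's original object.
original = ToyAlgorithm()
child_copy = copy.deepcopy(original)
child_copy.run()

print(in_process.best_prompt)  # optimized prompt
print(original.best_prompt)    # None
```

In the second case the result exists only inside the copy, which is why anything computed in a child process has to be written somewhere shared (a file, a store) before that process exits.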


The PromptTemplate Resource

Definition

From agentlightning/types/resources.py:

class PromptTemplate(Resource):
    """Resource describing a reusable prompt template."""

    resource_type: Literal["prompt_template"] = "prompt_template"
    template: str
    """The template string. The format depends on the engine."""
    engine: Literal["jinja", "f-string", "poml"]
    """The templating engine to use for rendering the prompt."""

    def format(self, **kwargs: Any) -> str:
        """Format the prompt using keyword arguments."""
        if self.engine == "f-string":
            return self.template.format(**kwargs)
        else:
            raise NotImplementedError(
                "Formatting prompt templates for non-f-string engines is not supported yet."
            )
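
Since the f-string engine delegates to str.format, its behavior can be explored with plain strings. The snippet below uses only the standard library; the template text and field values are illustrative.

```python
template = "Find a room on {date} at {time} for {duration_min} minutes."
rendered = template.format(date="2026-01-12", time="14:30", duration_min=45)
print(rendered)

# Literal braces must be doubled, otherwise str.format treats them as fields.
json_hint = 'Reply as JSON: {{"room": "<name>"}} for {date}.'
print(json_hint.format(date="2026-01-12"))

# A missing field raises KeyError rather than rendering an empty string.
try:
    template.format(date="2026-01-12")
except KeyError as exc:
    missing_field = exc.args[0]
    print(f"missing field: {missing_field}")
```

The brace rule matters for optimized prompts in particular: if an edit introduces unescaped braces or renames a placeholder, rendering fails at call time, so a quick render of the saved template is a cheap sanity check.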

PromptTemplate is a Pydantic Model

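PromptTemplate inherits from Resource, which inherits from pydantic.BaseModel. As a result, its fields are validated on construction and it can be serialized with Pydantic's built-in methods.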


Saving the Best Prompt After Training

The Solution

Since the APO algorithm object maintains state on the main process, we can access it directly after training:

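import json
import logging
from datetime import datetime

# Module path follows the source file cited earlier in this post.
from agentlightning.algorithm.apo.apo import APO


def save_best_prompt(algo: APO, filename_prefix: str = "best_prompt") -> str: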
    """Save the best prompt template from the APO algorithm to a file with timestamp.
    
    Args:
        algo: The APO algorithm instance after training has completed.
        filename_prefix: Prefix for the output filename.
        
    Returns:
        The filename where the best prompt was saved.
    """
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{filename_prefix}_{timestamp}.json"
    
    try:
        best_prompt = algo.get_best_prompt()
        prompt_data = {
            "template": best_prompt.template,
            "engine": best_prompt.engine,
        }
        # Private attributes: not part of the public API, so they may change.
        best_score = algo._history_best_score
        best_version = algo._history_best_version
    except ValueError as e:
        logging.warning(f"Could not retrieve best prompt: {e}")
        prompt_data = {}
        best_score = None
        best_version = None
    
    output = {
        "timestamp": timestamp,
        "best_prompt_template": prompt_data,
        "best_score": best_score,
        "best_version": best_version,
    }
    
    with open(filename, "w") as f:
        json.dump(output, f, indent=2)
    
    logging.info(f"Best prompt template saved to: {filename}")
    return filename

Output Format

The saved JSON file looks like:

{
  "timestamp": "20260112_143052",
  "best_prompt_template": {
    "template": "Find a room on {date} at {time} for {duration_min} minutes...",
    "engine": "f-string"
  },
  "best_score": 0.85,
  "best_version": "v7"
}
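
Reading the file back for future use is the other half of the workflow. Here is a minimal sketch that relies only on the JSON layout shown above; `load_best_prompt` and `render` are hypothetical helpers, and rendering is handled for the f-string engine only.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_best_prompt(path: str) -> Dict[str, Any]:
    """Read a file written by save_best_prompt and return the prompt fields."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    prompt = data.get("best_prompt_template") or {}
    if "template" not in prompt:
        # save_best_prompt writes an empty dict when no best prompt was found.
        raise ValueError(f"{path} does not contain a saved prompt template")
    return prompt


def render(prompt: Dict[str, Any], **fields: Any) -> str:
    """Render the template; only the f-string engine is handled here."""
    if prompt.get("engine") != "f-string":
        raise NotImplementedError(f"unsupported engine: {prompt.get('engine')}")
    return prompt["template"].format(**fields)
```

A call such as render(load_best_prompt("best_prompt_20260112_143052.json"), date="2026-01-12", time="14:30", duration_min=45) would then produce the final prompt text, provided the field names match the placeholders in the saved template.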

Key Takeaways

1. APO Algorithm State is Accessible Post-Training

Because the default execution strategy runs the algorithm on the main process, the algo object retains its state after trainer.fit() completes.

2. Use algo.get_best_prompt() to Retrieve Optimized Prompts

The APO class provides a clean API method:

best_prompt = algo.get_best_prompt()  # Returns PromptTemplate

3. Access Internal Metrics for Logging

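For detailed output, you can also read algo._history_best_score and algo._history_best_version. Both are private attributes, so they may change between library versions.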

4. PromptTemplate is a Pydantic Model

Serialization is straightforward:

prompt_data = {
    "template": best_prompt.template,
    "engine": best_prompt.engine,
}

5. Execution Strategy Matters for State Management

If you switch to a different execution strategy or set main_process="runner", the algorithm may run in a child process, in which case its state is not accessible from the main process after training.

6. Always Add Timestamps to Saved Files

For experiment tracking, include timestamps in your output filenames to avoid overwriting previous results.
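
One side benefit of the %Y%m%d_%H%M%S format is that filenames sort chronologically as plain strings, so finding the most recent result needs no date parsing. A small sketch, with `latest_saved_prompt` as a hypothetical helper:

```python
from pathlib import Path
from typing import Optional


def latest_saved_prompt(directory: str = ".", prefix: str = "best_prompt") -> Optional[Path]:
    """Return the newest '<prefix>_<timestamp>.json' file, or None if absent."""
    # %Y%m%d_%H%M%S timestamps sort chronologically as plain strings.
    matches = sorted(Path(directory).glob(f"{prefix}_*.json"), key=lambda p: p.name)
    return matches[-1] if matches else None
```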


References

Component                       Source File
APO Algorithm                   agentlightning/algorithm/apo/apo.py
Trainer                         agentlightning/trainer/trainer.py
ClientServerExecutionStrategy   agentlightning/execution/client_server.py
PromptTemplate                  agentlightning/types/resources.py
Resource Types                  agentlightning/types/__init__.py


Written during an exploration session on January 12, 2026