Deep Dive: Automatic Prompt Optimization with AgentLightning

12 Jan, 2026

An exploration of Microsoft's AgentLightning library and how to save optimized prompts after training

Introduction

In this blog post, we'll explore AgentLightning, a framework by Microsoft for training and optimizing AI agents. Specifically, we'll deep-dive into the APO (Automatic Prompt Optimization) algorithm, understand its architecture, and learn how to persist the best-optimized prompt template after training completes.

This exploration came from a practical need: after running an expensive prompt optimization training loop, how do we save the best prompt for future use?

Table of Contents

What is AgentLightning?
The APO Algorithm
Understanding the Trainer Architecture
Execution Strategy: How State is Managed
The PromptTemplate Resource
Saving the Best Prompt After Training
Complete Working Example
Key Takeaways

What is AgentLightning?

AgentLightning is Microsoft's framework for training AI agents through various optimization algorithms. It provides:

Algorithms: Different optimization strategies (APO, Baseline, etc.)
Trainer: High-level orchestration for wiring algorithms, runners, and stores
Runners: Execute agents and collect rollout data
Store: Manages tasks, traces, and resources
Execution Strategies: Handle process management (shared memory, client-server)

The library follows a modular design where components can be swapped without changing the core training logic.

The APO Algorithm

Overview

APO (Automatic Prompt Optimization) uses textual gradients and beam search to iteratively improve prompts. It's based on ideas from:

How APO Works

The algorithm operates in rounds, where each round:

Samples parent prompts from the current beam
Generates new prompts by computing textual gradients and applying edits
Evaluates all candidates on a validation set
Selects top-k prompts for the next round
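To make the round structure concrete, here is a toy version of that loop in plain Python. It is a sketch, not APO's implementation. `propose_edits` stands in for the two LLM calls (compute a textual-gradient critique, then apply it as an edit), and `score` stands in for validation rollouts. Both are hypothetical, deterministic placeholders so the control flow is easy to follow. Sampling parents with replacement is also my assumption.

```python
import random
from typing import Callable, List, Optional, Tuple


def beam_search_prompts(
    seed_prompt: str,
    propose_edits: Callable[[str, int], List[str]],  # hypothetical: critique + rewrite step
    score: Callable[[str], float],                   # hypothetical: validation score
    beam_width: int = 4,
    branch_factor: int = 4,
    beam_rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float]:
    """Toy beam search over prompts; returns the best prompt and its score."""
    rng = rng or random.Random(0)
    beam = [seed_prompt]
    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(beam_rounds):
        # 1. Sample parent prompts from the current beam (with replacement).
        parents = [rng.choice(beam) for _ in range(beam_width)]
        # 2. Generate new candidates from each parent.
        candidates = list(beam)
        for parent in parents:
            candidates.extend(propose_edits(parent, branch_factor))
        # 3. Evaluate all candidates; the dict drops duplicate prompts.
        scored = {c: score(c) for c in candidates}
        # 4. Keep the top-k prompts for the next round.
        beam = sorted(scored, key=scored.get, reverse=True)[:beam_width]
        if scored[beam[0]] > best_score:
            best_prompt, best_score = beam[0], scored[beam[0]]
    return best_prompt, best_score


# Hypothetical toy stand-ins: a prompt "improves" by mentioning more required fields.
KEYWORDS = ["date", "time", "capacity"]


def toy_score(prompt: str) -> float:
    return float(sum(kw in prompt for kw in KEYWORDS))


def toy_propose_edits(parent: str, n: int) -> List[str]:
    missing = [kw for kw in KEYWORDS if kw not in parent]
    return [f"{parent} Check the {kw}." for kw in missing[:n]]


best, best_score = beam_search_prompts(
    "Find a meeting room.", toy_propose_edits, toy_score, beam_width=1, branch_factor=3
)
print(best_score)  # 3.0
print(best)        # Find a meeting room. Check the date. Check the time. Check the capacity.
```

With beam_width=1 the search is greedy and keeps only one prompt per round. Wider beams keep runner-up prompts alive, so a candidate that scores slightly lower now can still produce better children later.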

Key Configuration Parameters

From the source code at agentlightning/algorithm/apo/apo.py:

class APO(Algorithm, Generic[T_task]):
    def __init__(
        self,
        async_openai_client: AsyncOpenAI,
        *,
        gradient_model: str = "gpt-5-mini",      # Model for computing critiques
        apply_edit_model: str = "gpt-4.1-mini",  # Model for applying edits
        diversity_temperature: float = 1.0,      # Temperature for diversity
        gradient_batch_size: int = 4,            # Samples for gradient computation
        val_batch_size: int = 16,                # Validation batch size
        beam_width: int = 4,                     # Top-k prompts to keep
        branch_factor: int = 4,                  # New candidates per parent
        beam_rounds: int = 3,                    # Number of optimization rounds
        rollout_batch_timeout: float = 3600.0,   # Timeout for rollouts
        run_initial_validation: bool = True,     # Establish baseline score
    ):
        ...  # body omitted in this excerpt
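Multiplying these defaults out shows why a run is expensive. The estimate below counts validation rollouts only. It rests on assumptions I have not checked against apo.py: every round scores beam_width parents plus beam_width * branch_factor children on val_batch_size samples each, and run_initial_validation adds one pass over the seed prompt. It also ignores the LLM calls for critiques and edits, so treat the result as an order of magnitude, not an exact count.

```python
def estimate_validation_rollouts(
    beam_width: int = 4,
    branch_factor: int = 4,
    beam_rounds: int = 3,
    val_batch_size: int = 16,
    run_initial_validation: bool = True,
) -> int:
    """Rough count of validation rollouts for one APO run (hypothetical helper)."""
    # Assumed accounting, not verified against apo.py.
    children_per_round = beam_width * branch_factor        # new prompts per round
    candidates_per_round = beam_width + children_per_round  # parents are re-scored too
    total = beam_rounds * candidates_per_round * val_batch_size
    if run_initial_validation:
        total += val_batch_size  # one extra pass to score the seed prompt
    return total


print(estimate_validation_rollouts())  # 976
```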

Internal State Tracking

APO maintains its optimization history through instance variables:

self._history_best_prompt: Optional[PromptTemplate] = None
self._history_best_score: float = float("-inf")
self._history_best_version: Optional[str] = None
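These three fields form a running "best so far" record that is replaced only when a candidate scores higher. Below is a minimal sketch of that bookkeeping. The class name is hypothetical, and the strict-improvement rule is my assumption rather than something copied from apo.py.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestSoFar:
    """Hypothetical stand-in for APO's _history_best_* fields."""

    prompt: Optional[str] = None
    score: float = -math.inf  # like float("-inf"): any real score beats it
    version: Optional[str] = None

    def update(self, prompt: str, score: float, version: str) -> bool:
        """Keep the candidate only if it strictly beats the best score so far."""
        if score > self.score:
            self.prompt, self.score, self.version = prompt, score, version
            return True
        return False


best = BestSoFar()
best.update("baseline prompt", 0.60, "v0")  # True: the first score always wins
best.update("edited prompt A", 0.55, "v1")  # False: worse than v0
best.update("edited prompt B", 0.72, "v2")  # True: new best
print(best.version, best.score)  # v2 0.72
```

Starting the score at negative infinity, as APO does with float("-inf"), guarantees that the first evaluated prompt is always recorded.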

The `get_best_prompt()` Method

APO exposes a method to retrieve the optimized prompt:

def get_best_prompt(self) -> PromptTemplate:
    """
    Retrieve the best prompt discovered during optimization.

    Returns:
        The prompt template with the highest validation score found so far.

    Raises:
        ValueError: If no best prompt has been found yet (run() not called).
    """
    if self._history_best_prompt is None:
        raise ValueError("No best prompt found")
    return self._history_best_prompt
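Because get_best_prompt() raises ValueError until a best prompt has been recorded, code that runs after a possibly failed fit() may prefer a non-raising wrapper. The helper below is hypothetical. It relies only on get_best_prompt() and the _history_best_score attribute shown above, and it reads the private attribute through getattr so that a rename degrades to None instead of raising AttributeError. It is exercised with a fake object, not a real APO instance.

```python
from typing import Any, Dict, Optional


# Hypothetical helper, not part of AgentLightning.
def best_prompt_summary(algo: Any) -> Dict[str, Optional[Any]]:
    """Return the best prompt and score, or None values if nothing was recorded."""
    try:
        prompt = algo.get_best_prompt()
    except ValueError:  # raised before any best prompt exists
        return {"prompt": None, "score": None}
    # Private attribute: read defensively, it may be renamed in later releases.
    score = getattr(algo, "_history_best_score", None)
    return {"prompt": prompt, "score": score}


class FakeAPO:
    """Hypothetical test double mimicking only the get_best_prompt() contract."""

    def __init__(self) -> None:
        self._history_best_prompt: Optional[str] = None
        self._history_best_score: float = float("-inf")

    def get_best_prompt(self) -> str:
        if self._history_best_prompt is None:
            raise ValueError("No best prompt found")
        return self._history_best_prompt


algo = FakeAPO()
print(best_prompt_summary(algo))  # {'prompt': None, 'score': None}
algo._history_best_prompt, algo._history_best_score = "optimized prompt", 0.9
print(best_prompt_summary(algo))  # {'prompt': 'optimized prompt', 'score': 0.9}
```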

Understanding the Trainer Architecture

The Trainer Class

The Trainer class (agentlightning/trainer/trainer.py) is the high-level orchestration layer that wires:

Algorithm lifecycle: Instantiates algorithms, attaches stores and adapters
Runner fleet: Spawns runners that execute agents and collect traces
Execution strategy: Delegates process management
Telemetry plumbing: Ensures tracers and adapters flow data to the store

Key Trainer Attributes

class Trainer(TrainerLegacy):
    algorithm: Optional[Algorithm]
    """Algorithm instance for training."""

    store: LightningStore
    """Store for tasks and traces."""

    runner: Runner[Any]
    """Runner for executing the agent."""

    initial_resources: Optional[NamedResources]
    """Bootstrap resources handed to the algorithm."""

    n_runners: int
    """Number of parallel agent runners."""

    strategy: ExecutionStrategy
    """Execution strategy for spawning algorithm and runners."""

The `fit()` Method

def fit(
    self,
    agent: LitAgent[T_co],
    train_dataset: Optional[Dataset[T_co]] = None,
    *,
    val_dataset: Optional[Dataset[T_co]] = None,
) -> None:
    """Execute the full algorithm/runner training loop."""
    agent.set_trainer(self)

    algorithm_bundle = functools.partial(
        self._algorithm_bundle,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        algorithm=self.algorithm,
    )
    runner_bundle = functools.partial(self._runner_bundle, agent=agent)

    self.strategy.execute(algorithm_bundle, runner_bundle, self.store)
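Note that fit() does not run anything itself. It pre-binds the datasets, algorithm, and agent with functools.partial and hands the two bundles to the strategy, which decides where each one runs. Below is a toy, in-process version of that hand-off. SequentialStrategy and both bundle functions are hypothetical, and they skip all of the real concurrency and process management.

```python
import functools
from typing import Any, Callable, Dict, List

Store = Dict[str, Any]  # hypothetical stand-in for LightningStore


def algorithm_bundle(store: Store, *, train_dataset: List[str], algorithm: str) -> None:
    # Hypothetical: the algorithm publishes one task per training sample.
    store["tasks"] = [f"{algorithm}:{sample}" for sample in train_dataset]


def runner_bundle(store: Store, *, agent: Callable[[str], str]) -> None:
    # Hypothetical: a runner executes the agent on every published task.
    store["rollouts"] = [agent(task) for task in store.get("tasks", [])]


class SequentialStrategy:
    """Hypothetical strategy: runs both bundles in-process, one after the other."""

    def execute(
        self,
        algorithm: Callable[[Store], None],
        runner: Callable[[Store], None],
        store: Store,
    ) -> None:
        algorithm(store)  # the strategy only supplies the store;
        runner(store)     # everything else was bound by functools.partial


store: Store = {}
algo_b = functools.partial(algorithm_bundle, train_dataset=["a", "b"], algorithm="apo")
run_b = functools.partial(runner_bundle, agent=str.upper)
SequentialStrategy().execute(algo_b, run_b, store)
print(store["rollouts"])  # ['APO:A', 'APO:B']
```

Because the bundles arrive fully configured, replacing the strategy with one that spawns processes would not require changing the bundles or fit().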

Execution Strategy: How State is Managed

ClientServerExecutionStrategy

The default execution strategy is ClientServerExecutionStrategy (agentlightning/execution/client_server.py).

It supports three roles:

"algorithm": Run only the algorithm with an HTTP store server
"runner": Connect to an existing server and run agents
"both": Run both algorithm and runners (default)

Why the Algorithm Object Remains Stateful

Where the algorithm runs decides whether its state is still reachable after training. By default:

role = "both"
main_process = "algorithm"

With these defaults, the strategy takes this branch (excerpt):

if self.main_process == "algorithm":
    logger.info("Spawning runner processes...")
    processes = self._spawn_runners(runner, store, stop_evt, ctx=ctx)
    try:
        logger.info("Running algorithm...")
        asyncio.run(self._execute_algorithm(algorithm, store, stop_evt))  # Main process!
    finally:
        stop_evt.set()

The algorithm runs in the main process, the same process that called trainer.fit(), so its state (including _history_best_prompt) is still available on the algo object after fit() returns.
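Why does running in the main process matter? It decides which copy of the object gets mutated. The sketch below simulates the main-process and child-process cases without starting any processes. copy.deepcopy stands in for the separate copy a child process works on (inherited memory under fork, an unpickled object under spawn). ToyAlgorithm and both helper functions are hypothetical.

```python
import copy
from typing import Optional


class ToyAlgorithm:
    """Hypothetical algorithm that records its best prompt while running."""

    def __init__(self) -> None:
        self.best_prompt: Optional[str] = None

    def run(self) -> None:
        self.best_prompt = "optimized prompt"


def run_in_main_process(algo: ToyAlgorithm) -> None:
    # Like main_process="algorithm": the caller's object itself is mutated.
    algo.run()


def run_in_simulated_child(algo: ToyAlgorithm) -> None:
    # Like main_process="runner": the child works on its own copy of the object.
    # deepcopy simulates that copy; no real process is started here.
    child_copy = copy.deepcopy(algo)
    child_copy.run()  # this mutation never reaches the caller's object


parent_side = ToyAlgorithm()
run_in_main_process(parent_side)
print(parent_side.best_prompt)  # optimized prompt

isolated = ToyAlgorithm()
run_in_simulated_child(isolated)
print(isolated.best_prompt)  # None
```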

⚠️ Important: With main_process="runner", the algorithm runs in a child process, so its state (and the store's state) is isolated from your script. The source is explicit about this:

"When main_process == 'runner' the algorithm and HTTP server execute in a child process. Store mutations remain isolated inside that process, so the original store instance passed to execute() is not updated."

The PromptTemplate Resource

Definition

From agentlightning/types/resources.py:

class PromptTemplate(Resource):
    """Resource describing a reusable prompt template."""

    resource_type: Literal["prompt_template"] = "prompt_template"
    template: str
    """The template string. The format depends on the engine."""
    engine: Literal["jinja", "f-string", "poml"]
    """The templating engine to use for rendering the prompt."""

    def format(self, **kwargs: Any) -> str:
        """Format the prompt using keyword arguments."""
        if self.engine == "f-string":
            return self.template.format(**kwargs)
        else:
            raise NotImplementedError(
                "Formatting prompt templates for non-f-string engines is not supported yet."
            )
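Because the f-string engine renders with str.format, an optimized template that drops a placeholder, introduces a new one, or contains an unescaped brace (for example, a JSON snippet the optimizer wrote into the prompt) fails only when format() is finally called. One cheap guard before saving is to compare placeholder names against the baseline template using the standard library's string.Formatter().parse. The helper names are my own, not part of AgentLightning.

```python
import string
from typing import Set


# Hypothetical helpers, not part of AgentLightning.
def template_fields(template: str) -> Set[str]:
    """Return the replacement-field names in a str.format-style template.

    Escaped braces ({{ and }}) are literal text and are not reported.
    Raises ValueError for malformed templates, e.g. a lone '{' or '}'.
    """
    return {
        field_name
        for _, field_name, _, _ in string.Formatter().parse(template)
        if field_name is not None
    }


def same_placeholders(baseline: str, optimized: str) -> bool:
    """True if the optimized template uses exactly the baseline's placeholders."""
    try:
        return template_fields(baseline) == template_fields(optimized)
    except ValueError:
        return False  # malformed template: format() would fail too


baseline = "Find a room on {date} at {time} for {duration_min} minutes."
print(sorted(template_fields(baseline)))  # ['date', 'duration_min', 'time']
print(same_placeholders(baseline, "Book {duration_min} min on {date} at {time}."))  # True
print(same_placeholders(baseline, 'Answer as {"room": "A1"} for {date}.'))  # False
```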

PromptTemplate is a Pydantic Model

PromptTemplate inherits from Resource, which inherits from pydantic.BaseModel. This means:

It has built-in serialization (model_dump(), model_dump_json())
It validates data on construction
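It is not frozen by default (Pydantic only blocks field assignment when the model config sets frozen=True), so copy the object returned by get_best_prompt() with model_copy() before editing it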

Saving the Best Prompt After Training

The Solution

Because the APO object keeps its state in the main process, we can read it directly once trainer.fit() returns:

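import json
import logging
from datetime import datetime

from agentlightning.algorithm.apo.apo import APO  # module path from the References table


def save_best_prompt(algo: APO, filename_prefix: str = "best_prompt") -> str: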
    """Save the best prompt template from the APO algorithm to a file with timestamp.
    
    Args:
        algo: The APO algorithm instance after training has completed.
        filename_prefix: Prefix for the output filename.
        
    Returns:
        The filename where the best prompt was saved.
    """
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{filename_prefix}_{timestamp}.json"
    
    try:
        best_prompt = algo.get_best_prompt()
        prompt_data = {
            "template": best_prompt.template,
            "engine": best_prompt.engine,
        }
        best_score = algo._history_best_score
        best_version = algo._history_best_version
    except ValueError as e:
        logging.warning(f"Could not retrieve best prompt: {e}")
        prompt_data = {}
        best_score = None
        best_version = None
    
    output = {
        "timestamp": timestamp,
        "best_prompt_template": prompt_data,
        "best_score": best_score,
        "best_version": best_version,
    }
    
    with open(filename, "w") as f:
        json.dump(output, f, indent=2)
    
    logging.info(f"Best prompt template saved to: {filename}")
    return filename

Output Format

The saved JSON file looks like this:

{
  "timestamp": "20260112_143052",
  "best_prompt_template": {
    "template": "Find a room on {date} at {time} for {duration_min} minutes...",
    "engine": "f-string"
  },
  "best_score": 0.85,
  "best_version": "v7"
}
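To reuse the prompt in a later session, we only need to read the file back and fill in the template. Below is a minimal sketch that uses only the standard library and assumes the file has the shape shown above. The function names and the example template are made up for illustration.

```python
import json
import os
import tempfile
from typing import Any, Dict


# Hypothetical helpers, not part of AgentLightning.
def load_saved_prompt(path: str) -> Dict[str, Any]:
    """Return the best_prompt_template dict from a file written by save_best_prompt()."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    prompt = data.get("best_prompt_template") or {}
    if "template" not in prompt:
        # save_best_prompt() stores {} when get_best_prompt() raised ValueError.
        raise ValueError(f"{path} does not contain a best prompt")
    return prompt


def render_saved_prompt(path: str, **kwargs: Any) -> str:
    """Fill in the saved template; this sketch only handles the f-string engine."""
    prompt = load_saved_prompt(path)
    if prompt.get("engine") != "f-string":
        raise NotImplementedError(f"engine {prompt.get('engine')!r} is not handled here")
    return prompt["template"].format(**kwargs)


# Usage: write a file with the same shape as the example, then render it.
with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "best_prompt_example.json")
    with open(example_path, "w", encoding="utf-8") as f:
        json.dump(
            {
                "best_prompt_template": {"template": "Find a room on {date} at {time}.", "engine": "f-string"},
                "best_score": 0.85,
            },
            f,
        )
    print(render_saved_prompt(example_path, date="2026-01-12", time="14:00"))
    # Find a room on 2026-01-12 at 14:00.
```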

Key Takeaways

1. APO Algorithm State is Accessible Post-Training

Because the default execution strategy runs the algorithm in the main process, the algo object retains its state after trainer.fit() returns.

2. Use `algo.get_best_prompt()` to Retrieve Optimized Prompts

The APO class provides a clean API method:

best_prompt = algo.get_best_prompt()  # Returns PromptTemplate

3. Access Internal Metrics for Logging

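For more detailed output, you can also read two private attributes (the leading underscore marks them as internal, so they may change between releases):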

algo._history_best_score - The highest validation score achieved
algo._history_best_version - Version identifier of the best prompt

4. PromptTemplate is a Pydantic Model

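Serialization is straightforward. Picking the two fields by hand keeps the saved file minimal; best_prompt.model_dump() would also work, but it includes every field, such as resource_type: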

prompt_data = {
    "template": best_prompt.template,
    "engine": best_prompt.engine,
}

5. Execution Strategy Matters for State Management

If the algorithm runs in a child process, for example with main_process="runner", its state will not be accessible from your script after fit() returns.

6. Always Add Timestamps to Saved Files

For experiment tracking, include timestamps in your output filenames to avoid overwriting previous results.
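One caveat with save_best_prompt() as written: %Y%m%d_%H%M%S has one-second resolution, so two saves in the same second get the same filename, and open(filename, "w") silently overwrites the first file. Below is a sketch with hypothetical helper names. It adds microseconds (%f) to the name and opens the file in exclusive-creation mode ("x"), which raises FileExistsError instead of overwriting.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional


# Hypothetical helpers, not part of AgentLightning.
def timestamped_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Build '<prefix>_YYYYmmdd_HHMMSS_ffffff.json' (microsecond resolution)."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S_%f}.json"


def write_new_json(path: Path, payload: Dict[str, Any]) -> None:
    """Write JSON, refusing to overwrite: mode 'x' raises FileExistsError."""
    with path.open("x", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)


print(timestamped_name("best_prompt", datetime(2026, 1, 12, 14, 30, 52, 123)))
# best_prompt_20260112_143052_000123.json

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / timestamped_name("best_prompt")
    write_new_json(target, {"best_score": 0.85})
    try:
        write_new_json(target, {"best_score": 0.90})
    except FileExistsError:
        print("refused to overwrite", target.name)
```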

References

Component	Source File
APO Algorithm	`agentlightning/algorithm/apo/apo.py`
Trainer	`agentlightning/trainer/trainer.py`
ClientServerExecutionStrategy	`agentlightning/execution/client_server.py`
PromptTemplate	`agentlightning/types/resources.py`
Resource Types	`agentlightning/types/__init__.py`
