Deep Dive: Automatic Prompt Optimization with AgentLightning
An exploration of Microsoft's AgentLightning library and how to save optimized prompts after training
Introduction
In this blog post, we'll explore AgentLightning, a framework by Microsoft for training and optimizing AI agents. Specifically, we'll deep-dive into the APO (Automatic Prompt Optimization) algorithm, understand its architecture, and learn how to persist the best-optimized prompt template after training completes.
This exploration came from a practical need: after running an expensive prompt optimization training loop, how do we save the best prompt for future use?
Table of Contents
- What is AgentLightning?
- The APO Algorithm
- Understanding the Trainer Architecture
- Execution Strategy: How State is Managed
- The PromptTemplate Resource
- Saving the Best Prompt After Training
- Complete Working Example
- Key Takeaways
What is AgentLightning?
AgentLightning is Microsoft's framework for training AI agents through various optimization algorithms. It provides:
- Algorithms: Different optimization strategies (APO, Baseline, etc.)
- Trainer: High-level orchestration for wiring algorithms, runners, and stores
- Runners: Execute agents and collect rollout data
- Store: Manages tasks, traces, and resources
- Execution Strategies: Handle process management (shared memory, client-server)
The library follows a modular design where components can be swapped without changing the core training logic.
The APO Algorithm
Overview
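APO (Automatic Prompt Optimization) uses textual gradients and beam search to iteratively improve prompts. It builds on ideas from ProTeGi (gradient-based prompt optimization) and TextGrad (a textual gradient framework), both listed under Further Reading.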
How APO Works
The algorithm operates in rounds, where each round:
- Samples parent prompts from the current beam
- Generates new prompts by computing textual gradients and applying edits
- Evaluates all candidates on a validation set
- Selects top-k prompts for the next round
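The four steps above can be sketched as a toy beam search. This is a minimal illustration, not AgentLightning's implementation: `propose_edits` and `score` are hypothetical stand-ins for the LLM-driven gradient/edit step and the validation rollouts, implemented here as simple deterministic functions so the loop runs offline. For simplicity the sketch expands every parent in the beam instead of sampling from it.

```python
from typing import Callable, List, Tuple


def beam_search_prompts(
    seed: str,
    propose_edits: Callable[[str, int], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    branch_factor: int = 2,
    beam_rounds: int = 3,
) -> Tuple[str, float]:
    """Toy beam search over prompt strings (illustrative only)."""
    beam = [seed]
    best_prompt, best_score = seed, score(seed)
    for _ in range(beam_rounds):
        # Steps 1-2: expand each parent in the beam into edited children.
        candidates = list(beam)
        for parent in beam:
            candidates.extend(propose_edits(parent, branch_factor))
        # Step 3: score every unique candidate (stand-in for validation rollouts).
        scored = sorted(((score(c), c) for c in set(candidates)), reverse=True)
        # Step 4: keep the top-k candidates as the next beam.
        beam = [c for _, c in scored[:beam_width]]
        if scored[0][0] > best_score:
            best_score, best_prompt = scored[0]
    return best_prompt, best_score


# Hypothetical stand-ins: an "edit" appends a hint, the score counts hints present.
HINTS = ["Be concise.", "Cite the date.", "Check capacity."]


def toy_edits(prompt: str, n: int) -> List[str]:
    missing = [h for h in HINTS if h not in prompt]
    return [f"{prompt} {h}" for h in missing[:n]]


def toy_score(prompt: str) -> float:
    return float(sum(h in prompt for h in HINTS))


best, best_score = beam_search_prompts("Find a room.", toy_edits, toy_score)
print(best_score, best)
```

In a real run the scoring step dominates cost, because each candidate has to be executed against validation tasks.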
Key Configuration Parameters
From the source code at agentlightning/algorithm/apo/apo.py:
    class APO(Algorithm, Generic[T_task]):
        def __init__(
            self,
            async_openai_client: AsyncOpenAI,
            *,
            gradient_model: str = "gpt-5-mini",       # Model for computing critiques
            apply_edit_model: str = "gpt-4.1-mini",   # Model for applying edits
            diversity_temperature: float = 1.0,       # Temperature for diversity
            gradient_batch_size: int = 4,             # Samples for gradient computation
            val_batch_size: int = 16,                 # Validation batch size
            beam_width: int = 4,                      # Top-k prompts to keep
            branch_factor: int = 4,                   # New candidates per parent
            beam_rounds: int = 3,                     # Number of optimization rounds
            rollout_batch_timeout: float = 3600.0,    # Timeout for rollouts
            run_initial_validation: bool = True,      # Establish baseline score
        ):
            ...  # body omitted in this excerpt
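These parameters multiply, so it can help to estimate the evaluation budget before launching a run. The helper below is a back-of-the-envelope sketch, not a formula taken from the library. It assumes a full beam in every round, exactly `branch_factor` children per parent, and one validation batch per new candidate. The function name is hypothetical.

```python
def estimate_validation_rollouts(
    beam_width: int = 4,
    branch_factor: int = 4,
    beam_rounds: int = 3,
    val_batch_size: int = 16,
) -> int:
    """Rough upper bound on validation rollouts for one optimization run."""
    # Assumption: every round expands a full beam of parents.
    new_candidates_per_round = beam_width * branch_factor
    # Assumption: each new candidate is scored on one validation batch.
    rollouts_per_round = new_candidates_per_round * val_batch_size
    return rollouts_per_round * beam_rounds


print(estimate_validation_rollouts())
```

With the defaults listed above the estimate is 768 rollouts. The actual count depends on how the library schedules evaluations, so treat the number as an order-of-magnitude guide.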
Internal State Tracking
APO maintains its optimization history through instance variables:
    self._history_best_prompt: Optional[PromptTemplate] = None
    self._history_best_score: float = float("-inf")
    self._history_best_version: Optional[str] = None
The get_best_prompt() Method
APO exposes a method to retrieve the optimized prompt:
    def get_best_prompt(self) -> PromptTemplate:
        """
        Retrieve the best prompt discovered during optimization.

        Returns:
            The prompt template with the highest validation score found so far.

        Raises:
            ValueError: If no best prompt has been found yet (run() not called).
        """
        if self._history_best_prompt is None:
            raise ValueError("No best prompt found")
        return self._history_best_prompt
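A stand-in class makes this bookkeeping pattern concrete. The sketch below is not the APO source: `BestPromptTracker` and its `record` method are hypothetical names, and the prompt is a plain string rather than a PromptTemplate. It only shows how best-so-far state that starts at None and -inf behaves, including the ValueError raised before anything has been recorded.

```python
from typing import Optional


class BestPromptTracker:
    """Hypothetical stand-in mirroring the best-so-far fields shown above."""

    def __init__(self) -> None:
        self._history_best_prompt: Optional[str] = None
        self._history_best_score: float = float("-inf")
        self._history_best_version: Optional[str] = None

    def record(self, prompt: str, score: float, version: str) -> bool:
        """Store the candidate if it beats the best score seen so far."""
        if score > self._history_best_score:
            self._history_best_prompt = prompt
            self._history_best_score = score
            self._history_best_version = version
            return True
        return False

    def get_best_prompt(self) -> str:
        if self._history_best_prompt is None:
            raise ValueError("No best prompt found")
        return self._history_best_prompt


tracker = BestPromptTracker()

# Before any candidate is recorded, retrieval fails loudly.
try:
    tracker.get_best_prompt()
    raised_before_training = False
except ValueError:
    raised_before_training = True

# Made-up scores: only strictly better candidates replace the current best.
tracker.record("prompt A", 0.60, "v0")
tracker.record("prompt B", 0.85, "v1")
tracker.record("prompt C", 0.70, "v2")
print(tracker.get_best_prompt(), tracker._history_best_version)
```

Starting the score at -inf means the first recorded candidate always becomes the best, whatever its score.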
Understanding the Trainer Architecture
The Trainer Class
The Trainer class (agentlightning/trainer/trainer.py) is the high-level orchestration layer that wires:
- Algorithm lifecycle: Instantiates algorithms, attaches stores and adapters
- Runner fleet: Spawns runners that execute agents and collect traces
- Execution strategy: Delegates process management
- Telemetry plumbing: Ensures tracers and adapters flow data to the store
Key Trainer Attributes
    class Trainer(TrainerLegacy):
        algorithm: Optional[Algorithm]
        """Algorithm instance for training."""

        store: LightningStore
        """Store for tasks and traces."""

        runner: Runner[Any]
        """Runner for executing the agent."""

        initial_resources: Optional[NamedResources]
        """Bootstrap resources handed to the algorithm."""

        n_runners: int
        """Number of parallel agent runners."""

        strategy: ExecutionStrategy
        """Execution strategy for spawning algorithm and runners."""
The fit() Method
    def fit(
        self,
        agent: LitAgent[T_co],
        train_dataset: Optional[Dataset[T_co]] = None,
        *,
        val_dataset: Optional[Dataset[T_co]] = None,
    ) -> None:
        """Execute the full algorithm/runner training loop."""
        agent.set_trainer(self)

        algorithm_bundle = functools.partial(
            self._algorithm_bundle,
            train_dataset=train_dataset,
            val_dataset=val_dataset,
            algorithm=self.algorithm,
        )
        runner_bundle = functools.partial(self._runner_bundle, agent=agent)

        self.strategy.execute(algorithm_bundle, runner_bundle, self.store)
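The use of functools.partial in fit() is worth a closer look: it pre-binds the arguments known at call time and leaves the strategy to supply the rest. The sketch below shows that pattern with hypothetical stand-in functions. The `store` and `stop_event` parameters, the string datasets, and the agent name are assumptions for illustration and do not reproduce the library's actual bundle signatures.

```python
import functools
from typing import Any, Callable, Dict, List, Optional


def algorithm_bundle(
    store: Dict[str, Any],
    stop_event: Any,
    *,
    train_dataset: Optional[List[str]],
    val_dataset: Optional[List[str]],
    algorithm: str,
) -> str:
    """Hypothetical algorithm entry point: datasets plus runtime objects."""
    store["algorithm_ran"] = True
    n_train = len(train_dataset or [])
    n_val = len(val_dataset or [])
    return f"{algorithm} saw {n_train} train / {n_val} val tasks"


def runner_bundle(store: Dict[str, Any], stop_event: Any, *, agent: str) -> str:
    """Hypothetical runner entry point: the agent plus runtime objects."""
    store["runner_ran"] = True
    return f"runner executed {agent}"


def toy_execute(
    algorithm_fn: Callable[[Dict[str, Any], Any], str],
    runner_fn: Callable[[Dict[str, Any], Any], str],
    store: Dict[str, Any],
) -> List[str]:
    """Stand-in for strategy.execute(): supplies only the runtime arguments."""
    stop_event = object()  # placeholder for a real stop signal
    return [algorithm_fn(store, stop_event), runner_fn(store, stop_event)]


# Pre-bind what is known up front; the strategy fills in the rest later.
bound_algorithm = functools.partial(
    algorithm_bundle, train_dataset=["t1", "t2"], val_dataset=["v1"], algorithm="APO"
)
bound_runner = functools.partial(runner_bundle, agent="demo_agent")

store: Dict[str, Any] = {}
results = toy_execute(bound_algorithm, bound_runner, store)
print(results)
```

The benefit of this design is that the execution strategy never needs to know about datasets or agents. It only decides where and when each callable runs.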
Execution Strategy: How State is Managed
ClientServerExecutionStrategy
The default execution strategy is ClientServerExecutionStrategy (agentlightning/execution/client_server.py).
It supports three roles:
- "algorithm": Run only the algorithm with an HTTP store server
- "runner": Connect to an existing server and run agents
- "both": Run both algorithm and runners (default)
Why the Algorithm Object Remains Stateful
This is critical for saving the best prompt. By default:
    role = "both"
    main_process = "algorithm"
This means:
    if self.main_process == "algorithm":
        logger.info("Spawning runner processes...")
        processes = self._spawn_runners(runner, store, stop_evt, ctx=ctx)
        try:
            logger.info("Running algorithm...")
            asyncio.run(self._execute_algorithm(algorithm, store, stop_evt))  # Main process!
        finally:
            stop_evt.set()
The algorithm runs on the main process, so its state (including _history_best_prompt) is preserved after trainer.fit() completes.
⚠️ Important: If main_process="runner" were used, the algorithm would run in a child process, and state would be isolated:
"When main_process == "runner" the algorithm and HTTP server execute in a child process. Store mutations remain isolated inside that process, so the original store instance passed to execute() is not updated."
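The isolation described here is a general property of operating-system processes, and it can be reproduced without AgentLightning. The sketch below uses only the standard library and assumes a POSIX system where the "fork" start method is available. `Tracker` and `run_in_child` are hypothetical names. The child process updates its own copy of the object, and the parent's copy stays untouched.

```python
import multiprocessing as mp
from typing import Optional, Tuple


class Tracker:
    """Hypothetical object holding state, like an algorithm's best prompt."""

    def __init__(self) -> None:
        self.best_prompt: Optional[str] = None


def _child(tracker: Tracker, queue) -> None:
    # This mutation happens on the child's copy of the object only.
    tracker.best_prompt = "optimized prompt"
    queue.put(tracker.best_prompt)


def run_in_child() -> Tuple[Optional[str], str]:
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    tracker = Tracker()
    queue = ctx.Queue()
    proc = ctx.Process(target=_child, args=(tracker, queue))
    proc.start()
    seen_by_child = queue.get(timeout=10)  # value as the child saw it
    proc.join(timeout=10)
    # The parent's tracker was never modified.
    return tracker.best_prompt, seen_by_child


if __name__ == "__main__":
    parent_value, child_value = run_in_child()
    print("parent sees:", parent_value)
    print("child saw:", child_value)
```

Anything that must survive a child process has to be sent back explicitly, for example through a queue, a file, or a store server.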
The PromptTemplate Resource
Definition
From agentlightning/types/resources.py:
    class PromptTemplate(Resource):
        """Resource describing a reusable prompt template."""

        resource_type: Literal["prompt_template"] = "prompt_template"
        template: str
        """The template string. The format depends on the engine."""
        engine: Literal["jinja", "f-string", "poml"]
        """The templating engine to use for rendering the prompt."""

        def format(self, **kwargs: Any) -> str:
            """Format the prompt using keyword arguments."""
            if self.engine == "f-string":
                return self.template.format(**kwargs)
            else:
                raise NotImplementedError(
                    "Formatting prompt templates for non-f-string engines is not supported yet."
                )
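For the f-string engine, rendering is plain str.format, so placeholders are filled by keyword and a missing keyword raises KeyError. The sketch below uses a standalone function rather than the real PromptTemplate class. `render_f_string` is a hypothetical name, the template text is adapted from the saved-output example later in this post, and the argument values are made up.

```python
def render_f_string(template: str, **kwargs: object) -> str:
    """Stand-in for the f-string branch of format(): delegates to str.format."""
    return template.format(**kwargs)


template = "Find a room on {date} at {time} for {duration_min} minutes."

rendered = render_f_string(template, date="2026-01-12", time="14:30", duration_min=45)
print(rendered)

# Omitting a placeholder value raises KeyError naming the first missing field.
try:
    render_f_string(template, date="2026-01-12")
    missing_field = None
except KeyError as exc:
    missing_field = exc.args[0]
print("missing:", missing_field)
```

Because str.format treats braces as placeholders, a template that needs literal braces (for example, embedded JSON) has to double them as `{{` and `}}`.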
PromptTemplate is a Pydantic Model
PromptTemplate inherits from Resource, which inherits from pydantic.BaseModel. This means:
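- It has built-in serialization (model_dump(), model_dump_json())
- It validates data on construction
- It is mutable unless the model is explicitly configured as frozen (Pydantic models are not immutable by default), so check the Resource definition before relying on immutability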
Saving the Best Prompt After Training
The Solution
Since the APO algorithm object maintains state on the main process, we can access it directly after training:
    import json
    import logging
    from datetime import datetime

    # Module path taken from the source file location cited earlier in this post.
    from agentlightning.algorithm.apo.apo import APO


    def save_best_prompt(algo: APO, filename_prefix: str = "best_prompt") -> str:
        """Save the best prompt template from the APO algorithm to a file with timestamp.

        Args:
            algo: The APO algorithm instance after training has completed.
            filename_prefix: Prefix for the output filename.

        Returns:
            The filename where the best prompt was saved.
        """
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"{filename_prefix}_{timestamp}.json"

        try:
            best_prompt = algo.get_best_prompt()
            prompt_data = {
                "template": best_prompt.template,
                "engine": best_prompt.engine,
            }
            # Private attributes: convenient for logging, but not a stable API.
            best_score = algo._history_best_score
            best_version = algo._history_best_version
        except ValueError as e:
            # No prompt was recorded; still write a file so the run leaves a trace.
            logging.warning(f"Could not retrieve best prompt: {e}")
            prompt_data = {}
            best_score = None
            best_version = None

        output = {
            "timestamp": timestamp,
            "best_prompt_template": prompt_data,
            "best_score": best_score,
            "best_version": best_version,
        }

        with open(filename, "w") as f:
            json.dump(output, f, indent=2)

        logging.info(f"Best prompt template saved to: {filename}")
        return filename
Output Format
The saved JSON file looks like:
    {
      "timestamp": "20260112_143052",
      "best_prompt_template": {
        "template": "Find a room on {date} at {time} for {duration_min} minutes...",
        "engine": "f-string"
      },
      "best_score": 0.85,
      "best_version": "v7"
    }
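Saving is only half of the workflow, since the file has to be read back later. The loader below is a minimal sketch that assumes the JSON layout shown above. `load_best_prompt` is a hypothetical helper, not part of AgentLightning, and the demo writes a sample file to a temporary directory so it runs without any training.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_best_prompt(path: str) -> Dict[str, Any]:
    """Read a saved prompt file and return its template and engine fields."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # An empty dict here means the save step found no best prompt.
    prompt = data.get("best_prompt_template") or {}
    if "template" not in prompt or "engine" not in prompt:
        raise ValueError(f"No usable prompt template in {path}")
    return prompt


# Demo: write a file in the format shown above, then load it again.
sample = {
    "timestamp": "20260112_143052",
    "best_prompt_template": {
        "template": "Find a room on {date} at {time} for {duration_min} minutes...",
        "engine": "f-string",
    },
    "best_score": 0.85,
    "best_version": "v7",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "best_prompt_20260112_143052.json"
    sample_path.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    loaded = load_best_prompt(str(sample_path))

print(loaded["engine"], "|", loaded["template"])
```

Since the two keys match the fields of PromptTemplate, the returned dictionary could plausibly be passed back as `PromptTemplate(**loaded)` to seed a later run, though that step is not shown here.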
Key Takeaways
1. APO Algorithm State is Accessible Post-Training
Because the default execution strategy runs the algorithm on the main process, the algo object retains its state after trainer.fit() completes.
2. Use algo.get_best_prompt() to Retrieve Optimized Prompts
The APO class provides a clean API method:
    best_prompt = algo.get_best_prompt()  # Returns PromptTemplate
3. Access Internal Metrics for Logging
For detailed output, you can access:
- algo._history_best_score: The highest validation score achieved
- algo._history_best_version: Version identifier of the best prompt
4. PromptTemplate is a Pydantic Model
Serialization is straightforward:
    prompt_data = {
        "template": best_prompt.template,
        "engine": best_prompt.engine,
    }
5. Execution Strategy Matters for State Management
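If you set main_process="runner", the algorithm runs in a child process and its state will not be accessible from the main process after training. If you switch to a different execution strategy, check where the algorithm runs before relying on algo.get_best_prompt().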
6. Always Add Timestamps to Saved Files
For experiment tracking, include timestamps in your output filenames to avoid overwriting previous results.
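A side benefit of the `%Y%m%d_%H%M%S` timestamp format is that filenames sort chronologically when sorted as strings, which makes the most recent result easy to find. The helper below is a small sketch of that idea. `latest_prompt_file` is a hypothetical name, and the demo creates empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_prompt_file(directory: str, prefix: str = "best_prompt") -> Optional[Path]:
    """Return the newest saved prompt file, judged by its timestamped name."""
    matches = sorted(Path(directory).glob(f"{prefix}_*.json"), key=lambda p: p.name)
    return matches[-1] if matches else None


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder files with made-up timestamps, created out of order.
    for stamp in ("20260112_143052", "20260111_090000", "20260112_150000"):
        (Path(tmp_dir) / f"best_prompt_{stamp}.json").write_text("{}", encoding="utf-8")

    newest = latest_prompt_file(tmp_dir)
    newest_name = newest.name if newest else None
    nothing_found = latest_prompt_file(tmp_dir, prefix="other_run")

print(newest_name)
```

This only works while every file uses the same zero-padded format. Mixing formats would break the string ordering.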
References
| Component | Source File |
|---|---|
| APO Algorithm | agentlightning/algorithm/apo/apo.py |
| Trainer | agentlightning/trainer/trainer.py |
| ClientServerExecutionStrategy | agentlightning/execution/client_server.py |
| PromptTemplate | agentlightning/types/resources.py |
| Resource Types | agentlightning/types/__init__.py |
Further Reading
- AgentLightning Documentation
- ProTeGi Paper - Gradient-based prompt optimization
- TextGrad - Textual gradient framework
Written during an exploration session on January 12, 2026