ProTeGi (Prompt optimization with Textual Gradients) systematically improves prompts by identifying failure patterns, generating targeted critiques, and applying specific fixes. It uses beam search to maintain multiple candidate prompts and progressively refines them.
When to Use ProTeGi
✅ Best For
- Debugging specific failure modes
- Systematic error correction
- Tasks with clear failure patterns
- Iterative refinement workflows
❌ Not Ideal For
- Quick experiments (multi-stage process)
- Tasks where failures are random
- Very small datasets
- Budget-constrained projects
How It Works
ProTeGi follows a structured expansion and selection process:
Generate Critiques
Teacher model analyzes failures and generates multiple specific critiques (“gradients”).
ProTeGi maintains a “beam” of candidate prompts throughout optimization, preventing premature convergence to local optima.
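The expansion and selection loop described above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: `protegi_round`, `run_toy`, and the `toy_*` helpers are hypothetical names, and the teacher model and evaluator are replaced by deterministic stand-ins so the sketch runs without any API calls.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, expected output)


def protegi_round(
    beam: List[str],
    failures: Callable[[str], List[Example]],
    critique: Callable[[str, List[Example]], List[str]],
    rewrite: Callable[[str, str], List[str]],
    score: Callable[[str], float],
    beam_size: int,
) -> List[str]:
    """One expansion-and-selection round over the current beam."""
    candidates = list(beam)  # keep parents so a round never discards the best prompt
    for prompt in beam:
        failed = failures(prompt)
        if not failed:
            continue  # nothing to critique for this prompt
        for gradient in critique(prompt, failed):  # textual "gradients"
            candidates.extend(rewrite(prompt, gradient))
    unique = list(dict.fromkeys(candidates))  # de-duplicate, preserving order
    return sorted(unique, key=score, reverse=True)[:beam_size]


# Toy stand-ins (assumptions): a real setup would call a teacher LLM and an
# evaluator over a dataset instead of these string checks.
REQUIRED = ["be concise", "cite sources", "answer in English"]


def toy_score(prompt: str) -> float:
    return sum(rule in prompt for rule in REQUIRED) / len(REQUIRED)


def toy_failures(prompt: str) -> List[Example]:
    return [(rule, "missing") for rule in REQUIRED if rule not in prompt]


def toy_critique(prompt: str, failed: List[Example]) -> List[str]:
    return [f"Missing instruction: {rule}" for rule, _ in failed]


def toy_rewrite(prompt: str, gradient: str) -> List[str]:
    rule = gradient.split(": ", 1)[1]
    return [f"{prompt} Always {rule}."]


def run_toy(rounds: int, beam_size: int) -> List[str]:
    beam = ["Answer the question."]
    for _ in range(rounds):
        beam = protegi_round(
            beam, toy_failures, toy_critique, toy_rewrite, toy_score, beam_size
        )
    return beam
```

Calling `run_toy(rounds=3, beam_size=2)` keeps two candidates alive per round. Because several partially fixed prompts survive each selection step, the search is not forced to commit to a single rewrite early.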
Basic Usage
Underlying Research
ProTeGi introduces a novel, gradient-inspired approach to prompt optimization, adapting concepts from numerical optimization to natural language.
- Core Paper: The method originates from the paper “Automatic Prompt Optimization with ‘Gradient Descent’ and Beam Search”, which details how to create “textual gradients” (critiques) to guide prompt improvement.
- Extensions: The core idea has been extended in subsequent research, such as “Momentum-Aided Gradient Descent Prompt Optimization”, which incorporates momentum to accelerate convergence.
- Classification: In surveys on automatic prompt engineering, ProTeGi is categorized as a pioneering gradient-based method for its innovative approach to error-driven refinement.
Configuration Parameters
Core Parameters
Powerful model for generating critiques and improved prompts. Recommended: gpt-4o, claude-3-opus.
Number of distinct critiques to generate for each prompt. More gradients = more diverse improvement directions.
Number of failed examples shown to teacher when generating each critique. Higher = more context but more expensive.
Number of new prompts to generate from each critique. Set to 2-3 for more exploration.
Number of top-performing prompts to keep each round. Larger beam = more diversity but slower.
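To see how the four numeric settings above interact, the following sketch estimates the work done in one round. The class and field names are illustrative labels for the described parameters, not confirmed identifiers from the library, and the arithmetic assumes that every prompt in a full beam is expanded each round.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProTeGiBudget:
    # Hypothetical field names mirroring the parameter descriptions above.
    num_gradients: int         # critiques generated per prompt
    errors_per_gradient: int   # failed examples shown per critique
    prompts_per_gradient: int  # new prompts generated per critique
    beam_size: int             # prompts kept after each round

    def critiques_per_round(self) -> int:
        # Each prompt in the beam receives num_gradients critiques.
        return self.beam_size * self.num_gradients

    def new_prompts_per_round(self) -> int:
        # Each critique yields prompts_per_gradient rewritten prompts.
        return self.critiques_per_round() * self.prompts_per_gradient

    def failed_examples_shown_per_round(self) -> int:
        # Each critique is generated from errors_per_gradient failed examples.
        return self.critiques_per_round() * self.errors_per_gradient


budget = ProTeGiBudget(
    num_gradients=4, errors_per_gradient=4, prompts_per_gradient=2, beam_size=3
)
print(budget.critiques_per_round())              # 12
print(budget.new_prompts_per_round())            # 24
print(budget.failed_examples_shown_per_round())  # 48
```

Under these assumptions the number of new candidates grows with the product of beam size, critiques per prompt, and prompts per critique, so raising any one of them multiplies the number of prompts that must be scored each round.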