Meta-Prompt uses a powerful teacher LLM to analyze how your prompt performs, understand why it fails on specific examples, formulate hypotheses about improvements, and completely rewrite the prompt. This approach is inspired by the promptim library and excels at tasks requiring deep reasoning.
When to Use Meta-Prompt
✅ Best For
- Complex reasoning tasks
- Tasks where understanding failures helps
- Refining well-scoped prompts
- Deep iterative improvement
❌ Not Ideal For
- Quick experiments (slower)
- Simple classification tasks
- Very large datasets (costly)
- Tasks with unclear failure patterns
How It Works
Meta-Prompt follows a systematic analysis-and-rewrite cycle. Unlike optimizers that tweak parts of a prompt, Meta-Prompt rewrites the entire prompt each iteration based on deep analysis.
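The cycle can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: `optimize`, `toy_evaluate`, and `toy_teacher` are hypothetical names, the teacher is a stub rather than an LLM call, and starting each round from the best-scoring prompt so far is an assumption.

```python
from typing import Callable, List, Tuple


def optimize(
    initial_prompt: str,
    evaluate: Callable[[str], float],
    teacher_rewrite: Callable[[str, float, List[str]], Tuple[str, str]],
    num_rounds: int = 5,
) -> Tuple[str, float, List[str]]:
    """Toy analysis-and-rewrite loop (hypothetical, not the library's code)."""
    best_prompt, best_score = initial_prompt, evaluate(initial_prompt)
    failed_attempts: List[str] = []  # rewrites that did not beat the best prompt
    hypotheses: List[str] = []       # one hypothesis per round

    for _ in range(num_rounds):
        # The teacher returns a hypothesis plus a complete rewrite,
        # not a partial edit of the existing prompt.
        hypothesis, candidate = teacher_rewrite(best_prompt, best_score, failed_attempts)
        hypotheses.append(hypothesis)
        score = evaluate(candidate)
        if score > best_score:
            best_prompt, best_score = candidate, score
        else:
            failed_attempts.append(candidate)
    return best_prompt, best_score, hypotheses


# Stub evaluator and teacher so the sketch runs without any API calls.
def toy_evaluate(prompt: str) -> float:
    # Pretend that prompts asking for step-by-step reasoning score higher.
    return 0.9 if "step by step" in prompt else 0.5


def toy_teacher(prompt: str, score: float, failed: List[str]) -> Tuple[str, str]:
    return (
        "Failures lack intermediate reasoning.",
        "Solve the problem step by step, then state the final answer.",
    )


best_prompt, best_score, hypotheses = optimize(
    "Solve the problem.", toy_evaluate, toy_teacher, num_rounds=3
)
print(best_score, best_prompt)
```

In a real run, the evaluator would score the prompt against your dataset and the teacher would be an LLM call; only the control flow is the point here.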
Basic Usage
Configuration Parameters
Core Parameters
- Teacher model: A powerful language model used for analyzing failures and generating improved prompts. Recommended: gpt-4o, gpt-4-turbo, or claude-3-opus.
- Task description: A description of what you want the optimized prompt to achieve. More specific descriptions lead to better results.
- num_rounds: The number of analysis-and-rewrite iterations. More rounds can lead to better results but cost more.
- eval_subset_size: The number of examples to evaluate each round. Smaller subsets run faster but give a less reliable signal.
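To see how the number of rounds and the evaluation subset size drive cost, a rough call count can be computed up front. The sketch below assumes each round scores every example in the subset once and makes a single teacher call; the function name and that cost model are illustrative assumptions, not documented behavior.

```python
def estimate_calls(num_rounds: int, eval_subset_size: int) -> dict:
    """Rough call-count estimate under the assumptions stated above."""
    if num_rounds < 1 or eval_subset_size < 1:
        raise ValueError("num_rounds and eval_subset_size must be positive")
    return {
        "teacher_calls": num_rounds,                       # assumed: one per round
        "evaluation_calls": num_rounds * eval_subset_size,  # assumed: one per example per round
    }


small = estimate_calls(num_rounds=5, eval_subset_size=30)
large = estimate_calls(num_rounds=10, eval_subset_size=50)
print(small)
print(large)
```

Under this model, doubling either parameter doubles the evaluation calls, which is why the two are best tuned together.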
The Meta-Prompt Process
What the Teacher Model Sees
In each round, the teacher model receives:
- Current Prompt - The prompt being evaluated
- Previous Failed Attempts - Prompts that performed worse (to avoid repeating mistakes)
- Performance Data - Detailed results showing which examples failed and why
- Task Description - Your goal for the optimization
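As a rough illustration of how these four inputs might be combined into a single message for the teacher, consider the sketch below. The function name, the section layout, and the fields of each result record (`input`, `score`, `reason`) are assumptions made for the example; the actual format the optimizer sends is not shown in this page.

```python
from typing import Dict, List


def build_teacher_input(
    current_prompt: str,
    failed_prompts: List[str],
    results: List[Dict[str, object]],
    task_description: str,
) -> str:
    """Assemble the four inputs into one message (illustrative layout only)."""
    failed = "\n".join(f"- {p}" for p in failed_prompts) or "- (none yet)"
    performance = "\n".join(
        f"- input: {r['input']} | score: {r['score']} | reason: {r['reason']}"
        for r in results
    )
    return (
        f"## Task Description\n{task_description}\n\n"
        f"## Current Prompt\n{current_prompt}\n\n"
        f"## Previous Failed Attempts\n{failed}\n\n"
        f"## Performance Data\n{performance}"
    )


message = build_teacher_input(
    current_prompt="Summarize the article.",
    failed_prompts=["Write a summary."],
    results=[{"input": "Article A", "score": 0.2, "reason": "Missed the key finding"}],
    task_description="Produce a two-sentence summary that names the key finding.",
)
print(message)
```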
What the Teacher Model Returns
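The teacher provides two things: a hypothesis about why the current prompt fails and how to improve it, and a completely rewritten prompt.
Underlying Research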
The Meta-Prompt optimizer is inspired by meta-learning and reflective AI systems, where a model improves its own processes.
- Meta-Learning: The core idea is formalized in research like “System Prompt Optimization with Meta-Learning”, which uses bilevel optimization. Another related work is “metaTextGrad”, which optimizes both prompts and their surrounding structures.
- Industry Tools: This reflective approach is used in tools like Google’s Vertex AI Prompt Optimizer and is a key feature in advanced models for self-improvement.
- Frameworks: The concept is explored in libraries like promptim and is classified in surveys as a leading LLM-driven optimization method.
Advanced Examples
With Detailed Task Description
With More Rounds for Complex Tasks
Combining with Other Optimizers
Use Meta-Prompt for deep refinement after initial exploration.
Understanding the Results
Tracking Hypothesis Evolution
Meta-Prompt’s hypotheses show its reasoning process.
Analyzing Improvement Patterns
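Improvement patterns can be read off the per-round scores. The sketch below computes the score change between consecutive rounds and flags a plateau; the score history is made-up example data, and both helper names and the plateau thresholds are assumptions rather than part of the optimizer's API.

```python
from typing import List


def round_deltas(scores: List[float]) -> List[float]:
    """Score change from each round to the next."""
    return [round(b - a, 4) for a, b in zip(scores, scores[1:])]


def has_plateaued(scores: List[float], window: int = 2, min_gain: float = 0.01) -> bool:
    """True if none of the last `window` rounds gained at least `min_gain`."""
    deltas = round_deltas(scores)
    if len(deltas) < window:
        return False
    return all(d < min_gain for d in deltas[-window:])


# Hypothetical scores: the initial prompt followed by four rounds.
history = [0.52, 0.61, 0.70, 0.70, 0.70]
print(round_deltas(history))
print(has_plateaued(history))
```

A plateau like this one suggests that further rounds are unlikely to pay for themselves without a change to the task description or teacher model.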
Performance Tips
Use a powerful teacher model
Meta-Prompt’s quality depends heavily on the teacher model’s reasoning ability. Use gpt-4o, claude-3-opus, or similar high-end models.
Provide detailed task descriptions
Specific task descriptions help the teacher make targeted improvements. Include constraints, desired output format, and edge cases to handle.
Start with 5 rounds
5 rounds is usually enough for meaningful improvement. Increase to 7-10 only for very complex tasks where you see continued progress.
Balance eval subset size
- Too small (< 20): Unreliable signal, may optimize for noise
- Too large (> 50): Slow and expensive
- Sweet spot: 30-40 examples
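The trade-off above can be made concrete with a standard-error estimate. The sketch assumes each example is scored pass/fail and that examples are independent, so the mean score behaves like a binomial proportion; your metric may be continuous, in which case this is only a rough guide. The function name is illustrative.

```python
import math


def standard_error(pass_rate: float, n: int) -> float:
    """Standard error of a mean pass/fail score over n independent examples."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n)


# Compare how noisy a 70% pass rate looks at different subset sizes.
for n in (10, 20, 40, 100):
    print(n, round(standard_error(0.7, n), 3))
```

Because the error shrinks with the square root of the subset size, going from 20 to 40 examples helps noticeably, while each further doubling buys less.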
Analyze failed examples
Look at low-scoring examples in each round to understand what the optimizer is trying to fix:
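A minimal sketch of that inspection step, assuming you have per-example results available as a list of records with `input`, `output`, and `score` fields. That record shape, the helper name, and the sample data are hypothetical, not the library's result format.

```python
from typing import Dict, List


def lowest_scoring(
    results: List[Dict[str, object]], threshold: float = 0.5, limit: int = 5
) -> List[Dict[str, object]]:
    """Return up to `limit` results scoring below `threshold`, worst first."""
    failures = [r for r in results if r["score"] < threshold]
    return sorted(failures, key=lambda r: r["score"])[:limit]


# Made-up results for one round of an arithmetic task.
round_results = [
    {"input": "2 + 2 * 3", "output": "12", "score": 0.0},
    {"input": "10 / 4", "output": "2.5", "score": 1.0},
    {"input": "(1 + 2) * 3", "output": "9", "score": 1.0},
    {"input": "7 - 3 * 2", "output": "8", "score": 0.0},
]

for r in lowest_scoring(round_results):
    print(f"{r['score']:.1f}  {r['input']!r} -> {r['output']!r}")
```

Here both failures involve operator precedence, which is the kind of shared pattern the teacher's hypothesis is likely to target.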
Common Patterns
Complex Reasoning Tasks
Creative Writing with Constraints
Data Transformation Tasks
Troubleshooting
Scores plateau after few rounds
Problem: Improvement stops after 2-3 rounds.
Solution:
- Your initial prompt might already be good - check if score is already high
- Make task description more specific to guide further refinement
- Try a different teacher model for fresh perspective
- Increase eval_subset_size for a more reliable signal
Prompts become too verbose
Problem: Each iteration adds more instructions, making prompts unwieldy.
Solution:
- Add to task description: “Keep the prompt concise and under 200 words”
- Manually select a mid-optimization prompt that balances quality and length
- Use fewer rounds (3-4 instead of 7-8)
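Selecting a mid-optimization prompt can be done by hand, or with a small helper like the one sketched here. It assumes you have kept each round's prompt together with its score; that history, the helper name, and the 200-word default are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def pick_concise_prompt(
    candidates: List[Tuple[str, float]], max_words: int = 200
) -> Optional[Tuple[str, float]]:
    """Best-scoring (prompt, score) pair whose prompt fits the word limit."""
    within_limit = [(p, s) for p, s in candidates if len(p.split()) <= max_words]
    if not within_limit:
        return None
    return max(within_limit, key=lambda item: item[1])


# Made-up history: prompts of 20, 100, and 300 words with their scores.
candidates = [
    ("Answer briefly. " * 10, 0.71),
    ("Answer briefly and cite sources. " * 20, 0.80),
    ("Answer briefly and cite sources. " * 60, 0.82),
]

chosen = pick_concise_prompt(candidates, max_words=200)
print(chosen[1], len(chosen[0].split()))
```

In this example the 100-word prompt is chosen: it gives up a small amount of score compared with the 300-word version in exchange for a prompt one third the length.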
High API costs
Problem: Optimization is expensive with GPT-4.
Solution:
- Reduce num_rounds to 3-5
- Decrease eval_subset_size to 20-30
- Use gpt-4o-mini as the teacher for initial experiments
- Run on a smaller dataset subset first to validate the approach
Inconsistent improvements
Problem: Score goes up and down between rounds.
Solution:
- Increase eval_subset_size for more stable measurements
- Check if your evaluation metric is too noisy
- Ensure dataset examples are high-quality and representative
- Consider using a different evaluation metric
Comparison with Other Optimizers
| Aspect | Meta-Prompt | Bayesian Search | ProTeGi |
|---|---|---|---|
| Approach | Analysis & rewrite | Few-shot selection | Error-driven fixing |
| Best for | Complex reasoning | Structured tasks | Systematic debugging |
| Speed | Medium | Fast | Slow |
| Prompt changes | Complete rewrites | Example selection | Targeted edits |
| Teacher dependency | High | Medium | High |
Next Steps
Try ProTeGi
For more systematic error analysis
Compare All Optimizers
See which optimizer fits your needs