Meta-Prompt Optimizer

Meta-Prompt uses a powerful teacher LLM to analyze how your prompt performs, understand why it fails on specific examples, formulate hypotheses about improvements, and completely rewrite the prompt. This approach is inspired by the promptim library and excels at tasks requiring deep reasoning.

When to Use Meta-Prompt

✅ Best For

Complex reasoning tasks
Tasks where understanding failures helps
Refining well-scoped prompts
Deep iterative improvement

❌ Not Ideal For

Quick experiments (slower)
Simple classification tasks
Very large datasets (costly)
Tasks with unclear failure patterns

How It Works

Meta-Prompt follows a systematic analysis-and-rewrite cycle:

Evaluate Current Prompt

Run the current prompt on a subset of your dataset and collect scores

Identify Failures

Focus on examples with low scores to understand what went wrong

Formulate Hypothesis

Teacher model analyzes failures and proposes a specific improvement theory

Rewrite Prompt

Generate a complete new prompt implementing the hypothesis

Repeat

Continue for multiple rounds, building on previous insights

Unlike optimizers that tweak parts of a prompt, Meta-Prompt rewrites the entire prompt each iteration based on deep analysis.
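
The cycle above can be sketched in plain Python. This is an illustrative outline only, not the library's implementation: `meta_prompt_loop`, `score_fn`, `rewrite_fn`, and `failure_threshold` are hypothetical names, and the sketch assumes each rewrite builds on the best prompt found so far.

```python
from typing import Callable, List, Tuple

Scored = List[Tuple[str, float]]  # (example, score) pairs


def meta_prompt_loop(
    initial_prompt: str,
    examples: List[str],
    score_fn: Callable[[str, str], float],   # hypothetical: (prompt, example) -> score
    rewrite_fn: Callable[[str, Scored], str],  # hypothetical stand-in for the teacher
    num_rounds: int = 5,
    failure_threshold: float = 0.5,
) -> Tuple[str, float]:
    """Return the best prompt seen and its average score."""

    def evaluate(prompt: str) -> Tuple[Scored, float]:
        scored = [(ex, score_fn(prompt, ex)) for ex in examples]
        return scored, sum(s for _, s in scored) / len(scored)

    # Evaluate the starting prompt.
    best_prompt = initial_prompt
    best_results, best_score = evaluate(best_prompt)

    for _ in range(num_rounds):
        # Identify failures: the low-scoring examples.
        failures = [(ex, s) for ex, s in best_results if s < failure_threshold]
        if not failures:
            break
        # Hypothesis + rewrite: the teacher produces a complete new prompt.
        candidate = rewrite_fn(best_prompt, failures)
        results, score = evaluate(candidate)
        # Keep the candidate only if it beats the best so far.
        if score > best_score:
            best_prompt, best_results, best_score = candidate, results, score
    return best_prompt, best_score


# Toy stand-ins so the sketch runs without any API calls.
def toy_score(prompt: str, example: str) -> float:
    return 1.0 if "step by step" in prompt or len(example) < 10 else 0.0


def toy_rewrite(prompt: str, failures: Scored) -> str:
    return prompt + " Think step by step."


best, score = meta_prompt_loop(
    "Solve: {problem}",
    ["2+2", "a long multi-step word problem"],
    toy_score,
    toy_rewrite,
)
print(best, score)
```

In a real run, `rewrite_fn` would wrap a call to the teacher model and `score_fn` would wrap the evaluator.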

Basic Usage

from fi.opt.optimizers import MetaPromptOptimizer
from fi.opt.generators import LiteLLMGenerator
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# Setup teacher model (use a powerful model for analysis)
teacher = LiteLLMGenerator(
    model="gpt-4o",
    prompt_template="{prompt}"
)

# Setup evaluator
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# Setup data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "text", "output": "generated_output"}
)

# Create optimizer
optimizer = MetaPromptOptimizer(
    teacher_generator=teacher
)

# Run optimization (dataset: a list of dicts, each with a "text" key that fills {text})
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=["Summarize this text: {text}"],
    task_description="Create concise, informative summaries",
    num_rounds=5,
    eval_subset_size=40
)

print(f"Final score: {result.final_score:.2%}")
print(f"Best prompt:\n{result.best_generator.get_prompt_template()}")

Configuration Parameters

Core Parameters

teacher_generator (LiteLLMGenerator, required)

A powerful language model used for analyzing failures and generating improved prompts. Recommended: gpt-4o, gpt-4-turbo, or claude-3-opus.

teacher = LiteLLMGenerator("gpt-4o", "{prompt}")

task_description (str, default: "I want to improve my prompt.")

Description of what you want the optimized prompt to achieve. More specific descriptions lead to better results.

task_description="Generate summaries that capture key points while being under 50 words"

num_rounds (int, default: 5)

Number of analysis-and-rewrite iterations. More rounds can lead to better results but cost more.

eval_subset_size (int, default: 40)

Number of examples to evaluate each round. Smaller subsets run faster but give a noisier score.

The Meta-Prompt Process

What the Teacher Model Sees

In each round, the teacher model receives:

Current Prompt - The prompt being evaluated
Previous Failed Attempts - Prompts that performed worse (to avoid repeating mistakes)
Performance Data - Detailed results showing which examples failed and why
Task Description - Your goal for the optimization
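
As a rough illustration of how these four pieces might be combined into a single message, here is a minimal sketch. The function name `build_teacher_input` and the exact wording of the template are assumptions made for this example; the optimizer's real internal template is not shown in this guide.

```python
from typing import List, Tuple


def build_teacher_input(
    current_prompt: str,
    failed_attempts: List[str],
    failures: List[Tuple[str, float, str]],  # (input, score, reason)
    task_description: str,
) -> str:
    """Assemble the four pieces of context into one message (hypothetical layout)."""
    attempts = "\n".join(f"- {p}" for p in failed_attempts) or "- (none yet)"
    results = "\n".join(
        f"- score={score:.2f} input={text!r} reason={reason}"
        for text, score, reason in failures
    ) or "- (no failures)"
    return (
        f"Task description:\n{task_description}\n\n"
        f"Current prompt:\n{current_prompt}\n\n"
        f"Previous failed attempts:\n{attempts}\n\n"
        f"Low-scoring examples:\n{results}\n\n"
        'Reply with JSON containing "hypothesis" and "improved_prompt".'
    )


message = build_teacher_input(
    current_prompt="Summarize this text: {text}",
    failed_attempts=[],
    failures=[("long article", 0.2, "missed key point")],
    task_description="Create concise, informative summaries",
)
print(message)
```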

What the Teacher Model Returns

The teacher returns a JSON object with two fields, a hypothesis explaining the failures and a fully rewritten prompt:

{
  "hypothesis": "The prompt fails on complex multi-sentence texts because it doesn't specify a structure. Adding explicit instruction to identify main points first should improve clarity.",
  "improved_prompt": "First identify the 2-3 main points in the following text. Then write a single concise sentence that captures these points:\n\n{text}"
}
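
If you handle a response of this shape yourself, it helps to validate it before using the new prompt. The sketch below is one possible check, not part of the library: `parse_teacher_response` and `placeholders` are hypothetical helpers. It assumes the prompt uses simple `{name}` placeholders; prompts containing literal braces would need escaping first.

```python
import json
import re
from string import Formatter


def placeholders(template: str) -> set:
    """Names of the {variables} used in a prompt template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def parse_teacher_response(raw: str, current_prompt: str) -> dict:
    """Extract and validate the teacher's JSON reply (hypothetical helper)."""
    # Tolerate extra text around the JSON object.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in teacher response")
    data = json.loads(match.group(0))

    missing = {"hypothesis", "improved_prompt"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    # A rewrite that drops a template variable can no longer receive the input.
    lost = placeholders(current_prompt) - placeholders(data["improved_prompt"])
    if lost:
        raise ValueError(f"rewrite dropped placeholders: {sorted(lost)}")
    return data


raw = json.dumps({
    "hypothesis": "The prompt does not specify a structure.",
    "improved_prompt": "Identify the main points, then summarize:\n\n{text}",
})
parsed = parse_teacher_response(raw, "Summarize this text: {text}")
print(parsed["hypothesis"])
```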

Underlying Research

The Meta-Prompt optimizer is inspired by meta-learning and reflective AI systems, where a model improves its own processes.

Meta-Learning: The core idea is formalized in research like “System Prompt Optimization with Meta-Learning”, which uses bilevel optimization. Another related work is “metaTextGrad”, which optimizes both prompts and their surrounding structures.
Industry Tools: Tools such as Google’s Vertex AI Prompt Optimizer use a similar reflective approach.
Frameworks: The concept is explored in libraries like promptim and is classified in surveys as a leading LLM-driven optimization method.

Advanced Examples

With Detailed Task Description

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[initial_prompt],
    
    # Provide detailed context
    task_description="""
    I want to extract structured information from customer support tickets.
    The prompt should:
    - Identify the main issue
    - Extract customer sentiment (positive/negative/neutral)
    - Determine urgency level (low/medium/high)
    - Suggest appropriate department routing
    
    The output must be in JSON format and handle incomplete information gracefully.
    """,
    
    num_rounds=7,
    eval_subset_size=30
)

With More Rounds for Complex Tasks

# For very complex tasks, use more rounds
optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=complex_dataset,
    initial_prompts=[initial_prompt],
    task_description=detailed_description,
    num_rounds=10,  # More iterations for complex refinement
    eval_subset_size=50  # More examples for reliable signal
)

# Analyze the evolution
for i, iteration in enumerate(result.history):
    print(f"\nRound {i+1} Score: {iteration.average_score:.4f}")
    print(f"Prompt: {iteration.prompt[:150]}...")

Combining with Other Optimizers

Use Meta-Prompt for deep refinement after initial exploration:

# Stage 1: Quick exploration
random_result = random_search_optimizer.optimize(...)

# Stage 2: Deep refinement on best candidate
meta_optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

final_result = meta_optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[random_result.best_generator.get_prompt_template()],
    task_description="Refine for clarity and consistency",
    num_rounds=5
)

Understanding the Results

Tracking Hypothesis Evolution

The hypotheses themselves stay internal to the teacher model, but the sequence of rewritten prompts shows how its reasoning evolved:

result = optimizer.optimize(...)

# View the optimization journey
for i, iteration in enumerate(result.history):
    print(f"\n{'='*60}")
    print(f"Round {i+1}")
    print(f"Score: {iteration.average_score:.4f}")
    print(f"\nPrompt:\n{iteration.prompt}")
    
    # Note: Hypothesis is internal to teacher model, 
    # but you can infer it from prompt evolution

Analyzing Improvement Patterns

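import matplotlib.pyplot as plt

scores = [iteration.average_score for iteration in result.history]

# Plot the score for each round (rounds numbered from 1)
plt.plot(range(1, len(scores) + 1), scores, marker='o')
plt.xlabel('Round')
plt.ylabel('Score')
plt.title('Meta-Prompt Optimization Progress')
plt.show()

# Calculate relative improvement (guard against a zero starting score)
initial_score = scores[0]
final_score = scores[-1]
if initial_score > 0:
    improvement = ((final_score - initial_score) / initial_score) * 100
    print(f"Total improvement: {improvement:.1f}%")
else:
    print(f"Score went from {initial_score:.4f} to {final_score:.4f}")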

Performance Tips

Use a powerful teacher model

Meta-Prompt’s quality depends heavily on the teacher model’s reasoning ability. Use gpt-4o, claude-3-opus, or similar high-end models.

Provide detailed task descriptions

Specific task descriptions help the teacher make targeted improvements. Include constraints, desired output format, and edge cases to handle.

Start with 5 rounds

5 rounds is usually enough for meaningful improvement. Increase to 7-10 only for very complex tasks where you see continued progress.

Balance eval subset size

Too small (< 20): Unreliable signal, may optimize for noise
Too large (> 50): Slow and expensive
Sweet spot: 30-40 examples
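
To see why very small subsets are noisy, here is a back-of-the-envelope sketch. It assumes each example is scored independently as pass or fail, which is a simplification. For continuous scores in [0, 1] with the same mean, the noise is no larger than this estimate. The function name `score_standard_error` is made up for this example.

```python
import math


def score_standard_error(pass_rate: float, subset_size: int) -> float:
    """Standard error of an average score under a pass/fail assumption."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / subset_size)


# Compare the expected noise at a 70% pass rate for several subset sizes.
for n in (10, 20, 40, 80):
    print(f"eval_subset_size={n:>3}: +/- {score_standard_error(0.7, n):.3f}")
```

Noise shrinks with the square root of the subset size, so quadrupling the number of examples only halves it. That is why going far beyond 50 examples adds cost faster than it adds reliability.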

Analyze failed examples

Look at low-scoring examples in each round to understand what the optimizer is trying to fix:

for round_num, iteration in enumerate(result.history, start=1):
    failures = [r for r in iteration.individual_results if r.score < 0.5]
    print(f"Round {round_num} failures: {len(failures)}")
    for failure in failures[:3]:  # Show the first 3
        print(f"  - Score: {failure.score:.2f}, Reason: {failure.reason}")

Common Patterns

Complex Reasoning Tasks

dataset = [
    {
        "problem": "Multi-step math word problem...",
        "solution": "Step-by-step solution..."
    }
]

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=BasicDataMapper(
        key_map={"input": "problem", "output": "generated_output"}
    ),
    dataset=dataset,
    initial_prompts=["Solve this problem: {problem}"],
    task_description="""
    Generate step-by-step solutions that:
    - Show clear reasoning at each step
    - Explain why each step is necessary
    - Arrive at the correct final answer
    """,
    num_rounds=8
)

Creative Writing with Constraints

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=creative_dataset,
    initial_prompts=["Write a story based on: {prompt}"],
    task_description="""
    Generate engaging short stories (200-300 words) that:
    - Have a clear beginning, middle, and end
    - Include vivid sensory details
    - Match the tone specified in the prompt
    - Are appropriate for a general audience
    """,
    num_rounds=6,
    eval_subset_size=25
)

Data Transformation Tasks

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=transformation_dataset,
    initial_prompts=["Convert this data: {input_data}"],
    task_description="""
    Transform unstructured text into JSON format with these fields:
    - name (string)
    - date (YYYY-MM-DD format)
    - amount (number)
    - category (one of: personal, business, travel)
    
    Handle missing fields by using null. Infer dates from context when possible.
    """,
    num_rounds=5
)

Troubleshooting

Scores plateau after few rounds

Problem: Improvement stops after 2-3 rounds

Solution:

Your initial prompt might already be good - check if score is already high
Make task description more specific to guide further refinement
Try a different teacher model for fresh perspective
Increase eval_subset_size for more reliable signal
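
A plateau can also be detected programmatically from the per-round scores. The following is a minimal sketch with made-up names and thresholds (`has_plateaued`, `patience`, `min_delta`). It only inspects a list of scores such as the `average_score` values from `result.history`.

```python
from typing import List


def has_plateaued(scores: List[float], patience: int = 2, min_delta: float = 0.01) -> bool:
    """True when the last `patience` rounds did not beat the earlier best by `min_delta`."""
    if len(scores) <= patience:
        return False  # Not enough rounds to judge yet.
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent < best_before + min_delta


print(has_plateaued([0.60, 0.72, 0.725, 0.722]))
print(has_plateaued([0.60, 0.65, 0.72, 0.80]))
```

If the check returns true while the score is still low, that suggests changing the task description or teacher model rather than adding more rounds.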

Prompts become too verbose

Problem: Each iteration adds more instructions, making prompts unwieldy

Solution:

Add to task description: “Keep the prompt concise and under 200 words”
Manually select a mid-optimization prompt that balances quality and length
Use fewer rounds (3-4 instead of 7-8)
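
Selecting a mid-optimization prompt can be done with a small helper. The sketch below assumes history entries expose `prompt` and `average_score`, as in the earlier examples. The `Round` dataclass is a stand-in for those entries, and `pick_concise` and `tolerance` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Round:
    """Stand-in for one entry of result.history (hypothetical)."""
    prompt: str
    average_score: float


def pick_concise(history: List[Round], tolerance: float = 0.02) -> Round:
    """Return the shortest prompt whose score is within `tolerance` of the best."""
    best = max(r.average_score for r in history)
    near_best = [r for r in history if r.average_score >= best - tolerance]
    # Length is measured in words; ties keep the earliest round.
    return min(near_best, key=lambda r: len(r.prompt.split()))


history = [
    Round("Summarize: {text}", 0.60),
    Round("List the main points, then summarize: {text}", 0.81),
    Round("List the main points, check each against the text, "
          "then write a one-sentence summary: {text}", 0.82),
]
print(pick_concise(history).prompt)
```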

High API costs

Problem: Optimization is expensive with GPT-4

Solution:

Reduce num_rounds to 3-5
Decrease eval_subset_size to 20-30
Use gpt-4o-mini as teacher for initial experiments
Run on a smaller dataset subset first to validate approach
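
Before launching a run, a rough call count helps in choosing these settings. The sketch below assumes one generation and one evaluation per example per round, plus one teacher call per round. That accounting is an assumption for illustration, and the optimizer's actual call pattern may differ. `estimate_calls` is a made-up helper.

```python
from typing import Dict


def estimate_calls(num_rounds: int, eval_subset_size: int) -> Dict[str, int]:
    """Rough call counts under the one-call-per-example-per-round assumption."""
    generations = num_rounds * eval_subset_size   # Prompt run on each example
    evaluations = num_rounds * eval_subset_size   # Each output scored once
    teacher_calls = num_rounds                    # One analysis/rewrite per round
    return {
        "generations": generations,
        "evaluations": evaluations,
        "teacher_calls": teacher_calls,
        "total": generations + evaluations + teacher_calls,
    }


print(estimate_calls(num_rounds=5, eval_subset_size=40))
print(estimate_calls(num_rounds=3, eval_subset_size=20))
```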

Inconsistent improvements

Problem: Score goes up and down between rounds

Solution:

Increase eval_subset_size for more stable measurements
Check if your evaluation metric is too noisy
Ensure dataset examples are high-quality and representative
Consider using a different evaluation metric
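
One way to check whether a metric is too noisy is to score the same prompt several times and compare the spread with the change between rounds. The sketch below is a rough heuristic, not a formal statistical test. The name `is_real_change` and the factor `k` are assumptions for this example.

```python
from statistics import mean, stdev
from typing import List


def is_real_change(before_runs: List[float], after_runs: List[float], k: float = 2.0) -> bool:
    """True when the shift in mean score exceeds k times the run-to-run noise."""
    noise = max(stdev(before_runs), stdev(after_runs))
    delta = mean(after_runs) - mean(before_runs)
    return abs(delta) > k * noise


# Repeated scores of the same prompt (before) and of the rewritten prompt (after).
print(is_real_change([0.70, 0.72, 0.68], [0.71, 0.69, 0.73]))
print(is_real_change([0.70, 0.72, 0.68], [0.85, 0.86, 0.84]))
```

Each list needs at least two repeated scores, since `stdev` is undefined for a single value.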

Comparison with Other Optimizers

Aspect	Meta-Prompt	Bayesian Search	ProTeGi
Approach	Analysis & rewrite	Few-shot selection	Error-driven fixing
Best for	Complex reasoning	Structured tasks	Systematic debugging
Speed	Medium	Fast	Slow
Prompt changes	Complete rewrites	Example selection	Targeted edits
Teacher dependency	High	Medium	High

Next Steps

Try ProTeGi

Compare All Optimizers