39 changes: 39 additions & 0 deletions examples/circle_packing_with_artifacts/README.md
@@ -318,6 +318,45 @@ Target ratio: 0.9997314619131079 (99.97% of AlphaEvolve's result)

This demonstrates that OpenEvolve can successfully reproduce the results from the AlphaEvolve paper on this mathematical optimization problem.

## Fast Convergence with Dual-Model Configuration

Using a dual-model configuration with weighted sampling, OpenEvolve achieves near-optimal results in remarkably few iterations:

![Evolution Plot](evolution_plot.png)

### Configuration

The `config.yaml` uses two Gemini models with different weights:
- `google/gemini-2.5-flash-lite` (weight: 0.8) - Fast, cost-effective for exploration
- `google/gemini-2.5-flash` (weight: 0.2) - Higher capability for breakthroughs

```yaml
llm:
  models:
    - name: "google/gemini-2.5-flash-lite"
      weight: 0.8
    - name: "google/gemini-2.5-flash"
      weight: 0.2
```

### Rapid Convergence

The plot shows the evolution of sum_radii across program versions:

- **Version 0**: Starts at ~0.96 (basic initial program)
- **Version 6**: First major improvement to ~2.09
- **Version 21**: Reaches 2.63 (99.8% of target)
- **Final**: Achieves 2.6304 sum of radii

**Key insight**: OpenEvolve discovers the mathematical optimization approach (using `scipy.optimize.minimize` with SLSQP) by version 21, achieving 99.8% of the AlphaEvolve target in just ~40 program evaluations. The dual-model approach allows rapid exploration with the lighter model while leveraging the more capable model for breakthrough discoveries.
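
For illustration, here is a minimal sketch of that kind of SLSQP formulation (an assumed reconstruction, not the evolved program itself): the decision vector holds the 26 centers and radii (78 variables), the objective is the negative sum of radii, and inequality constraints enforce containment in the unit square and pairwise non-overlap.

```python
# Minimal sketch (assumed structure, not the evolved program) of an SLSQP
# formulation for packing n circles in the unit square.
import numpy as np
from scipy.optimize import minimize

n = 26

def objective(x):
    # x = [cx_1..cx_n, cy_1..cy_n, r_1..r_n]; maximize sum of radii -> minimize negative
    return -np.sum(x[2 * n:])

def constraints(x):
    cx, cy, r = x[:n], x[n:2 * n], x[2 * n:]
    cons = []
    # Containment: each circle stays inside the unit square (>= 0 when satisfied)
    cons.extend([cx - r, 1 - cx - r, cy - r, 1 - cy - r])
    # Non-overlap: distance between centers >= sum of radii
    for i in range(n):
        for j in range(i + 1, n):
            d = np.hypot(cx[i] - cx[j], cy[i] - cy[j])
            cons.append(np.array([d - r[i] - r[j]]))
    return np.concatenate(cons)

# Random initial placement with small radii
rng = np.random.default_rng(0)
x0 = np.concatenate([rng.uniform(0.1, 0.9, 2 * n), np.full(n, 0.05)])

result = minimize(
    objective,
    x0,
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": constraints}],
    bounds=[(0, 1)] * (2 * n) + [(0, 0.5)] * n,
    options={"maxiter": 500},
)
print("sum of radii:", -result.fun)
```

A single run like this typically lands in a local optimum, which is why the evolved programs combine it with multiple restarts or perturbed initial placements, as noted in the prompt's insights.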

### Why It Works

1. **Artifacts provide rich feedback**: Failed programs return detailed error information (boundary violations, overlaps), helping the LLM quickly correct mistakes
2. **MAP-Elites diversity**: The feature dimensions (`radius_variance`, `spatial_spread`) maintain diverse solutions in the population
3. **Island-based evolution**: 4 islands evolve independently, preventing premature convergence
4. **Efficient model weighting**: 80% lightweight model for broad exploration, 20% capable model for sophisticated solutions (see the sketch below)
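
The 80/20 split in item 4 amounts to a weighted draw per LLM call; a rough sketch of that idea (assumed, not OpenEvolve's actual sampling code):

```python
# Rough sketch (assumed, not OpenEvolve's implementation) of weighted model
# selection: each generation call draws a model proportionally to its weight.
import random

models = [
    {"name": "google/gemini-2.5-flash-lite", "weight": 0.8},
    {"name": "google/gemini-2.5-flash", "weight": 0.2},
]

def pick_model(rng: random.Random) -> str:
    """Return a model name sampled proportionally to its configured weight."""
    names = [m["name"] for m in models]
    weights = [m["weight"] for m in models]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)
picks = [pick_model(rng) for _ in range(1000)]
print(picks.count("google/gemini-2.5-flash-lite") / len(picks))  # ~0.8
```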

## Key Observations

The evolution process demonstrated several interesting patterns:
90 changes: 90 additions & 0 deletions examples/circle_packing_with_artifacts/config.yaml
@@ -0,0 +1,90 @@
# Circle Packing Benchmark Configuration with Thompson Sampling
# Based on config_benchmark.yaml but uses two models with Thompson sampling

max_iterations: 500
checkpoint_interval: 10
log_level: "INFO"
random_seed: 42

# Full rewrite mode (best for constructor-based problems)
diff_based_evolution: false
max_code_length: 50000

# LLM Configuration - Two models
llm:
api_base: "https://openrouter.ai/api/v1"
models:
- name: "google/gemini-2.5-flash-lite"
weight: 0.8
- name: "google/gemini-2.5-flash"
weight: 0.2

temperature: 0.4
top_p: 0.95
max_tokens: 16000
timeout: 180
retries: 3

# Prompt Configuration
prompt:
system_message: |
You are an expert mathematician specializing in circle packing problems and computational geometry.
Your task is to improve a constructor function that places 26 circles in a unit square to maximize the sum of their radii.

Target: AlphaEvolve achieved sum of radii = 2.635 for n=26.

Key insights:
- This is a constrained optimization problem with many local minima
- Local optimization methods may get stuck - consider approaches that explore the solution space more broadly
- Multiple starting points or perturbation strategies can help find better solutions
- Good initial placements matter: hexagonal patterns, corner utilization, edge circles
- The problem has 78 degrees of freedom (26 centers + 26 radii)

Think about how to formulate this mathematically and what optimization strategies might help escape local minima.

num_top_programs: 3
num_diverse_programs: 2

# Artifacts enabled for debugging/visualization data
include_artifacts: true
max_artifact_bytes: 20480 # 20KB

# Database Configuration
database:
population_size: 100
archive_size: 50
num_islands: 4 # Optimal island count

# Selection parameters
elite_selection_ratio: 0.1
exploration_ratio: 0.4 # Higher exploration to discover optimization approaches
exploitation_ratio: 0.5 # Balance with exploitation

# Feature dimensions for MAP-Elites (diversity-focused metrics)
# - radius_variance: separates uniform vs varied circle sizes (0-1 normalized)
# - spatial_spread: separates clustered vs distributed centers (0-1 normalized)
feature_dimensions: ["radius_variance", "spatial_spread"]
feature_bins: 10

# Migration parameters - faster sharing of breakthroughs
migration_interval: 10 # Share discoveries sooner
migration_rate: 0.15 # Migrate more programs

# Evaluator Configuration
evaluator:
timeout: 600 # Allow complex optimization programs to complete
max_retries: 3
cascade_evaluation: true
cascade_thresholds: [0.5, 0.8]
parallel_evaluations: 4
use_llm_feedback: false
enable_artifacts: true

# Novelty Detection - prevent duplicate evaluations
novelty:
enabled: true
embedding_backend: "local"
embedding_model: "all-MiniLM-L6-v2"
similarity_threshold: 0.95
max_regeneration_attempts: 3
temperature_increment: 0.15
40 changes: 36 additions & 4 deletions examples/circle_packing_with_artifacts/evaluator.py
@@ -237,6 +237,8 @@ def evaluate(program_path):
"validity": 0.0,
"eval_time": float(eval_time),
"combined_score": 0.0,
"radius_variance": 0.0,
"spatial_spread": 0.0,
},
artifacts={
"stderr": shape_error,
@@ -250,6 +252,20 @@
# Calculate sum
sum_radii = np.sum(radii) if valid else 0.0

# Calculate feature metrics for MAP-Elites diversity
# radius_variance: normalized variance of radii (0-1)
# Max theoretical variance for radii in [0, 0.5] is ~0.0625
radius_variance = float(np.var(radii) / 0.0625) if valid else 0.0
radius_variance = min(1.0, max(0.0, radius_variance)) # Clamp to [0, 1]

# spatial_spread: how spread out centers are (0-1)
# Based on std of distances from centroid, normalized by max possible (0.5 * sqrt(2))
centroid = np.mean(centers, axis=0)
distances_from_centroid = np.sqrt(np.sum((centers - centroid) ** 2, axis=1))
max_spread = 0.5 * np.sqrt(2) # Max distance from center to corner
spatial_spread = float(np.std(distances_from_centroid) / max_spread) if valid else 0.0
spatial_spread = min(1.0, max(0.0, spatial_spread)) # Clamp to [0, 1]

# Make sure reported_sum matches the calculated sum
sum_mismatch = abs(sum_radii - reported_sum) > 1e-6
if sum_mismatch:
@@ -306,6 +322,8 @@ def evaluate(program_path):
"validity": float(validity),
"eval_time": float(eval_time),
"combined_score": float(combined_score),
"radius_variance": radius_variance,
"spatial_spread": spatial_spread,
},
artifacts=artifacts,
)
@@ -320,6 +338,8 @@ def evaluate(program_path):
"validity": 0.0,
"eval_time": 600.0, # Timeout duration
"combined_score": 0.0,
"radius_variance": 0.0,
"spatial_spread": 0.0,
},
artifacts={
"stderr": error_msg,
@@ -339,6 +359,8 @@ def evaluate(program_path):
"validity": 0.0,
"eval_time": 0.0,
"combined_score": 0.0,
"radius_variance": 0.0,
"spatial_spread": 0.0,
},
artifacts={
"stderr": error_msg,
@@ -374,7 +396,7 @@ def evaluate_stage1(program_path):
shape_error = f"Invalid shapes: centers={centers.shape}, radii={radii.shape}"
print(shape_error)
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": shape_error,
"failure_stage": "stage1_shape_validation",
@@ -389,6 +411,14 @@ def evaluate_stage1(program_path):
# Calculate sum
actual_sum = np.sum(radii) if valid else 0.0

# Calculate feature metrics for MAP-Elites diversity
radius_variance = float(np.var(radii) / 0.0625) if valid else 0.0
radius_variance = min(1.0, max(0.0, radius_variance))
centroid = np.mean(centers, axis=0)
distances_from_centroid = np.sqrt(np.sum((centers - centroid) ** 2, axis=1))
spatial_spread = float(np.std(distances_from_centroid) / (0.5 * np.sqrt(2))) if valid else 0.0
spatial_spread = min(1.0, max(0.0, spatial_spread))

# Target from paper
target = 2.635

@@ -424,6 +454,8 @@ def evaluate_stage1(program_path):
"sum_radii": float(actual_sum),
"target_ratio": float(actual_sum / target if valid else 0.0),
"combined_score": float(combined_score),
"radius_variance": radius_variance,
"spatial_spread": spatial_spread,
},
artifacts=artifacts,
)
@@ -432,7 +464,7 @@ def evaluate_stage1(program_path):
error_msg = f"Stage 1 evaluation timed out: {e}"
print(error_msg)
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": error_msg,
"failure_stage": "stage1_timeout",
@@ -445,7 +477,7 @@ def evaluate_stage1(program_path):
print(error_msg)
print(traceback.format_exc())
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": error_msg,
"traceback": traceback.format_exc(),
@@ -459,7 +491,7 @@ def evaluate_stage1(program_path):
print(error_msg)
print(traceback.format_exc())
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": error_msg,
"traceback": traceback.format_exc(),