39 changes: 39 additions & 0 deletions examples/circle_packing_with_artifacts/README.md
@@ -318,6 +318,45 @@ Target ratio: 0.9997314619131079 (99.97% of AlphaEvolve's result)

This demonstrates that OpenEvolve can successfully reproduce the results from the AlphaEvolve paper on this mathematical optimization problem.

## Fast Convergence with Dual-Model Configuration

Using a dual-model configuration with weighted sampling, OpenEvolve achieves near-optimal results in remarkably few iterations:

![Evolution Plot](evolution_plot.png)

### Configuration

The `config.yaml` uses two Gemini models with different weights:
- `google/gemini-2.5-flash-lite` (weight: 0.8) - Fast, cost-effective for exploration
- `google/gemini-2.5-flash` (weight: 0.2) - Higher capability for breakthroughs

```yaml
llm:
  models:
    - name: "google/gemini-2.5-flash-lite"
      weight: 0.8
    - name: "google/gemini-2.5-flash"
      weight: 0.2
```

### Rapid Convergence

The plot shows the evolution of sum_radii across program versions:

- **Version 0**: Starts at ~0.96 (basic initial program)
- **Version 6**: First major improvement to ~2.09
- **Version 21**: Reaches 2.63 (99.8% of target)
- **Final**: Achieves 2.6304 sum of radii

**Key insight**: OpenEvolve discovers the mathematical optimization approach (using `scipy.optimize.minimize` with SLSQP) by version 21, achieving 99.8% of the AlphaEvolve target in just ~40 program evaluations. The dual-model approach allows rapid exploration with the lighter model while leveraging the more capable model for breakthrough discoveries.
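
For illustration, here is a minimal sketch of that kind of SLSQP formulation (an assumed reconstruction, not the evolved program itself): the decision vector holds the 26 centers and radii (78 variables), the objective is the negative sum of radii, and inequality constraints enforce containment in the unit square and pairwise non-overlap.

```python
# Minimal sketch (assumed structure, not the evolved program) of an SLSQP
# formulation for packing n circles in the unit square.
import numpy as np
from scipy.optimize import minimize

n = 26

def objective(x):
    # x = [cx_1..cx_n, cy_1..cy_n, r_1..r_n]; maximize sum of radii -> minimize negative
    return -np.sum(x[2 * n:])

def constraints(x):
    cx, cy, r = x[:n], x[n:2 * n], x[2 * n:]
    cons = []
    # Containment: each circle stays inside the unit square (>= 0 when satisfied)
    cons.extend([cx - r, 1 - cx - r, cy - r, 1 - cy - r])
    # Non-overlap: distance between centers >= sum of radii
    for i in range(n):
        for j in range(i + 1, n):
            d = np.hypot(cx[i] - cx[j], cy[i] - cy[j])
            cons.append(np.array([d - r[i] - r[j]]))
    return np.concatenate(cons)

# Random initial placement with small radii
rng = np.random.default_rng(0)
x0 = np.concatenate([rng.uniform(0.1, 0.9, 2 * n), np.full(n, 0.05)])

result = minimize(
    objective,
    x0,
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": constraints}],
    bounds=[(0, 1)] * (2 * n) + [(0, 0.5)] * n,
    options={"maxiter": 500},
)
print("sum of radii:", -result.fun)
```

A single run like this typically lands in a local optimum, which is why the evolved programs combine it with multiple restarts or perturbed initial placements, as noted in the prompt's insights.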

### Why It Works

1. **Artifacts provide rich feedback**: Failed programs return detailed error information (boundary violations, overlaps), helping the LLM quickly correct mistakes
2. **MAP-Elites diversity**: The feature dimensions (`radius_variance`, `spatial_spread`) maintain diverse solutions in the population
3. **Island-based evolution**: 4 islands evolve independently, preventing premature convergence
4. **Efficient model weighting**: 80% lightweight model for broad exploration, 20% capable model for sophisticated solutions (see the sketch below)
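
The 80/20 split in item 4 amounts to a weighted draw per LLM call; a rough sketch of that idea (assumed, not OpenEvolve's actual sampling code):

```python
# Rough sketch (assumed, not OpenEvolve's implementation) of weighted model
# selection: each generation call draws a model proportionally to its weight.
import random

models = [
    {"name": "google/gemini-2.5-flash-lite", "weight": 0.8},
    {"name": "google/gemini-2.5-flash", "weight": 0.2},
]

def pick_model(rng: random.Random) -> str:
    """Return a model name sampled proportionally to its configured weight."""
    names = [m["name"] for m in models]
    weights = [m["weight"] for m in models]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)
picks = [pick_model(rng) for _ in range(1000)]
print(picks.count("google/gemini-2.5-flash-lite") / len(picks))  # ~0.8
```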

## Key Observations

The evolution process demonstrated several interesting patterns:
90 changes: 90 additions & 0 deletions examples/circle_packing_with_artifacts/config.yaml
@@ -0,0 +1,90 @@
# Circle Packing Benchmark Configuration with Thompson Sampling
# Based on config_benchmark.yaml but uses two models with Thompson sampling

max_iterations: 500
checkpoint_interval: 10
log_level: "INFO"
random_seed: 42

# Full rewrite mode (best for constructor-based problems)
diff_based_evolution: false
max_code_length: 50000

# LLM Configuration - Two models
llm:
api_base: "https://openrouter.ai/api/v1"
models:
- name: "google/gemini-2.5-flash-lite"
weight: 0.8
- name: "google/gemini-2.5-flash"
weight: 0.2

temperature: 0.4
top_p: 0.95
max_tokens: 16000
timeout: 180
retries: 3

# Prompt Configuration
prompt:
system_message: |
You are an expert mathematician specializing in circle packing problems and computational geometry.
Your task is to improve a constructor function that places 26 circles in a unit square to maximize the sum of their radii.

Target: AlphaEvolve achieved sum of radii = 2.635 for n=26.

Key insights:
- This is a constrained optimization problem with many local minima
- Local optimization methods may get stuck - consider approaches that explore the solution space more broadly
- Multiple starting points or perturbation strategies can help find better solutions
- Good initial placements matter: hexagonal patterns, corner utilization, edge circles
- The problem has 78 degrees of freedom (26 centers + 26 radii)

Think about how to formulate this mathematically and what optimization strategies might help escape local minima.

num_top_programs: 3
num_diverse_programs: 2

# Artifacts enabled for debugging/visualization data
include_artifacts: true
max_artifact_bytes: 20480 # 20KB

# Database Configuration
database:
population_size: 100
archive_size: 50
num_islands: 4 # Optimal island count

# Selection parameters
elite_selection_ratio: 0.1
exploration_ratio: 0.4 # Higher exploration to discover optimization approaches
exploitation_ratio: 0.5 # Balance with exploitation

# Feature dimensions for MAP-Elites (diversity-focused metrics)
# - radius_variance: separates uniform vs varied circle sizes (0-1 normalized)
# - spatial_spread: separates clustered vs distributed centers (0-1 normalized)
feature_dimensions: ["radius_variance", "spatial_spread"]
feature_bins: 10

# Migration parameters - faster sharing of breakthroughs
migration_interval: 10 # Share discoveries sooner
migration_rate: 0.15 # Migrate more programs

# Evaluator Configuration
evaluator:
timeout: 600 # Allow complex optimization programs to complete
max_retries: 3
cascade_evaluation: true
cascade_thresholds: [0.5, 0.8]
parallel_evaluations: 4
use_llm_feedback: false
enable_artifacts: true

# Novelty Detection - prevent duplicate evaluations
novelty:
enabled: true
embedding_backend: "local"
embedding_model: "all-MiniLM-L6-v2"
similarity_threshold: 0.95
max_regeneration_attempts: 3
temperature_increment: 0.15
40 changes: 36 additions & 4 deletions examples/circle_packing_with_artifacts/evaluator.py
@@ -237,6 +237,8 @@ def evaluate(program_path):
"validity": 0.0,
"eval_time": float(eval_time),
"combined_score": 0.0,
"radius_variance": 0.0,
"spatial_spread": 0.0,
},
artifacts={
"stderr": shape_error,
@@ -250,6 +252,20 @@
# Calculate sum
sum_radii = np.sum(radii) if valid else 0.0

# Calculate feature metrics for MAP-Elites diversity
# radius_variance: normalized variance of radii (0-1)
# Max theoretical variance for radii in [0, 0.5] is ~0.0625
radius_variance = float(np.var(radii) / 0.0625) if valid else 0.0
radius_variance = min(1.0, max(0.0, radius_variance)) # Clamp to [0, 1]

# spatial_spread: how spread out centers are (0-1)
# Based on std of distances from centroid, normalized by max possible (0.5 * sqrt(2))
centroid = np.mean(centers, axis=0)
distances_from_centroid = np.sqrt(np.sum((centers - centroid) ** 2, axis=1))
max_spread = 0.5 * np.sqrt(2) # Max distance from center to corner
spatial_spread = float(np.std(distances_from_centroid) / max_spread) if valid else 0.0
spatial_spread = min(1.0, max(0.0, spatial_spread)) # Clamp to [0, 1]

# Make sure reported_sum matches the calculated sum
sum_mismatch = abs(sum_radii - reported_sum) > 1e-6
if sum_mismatch:
@@ -306,6 +322,8 @@ def evaluate(program_path):
"validity": float(validity),
"eval_time": float(eval_time),
"combined_score": float(combined_score),
"radius_variance": radius_variance,
"spatial_spread": spatial_spread,
},
artifacts=artifacts,
)
@@ -320,6 +338,8 @@ def evaluate(program_path):
"validity": 0.0,
"eval_time": 600.0, # Timeout duration
"combined_score": 0.0,
"radius_variance": 0.0,
"spatial_spread": 0.0,
},
artifacts={
"stderr": error_msg,
@@ -339,6 +359,8 @@ def evaluate(program_path):
"validity": 0.0,
"eval_time": 0.0,
"combined_score": 0.0,
"radius_variance": 0.0,
"spatial_spread": 0.0,
},
artifacts={
"stderr": error_msg,
@@ -374,7 +396,7 @@ def evaluate_stage1(program_path):
shape_error = f"Invalid shapes: centers={centers.shape}, radii={radii.shape}"
print(shape_error)
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": shape_error,
"failure_stage": "stage1_shape_validation",
@@ -389,6 +411,14 @@ def evaluate_stage1(program_path):
# Calculate sum
actual_sum = np.sum(radii) if valid else 0.0

# Calculate feature metrics for MAP-Elites diversity
radius_variance = float(np.var(radii) / 0.0625) if valid else 0.0
radius_variance = min(1.0, max(0.0, radius_variance))
centroid = np.mean(centers, axis=0)
distances_from_centroid = np.sqrt(np.sum((centers - centroid) ** 2, axis=1))
spatial_spread = float(np.std(distances_from_centroid) / (0.5 * np.sqrt(2))) if valid else 0.0
spatial_spread = min(1.0, max(0.0, spatial_spread))

# Target from paper
target = 2.635

@@ -424,6 +454,8 @@ def evaluate_stage1(program_path):
"sum_radii": float(actual_sum),
"target_ratio": float(actual_sum / target if valid else 0.0),
"combined_score": float(combined_score),
"radius_variance": radius_variance,
"spatial_spread": spatial_spread,
},
artifacts=artifacts,
)
@@ -432,7 +464,7 @@ def evaluate_stage1(program_path):
error_msg = f"Stage 1 evaluation timed out: {e}"
print(error_msg)
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": error_msg,
"failure_stage": "stage1_timeout",
@@ -445,7 +477,7 @@ def evaluate_stage1(program_path):
print(error_msg)
print(traceback.format_exc())
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": error_msg,
"traceback": traceback.format_exc(),
@@ -459,7 +491,7 @@ def evaluate_stage1(program_path):
print(error_msg)
print(traceback.format_exc())
return EvaluationResult(
metrics={"validity": 0.0, "combined_score": 0.0},
metrics={"validity": 0.0, "combined_score": 0.0, "radius_variance": 0.0, "spatial_spread": 0.0},
artifacts={
"stderr": error_msg,
"traceback": traceback.format_exc(),