37 commits
3f6f5ea
Initial plan
Copilot Dec 12, 2025
86acef7
Implement TODO fixes: optimize database insertion, add output size li…
Copilot Dec 12, 2025
3fb60dd
Fix bisect ordering to avoid string comparison issues for programs wi…
Copilot Dec 12, 2025
8613852
Optimize bisect to avoid temporary list creation, add clarifying comment
Copilot Dec 12, 2025
5046c5e
Add clarifying comments about bisect implementation and Python versio…
Copilot Dec 12, 2025
80ba089
Add comprehensive optimization guide and future enhancement recommend…
Copilot Dec 12, 2025
10349ba
Add generic run.sh template for all projects in problems directory
Copilot Dec 12, 2025
f091def
Add comprehensive README with detailed config reference and recommend…
Copilot Dec 12, 2025
a373596
Add API key configuration support in run scripts with multiple secure…
Copilot Dec 12, 2025
85e0fb3
Merge pull request #1 from mragan2/copilot/find-and-apply-todos
mragan2 Dec 12, 2025
b45759b
Add scaffold for F_time problem
mragan2 Dec 12, 2025
1a27769
Merge pull request #2 from mragan2/codex/create-new-f_time-project-an…
mragan2 Dec 12, 2025
4629c52
Add config for F_time problem
mragan2 Dec 12, 2025
b2c86cc
Merge pull request #3 from mragan2/codex/create-new-f_time-project-an…
mragan2 Dec 12, 2025
ba4f8a7
Add F_time setup guide for local runs
mragan2 Dec 12, 2025
19f42e0
Merge pull request #4 from mragan2/codex/fix-input-directory-setup-fo…
mragan2 Dec 12, 2025
93314b3
Fix F_time runner and diff fallback
Dec 12, 2025
dbc1e34
Update F_time configs, scripts, and add helper files
mragan2 Dec 12, 2025
d3c5ced
aa
mragan2 Dec 12, 2025
3cc765f
Streamline runner config selection and outputs
mragan2 Dec 12, 2025
054cde6
Add ephemeral API key prompts to runners
mragan2 Dec 12, 2025
7ff4a00
Update scripts/interactive_launcher.py
mragan2 Dec 12, 2025
0aab4fe
Merge pull request #6 from mragan2/codex/create-interactive-setup-script
mragan2 Dec 12, 2025
8383e00
Initial plan
Copilot Dec 12, 2025
8632f54
Apply black and isort formatting to all Python files
Copilot Dec 12, 2025
e980710
Merge pull request #8 from mragan2/copilot/fix-code-issues
mragan2 Dec 12, 2025
1fbdfb1
Update config path and improve env var handling in script
mragan2 Dec 12, 2025
5fd28e6
asdv
mragan2 Dec 12, 2025
326dce7
Add adversarial multi-population coevolution
mragan2 Dec 13, 2025
baee9a7
Add seasonal climate modifiers for thermal resilience
mragan2 Dec 13, 2025
6f298cb
Add async batch evaluation helper
mragan2 Dec 13, 2025
89ffcc8
Add parallel evaluation test
mragan2 Dec 13, 2025
5e0f74e
Merge pull request #12 from mragan2/codex/implement-parallel-evaluati…
mragan2 Dec 13, 2025
838d360
Fix: Resolve evaluator, tuly, and initialization issues
Dec 13, 2025
0e771b8
Merge pull request #13 from mragan2/manus/fix-evaluator-tuly-init
mragan2 Dec 13, 2025
040bd94
j
Dec 13, 2025
547df58
Merge branch 'codex/incorporate-novel-agent-into-codeevolve' of https…
Dec 13, 2025
16 changes: 15 additions & 1 deletion .gitignore
@@ -210,4 +210,18 @@ __marimo__/

# debug
debug/
mock/

# CodeEvolve run outputs / local env
experiments/
.conda/

# API Keys and Secrets
# NEVER commit API keys or credentials
.api_keys
*api_keys
*.api_keys
**/api_keys.sh
**/.api_keys
problems/.api_keys
.codeevolve_api_keys
215 changes: 215 additions & 0 deletions OPTIMIZATIONS.md
@@ -0,0 +1,215 @@
# CodeEvolve Optimizations and Future Enhancements

This document summarizes the optimizations implemented and provides suggestions for future improvements to make CodeEvolve a world-class code evolution framework.

## Implemented Optimizations

### 1. Database Performance (database.py)

**Problem**: The original implementation performed a full O(N log N) sort on every program insertion, which becomes a bottleneck as the population grows.

**Solution**: Implemented incremental cache updates using the `bisect` module for O(log N) insertions:
- Added `_incremental_update_cache()` method that uses binary search to find insertion points
- Maintains a sorted list of `(-fitness, pid)` tuples
- Only updates ranks for affected programs (those at or after the insertion point)

**Impact**: Replaces the full O(N log N) re-sort on every insertion with an O(log N) binary search plus a single list insert, significantly improving performance for large populations.

**Code Location**: `src/codeevolve/database.py:397-421`
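
A minimal sketch of the incremental update (the helper name and list layout are illustrative; the actual logic lives in `_incremental_update_cache`):

```python
import bisect

def insert_sorted(cache, fitness, pid):
    """Insert into a list kept sorted by (-fitness, pid).

    Negating fitness keeps higher-fitness programs first, and the
    numeric key sidesteps string-comparison ordering issues.
    """
    entry = (-fitness, pid)
    idx = bisect.bisect_left(cache, entry)  # O(log N) search for the slot
    cache.insert(idx, entry)                # shift in place; no full re-sort
    return idx  # first rank that changed; earlier ranks are untouched

cache = []
for fitness, pid in [(0.5, 1), (0.9, 2), (0.7, 3)]:
    insert_sorted(cache, fitness, pid)
assert cache == [(-0.9, 2), (-0.7, 3), (-0.5, 1)]
```

Only programs at or after the returned index need their rank refreshed, which is what keeps the update incremental.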

### 2. Memory Management (evaluator.py)

**Problem**: Program stdout/stderr can be very large, potentially causing memory issues in long-running evolutionary processes.

**Solution**: Added optional output size limits:
- New `max_output_size` parameter in Evaluator constructor
- Truncates output to specified size when enabled
- Default behavior (no storage) preserved for backward compatibility

**Impact**: Prevents memory exhaustion while maintaining debugging capability when needed.

**Code Location**: `src/codeevolve/evaluator.py:79, 276-283`
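
The truncation behavior can be sketched as follows (hypothetical helper; the real logic sits in the Evaluator at the cited lines):

```python
def capture_output(stdout: bytes, max_output_size=None) -> bytes:
    """Truncate captured output to max_output_size bytes.

    None preserves the backward-compatible default of storing nothing.
    """
    if max_output_size is None:
        return b""  # default: don't retain output at all
    return stdout[:max_output_size]

assert capture_output(b"x" * 1000, max_output_size=16) == b"x" * 16
assert capture_output(b"x" * 1000) == b""
```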

### 3. Build System Compatibility

**Problem**: Python version requirement was too restrictive (>=3.13.5), preventing installation on most systems.

**Solution**: Relaxed requirement to >=3.10, which is widely available and supports all features used in the codebase.

**Code Location**: `pyproject.toml:10`

## Documentation Improvements

### Enhanced TODOs with Implementation Guidance

1. **Sandboxing Enhancement** (evaluator.py:26-31)
- Documented options: Firejail, Docker, systemd-nspawn, seccomp
- Current implementation uses subprocess isolation with resource limits

2. **Local LM Support** (lm.py:25-31)
- Documented integration strategies for open-source models
- Suggested frameworks: llama-cpp-python, vllm, HuggingFace, Ollama

3. **Async Migration** (islands.py:255-263)
- Explained benefits of asynchronous migration without barriers
- Documented implementation considerations and tradeoffs

## Recommended Future Optimizations

### High Priority

#### 1. Parallel Program Evaluation
**Current State**: Programs are evaluated sequentially within each island.

**Optimization**: Implement parallel evaluation using `asyncio` or `multiprocessing`:
```python
# Pseudo-code sketch: spawn one subprocess per program, then await them all.
async def evaluate_batch(programs: List[Program], evaluator: Evaluator):
    procs = await asyncio.gather(
        *(asyncio.create_subprocess_exec(...) for prog in programs)
    )
    return await asyncio.gather(*(proc.communicate() for proc in procs))
```

**Expected Impact**: 2-10x speedup depending on available CPU cores.

#### 2. LLM Request Batching
**Current State**: LLM requests are made one at a time.

**Optimization**: Batch multiple LLM requests when possible:
- Collect multiple programs needing evolution
- Send batch requests to LLM API
- Most APIs support parallel processing of multiple prompts

**Expected Impact**: Reduced API latency, better token efficiency, 1.5-3x throughput improvement.
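
The batching pattern can be sketched with a stand-in coroutine in place of a real async API client (`llm_complete` is a placeholder, not an existing function):

```python
import asyncio

async def llm_complete(prompt: str) -> str:
    # Placeholder for a real async API call (e.g. an async OpenAI client).
    await asyncio.sleep(0)  # stands in for network latency
    return f"response to {prompt!r}"

async def complete_batch(prompts):
    # Issue all requests concurrently: total latency is roughly the
    # slowest single request instead of the sum of all of them.
    return await asyncio.gather(*(llm_complete(p) for p in prompts))

results = asyncio.run(complete_batch(["evolve A", "evolve B"]))
```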

#### 3. Caching and Memoization
**Current State**: No caching of previously evaluated programs or LLM responses.

**Optimization**: Implement caching layers:
- **Program Cache**: Hash program code and cache evaluation results
- **LLM Cache**: Cache LLM responses for identical prompts
- **Embedding Cache**: Cache embeddings for program similarity computations

**Expected Impact**: 30-50% reduction in redundant computations.
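
A minimal program cache along these lines (illustrative class, not an existing API):

```python
import hashlib

class ProgramCache:
    """Cache evaluation results keyed by a hash of the program code."""

    def __init__(self):
        self._cache = {}

    @staticmethod
    def key(code: str) -> str:
        # Strip surrounding whitespace so trivial edits still hit the cache.
        return hashlib.sha256(code.strip().encode()).hexdigest()

    def get(self, code):
        return self._cache.get(self.key(code))

    def put(self, code, result):
        self._cache[self.key(code)] = result

cache = ProgramCache()
cache.put("print('hi')", {"fitness": 1.0})
assert cache.get("print('hi')  ") == {"fitness": 1.0}  # whitespace-insensitive hit
```

The same hash-and-lookup shape applies to the LLM and embedding caches, with the prompt or program text as the key.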

### Medium Priority

#### 4. Database Indexing
**Current State**: Linear search for certain operations.

**Optimization**: Add indexes for common queries:
- Fitness-based queries
- Parent-child relationships
- Feature space lookups in MAP-Elites

**Expected Impact**: Faster query times, especially for large databases.
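
One lightweight way to index parent-child relationships (illustrative structure, not the current database schema):

```python
from collections import defaultdict

class IndexedDB:
    """Keep a secondary index updated on insert so lookups avoid a scan."""

    def __init__(self):
        self.programs = {}
        self.by_parent = defaultdict(list)  # parent id -> child ids

    def add(self, pid, parent, fitness):
        self.programs[pid] = {"parent": parent, "fitness": fitness}
        self.by_parent[parent].append(pid)  # O(1) index maintenance

    def children(self, parent):
        return self.by_parent[parent]  # O(1) instead of a linear scan

db = IndexedDB()
db.add(1, None, 0.5)
db.add(2, 1, 0.7)
assert db.children(1) == [2]
```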

#### 5. Adaptive Population Sizing
**Current State**: Fixed population size per island.

**Optimization**: Dynamically adjust population size based on:
- Convergence rate
- Diversity metrics
- Available computational resources

**Expected Impact**: Better resource utilization, faster convergence.

#### 6. Smart Migration Strategy
**Current State**: Fixed migration interval and strategy.

**Optimization**: Implement adaptive migration:
- Migrate based on diversity metrics rather than fixed intervals
- Select migrants based on novelty, not just fitness
- Use gradient-based migration patterns

**Expected Impact**: Improved exploration, better solution diversity.

### Lower Priority (Polish)

#### 7. Profiling and Monitoring
**Optimization**: Add built-in profiling:
- Token usage tracking per operation
- Time spent in each evolutionary operator
- Memory usage patterns
- Success rates for different strategies

**Expected Impact**: Better observability, easier optimization identification.

#### 8. Checkpoint Compression
**Current State**: Checkpoints may be large for big populations.

**Optimization**: Compress checkpoints using gzip or similar:
```python
import gzip
import pickle

def save_checkpoint_compressed(data, path):
    with gzip.open(path, 'wb') as f:
        pickle.dump(data, f)
```

**Expected Impact**: Reduced storage requirements, faster I/O.

#### 9. Type Hints and Validation
**Current State**: Some functions lack complete type hints.

**Optimization**: Add comprehensive type hints and use `mypy` for static type checking:
- Better IDE support
- Catch type errors early
- Improved code documentation

## Code Quality Improvements

### 1. Error Handling
- Add specific exception types for different error conditions
- Implement retry logic with exponential backoff for API calls
- Better error messages with context
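
A sketch of retry with exponential backoff plus jitter for API calls (hypothetical wrapper, not an existing helper):

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry `call`, doubling the delay each attempt and adding jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```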

### 2. Logging
- Structured logging with JSON format for better parsing
- Configurable log levels per component
- Log aggregation support for distributed runs
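
A minimal JSON formatter using the standard `logging` module (sketch; field names are a suggestion):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object per line for easy parsing."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("codeevolve.islands")  # per-component logger
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("migration complete")
```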

### 3. Testing
- Add integration tests for the full evolutionary loop
- Performance regression tests
- Stress tests with large populations

### 4. Documentation
- Add inline examples in docstrings
- Create tutorial notebooks
- Document configuration parameters with examples

## Performance Benchmarks

To track optimization progress, consider implementing benchmarks for:

1. **Insertion Time**: Measure time to add programs to database at different population sizes
2. **Evolution Throughput**: Programs evolved per minute
3. **Memory Usage**: Peak memory usage during runs
4. **Convergence Speed**: Epochs to reach target fitness
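
The insertion-time benchmark, for instance, could be as simple as (sketch; `bisect.insort` stands in for the database's insert path):

```python
import bisect
import time

def bench_insertions(n):
    """Time n sorted insertions into an initially empty cache."""
    cache = []
    start = time.perf_counter()
    for pid in range(n):
        # Cycle fitness values so insertions land at varied positions.
        bisect.insort(cache, (-float(pid % 100), pid))
    return time.perf_counter() - start

for n in (1_000, 5_000):
    print(f"{n} insertions: {bench_insertions(n):.4f}s")
```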

## Architecture Considerations

### Distributed Computing
For large-scale deployments, consider:
- Ray or Dask for distributed computation
- Redis for shared state management
- Message queues (RabbitMQ, Kafka) for asynchronous communication

### Cloud Optimization
- Use spot instances for cost savings
- Implement checkpointing for fault tolerance
- Auto-scaling based on workload

## Conclusion

The implemented optimizations provide a solid foundation for performance. The recommended future optimizations, prioritized by impact and implementation complexity, can further improve CodeEvolve's efficiency and scalability.

Focus areas for maximum impact:
1. Parallel evaluation (highest ROI)
2. LLM request batching
3. Intelligent caching
4. Better monitoring and profiling

These optimizations align with the project's goal of being a transparent, reproducible, and community-driven framework for LLM-driven algorithm discovery.
8 changes: 8 additions & 0 deletions README.md
@@ -26,6 +26,14 @@ conda activate codeevolve
```
The command-line version of codeevolve is implemented in `src/codeevolve/cli.py`, and `scripts/run.sh` contains a bash script for running codeevolve on a given benchmark. The most important variables to define in this file are the `API_KEY` and `API_BASE` environment variables for connecting to an LLM provider.

CodeEvolve now also supports an optional **NovelAgent** that injects exploratory prompt updates. Enable it by adding a `NOVEL_AGENT` block to your config (see `problems/problem_template/configs/config_mp.yaml`), which will occasionally replace the standard meta-prompting step with a more diversity-focused proposal.

For competitive experiments, you can enable **adversarial islands** via the `ADVERSARIAL` block in the same config. Islands are partitioned into teams (e.g., red vs blue), each evolving independently with MAP-Elites while periodically cross-evaluating candidates against the rival team's current champions. Fitness can be based on win rate, Elo, or a hybrid score, and cross-play can be scheduled every _k_ epochs or alternated between teams to synchronize coevolutionary phases.

You can also inject a lighthearted **climate pressure** by enabling the `CLIMATE` block. Each epoch belongs to a season (choose a single perpetual season or a 4-season cycle), and a small set of Python helpers are randomly tagged as "heat-tolerant" or "cold-resilient." Programs using functions aligned with the current season earn a configurable fitness multiplier, making heat-resistant code more likely to survive during hotter phases.

For a concrete example, see the [F_time setup guide](problems/F_time/SETUP.md) for step-by-step instructions to clone the repository under `/home/rag/Projects`, configure the conda environment, and run the bundled benchmark script.

More comprehensive tutorials will be released soon.

## Next steps
32 changes: 32 additions & 0 deletions problems/.api_keys.example
@@ -0,0 +1,32 @@
# Example API Keys Configuration File
#
# USAGE:
# 1. Copy this file: cp .api_keys.example .api_keys
# 2. Add your actual API keys to .api_keys
# 3. Source in your run.sh: source problems/.api_keys
# 4. Add .api_keys to .gitignore (already done)
#
# SECURITY:
# - NEVER commit the actual .api_keys file to git
# - This .example file shows the format only
# - Keep your keys secret!

# OpenAI / Azure OpenAI
export API_KEY="sk-your-openai-api-key-here"
export API_BASE="https://api.openai.com/v1"

# Google Gemini
# export API_KEY="your-google-api-key-here"
# export API_BASE="https://generativelanguage.googleapis.com/v1beta"

# Azure OpenAI (custom endpoint)
# export API_KEY="your-azure-key-here"
# export API_BASE="https://your-resource.openai.azure.com/openai/deployments/your-deployment"

# Anthropic Claude
# export API_KEY="sk-ant-your-anthropic-key-here"
# export API_BASE="https://api.anthropic.com/v1"

# Custom / Self-hosted
# export API_KEY="your-custom-key"
# export API_BASE="http://localhost:8080/v1"