Add Makefile and benchmark automation framework for PromptCompEval #1
Overview
This PR implements a complete benchmark automation framework for PromptCompEval, providing Makefile boilerplate and scripts to easily run benchmarks for different prompt compilation techniques such as byLLM.
What's New
🛠️ Makefile Automation
Added a comprehensive Makefile with 15+ commands for common tasks:
- make setup, make install, make install-dev
- make test, make coverage
- make benchmark, make benchmark-fast, make benchmark-full, make benchmark-compare
- make lint, make format, make type-check
- make clean, make results

Run make help to see all available commands.

📊 Benchmark Scripts
- run_benchmarks.py: Main benchmark runner with CLI arguments for quick, standard, or full benchmark suites
- compare_techniques.py: Automated comparison tool that analyzes and displays results across different techniques
- setup.sh: One-command environment setup script

🔧 Core Framework
Built a modular evaluation framework in src/promptcompeval/ with:
- OriginalTechnique: Baseline without compilation
- ByLLMTechnique: LLM-based prompt compilation
- OptimizedTechnique: Hand-optimized prompts
- CompressedTechnique: Token-compressed prompts

📋 Configuration & Data
- benchmarks/configs/default.yaml
- .gitkeep files and .gitignore rules

✅ Testing
- pyproject.toml
- make coverage

📚 Documentation
- examples/basic_usage.py demonstrating framework usage

📦 Packaging
- requirements.txt with all necessary dependencies
- setup.py and pyproject.toml for package installation

Quick Start
Example Output
Project Structure
Benefits
- make benchmark

This framework provides everything needed to evaluate and compare different prompt compilation techniques in a systematic, reproducible way.
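
As a rough illustration of how the technique classes listed under Core Framework could plug into a comparison, here is a minimal Python sketch. The `Technique` base class, its `compile` method, the `compare` helper, and the character-count metric are all assumptions made for illustration; the actual interfaces in src/promptcompeval/ may differ. The whitespace-collapsing `CompressedTechnique` below is only a stand-in for real token compression.

```python
from abc import ABC, abstractmethod


class Technique(ABC):
    """Hypothetical base class; the real interface may look different."""

    name: str

    @abstractmethod
    def compile(self, prompt: str) -> str:
        """Return the prompt text that would be sent to the model."""


class OriginalTechnique(Technique):
    """Baseline: the prompt passes through without compilation."""

    name = "original"

    def compile(self, prompt: str) -> str:
        return prompt


class CompressedTechnique(Technique):
    """Stand-in for token compression: only collapses runs of whitespace."""

    name = "compressed"

    def compile(self, prompt: str) -> str:
        return " ".join(prompt.split())


def compare(techniques, prompts):
    """Average compiled prompt length in characters, keyed by technique name.

    Character count is an assumed, crude proxy for token count.
    """
    results = {}
    for technique in techniques:
        lengths = [len(technique.compile(p)) for p in prompts]
        results[technique.name] = sum(lengths) / len(lengths)
    return results


if __name__ == "__main__":
    sample_prompts = ["Summarize   the  text.", "Translate\n\nthis sentence."]
    for name, avg in compare(
        [OriginalTechnique(), CompressedTechnique()], sample_prompts
    ).items():
        print(f"{name}: {avg:.1f} chars on average")
```

Calling `compare` with a list of technique instances and a list of prompts returns a dict mapping each technique name to its average compiled length, so a new technique only needs a `name` and a `compile` method to join the comparison.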