Add Makefile and benchmark automation framework for PromptCompEval #1

Copilot · 2025-10-02T05:33:22Z

Overview

This PR implements a complete benchmark automation framework for PromptCompEval, providing Makefile boilerplate and scripts to easily run benchmarks for different prompt compilation techniques such as byLLM.

What's New

🛠️ Makefile Automation

Added a comprehensive Makefile with 15+ commands for common tasks:

Setup: make setup, make install, make install-dev
Testing: make test, make coverage
Benchmarking: make benchmark, make benchmark-fast, make benchmark-full, make benchmark-compare
Code Quality: make lint, make format, make type-check
Utilities: make clean, make results

Run make help to see all available commands.

📊 Benchmark Scripts

run_benchmarks.py: Main benchmark runner with CLI arguments for quick, standard, or full benchmark suites
compare_techniques.py: Automated comparison tool that analyzes and displays results across different techniques
setup.sh: One-command environment setup script

🔧 Core Framework

Built a modular evaluation framework in src/promptcompeval/ with:

Evaluator: Core evaluation logic for running benchmarks
Techniques: Four prompt compilation implementations:
- OriginalTechnique - Baseline without compilation
- ByLLMTechnique - LLM-based prompt compilation
- OptimizedTechnique - Hand-optimized prompts
- CompressedTechnique - Token-compressed prompts

📋 Configuration & Data

YAML configuration template in benchmarks/configs/default.yaml
Sample dataset with 5 task categories (translation, summarization, entity extraction, sentiment analysis, QA)
Proper directory structure with .gitkeep files and .gitignore rules

✅ Testing

Unit tests for all techniques (9 tests, all passing)
Pytest configuration in pyproject.toml
Coverage support via make coverage

📚 Documentation

README.md: Comprehensive documentation with usage instructions, project structure, and examples
QUICKSTART.md: Quick reference guide for getting started
CONTRIBUTING.md: Developer guidelines for contributing
LICENSE: MIT License
Example code: examples/basic_usage.py demonstrating framework usage

📦 Packaging

requirements.txt with all necessary dependencies
setup.py and pyproject.toml for package installation
Proper Python project structure following best practices

Quick Start

# Setup environment
make setup
source venv/bin/activate

# Run benchmarks
make benchmark-fast

# Compare techniques
make benchmark-compare

# View results
make results

Example Output

============================================================
BENCHMARK SUMMARY
============================================================

original:
  Benchmarks run: 2
  Average accuracy: 85.50%
  Average latency: 245.00ms
  Total cost: $0.0024

byLLM:
  Benchmarks run: 2
  Average accuracy: 88.25%
  Average latency: 312.00ms
  Total cost: $0.0031
...

Project Structure

PromptCompEval/
├── Makefile              # All automation commands
├── requirements.txt      # Python dependencies
├── src/promptcompeval/   # Core framework
├── tests/                # Unit tests
├── benchmarks/
│   ├── configs/          # Configurations
│   ├── data/             # Datasets
│   └── results/          # Results (auto-generated)
├── scripts/              # Automation scripts
├── examples/             # Usage examples
└── docs/                 # Documentation files

Benefits

✅ Easy to use: Single command to run benchmarks (make benchmark)
✅ Extensible: Modular design makes it easy to add new techniques and datasets
✅ Well-documented: Comprehensive guides for users and contributors
✅ Tested: Full test coverage with pytest
✅ Professional: Follows Python packaging best practices

This framework provides everything needed to evaluate and compare different prompt compilation techniques in a systematic, reproducible way.

Original prompt

This is a project where we use real prompt engineered applications, convert them to different prompt compilation techniques such as byLLM, and evaluate different metrics. This project is a evaluation framework, so grate to have a makefile boiler plate or scripts in place to just run these benchmarks

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

CLAassistant · 2025-10-02T05:33:28Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

Co-authored-by: Jayanaka-98 <[email protected]>

Initial plan

9fca1a0

Copilot AI assigned Copilot and Jayanaka-98 Oct 2, 2025

Copilot started work on behalf of Jayanaka-98 October 2, 2025 05:33 View session

Copilot AI and others added 2 commits October 2, 2025 05:44

Add Makefile, scripts, and framework structure for benchmark automation

60b3eb8

Co-authored-by: Jayanaka-98 <[email protected]>

Add QUICKSTART guide, LICENSE, and improve documentation

cf1a2e5

Co-authored-by: Jayanaka-98 <[email protected]>

Copilot AI requested a review from Jayanaka-98 October 2, 2025 05:48

Copilot finished work on behalf of Jayanaka-98 October 2, 2025 05:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Makefile and benchmark automation framework for PromptCompEval #1

Add Makefile and benchmark automation framework for PromptCompEval #1

Uh oh!

Copilot AI commented Oct 2, 2025 •

edited

Loading

Uh oh!

CLAassistant commented Oct 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add Makefile and benchmark automation framework for PromptCompEval #1

Are you sure you want to change the base?

Add Makefile and benchmark automation framework for PromptCompEval #1

Uh oh!

Conversation

Copilot AI commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

What's New

🛠️ Makefile Automation

📊 Benchmark Scripts

🔧 Core Framework

📋 Configuration & Data

✅ Testing

📚 Documentation

📦 Packaging

Quick Start

Example Output

Project Structure

Benefits

Uh oh!

CLAassistant commented Oct 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Oct 2, 2025 •

edited

Loading