This repository contains the experimentation and evaluation framework for the paper *Pricing Intelligence: Rethinking IS Engineering in Volatile SaaS Environments*. It includes scripts to generate instantiated questions from templates, run experiments against a Pricing Intelligence agent (HARVEY), and evaluate the results.
- Python 3.x
- HARVEY (Pricing Intelligence Interpretation Process) running locally.
- Clone this repository.
- Install the required Python dependencies:

  ```
  pip install -r requirements.txt
  ```
The typical workflow involves three main steps: Generation, Experimentation, and Evaluation.
Generate instantiated questions from templates and pricing data.
Script: `Experimentation/generate_instantiated_questions.py`

Usage:

Run from the project root:

```
python3 Experimentation/generate_instantiated_questions.py \
  --templates Experimentation/pi_task_templates.json \
  --spec Experimentation/instantiation_spec.json \
  --output instantiated_questions.json
```

Arguments:

- `--templates`: Path to the JSON file containing the question templates.
- `--spec`: Path to the JSON file containing the instantiation specifications (placeholders, overrides).
- `--output`: Path where the generated questions JSON will be saved. We recommend saving it to `instantiated_questions.json` in the root directory so `run_experiment.py` can find it easily.
- `--expected` (optional): Path to an existing file to verify equality against.
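For intuition, instantiation is essentially placeholder substitution: each template contains tokens that are replaced with concrete values from the spec. The sketch below is an illustration only; the field names (`template_id`, `text`), the `{placeholder}` syntax, and the example values are assumptions, not the actual schema of `pi_task_templates.json` or `instantiation_spec.json`.

```python
import json
import re

def instantiate(template_text: str, values: dict) -> str:
    """Replace {placeholder} tokens in a template with concrete values from the spec."""
    return re.sub(r"\{(\w+)\}", lambda m: str(values.get(m.group(1), m.group(0))), template_text)

# Hypothetical template and spec entry; the real files may use different field names.
template = {"template_id": "T01", "text": "What is the cheapest plan of {saas_name} that includes {feature}?"}
values = {"saas_name": "Zoom", "feature": "cloud recording"}

question = {"template_id": template["template_id"], "question": instantiate(template["text"], values)}
print(json.dumps(question, indent=2))
```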
Run the generated questions against the HARVEY agent.
Important: You must first launch the HARVEY project. Follow the instructions in the Pricing-Intelligence-Interpretation-Process repository to start the server. By default, it is expected to be running at http://localhost:8086/chat.
Script: `Experimentation/run_experiment.py`

Usage:

Run from the project root (ensure `instantiated_questions.json` is present in the root):

```
python3 Experimentation/run_experiment.py
```

Configuration:

You may need to modify `Experimentation/run_experiment.py` to adjust its configuration:

- `API_URL`: The URL of the HARVEY agent (default: `http://localhost:8086/chat`).
- `INPUT_FILE`: The input file containing the questions (default: `instantiated_questions.json`). Ensure this matches the output from the Generation step.
- `OUTPUT_FILE`: The file where results will be saved (default: `experiment_results_gpt_5_nano.json`).
Note: The script supports checkpointing. If interrupted, it will resume from where it left off, skipping already processed questions found in the output file.
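Conceptually, the experiment loop posts each question to the chat endpoint and persists the results after every call so that an interrupted run can resume. The sketch below is illustrative only: the request payload key (`message`), the question `id` field, and the use of the `requests` library are assumptions, not the actual contract of `run_experiment.py` or HARVEY's API.

```python
import json
import os

import requests  # third-party HTTP client, assumed here for illustration

API_URL = "http://localhost:8086/chat"
INPUT_FILE = "instantiated_questions.json"
OUTPUT_FILE = "experiment_results_gpt_5_nano.json"

# Load previously saved results so an interrupted run can resume (checkpointing).
results = []
if os.path.exists(OUTPUT_FILE):
    with open(OUTPUT_FILE) as f:
        results = json.load(f)
done_ids = {r["id"] for r in results}

with open(INPUT_FILE) as f:
    questions = json.load(f)

for q in questions:
    if q["id"] in done_ids:
        continue  # already processed in a previous run
    # The payload key ("message") is an assumption; check HARVEY's API for the real contract.
    response = requests.post(API_URL, json={"message": q["question"]}, timeout=120)
    results.append({"id": q["id"], "question": q["question"], "answer": response.json()})
    with open(OUTPUT_FILE, "w") as f:
        json.dump(results, f, indent=2)  # checkpoint after every question
```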
Analyze the experiment results and generate a report.
Script: `Evaluation/generate_evaluation_report.py`

Usage:

```
python3 Evaluation/generate_evaluation_report.py \
  --input Experimentation/experiment_results_gpt_5_nano.json \
  --output_dir Evaluation/reports \
  --outfile evaluation_report.json
```

Arguments:

- `--input`: Path to the experiment results JSON file.
- `--output_dir`: Directory where the evaluation report will be saved.
- `--outfile`: Name of the output report file.
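Conceptually, the report scores HARVEY's proposed strategy against each task's ideal strategy using precision and recall. The exact matching logic lives in `generate_evaluation_report.py`; the sketch below shows a minimal set-based version under the assumption that plans can be compared as sets of step identifiers.

```python
def precision_recall(predicted_steps: set, ideal_steps: set) -> tuple:
    """Set-based precision/recall of a predicted plan against the ground-truth plan."""
    if not predicted_steps:
        return 0.0, 0.0
    true_positives = len(predicted_steps & ideal_steps)
    precision = true_positives / len(predicted_steps)
    recall = true_positives / len(ideal_steps) if ideal_steps else 0.0
    return precision, recall

# Hypothetical example: HARVEY proposed two steps, one of which matches the ideal strategy.
print(precision_recall({"get_plans", "compare_prices"}, {"get_plans", "filter_features"}))
# -> (0.5, 0.5)
```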
The repository structure reflects the evaluation workflow described in the PIIP paper:
- `Raw PI tasks/`: Contains the initial pool of templates.
  - `human_tasks.md`: Raw templates elicited from human participants.
  - `ai_gemini3Pro_tasks.md` & `ai_qwen3Max_tasks.md`: Raw templates generated by external LLMs.
- `Classified PI tasks/`: Represents the curated and classified set of templates.
  - `human_tasks.md` & `ai_tasks.md`: Normalized templates tagged as "Answerable" ($AT$) or "Non-answerable" ($UT$).
- `Experimentation/`: Artifacts and scripts for generating the concrete dataset.
  - `pi_task_templates.json`: Contains the Answerable Templates ($AT$) with their ideal strategies (ground-truth plans).
  - `instantiation_spec.json`: Specifications for instantiating templates with real values.
  - `generate_instantiated_questions.py`: Script to generate the 150 concrete PI tasks ($Q$).
  - `run_experiment.py`: Script to execute the tasks against HARVEY ($RQ_1$).
- `Evaluation/`: Tools to analyze HARVEY's performance.
  - `generate_evaluation_report.py`: Computes precision and recall metrics for $RQ_2$ and $RQ_3$.
  - `statistical_evaluation.py` & `visualization.ipynb`: Statistical analysis and visualization tools.
- `data/`: Contains the real SaaS pricing data (from the TSC'25 dataset) used to instantiate the templates.
- URLs: To change the target API URL, edit the `API_URL` constant in `Experimentation/run_experiment.py`.
- Input/Output Files: Default file paths are defined as constants (`INPUT_FILE`, `OUTPUT_FILE`) in `Experimentation/run_experiment.py`. You can modify these or update the script to accept command-line arguments (see the sketch below).
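If you prefer command-line arguments over editing constants, a minimal `argparse` pattern could look like the following. Note that these flag names are suggestions and are not currently supported by the script.

```python
import argparse

# Illustrative only: these flags are not part of the current run_experiment.py.
parser = argparse.ArgumentParser(description="Run instantiated PI tasks against HARVEY.")
parser.add_argument("--api-url", default="http://localhost:8086/chat", help="HARVEY chat endpoint")
parser.add_argument("--input", default="instantiated_questions.json", help="Questions file")
parser.add_argument("--output", default="experiment_results_gpt_5_nano.json", help="Results file")
args = parser.parse_args()

API_URL, INPUT_FILE, OUTPUT_FILE = args.api_url, args.input, args.output
```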