Description
Ticket Contents
🧠 Project 1: Query Method Evaluation System on Vector Databases for Grievance Data
📘 Overview
The Query Method Evaluation System is a robust framework designed to identify and recommend the most effective query methods for various categories within a grievance management system.
It leverages RAGAS metrics — Context Precision, Context Recall, and Response Relevancy — to comprehensively evaluate different search techniques on a large dataset of structured grievance information.
🎯 Project Scope and Impact
The project involved:
- Structuring 15,000 Grievance Records: Transforming raw grievance data into a structured, query-ready format, a step crucial for consistent and accurate retrieval.
- Synthetic Query Generation: Creating 15,000 synthetic queries (one per grievance) to expand the dataset for thorough evaluation and benchmarking.
- Evaluation and Benchmarking: Testing and comparing multiple query methods using RAGAS metrics on the structured data and synthetic queries.
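For illustration, per-record synthetic query generation can be a single LLM call. The sketch below is a hypothetical example using the openai Node client; the model name and prompt are placeholders, not the project's actual implementation:

```js
// Hypothetical sketch: generate one user-style query per grievance record.
const OpenAI = require('openai');
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generateQuery(grievance) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // placeholder model name
    messages: [{
      role: 'user',
      content: `Write one short question a citizen might ask about this grievance:\n${grievance.description}`,
    }],
  });
  return completion.choices[0].message.content.trim();
}
```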
⚙️ Features
🔍 9 Query Methods Evaluated
- Semantic Search
- Hybrid Search
- Keyword Search (BM25)
- Generative Search (RAG)
- Vector Similarity Search
- Multiple Target Vectors
- Reranking (Hybrid + Rerank)
- Aggregate Data
- Filtered Search
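Several of these methods map directly onto the JS weaviate-client query API. A rough sketch of three of them, assuming a Grievance collection (the collection name and query text are illustrative, not the project's actual schema):

```js
const weaviate = require('weaviate-client');

async function demoQueries() {
  // Connect to Weaviate Cloud using the credentials from .env (see Setup below)
  const client = await weaviate.connectToWeaviateCloud(process.env.WEAVIATE_URL, {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY),
    headers: { 'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY },
  });
  const grievances = client.collections.get('Grievance'); // assumed collection name

  // Semantic (vector) search
  const semantic = await grievances.query.nearText('calls keep dropping', { limit: 5 });
  // Keyword search (BM25)
  const keyword = await grievances.query.bm25('calls keep dropping', { limit: 5 });
  // Hybrid search (BM25 + vector, weighted by alpha)
  const hybrid = await grievances.query.hybrid('calls keep dropping', { alpha: 0.5, limit: 5 });

  console.log(semantic.objects.length, keyword.objects.length, hybrid.objects.length);
  await client.close();
}
```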
📊 3 Core RAGAS Metrics
| Metric | Meaning | Good Score |
|---|---|---|
| Context Precision | Proportion of relevant information within the retrieved contexts (less noise) | ≥ 0.7 |
| Context Recall | Proportion of the relevant information that was actually retrieved (fewer misses) | ≥ 0.8 |
| Response Relevancy | How well the generated response addresses the query | ≥ 0.7 |
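For intuition (RAGAS's exact formulas carry more detail), Response Relevancy is typically derived from embedding similarity between the original query and questions regenerated from the response; cosine similarity is the core building block:

```js
// Cosine similarity between two embedding vectors; relevancy-style metrics
// average this value over several generated-question embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```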
🧩 Additional Features
- Category-wise Analysis – Tailored recommendations for best methods by grievance category.
- Comprehensive Reporting – Exports detailed CSV and JSON reports.
- Scalable Evaluation – Efficiently handles large datasets for real-world deployment.
🛠️ Setup Instructions
1️⃣ Environment Setup
Create a .env file in your project root with:
WEAVIATE_URL=https://your-cluster.weaviate.network
WEAVIATE_API_KEY=your-weaviate-api-key
OPENAI_API_KEY=your-openai-api-key
2️⃣ Install Dependencies
npm install weaviate-client dotenv node-fetch csv-writer
3️⃣ Data Preparation
Ensure your JSON data follows this format:
[ { "department_code": "DOTEL", "department_name": "Telecommunications", "category": "Mobile Related", "sub_category_1": "Call Drop", "description": "Detailed description of the issue...", "user_queries": [ "My phone calls keep getting dropped. What can I do?", "Why does my cell phone keep disconnecting during calls?" ] } ]
Run the data preparation script:
node data_preparation.js
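A hypothetical sketch of what a script like data_preparation.js might do, assuming the weaviate-client v3 API, an input file named grievances.json, and a Grievance collection (all illustrative assumptions):

```js
require('dotenv').config();
const fs = require('fs');
const weaviate = require('weaviate-client');

async function main() {
  const client = await weaviate.connectToWeaviateCloud(process.env.WEAVIATE_URL, {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY),
    headers: { 'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY },
  });
  // Load the structured grievance records (format shown above)
  const records = JSON.parse(fs.readFileSync('grievances.json', 'utf8'));
  // Batch-insert into the assumed 'Grievance' collection
  const grievances = client.collections.get('Grievance');
  await grievances.data.insertMany(records);
  await client.close();
}
main();
```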
4️⃣ Run Evaluation
node query_evaluation.js
💡 Query Methods and Best Use Cases
📂 Output Files
🧾 evaluation_results.csv
Contains:
- Department, Category, Sub-category, User Query
- Scores for each method
- RAGAS metric values (Precision, Recall, Relevancy)
🗂️ recommendations.json
Category-wise recommendations for best-performing methods.
Example:
{ "Mobile Related > Call Drop": { "bestMethod": "hybrid", "bestScore": 0.856, "methodPerformance": { "hybrid": { "averageScore": 0.856, "contextPrecision": 0.89, "contextRecall": 0.82, "responseRelevancy": 0.86 } } } }
💻 Console Output
Real-time progress updates and summarized evaluation metrics during execution.
⚡ Evaluation Process Phases
🧩 Phase 1: Initialization
Load .env variables
Connect to Weaviate
Load evaluation JSON
Create evaluator instance
🔁 Phase 2: Evaluate All Methods
Loop through all categories and queries
Run all 9 query methods
Compute 3 RAGAS metrics
Log and store results
📊 Phase 3: Result Analysis + Export
Aggregate scores
Compute averages (P, R, RR, Score)
Identify best-performing method
Export to CSV and JSON
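A minimal sketch of Phase 3, assuming a flat results array and the csv-writer package from the install step (the field names are illustrative):

```js
const fs = require('fs');
const { createObjectCsvWriter } = require('csv-writer');

// results: one row per (category, method) with the three RAGAS metric values.
async function exportResults(results) {
  // Average the three metrics into a single score per row
  const rows = results.map((r) => ({
    ...r,
    score: (r.contextPrecision + r.contextRecall + r.responseRelevancy) / 3,
  }));

  // CSV export via csv-writer
  const csvWriter = createObjectCsvWriter({
    path: 'evaluation_results.csv',
    header: Object.keys(rows[0]).map((id) => ({ id, title: id })),
  });
  await csvWriter.writeRecords(rows);

  // Pick the best-scoring method per category for recommendations.json
  const best = {};
  for (const row of rows) {
    if (!best[row.category] || row.score > best[row.category].bestScore) {
      best[row.category] = { bestMethod: row.method, bestScore: row.score };
    }
  }
  fs.writeFileSync('recommendations.json', JSON.stringify(best, null, 2));
}
```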
🏛️ BBMP Act Version History - GitBook Documentation
📖 Introduction
Welcome to the documentation for the BBMP Act Version History project.
This project addresses a crucial issue in Indian governance — the lack of version control and historical tracking for legal acts and amendments.
Legal documents like the Bruhat Bengaluru Mahanagara Palike (BBMP) Act undergo numerous amendments, but these changes are rarely consolidated or easily accessible.
As a result, it becomes difficult for citizens, legal professionals, and researchers to understand the evolution of laws and track amendments over time.
🎯 Objective
The project provides a user-friendly interface to:
- View different versions of the BBMP Act.
- Compare versions to see insertions, deletions, and modifications.
- Track amendment history via a version-controlled timeline.
This documentation explains the complete project architecture — from data extraction and processing to the frontend UI.
🧩 Data Extraction and Processing
📂 Raw Data Source
- The initial data is stored as JSON files, each representing one or more chapters of the Act.
- Location: bbmp_data_extractor/
- Example files: chapter1.json, chapter2and3.json, etc.
- Each JSON file contains an array of objects, where each object represents a page or section with a markdown field containing raw text.
📝 Text Extraction
To prepare data for processing, all markdown content is combined into a single string per chapter using the getChapterContent function.
getChapterContent(filePath)
Purpose:
Reads a chapter JSON file and merges all markdown text into a single string.
Process:
- Reads the file content from the given filePath.
- Parses the JSON data.
- Maps over each page to extract its markdown.
- Joins all markdown content into one continuous text block.
This produces a clean text representation of each chapter, ready for structuring.
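A minimal sketch of the function, assuming the page format shown earlier ({ "markdown": "..." } objects):

```js
const fs = require('fs');

// Read a chapter JSON file and merge all page markdown into one string.
function getChapterContent(filePath) {
  const pages = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  return pages.map((page) => page.markdown).join('\n\n');
}
```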
⚙️ Conversion to Akoma Ntoso
Once the raw text is extracted, it is converted into a structured, hierarchical format using the Akoma Ntoso standard — an XML-based schema for legal documents.
🔧 The convertToAkomaNtoso Function
Found in the extraction scripts (extractor.js, extractor_gemini.js), this function is the core of the structuring process.
It sends the raw text to an AI model (e.g., OpenAI GPT or Google Gemini) with a prompt to produce a JSON representation adhering to Akoma Ntoso principles.
Key Features
- AI-Powered Parsing: Uses LLMs to identify and structure legal entities like chapters, sections, and clauses.
- Hierarchical Organization: Outputs a nested JSON that reflects the Act's legal hierarchy.
- Accuracy and Completeness: The AI prompt enforces full-text coverage without alteration.
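A hypothetical sketch of such a function using the openai Node client (the model name, prompt wording, and response handling are illustrative assumptions; extractor_gemini.js would use the Gemini SDK analogously):

```js
const OpenAI = require('openai');
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function convertToAkomaNtoso(chapterText) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o', // placeholder model name
    response_format: { type: 'json_object' }, // request JSON output
    messages: [
      {
        role: 'system',
        content:
          'Convert the following legal text into nested Akoma Ntoso-style JSON ' +
          '(chapters > sections > clauses). Preserve the full text without alteration.',
      },
      { role: 'user', content: chapterText },
    ],
  });
  return JSON.parse(completion.choices[0].message.content);
}
```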
🧱 Akoma Ntoso JSON Structure
The structured JSON is saved in bbmp_data_extractor/akomo-ntoso/.
Example Structure:
{
"akomaNtoso": {
"act": {
"meta": { ... },
"preamble": { ... },
"body": {
"chapter": {
"@eId": "ch_I",
"num": "CHAPTER I",
"heading": "PRELIMINARY",
"section": [
{
"@eId": "sec_3",
"num": "3.",
"heading": "Definitions.",
"content": [ ... ]
}
]
}
}
}
}
}
Goals & Mid-Point Milestone
✅ Goals
- Developed a version-controlled system for the BBMP Act → Implemented structured tracking of amendments using the Akoma Ntoso format.
- Automated data extraction and conversion pipeline → Extracted raw JSON chapters and converted them into hierarchical, machine-readable Akoma Ntoso JSON.
- Built an interactive Next.js frontend → Designed a user-friendly interface to view, compare, and track Act versions with diff highlighting.
- Implemented AI-powered parsing and diffing logic → Leveraged GPT/Gemini models for structured legal text parsing and accurate change detection.
- Goals achieved by the mid-point milestone → The complete pipeline from raw data → structured format → visual diff interface was built and validated.
Setup/Installation
BBMP Act Version History — README
Welcome to the BBMP Act Version History project. This section provides complete, GitHub-ready setup and installation instructions.
🏛️ BBMP Act Version History
A version-controlled system for the Bruhat Bengaluru Mahanagara Palike (BBMP) Act that lets you extract, structure, diff, and view amendments using an Akoma Ntoso JSON representation and a Next.js frontend.
✅ Features (completed / achieved)
- ✅ Version-controlled representation of the BBMP Act using Akoma Ntoso-like JSON.
- ✅ Automated extraction pipeline from raw JSON markdown files.
- ✅ AI-powered parsing (OpenAI / Gemini) to convert raw text into structured legal JSON.
- ✅ Next.js frontend to view, compare, diff and export versions.
- ✅ Diffing modes: Cumulative and Incremental.
- ✅ Exporting: .txt and .pdf.
- ✅ Category-wise change summaries and timeline UI.
⚙️ Setup & Installation
Follow the steps below to set up and run the project locally.
1. Prerequisites
Make sure you have the following installed:
- Node.js v18+
- npm or yarn
- Git
- (Optional) OpenAI API Key or Gemini API Key for AI-based structuring
2. Clone the repository
git clone https://github.com/your-username/bbmp-act-version-history.git
cd bbmp-act-version-history
3. Environment variables
Create a .env file at the project root and add the necessary keys:
# For OpenAI (optional if using Gemini)
OPENAI_API_KEY=your-openai-api-key
# For Gemini (optional if using OpenAI)
GEMINI_API_KEY=your-gemini-api-key
# Example Weaviate or other vector DB settings if used by other modules
WEAVIATE_URL=https://your-cluster.weaviate.network
WEAVIATE_API_KEY=your-weaviate-api-key
Note: Fill only the keys you will use. Keep .env out of version control (.gitignore).
4. Install dependencies
Using npm:
npm install
Or using yarn:
yarn install
5. Project layout (important folders)
.
├─ bbmp_data_extractor/
│ ├─ chapter1.json
│ ├─ chapter2and3.json
│ └─ akomo-ntoso/ # generated Akoma Ntoso JSON output
├─ src/
│ └─ app/
│ └─ page.tsx # main Next.js UI
├─ extractor.js # extraction script (OpenAI)
├─ extractor_gemini.js # extraction script (Gemini)
├─ convertToAkomaNtoso.js # conversion helper (if separate)
├─ jsonToHtml.js # debug / view converter
├─ package.json
└─ README.md
6. Data preparation
Place your raw chapter JSON files in bbmp_data_extractor/. Each file should be an array of page objects with a markdown field:
[
{ "markdown": "Section or page content here..." },
{ "markdown": "Next page content ..." }
]
Run the extraction (OpenAI):
node extractor.js
Or for Gemini:
node extractor_gemini.js
These scripts will:
- Read the JSON pages,
- Concatenate the markdown per chapter (via getChapterContent),
- Send the text to the model to parse into a structured Akoma Ntoso-style JSON,
- Save outputs to bbmp_data_extractor/akomo-ntoso/.
7. (Optional) Convert JSON to HTML for debugging
node jsonToHtml.js
This generates simple HTML views from the Akoma Ntoso JSON to inspect structure and verify parsing.
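The core idea is a walk over the structured JSON that emits nested headings; a hypothetical fragment (the actual script may differ):

```js
// Hypothetical fragment: render a chapter and its sections as nested headings.
function chapterToHtml(chapter) {
  const sections = [].concat(chapter.section || []); // handle one or many sections
  return (
    `<h2>${chapter.num}: ${chapter.heading}</h2>\n` +
    sections.map((s) => `<h3>${s.num} ${s.heading}</h3>`).join('\n')
  );
}
```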
8. Run the frontend (Next.js)
Start the dev server:
npm run dev
# or
yarn dev
Open the app:
http://localhost:3000
What you should be able to do:
- Browse chapters and versions
- Switch diff modes (Cumulative / Incremental)
- See insertions (<ins>) and deletions (<del>) highlighted
- Export the current view as .txt or print to PDF
- View the version timeline and change summaries
9. Build for production
npm run build
npm start
# or the yarn equivalents
🔧 Scripts (example package.json entries)
Add or confirm these scripts in your package.json:
{
"scripts": {
"dev": "next dev",
"build": "next build",
"start": "next start",
"extract:openai": "node extractor.js",
"extract:gemini": "node extractor_gemini.js",
"convert:html": "node jsonToHtml.js"
}
}
🧠 Implementation notes
- getChapterContent(filePath): Reads a chapter JSON and concatenates all markdown fields into a single text string per chapter.
- convertToAkomaNtoso: Sends chapter text to the selected LLM with a prompt instructing it to output a nested JSON reflecting chapters → sections → subsections → clauses (an Akoma Ntoso-like structure). Ensure the prompt enforces completeness and faithful representation.
- Diffing: applyAmendmentsVI, applyAmend1VII, etc., wrap deleted text in <del> and inserted text in <ins>; computeVersions builds cumulative or incremental HTML based on the diff mode.
- summarizeDiff: Counts <ins> and <del> tags and lists modified sections for quick UI summaries.
✅ Verification checklist
- Raw chapter JSONs placed in bbmp_data_extractor/
- Extraction script runs and produces Akoma Ntoso JSON in bbmp_data_extractor/akomo-ntoso/
- jsonToHtml.js generates viewable HTML for debugging
- Next.js app runs at http://localhost:3000
- Diffing, timeline, and exports function in the UI
📝 Example extractor.js usage snippet
Make sure your extractor.js has a function like getChapterContent and calls your LLM client for convertToAkomaNtoso. Below is a minimal example of how you might wire it together:
const fs = require('fs');
const { getChapterContent, convertToAkomaNtoso } = require('./lib/extractorHelpers');

async function run() {
  const chapterText = getChapterContent('bbmp_data_extractor/chapter1.json');
  const akomaJson = await convertToAkomaNtoso(chapterText);
  // Write the structured output alongside the other generated chapters
  fs.writeFileSync(
    'bbmp_data_extractor/akomo-ntoso/chapter1.json',
    JSON.stringify(akomaJson, null, 2)
  );
}

run();
📚 References & standards
- Akoma Ntoso: an XML standard for parliamentary, legislative, and judiciary documents (this project uses an Akoma Ntoso-inspired JSON representation).
- Next.js + TailwindCSS — UI and styling stack.
⚖️ License & Contributing
Contributions are welcome. Please open issues and PRs. Add a CONTRIBUTING.md and CODE_OF_CONDUCT.md as needed for the project.
✉️ Contact
If you need help or want to collaborate:
- Create an issue in this repo
- Or contact the maintainer at
[email protected]
Expected Outcome
No response
Acceptance Criteria
No response
Implementation Details
https://gauravs-organization-13.gitbook.io/untitled/dmp-final-evaluation
Mockups/Wireframes
No response
Product Name
AI for Legal Justice (OpenNyAI)
Organisation Name
If Me
Domain
Education
Tech Skills Needed
Artificial Intelligence
Mentor(s)
Prakash (OpenNyAI)
Category
Machine Learning