[DMP 2025]: Gaurav Singh | OpenNyAI #732

@EuclidStellar

Description

Ticket Contents

🧠 Project 1: Query Method Evaluation System on Vector Databases for Grievance Data

📘 Overview

The Query Method Evaluation System is a robust framework designed to identify and recommend the most effective query methods for various categories within a grievance management system.
It leverages RAGAS metrics (Context Precision, Context Recall, and Response Relevancy) to comprehensively evaluate different search techniques on a large dataset of structured grievance information.


🎯 Project Scope and Impact

The project involved:

  1. Structuring 15,000 grievance records
    Transforming raw grievance data into a structured, query-ready format — crucial for consistent and accurate retrieval.

  2. Synthetic Query Generation
    Creating 15,000 synthetic queries (one per grievance) to expand the dataset for thorough evaluation and benchmarking.

  3. Evaluation and Benchmarking
    Testing and comparing multiple query methods using RAGAS metrics on structured data and synthetic queries.


⚙️ Features

🔍 9 Query Methods Evaluated

  1. Semantic Search
  2. Hybrid Search
  3. Keyword Search (BM25)
  4. Generative Search (RAG)
  5. Vector Similarity Search
  6. Multiple Target Vectors
  7. Reranking (Hybrid + Rerank)
  8. Aggregate Data
  9. Filtered Search
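
As a rough illustration, here is a minimal sketch of how the first three methods map onto the weaviate-client v3 API. The Grievance collection name and the query strings are assumptions for illustration, not the project's actual schema:

require('dotenv').config();
const weaviate = require('weaviate-client').default;

async function demoQueries() {
  // Connect to Weaviate Cloud with the credentials from .env
  const client = await weaviate.connectToWeaviateCloud(process.env.WEAVIATE_URL, {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY),
    headers: { 'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY },
  });

  const grievances = client.collections.get('Grievance'); // assumed collection name

  // 1. Semantic Search: vector search over the query's meaning
  const semantic = await grievances.query.nearText('my internet is slow', { limit: 5 });

  // 2. Hybrid Search: blends vector and keyword scores (alpha weights the two)
  const hybrid = await grievances.query.hybrid('call drop', { alpha: 0.5, limit: 5 });

  // 3. Keyword Search: classic BM25 term matching
  const keyword = await grievances.query.bm25('error code 404', { limit: 5 });

  console.log(semantic.objects.length, hybrid.objects.length, keyword.objects.length);
  await client.close();
}

demoQueries().catch(console.error);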

📊 3 Core RAGAS Metrics

Metric | Meaning | Good Score
-- | -- | --
Context Precision | Proportion of relevant info within retrieved contexts (less noise) | 0.7
Context Recall | How many relevant pieces of info were retrieved (fewer misses) | 0.8
Response Relevancy | How well the response addresses the query | 0.7
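
RAGAS scores these with an LLM judge, but the intent of the two retrieval metrics can be shown with a simplified set-based sketch (an approximation for intuition only, not the RAGAS implementation; Response Relevancy needs embeddings and is omitted):

// Simplified, set-based approximations of the metric definitions above.

function contextPrecision(retrievedContexts, relevantContexts) {
  // Fraction of retrieved contexts that are actually relevant (less noise)
  const relevant = new Set(relevantContexts);
  const hits = retrievedContexts.filter((c) => relevant.has(c)).length;
  return retrievedContexts.length ? hits / retrievedContexts.length : 0;
}

function contextRecall(retrievedContexts, relevantContexts) {
  // Fraction of relevant contexts that were retrieved (fewer misses)
  const retrieved = new Set(retrievedContexts);
  const hits = relevantContexts.filter((c) => retrieved.has(c)).length;
  return relevantContexts.length ? hits / relevantContexts.length : 0;
}

// Example: 2 of 3 retrieved chunks are relevant; 2 of 4 relevant chunks were retrieved
console.log(contextPrecision(['a', 'b', 'x'], ['a', 'b', 'c', 'd'])); // 0.666...
console.log(contextRecall(['a', 'b', 'x'], ['a', 'b', 'c', 'd']));    // 0.5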

🧩 Additional Features

  • Category-wise Analysis – Tailored recommendations for best methods by grievance category.
  • Comprehensive Reporting – Exports detailed CSV and JSON reports.
  • Scalable Evaluation – Efficiently handles large datasets for real-world deployment.

🛠️ Setup Instructions

1️⃣ Environment Setup

Create a .env file in your project root with:

WEAVIATE_URL=https://your-cluster.weaviate.network
WEAVIATE_API_KEY=your-weaviate-api-key
OPENAI_API_KEY=your-openai-api-key

2️⃣ Install Dependencies

npm install weaviate-client dotenv node-fetch csv-writer

3️⃣ Data Preparation

Ensure your JSON data follows this format:

[ { "department_code": "DOTEL", "department_name": "Telecommunications", "category": "Mobile Related", "sub_category_1": "Call Drop", "description": "Detailed description of the issue...", "user_queries": [ "My phone calls keep getting dropped. What can I do?", "Why does my cell phone keep disconnecting during calls?" ] } ]

Run the data preparation script:

node data_preparation.js
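
For reference, a minimal sketch of what a data preparation script along these lines might do, assuming a Grievance collection and the weaviate-client v3 insertMany API; the input file name and property mapping are illustrative, not the actual data_preparation.js:

require('dotenv').config();
const fs = require('fs');
const weaviate = require('weaviate-client').default;

async function prepareData() {
  // Assumed input file containing records in the format shown above
  const records = JSON.parse(fs.readFileSync('grievances.json', 'utf-8'));

  const client = await weaviate.connectToWeaviateCloud(process.env.WEAVIATE_URL, {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY),
    headers: { 'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY },
  });

  const grievances = client.collections.get('Grievance'); // assumed collection name

  // insertMany batches the import of all mapped records in one call
  await grievances.data.insertMany(
    records.map((r) => ({
      departmentCode: r.department_code,
      departmentName: r.department_name,
      category: r.category,
      subCategory1: r.sub_category_1,
      description: r.description,
    }))
  );

  await client.close();
}

prepareData().catch(console.error);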

4️⃣ Run Evaluation

node query_evaluation.js

💡 Query Methods and Best Use Cases

Method | Best For | Use Case
-- | -- | --
Semantic Search (nearText) | Natural language queries, conceptual search | “My internet is slow”
Hybrid Search | Balanced semantic + keyword matching | Mixed search patterns
Keyword Search (BM25) | Exact term matching, technical queries | “error code 404”
Generative Search (RAG) | Complex queries requiring synthesis | “What should I do about dropped calls?”
Vector Similarity Search | Semantic similarity without preprocessing | Finding conceptually similar records
Multiple Target Vectors | Multi-aspect queries, complex categorization | “technical + emotional” content
Reranking (Hybrid + Rerank) | High precision requirements | Legal or medical queries
Aggregate Data | Statistical, summarization queries | “How many network issues were reported?”
Filtered Search | Category-specific, scoped queries | “Only Mobile Related issues”

📂 Output Files

🧾 evaluation_results.csv

Contains:

  • Department, Category, Sub-category, User Query

  • Scores for each method

  • RAGAS metric values (Precision, Recall, Relevancy)
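
With the csv-writer dependency installed earlier, the export can look roughly like this; the column ids and the shape of the results rows are assumptions for illustration:

const { createObjectCsvWriter } = require('csv-writer');

const csvWriter = createObjectCsvWriter({
  path: 'evaluation_results.csv',
  header: [
    { id: 'department', title: 'Department' },
    { id: 'category', title: 'Category' },
    { id: 'subCategory', title: 'Sub-category' },
    { id: 'userQuery', title: 'User Query' },
    { id: 'method', title: 'Method' },
    { id: 'contextPrecision', title: 'Context Precision' },
    { id: 'contextRecall', title: 'Context Recall' },
    { id: 'responseRelevancy', title: 'Response Relevancy' },
  ],
});

// results: one row per (query, method) pair produced by the evaluation run
async function exportResults(results) {
  await csvWriter.writeRecords(results);
}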


🗂️ recommendations.json

Category-wise recommendations for best-performing methods.

Example:

{ "Mobile Related > Call Drop": { "bestMethod": "hybrid", "bestScore": 0.856, "methodPerformance": { "hybrid": { "averageScore": 0.856, "contextPrecision": 0.89, "contextRecall": 0.82, "responseRelevancy": 0.86 } } } }

💻 Console Output

Real-time progress updates and summarized evaluation metrics during execution.


⚡ Evaluation Process Phases

🧩 Phase 1: Initialization

  • Load .env variables

  • Connect to Weaviate

  • Load evaluation JSON

  • Create evaluator instance

🔁 Phase 2: Evaluate All Methods

  • Loop through all categories and queries

  • Run all 9 query methods

  • Compute 3 RAGAS metrics

  • Log and store results

📊 Phase 3: Result Analysis + Export

  • Aggregate scores

  • Compute averages (P, R, RR, Score)

  • Identify best-performing method

  • Export to CSV and JSON
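
Put together, Phases 2 and 3 reduce to a loop like the skeleton below; runMethod and computeRagasMetrics are placeholder stubs standing in for the real implementations:

// Placeholder stubs for the real method runners and RAGAS scoring
async function runMethod(method, query) {
  return [`retrieved context for "${query}" via ${method}`];
}

async function computeRagasMetrics(query, contexts) {
  return { contextPrecision: 0, contextRecall: 0, responseRelevancy: 0 };
}

// Evaluate every query in every category with every method, one result row each
async function evaluateAll(categories, methods) {
  const results = [];
  for (const category of categories) {
    for (const query of category.queries) {
      for (const method of methods) {
        const contexts = await runMethod(method, query);
        const metrics = await computeRagasMetrics(query, contexts);
        results.push({ category: category.name, query, method, ...metrics });
      }
    }
  }
  return results;
}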

🏛️ Project 2: BBMP Act Version History - GitBook Documentation

📖 Introduction

Welcome to the documentation for the BBMP Act Version History project.
This project addresses a crucial issue in Indian governance — the lack of version control and historical tracking for legal acts and amendments.

Legal documents like the Bruhat Bengaluru Mahanagara Palike (BBMP) Act undergo numerous amendments, but these changes are rarely consolidated or easily accessible.
As a result, it becomes difficult for citizens, legal professionals, and researchers to understand the evolution of laws and track amendments over time.

🎯 Objective

The project provides a user-friendly interface to:

  • View different versions of the BBMP Act.
  • Compare versions to see insertions, deletions, and modifications.
  • Track amendment history via a version-controlled timeline.

This documentation explains the complete project architecture — from data extraction and processing to the frontend UI.


🧩 Data Extraction and Processing

📂 Raw Data Source

  • The initial data is stored as JSON files — each representing one or more chapters of the Act.
  • Location: bbmp_data_extractor/
  • Example files: chapter1.json, chapter2and3.json, etc.
  • Each JSON file contains an array of objects, where each object represents a page or section with a markdown field containing raw text.

📝 Text Extraction

To prepare data for processing, all markdown content is combined into a single string per chapter using the getChapterContent function.

getChapterContent(filePath)

Purpose:
Reads a chapter JSON file and merges all markdown text into a single string.

Process:

  1. Reads the file content from the given filePath.
  2. Parses the JSON data.
  3. Maps over each page to extract its markdown.
  4. Joins all markdown content into one continuous text block.

This produces a clean text representation of each chapter, ready for structuring.
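
A minimal implementation matching these four steps might look like this (a sketch; the actual script may differ in detail):

const fs = require('fs');

function getChapterContent(filePath) {
  const raw = fs.readFileSync(filePath, 'utf-8'); // 1. read the file
  const pages = JSON.parse(raw);                  // 2. parse the JSON array
  return pages
    .map((page) => page.markdown)                 // 3. extract each page's markdown
    .join('\n\n');                                // 4. join into one text block
}

// Example: merge chapter 1 into a single string
const chapterText = getChapterContent('bbmp_data_extractor/chapter1.json');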


⚙️ Conversion to Akoma Ntoso

Once the raw text is extracted, it is converted into a structured, hierarchical format using the Akoma Ntoso standard — an XML-based schema for legal documents.

🔧 The convertToAkomaNtoso Function

Found in the extraction scripts (extractor.js, extractor_gemini.js), this function is the core of the structuring process.

It sends the raw text to an AI model (e.g., OpenAI GPT or Google Gemini) with a prompt to produce a JSON representation adhering to Akoma Ntoso principles.

Key Features

  • AI-Powered Parsing:
    Uses LLMs to identify and structure legal entities like chapters, sections, and clauses.
  • Hierarchical Organization:
    Outputs a nested JSON that reflects the Act’s legal hierarchy.
  • Accuracy and Completeness:
    The AI prompt enforces full-text coverage without alteration.
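
A minimal sketch of the OpenAI variant, using the official openai Node SDK; the model name and prompt wording here are illustrative, not the exact prompt from extractor.js:

const OpenAI = require('openai');

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function convertToAkomaNtoso(chapterText) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o', // illustrative model choice
    response_format: { type: 'json_object' }, // ask for parseable JSON
    messages: [
      {
        role: 'system',
        content:
          'Convert the following legal text into nested Akoma Ntoso-style JSON ' +
          '(act -> chapter -> section -> content). Preserve the full text without alteration.',
      },
      { role: 'user', content: chapterText },
    ],
  });
  return JSON.parse(completion.choices[0].message.content);
}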

🧱 Akoma Ntoso JSON Structure

The structured JSON is saved in bbmp_data_extractor/akomo-ntoso/.

Example Structure:

{
  "akomaNtoso": {
    "act": {
      "meta": { ... },
      "preamble": { ... },
      "body": {
        "chapter": {
          "@eId": "ch_I",
          "num": "CHAPTER I",
          "heading": "PRELIMINARY",
          "section": [
            {
              "@eId": "sec_3",
              "num": "3.",
              "heading": "Definitions.",
              "content": [ ... ]
            }
          ]
        }
      }
    }
  }
}

Goals & Mid-Point Milestone

✅ Goals

  • Developed a version-controlled system for BBMP Act
    → Implemented structured tracking of amendments using Akoma Ntoso format.

  • Automated data extraction and conversion pipeline
    → Extracted raw JSON chapters and converted them into hierarchical, machine-readable Akoma Ntoso JSON.

  • Built an interactive Next.js frontend
    → Designed a user-friendly interface to view, compare, and track Act versions with diff highlighting.

  • Implemented AI-powered parsing and diffing logic
    → Leveraged GPT/Gemini models for structured legal text parsing and accurate change detection.

  • Goals Achieved By Mid-point Milestone
    → Complete pipeline from raw data → structured format → visual diff interface was successfully built and validated.

Setup/Installation

BBMP Act Version History — README

Welcome to the BBMP Act Version History project. This section is the project's README, a single, complete GitHub-ready Markdown document with setup and installation instructions.


🏛️ BBMP Act Version History

A version-controlled system for the Bruhat Bengaluru Mahanagara Palike (BBMP) Act that lets you extract, structure, diff, and view amendments using an Akoma Ntoso JSON representation and a Next.js frontend.


✅ Features (completed / achieved)

  • ✅ Version-controlled representation of the BBMP Act using Akoma Ntoso-like JSON.
  • ✅ Automated extraction pipeline from raw JSON markdown files.
  • ✅ AI-powered parsing (OpenAI / Gemini) to convert raw text into structured legal JSON.
  • ✅ Next.js frontend to view, compare, diff and export versions.
  • ✅ Diffing modes: Cumulative and Incremental.
  • ✅ Exporting: .txt and .pdf.
  • ✅ Category-wise change summaries and timeline UI.

⚙️ Setup & Installation

Follow the steps below to set up and run the project locally.

1. Prerequisites

Make sure you have the following installed:

  • Node.js v18+
  • npm or yarn
  • Git
  • (Optional) OpenAI API Key or Gemini API Key for AI-based structuring

2. Clone the repository

git clone https://github.com/your-username/bbmp-act-version-history.git
cd bbmp-act-version-history

3. Environment variables

Create a .env file at the project root and add the necessary keys:

# For OpenAI (optional if using Gemini)
OPENAI_API_KEY=your-openai-api-key

# For Gemini (optional if using OpenAI)
GEMINI_API_KEY=your-gemini-api-key

# Example Weaviate or other vector DB settings if used by other modules
WEAVIATE_URL=https://your-cluster.weaviate.network
WEAVIATE_API_KEY=your-weaviate-api-key

Note: Fill only the keys you will use. Keep .env out of version control (.gitignore).


4. Install dependencies

Using npm:

npm install

Or using yarn:

yarn install

5. Project layout (important folders)

.
├─ bbmp_data_extractor/
│  ├─ chapter1.json
│  ├─ chapter2and3.json
│  └─ akomo-ntoso/          # generated Akoma Ntoso JSON output
├─ src/
│  └─ app/
│     └─ page.tsx           # main Next.js UI
├─ extractor.js             # extraction script (OpenAI)
├─ extractor_gemini.js      # extraction script (Gemini)
├─ convertToAkomaNtoso.js   # conversion helper (if separate)
├─ jsonToHtml.js            # debug / view converter
├─ package.json
└─ README.md

6. Data preparation

Place your raw chapter JSON files in bbmp_data_extractor/. Each file should be an array of page objects with a markdown field:

[
  { "markdown": "Section or page content here..." },
  { "markdown": "Next page content ..." }
]

Run the extraction (OpenAI):

node extractor.js

Or for Gemini:

node extractor_gemini.js

These scripts will:

  • Read JSON pages,
  • Concatenate markdown per chapter (via getChapterContent),
  • Send text to the model to parse into a structured Akoma Ntoso-style JSON,
  • Save outputs to bbmp_data_extractor/akomo-ntoso/.

7. (Optional) Convert JSON to HTML for debugging

node jsonToHtml.js

This generates simple HTML views from the Akoma Ntoso JSON to inspect structure and verify parsing.


8. Run the frontend (Next.js)

Start the dev server:

npm run dev
# or
yarn dev

Open the app:

http://localhost:3000

What you should be able to do:

  • Browse chapters and versions
  • Switch diff modes (Cumulative / Incremental)
  • See insertions (<ins>) and deletions (<del>) highlighted
  • Export current view as .txt or print to PDF
  • View version timeline and change summaries

9. Build for production

npm run build
npm start
# or the yarn equivalents

🔧 Scripts (example package.json entries)

Add or confirm these scripts in your package.json:

{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "extract:openai": "node extractor.js",
    "extract:gemini": "node extractor_gemini.js",
    "convert:html": "node jsonToHtml.js"
  }
}

🧠 Implementation notes

  • getChapterContent(filePath): Reads a chapter JSON and concatenates all markdown fields into a single text string per chapter.
  • convertToAkomaNtoso: Sends chapter text to the selected LLM with a prompt instructing it to output a nested JSON reflecting chapters → sections → subsections → clauses (Akoma Ntoso-like structure). Ensure the prompt enforces completeness and faithful representation.
  • Diffing: applyAmendmentsVI, applyAmend1VII, etc., wrap deleted text in <del> and inserted text in <ins>. computeVersions builds cumulative or incremental HTML based on diff mode.
  • summarizeDiff: Counts <ins> and <del> tags and lists modified sections for quick UI summaries.
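
For instance, the tag-counting part of summarizeDiff can be as small as this sketch (the real function also lists which sections changed):

// Count insertion and deletion markers in a diffed HTML string
function summarizeDiff(html) {
  const insertions = (html.match(/<ins[^>]*>/g) || []).length;
  const deletions = (html.match(/<del[^>]*>/g) || []).length;
  return { insertions, deletions };
}

console.log(summarizeDiff('old <del>text</del> new <ins>wording</ins>'));
// -> { insertions: 1, deletions: 1 }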

✅ Verification checklist

  • Raw chapter JSONs placed in bbmp_data_extractor/
  • Extraction script runs and produces Akoma Ntoso JSON in bbmp_data_extractor/akomo-ntoso/
  • jsonToHtml.js generates viewable HTML for debugging
  • Next.js app runs at http://localhost:3000
  • Diffing, timeline and exports function in UI

📝 Example extractor.js usage snippet

Make sure your extractor.js has a function like getChapterContent and calls to your LLM client for convertToAkomaNtoso. Below is a minimal example of how you might wire them together:

const fs = require('fs');
const { getChapterContent, convertToAkomaNtoso } = require('./lib/extractorHelpers');

async function run() {
  const chapterText = getChapterContent('bbmp_data_extractor/chapter1.json');
  const akomaJson = await convertToAkomaNtoso(chapterText);
  // Write the structured output alongside the other generated chapters
  fs.writeFileSync(
    'bbmp_data_extractor/akomo-ntoso/chapter1.json',
    JSON.stringify(akomaJson, null, 2)
  );
}

run().catch(console.error);

📚 References & standards

  • Akoma Ntoso — XML standard for parliamentary, legislative and judiciary documents (we use an Akoma Ntoso-inspired JSON representation).
  • Next.js + TailwindCSS — UI and styling stack.

⚖️ License & Contributing

Contributions are welcome. Please open issues and PRs. Add a CONTRIBUTING.md and CODE_OF_CONDUCT.md as needed for the project.


✉️ Contact

If you need help or want to collaborate:


Expected Outcome

No response

Acceptance Criteria

No response

Implementation Details

https://gauravs-organization-13.gitbook.io/untitled/dmp-final-evaluation

Mockups/Wireframes

No response

Product Name

AI for Legal Justice ( OpenNyAI )

Organisation Name

If Me

Domain

Education

Tech Skills Needed

Artificial Intelligence

Mentor(s)

Prakash (OpenNyAI)

Category

Machine Learning
