[DMP 2025]: Gaurav Singh | OpenNyAI #732

@EuclidStellar

Description

Ticket Contents

🧠 Project 1: Query Method Evaluation System on Vector Databases for Grievance Data

📘 Overview

The Query Method Evaluation System is a robust framework designed to identify and recommend the most effective query methods for various categories within a grievance management system.
It leverages RAGAS metrics (Context Precision, Context Recall, and Response Relevancy) to comprehensively evaluate different search techniques on a large dataset of structured grievance information.


🎯 Project Scope and Impact

The project involved:

  1. Structuring 15,000 grievance records
    Transforming raw grievance data into a structured, query-ready format — crucial for consistent and accurate retrieval.

  2. Synthetic Query Generation
    Creating 15,000 synthetic queries (one per grievance) to expand the dataset for thorough evaluation and benchmarking.

  3. Evaluation and Benchmarking
    Testing and comparing multiple query methods using RAGAS metrics on structured data and synthetic queries.


⚙️ Features

🔍 9 Query Methods Evaluated

  1. Semantic Search
  2. Hybrid Search
  3. Keyword Search (BM25)
  4. Generative Search (RAG)
  5. Vector Similarity Search
  6. Multiple Target Vectors
  7. Reranking (Hybrid + Rerank)
  8. Aggregate Data
  9. Filtered Search
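
As a rough illustration, here is a minimal sketch of how the first three methods map onto the weaviate-client v3 API. The Grievance collection name and the query strings are assumptions for illustration, not the project's actual schema:

require('dotenv').config();
const weaviate = require('weaviate-client').default;

async function demoQueries() {
  // Connect to Weaviate Cloud with the credentials from .env
  const client = await weaviate.connectToWeaviateCloud(process.env.WEAVIATE_URL, {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY),
    headers: { 'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY },
  });

  const grievances = client.collections.get('Grievance'); // assumed collection name

  // 1. Semantic Search: vector search over the query's meaning
  const semantic = await grievances.query.nearText('my internet is slow', { limit: 5 });

  // 2. Hybrid Search: blends vector and keyword scores (alpha weights the two)
  const hybrid = await grievances.query.hybrid('call drop', { alpha: 0.5, limit: 5 });

  // 3. Keyword Search: classic BM25 term matching
  const keyword = await grievances.query.bm25('error code 404', { limit: 5 });

  console.log(semantic.objects.length, hybrid.objects.length, keyword.objects.length);
  await client.close();
}

demoQueries().catch(console.error);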

📊 3 Core RAGAS Metrics

Metric | Meaning | Good Score
-- | -- | --
Context Precision | Proportion of relevant info within retrieved contexts (less noise) | 0.7
Context Recall | How many relevant pieces of info were retrieved (fewer misses) | 0.8
Response Relevancy | How well the response addresses the query | 0.7
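
RAGAS scores these with an LLM judge, but the intent of the two retrieval metrics can be shown with a simplified set-based sketch (an approximation for intuition only, not the RAGAS implementation; Response Relevancy needs embeddings and is omitted):

// Simplified, set-based approximations of the metric definitions above.

function contextPrecision(retrievedContexts, relevantContexts) {
  // Fraction of retrieved contexts that are actually relevant (less noise)
  const relevant = new Set(relevantContexts);
  const hits = retrievedContexts.filter((c) => relevant.has(c)).length;
  return retrievedContexts.length ? hits / retrievedContexts.length : 0;
}

function contextRecall(retrievedContexts, relevantContexts) {
  // Fraction of relevant contexts that were retrieved (fewer misses)
  const retrieved = new Set(retrievedContexts);
  const hits = relevantContexts.filter((c) => retrieved.has(c)).length;
  return relevantContexts.length ? hits / relevantContexts.length : 0;
}

// Example: 2 of 3 retrieved chunks are relevant; 2 of 4 relevant chunks were retrieved
console.log(contextPrecision(['a', 'b', 'x'], ['a', 'b', 'c', 'd'])); // 0.666...
console.log(contextRecall(['a', 'b', 'x'], ['a', 'b', 'c', 'd']));    // 0.5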

🧩 Additional Features

  • Category-wise Analysis – Tailored recommendations for best methods by grievance category.
  • Comprehensive Reporting – Exports detailed CSV and JSON reports.
  • Scalable Evaluation – Efficiently handles large datasets for real-world deployment.

🛠️ Setup Instructions

1️⃣ Environment Setup

Create a .env file in your project root with:

WEAVIATE_URL=https://your-cluster.weaviate.network
WEAVIATE_API_KEY=your-weaviate-api-key
OPENAI_API_KEY=your-openai-api-key

2️⃣ Install Dependencies

npm install weaviate-client dotenv node-fetch csv-writer

3️⃣ Data Preparation

Ensure your JSON data follows this format:

[ { "department_code": "DOTEL", "department_name": "Telecommunications", "category": "Mobile Related", "sub_category_1": "Call Drop", "description": "Detailed description of the issue...", "user_queries": [ "My phone calls keep getting dropped. What can I do?", "Why does my cell phone keep disconnecting during calls?" ] } ]

Run the data preparation script:

node data_preparation.js
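
For reference, a minimal sketch of what a data preparation script along these lines might do, assuming a Grievance collection and the weaviate-client v3 insertMany API; the input file name and property mapping are illustrative, not the actual data_preparation.js:

require('dotenv').config();
const fs = require('fs');
const weaviate = require('weaviate-client').default;

async function prepareData() {
  // Assumed input file containing records in the format shown above
  const records = JSON.parse(fs.readFileSync('grievances.json', 'utf-8'));

  const client = await weaviate.connectToWeaviateCloud(process.env.WEAVIATE_URL, {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY),
    headers: { 'X-OpenAI-Api-Key': process.env.OPENAI_API_KEY },
  });

  const grievances = client.collections.get('Grievance'); // assumed collection name

  // insertMany batches the import of all mapped records in one call
  await grievances.data.insertMany(
    records.map((r) => ({
      departmentCode: r.department_code,
      departmentName: r.department_name,
      category: r.category,
      subCategory1: r.sub_category_1,
      description: r.description,
    }))
  );

  await client.close();
}

prepareData().catch(console.error);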

4️⃣ Run Evaluation

node query_evaluation.js

💡 Query Methods and Best Use Cases

Method | Best For | Use Case
-- | -- | --
Semantic Search (nearText) | Natural language queries, conceptual search | “My internet is slow”
Hybrid Search | Balanced semantic + keyword matching | Mixed search patterns
Keyword Search (BM25) | Exact term matching, technical queries | “error code 404”
Generative Search (RAG) | Complex queries requiring synthesis | “What should I do about dropped calls?”
Vector Similarity Search | Semantic similarity without preprocessing | Finding conceptually similar records
Multiple Target Vectors | Multi-aspect queries, complex categorization | “technical + emotional” content
Reranking (Hybrid + Rerank) | High precision requirements | Legal or medical queries
Aggregate Data | Statistical, summarization queries | “How many network issues were reported?”
Filtered Search | Category-specific, scoped queries | “Only Mobile Related issues”

📂 Output Files

🧾 evaluation_results.csv

Contains:

  • Department, Category, Sub-category, User Query

  • Scores for each method

  • RAGAS metric values (Precision, Recall, Relevancy)
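
With the csv-writer dependency installed earlier, the export can look roughly like this; the column ids and the shape of the results rows are assumptions for illustration:

const { createObjectCsvWriter } = require('csv-writer');

const csvWriter = createObjectCsvWriter({
  path: 'evaluation_results.csv',
  header: [
    { id: 'department', title: 'Department' },
    { id: 'category', title: 'Category' },
    { id: 'subCategory', title: 'Sub-category' },
    { id: 'userQuery', title: 'User Query' },
    { id: 'method', title: 'Method' },
    { id: 'contextPrecision', title: 'Context Precision' },
    { id: 'contextRecall', title: 'Context Recall' },
    { id: 'responseRelevancy', title: 'Response Relevancy' },
  ],
});

// results: one row per (query, method) pair produced by the evaluation run
async function exportResults(results) {
  await csvWriter.writeRecords(results);
}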


🗂️ recommendations.json

Category-wise recommendations for best-performing methods.

Example:

{ "Mobile Related > Call Drop": { "bestMethod": "hybrid", "bestScore": 0.856, "methodPerformance": { "hybrid": { "averageScore": 0.856, "contextPrecision": 0.89, "contextRecall": 0.82, "responseRelevancy": 0.86 } } } }

💻 Console Output

Real-time progress updates and summarized evaluation metrics during execution.


⚡ Evaluation Process Phases

🧩 Phase 1: Initialization

  • Load .env variables

  • Connect to Weaviate

  • Load evaluation JSON

  • Create evaluator instance

🔁 Phase 2: Evaluate All Methods

  • Loop through all categories and queries

  • Run all 9 query methods

  • Compute 3 RAGAS metrics

  • Log and store results

📊 Phase 3: Result Analysis + Export

  • Aggregate scores

  • Compute averages (P, R, RR, Score)

  • Identify best-performing method

  • Export to CSV and JSON
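
Put together, Phases 2 and 3 reduce to a loop like the skeleton below; runMethod and computeRagasMetrics are placeholder stubs standing in for the real implementations:

// Placeholder stubs for the real method runners and RAGAS scoring
async function runMethod(method, query) {
  return [`retrieved context for "${query}" via ${method}`];
}

async function computeRagasMetrics(query, contexts) {
  return { contextPrecision: 0, contextRecall: 0, responseRelevancy: 0 };
}

// Evaluate every query in every category with every method, one result row each
async function evaluateAll(categories, methods) {
  const results = [];
  for (const category of categories) {
    for (const query of category.queries) {
      for (const method of methods) {
        const contexts = await runMethod(method, query);
        const metrics = await computeRagasMetrics(query, contexts);
        results.push({ category: category.name, query, method, ...metrics });
      }
    }
  }
  return results;
}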

🏛️ Project 2: BBMP Act Version History - GitBook Documentation

📖 Introduction

Welcome to the documentation for the BBMP Act Version History project.
This project addresses a crucial issue in Indian governance — the lack of version control and historical tracking for legal acts and amendments.

Legal documents like the Bruhat Bengaluru Mahanagara Palike (BBMP) Act undergo numerous amendments, but these changes are rarely consolidated or easily accessible.
As a result, it becomes difficult for citizens, legal professionals, and researchers to understand the evolution of laws and track amendments over time.

🎯 Objective

The project provides a user-friendly interface to:

  • View different versions of the BBMP Act.
  • Compare versions to see insertions, deletions, and modifications.
  • Track amendment history via a version-controlled timeline.

This documentation explains the complete project architecture — from data extraction and processing to the frontend UI.


🧩 Data Extraction and Processing

📂 Raw Data Source

  • The initial data is stored as JSON files — each representing one or more chapters of the Act.
  • Location: bbmp_data_extractor/
  • Example files: chapter1.json, chapter2and3.json, etc.
  • Each JSON file contains an array of objects, where each object represents a page or section with a markdown field containing raw text.

📝 Text Extraction

To prepare data for processing, all markdown content is combined into a single string per chapter using the getChapterContent function.

getChapterContent(filePath)

Purpose:
Reads a chapter JSON file and merges all markdown text into a single string.

Process:

  1. Reads the file content from the given filePath.
  2. Parses the JSON data.
  3. Maps over each page to extract its markdown.
  4. Joins all markdown content into one continuous text block.

This produces a clean text representation of each chapter, ready for structuring.
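
A minimal implementation matching these four steps might look like this (a sketch; the actual script may differ in detail):

const fs = require('fs');

function getChapterContent(filePath) {
  const raw = fs.readFileSync(filePath, 'utf-8'); // 1. read the file
  const pages = JSON.parse(raw);                  // 2. parse the JSON array
  return pages
    .map((page) => page.markdown)                 // 3. extract each page's markdown
    .join('\n\n');                                // 4. join into one text block
}

// Example: merge chapter 1 into a single string
const chapterText = getChapterContent('bbmp_data_extractor/chapter1.json');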


⚙️ Conversion to Akoma Ntoso

Once the raw text is extracted, it is converted into a structured, hierarchical format using the Akoma Ntoso standard — an XML-based schema for legal documents.

🔧 The convertToAkomaNtoso Function

Found in the extraction scripts (extractor.js, extractor_gemini.js), this function is the core of the structuring process.

It sends the raw text to an AI model (e.g., OpenAI GPT or Google Gemini) with a prompt to produce a JSON representation adhering to Akoma Ntoso principles.

Key Features

  • AI-Powered Parsing:
    Uses LLMs to identify and structure legal entities like chapters, sections, and clauses.
  • Hierarchical Organization:
    Outputs a nested JSON that reflects the Act’s legal hierarchy.
  • Accuracy and Completeness:
    The AI prompt enforces full-text coverage without alteration.
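
A minimal sketch of the OpenAI variant, using the official openai Node SDK; the model name and prompt wording here are illustrative, not the exact prompt from extractor.js:

const OpenAI = require('openai');

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function convertToAkomaNtoso(chapterText) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o', // illustrative model choice
    response_format: { type: 'json_object' }, // ask for parseable JSON
    messages: [
      {
        role: 'system',
        content:
          'Convert the following legal text into nested Akoma Ntoso-style JSON ' +
          '(act -> chapter -> section -> content). Preserve the full text without alteration.',
      },
      { role: 'user', content: chapterText },
    ],
  });
  return JSON.parse(completion.choices[0].message.content);
}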

🧱 Akoma Ntoso JSON Structure

The structured JSON is saved in bbmp_data_extractor/akomo-ntoso/.

Example Structure:

{
  "akomaNtoso": {
    "act": {
      "meta": { ... },
      "preamble": { ... },
      "body": {
        "chapter": {
          "@eId": "ch_I",
          "num": "CHAPTER I",
          "heading": "PRELIMINARY",
          "section": [
            {
              "@eId": "sec_3",
              "num": "3.",
              "heading": "Definitions.",
              "content": [ ... ]
            }
          ]
        }
      }
    }
  }
}

Goals & Mid-Point Milestone

✅ Goals

  • Developed a version-controlled system for BBMP Act
    → Implemented structured tracking of amendments using Akoma Ntoso format.

  • Automated data extraction and conversion pipeline
    → Extracted raw JSON chapters and converted them into hierarchical, machine-readable Akoma Ntoso JSON.

  • Built an interactive Next.js frontend
    → Designed a user-friendly interface to view, compare, and track Act versions with diff highlighting.

  • Implemented AI-powered parsing and diffing logic
    → Leveraged GPT/Gemini models for structured legal text parsing and accurate change detection.

  • Goals Achieved By Mid-point Milestone
    → Complete pipeline from raw data → structured format → visual diff interface was successfully built and validated.

Setup/Installation

BBMP Act Version History — README

Welcome to the BBMP Act Version History project. This section is the project's README, a single, complete GitHub-ready Markdown document with setup and installation instructions.


🏛️ BBMP Act Version History

A version-controlled system for the Bruhat Bengaluru Mahanagara Palike (BBMP) Act that lets you extract, structure, diff, and view amendments using an Akoma Ntoso JSON representation and a Next.js frontend.


✅ Features (completed / achieved)

  • ✅ Version-controlled representation of the BBMP Act using Akoma Ntoso-like JSON.
  • ✅ Automated extraction pipeline from raw JSON markdown files.
  • ✅ AI-powered parsing (OpenAI / Gemini) to convert raw text into structured legal JSON.
  • ✅ Next.js frontend to view, compare, diff and export versions.
  • ✅ Diffing modes: Cumulative and Incremental.
  • ✅ Exporting: .txt and .pdf.
  • ✅ Category-wise change summaries and timeline UI.

⚙️ Setup & Installation

Follow the steps below to set up and run the project locally.

1. Prerequisites

Make sure you have the following installed:

  • Node.js v18+
  • npm or yarn
  • Git
  • (Optional) OpenAI API Key or Gemini API Key for AI-based structuring

2. Clone the repository

git clone https://github.com/your-username/bbmp-act-version-history.git
cd bbmp-act-version-history

3. Environment variables

Create a .env file at the project root and add the necessary keys:

# For OpenAI (optional if using Gemini)
OPENAI_API_KEY=your-openai-api-key

# For Gemini (optional if using OpenAI)
GEMINI_API_KEY=your-gemini-api-key

# Example Weaviate or other vector DB settings if used by other modules
WEAVIATE_URL=https://your-cluster.weaviate.network
WEAVIATE_API_KEY=your-weaviate-api-key

Note: Fill only the keys you will use. Keep .env out of version control (.gitignore).


4. Install dependencies

Using npm:

npm install

Or using yarn:

yarn install

5. Project layout (important folders)

.
├─ bbmp_data_extractor/
│  ├─ chapter1.json
│  ├─ chapter2and3.json
│  └─ akomo-ntoso/          # generated Akoma Ntoso JSON output
├─ src/
│  └─ app/
│     └─ page.tsx           # main Next.js UI
├─ extractor.js             # extraction script (OpenAI)
├─ extractor_gemini.js      # extraction script (Gemini)
├─ convertToAkomaNtoso.js   # conversion helper (if separate)
├─ jsonToHtml.js            # debug / view converter
├─ package.json
└─ README.md

6. Data preparation

Place your raw chapter JSON files in bbmp_data_extractor/. Each file should be an array of page objects with a markdown field:

[
  { "markdown": "Section or page content here..." },
  { "markdown": "Next page content ..." }
]

Run the extraction (OpenAI):

node extractor.js

Or for Gemini:

node extractor_gemini.js

These scripts will:

  • Read JSON pages,
  • Concatenate markdown per chapter (via getChapterContent),
  • Send text to the model to parse into a structured Akoma Ntoso-style JSON,
  • Save outputs to bbmp_data_extractor/akomo-ntoso/.

7. (Optional) Convert JSON to HTML for debugging

node jsonToHtml.js

This generates simple HTML views from the Akoma Ntoso JSON to inspect structure and verify parsing.


8. Run the frontend (Next.js)

Start the dev server:

npm run dev
# or
yarn dev

Open the app:

http://localhost:3000

What you should be able to do:

  • Browse chapters and versions
  • Switch diff modes (Cumulative / Incremental)
  • See insertions (<ins>) and deletions (<del>) highlighted
  • Export current view as .txt or print to PDF
  • View version timeline and change summaries

9. Build for production

npm run build
npm start
# or the yarn equivalents

🔧 Scripts (example package.json entries)

Add or confirm these scripts in your package.json:

{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "extract:openai": "node extractor.js",
    "extract:gemini": "node extractor_gemini.js",
    "convert:html": "node jsonToHtml.js"
  }
}

🧠 Implementation notes

  • getChapterContent(filePath): Reads a chapter JSON and concatenates all markdown fields into a single text string per chapter.
  • convertToAkomaNtoso: Sends chapter text to the selected LLM with a prompt instructing it to output a nested JSON reflecting chapters → sections → subsections → clauses (Akoma Ntoso-like structure). Ensure the prompt enforces completeness and faithful representation.
  • Diffing: applyAmendmentsVI, applyAmend1VII, etc., wrap deleted text in <del> and inserted text in <ins>. computeVersions builds cumulative or incremental HTML based on diff mode.
  • summarizeDiff: Counts <ins> and <del> tags and lists modified sections for quick UI summaries.
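
For instance, the tag-counting part of summarizeDiff can be as small as this sketch (the real function also lists which sections changed):

// Count insertion and deletion markers in a diffed HTML string
function summarizeDiff(html) {
  const insertions = (html.match(/<ins[^>]*>/g) || []).length;
  const deletions = (html.match(/<del[^>]*>/g) || []).length;
  return { insertions, deletions };
}

console.log(summarizeDiff('old <del>text</del> new <ins>wording</ins>'));
// -> { insertions: 1, deletions: 1 }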

✅ Verification checklist

  • Raw chapter JSONs placed in bbmp_data_extractor/
  • Extraction script runs and produces Akoma Ntoso JSON in bbmp_data_extractor/akomo-ntoso/
  • jsonToHtml.js generates viewable HTML for debugging
  • Next.js app runs at http://localhost:3000
  • Diffing, timeline and exports function in UI

📝 Example extractor.js usage snippet

Make sure your extractor.js has a function like getChapterContent and calls to your LLM client for convertToAkomaNtoso. Below is a minimal example of how you might wire them together:

const fs = require('fs');
const { getChapterContent, convertToAkomaNtoso } = require('./lib/extractorHelpers');

async function run() {
  const chapterText = getChapterContent('bbmp_data_extractor/chapter1.json');
  const akomaJson = await convertToAkomaNtoso(chapterText);
  // Write the structured output alongside the other generated chapters
  fs.writeFileSync(
    'bbmp_data_extractor/akomo-ntoso/chapter1.json',
    JSON.stringify(akomaJson, null, 2)
  );
}

run().catch(console.error);

📚 References & standards

  • Akoma Ntoso — XML standard for parliamentary, legislative and judiciary documents (we use an Akoma Ntoso-inspired JSON representation).
  • Next.js + TailwindCSS — UI and styling stack.

⚖️ License & Contributing

Contributions are welcome. Please open issues and PRs. Add a CONTRIBUTING.md and CODE_OF_CONDUCT.md as needed for the project.


✉️ Contact

If you need help or want to collaborate:


Expected Outcome

No response

Acceptance Criteria

No response

Implementation Details

https://gauravs-organization-13.gitbook.io/untitled/dmp-final-evaluation

Mockups/Wireframes

No response

Product Name

AI for Legal Justice ( OpenNyAI )

Organisation Name

If Me

Domain

Education

Tech Skills Needed

Artificial Intelligence

Mentor(s)

Prakash (OpenNyAI)

Category

Machine Learning
