
Add Support for Pinecone Integrated Reranking (Native Inference API) #5387

@TravisP-Greener

Description


Feature Description

Add support for Pinecone's integrated reranking capabilities, allowing two-stage retrieval with native reranking models (pinecone-rerank-v0, cohere-rerank-3.5, bge-reranker-v2-m3) directly in the query operation without additional processing steps.

Feature Category

Integration

Problem Statement

Pinecone now offers integrated reranking as part of its inference API, enabling significantly improved retrieval quality (up to a 60% accuracy boost, per Pinecone's benchmarks) with a single query call. Flowise currently falls short in three ways:

  1. No Native Pinecone Reranking Support
    The Pinecone vector store node doesn't expose the rerank parameter available in Pinecone's query API. This forces users to:

    • Retrieve more results than needed (increasing costs and latency)
    • Manually implement reranking in separate nodes
    • Use third-party reranking services
  2. Limited Cohere Rerank Integration
    While Flowise has a "Cohere Rerank Retriever" node (whose documentation is still incomplete), it:

    • Requires separate Cohere API setup and billing
    • Adds latency from cross-service calls
    • Doesn't leverage Pinecone's hosted Cohere model
    • Requires additional orchestration steps
  3. Missing Two-Stage Retrieval Pattern
    Modern RAG architectures use:

    • Stage 1: Fast, broad retrieval (top_k = 50-100 results)
    • Stage 2: Precise reranking (top_n = 3-10 final results)

    This pattern dramatically improves relevance while maintaining speed. Without integrated reranking, users either:

    • Return too few results initially (missing relevant docs)
    • Return too many results to LLM (wasting context, increasing costs)
    • Build complex multi-node flows that are hard to maintain
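As a sketch of what exposing the rerank parameter could look like, the helper below builds a query body for Pinecone's integrated-rerank search. The field names (`topK`, `topN`, `rankFields`) follow the camelCase conventions of Pinecone's TypeScript SDK, but this is an assumed payload shape for illustration, not a finished Flowise design:

```typescript
// Hypothetical helper: builds a Pinecone search body with optional integrated
// reranking. Field names are assumptions based on Pinecone's public docs.
interface RerankSpec {
  model: string;        // e.g. "cohere-rerank-3.5"
  topN: number;         // docs to keep after reranking
  rankFields: string[]; // metadata fields to rerank on, e.g. ["text"]
}

interface QueryBody {
  query: { topK: number; inputs: { text: string } };
  rerank?: RerankSpec;
}

function buildQueryBody(text: string, topK: number, rerank?: RerankSpec): QueryBody {
  const body: QueryBody = { query: { topK, inputs: { text } } };
  if (rerank) body.rerank = rerank; // omit the key entirely when reranking is off
  return body;
}

// Two-stage retrieval: fetch 50 candidates, keep the 5 best after reranking.
const body = buildQueryBody("sustainability reporting requirements", 50, {
  model: "cohere-rerank-3.5",
  topN: 5,
  rankFields: ["text"],
});
```

With reranking disabled, the same helper emits a plain top-k query, so existing flows would be unaffected.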

Real-world impact:
For agent-based systems with complex queries, reranking is essential. Example: A sustainability compliance agent querying standards documents needs to:

  1. Retrieve 50 potentially relevant document chunks (broad recall)
  2. Rerank to the 5 most relevant chunks (precision)
  3. Pass only those 5 to the LLM for reasoning

Without integrated reranking, we're forced to either compromise on recall or waste LLM context and inflate costs.
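The arithmetic behind that trade-off is simple. Assuming ~500 tokens per chunk (an illustrative figure, not from this issue), passing only the reranked top 5 instead of all 50 candidates cuts LLM context tenfold:

```typescript
// Back-of-envelope LLM context cost for the compliance-agent example above.
// 500 tokens per chunk is an assumed average, purely for illustration.
const tokensPerChunk = 500;
const withoutRerank = 50 * tokensPerChunk; // pass all 50 retrieved chunks
const withRerank = 5 * tokensPerChunk;     // pass only the 5 reranked chunks
// withoutRerank = 25000 tokens, withRerank = 2500 tokens: a 10x reduction
```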

Proposed Solution

Phase 1: Add Integrated Reranking to Pinecone Vector Store Node

Add "Reranking Configuration" section with:

  • Enable Reranking (toggle)
  • Rerank Model dropdown:
    • pinecone-rerank-v0 (best accuracy, 512 tokens, 100 docs max)
    • bge-reranker-v2-m3 (multilingual, 1024 tokens, 100 docs max)
    • cohere-rerank-3.5 (balanced, 40K tokens, 200 docs max)
  • Top K (initial retrieval): Number of docs to retrieve from index
  • Top N (after rerank): Number of docs to return after reranking
  • Rank Fields: Array of metadata fields to use for reranking (default: ["text"])

Model-specific parameters:

  • truncate (pinecone-rerank-v0, bge-reranker-v2-m3): "END" | "NONE"
  • max_chunks_per_doc (cohere-rerank-3.5): integer 1-3072
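Since each model has its own document limit, the node would need to validate the configuration before querying. A minimal sketch, using the limits from the model list above (the function name and error format are illustrative, not an existing Flowise API):

```typescript
// Hypothetical validation for the proposed "Reranking Configuration" section.
// Per-model document limits are taken from the model list in this issue.
type RerankModel = "pinecone-rerank-v0" | "bge-reranker-v2-m3" | "cohere-rerank-3.5";

const MAX_DOCS: Record<RerankModel, number> = {
  "pinecone-rerank-v0": 100,
  "bge-reranker-v2-m3": 100,
  "cohere-rerank-3.5": 200,
};

function validateRerankConfig(model: RerankModel, topK: number, topN: number): string[] {
  const errors: string[] = [];
  if (topK > MAX_DOCS[model]) {
    errors.push(`Top K (${topK}) exceeds ${model}'s ${MAX_DOCS[model]}-document limit`);
  }
  if (topN > topK) {
    errors.push(`Top N (${topN}) cannot exceed Top K (${topK})`);
  }
  return errors;
}
```

Surfacing these errors at configuration time would prevent users from discovering the limits only as runtime API failures.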

Phase 2: Enhance Existing Cohere Rerank Retriever

  • Update to support Pinecone-hosted Cohere models (no separate API key needed)
  • Add option to use Pinecone's cohere-rerank-3.5 vs external Cohere API
  • Display reranking scores in debug output

Phase 3: Visual Feedback

  • Show reranking metrics in execution logs:
    • Initial retrieval count
    • Reranked result count
    • Average relevance score improvement
  • Display per-result rerank scores in debug mode
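The Phase 3 metrics could be computed from the relevance scores returned with each result. A sketch, assuming each hit carries a numeric `score` field (the interface and field names are illustrative):

```typescript
// Hypothetical metrics computation for the execution-log display.
interface ScoredDoc { id: string; score: number }

function rerankMetrics(initial: ScoredDoc[], reranked: ScoredDoc[]) {
  const avg = (docs: ScoredDoc[]) =>
    docs.length ? docs.reduce((sum, d) => sum + d.score, 0) / docs.length : 0;
  return {
    initialCount: initial.length,                      // initial retrieval count
    rerankedCount: reranked.length,                    // reranked result count
    avgScoreImprovement: avg(reranked) - avg(initial), // average relevance gain
  };
}
```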

Mockups or References

  • Pinecone reranking documentation
  • Available rerank models
  • TypeScript SDK reference
  • Pinecone query API reference

Additional Context

Benefits of integrated reranking:

  1. Improved Accuracy: Up to 60% better search results (per Pinecone benchmarks)
  2. Single Platform: No need for separate Cohere API setup/billing
  3. Lower Latency: Reranking happens in the same data centre as the index
  4. Cost-Effective: Pinecone's pricing includes reranking in their inference costs
  5. Simplified Architecture: One API call instead of orchestrating multiple services

Use cases:

  • Complex agent queries requiring high precision
  • Multi-document knowledge bases with diverse content
  • Domain-specific RAG (legal, medical, technical standards)
  • Question-answering systems where relevance is critical
  • Any RAG system with >10 candidate documents per query

Cost consideration:
Reranking is charged per query+document pair. Users should control costs via the Top K limit.
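Because billing is per query+document pair, Top K gives a direct handle on cost; a rough unit calculator (the billing granularity follows the sentence above, but treat it as a sketch, not Pinecone's actual pricing model):

```typescript
// Rough cost model: one billable rerank unit per (query, document) pair,
// so per-query cost scales linearly with Top K.
function rerankUnits(queries: number, topK: number): number {
  return queries * topK;
}

// e.g. 1,000 queries at Top K = 50
const monthlyUnits = rerankUnits(1000, 50);
```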

Availability:

  • Standard, Enterprise, and Dedicated Pinecone plans
  • US region only (as of now)

Existing workarounds:

  • Users currently implement reranking via separate Cohere API calls or custom code
  • Requires complex multi-node flows
  • Misses performance benefits of Pinecone-hosted models

Metadata

Labels: enhancement (New feature or request)