Feature Description
Add support for Pinecone's integrated reranking capabilities, allowing two-stage retrieval with native reranking models (pinecone-rerank-v0, cohere-rerank-3.5, bge-reranker-v2-m3) directly in the query operation without additional processing steps.
Feature Category
Integration
Problem Statement
Pinecone now offers integrated reranking as part of their inference API, enabling significantly improved retrieval quality (up to 60% accuracy boost) with a single query call. Flowise currently:
No Native Pinecone Reranking Support
The Pinecone vector store node doesn't expose the `rerank` parameter available in Pinecone's query API. This forces users to:
- Retrieve more results than needed (increasing costs and latency)
- Manually implement reranking in separate nodes
- Use third-party reranking services
Limited Cohere Rerank Integration
While Flowise has a "Cohere Rerank Retriever" node (docs show it's incomplete), it:
- Requires separate Cohere API setup and billing
- Adds latency from cross-service calls
- Doesn't leverage Pinecone's hosted Cohere model
- Requires additional orchestration steps
Missing Two-Stage Retrieval Pattern
Modern RAG architectures use:
- Stage 1: Fast, broad retrieval (top_k = 50-100 results)
- Stage 2: Precise reranking (top_n = 3-10 final results)
This pattern dramatically improves relevance while maintaining speed. Without integrated reranking, users either:
- Return too few results initially (missing relevant docs)
- Return too many results to LLM (wasting context, increasing costs)
- Build complex multi-node flows that are hard to maintain
Real-world impact:
For agent-based systems with complex queries, reranking is essential. Example: A sustainability compliance agent querying standards documents needs to:
- Retrieve 50 potentially relevant document chunks (broad recall)
- Rerank to the 5 most relevant chunks (precision)
- Pass only those 5 to the LLM for reasoning
Without integrated reranking, we're forced to either compromise on recall or waste LLM context/costs.
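The two-stage pattern can be illustrated with a small, self-contained sketch. The scoring functions below are deliberately naive placeholders (keyword overlap for stage 1, phrase match for stage 2) standing in for vector search and a cross-encoder reranker; the types and function names are illustrative, not Flowise or Pinecone APIs.

```typescript
interface Chunk {
  id: string;
  text: string;
}

// Stage 1: cheap, broad retrieval. Naive keyword overlap stands in for an
// approximate-nearest-neighbour vector search over the index.
function broadRetrieve(query: string, corpus: Chunk[], topK: number): Chunk[] {
  const terms = query.toLowerCase().split(/\s+/);
  return corpus
    .map((c) => ({
      chunk: c,
      score: terms.filter((t) => c.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.chunk);
}

// Stage 2: precise reranking. A placeholder for a hosted cross-encoder such
// as bge-reranker-v2-m3; here it simply favours exact phrase matches.
function rerank(query: string, candidates: Chunk[], topN: number): Chunk[] {
  return candidates
    .map((c) => ({
      chunk: c,
      score: c.text.toLowerCase().includes(query.toLowerCase()) ? 1 : 0,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((s) => s.chunk);
}

const corpus: Chunk[] = [
  { id: "a", text: "Scope 3 emissions reporting requirements" },
  { id: "b", text: "General emissions overview" },
  { id: "c", text: "Unrelated HR policy" },
];

const candidates = broadRetrieve("emissions reporting", corpus, 2); // broad recall
const final = rerank("emissions reporting", candidates, 1); // precision
console.log(final.map((c) => c.id)); // [ 'a' ]
```

With integrated reranking, both stages collapse into one query call instead of two nodes and an extra round trip.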
Proposed Solution
Phase 1: Add Integrated Reranking to Pinecone Vector Store Node
Add "Reranking Configuration" section with:
- Enable Reranking (toggle)
- Rerank Model dropdown:
  - pinecone-rerank-v0 (best accuracy, 512 tokens, 100 docs max)
  - bge-reranker-v2-m3 (multilingual, 1024 tokens, 100 docs max)
  - cohere-rerank-3.5 (balanced, 40K tokens, 200 docs max)
- Top K (initial retrieval): Number of docs to retrieve from index
- Top N (after rerank): Number of docs to return after reranking
- Rank Fields: Array of metadata fields to use for reranking (default: ["text"])
Model-specific parameters:
- truncate (pinecone-rerank-v0, bge-reranker-v2-m3): "END" | "NONE"
- max_chunks_per_doc (cohere-rerank-3.5): integer 1-3072
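One possible shape for the proposed configuration section is sketched below. The interface and field names are hypothetical (not Flowise's actual node schema); the per-model document limits and parameter ranges come from the model list above.

```typescript
type RerankModel =
  | "pinecone-rerank-v0"
  | "bge-reranker-v2-m3"
  | "cohere-rerank-3.5";

// Hypothetical shape of the "Reranking Configuration" section.
interface RerankConfig {
  enabled: boolean;
  model: RerankModel;
  topK: number;         // initial retrieval from the index
  topN: number;         // results kept after reranking
  rankFields: string[]; // metadata fields used for reranking
  // Model-specific options:
  truncate?: "END" | "NONE"; // pinecone-rerank-v0, bge-reranker-v2-m3
  maxChunksPerDoc?: number;  // cohere-rerank-3.5, 1-3072
}

// Per-model document limits from the dropdown list above.
const MAX_DOCS: Record<RerankModel, number> = {
  "pinecone-rerank-v0": 100,
  "bge-reranker-v2-m3": 100,
  "cohere-rerank-3.5": 200,
};

function validate(cfg: RerankConfig): string[] {
  const errors: string[] = [];
  if (cfg.topN > cfg.topK) errors.push("topN must not exceed topK");
  if (cfg.topK > MAX_DOCS[cfg.model])
    errors.push(`topK exceeds ${MAX_DOCS[cfg.model]} docs for ${cfg.model}`);
  return errors;
}

const cfg: RerankConfig = {
  enabled: true,
  model: "pinecone-rerank-v0",
  topK: 50,
  topN: 5,
  rankFields: ["text"],
  truncate: "END",
};
console.log(validate(cfg)); // []
```

Validating topK against the model's document limit in the node itself would surface configuration errors at build time rather than as API failures.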
Phase 2: Enhance Existing Cohere Rerank Retriever
- Update to support Pinecone-hosted Cohere models (no separate API key needed)
- Add option to use Pinecone's cohere-rerank-3.5 vs external Cohere API
- Display reranking scores in debug output
Phase 3: Visual Feedback
- Show reranking metrics in execution logs:
  - Initial retrieval count
  - Reranked result count
  - Average relevance score improvement
- Display per-result rerank scores in debug mode
Mockups or References
Pinecone Reranking Documentation:
- Main guide: https://docs.pinecone.io/guides/search/rerank-results
- Integrated inference: https://www.pinecone.io/blog/integrated-inference/
- TypeScript SDK example: https://github.com/pinecone-io/pinecone-ts-client/blob/main/README.md
Available Models:
- pinecone-rerank-v0: https://docs.pinecone.io/models/pinecone-rerank-v0
- cohere-rerank-3.5: https://docs.pinecone.io/models/cohere-rerank-3.5
- bge-reranker-v2-m3: https://docs.pinecone.io/models/bge-reranker-v2-m3
TypeScript SDK reference:
Pinecone query API reference:
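For reference, a standalone rerank request to Pinecone's hosted inference looks roughly like the payload built below. This is my reading of the docs linked above: the field names (`top_n`, `rank_fields`, `return_documents`, `parameters`) should be verified against the current API reference, and the builder function is illustrative, not part of any SDK.

```typescript
interface RerankRequest {
  model: string;
  query: string;
  documents: { id: string; text: string }[];
  top_n: number;
  rank_fields: string[];
  return_documents: boolean;
  parameters?: Record<string, string | number>;
}

// Build the request body for Pinecone's hosted rerank endpoint
// (field names as I understand them from the docs above).
function buildRerankRequest(
  query: string,
  documents: { id: string; text: string }[],
  model = "bge-reranker-v2-m3",
  topN = 3
): RerankRequest {
  return {
    model,
    query,
    documents,
    top_n: Math.min(topN, documents.length), // can't keep more than we send
    rank_fields: ["text"],
    return_documents: true,
    parameters: { truncate: "END" }, // pinecone-rerank-v0 / bge-reranker-v2-m3 only
  };
}

const body = buildRerankRequest(
  "Which standard covers Scope 3 emissions?",
  [
    { id: "d1", text: "GHG Protocol Scope 3 Standard" },
    { id: "d2", text: "ISO 9001 quality management" },
  ],
  "bge-reranker-v2-m3",
  5
);
console.log(body.top_n); // 2 (capped at the number of documents)
```

The same body would be sent via the TypeScript SDK's inference rerank method or POSTed directly with an Api-Key header.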
Additional Context
Benefits of integrated reranking:
- Improved Accuracy: Up to 60% better search results (per Pinecone benchmarks)
- Single Platform: No need for separate Cohere API setup/billing
- Lower Latency: Reranking happens in the same data centre as the index
- Cost-Effective: Pinecone's pricing includes reranking in their inference costs
- Simplified Architecture: One API call instead of orchestrating multiple services
Use cases:
- Complex agent queries requiring high precision
- Multi-document knowledge bases with diverse content
- Domain-specific RAG (legal, medical, technical standards)
- Question-answering systems where relevance is critical
- Any RAG system with >10 candidate documents per query
Cost consideration:
Reranking is charged per query+document pair. Users should control costs via the Top K limit.
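Because billing is per query+document pair, Top K scales cost linearly. A quick sketch of the arithmetic (billable units only; unit prices are deliberately omitted since they vary by plan):

```typescript
// One billable rerank unit per query-document pair, per the pricing
// model described above.
function rerankUnits(queriesPerDay: number, topK: number): number {
  return queriesPerDay * topK;
}

// Doubling the candidate pool doubles rerank spend:
console.log(rerankUnits(1_000, 50));  // 50000 units/day
console.log(rerankUnits(1_000, 100)); // 100000 units/day
```

This is why exposing Top K as an explicit node setting matters: it is the single knob that trades recall against rerank cost.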
Availability:
- Standard, Enterprise, and Dedicated Pinecone plans
- US region only (as of now)
Existing workarounds:
- Users currently implement reranking via separate Cohere API calls or custom code
- Requires complex multi-node flows
- Misses performance benefits of Pinecone-hosted models