Feature Description
Add support for Pinecone's integrated reranking capabilities, allowing two-stage retrieval with native reranking models (pinecone-rerank-v0, cohere-rerank-3.5, bge-reranker-v2-m3) directly in the query operation without additional processing steps.
Feature Category
Integration
Problem Statement
Pinecone now offers integrated reranking as part of their inference API, enabling significantly improved retrieval quality (up to 60% accuracy boost) with a single query call. Flowise currently:
No Native Pinecone Reranking Support
The Pinecone vector store node doesn't expose the `rerank` parameter available in Pinecone's query API. This forces users to:
- Retrieve more results than needed (increasing costs and latency)
- Manually implement reranking in separate nodes
- Use third-party reranking services
Limited Cohere Rerank Integration
While Flowise has a "Cohere Rerank Retriever" node (docs show it's incomplete), it:
- Requires separate Cohere API setup and billing
- Adds latency from cross-service calls
- Doesn't leverage Pinecone's hosted Cohere model
- Requires additional orchestration steps
Missing Two-Stage Retrieval Pattern
Modern RAG architectures use:
- Stage 1: Fast, broad retrieval (top_k = 50-100 results)
- Stage 2: Precise reranking (top_n = 3-10 final results)
This pattern dramatically improves relevance while maintaining speed. Without integrated reranking, users either:
- Return too few results initially (missing relevant docs)
- Return too many results to LLM (wasting context, increasing costs)
- Build complex multi-node flows that are hard to maintain
Real-world impact:
For agent-based systems with complex queries, reranking is essential. Example: A sustainability compliance agent querying standards documents needs to:
- Retrieve 50 potentially relevant document chunks (broad recall)
- Rerank to the 5 most relevant chunks (precision)
- Pass only those 5 to the LLM for reasoning
Without integrated reranking, we're forced to either compromise on recall or waste LLM context/costs.
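The two-stage pattern can be illustrated with a small, self-contained sketch. The scoring functions below are deliberately naive placeholders (keyword overlap for stage 1, phrase match for stage 2) standing in for vector search and a cross-encoder reranker; the types and function names are illustrative, not Flowise or Pinecone APIs.

```typescript
interface Chunk {
  id: string;
  text: string;
}

// Stage 1: cheap, broad retrieval. Naive keyword overlap stands in for an
// approximate-nearest-neighbour vector search over the index.
function broadRetrieve(query: string, corpus: Chunk[], topK: number): Chunk[] {
  const terms = query.toLowerCase().split(/\s+/);
  return corpus
    .map((c) => ({
      chunk: c,
      score: terms.filter((t) => c.text.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.chunk);
}

// Stage 2: precise reranking. A placeholder for a hosted cross-encoder such
// as bge-reranker-v2-m3; here it simply favours exact phrase matches.
function rerank(query: string, candidates: Chunk[], topN: number): Chunk[] {
  return candidates
    .map((c) => ({
      chunk: c,
      score: c.text.toLowerCase().includes(query.toLowerCase()) ? 1 : 0,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((s) => s.chunk);
}

const corpus: Chunk[] = [
  { id: "a", text: "Scope 3 emissions reporting requirements" },
  { id: "b", text: "General emissions overview" },
  { id: "c", text: "Unrelated HR policy" },
];

const candidates = broadRetrieve("emissions reporting", corpus, 2); // broad recall
const final = rerank("emissions reporting", candidates, 1); // precision
console.log(final.map((c) => c.id)); // [ 'a' ]
```

With integrated reranking, both stages collapse into one query call instead of two nodes and an extra round trip.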
Proposed Solution
Phase 1: Add Integrated Reranking to Pinecone Vector Store Node
Add "Reranking Configuration" section with:
- Enable Reranking (toggle)
- Rerank Model dropdown:
  - pinecone-rerank-v0 (best accuracy, 512 tokens, 100 docs max)
  - bge-reranker-v2-m3 (multilingual, 1024 tokens, 100 docs max)
  - cohere-rerank-3.5 (balanced, 40K tokens, 200 docs max)
- Top K (initial retrieval): Number of docs to retrieve from index
- Top N (after rerank): Number of docs to return after reranking
- Rank Fields: Array of metadata fields to use for reranking (default: ["text"])
Model-specific parameters:
- truncate (pinecone-rerank-v0, bge-reranker-v2-m3): "END" | "NONE"
- max_chunks_per_doc (cohere-rerank-3.5): integer 1-3072
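One possible shape for the proposed configuration section is sketched below. The interface and field names are hypothetical (not Flowise's actual node schema); the per-model document limits and parameter ranges come from the model list above.

```typescript
type RerankModel =
  | "pinecone-rerank-v0"
  | "bge-reranker-v2-m3"
  | "cohere-rerank-3.5";

// Hypothetical shape of the "Reranking Configuration" section.
interface RerankConfig {
  enabled: boolean;
  model: RerankModel;
  topK: number;         // initial retrieval from the index
  topN: number;         // results kept after reranking
  rankFields: string[]; // metadata fields used for reranking
  // Model-specific options:
  truncate?: "END" | "NONE"; // pinecone-rerank-v0, bge-reranker-v2-m3
  maxChunksPerDoc?: number;  // cohere-rerank-3.5, 1-3072
}

// Per-model document limits from the dropdown list above.
const MAX_DOCS: Record<RerankModel, number> = {
  "pinecone-rerank-v0": 100,
  "bge-reranker-v2-m3": 100,
  "cohere-rerank-3.5": 200,
};

function validate(cfg: RerankConfig): string[] {
  const errors: string[] = [];
  if (cfg.topN > cfg.topK) errors.push("topN must not exceed topK");
  if (cfg.topK > MAX_DOCS[cfg.model])
    errors.push(`topK exceeds ${MAX_DOCS[cfg.model]} docs for ${cfg.model}`);
  return errors;
}

const cfg: RerankConfig = {
  enabled: true,
  model: "pinecone-rerank-v0",
  topK: 50,
  topN: 5,
  rankFields: ["text"],
  truncate: "END",
};
console.log(validate(cfg)); // []
```

Validating topK against the model's document limit in the node itself would surface configuration errors at build time rather than as API failures.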
Phase 2: Enhance Existing Cohere Rerank Retriever
- Update to support Pinecone-hosted Cohere models (no separate API key needed)
- Add option to use Pinecone's cohere-rerank-3.5 vs external Cohere API
- Display reranking scores in debug output
Phase 3: Visual Feedback
- Show reranking metrics in execution logs:
  - Initial retrieval count
  - Reranked result count
  - Average relevance score improvement
- Display per-result rerank scores in debug mode
Mockups or References
Pinecone Reranking Documentation:
- Main guide: https://docs.pinecone.io/guides/search/rerank-results
- Integrated inference: https://www.pinecone.io/blog/integrated-inference/
- TypeScript SDK example: https://github.com/pinecone-io/pinecone-ts-client/blob/main/README.md
Available Models:
- pinecone-rerank-v0: https://docs.pinecone.io/models/pinecone-rerank-v0
- cohere-rerank-3.5: https://docs.pinecone.io/models/cohere-rerank-3.5
- bge-reranker-v2-m3: https://docs.pinecone.io/models/bge-reranker-v2-m3
TypeScript SDK reference:
Pinecone query API reference:
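For reference, a standalone rerank request to Pinecone's hosted inference looks roughly like the payload built below. This is my reading of the docs linked above: the field names (`top_n`, `rank_fields`, `return_documents`, `parameters`) should be verified against the current API reference, and the builder function is illustrative, not part of any SDK.

```typescript
interface RerankRequest {
  model: string;
  query: string;
  documents: { id: string; text: string }[];
  top_n: number;
  rank_fields: string[];
  return_documents: boolean;
  parameters?: Record<string, string | number>;
}

// Build the request body for Pinecone's hosted rerank endpoint
// (field names as I understand them from the docs above).
function buildRerankRequest(
  query: string,
  documents: { id: string; text: string }[],
  model = "bge-reranker-v2-m3",
  topN = 3
): RerankRequest {
  return {
    model,
    query,
    documents,
    top_n: Math.min(topN, documents.length), // can't keep more than we send
    rank_fields: ["text"],
    return_documents: true,
    parameters: { truncate: "END" }, // pinecone-rerank-v0 / bge-reranker-v2-m3 only
  };
}

const body = buildRerankRequest(
  "Which standard covers Scope 3 emissions?",
  [
    { id: "d1", text: "GHG Protocol Scope 3 Standard" },
    { id: "d2", text: "ISO 9001 quality management" },
  ],
  "bge-reranker-v2-m3",
  5
);
console.log(body.top_n); // 2 (capped at the number of documents)
```

The same body would be sent via the TypeScript SDK's inference rerank method or POSTed directly with an Api-Key header.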
Additional Context
Benefits of integrated reranking:
- Improved Accuracy: Up to 60% better search results (per Pinecone benchmarks)
- Single Platform: No need for separate Cohere API setup/billing
- Lower Latency: Reranking happens in the same data centre as the index
- Cost-Effective: Pinecone's pricing includes reranking in their inference costs
- Simplified Architecture: One API call instead of orchestrating multiple services
Use cases:
- Complex agent queries requiring high precision
- Multi-document knowledge bases with diverse content
- Domain-specific RAG (legal, medical, technical standards)
- Question-answering systems where relevance is critical
- Any RAG system with >10 candidate documents per query
Cost consideration:
Reranking is charged per query+document pair. Users should control costs via the Top K limit.
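Because billing is per query+document pair, Top K scales cost linearly. A quick sketch of the arithmetic (billable units only; unit prices are deliberately omitted since they vary by plan):

```typescript
// One billable rerank unit per query-document pair, per the pricing
// model described above.
function rerankUnits(queriesPerDay: number, topK: number): number {
  return queriesPerDay * topK;
}

// Doubling the candidate pool doubles rerank spend:
console.log(rerankUnits(1_000, 50));  // 50000 units/day
console.log(rerankUnits(1_000, 100)); // 100000 units/day
```

This is why exposing Top K as an explicit node setting matters: it is the single knob that trades recall against rerank cost.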
Availability:
- Standard, Enterprise, and Dedicated Pinecone plans
- US region only (as of now)
Existing workarounds:
- Users currently implement reranking via separate Cohere API calls or custom code
- Requires complex multi-node flows
- Misses performance benefits of Pinecone-hosted models