[feat]: implement github contributor recommendation tool #110

smokeyScraper · 2025-07-22T21:49:10Z

closes #96

Interactions

Data present in Weaviate DB

weaviate_data_readable.json

Summary by CodeRabbit

New Features
- Introduced contributor recommendation functionality, allowing users to receive tailored suggestions for potential contributors based on GitHub issues or general technical queries.
- Implemented advanced query alignment and issue summarization to optimize contributor search results.
- Added specialized formatting for contributor recommendations in assistant responses, providing clear summaries, search details, and contributor expertise.
Enhancements
- Expanded prompt instructions and examples for improved intent analysis and response accuracy.
- Improved summarization and embedding of GitHub issue content for more relevant contributor matching.
Bug Fixes
- Corrected an import path for user profiling from GitHub, ensuring proper functionality.


coderabbitai · 2025-07-22T21:49:18Z

Walkthrough

This change introduces a complete contributor recommendation workflow to the DevRel GitHub agent. It adds prompt templates for intent analysis and query alignment, implements a new handler and workflow for contributor recommendation using LLM-based query refinement and hybrid search, and provides a GitHub issue processor for summarization and embedding. Prompt instructions and formatting for contributor recommendations are updated accordingly.

Changes

File(s)	Change Summary
backend/app/agents/devrel/github/github_toolkit.py	Enabled the contributor recommendation handler in the execution path.
backend/app/agents/devrel/github/prompts/contributor_recommendation/issue_summarization.py	Added `ISSUE_SUMMARIZATION_PROMPT` constant for technical summarization of GitHub issues.
backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py	Added `QUERY_ALIGNMENT_PROMPT` constant for aligning contributor search queries.
backend/app/agents/devrel/github/prompts/intent_analysis.py	Expanded prompt content for contributor recommendation intent classification with examples and guidelines.
backend/app/agents/devrel/github/tools/contributor_recommendation.py	Added a new module implementing the contributor recommendation workflow and handler function.
backend/app/agents/devrel/prompts/response_prompt.py	Updated formatting instructions and prompt content for contributor recommendations.
backend/app/api/v1/auth.py	Changed import path for `profile_user_from_github` to correct module.
backend/app/services/github/issue_processor.py	Added `GitHubIssueProcessor` class for fetching, summarizing, and embedding GitHub issue content.
backend/app/services/github/user/profiling.py	Added a new public async `request` method as a stable wrapper for internal GitHub API requests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DevRelAgent
    participant ContributorRecommendationWorkflow
    participant GitHubIssueProcessor
    participant LLM
    participant SearchService

    User->>DevRelAgent: Submit contributor recommendation query
    DevRelAgent->>ContributorRecommendationWorkflow: handle_contributor_recommendation(query)
    ContributorRecommendationWorkflow->>LLM: Align user query (QUERY_ALIGNMENT_PROMPT)
    alt Query contains GitHub issue URL
        ContributorRecommendationWorkflow->>GitHubIssueProcessor: get_embedding_for_issue()
        GitHubIssueProcessor->>GitHub: Fetch issue content & comments
        GitHubIssueProcessor->>LLM: Summarize issue (ISSUE_SUMMARIZATION_PROMPT)
        GitHubIssueProcessor->>EmbeddingService: Generate embedding
        ContributorRecommendationWorkflow->>SearchService: Hybrid search (embedding + keywords)
    else
        ContributorRecommendationWorkflow->>EmbeddingService: Generate embedding for aligned query
        ContributorRecommendationWorkflow->>SearchService: Hybrid search (embedding + keywords)
    end
    SearchService-->>ContributorRecommendationWorkflow: Contributor search results
    ContributorRecommendationWorkflow-->>DevRelAgent: Formatted recommendations
    DevRelAgent-->>User: Display recommendations (special formatting)
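The sequence diagram can be read as a small async orchestration. The sketch below is not the PR's `ContributorRecommendationWorkflow`. It injects the LLM, embedding, and search steps as plain async callables so the branching can run with stubs. The `aligned_query` key is an assumption about the alignment JSON; only `keywords` appears in the review excerpts.

```python
import asyncio
import re
from typing import Any, Awaitable, Callable, Dict, List

# Same pattern the PR's handler uses to detect a GitHub issue URL in a query.
ISSUE_URL_RE = re.compile(r"https?://github\.com/[\w-]+/[\w.-]+/issues/\d+")

Embedding = List[float]


async def recommend_contributors(
    query: str,
    align: Callable[[str], Awaitable[Dict[str, Any]]],
    embed_issue: Callable[[str], Awaitable[Embedding]],
    embed_text: Callable[[str], Awaitable[Embedding]],
    search: Callable[[Embedding, List[str]], Awaitable[List[Dict[str, Any]]]],
) -> Dict[str, Any]:
    """Hypothetical orchestration mirroring the PR's sequence diagram."""
    alignment = await align(query)  # LLM step driven by QUERY_ALIGNMENT_PROMPT
    match = ISSUE_URL_RE.search(query)
    if match:
        # Issue path: fetch issue + comments, summarize, then embed the summary.
        embedding = await embed_issue(match.group(0))
    else:
        # General path: embed the LLM-aligned query ("aligned_query" key is assumed).
        embedding = await embed_text(alignment["aligned_query"])
    results = await search(embedding, alignment.get("keywords", []))
    return {"used_issue": match is not None, "recommendations": results}


async def demo(query: str) -> Dict[str, Any]:
    """Run the workflow with in-memory stubs instead of real services."""

    async def align(q: str) -> Dict[str, Any]:
        return {"aligned_query": "python backend developer", "keywords": ["python"]}

    async def embed_issue(url: str) -> Embedding:
        return [1.0, 0.0]

    async def embed_text(text: str) -> Embedding:
        return [0.0, 1.0]

    async def search(vec: Embedding, keywords: List[str]) -> List[Dict[str, Any]]:
        return [{"username": "example-user", "vector": vec, "keywords": keywords}]

    return await recommend_contributors(query, align, embed_issue, embed_text, search)


print(asyncio.run(demo("who knows Python?")))
```

Passing the services in as callables keeps the branching logic testable without an LLM, GitHub, or Weaviate.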

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~90 minutes

Possibly related PRs

[feat]: add user profile summarizing and generation of embeddings #91: Closely related; adds user profile summarization and embedding generation, supporting the contributor recommendation pipeline.

Suggested reviewers

chandansgowda

Poem

In the warren where code bunnies dwell,
We sniff out experts—oh, so well!
With prompts aligned and issues parsed,
We search for skills, both near and far.
Now contributors hop into view,
Thanks to workflows shiny and new!
🐇✨


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (6)

backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1)

38-38: Remove the trailing empty line.
-Return the JSON object only:"""
-
+Return the JSON object only:"""

backend/app/services/github/issue_processor.py (2)

72-83: Consider more specific exception handling.

The broad except Exception catch could mask specific errors. Consider catching more specific exceptions like ValueError for content issues and network-related exceptions separately.

-        except Exception as e:
-            logger.error(f"Error processing issue {self.owner}/{self.repo}#{self.issue_number}: {str(e)}")
-            raise e
+        except ValueError as e:
+            logger.error(f"Content error processing issue {self.owner}/{self.repo}#{self.issue_number}: {str(e)}")
+            raise
+        except Exception as e:
+            logger.error(f"Unexpected error processing issue {self.owner}/{self.repo}#{self.issue_number}: {str(e)}")
+            raise

63-64: Potential log truncation might hide important information.

The summary is truncated to 100 characters in the log, which might cut off important debugging information.

-        logger.info(f"Generated summary: {response.content.strip()[:100]}")
+        logger.info(f"Generated summary: {response.content.strip()[:200]}...")

backend/app/agents/devrel/github/tools/contributor_recommendation.py (3)

36-43: Add validation for GitHub issue URL extraction.

The regex matches GitHub issue URLs, but fetching a matched URL can still fail (for example, if the issue does not exist or the API call errors). Consider handling that failure and falling back to the original query.

         url_match = re.search(r'https?://github\.com/[\w-]+/[\w.-]+/issues/\d+', query)
 
         if url_match:
-            issue_content = await self._fetch_github_issue_content(url_match.group(0))
-            full_query = f"{query}\n\nIssue content: {issue_content}"
+            try:
+                issue_content = await self._fetch_github_issue_content(url_match.group(0))
+                full_query = f"{query}\n\nIssue content: {issue_content}"
+            except Exception as e:
+                logger.warning(f"Failed to fetch issue content: {e}, proceeding with original query")
+                full_query = query
         else:
             full_query = query
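Going from the matched URL to the owner, repo, and issue number needed for the GitHub API only requires adding capture groups to the same pattern. This is a minimal sketch that assumes only the first issue URL in a query matters. `parse_issue_url` is a hypothetical helper, not the PR's `_fetch_github_issue_content`.

```python
import re
from typing import Optional, Tuple

# The PR's pattern, with capture groups added for owner, repo, and issue number.
ISSUE_URL_RE = re.compile(r"https?://github\.com/([\w-]+)/([\w.-]+)/issues/(\d+)")


def parse_issue_url(text: str) -> Optional[Tuple[str, str, int]]:
    """Hypothetical helper: return (owner, repo, number) for the first issue URL, else None."""
    match = ISSUE_URL_RE.search(text)
    if match is None:
        return None
    owner, repo, number = match.groups()
    return owner, repo, int(number)


print(parse_issue_url(
    "could you please recommend me a few contributors for "
    "https://github.com/AOSSIE-Org/Devr.AI/issues/96"
))
```

Returning `None` instead of raising lets the caller fall back to the plain query path.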

100-106: Document the weight parameters for hybrid search.

The vector and BM25 weights are hardcoded. Consider making them configurable or at least documenting why these specific values were chosen.

+        # Weights optimized for technical contributor matching:
+        # - 0.7 vector weight prioritizes semantic understanding
+        # - 0.3 BM25 weight ensures keyword relevance
         results = await search_contributors(
             query_embedding=query_embedding,
             keywords=alignment_result.get("keywords", []),
             limit=5,
             vector_weight=0.7,  # Semantic similarity
             bm25_weight=0.3     # Keyword matching
         )
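To show what a 0.7/0.3 split does, here is a weighted score fusion over min-max normalized scores. This sketch rests on assumptions. It resembles Weaviate's relative-score hybrid fusion, where an `alpha` of 0.7 would weight the vector side at 70%, but it is not the code behind `search_contributors`. The candidate names and raw scores are made up.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so vector and BM25 scores share one scale."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Every candidate scored the same; treat them all as top matches.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.7,
    bm25_weight: float = 0.3,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Blend normalized scores; a candidate missing from one list scores 0 there."""
    v = min_max(vector_scores)
    b = min_max(bm25_scores)
    combined = {
        key: vector_weight * v.get(key, 0.0) + bm25_weight * b.get(key, 0.0)
        for key in set(v) | set(b)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)[:limit]


print(hybrid_rank({"alice": 0.9, "bob": 0.5, "carol": 0.1}, {"bob": 12.0, "carol": 3.0}))
```

Normalization matters here because BM25 scores are unbounded while cosine similarities are not, so blending raw values would let the keyword side dominate.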

131-135: Improve readability of reason construction.

The reason strings can get long for contributors with many languages or topics; consider capping each list.

             reason_parts = []
             if languages:
-                reason_parts.append(f"Expert in {', '.join(languages)}")
+                reason_parts.append(f"Expert in {', '.join(languages[:3])}")  # Limit to top 3 languages
             if topics:
-                reason_parts.append(f"Active in {', '.join(topics)}")
+                reason_parts.append(f"Active in {', '.join(topics[:3])}")  # Limit to top 3 topics
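Here is a minimal sketch of the capped reason string. It assumes `languages` and `topics` are already ordered by relevance. The `build_reason` name, the `; ` separator, and the fallback text are illustrative choices, not the PR's code.

```python
from typing import List


def build_reason(languages: List[str], topics: List[str], max_items: int = 3) -> str:
    """Hypothetical helper: summarize why a contributor matched, capping each list."""
    parts = []
    if languages:
        parts.append(f"Expert in {', '.join(languages[:max_items])}")
    if topics:
        parts.append(f"Active in {', '.join(topics[:max_items])}")
    # Fallback wording is an assumption for profiles with no extracted metadata.
    return "; ".join(parts) if parts else "Matched on profile similarity"


print(build_reason(["Python", "Go", "Rust", "C"], ["llm"]))
```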

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fdf2d4 and 1bf9e8b.

📒 Files selected for processing (8)

backend/app/agents/devrel/github/github_toolkit.py (2 hunks)
backend/app/agents/devrel/github/prompts/contributor_recommendation/issue_summarization.py (1 hunks)
backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1 hunks)
backend/app/agents/devrel/github/prompts/intent_analysis.py (2 hunks)
backend/app/agents/devrel/github/tools/contributor_recommendation.py (1 hunks)
backend/app/agents/devrel/prompts/response_prompt.py (1 hunks)
backend/app/api/v1/auth.py (1 hunks)
backend/app/services/github/issue_processor.py (1 hunks)

🧠 Learnings (2)

📓 Common learnings

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.

backend/app/agents/devrel/github/github_toolkit.py (2)

Learnt from: smokeyScraper
PR: #72
File: backend/app/agents/devrel/nodes/handle_technical_support_node.py:6-17
Timestamp: 2025-06-08T13:15:40.536Z
Learning: The handle_technical_support_node function in backend/app/agents/devrel/nodes/handle_technical_support_node.py is intentionally minimal and will be extended after database configuration is completed.

Learnt from: smokeyScraper
PR: #72
File: backend/app/agents/devrel/nodes/handle_web_search_node.py:31-42
Timestamp: 2025-06-08T13:31:11.572Z
Learning: In backend/app/agents/devrel/tools/search_tool.py, the TavilySearchTool.search() method has partial error handling for missing API key, AttributeError, ConnectionError, and TimeoutError, but lacks a comprehensive Exception catch-all block, so calling functions may still need additional error handling for other potential exceptions.

🧬 Code Graph Analysis (2)

backend/app/api/v1/auth.py (1)

backend/app/services/github/user/profiling.py (1)

profile_user_from_github (297-329)

backend/app/agents/devrel/github/github_toolkit.py (1)

backend/app/agents/devrel/github/tools/contributor_recommendation.py (1)

handle_contributor_recommendation (81-170)


🔇 Additional comments (10)

backend/app/api/v1/auth.py (1)

5-5: Import path update looks correct.

The function profile_user_from_github has been correctly relocated to the new module path, aligning with the GitHub services restructuring.

backend/app/agents/devrel/github/prompts/contributor_recommendation/issue_summarization.py (1)

1-20: Well-structured prompt for issue summarization.

The prompt effectively guides technical analysis of GitHub issues for contributor search. It covers all essential aspects (technologies, skills, components) and the "job requirement" approach is a clever way to optimize for contributor matching.

backend/app/agents/devrel/github/github_toolkit.py (2)

9-9: Import activation looks correct.

The import for handle_contributor_recommendation is properly uncommented to enable the contributor recommendation functionality.

105-105: Function call implementation is correct.

The placeholder has been properly replaced with an asynchronous call to handle_contributor_recommendation(query), which integrates seamlessly with the existing toolkit pattern.

backend/app/agents/devrel/github/prompts/intent_analysis.py (2)

5-5: Enhanced description clarifies functionality scope.

The updated description clearly indicates support for both GitHub issue URLs and general contributor queries, providing better guidance for intent classification.

15-21: Comprehensive examples improve classification accuracy.

The detailed examples cover all major contributor recommendation scenarios (PR reviews, expertise finding, assignees, URLs, general help) which should significantly improve intent classification accuracy.

backend/app/agents/devrel/prompts/response_prompt.py (3)

29-29: Improved emoji instruction flexibility.

Changing from specific emoji examples to general guidance allows for more contextual emoji usage while maintaining visual appeal.

33-39: Comprehensive formatting guidelines for contributor recommendations.

The detailed formatting instructions ensure consistent presentation of contributor recommendation results, covering all key information (search details, scores, expertise, metadata) in a user-friendly format.

48-48: Clear instruction for special formatting.

The explicit instruction (#7) ensures the special contributor recommendation formatting will be applied when relevant, completing the integration.

backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1)

1-37: Well-structured prompt template with clear instructions.

The prompt effectively guides the LLM to extract technical requirements and generate structured search queries. The examples cover different scenarios, and the formatting rules ensure clean JSON output.

backend/app/agents/devrel/github/tools/contributor_recommendation.py

backend/app/services/github/issue_processor.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

backend/app/agents/devrel/github/tools/contributor_recommendation.py (1)
48-48: Move import to module level.

The json import should be moved to the top of the file with other imports rather than being imported inside the method.
+import json
 import logging
 import re
 from typing import Any, Dict
Then remove the import from inside the method:
         try:
-            import json
             result = json.loads(response.content.strip())
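With `json` imported at module level, the parsing step can also guard against replies that ignore "Return the JSON object only", such as a reply wrapped in a markdown fence. This is a hedged sketch. The fallback shape and the `aligned_query` key are assumptions.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # a markdown code fence, built here so this snippet stays fence-safe


def parse_alignment(raw: str, fallback_query: str) -> Dict[str, Any]:
    """Parse the LLM's JSON reply, falling back to the original query on failure."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Models sometimes wrap JSON in a markdown fence despite the prompt's instruction.
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        result = json.loads(text)
    except json.JSONDecodeError:
        return {"aligned_query": fallback_query, "keywords": []}
    if not isinstance(result, dict):
        return {"aligned_query": fallback_query, "keywords": []}
    # Fill in keys the downstream search expects (key names are assumed).
    result.setdefault("aligned_query", fallback_query)
    result.setdefault("keywords", [])
    return result


print(parse_alignment('{"aligned_query": "rust async", "keywords": ["tokio"]}', "q"))
```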

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bf9e8b and 48b8b27.

📒 Files selected for processing (5)

backend/app/agents/devrel/github/github_toolkit.py (2 hunks)
backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1 hunks)
backend/app/agents/devrel/github/tools/contributor_recommendation.py (1 hunks)
backend/app/services/github/issue_processor.py (1 hunks)
backend/app/services/github/user/profiling.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py

🚧 Files skipped from review as they are similar to previous changes (2)

backend/app/agents/devrel/github/github_toolkit.py
backend/app/services/github/issue_processor.py

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#110
File: backend/app/agents/devrel/github/tools/contributor_recommendation.py:49-49
Timestamp: 2025-07-25T23:11:27.301Z
Learning: In the Devr.AI project, smokeyScraper prefers to keep debug print statements (like print(response)) during the development phase for easier troubleshooting, with plans to remove them when moving to production rather than immediately cleaning them up.

Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.

backend/app/agents/devrel/github/tools/contributor_recommendation.py (6)

Learnt from: smokeyScraper
PR: #110
File: backend/app/agents/devrel/github/tools/contributor_recommendation.py:49-49
Timestamp: 2025-07-25T23:11:27.301Z
Learning: In the Devr.AI project, smokeyScraper prefers to keep debug print statements (like print(response)) during the development phase for easier troubleshooting, with plans to remove them when moving to production rather than immediately cleaning them up.

Learnt from: smokeyScraper
PR: #85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.

Learnt from: smokeyScraper
PR: #85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.

Learnt from: smokeyScraper
PR: #87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.

Learnt from: smokeyScraper
PR: #90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.

Learnt from: smokeyScraper
PR: #85
File: backend/app/services/auth/management.py:83-83
Timestamp: 2025-06-28T14:44:34.399Z
Learning: In the backend/app/services/auth/management.py file, the team prefers to use datetime.now() (local timezone/IST) during development and testing for easier debugging, with plans to change to UTC timezone-aware datetime (datetime.now(timezone.utc)) during deployment.

🔇 Additional comments (4)

backend/app/services/github/user/profiling.py (1)

77-79: LGTM! Clean public API wrapper.

The new request method provides a clean public interface to the internal HTTP functionality. The implementation is straightforward and follows good encapsulation practices by wrapping the private _make_request method.
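The wrapper pattern amounts to a thin public coroutine that delegates to the private helper. In this sketch, the class name and the `(url, params)` signature of `_make_request` are assumptions. The stub returns its inputs instead of calling GitHub.

```python
import asyncio
from typing import Any, Dict, Optional


class ProfilerSketch:
    """Hypothetical stand-in for the profiling class; only the wrapper pattern matters."""

    async def _make_request(self, url: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # A real client would perform an HTTP GET here; this stub echoes its inputs.
        return {"url": url, "params": params or {}}

    async def request(self, url: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Stable public entry point, so other services need not call the private helper."""
        return await self._make_request(url, params)


print(asyncio.run(ProfilerSketch().request("https://api.github.com/rate_limit")))
```

Callers outside the class, such as an issue processor, depend only on `request`, so the private helper can change without breaking them.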

backend/app/agents/devrel/github/tools/contributor_recommendation.py (3)

1-28: LGTM! Well-structured imports and initialization.

The imports are clean and properly organized. The class initialization correctly sets up the LLM with appropriate temperature (0.1) for consistent query alignment tasks and initializes the embedding service.

62-80: LGTM! Robust GitHub issue content fetching.

The method properly parses GitHub issue URLs, validates the structure, and handles errors appropriately. The URL parsing logic correctly extracts the owner, repository, and issue number components.
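For reference, the GitHub REST API serves an issue at `/repos/{owner}/{repo}/issues/{issue_number}` and its comments at that path plus `/comments`. The sketch below builds those URLs and flattens an issue and its comments into one text for summarization. The helper names and the `max_comments` cap are hypothetical and not taken from `GitHubIssueProcessor`.

```python
from typing import Any, Dict, List

API_ROOT = "https://api.github.com"


def issue_endpoints(owner: str, repo: str, number: int) -> Dict[str, str]:
    """Hypothetical helper: REST URLs for an issue and its comments."""
    base = f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}"
    return {"issue": base, "comments": f"{base}/comments"}


def issue_text(issue: Dict[str, Any], comments: List[Dict[str, Any]], max_comments: int = 10) -> str:
    """Flatten title, body, and the first few comments for the summarization prompt."""
    # GitHub returns "body": null for empty issues/comments, hence the `or ''`.
    parts = [f"Title: {issue.get('title', '')}", f"Body: {issue.get('body') or ''}"]
    for comment in comments[:max_comments]:
        parts.append(f"Comment: {comment.get('body') or ''}")
    return "\n".join(parts)


print(issue_endpoints("AOSSIE-Org", "Devr.AI", 96)["comments"])
```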

81-171: LGTM! Comprehensive and well-structured main workflow.

The main handler function is excellently implemented with:

Clear workflow steps: Query alignment → embedding generation → hybrid search → result formatting

Comprehensive logging: Each step is logged for debugging and monitoring

Robust error handling: Proper try/except with structured error responses

Rich metadata: Detailed scoring, search parameters, and result statistics

Graceful empty results: Appropriate handling when no contributors are found

The hybrid search configuration (70% semantic, 30% keyword) is a sensible default, and the result formatting provides valuable insights with multiple scoring metrics.
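Here is a sketch of the result-shaping step with an explicit empty-result branch. The response keys (`status`, `metadata`, `hybrid_score`, and so on) are hypothetical, because the PR's actual response schema is not shown here.

```python
from typing import Any, Dict, List


def format_recommendations(
    query: str,
    results: List[Dict[str, Any]],
    vector_weight: float = 0.7,
    bm25_weight: float = 0.3,
) -> Dict[str, Any]:
    """Hypothetical formatter: attach search metadata and handle empty results."""
    metadata = {
        "query": query,
        "vector_weight": vector_weight,
        "bm25_weight": bm25_weight,
        "total_found": len(results),
    }
    if not results:
        return {
            "status": "success",
            "recommendations": [],
            "message": "No matching contributors found.",
            "metadata": metadata,
        }
    # Highest combined score first; missing scores sort last.
    ranked = sorted(results, key=lambda r: r.get("hybrid_score", 0.0), reverse=True)
    recommendations = [
        {"username": r["username"], "hybrid_score": round(r.get("hybrid_score", 0.0), 3)}
        for r in ranked
    ]
    return {"status": "success", "recommendations": recommendations, "metadata": metadata}


print(format_recommendations("security expert", []))
```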

smokeyScraper · 2025-07-26T23:53:01Z

hey @chandansgowda, could you please review and merge?

I have addressed the minor changes suggested by CodeRabbit. We can now run a vector search to recommend contributors for any issue :)
Currently, the system returns each contributor's GitHub username. This could be changed to a GitHub profile URL or to a Discord ID (so they can be tagged). Switching to a GitHub URL is easy, but tagging in Discord requires either adding a Discord ID field in Weaviate or making a cross-DB call.
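Here is a minimal sketch of the two options. It assumes a hypothetical `discord_id` value is available, which today would need either a new Weaviate field or a cross-DB lookup. Discord renders `<@USER_ID>` in a message as a mention of that user.

```python
from typing import Optional


def contributor_reference(username: str, discord_id: Optional[str] = None) -> str:
    """Hypothetical helper: turn a recommended GitHub username into an actionable reference."""
    profile_url = f"https://github.com/{username}"
    if discord_id:
        # <@USER_ID> is Discord's user-mention syntax; it pings the user when posted.
        return f"<@{discord_id}> ({profile_url})"
    return profile_url


print(contributor_reference("smokeyScraper"))
```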

chandansgowda

@smokeyScraper Can we use CODEOWNERS for this?

smokeyScraper · 2025-07-27T10:32:28Z

@chandansgowda
I guess not; the two serve very different use cases.
As I understand it, CODEOWNERS is meant for cases where, say, part of a codebase was written by a particular maintainer, so any PR touching that region should be reviewed by that maintainer or by the maintainers listed in the config file.

This handler, by contrast, recommends contributors to whom a specific issue can be assigned. Say an issue centers on a security vulnerability: the handler will recommend the top K contributors involved with that topic. Their expertise is scraped from their PRs, profile, repos, and so on, condensed into a concise summary, and matched by vector similarity search against a concise summary of the issue description. It also covers cases like someone on Discord wanting to connect with experts in a domain: it simply retrieves the top K people in the org aligned with that tech stack.
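The core of that matching is a nearest-neighbour search over embeddings. Below is a brute-force sketch using cosine similarity. It assumes the issue summary and each contributor profile have already been embedded. The two-dimensional vectors and names are invented, and Weaviate would use an approximate index rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(issue_vec: Sequence[float], profiles: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank every profile against the issue embedding and keep the best k."""
    scored = [(name, cosine(issue_vec, vec)) for name, vec in profiles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


profiles = {"security-dev": [0.9, 0.1], "frontend-dev": [0.0, 1.0], "generalist": [0.5, 0.5]}
print(top_k([1.0, 0.0], profiles, k=2))
```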

The system is designed for queries like:
could you please recommend me a few contributors for https://github.com/AOSSIE-Org/Devr.AI/issues/96

A few problems to be addressed:

Data staleness (users who have left the org or stopped contributing): an expiry on GitHub authentication might work, treating users who don't re-authenticate as probably inactive, but this requires a separate background service.
Recommended users who are unwilling to take on work or already have assigned issues: we could add opt-out logic and perhaps keep the full list of contributors in a queue data structure.
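Both open problems could be approached roughly as follows. This is a hypothetical sketch, not part of the PR. The `last_auth` timestamps stand in for re-authentication times, the 90-day window and `max_open` limit are arbitrary, and the queue simply rotates through ranked contributors, skipping opted-out or overloaded ones.

```python
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional, Set


def active_candidates(
    ranked: Iterable[str],
    last_auth: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=90),
) -> List[str]:
    """Keep ranked users who re-authenticated recently (hypothetical staleness rule)."""
    return [u for u in ranked if u in last_auth and now - last_auth[u] <= max_age]


class AssignmentQueue:
    """Round-robin over contributors, skipping opted-out or busy ones (hypothetical)."""

    def __init__(self, users: Iterable[str]) -> None:
        self._queue = deque(users)

    def next_available(
        self,
        opted_out: Set[str],
        open_assignments: Dict[str, int],
        max_open: int = 2,
    ) -> Optional[str]:
        for _ in range(len(self._queue)):
            user = self._queue[0]
            self._queue.rotate(-1)  # move the head to the back so the next call starts after it
            if user not in opted_out and open_assignments.get(user, 0) < max_open:
                return user
        return None  # everyone is opted out or at capacity


now = datetime(2025, 7, 27, tzinfo=timezone.utc)
last_auth = {"alice": now - timedelta(days=10), "bob": now - timedelta(days=200)}
queue = AssignmentQueue(active_candidates(["alice", "bob", "carol"], last_auth, now))
print(queue.next_available(opted_out=set(), open_assignments={"alice": 0}))
```

Rotating the queue spreads assignments across contributors instead of always returning the top-ranked match.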

smokeyScraper · 2025-07-27T10:39:27Z

Even CODEOWNERS works manually, with its config maintained by hand, and it faces the same problems for PRs that our system has. But yeah, we can probably handle the data staleness issue later.

When a PR affects files covered by these patterns, all relevant owners (users or teams) are automatically requested for review, regardless of their existing workload or current review queue.

chandansgowda

Got it. Good to go.

smokeyScraper added 4 commits July 23, 2025 03:00

[feat]: update intent analysis and response prompt to better support contributor recommendation tool (400b5eb)

[refactor]: migrate user profiling logic to github/user (c8bacc7)

[feat]: implement github issue processor to scraper issue body, summarize and embed it (35fd68b)

[feat]: changes to implement user background async profiling after github authorization (52d4e52)

coderabbitai bot reviewed Jul 22, 2025

backend/app/agents/devrel/github/tools/contributor_recommendation.py (resolved)

backend/app/agents/devrel/github/tools/contributor_recommendation.py (outdated, resolved)

backend/app/services/github/issue_processor.py (outdated, resolved)

smokeyScraper added 2 commits July 27, 2025 05:08

[feat]: add contributor recommendation tool (5cfa38f)

[refactor]: add public wrapper for private _make_request (48b8b27)

smokeyScraper force-pushed the contributor_recommendation_tool branch from 1bf9e8b to 48b8b27 July 26, 2025 23:43

coderabbitai bot reviewed Jul 26, 2025

smokeyScraper requested a review from chandansgowda July 26, 2025 23:54

chandansgowda reviewed Jul 27, 2025

smokeyScraper requested a review from chandansgowda July 27, 2025 10:42

chandansgowda approved these changes Jul 27, 2025

chandansgowda merged commit a1159d7 into AOSSIE-Org:main Jul 27, 2025
1 check passed

coderabbitai bot mentioned this pull request Jul 30, 2025: Update prompt.py #114 (Closed)

This was referenced Aug 24, 2025:
- Added the HIL Feature Using LangChains interrupt() command #130 (Closed)
- Add GitHub MCP microservice for repository queries #131 (Merged)

coderabbitai bot mentioned this pull request Sep 8, 2025: feat: Added a Basic Authentication System #123 (Closed)

coderabbitai bot mentioned this pull request Oct 16, 2025: [feat]: add codegraph support by FalkorDB for repo indexing #138 (Merged)
