[feat]: implement github contributor recommendation tool #110
…contributor recommendation tool
…rize and embed it
…thub authorization
Walkthrough
This change introduces a complete contributor recommendation workflow to the DevRel GitHub agent. It adds prompt templates for intent analysis and query alignment, implements a new handler and workflow for contributor recommendation using LLM-based query refinement and hybrid search, and provides a GitHub issue processor for summarization and embedding. Prompt instructions and formatting for contributor recommendations are updated accordingly.
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant DevRelAgent
participant ContributorRecommendationWorkflow
participant GitHubIssueProcessor
participant LLM
participant SearchService
User->>DevRelAgent: Submit contributor recommendation query
DevRelAgent->>ContributorRecommendationWorkflow: handle_contributor_recommendation(query)
ContributorRecommendationWorkflow->>LLM: Align user query (QUERY_ALIGNMENT_PROMPT)
alt Query contains GitHub issue URL
ContributorRecommendationWorkflow->>GitHubIssueProcessor: get_embedding_for_issue()
GitHubIssueProcessor->>GitHub: Fetch issue content & comments
GitHubIssueProcessor->>LLM: Summarize issue (ISSUE_SUMMARIZATION_PROMPT)
GitHubIssueProcessor->>EmbeddingService: Generate embedding
ContributorRecommendationWorkflow->>SearchService: Hybrid search (embedding + keywords)
else
ContributorRecommendationWorkflow->>EmbeddingService: Generate embedding for aligned query
ContributorRecommendationWorkflow->>SearchService: Hybrid search (embedding + keywords)
end
SearchService-->>ContributorRecommendationWorkflow: Contributor search results
ContributorRecommendationWorkflow-->>DevRelAgent: Formatted recommendations
DevRelAgent-->>User: Display recommendations (special formatting)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~90 minutes
Possibly related PRs
Suggested reviewers
Poem
Actionable comments posted: 3
🧹 Nitpick comments (6)
backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1)
38-38: Remove the trailing empty line.
-Return the JSON object only:"""
-
+Return the JSON object only:"""
backend/app/services/github/issue_processor.py (2)
72-83: Consider more specific exception handling.
The broad except Exception catch could mask specific errors. Consider catching more specific exceptions like ValueError for content issues and network-related exceptions separately.
-        except Exception as e:
-            logger.error(f"Error processing issue {self.owner}/{self.repo}#{self.issue_number}: {str(e)}")
-            raise e
+        except ValueError as e:
+            logger.error(f"Content error processing issue {self.owner}/{self.repo}#{self.issue_number}: {str(e)}")
+            raise
+        except Exception as e:
+            logger.error(f"Unexpected error processing issue {self.owner}/{self.repo}#{self.issue_number}: {str(e)}")
+            raise
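The pattern suggested in this comment can be sketched on its own, as follows. This is a minimal illustration, not the code in issue_processor.py: `summarize_issue` and `process_issue` are hypothetical stand-ins, and the only point is the split between an expected content error and everything else, with a bare `raise` so the original traceback is preserved.

```python
import logging

logger = logging.getLogger(__name__)


def summarize_issue(content: str) -> str:
    """Hypothetical stand-in for the processor's summarization step."""
    if not content.strip():
        raise ValueError("issue has no content")
    return content.strip()[:200]


def process_issue(owner: str, repo: str, number: int, content: str) -> str:
    """Summarize an issue, logging content errors separately from other failures."""
    ref = f"{owner}/{repo}#{number}"
    try:
        return summarize_issue(content)
    except ValueError as e:
        # Expected failure mode: empty or unusable issue content.
        logger.error("Content error processing issue %s: %s", ref, e)
        raise  # bare raise re-raises the same exception with its traceback
    except Exception as e:
        # Anything else (network, API, or LLM errors) is logged as unexpected.
        logger.error("Unexpected error processing issue %s: %s", ref, e)
        raise
```

Callers still see the original exception type, so they can decide whether a content error is worth retrying.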
63-64: Potential log truncation might hide important information.
The summary is truncated to 100 characters in the log, which might cut off important debugging information.
-        logger.info(f"Generated summary: {response.content.strip()[:100]}")
+        logger.info(f"Generated summary: {response.content.strip()[:200]}...")
backend/app/agents/devrel/github/tools/contributor_recommendation.py (3)
36-43: Add validation for GitHub issue URL extraction.
The regex matches GitHub issue URLs but doesn't validate if the URL is properly formed. Consider adding validation after extraction.
         url_match = re.search(r'https?://github\.com/[\w-]+/[\w.-]+/issues/\d+', query)
         if url_match:
-            issue_content = await self._fetch_github_issue_content(url_match.group(0))
-            full_query = f"{query}\n\nIssue content: {issue_content}"
+            try:
+                issue_content = await self._fetch_github_issue_content(url_match.group(0))
+                full_query = f"{query}\n\nIssue content: {issue_content}"
+            except Exception as e:
+                logger.warning(f"Failed to fetch issue content: {e}, proceeding with original query")
+                full_query = query
         else:
             full_query = query
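One way to validate the extracted URL is to capture its parts while matching. The sketch below reuses the pattern from the diff above and adds capture groups; the helper name `extract_issue_ref` is an assumption made for illustration and does not appear in the PR.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the reviewed code, with capture groups added for illustration.
ISSUE_URL_RE = re.compile(r"https?://github\.com/([\w-]+)/([\w.-]+)/issues/(\d+)")


def extract_issue_ref(query: str) -> Optional[Tuple[str, str, int]]:
    """Return (owner, repo, issue_number) for the first issue URL in query, or None."""
    match = ISSUE_URL_RE.search(query)
    if match is None:
        return None
    owner, repo, number = match.groups()
    return owner, repo, int(number)
```

Returning `None` when no URL is present lets the caller fall back to the plain query without a separate check.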
100-106: Document the weight parameters for hybrid search.
The vector and BM25 weights are hardcoded. Consider making them configurable or at least documenting why these specific values were chosen.
+        # Weights optimized for technical contributor matching:
+        # - 0.7 vector weight prioritizes semantic understanding
+        # - 0.3 BM25 weight ensures keyword relevance
         results = await search_contributors(
             query_embedding=query_embedding,
             keywords=alignment_result.get("keywords", []),
             limit=5,
             vector_weight=0.7,  # Semantic similarity
             bm25_weight=0.3  # Keyword matching
         )
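To make the effect of the two weights concrete, here is a sketch of a weighted score fusion. It is not how `search_contributors` is implemented (the PR page does not show that). It assumes both scores are already normalized to [0, 1] and simply combines them with the 0.7 / 0.3 split used above; `fuse_scores` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    vector_weight: float = 0.7,
    bm25_weight: float = 0.3,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidates by a weighted sum of two scores assumed to lie in [0, 1]."""
    candidates = set(vector_scores) | set(bm25_scores)
    fused = {
        name: vector_weight * vector_scores.get(name, 0.0)
        + bm25_weight * bm25_scores.get(name, 0.0)
        for name in candidates
    }
    # Highest fused score first, truncated to the requested number of results.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:limit]
```

With these defaults, a candidate found only by keyword matching can score at most 0.3, so semantic similarity dominates the ranking.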
131-135: Improve readability of reason construction.
The reason construction logic could be more readable.
         reason_parts = []
         if languages:
-            reason_parts.append(f"Expert in {', '.join(languages)}")
+            reason_parts.append(f"Expert in {', '.join(languages[:3])}")  # Limit to top 3 languages
         if topics:
-            reason_parts.append(f"Active in {', '.join(topics)}")
+            reason_parts.append(f"Active in {', '.join(topics[:3])}")  # Limit to top 3 topics
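Pulled out into a helper, the reason construction might look like the sketch below. The function name, the `" | "` separator, and the fallback text are assumptions for illustration; only the two f-strings and the top-3 limit come from the suggestion above.

```python
from typing import List


def build_reason(languages: List[str], topics: List[str], max_items: int = 3) -> str:
    """Build a short human-readable reason from a contributor's languages and topics."""
    reason_parts = []
    if languages:
        reason_parts.append(f"Expert in {', '.join(languages[:max_items])}")
    if topics:
        reason_parts.append(f"Active in {', '.join(topics[:max_items])}")
    # The separator and the fallback text are illustrative assumptions.
    return " | ".join(reason_parts) if reason_parts else "Relevant contributor"
```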
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- backend/app/agents/devrel/github/github_toolkit.py (2 hunks)
- backend/app/agents/devrel/github/prompts/contributor_recommendation/issue_summarization.py (1 hunks)
- backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1 hunks)
- backend/app/agents/devrel/github/prompts/intent_analysis.py (2 hunks)
- backend/app/agents/devrel/github/tools/contributor_recommendation.py (1 hunks)
- backend/app/agents/devrel/prompts/response_prompt.py (1 hunks)
- backend/app/api/v1/auth.py (1 hunks)
- backend/app/services/github/issue_processor.py (1 hunks)
🧠 Learnings (2)
📓 Common learnings
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.
backend/app/agents/devrel/github/github_toolkit.py (2)
Learnt from: smokeyScraper
PR: #72
File: backend/app/agents/devrel/nodes/handle_technical_support_node.py:6-17
Timestamp: 2025-06-08T13:15:40.536Z
Learning: The handle_technical_support_node function in backend/app/agents/devrel/nodes/handle_technical_support_node.py is intentionally minimal and will be extended after database configuration is completed.
Learnt from: smokeyScraper
PR: #72
File: backend/app/agents/devrel/nodes/handle_web_search_node.py:31-42
Timestamp: 2025-06-08T13:31:11.572Z
Learning: In backend/app/agents/devrel/tools/search_tool.py, the TavilySearchTool.search() method has partial error handling for missing API key, AttributeError, ConnectionError, and TimeoutError, but lacks a comprehensive Exception catch-all block, so calling functions may still need additional error handling for other potential exceptions.
🧬 Code Graph Analysis (2)
backend/app/api/v1/auth.py (1)
backend/app/services/github/user/profiling.py (1)
profile_user_from_github(297-329)
backend/app/agents/devrel/github/github_toolkit.py (1)
backend/app/agents/devrel/github/tools/contributor_recommendation.py (1)
handle_contributor_recommendation(81-170)
🔇 Additional comments (10)
backend/app/api/v1/auth.py (1)
5-5: Import path update looks correct.
The function profile_user_from_github has been correctly relocated to the new module path, aligning with the GitHub services restructuring.
backend/app/agents/devrel/github/prompts/contributor_recommendation/issue_summarization.py (1)
1-20: Well-structured prompt for issue summarization.
The prompt effectively guides technical analysis of GitHub issues for contributor search. It covers all essential aspects (technologies, skills, components) and the "job requirement" approach is a clever way to optimize for contributor matching.
backend/app/agents/devrel/github/github_toolkit.py (2)
9-9: Import activation looks correct.
The import for handle_contributor_recommendation is properly uncommented to enable the contributor recommendation functionality.
105-105: Function call implementation is correct.
The placeholder has been properly replaced with an asynchronous call to handle_contributor_recommendation(query), which integrates seamlessly with the existing toolkit pattern.
backend/app/agents/devrel/github/prompts/intent_analysis.py (2)
5-5: Enhanced description clarifies functionality scope.
The updated description clearly indicates support for both GitHub issue URLs and general contributor queries, providing better guidance for intent classification.
15-21: Comprehensive examples improve classification accuracy.
The detailed examples cover all major contributor recommendation scenarios (PR reviews, expertise finding, assignees, URLs, general help) which should significantly improve intent classification accuracy.
backend/app/agents/devrel/prompts/response_prompt.py (3)
29-29: Improved emoji instruction flexibility.
Changing from specific emoji examples to general guidance allows for more contextual emoji usage while maintaining visual appeal.
33-39: Comprehensive formatting guidelines for contributor recommendations.
The detailed formatting instructions ensure consistent presentation of contributor recommendation results, covering all key information (search details, scores, expertise, metadata) in a user-friendly format.
48-48: Clear instruction for special formatting.
The explicit instruction (#7) ensures the special contributor recommendation formatting will be applied when relevant, completing the integration.
backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1)
1-37: Well-structured prompt template with clear instructions.
The prompt effectively guides the LLM to extract technical requirements and generate structured search queries. The examples cover different scenarios, and the formatting rules ensure clean JSON output.
1bf9e8b to
48b8b27
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/agents/devrel/github/tools/contributor_recommendation.py (1)
48-48: Move import to module level.
The json import should be moved to the top of the file with other imports rather than being imported inside the method.
+import json
 import logging
 import re
 from typing import Any, Dict
Then remove the import from inside the method:
         try:
-            import json
             result = json.loads(response.content.strip())
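With the import at module level, parsing the model's reply could be sketched as below. This is a sketch under stated assumptions: `parse_alignment` is a hypothetical helper, the `keywords` key is taken from the reviewed code, and the `aligned_query` key and the fallback shape are assumptions rather than the PR's actual schema.

```python
import json
from typing import Any, Dict


def parse_alignment(raw: str, original_query: str) -> Dict[str, Any]:
    """Parse the model's JSON reply, falling back to the raw query on failure."""
    try:
        result = json.loads(raw.strip())
    except json.JSONDecodeError:
        result = None
    if not isinstance(result, dict):
        # Fallback shape is an assumption for this sketch.
        return {"aligned_query": original_query, "keywords": []}
    return result
```

The `isinstance` check covers replies that are valid JSON but not an object, such as a bare string or list.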
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/app/agents/devrel/github/github_toolkit.py (2 hunks)
- backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py (1 hunks)
- backend/app/agents/devrel/github/tools/contributor_recommendation.py (1 hunks)
- backend/app/services/github/issue_processor.py (1 hunks)
- backend/app/services/github/user/profiling.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- backend/app/agents/devrel/github/prompts/contributor_recommendation/query_alignment.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app/agents/devrel/github/github_toolkit.py
- backend/app/services/github/issue_processor.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#110
File: backend/app/agents/devrel/github/tools/contributor_recommendation.py:49-49
Timestamp: 2025-07-25T23:11:27.301Z
Learning: In the Devr.AI project, smokeyScraper prefers to keep debug print statements (like print(response)) during the development phase for easier troubleshooting, with plans to remove them when moving to production rather than immediately cleaning them up.
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.
backend/app/agents/devrel/github/tools/contributor_recommendation.py (6)
Learnt from: smokeyScraper
PR: #110
File: backend/app/agents/devrel/github/tools/contributor_recommendation.py:49-49
Timestamp: 2025-07-25T23:11:27.301Z
Learning: In the Devr.AI project, smokeyScraper prefers to keep debug print statements (like print(response)) during the development phase for easier troubleshooting, with plans to remove them when moving to production rather than immediately cleaning them up.
Learnt from: smokeyScraper
PR: #85
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T14:45:55.244Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer comprehensive test refactoring to separate PRs/efforts when doing major backend restructuring, rather than expanding the scope of the current refactoring PR to include test updates.
Learnt from: smokeyScraper
PR: #85
File: backend/app/services/auth/management.py:32-33
Timestamp: 2025-06-28T14:44:36.819Z
Learning: In the Devr.AI project, smokeyScraper prefers using machine timezone (IST) for datetime operations during development and testing for easier debugging, with plans to switch to UTC for deployment later.
Learnt from: smokeyScraper
PR: #87
File: tests/test_supabase.py:1-3
Timestamp: 2025-06-28T23:15:13.374Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer test updates and fixes (like missing imports after module reorganization) to separate PRs rather than expanding the scope of module update/chore PRs to include comprehensive test refactoring.
Learnt from: smokeyScraper
PR: #90
File: backend/app/agents/devrel/nodes/react_supervisor.py:97-101
Timestamp: 2025-07-05T04:33:39.840Z
Learning: In the Devr.AI project, smokeyScraper prefers to defer code deduplication refactoring (like extracting duplicate functions to shared utilities) until there are more common functionalities present among tools/workflow. With only two files using the same function, they consider it not a problem currently and prefer to "align later in a more better way" once more patterns emerge.
Learnt from: smokeyScraper
PR: #85
File: backend/app/services/auth/management.py:83-83
Timestamp: 2025-06-28T14:44:34.399Z
Learning: In the backend/app/services/auth/management.py file, the team prefers to use datetime.now() (local timezone/IST) during development and testing for easier debugging, with plans to change to UTC timezone-aware datetime (datetime.now(timezone.utc)) during deployment.
🔇 Additional comments (4)
backend/app/services/github/user/profiling.py (1)
77-79: LGTM! Clean public API wrapper.
The new request method provides a clean public interface to the internal HTTP functionality. The implementation is straightforward and follows good encapsulation practices by wrapping the private _make_request method.
backend/app/agents/devrel/github/tools/contributor_recommendation.py (3)
1-28: LGTM! Well-structured imports and initialization.
The imports are clean and properly organized. The class initialization correctly sets up the LLM with appropriate temperature (0.1) for consistent query alignment tasks and initializes the embedding service.
62-80: LGTM! Robust GitHub issue content fetching.
The method properly parses GitHub issue URLs, validates the structure, and handles errors appropriately. The URL parsing logic correctly extracts the owner, repository, and issue number components.
81-171: LGTM! Comprehensive and well-structured main workflow.
The main handler function is excellently implemented with:
- Clear workflow steps: Query alignment → embedding generation → hybrid search → result formatting
- Comprehensive logging: Each step is logged for debugging and monitoring
- Robust error handling: Proper try/catch with structured error responses
- Rich metadata: Detailed scoring, search parameters, and result statistics
- Graceful empty results: Appropriate handling when no contributors are found
The hybrid search configuration (70% semantic, 30% keyword) is a sensible default, and the result formatting provides valuable insights with multiple scoring metrics.
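The workflow steps listed above (query alignment, embedding generation, hybrid search, result formatting) can be outlined roughly as follows. This is a structural sketch only: the three steps are passed in as callables so the example stays self-contained, and the function name, the `aligned_query` key, and the response fields are assumptions, not the handler's real signature or schema.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical step signatures; the real services are not shown on this page.
Align = Callable[[str], Awaitable[Dict[str, Any]]]
Embed = Callable[[str], Awaitable[List[float]]]
Search = Callable[[List[float], List[str]], Awaitable[List[Dict[str, Any]]]]


async def recommend(query: str, align: Align, embed: Embed, search: Search) -> Dict[str, Any]:
    """Run alignment, embedding and search in order, returning a structured result."""
    try:
        alignment = await align(query)  # 1. query alignment
        embedding = await embed(alignment["aligned_query"])  # 2. embedding generation
        results = await search(embedding, alignment.get("keywords", []))  # 3. hybrid search
    except Exception as e:
        # Structured error response instead of propagating the exception.
        return {"status": "error", "message": str(e)}
    if not results:
        # Graceful handling of an empty result set.
        return {"status": "success", "recommendations": [], "message": "No contributors found"}
    return {"status": "success", "recommendations": results, "total_found": len(results)}
```

Injecting the steps also makes the flow easy to exercise with stubs, without calling an LLM or a vector database.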
hey @chandansgowda, could you please review and merge? I have addressed the minor changes suggested by CodeRabbit. Now we can do a vector search to recommend contributors for any issue :)
chandansgowda
left a comment
@smokeyScraper Can we use CODEOWNERS for this?
@chandansgowda Compared to CODEOWNERS, this handler is meant to recommend contributors to whom we can assign a specific issue. Say an issue is mostly about a security vulnerability in the code: the handler will recommend the top K contributors involved in that topic. Their expertise is scraped from their PRs, profile, repos, and so on, then compressed into a concise form so we can run a vector similarity search against a concise form of the issue description. We can also use it when, say, someone on Discord wants to connect with experts in some domain: it can simply retrieve the top K people in that org aligned with that tech stack.
A few problems to be addressed:
Even CODEOWNERS works manually: its config has to be aligned and ordered by hand, so it faces the same problems our system has for PRs. But yeah, we can probably handle the data staleness one later.
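The top-K retrieval described in this reply can be illustrated with a small cosine-similarity sketch. It assumes each contributor profile and the issue summary have already been embedded as plain float vectors; in the PR the search itself runs against Weaviate, so the in-memory ranking and the names `cosine` and `top_k` are illustrative assumptions only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    issue_vec: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k contributors whose profile vectors are closest to the issue vector."""
    scored = [(name, cosine(issue_vec, vec)) for name, vec in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```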
|
chandansgowda
left a comment
Got it. Good to go.
closes #96
Interactions
Data present in Weaviate DB
weaviate_data_readable.json
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes