# Searching Tools (`searching_mcp_server.py`)

The Searching MCP server provides search and retrieval tools: Google search through the Serper API, Wikipedia page content and revision history, Wayback Machine (archive.org) lookups, and website scraping.

## Environment Variables Used in Tools

- `SERPER_API_KEY`: Required API key for the Serper service. Used by `google_search` and as the fallback scraper in `scrape_website`.
- `JINA_API_KEY`: Required API key for the Jina service. Used as the default scraper in `scrape_website`.
- `REMOVE_SNIPPETS`: Set to "true" to remove snippets from the search results that Serper returns to `google_search`.
- `REMOVE_KNOWLEDGE_GRAPH`: Set to "true" to remove knowledge graph data from the search results that Serper returns to `google_search`.
- `REMOVE_ANSWER_BOX`: Set to "true" to remove answer box content from the search results that Serper returns to `google_search`.

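A minimal sketch of how a launch script or test harness might check these variables before starting the server. `env_flag` and `missing_api_keys` are hypothetical helpers and are not part of `searching_mcp_server.py`. Treating `"TRUE"` the same as `"true"` is also an assumption, because the server may compare the value exactly.

```python
import os

# Hypothetical helpers; the server's own parsing of these variables may differ.
REQUIRED_KEYS = ("SERPER_API_KEY", "JINA_API_KEY")


def env_flag(name: str) -> bool:
    """Return True when the variable is set to "true" (case-insensitive in this sketch)."""
    return os.environ.get(name, "").strip().lower() == "true"


def missing_api_keys(required=REQUIRED_KEYS) -> list:
    """Return the names of required API keys that are unset or empty."""
    return [key for key in required if not os.environ.get(key)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    for flag in ("REMOVE_SNIPPETS", "REMOVE_KNOWLEDGE_GRAPH", "REMOVE_ANSWER_BOX"):
        print(f"{flag} enabled: {env_flag(flag)}")
```

Running it with `REMOVE_SNIPPETS=true` in the environment would print `REMOVE_SNIPPETS enabled: True` alongside the other two flags.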
## Tools

### `google_search(q: str, gl: str = "us", hl: str = "en", location: str = None, num: int = 10, tbs: str = None, page: int = 1)`
Run a Google search through the Serper API and return rich results: organic results, "People Also Ask" questions, related searches, and the knowledge graph.

**Parameters:**

- `q`: Search query string
- `gl`: Country code for the search (e.g., 'us' for United States, 'cn' for China, 'uk' for United Kingdom). Default: 'us'
- `hl`: Google interface language (e.g., 'en' for English, 'zh' for Chinese, 'es' for Spanish). Default: 'en'
- `location`: Place name used to localize results, from city to region level (e.g., 'SoHo, New York, United States', 'California, United States'). Default: None
- `num`: Number of results to return. Default: 10
- `tbs`: Time-based search filter: 'qdr:h' (past hour), 'qdr:d' (past day), 'qdr:w' (past week), 'qdr:m' (past month), or 'qdr:y' (past year). Default: None (no time filter)
- `page`: Page number of results to return. Default: 1

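The parameters map directly onto a Serper request body. The sketch below assumes the server forwards them under the same field names. `build_serper_payload` is a hypothetical helper. Rejecting `tbs` values other than the five listed above is a choice made for this example, not documented server behavior.

```python
from typing import Optional

# The five documented time filters; this sketch rejects anything else.
DOCUMENTED_TBS = {"qdr:h", "qdr:d", "qdr:w", "qdr:m", "qdr:y"}


def build_serper_payload(
    q: str,
    gl: str = "us",
    hl: str = "en",
    location: Optional[str] = None,
    num: int = 10,
    tbs: Optional[str] = None,
    page: int = 1,
) -> dict:
    """Map google_search arguments onto a Serper request body (field names assumed)."""
    if not q or not q.strip():
        raise ValueError("q must be a non-empty query string")
    if tbs is not None and tbs not in DOCUMENTED_TBS:
        raise ValueError(f"unsupported tbs value: {tbs!r}")
    payload = {"q": q.strip(), "gl": gl, "hl": hl, "num": num, "page": page}
    # Optional parameters are sent only when the caller provides them.
    if location:
        payload["location"] = location
    if tbs:
        payload["tbs"] = tbs
    return payload
```

This sketch assumes the server uses Serper's standard search endpoint. Under that assumption, the dictionary could be posted with `requests.post("https://google.serper.dev/search", headers={"X-API-KEY": os.environ["SERPER_API_KEY"]}, json=payload, timeout=30)`.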
**Returns:**

- `str`: JSON-formatted search results containing organic results and related information

**Features:**

- Automatic retries (up to 5 attempts)
- Result filtering configurable through the `REMOVE_*` environment variables
- Region- and language-specific searches through `gl`, `hl`, and `location`

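The filtering and retry features can be sketched as follows. The response keys (`organic`, `snippet`, `knowledgeGraph`, `answerBox`) are assumed from Serper's JSON format. Exponential backoff is also an assumption, because the documentation only states a limit of 5 attempts. Both functions are hypothetical and are not the server's code.

```python
import os
import time


def _enabled(name: str) -> bool:
    """True when the environment variable is set to "true"."""
    return os.environ.get(name, "").strip().lower() == "true"


def filter_serper_results(results: dict) -> dict:
    """Drop result sections according to the REMOVE_* environment variables."""
    filtered = dict(results)  # shallow copy, so the caller's dict is not mutated
    if _enabled("REMOVE_KNOWLEDGE_GRAPH"):
        filtered.pop("knowledgeGraph", None)
    if _enabled("REMOVE_ANSWER_BOX"):
        filtered.pop("answerBox", None)
    if _enabled("REMOVE_SNIPPETS") and "organic" in filtered:
        # Rebuild each organic result without its snippet field.
        filtered["organic"] = [
            {key: value for key, value in item.items() if key != "snippet"}
            for item in filtered["organic"]
        ]
    return filtered


def with_retries(call, attempts: int = 5, base_delay: float = 1.0):
    """Invoke call() up to `attempts` times, sleeping 1x, 2x, 4x... base_delay between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

A caller might combine them as `json.dumps(filter_serper_results(with_retries(lambda: fetch(payload))), ensure_ascii=False)`. Here `fetch` is a hypothetical function that performs the HTTP request and returns the parsed JSON.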
### `wiki_get_page_content(entity: str, first_sentences: int = 10)`
Retrieve the Wikipedia page for an entity (a person, place, concept, or event) and return its content in a structured form.

**Parameters:**

- `entity`: The entity to look up on Wikipedia
- `first_sentences`: Number of leading sentences to return from the page. Set to 0 to return the full content. Default: 10

**Returns:**

- `str`: Formatted text containing the page title, the introduction (or full content), and the page URL

**Features:**

- Handles disambiguation pages automatically
- Returns clean, structured output
- Suggests alternative searches when no page is found
- Truncates content automatically to keep output manageable

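The `first_sentences` behavior can be sketched as follows. The output layout ("Page Title", "Introduction"/"Content", "URL") is hypothetical, and so are the helper names. The sentence splitter is deliberately naive. The sketch assumes English Wikipedia URLs.

```python
import re
from urllib.parse import quote


def first_n_sentences(text: str, n: int) -> str:
    """Return the first n sentences of text; n <= 0 returns the full text."""
    text = text.strip()
    if n <= 0:
        return text
    # Naive split after '.', '!' or '?' followed by whitespace;
    # abbreviations such as "Dr." are split incorrectly.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:n])


def format_wiki_page(title: str, content: str, first_sentences: int = 10) -> str:
    """Assemble a title / content / URL block (hypothetical layout)."""
    url = "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"), safe="/()")
    label = "Introduction" if first_sentences > 0 else "Content"
    body = first_n_sentences(content, first_sentences)
    return f"Page Title: {title}\n{label}: {body}\nURL: {url}"
```

The server may rely on the `wikipedia` package from PyPI (an assumption). In that case, an ambiguous title raises `wikipedia.exceptions.DisambiguationError`, and its `options` attribute lists candidate page titles that could be suggested back to the caller.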
### `search_wiki_revision(entity: str, year: int, month: int, max_revisions: int = 50)`
Look up an entity on Wikipedia and return the page's revision history for a given month.

**Parameters:**

- `entity`: The entity to look up on Wikipedia
- `year`: Year of the revisions (e.g., 2024)
- `month`: Month of the revisions (1-12)
- `max_revisions`: Maximum number of revisions to return. Default: 50

**Returns:**

- `str`: Formatted revision history with timestamps, revision IDs, and URLs

**Features:**

- Automatic validation and adjustment of out-of-range dates
- Supports dates from 2000 to the current year
- Detailed revision metadata, including timestamps and direct links
- Clear error handling for invalid dates or missing pages

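One way to express the month window is as a MediaWiki API query. The sketch assumes the tool queries `https://en.wikipedia.org/w/api.php` with `prop=revisions`. The clamping rules follow the documented 2000-to-present range, but their details and the helper names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional, Tuple
from urllib.parse import quote


def clamp_year_month(year: int, month: int, today: Optional[date] = None) -> Tuple[int, int]:
    """Clamp to the documented range: January 2000 up to the current month."""
    today = today or date.today()
    year = min(max(year, 2000), today.year)
    month = min(max(month, 1), 12)
    if (year, month) > (today.year, today.month):
        year, month = today.year, today.month  # a future month has no revisions yet
    return year, month


def revision_query_params(entity: str, year: int, month: int,
                          max_revisions: int = 50, today: Optional[date] = None) -> dict:
    """Build MediaWiki API parameters listing one month of revisions, oldest first."""
    year, month = clamp_year_month(year, month, today)
    last_day = calendar.monthrange(year, month)[1]
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": entity,
        "rvprop": "ids|timestamp",
        # The API caps rvlimit (500 for most clients when content is not requested).
        "rvlimit": max(1, min(max_revisions, 500)),
        # With rvdir="newer", rvstart is the earlier bound and rvend the later one.
        "rvdir": "newer",
        "rvstart": f"{year:04d}-{month:02d}-01T00:00:00Z",
        "rvend": f"{year:04d}-{month:02d}-{last_day:02d}T23:59:59Z",
    }


def revision_url(title: str, revid: int) -> str:
    """Permanent link to a specific revision."""
    return f"https://en.wikipedia.org/w/index.php?title={quote(title.replace(' ', '_'))}&oldid={revid}"
```

The dictionary can be passed as `params=` to `requests.get("https://en.wikipedia.org/w/api.php", ...)`. The `revid` of each returned revision can then be passed to `revision_url`.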
### `search_archived_webpage(url: str, year: int, month: int, day: int)`
Search the Wayback Machine (archive.org) for archived versions of a webpage on a specific date.

**Parameters:**

- `url`: The URL to look up in the Wayback Machine
- `year`: Target year (e.g., 2023)
- `month`: Target month (1-12)
- `day`: Target day (1-31)

**Returns:**

- `str`: Formatted archive information, including the archived URL, its timestamp, and availability status

**Features:**

- Automatic detection and correction of the URL protocol
- Date validation and adjustment (1995 to present)
- Falls back to the most recent archive when no snapshot exists for the requested date
- Special handling for Wikipedia URLs, including suggestions to use the Wikipedia tools
- Automatic retries for more reliable results

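The Wayback Machine exposes a public availability API (`https://archive.org/wayback/available`) that returns the snapshot closest to a given `timestamp`. Whether `search_archived_webpage` uses this endpoint is an assumption. The hypothetical helpers below add a missing scheme, clamp the date into the documented range, and read the closest snapshot from the JSON response. Defaulting to `https://` is also an assumption.

```python
import calendar
from datetime import date
from typing import Optional
from urllib.parse import urlencode, urlparse

WAYBACK_AVAILABLE_API = "https://archive.org/wayback/available"


def normalize_url(url: str) -> str:
    """Add a scheme when missing (defaulting to https is an assumption)."""
    url = url.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    return url


def clamp_date(year: int, month: int, day: int, today: Optional[date] = None) -> date:
    """Clamp into the documented range (1995 to today), with a valid day of month."""
    today = today or date.today()
    year = min(max(year, 1995), today.year)
    month = min(max(month, 1), 12)
    day = min(max(day, 1), calendar.monthrange(year, month)[1])
    return min(date(year, month, day), today)


def availability_query(url: str, year: int, month: int, day: int,
                       today: Optional[date] = None) -> str:
    """URL for the Wayback Machine availability API, targeting the given day."""
    target = clamp_date(year, month, day, today)
    params = {"url": normalize_url(url), "timestamp": target.strftime("%Y%m%d")}
    return f"{WAYBACK_AVAILABLE_API}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[dict]:
    """Extract the closest snapshot from an availability API response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return {"url": closest["url"], "timestamp": closest["timestamp"]}
    return None


def is_wikipedia(url: str) -> bool:
    """True for wikipedia.org hosts, which get Wikipedia tool suggestions."""
    host = (urlparse(normalize_url(url)).hostname or "").lower()
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")
```

If `timestamp` is omitted, the availability API returns the most recent snapshot, which is one way to implement the fallback described above.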
### `scrape_website(url: str)`
Scrape the content of a website, including regular web pages and YouTube video information.

**Parameters:**

- `url`: The URL of the website to scrape

**Returns:**

- `str`: Scraped content, including text, metadata, and structured information

**Features:**

- Support for many kinds of websites
- YouTube video information extraction (subtitles, titles, descriptions, key moments)
- Automatic content parsing and cleaning
- Uses the Jina API as the default scraper, with Serper as a fallback

**Usage Notes:**

- Search engine pages are not supported by this tool; use `google_search` for searches
- For YouTube videos, only non-visual information is returned
- Content may be incomplete for some complex websites

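The documented order (Jina first, Serper as fallback) can be sketched as a chain of scraper callables. The `https://r.jina.ai/` prefix is Jina Reader's public interface. The search-engine host list, the helper names, and the chaining logic are assumptions for illustration.

```python
from typing import Callable, Iterable
from urllib.parse import urlparse

# Hypothetical list of search-engine hosts that this sketch refuses to scrape.
SEARCH_ENGINE_HOSTS = {"google.com", "bing.com", "duckduckgo.com", "baidu.com"}


def _host(url: str) -> str:
    """Lower-cased hostname with any leading "www." removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def jina_reader_url(url: str) -> str:
    """Jina Reader fetches a page when its URL is appended to https://r.jina.ai/."""
    return "https://r.jina.ai/" + url


def is_youtube(url: str) -> bool:
    host = _host(url)
    return host in {"youtube.com", "youtu.be"} or host.endswith(".youtube.com")


def scrape_with_fallback(url: str, scrapers: Iterable[Callable[[str], str]]) -> str:
    """Try each scraper in order and return the first non-empty result."""
    if _host(url) in SEARCH_ENGINE_HOSTS:
        raise ValueError("search engine pages are not supported; use google_search instead")
    errors = []
    for scrape in scrapers:
        name = getattr(scrape, "__name__", "scraper")
        try:
            content = scrape(url)
        except Exception as exc:  # record the failure and move on to the next scraper
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return content
        errors.append(f"{name}: empty response")
    raise RuntimeError("all scrapers failed: " + "; ".join(errors))
```

In practice, the first scraper might call `requests.get(jina_reader_url(url), headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"}, timeout=60)` and return `.text`, with a Serper-backed scraper second. URLs matched by `is_youtube` would be routed to a separate extractor.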
---

**Last Updated:** Sep 2025
**Doc Contributor:** Team @ MiroMind AI
