ShopScraper is a modular, no-browser web scraper for online shops with a modern Qt interface.
It extracts product data (schema.org / JSON-LD), supports keyword search and domain crawling with pagination, keeps a persistent run history, and exports clean CSV/Excel files — all wrapped in a polished UI.
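
As a rough illustration of the JSON-LD extraction mentioned above, the sketch below pulls schema.org `Product` objects out of a page using only the Python standard library. The names `JsonLdCollector`, `iter_products`, and `extract_products` are hypothetical and are not taken from ShopScraper's source; the project's actual extractor may well handle more edge cases.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def iter_products(node):
    """Yield dicts whose @type is (or includes) "Product", descending into lists and @graph."""
    if isinstance(node, list):
        for item in node:
            yield from iter_products(item)
    elif isinstance(node, dict):
        types = node.get("@type") or []
        if isinstance(types, str):
            types = [types]
        if "Product" in types:
            yield node
        yield from iter_products(node.get("@graph", []))


def extract_products(html):
    """Return one flat dict per JSON-LD product found in the HTML string."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    products = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        for product in iter_products(data):
            offers = product.get("offers") or {}
            if isinstance(offers, list):  # several offers: use the first one
                offers = offers[0] if offers else {}
            if not isinstance(offers, dict):
                offers = {}
            brand = product.get("brand")
            if isinstance(brand, dict):  # brand may be a nested Brand object
                brand = brand.get("name")
            images = product.get("image") or []
            if not isinstance(images, list):
                images = [images]
            products.append({
                "title": product.get("name"),
                "brand": brand,
                "price": offers.get("price"),
                "currency": offers.get("priceCurrency"),
                "availability": offers.get("availability"),
                "images": images,
            })
    return products
```

Calling `extract_products(html)` on a fetched page returns a list of dicts, one per product found; a page without JSON-LD product blocks yields an empty list.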
- 🔎 Scraping Modes
  - Keyword mode: search inside target domains using seed keywords, collect valid URLs, and fetch results.
  - Domain mode: provide multiple domains/URLs and crawl with simple pagination (`?page=2`, `/page/2`, etc.).
- 📦 Product Extraction
  Extracts `title`, `brand`, `price`, `currency`, `availability`, and `images` from JSON-LD product blocks.
- 💾 Outputs
  - CSV and Excel (dynamic columns)
  - Persistent history stored in SQLite and visible in the Runs tab
- ⚙️ Settings
  Delay & jitter, retries & backoff, custom User-Agent, proxy, Light/Dark themes, output format (CSV/Excel).
- 🖥 Modern UI
  PySide6 + custom QSS themes, icons, and Linux desktop launcher support.
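
Domain mode's simple pagination patterns (`?page=2`, `/page/2`) could be generated along the following lines. This is a minimal sketch using only `re` and `urllib.parse`; the function names and the `param`, `segment`, and `max_pages` parameters are assumptions made for illustration, not ShopScraper's actual API.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def query_page_url(url, page, param="page"):
    """Return url with ?page=N set, replacing any existing value of that parameter."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def path_page_url(url, page, segment="page"):
    """Return url with a trailing /page/N path segment."""
    parts = urlsplit(url)
    # Drop a trailing slash and any existing /page/N suffix before appending.
    base = re.sub(rf"/{re.escape(segment)}/\d+$", "", parts.path.rstrip("/"))
    return urlunsplit(parts._replace(path=f"{base}/{segment}/{page}"))


def candidate_pages(url, max_pages=3):
    """Yield (page_number, [candidate URLs]) for pages 2..max_pages."""
    for page in range(2, max_pages + 1):
        yield page, [query_page_url(url, page), path_page_url(url, page)]
```

A crawler built on this would try each candidate URL for a page and stop when none of them returns new products; which pattern a given shop uses has to be discovered at run time.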

➡️ Download the latest Linux build here
Currently only the Linux tar.gz package is available; Windows builds will be added in future releases.

To run from source instead:
git clone https://github.com/hesameworks/shop-scraper.git
cd shop-scraper
python -m venv .venv
source .venv/bin/activate
pip install -U pip -r requirements.txt
PYTHONPATH=src python src/ui/main_window.py

To build a standalone binary with PyInstaller:

pyinstaller src/ui/main_window.py \
  --name ShopScraper \
  --noconsole \
  --paths src \
  --add-data "assets:assets"

Then run the built binary:

./dist/ShopScraper/ShopScraper

To add a desktop launcher, create a file at ~/.local/share/applications/ShopScraper.desktop:
[Desktop Entry]
Type=Application
Name=ShopScraper
Comment=Modular shop web scraper
Exec=/absolute/path/to/dist/ShopScraper/ShopScraper
Icon=/absolute/path/to/dist/ShopScraper/assets/logo.png
Terminal=false
Categories=Utility;Development;Network;
StartupNotify=true

Then make it executable:

chmod +x ~/.local/share/applications/ShopScraper.desktop

This project ships with a GitHub Actions workflow (.github/workflows/build.yml) that automatically builds Linux (and later Windows) binaries on every push to main.
Artifacts are uploaded as build outputs, and can also be attached to GitHub Releases when tagging a version (e.g. v1.0.0).

Planned improvements:
- Add Windows builds (zip + exe)
- CSS-based extraction fallback (when JSON-LD is missing)
- Smarter rate limiting and concurrency controls
- AppImage / `.deb` packages for Linux
- Code signing for Windows builds
 
Pull requests and issues are welcome! Please:
- Run code style checks (`ruff`/`flake8` if available).
- Provide a clear description of your changes.
- Add screenshots if you modify the UI.
 
Released under the MIT License.