Go-Crawl is a simple, concurrency-safe web crawler written in Go. It crawls websites concurrently and extracts their internal links.
Go-Crawl includes the following features:
- Concurrency Support: Go-Crawl uses goroutines to crawl multiple URLs concurrently, making it well suited to large-scale scraping tasks (see the sketch after this list).
- Robust Error Handling: errors are logged and handled explicitly through Go's error-return idiom (Go has no exceptions), so a failed request does not abort the crawl.
- Customizable Configuration Options: users can tune the crawler's behavior, such as the size of the worker pool, through configuration options.
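
As a rough illustration of the concurrency model, the sketch below wires a fixed pool of goroutines to a shared work queue, parses each page with golang.org/x/net/html, and keeps only links on the starting host. The names (extractLinks, enqueue), the pool size of 4, and the termination logic are assumptions for this sketch, not Go-Crawl's actual implementation:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync"

	"golang.org/x/net/html"
)

// extractLinks walks a parsed HTML tree and returns every <a href=...>
// target, resolved against the page's own URL.
func extractLinks(base *url.URL, doc *html.Node) []*url.URL {
	var links []*url.URL
	var visit func(*html.Node)
	visit = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key != "href" {
					continue
				}
				if ref, err := url.Parse(a.Val); err == nil {
					links = append(links, base.ResolveReference(ref))
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			visit(c)
		}
	}
	visit(doc)
	return links
}

func main() {
	start, _ := url.Parse("https://example.com") // hypothetical seed URL
	tasks := make(chan *url.URL)

	var pending sync.WaitGroup // counts URLs queued but not yet processed
	seen := map[string]bool{start.String(): true}
	var mu sync.Mutex

	enqueue := func(u *url.URL) {
		pending.Add(1)
		go func() { tasks <- u }() // send in a goroutine so workers never block
	}
	enqueue(start)

	// Close the queue once every discovered URL has been processed.
	go func() {
		pending.Wait()
		close(tasks)
	}()

	var workers sync.WaitGroup
	for i := 0; i < 4; i++ { // worker-pool size, configurable in the real tool
		workers.Add(1)
		go func() {
			defer workers.Done()
			for u := range tasks {
				func() {
					defer pending.Done()
					resp, err := http.Get(u.String())
					if err != nil {
						return // the real project would log this
					}
					doc, err := html.Parse(resp.Body)
					resp.Body.Close()
					if err != nil {
						return
					}
					for _, link := range extractLinks(u, doc) {
						link.Fragment = "" // ignore #fragments when deduplicating
						if link.Host != start.Host {
							continue // keep internal links only
						}
						mu.Lock()
						if !seen[link.String()] {
							seen[link.String()] = true
							fmt.Println(link)
							enqueue(link)
						}
						mu.Unlock()
					}
				}()
			}
		}()
	}
	workers.Wait()
}
```

Deduplication happens under a mutex so two workers never queue the same page twice, and the pending WaitGroup closes the channel once the frontier is exhausted, letting every goroutine exit cleanly.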
To use Go-Crawl, you need:
- Go 1.22 or later
- the net/url package (standard library)
- the golang.org/x/net/html package
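
The net/url package ships with the Go standard library, so only the HTML parser needs to be fetched. If the build step below cannot resolve it, fetching it manually from inside the repo usually helps:
> go get golang.org/x/net/html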
To install Go-Crawl, run the following commands from your terminal:
- create a project directory and cd into it
> mkdir project && cd project
- clone the repo down
> git clone https://github.com/etrinque/go-crawl
- enter the cloned repo and build the project
> cd go-crawl
> go build -o crawl main.go

From the console, run the resulting binary with its arguments:
// enter a target URL to start from and the number of goroutine workers (optional)
> crawl {url} {worker-pool-size}
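
Here is a minimal, hypothetical sketch of how main.go might read those two arguments; the project's actual argument handling may differ:

```go
// A hypothetical sketch of the argument handling, not Go-Crawl's actual code.
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: crawl {url} {worker-pool-size}")
		os.Exit(1)
	}
	target := os.Args[1]

	workers := 4 // assumed default when the optional second argument is omitted
	if len(os.Args) > 2 {
		n, err := strconv.Atoi(os.Args[2])
		if err != nil || n < 1 {
			fmt.Fprintln(os.Stderr, "worker-pool-size must be a positive integer")
			os.Exit(1)
		}
		workers = n
	}

	fmt.Printf("crawling %s with %d workers\n", target, workers)
}
```

For example, to crawl a site with eight workers:
> crawl https://example.com 8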