A robust and professional web scraping tool built with Python to extract product data from paginated websites. It uses requests and BeautifulSoup with logging, retry logic, and CSV export capabilities.
- Fetch HTML pages with custom browser headers.
- Extract product details:
  - Product name
  - Product price
  - Product link
- Retry logic for failed requests.
- Scrape multiple pages automatically.
- Save results to a CSV file.
- Logging for progress and error tracking.
- Easily customizable CSS selectors for any website structure.
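The retry-with-headers behaviour described above might look roughly like this (a minimal sketch — `fetch_page`, `HEADERS`, and the retry/delay defaults are illustrative names and values, not necessarily those used in the script):

```python
import logging
import time

import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Browser-like headers so the request is less likely to be rejected.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_page(url, retries=3, delay=2, session=None):
    """Fetch a URL, retrying on network errors or non-200 responses."""
    session = session or requests.Session()
    for attempt in range(1, retries + 1):
        try:
            logger.info("Fetching %s (attempt %d/%d)", url, attempt, retries)
            response = session.get(url, headers=HEADERS, timeout=10)
            if response.status_code == 200:
                return response.text
            logger.warning("Got status %d for %s", response.status_code, url)
        except requests.RequestException as exc:
            logger.error("Request failed: %s", exc)
        time.sleep(delay)  # back off before the next attempt
    return None  # all attempts failed
```

Passing in a `session` keeps connections alive across pages and also makes the function easy to test with a stub.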
- Python 3.x
- Python libraries: `requests`, `beautifulsoup4`, `pandas` (optional, for CSV formatting)
Install dependencies:

`pip install requests beautifulsoup4 pandas`
- Open `Web Scraper Code.py`.
- Modify `BASE_URL` to target the website you want to scrape.
- Adjust pagination in `scrape_all_pages(start, end)`.
- Run the script (quote the filename, since it contains spaces):

`python "Web Scraper Code.py"`
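The pagination step could be driven by a loop along these lines (a sketch; the `?page=N` URL pattern and the `fetch`/`parse` hooks are assumptions — adjust them to match the real script's helpers):

```python
def scrape_all_pages(start, end, base_url="https://example.com/products?page={}",
                     fetch=None, parse=None):
    """Scrape pages start..end inclusive, collecting all products."""
    products = []
    for page in range(start, end + 1):
        url = base_url.format(page)
        html = fetch(url)    # e.g. a fetch_page() helper with retry logic
        if html is None:
            continue         # skip pages that still failed after retries
        products.extend(parse(html))  # e.g. parse_products() with CSS selectors
    return products
```

Injecting `fetch` and `parse` as parameters keeps the pagination logic independent of any one site's markup.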
The script will log progress and save all scraped products to `products.csv`.
| Name | Price | Link |
|---|---|---|
| Product 1 | $99.99 | /products/product1 |
| Product 2 | $49.99 | /products/product2 |
| Product 3 | $149.99 | /products/product3 |
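Writing rows in that shape can be done with the standard `csv` module (a sketch assuming each product is a dict with `Name`, `Price`, and `Link` keys matching the columns above; `save_to_csv` is an illustrative name):

```python
import csv


def save_to_csv(products, path="products.csv"):
    """Write a list of product dicts to a CSV file with a header row."""
    fieldnames = ["Name", "Price", "Link"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()        # column names as the first row
        writer.writerows(products)  # one row per product dict
```

Example: `save_to_csv([{"Name": "Product 1", "Price": "$99.99", "Link": "/products/product1"}])`.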
- Always check the website's `robots.txt` before scraping.
- Use `time.sleep()` between requests to avoid overwhelming servers.
- Use headers to mimic a real browser.
- For dynamic content (JS-loaded pages), consider Selenium.
- Customize the CSS selectors in `parse_products()` for each website.
- For large datasets, you can save output to JSON or a database.
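A `parse_products()` built on BeautifulSoup CSS selectors might look like this (the `div.product`, `h2.title`, and `span.price` selectors are placeholders — inspect the target page and substitute its real classes):

```python
from bs4 import BeautifulSoup


def parse_products(html):
    """Extract name, price, and link from each product card in the page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product"):  # one element per product card
        name = card.select_one("h2.title")
        price = card.select_one("span.price")
        link = card.select_one("a")
        products.append({
            "Name": name.get_text(strip=True) if name else "",
            "Price": price.get_text(strip=True) if price else "",
            "Link": link["href"] if link else "",
        })
    return products
```

The `if ... else ""` guards keep a missing element from crashing the whole page rather than just leaving one field blank.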
The script logs:

- URL fetch attempts
- Status codes and errors
- Number of products found per page
- CSV save confirmation
You can extend this project to:

- Scrape multiple websites simultaneously
- Schedule scraping tasks with cron or a task scheduler
- Visualize product trends with Matplotlib or Seaborn
- Integrate with APIs or dashboards for real-time updates