A comprehensive workshop for learning web scraping using Zyte API and Cursor IDE. This repository contains examples, exercises, and solutions for various web scraping scenarios.
Learn to build robust web scrapers using Zyte API, handling different scraping scenarios:
- Network traffic capture and API analysis
- Classic pagination handling
- Infinite scroll management
- Form submission and interaction
- Error handling and best practices
- Python 3.8 or higher
- Zyte API account and API key
- Basic understanding of Python and web scraping concepts
- Clone the repository:
git clone https://github.com/NehaSetia-DA/zyte-api-training
cd zyte-api-training
- Create and activate virtual environment:
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On Windows:
.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment:
# Create .env file
cp .env.example .env
# Add your Zyte API key to .env
echo "ZYTE_API_KEY=your-api-key-here" > .env
- Verify setup:
python check_setup.py
Ready-to-use example implementations:
- 01_network_capture.py - Network traffic capture and analysis
- 02_pagination_classic.py - Classic pagination handling
- 03_pagination_infinite.py - Infinite scroll implementation
- 04_form_submission.py - Form handling and submission
- basic-extraction.py - Basic data extraction
Practice exercises with increasing complexity:
- 01_network_capture.py - Nike product data extraction
- 02_pagination_classic.py - Job listings scraper
- 03_infinite_scroll.py - Nike product extraction using Zyte API infinite scroll actions
- 04_form_submission.py - Quote search form automation
- practice_scenarios.py - Additional challenges
Complete implementations of exercises with best practices:
- Error handling
- Rate limiting
- Data validation
- Optimal performance
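As an illustration of the data validation step, here is a minimal sketch. The function name `validate_product` and the `name`/`price` fields are hypothetical and are not taken from the repository's solutions; adapt them to whatever your scraper actually extracts.

```python
import math


def validate_product(record):
    """Return a cleaned copy of a scraped record, or raise ValueError.

    Hypothetical schema: a non-empty "name" and a non-negative "price".
    """
    name = str(record.get("name") or "").strip()
    if not name:
        raise ValueError("missing name")

    raw_price = record.get("price")
    if raw_price is None:
        raise ValueError("missing price")

    # Strip a currency symbol and thousands separators before parsing.
    text = str(raw_price).replace("$", "").replace(",", "").strip()
    try:
        price = float(text)
    except ValueError:
        raise ValueError(f"invalid price: {raw_price!r}") from None

    if not math.isfinite(price) or price < 0:
        raise ValueError(f"price out of range: {raw_price!r}")

    return {"name": name, "price": price}
```

Raising on the first problem keeps bad records out of the output; a caller can catch `ValueError`, log the record, and move on.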
Helper functions and configurations:
- API configuration
- Common utilities
- Shared functions
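A rough sketch of what an API configuration helper could look like is shown below. The function names are hypothetical and this is not the repository's actual utility code. It assumes `ZYTE_API_KEY` has already been exported or loaded from `.env` by the caller, and it relies on Zyte API accepting HTTP Basic auth with the API key as the username and an empty password.

```python
import base64
import json
import os
import urllib.request

ZYTE_API_URL = "https://api.zyte.com/v1/extract"


def get_api_key(env=None):
    """Read ZYTE_API_KEY from the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    key = env.get("ZYTE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ZYTE_API_KEY is not set; add it to your environment or .env file")
    return key


def build_auth_header(api_key):
    """Build the Basic auth header: key as username, empty password."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def zyte_request(payload, api_key, timeout=60):
    """POST a JSON payload to Zyte API and return the decoded JSON response."""
    headers = {"Content-Type": "application/json", **build_auth_header(api_key)}
    request = urllib.request.Request(
        ZYTE_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call would be `zyte_request({"url": "https://example.com", "httpResponseBody": True}, get_api_key())`; the `httpResponseBody` value in the response is Base64-encoded.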
Network capture and API analysis:
- Capturing API endpoints
- Analyzing network traffic
- Extracting product data
- Handling pagination
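The network-capture topics above (capturing API endpoints, analyzing traffic, extracting data) can be sketched as a pair of helpers: one builds a Zyte API payload with a `networkCapture` filter, the other decodes captured JSON bodies. The helper names and the `/api/` fragment are illustrative assumptions, not the workshop's own code.

```python
import base64
import json


def build_network_capture_payload(page_url, url_fragment):
    """Render the page in a browser and capture responses whose URL contains url_fragment."""
    return {
        "url": page_url,
        "browserHtml": True,
        "networkCapture": [
            {
                "filterType": "url",
                "matchType": "contains",
                "value": url_fragment,
                "httpResponseBody": True,
            }
        ],
    }


def decode_captures(api_response):
    """Return (url, parsed_json) pairs for captured bodies that are valid JSON."""
    results = []
    for capture in api_response.get("networkCapture", []):
        body = capture.get("httpResponseBody")
        if not body:
            continue
        try:
            # Captured bodies arrive Base64-encoded.
            data = json.loads(base64.b64decode(body))
        except ValueError:
            # Not Base64 or not JSON (for example HTML or an image); skip it.
            continue
        results.append((capture.get("url"), data))
    return results
```

Captured API responses are often paginated themselves, so the decoded JSON is usually the place to look for a cursor or next-page token.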
Classic pagination:
- Page-by-page navigation
- Data extraction
- Error handling
- Rate limiting
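For classic pagination, one possible shape of the page-by-page loop is sketched below. `crawl_pages`, `fetch`, and `parse` are hypothetical names: `fetch` stands in for whatever downloads a page (for example a Zyte API call), and `parse` is assumed to return the page's items plus the next-page link, or `None` on the last page.

```python
import time
from urllib.parse import urljoin


def crawl_pages(start_url, fetch, parse, max_pages=50, delay=1.0, sleep=time.sleep):
    """Follow next-page links from start_url and collect items from every page."""
    items = []
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        try:
            html = fetch(url)
        except Exception as exc:
            # Keep what was collected so far instead of losing the whole run.
            print(f"Stopping at {url}: {exc}")
            break
        page_items, next_href = parse(html)
        items.extend(page_items)
        # Resolve relative links such as "/jobs?page=2" against the current page.
        url = urljoin(url, next_href) if next_href else None
        if url:
            sleep(delay)  # simple fixed delay between requests
    return items
```

The `seen` set guards against pagination loops, and `max_pages` caps the run if a site never stops offering a next link.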
Infinite scroll:
- Dynamic content loading
- Scroll management
- Duplicate detection
- Performance optimization
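For infinite scroll, a minimal sketch is a browser request that uses Zyte API's `scrollBottom` action, followed by order-preserving duplicate removal. The helper names and the `url` key used for deduplication are assumptions made for illustration.

```python
def build_infinite_scroll_payload(url):
    """Ask Zyte API to render the page and scroll to the bottom before returning HTML."""
    return {
        "url": url,
        "browserHtml": True,
        "actions": [{"action": "scrollBottom"}],
    }


def deduplicate(items, key="url"):
    """Keep the first item seen for each key value; drop items missing the key."""
    seen = set()
    unique = []
    for item in items:
        value = item.get(key)
        if value is None or value in seen:
            continue
        seen.add(value)
        unique.append(item)
    return unique
```

Scrolled pages often render the same product card more than once, so deduplicating on a stable identifier such as the product URL is a cheap safeguard.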
Form submission:
- Form interaction
- Multi-step processes
- Response validation
- Error recovery
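Form interaction can be expressed as a sequence of Zyte API browser actions (`type`, `click`, `waitForSelector`). The sketch below builds such a payload from hypothetical helper functions and selectors. The `failed_actions` check assumes that the response echoes an `actions` list in which a failed action carries an `error` field; verify that against the Browser Actions documentation before relying on it.

```python
def css(value):
    """Wrap a CSS selector string in the selector object used by browser actions."""
    return {"type": "css", "value": value}


def build_form_payload(url, fields, submit_selector, result_selector):
    """Type into each field, click submit, then wait for the results to appear.

    fields maps a CSS selector to the text to type into that element.
    """
    actions = [
        {"action": "type", "selector": css(selector), "text": text}
        for selector, text in fields.items()
    ]
    actions.append({"action": "click", "selector": css(submit_selector)})
    actions.append({"action": "waitForSelector", "selector": css(result_selector)})
    return {"url": url, "browserHtml": True, "actions": actions}


def failed_actions(api_response):
    """Return the action results that report an error (assumed response shape)."""
    return [a for a in api_response.get("actions", []) if a.get("error")]
```

If `failed_actions` returns anything, the rendered HTML probably does not reflect the submitted form, so it is safer to retry or skip than to parse it.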
- Rate Limiting
  - Implement delays between requests
  - Use exponential backoff
  - Handle API limits
- Error Handling
  - Try-except blocks
  - Retry mechanisms
  - Logging and monitoring
- Data Management
  - Proper storage formats
  - Data validation
  - Duplicate handling
- Code Organization
  - Modular structure
  - Clear documentation
  - Reusable components
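Several of the practices above (delays, exponential backoff, retries, logging) fit into one small wrapper. This is a sketch with hypothetical names and default values, not code from the workshop; tune the delays and the exception types to the limits of your own Zyte API plan.

```python
import logging
import random
import time

logger = logging.getLogger(__name__)


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% random jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, delay * 0.1)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
```

Usage is a one-liner such as `retry_with_backoff(lambda: fetch(url), retry_on=(OSError,))`, where `fetch` is whatever function performs the request. Restricting `retry_on` to transient errors avoids retrying requests that can never succeed.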
- Zyte API - All-in-one web scraping API
- Cursor IDE - AI-powered development environment
- Getting Started Guide - Complete overview and usage guide
- API Usage Examples - Common usage patterns and examples
- HTTP Mode - HTTP request handling
- Browser Automation Mode - Browser automation features
- Browser Actions - Available browser interactions
- Extraction API - Data extraction capabilities
- Proxy Mode - Proxy configuration and usage
Feel free to:
- Report issues
- Suggest improvements
- Submit pull requests
This project is licensed under the MIT License - see the LICENSE file for details.