A powerful web crawler that converts web pages to markdown format, making them ready for LLM consumption.
- Single page scraping with markdown conversion
- Full website crawling using sitemap.xml
- Proxy support for web scraping
- Simple command-line interface
- Easy to use as a Python package
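
For readers curious about what sitemap-based crawling involves, here is a minimal sketch of the first step: pulling page URLs out of a `sitemap.xml` document with the Python standard library. This is not TezzCrawler's implementation; `extract_sitemap_urls` and `SAMPLE_SITEMAP` are hypothetical names used only for illustration, and the sketch assumes a plain `<urlset>` sitemap rather than a sitemap index.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return the page URLs listed in a <urlset> sitemap (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    # Each page is a <url> element whose <loc> child holds the address.
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Illustrative sitemap content, not fetched from a real site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

A crawler would then fetch each extracted URL and convert the page to markdown, which is the part TezzCrawler handles for you.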

Install with pip:

    pip install TezzCrawler

- Scrape a single page:

      tezzcrawler scrape-page https://example.com --output ./output

- Crawl from sitemap:

      tezzcrawler crawl-from-sitemap https://example.com/sitemap.xml --output ./output

- Use with a proxy:

      tezzcrawler scrape-page https://example.com \
        --proxy-url proxy.example.com \
        --proxy-port 8080 \
        --proxy-username user \
        --proxy-password pass \
        --output ./output

Use it from Python:

    from tezzcrawler import Scraper, Crawler
    from pathlib import Path

    # Scrape a single page
    scraper = Scraper()
    scraper.scrape_page("https://example.com", Path("./output"))

    # Crawl from sitemap
    crawler = Crawler()
    crawler.crawl_sitemap("https://example.com/sitemap.xml", Path("./output"))

    # With proxy configuration
    scraper = Scraper(
        proxy_url="proxy.example.com",
        proxy_port=8080,
        proxy_username="user",
        proxy_password="pass"
    )

Set up a development environment:

- Clone the repository:

      git clone https://github.com/TezzLabs/TezzCrawler.git
      cd TezzCrawler

- Install development dependencies:

      pip install -e ".[dev]"

MIT License - see LICENSE file for details.