TezzCrawler

TezzCrawler is a web crawler that converts web pages to Markdown so they are ready to use as LLM input.

Features

Single page scraping with markdown conversion
Full website crawling using sitemap.xml
Proxy support for web scraping
Simple CLI interface
Easy to use as a Python package

Installation

pip install TezzCrawler

Usage

Command Line Interface

Scrape a single page:

tezzcrawler scrape-page https://example.com --output ./output

Crawl from sitemap:

tezzcrawler crawl-from-sitemap https://example.com/sitemap.xml --output ./output
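
To illustrate what crawling from a sitemap involves, here is a minimal sketch of pulling page URLs out of a sitemap.xml document with the Python standard library. It is not TezzCrawler's own implementation; the function name `extract_sitemap_urls` and the sample document are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ".//sm:loc" matches <loc> elements at any depth, so this also picks up
    # the child sitemap URLs listed in a sitemap index file.
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sample document, not fetched from a real site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

urls = extract_sitemap_urls(SAMPLE)
for url in urls:
    print(url)
```

Each URL collected this way could then be handed to a single-page scrape, which is roughly the relationship between the two commands above.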

Scrape through a proxy:

tezzcrawler scrape-page https://example.com \
    --proxy-url proxy.example.com \
    --proxy-port 8080 \
    --proxy-username user \
    --proxy-password pass \
    --output ./output
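
The four proxy options are commonly combined into a single proxy URL before being passed to an HTTP client. The sketch below shows one way that could look; how TezzCrawler handles these values internally is not documented here, so `build_proxy_url` and the default `http` scheme are assumptions for illustration.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Combine proxy settings into one URL (hypothetical helper).

    Credentials are percent-encoded so characters such as "@" or ":"
    in a username or password do not corrupt the URL.
    """
    credentials = ""
    if username:
        credentials = quote(username, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"


proxy = build_proxy_url("proxy.example.com", 8080, "user", "pass")
# Mapping in the shape that HTTP clients such as requests accept via `proxies=`.
proxies = {"http": proxy, "https": proxy}
print(proxy)
```

Passing credentials on the command line can expose them in shell history, so reading them from environment variables before invoking the CLI may be preferable.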

Python Package

from tezzcrawler import Scraper, Crawler
from pathlib import Path

# Scrape a single page
scraper = Scraper()
scraper.scrape_page("https://example.com", Path("./output"))

# Crawl from sitemap
crawler = Crawler()
crawler.crawl_sitemap("https://example.com/sitemap.xml", Path("./output"))

# With proxy configuration
scraper = Scraper(
    proxy_url="proxy.example.com",
    proxy_port=8080,
    proxy_username="user",
    proxy_password="pass",
)
scraper.scrape_page("https://example.com", Path("./output"))

Development

Clone the repository:

git clone https://github.com/TezzLabs/TezzCrawler.git
cd TezzCrawler

Install development dependencies:

pip install -e ".[dev]"

License

MIT License. See the LICENSE file for details.
