diff --git a/.lycheeignore b/.lycheeignore index de264fb2c..17a617d96 100644 --- a/.lycheeignore +++ b/.lycheeignore @@ -1,6 +1,9 @@ -# Ignore all files +# Ignore all file:// URIs file://.* +# Ignore internal application routes (relative paths starting with /) +^/[^/].* + # This is used as an example when creating a pull request https://github.com/Your_Github_Handle.* # Heroku is not guaranteed to be up @@ -27,3 +30,6 @@ https://www.vaultproject.io/* # Issues with lychee: https://github.com/topics/secrets-detection + +# Docker Hub returns 403 Forbidden errors +https://hub.docker.com/* diff --git a/README.md b/README.md index d64f8d0d0..052d99f5a 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ # OWASP WrongSecrets -[![Tweet](https://img.shields.io/badge/-Twitter-%232B90D9?style=for-the-badge&logo=x&logoColor=white)](https://twitter.com/intent/tweet?text=Want%20to%20dive%20into%20secrets%20management%20and%20do%20some%20hunting?%20try%20this&url=https://github.com/OWASP/wrongsecrets&hashtags=secretsmanagement,secrets,hunting,p0wnableapp,OWASP,WrongSecrets) [](https://tootpick.org/#text=Want%20to%20dive%20into%20secrets%20management%20and%20do%20some%20hunting?%20try%20this%0A%0Ahttps://github.com/OWASP/wrongsecrets%20%23secretsmanagement,%20%23secrets,%20%23hunting,%20%23p0wnableapp,%20%23OWASP,%20%23WrongSecrets) [](https://bsky.app/intent/compose?text=Want%20to%20dive%20into%20secrets%20management%20and%20do%20some%20hunting?%20try%20this%0A%0Ahttps://github.com/OWASP/wrongsecrets%20%23secretsmanagement%20%23secrets%20%23hunting%20%23p0wnableapp%20%23OWASP%20%23WrongSecrets) [](https://www.linkedin.com/shareArticle/?url=https://www.github.com/OWASP/wrongsecrets&title=OWASP%20WrongSecrets) 
+[![Tweet](https://img.shields.io/badge/-Twitter-%232B90D9?style=for-the-badge&logo=x&logoColor=white)](https://twitter.com/intent/tweet?text=Want%20to%20dive%20into%20secrets%20management%20and%20do%20some%20hunting?%20try%20this&url=https://github.com/OWASP/wrongsecrets&hashtags=secretsmanagement,secrets,hunting,p0wnableapp,OWASP,WrongSecrets) [](https://tootpick.org/#text=Want%20to%20dive%20into%20secrets%20management%20and%20do%20some%20hunting?%20try%20this%0A%0Ahttps://github.com/OWASP/wrongsecrets%20%23secretsmanagement,%20%23secrets,%20%23hunting,%20%23p0wnableapp,%20%23OWASP,%20%23WrongSecrets) [](https://bsky.app/intent/compose?text=Want%20to%20dive%20into%20secrets%20management%20and%20do%20some%20hunting?%20try%20this%0A%0Ahttps://github.com/OWASP/wrongsecrets%20%23secretsmanagement%20%23secrets%20%23hunting%20%23p0wnableapp%20%23OWASP%20%23WrongSecrets) [](https://www.linkedin.com/shareArticle/?url=https://github.com/OWASP/wrongsecrets&title=OWASP%20WrongSecrets) [![Java checkstyle and testing](https://github.com/OWASP/wrongsecrets/actions/workflows/main.yml/badge.svg)](https://github.com/OWASP/wrongsecrets/actions/workflows/main.yml) [![Pre-commit](https://github.com/OWASP/wrongsecrets/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/OWASP/wrongsecrets/actions/workflows/pre-commit.yml) [![Terraform FMT](https://github.com/OWASP/wrongsecrets/actions/workflows/terraform.yml/badge.svg)](https://github.com/OWASP/wrongsecrets/actions/workflows/terraform.yml) [![CodeQL](https://github.com/OWASP/wrongsecrets/actions/workflows/codeql-analysis.yml/badge.svg)](https://github.com/OWASP/wrongsecrets/actions/workflows/codeql-analysis.yml) [![Dead Link Checker](https://github.com/OWASP/wrongsecrets/actions/workflows/link_checker.yml/badge.svg)](https://github.com/OWASP/wrongsecrets/actions/workflows/link_checker.yml) [![Javadoc and Swaggerdoc 
generator](https://github.com/OWASP/wrongsecrets/actions/workflows/java_swagger_doc.yml/badge.svg)](https://github.com/OWASP/wrongsecrets/actions/workflows/java_swagger_doc.yml) [![Test Heroku with cypress](https://github.com/OWASP/wrongsecrets/actions/workflows/heroku_tests.yml/badge.svg)](https://github.com/OWASP/wrongsecrets/actions/workflows/heroku_tests.yml) @@ -380,66 +380,66 @@ You can enable Swagger documentation and the Swagger UI by overriding the `SPRIN Leaders: -- [Ben de Haan @bendehaan](https://www.github.com/bendehaan) -- [Jeroen Willemsen @commjoen](https://www.github.com/commjoen) +- [Ben de Haan @bendehaan](https://github.com/bendehaan) +- [Jeroen Willemsen @commjoen](https://github.com/commjoen) Top contributors: -- [Jannik Hollenbach @J12934](https://www.github.com/J12934) -- [Puneeth Y @puneeth072003](https://www.github.com/puneeth072003) -- [Joss Sparkes @RemakingEden](https://www.github.com/RemakingEden) +- [Jannik Hollenbach @J12934](https://github.com/J12934) +- [Puneeth Y @puneeth072003](https://github.com/puneeth072003) +- [Joss Sparkes @RemakingEden](https://github.com/RemakingEden) Contributors: -- [Nanne Baars @nbaars](https://www.github.com/nbaars) -- [Marcin Nowak @drnow4u](https://www.github.com/drnow4u) -- [Rodolfo Neves @roddas](https://www.github.com/roddas) -- [Osama Magdy @osamamagdy](https://www.github.com/osamamagdy) -- [Pastekitoo @Pastekitoo](https://www.github.com/Pastekitoo) -- [Shubham Patel @Shubham-Patel07](https://www.github.com/Shubham-Patel07) -- [za @za](https://www.github.com/za) -- [Divyanshu Dev @Novice-expert](https://www.github.com/Novice-expert) -- [Tibor Hercz @tiborhercz](https://www.github.com/tiborhercz) -- [Chris Elbring Jr. 
@neatzsche](https://www.github.com/neatzsche) -- [Adarsh A @adarsh-a-tw](https://www.github.com/adarsh-a-tw) -- [Diamond Rivero @diamant3](https://www.github.com/diamant3) -- [Norbert Wolniak @nwolniak](https://www.github.com/nwolniak) -- [Filip Chyla @fchyla](https://www.github.com/fchyla) -- [Dmitry Litosh @Dlitosh](https://www.github.com/Dlitosh) -- [Vineeth Jagadeesh @djvinnie](https://www.github.com/djvinnie) -- [Mahaputra Ilham Awal @mahaputrailhamawal](https://www.github.com/mahaputrailhamawal) -- [Turjo Chowdhury @turjoc120](https://www.github.com/turjoc120) -- [SndR @SndR85](https://www.github.com/SndR85) -- [Josh Grossman @tghosth](https://www.github.com/tghosth) -- [alphasec @alphasecio](https://www.github.com/alphasecio) -- [CaduRoriz @CaduRoriz](https://www.github.com/CaduRoriz) -- [Madhu Akula @madhuakula](https://www.github.com/madhuakula) -- [Mike Woudenberg @mikewoudenberg](https://www.github.com/mikewoudenberg) -- [Spyros @northdpole](https://www.github.com/northdpole) -- [RubenAtBinx @RubenAtBinx](https://www.github.com/RubenAtBinx) -- [Alex Bender @alex-bender](https://www.github.com/alex-bender) -- [Danny Lloyd @dannylloyd](https://www.github.com/dannylloyd) -- [Nicolas Humblot @nhumblot](https://www.github.com/nhumblot) -- [Rick M @kingthorin](https://www.github.com/kingthorin) -- [Shlomo Zalman Heigh @szh](https://www.github.com/szh) -- [Fern @f3rn0s](https://www.github.com/f3rn0s) -- [Jeff Tong @Wind010](https://www.github.com/Wind010) +- [Nanne Baars @nbaars](https://github.com/nbaars) +- [Marcin Nowak @drnow4u](https://github.com/drnow4u) +- [Rodolfo Neves @roddas](https://github.com/roddas) +- [Osama Magdy @osamamagdy](https://github.com/osamamagdy) +- [Pastekitoo @Pastekitoo](https://github.com/Pastekitoo) +- [Shubham Patel @Shubham-Patel07](https://github.com/Shubham-Patel07) +- [za @za](https://github.com/za) +- [Divyanshu Dev @Novice-expert](https://github.com/Novice-expert) +- [Tibor Hercz @tiborhercz](https://github.com/tiborhercz) 
+- [Chris Elbring Jr. @neatzsche](https://github.com/neatzsche) +- [Adarsh A @adarsh-a-tw](https://github.com/adarsh-a-tw) +- [Diamond Rivero @diamant3](https://github.com/diamant3) +- [Norbert Wolniak @nwolniak](https://github.com/nwolniak) +- [Filip Chyla @fchyla](https://github.com/fchyla) +- [Dmitry Litosh @Dlitosh](https://github.com/Dlitosh) +- [Vineeth Jagadeesh @djvinnie](https://github.com/djvinnie) +- [Mahaputra Ilham Awal @mahaputrailhamawal](https://github.com/mahaputrailhamawal) +- [Turjo Chowdhury @turjoc120](https://github.com/turjoc120) +- [SndR @SndR85](https://github.com/SndR85) +- [Josh Grossman @tghosth](https://github.com/tghosth) +- [alphasec @alphasecio](https://github.com/alphasecio) +- [CaduRoriz @CaduRoriz](https://github.com/CaduRoriz) +- [Madhu Akula @madhuakula](https://github.com/madhuakula) +- [Mike Woudenberg @mikewoudenberg](https://github.com/mikewoudenberg) +- [Spyros @northdpole](https://github.com/northdpole) +- [RubenAtBinx @RubenAtBinx](https://github.com/RubenAtBinx) +- [Alex Bender @alex-bender](https://github.com/alex-bender) +- [Danny Lloyd @dannylloyd](https://github.com/dannylloyd) +- [Nicolas Humblot @nhumblot](https://github.com/nhumblot) +- [Rick M @kingthorin](https://github.com/kingthorin) +- [Shlomo Zalman Heigh @szh](https://github.com/szh) +- [Fern @f3rn0s](https://github.com/f3rn0s) +- [Jeff Tong @Wind010](https://github.com/Wind010) Testers: -- [Dave van Stein @davevs](https://www.github.com/davevs) -- [Marcin Nowak @drnow4u](https://www.github.com/drnow4u) -- [Marc Chang Sing Pang @mchangsp](https://www.github.com/mchangsp) -- [Vineeth Jagadeesh @djvinnie](https://www.github.com/djvinnie) +- [Dave van Stein @davevs](https://github.com/davevs) +- [Marcin Nowak @drnow4u](https://github.com/drnow4u) +- [Marc Chang Sing Pang @mchangsp](https://github.com/mchangsp) +- [Vineeth Jagadeesh @djvinnie](https://github.com/djvinnie) Special thanks: -- [Madhu Akula @madhuakula 
@madhuakula](https://www.github.com/madhuakula) -- [Nanne Baars @nbaars @nbaars](https://www.github.com/nbaars) -- [Björn Kimminich @bkimminich](https://www.github.com/bkimminich) -- [Dan Gora @devsecops](https://www.github.com/devsecops) -- [Xiaolu Dai @saragluna](https://www.github.com/saragluna) -- [Jonathan Giles @jonathanGiles](https://www.github.com/jonathanGiles) +- [Madhu Akula @madhuakula @madhuakula](https://github.com/madhuakula) +- [Nanne Baars @nbaars @nbaars](https://github.com/nbaars) +- [Björn Kimminich @bkimminich](https://github.com/bkimminich) +- [Dan Gora @devsecops](https://github.com/devsecops) +- [Xiaolu Dai @saragluna](https://github.com/saragluna) +- [Jonathan Giles @jonathanGiles](https://github.com/jonathanGiles) ### Sponsorships diff --git a/docs/PRE_COMMIT.md b/docs/PRE_COMMIT.md new file mode 100644 index 000000000..800904640 --- /dev/null +++ b/docs/PRE_COMMIT.md @@ -0,0 +1,78 @@ +# Lychee Pre-commit Hooks + +This repository provides three pre-commit hook options for lychee link checking: + +## Quick Start + +Add this to your `.pre-commit-config.yaml`: + +```yaml +repos: + - repo: https://github.com/lycheeverse/lychee + rev: lychee-v0.20.1 # Use latest lychee-v* tag + hooks: + - id: lychee # Auto-installs lychee +``` + +## Hook Options + +### 1. `lychee` (Recommended) + +- **Auto-installs** lychee using cargo-binstall (fast) or cargo install (fallback) +- **Best user experience** - no manual setup required +- **Fast** - uses pre-built binaries when available + +```yaml +- id: lychee + args: ["--no-progress", "--exclude", "file://"] +``` + +### 2. `lychee-system` + +- **Requires manual installation**: `cargo install lychee` +- **Fastest** - no installation overhead +- **For users who already have lychee installed** + +```yaml +- id: lychee-system + args: ["--no-progress", "--exclude", "file://"] +``` + +### 3. 
`lychee-docker` + +- **Auto-installs** via Docker image +- **Slower** - pulls Docker image +- **For environments where cargo is not available** + +```yaml +- id: lychee-docker + args: ["--no-progress", "--exclude", "file://"] +``` + +## Version Format + +⚠️ **Important**: Use `lychee-v*` format for tags (e.g., `lychee-v0.20.1`), not `v*` format. + +The tag format changed after v0.15.1 to support cargo-binstall URL patterns: +- ❌ `rev: v0.20.1` (doesn't exist) +- ✅ `rev: lychee-v0.20.1` (correct format) + +## Common Configuration + +```yaml +repos: + - repo: https://github.com/lycheeverse/lychee + rev: lychee-v0.20.1 + hooks: + - id: lychee + args: + - --no-progress + - --exclude=file:// + - --exclude=mailto: +``` + +## Troubleshooting + +**"Executable `lychee` not found"**: Use the default `lychee` hook (not `lychee-system`) for auto-installation. + +**Tag format issues**: Ensure you're using `lychee-v*` format, not `v*` format for versions after 0.15.1. \ No newline at end of file diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 000000000..f1489576f --- /dev/null +++ b/docs/README.md @@ -0,0 +1,896 @@ + +![lychee](assets/logo.svg) + +[![Homepage](https://img.shields.io/badge/Homepage-Online-EA3A97)](https://lycheeverse.github.io) +[![GitHub 
Marketplace](https://img.shields.io/badge/Marketplace-lychee-blue.svg?colorA=24292e&colorB=0366d6&style=flat&longCache=true&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAAOCAYAAAAfSC3RAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAM6wAADOsB5dZE0gAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAAERSURBVCiRhZG/SsMxFEZPfsVJ61jbxaF0cRQRcRJ9hlYn30IHN/+9iquDCOIsblIrOjqKgy5aKoJQj4O3EEtbPwhJbr6Te28CmdSKeqzeqr0YbfVIrTBKakvtOl5dtTkK+v4HfA9PEyBFCY9AGVgCBLaBp1jPAyfAJ/AAdIEG0dNAiyP7+K1qIfMdonZic6+WJoBJvQlvuwDqcXadUuqPA1NKAlexbRTAIMvMOCjTbMwl1LtI/6KWJ5Q6rT6Ht1MA58AX8Apcqqt5r2qhrgAXQC3CZ6i1+KMd9TRu3MvA3aH/fFPnBodb6oe6HM8+lYHrGdRXW8M9bMZtPXUji69lmf5Cmamq7quNLFZXD9Rq7v0Bpc1o/tp0fisAAAAASUVORK5CYII=)](https://github.com/marketplace/actions/lychee-broken-link-checker) +[![Rust](https://github.com/hello-rust/lychee/workflows/CI/badge.svg)](https://github.com/lycheeverse/lychee/actions/workflows/ci.yml) +[![docs.rs](https://docs.rs/lychee-lib/badge.svg)](https://docs.rs/lychee-lib) +[![Check Links](https://github.com/lycheeverse/lychee/actions/workflows/links.yml/badge.svg)](https://github.com/lycheeverse/lychee/actions/workflows/links.yml) +[![Docker Pulls](https://img.shields.io/docker/pulls/lycheeverse/lychee?color=%23099cec&logo=Docker)](https://hub.docker.com/r/lycheeverse/lychee) + +⚡ A fast, async, stream-based link checker written in Rust.\ +Finds broken hyperlinks and mail addresses inside Markdown, HTML, +reStructuredText, or any other text file or website! + +Available as a command-line utility, a library and a [GitHub Action](https://github.com/lycheeverse/lychee-action). 
+ +![Lychee demo](./assets/screencast.svg) + + + +## Table of Contents + +- [Development](#development) +- [Installation](#installation) +- [Features](#features) +- [Commandline usage](#commandline-usage) +- [Library usage](#library-usage) +- [GitHub Action Usage](#github-action-usage) +- [Pre-commit Usage](#pre-commit-usage) +- [Contributing to lychee](#contributing-to-lychee) +- [Troubleshooting and Workarounds](#troubleshooting-and-workarounds) +- [Users](#users) +- [Credits](#credits) +- [License](#license) + + + +## Development + +After [installing Rust](https://www.rust-lang.org/tools/install), use [Cargo](https://doc.rust-lang.org/cargo/) for building and testing. +On Linux, the OpenSSL package [is required](https://github.com/seanmonstar/reqwest?tab=readme-ov-file#requirements) to compile `reqwest`, a dependency of lychee. +For Nix, we provide a flake so you can use `nix develop` and `nix build`. + +## Installation + +### Arch Linux + +```sh +pacman -S lychee +``` + +### openSUSE Tumbleweed + +```sh +zypper in lychee +``` + +### Ubuntu + +```sh +snap install lychee +``` + +### macOS + +Via [Homebrew](https://brew.sh): + +```sh +brew install lychee +``` + +Via [MacPorts](https://www.macports.org): + +```sh +sudo port install lychee +``` + +### Docker + +```sh +docker pull lycheeverse/lychee +``` + +### NixOS + +```sh +nix-env -iA nixos.lychee +``` + +### Nixpkgs + +- [`lychee` package](https://search.nixos.org/packages?show=lychee&query=lychee) for configurations, Nix shells, etc. 
+ +- Let Nix check a packaged site with \ + [`testers.lycheeLinkCheck`](https://nixos.org/manual/nixpkgs/stable/#tester-lycheeLinkCheck) `{ site = …; }` + +### FreeBSD + +```sh +pkg install lychee +``` + +### Scoop + +```sh +scoop install lychee +``` + +### Termux + +```sh +pkg install lychee +``` + +### Alpine Linux + +```sh + # available for Alpine Edge in testing repositories +apk add lychee +``` + +### WinGet (Windows) + +```sh +winget install --id lycheeverse.lychee +``` + +### Chocolatey (Windows) + +```sh +choco install lychee +``` + +### Conda + +```sh +conda install lychee -c conda-forge +``` + +### Pre-built binaries + +We provide binaries for Linux, macOS, and Windows for every release. \ +You can download them from the [releases page](https://github.com/lycheeverse/lychee/releases). + +### Cargo + +#### Build dependencies + +On APT/dpkg-based Linux distros (e.g. Debian, Ubuntu, Linux Mint and Kali Linux) +the following commands will install all required build dependencies, including +the Rust toolchain and `cargo`: + +```sh +curl -sSf 'https://sh.rustup.rs' | sh +apt install gcc pkg-config libc6-dev libssl-dev +``` + +#### Compile and install lychee + +```sh +cargo install lychee +``` + +#### Feature flags + +Lychee supports several feature flags: + +- `native-tls` enables the platform-native TLS crate [native-tls](https://crates.io/crates/native-tls). +- `vendored-openssl` compiles and statically links a copy of OpenSSL. See the corresponding feature of the [openssl](https://crates.io/crates/openssl) crate. +- `rustls-tls` enables the alternative TLS crate [rustls](https://crates.io/crates/rustls). +- `email-check` enables checking email addresses using the [check-if-email-exists](https://crates.io/crates/check-if-email-exists) crate. This feature requires the `native-tls` feature. +- `check_example_domains` allows checking example domains such as `example.com`. This feature is useful for testing. 
+ +By default, `native-tls` and `email-check` are enabled. + +## Features + +This comparison is made on a best-effort basis. Please create a PR to fix +outdated information. + +| | lychee | [awesome_bot] | [muffet] | [broken-link-checker] | [linkinator] | [linkchecker] | [markdown-link-check] | [fink] | +| -------------------- | ------- | ------------- | -------- | --------------------- | ------------ | -------------------- | --------------------- | ------ | +| Language | Rust | Ruby | Go | JS | TypeScript | Python | JS | PHP | +| Async/Parallel | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | +| JSON output | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![maybe]1 | ![yes] | ![yes] | +| Static binary | ![yes] | ![no] | ![yes] | ![no] | ![no] | ️![no] | ![no] | ![no] | +| Markdown files | ![yes] | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![no] | +| HTML files | ![yes] | ![no] | ![no] | ![yes] | ![yes] | ![no] | ![yes] | ![no] | +| Text files | ![yes] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | +| Website support | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![no] | ![yes] | +| Chunked encodings | ![yes] | ![maybe] | ![maybe] | ![maybe] | ![maybe] | ![no] | ![yes] | ![yes] | +| GZIP compression | ![yes] | ![maybe] | ![maybe] | ![yes] | ![maybe] | ![yes] | ![maybe] | ![no] | +| Basic Auth | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | ![no] | +| Custom user agent | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | ![no] | +| Relative URLs | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | +| Anchors/Fragments | ![yes] | ![no] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![no] | +| Skip relative URLs | ![yes] | ![no] | ![no] | ![maybe] | ![no] | ![no] | ![no] | ![no] | +| Include patterns | ![yes]️ | ![yes] | ![no] | ![yes] | ![no] | ![no] | ![no] | ![no] | +| Exclude patterns | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | +| Handle redirects | 
![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | +| Ignore insecure SSL | ![yes] | ![yes] | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | +| File globbing | ![yes] | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | +| Limit scheme | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | ![no] | +| [Custom headers] | ![yes] | ![no] | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | +| Summary | ![yes] | ![yes] | ![yes] | ![maybe] | ![yes] | ![yes] | ![no] | ![yes] | +| `HEAD` requests | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![yes] | ![no] | ![no] | +| Colored output | ![yes] | ![maybe] | ![yes] | ![maybe] | ![yes] | ![yes] | ![no] | ![yes] | +| [Filter status code] | ![yes] | ![yes] | ![no] | ![no] | ![no] | ![no] | ![yes] | ![no] | +| Custom timeout | ![yes] | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![no] | ![yes] | +| E-mail links | ![yes] | ![no] | ![no] | ![no] | ![no] | ![yes] | ![no] | ![no] | +| Progress bar | ![yes] | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![yes] | +| Retry and backoff | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![no] | ![yes] | ![no] | +| Skip private domains | ![yes] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | +| [Use as library] | ![yes] | ![yes] | ![no] | ![yes] | ![yes] | ![no] | ![yes] | ![no] | +| Quiet mode | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | +| [Config file] | ![yes] | ![no] | ![no] | ![no] | ![yes] | ![yes] | ![yes] | ![no] | +| Cookies | ![yes] | ![no] | ![yes] | ![no] | ![no] | ![yes] | ![no] | ![yes] | +| Recursion | ![no] | ![no] | ![yes] | ![yes] | ![yes] | ![yes] | ![yes] | ![no] | +| Amazing lychee logo | ![yes] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | ![no] | + +[awesome_bot]: https://github.com/dkhamsing/awesome_bot +[muffet]: https://github.com/raviqqe/muffet +[broken-link-checker]: https://github.com/stevenvachon/broken-link-checker +[linkinator]: https://github.com/JustinBeckwith/linkinator 
+[linkchecker]: https://github.com/linkchecker/linkchecker +[markdown-link-check]: https://github.com/tcort/markdown-link-check +[fink]: https://github.com/dantleech/fink +[yes]: ./assets/yes.svg +[no]: ./assets/no.svg +[maybe]: ./assets/maybe.svg +[custom headers]: https://github.com/rust-lang/crates.io/issues/788 +[filter status code]: https://github.com/tcort/markdown-link-check/issues/94 +[skip private domains]: https://github.com/appscodelabs/liche/blob/a5102b0bf90203b467a4f3b4597d22cd83d94f99/url_checker.go +[use as library]: https://github.com/raviqqe/liche/issues/13 +[config file]: https://github.com/lycheeverse/lychee/blob/master/lychee.example.toml + +1 Other machine-readable formats like CSV are supported. + +## Commandline usage + +Recursively check all links in supported files inside the current directory: + +```sh +lychee . +``` + +You can also specify various types of inputs: + +```sh +# check links in specific local file(s): +lychee README.md +lychee test.html info.txt + +# check links on a website: +lychee https://endler.dev + +# check links in directory but block network requests: +lychee --offline path/to/directory + +# check links in a remote file: +lychee https://raw.githubusercontent.com/lycheeverse/lychee/master/README.md + +# check links in local files via shell glob: +lychee ~/projects/*/README.md + +# check links in local files (lychee supports advanced globbing and ~ expansion): +lychee "~/projects/big_project/**/README.*" + +# ignore case when globbing and check result for each link: +lychee --glob-ignore-case "~/projects/**/[r]eadme.*" + +# check links from epub file (requires atool: https://www.nongnu.org/atool) +acat -F zip {file.epub} "*.xhtml" "*.html" | lychee - +``` + +lychee parses other file formats as plaintext and extracts links using [linkify](https://github.com/robinst/linkify). 
+This generally works well if there are no format or encoding specifics, +but in case you need dedicated support for a new file format, please consider creating an issue. + +### Docker Usage + +Here's how to mount a local directory into the container and check some input +with lychee. + +- The `--init` parameter is passed so that lychee can be stopped from the terminal. +- We also pass `-it` to start an interactive terminal, which is required to show the progress bar. +- The `--rm` flag removes the container from the host once the run finishes (self-cleanup). +- The `-w /input` flag sets `/input` as the working directory inside the container. +- The `-v $(pwd):/input` flag mounts the current directory into the container at `/input` so that lychee can read it. + +> By default a Debian-based Docker image is used. If you want to run an Alpine-based image, use the `latest-alpine` tag. +> For example, `lycheeverse/lychee:latest-alpine`. + +#### Linux/macOS shell command + +```sh +docker run --init -it --rm -w /input -v $(pwd):/input lycheeverse/lychee README.md +``` + +#### Windows PowerShell command + +```powershell +docker run --init -it --rm -w /input -v ${PWD}:/input lycheeverse/lychee README.md +``` + +### GitHub Token + +To avoid getting rate-limited while checking GitHub links, you can optionally +set an environment variable with your GitHub token, e.g. `GITHUB_TOKEN=xxxx`, +or use the `--github-token` CLI option. It can also be set in the config file. +[Here is an example config file][config file]. + +The token can be generated on your [GitHub account settings page](https://github.com/settings/tokens). +A personal access token with no extra permissions is enough to check public repo links. + +For more scalable organization-wide scenarios you can consider a [GitHub App][github-app-overview]. +It has a higher rate limit than personal access tokens but requires additional configuration steps in your GitHub workflow. +Please follow the [GitHub App Setup][github-app-setup] example. 
+ +[github-app-overview]: https://docs.github.com/en/apps/overview +[github-app-setup]: https://github.com/github/combine-prs/blob/main/docs/github-app-setup.md#github-app-setup + +### Commandline Parameters + +There is an extensive list of command line parameters to customize the behavior. +See below for a full list. + +```help-message +lychee is a fast, asynchronous link checker which detects broken URLs and mail addresses in local files and websites. It supports Markdown and HTML and works well with many plain text file formats. + +lychee is powered by lychee-lib, the Rust library for link checking. + +Usage: lychee [OPTIONS] [inputs]... + +Arguments: + [inputs]... + Inputs for link checking (where to get links to check from). These can be: + files (e.g. `README.md`), file globs (e.g. `'~/git/*/README.md'`), remote URLs + (e.g. `https://example.com/README.md`), or standard input (`-`). Alternatively, + use `--files-from` to read inputs from a file. + + NOTE: Use `--` to separate inputs from options that allow multiple arguments. + +Options: + -a, --accept + A List of accepted status codes for valid links + + The following accept range syntax is supported: [start]..[[=]end]|code. Some valid + examples are: + + - 200 (accepts the 200 status code only) + - ..204 (accepts any status code < 204) + - ..=204 (accepts any status code <= 204) + - 200..=204 (accepts any status code from 200 to 204 inclusive) + - 200..205 (accepts any status code from 200 to 205 excluding 205, same as 200..=204) + + Use "lychee --accept '200..=204, 429, 500' ..." to provide a comma- + separated list of accepted status codes. This example will accept 200, 201, + 202, 203, 204, 429, and 500 as valid status codes. + + [default: 100..=103,200..=299] + + --archive + Specify the use of a specific web archive. Can be used in combination with `--suggest` + + [possible values: wayback] + + -b, --base-url + Base URL to use when resolving relative URLs in local files. 
If specified, + relative links in local files are interpreted as being relative to the given + base URL. + + For example, given a base URL of `https://example.com/dir/page`, the link `a` + would resolve to `https://example.com/dir/a` and the link `/b` would resolve + to `https://example.com/b`. This behavior is not affected by the filesystem + path of the file containing these links. + + Note that relative URLs without a leading slash become siblings of the base + URL. If, instead, the base URL ended in a slash, the link would become a child + of the base URL. For example, a base URL of `https://example.com/dir/page/` and + a link of `a` would resolve to `https://example.com/dir/page/a`. + + Basically, the base URL option resolves links as if the local files were hosted + at the given base URL address. + + The provided base URL value must either be a URL (with scheme) or an absolute path. + Note that certain URL schemes cannot be used as a base, e.g., `data` and `mailto`. + + --base + Deprecated; use `--base-url` instead + + --basic-auth + Basic authentication support. E.g. `http://example.com username:password` + + -c, --config + Configuration file to use + + [default: lychee.toml] + + --cache + Use request cache stored on disk at `.lycheecache` + + --cache-exclude-status + A list of status codes that will be ignored from the cache + + The following exclude range syntax is supported: [start]..[[=]end]|code. Some valid + examples are: + + - 429 (excludes the 429 status code only) + - 500.. (excludes any status code >= 500) + - ..100 (excludes any status code < 100) + - 500..=599 (excludes any status code from 500 to 599 inclusive) + - 500..600 (excludes any status code from 500 to 600 excluding 600, same as 500..=599) + + Use "lychee --cache-exclude-status '429, 500..502' ..." to provide a + comma-separated list of excluded status codes. This example will not cache results + with a status code of 429, 500 and 501. 
+ + --cookie-jar + Tell lychee to read cookies from the given file. Cookies will be stored in the + cookie jar and sent with requests. New cookies will be stored in the cookie jar + and existing cookies will be updated. + + --default-extension + This is the default file extension that is applied to files without an extension. + + This is useful for files without extensions or with unknown extensions. The extension will be used to determine the file type for processing. Examples: --default-extension md, --default-extension html + + --dump + Don't perform any link checking. Instead, dump all the links extracted from inputs that would be checked + + --dump-inputs + Don't perform any link extraction and checking. Instead, dump all input sources from which links would be collected + + -E, --exclude-all-private + Exclude all private IPs from checking. + Equivalent to `--exclude-private --exclude-link-local --exclude-loopback` + + --exclude + Exclude URLs and mail addresses from checking. The values are treated as regular expressions + + --exclude-file + Deprecated; use `--exclude-path` instead + + --exclude-link-local + Exclude link-local IP address range from checking + + --exclude-loopback + Exclude loopback IP address range and localhost from checking + + --exclude-path + Exclude paths from getting checked. The values are treated as regular expressions + + --exclude-private + Exclude private IP address ranges from checking + + --extensions + Test the specified file extensions for URIs when checking files locally. + + Multiple extensions can be separated by commas. Note that if you want to check filetypes, + which have multiple extensions, e.g. HTML files with both .html and .htm extensions, you need to + specify both extensions explicitly. 
+ + [default: md,mkd,mdx,mdown,mdwn,mkdn,mkdown,markdown,html,htm,txt] + + -f, --format + Output format of final status report + + [default: compact] + [possible values: compact, detailed, json, markdown, raw] + + --fallback-extensions + When checking locally, attempts to locate missing files by trying the given + fallback extensions. Multiple extensions can be separated by commas. Extensions + will be checked in order of appearance. + + Example: --fallback-extensions html,htm,php,asp,aspx,jsp,cgi + + Note: This option takes effect on `file://` URIs which do not exist and on + `file://` URIs pointing to directories which resolve to themself (by the + --index-files logic). + + --files-from + Read input filenames from the given file or stdin (if path is '-'). + + This is useful when you have a large number of inputs that would be + cumbersome to specify on the command line directly. + + Examples: + lychee --files-from list.txt + find . -name '*.md' | lychee --files-from - + echo 'README.md' | lychee --files-from - + + File Format: + Each line should contain one input (file path, URL, or glob pattern). + Lines starting with '#' are treated as comments and ignored. + Empty lines are also ignored. + + --generate + Generate special output (e.g. the man page) instead of performing link checking + + [possible values: man] + + --github-token + GitHub API token to use when checking github.com links, to avoid rate limiting + + [env: GITHUB_TOKEN] + + --glob-ignore-case + Ignore case when expanding filesystem path glob inputs + + -h, --help + Print help (see a summary with '-h') + + -H, --header + Set custom header for requests + + Some websites require custom headers to be passed in order to return valid responses. + You can specify custom headers in the format 'Name: Value'. For example, 'Accept: text/html'. + This is the same format that other tools like curl or wget use. + Multiple headers can be specified by using the flag multiple times. 
+ + --hidden + Do not skip hidden directories and files + + -i, --insecure + Proceed for server connections considered insecure (invalid TLS) + + --include + URLs to check (supports regex). Has preference over all excludes + + --include-fragments + Enable the checking of fragments in links + + --include-mail + Also check email addresses + + --include-verbatim + Find links in verbatim sections like `pre`- and `code` blocks + + --include-wikilinks + Check WikiLinks in Markdown files + + --index-files + When checking locally, resolves directory links to a separate index file. + The argument is a comma-separated list of index file names to search for. Index + names are relative to the link's directory and attempted in the order given. + + If `--index-files` is specified, then at least one index file must exist in + order for a directory link to be considered valid. Additionally, the special + name `.` can be used in the list to refer to the directory itself. + + If unspecified (the default behavior), index files are disabled and directory + links are considered valid as long as the directory exists on disk. + + Example 1: `--index-files index.html,readme.md` looks for index.html or readme.md + and requires that at least one exists. + + Example 2: `--index-files index.html,.` will use index.html if it exists, but + still accept the directory link regardless. + + Example 3: `--index-files ''` will reject all directory links because there are + no valid index files. This will require every link to explicitly name + a file. + + Note: This option only takes effect on `file://` URIs which exist and point to a directory. 
+ + -m, --max-redirects + Maximum number of allowed redirects + + [default: 5] + + --max-cache-age + Discard all cached requests older than this duration + + [default: 1d] + + --max-concurrency + Maximum number of concurrent network requests + + [default: 128] + + --max-retries + Maximum number of retries per request + + [default: 3] + + --min-tls + Minimum accepted TLS Version + + [possible values: TLSv1_0, TLSv1_1, TLSv1_2, TLSv1_3] + + --mode + Set the output display mode. Determines how results are presented in the terminal + + [default: color] + [possible values: plain, color, emoji, task] + + -n, --no-progress + Do not show progress bar. + This is recommended for non-interactive shells (e.g. for continuous integration) + + --no-ignore + Do not skip files that would otherwise be ignored by '.gitignore', '.ignore', or the global ignore file + + -o, --output + Output file of status report + + --offline + Only check local files and block network requests + + -q, --quiet... + Less output per occurrence (e.g. `-q` or `-qq`) + + -r, --retry-wait-time + Minimum wait time in seconds between retries of failed requests + + [default: 1] + + --remap + Remap URI matching pattern to different URI + + --require-https + When HTTPS is available, treat HTTP links as errors + + --root-dir + Root directory to use when checking absolute links in local files. This option is + required if absolute links appear in local files, otherwise those links will be + flagged as errors. This must be an absolute path (i.e., one beginning with `/`). + + If specified, absolute links in local files are resolved by prefixing the given + root directory to the requested absolute link. For example, with a root-dir of + `/root/dir`, a link to `/page.html` would be resolved to `/root/dir/page.html`. + + This option can be specified alongside `--base-url`. 
If both are given, an + absolute link is resolved by constructing a URL from three parts: the domain + name specified in `--base-url`, followed by the `--root-dir` directory path, + followed by the absolute link's own path. + + -s, --scheme + Only test links with the given schemes (e.g. https). Omit to check links with + any other scheme. At the moment, we support http, https, file, and mailto. + + --skip-missing + Skip missing input files (default is to error if they don't exist) + + --suggest + Suggest link replacements for broken links, using a web archive. The web archive can be specified with `--archive` + + -t, --timeout + Website timeout in seconds from connect to response finished + + [default: 20] + + -T, --threads + Number of threads to utilize. Defaults to number of cores available to the system + + -u, --user-agent + User agent + + [default: lychee/0.20.1] + + -v, --verbose... + Set verbosity level; more output per occurrence (e.g. `-v` or `-vv`) + + -V, --version + Print version + + -X, --method + Request method + + [default: get] +``` + +### Exit codes + +0 Success. The operation was completed successfully as instructed. + +1 Missing inputs or any unexpected runtime failures or configuration errors + +2 Link check failures. At least one non-excluded link failed the check. + +3 Encountered errors in the config file. + +### Ignoring links + +You can exclude links from getting checked by specifying regex patterns +with `--exclude` (e.g. `--exclude example\.(com|org)`). + +Here are some examples: + +```bash +# Exclude LinkedIn URLs (note that we match on the full URL, including the schema to avoid false-positives) +lychee --exclude '^https://www\.linkedin\.com' + +# Exclude LinkedIn and Archive.org URLs +lychee --exclude '^https://www\.linkedin\.com' --exclude '^https://web\.archive\.org/web/' + +# Exclude all links to PDF files +lychee --exclude '\.pdf$' . + +# Exclude links to specific domains +lychee --exclude '(facebook|twitter|linkedin)\.com' . 
+ +# Exclude links with certain URL parameters +lychee --exclude '\?utm_source=' . + +# Exclude all mailto links +lychee --exclude '^mailto:' . +``` + +For excluding files/directories from being scanned use `lychee.toml` +and `exclude_path`. + +```toml +exclude_path = ["some/path", "*/dev/*"] +``` + +If a file named `.lycheeignore` exists in the current working directory, its +contents are excluded as well. The file allows you to list multiple regular +expressions for exclusion (one pattern per line). + +For more advanced usage and detailed explanations, check out our comprehensive [guide on excluding links](https://lychee.cli.rs/recipes/excluding-links/). + +### Caching + +If the `--cache` flag is set, lychee will cache responses in a file called +`.lycheecache` in the current directory. If the file exists and the flag is set, +then the cache will be loaded on startup. This can greatly speed up future runs. +Note that by default lychee will not store any data on disk. + +## Library usage + +You can use lychee as a library for your own projects! 
+Here is a "hello world" example: + +```rust +use lychee_lib::Result; + +#[tokio::main] +async fn main() -> Result<()> { + let response = lychee_lib::check("https://github.com/lycheeverse/lychee").await?; + println!("{response}"); + Ok(()) +} +``` + +This is equivalent to the following snippet, in which we build our own client: + +```rust +use lychee_lib::{ClientBuilder, Result, Status}; + +#[tokio::main] +async fn main() -> Result<()> { + let client = ClientBuilder::default().client()?; + let response = client.check("https://github.com/lycheeverse/lychee").await?; + assert!(response.status().is_success()); + Ok(()) +} +``` + +The client builder is very customizable: + +```rust, ignore +let client = lychee_lib::ClientBuilder::builder() + .includes(includes) + .excludes(excludes) + .max_redirects(cfg.max_redirects) + .user_agent(cfg.user_agent) + .allow_insecure(cfg.insecure) + .custom_headers(headers) + .method(method) + .timeout(timeout) + .github_token(cfg.github_token) + .scheme(cfg.scheme) + .accepted(accepted) + .build() + .client()?; +``` + +All options that you set will be used for all link checks. +See the [builder documentation](https://docs.rs/lychee-lib/latest/lychee_lib/struct.ClientBuilder.html) +for all options. For more information, check out the [examples](examples) +directory. The examples can be run with `cargo run --example `. + +## GitHub Action Usage + +A GitHub Action that uses lychee is available as a separate repository: [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action) +which includes usage instructions. + +## Pre-commit Usage + +Lychee can also be used as a [pre-commit](https://pre-commit.com/) hook. 
+ +```yaml +# .pre-commit-config.yaml +repos: + - repo: https://github.com/lycheeverse/lychee.git + rev: v0.15.1 + hooks: + - id: lychee + # Optionally include additional CLI arguments + args: ["--no-progress", "--exclude", "file://"] +``` + +Rather than running on staged files only, lychee can be run against an entire repository. + +```yaml +- id: lychee + args: ["--no-progress", "."] + pass_filenames: false +``` + +## Contributing to lychee + +We'd be thankful for any contribution. \ +We try to keep the issue tracker up-to-date so you can quickly find a task to work on. + +Try one of these links to get started: + +- [good first issues](https://github.com/lycheeverse/lychee/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) +- [help wanted](https://github.com/lycheeverse/lychee/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) + +For more detailed instructions, head over to [`CONTRIBUTING.md`](/CONTRIBUTING.md). + +## Troubleshooting and Workarounds + +We collect a list of common workarounds for various websites in our [troubleshooting guide](./docs/TROUBLESHOOTING.md). + +## Users + +Here is a list of some notable projects that are using lychee. 
+ +- https://github.com/InnerSourceCommons/InnerSourcePatterns +- https://github.com/opensearch-project/OpenSearch +- https://github.com/ramitsurana/awesome-kubernetes +- https://github.com/papers-we-love/papers-we-love +- https://github.com/pingcap/docs +- https://github.com/microsoft/WhatTheHack +- https://github.com/nix-community/awesome-nix +- https://github.com/balena-io/docs +- https://github.com/launchdarkly/LaunchDarkly-Docs +- https://github.com/pawroman/links +- https://github.com/analysis-tools-dev/static-analysis +- https://github.com/analysis-tools-dev/dynamic-analysis +- https://github.com/mre/idiomatic-rust +- https://github.com/bencherdev/bencher +- https://github.com/sindresorhus/execa +- https://github.com/tldr-pages/tldr-maintenance +- https://github.com/git-ecosystem/git-credential-manager +- https://github.com/git/git-scm.com +- https://github.com/OWASP/threat-dragon +- https://github.com/oxc-project/oxc +- https://github.com/hugsy/gef +- https://github.com/mermaid-js/mermaid +- https://github.com/hashicorp/consul +- https://github.com/Unleash/unleash +- https://github.com/fastify/fastify +- https://github.com/nuxt/nuxt +- https://github.com/containerd/containerd +- https://github.com/rolldown/rolldown +- https://github.com/rerun-io/rerun +- https://github.com/0xAX/asm +- https://github.com/mainmatter/100-exercises-to-learn-rust +- https://github.com/GoogleCloudPlatform/generative-ai +- https://github.com/DioxusLabs/dioxus +- https://github.com/ministryofjustice/modernisation-platform +- https://github.com/orhun/binsider +- https://github.com/NVIDIA/aistore +- https://github.com/gradle/gradle +- https://github.com/forus-labs/forui +- https://github.com/FreeBSD-Ask/FreeBSD-Ask +- https://github.com/prosekit/prosekit +- https://github.com/lycheeverse/lychee (yes, lychee is checked with lychee 🤯) + +If you are using lychee for your project, **please add it here**. 
+ +## Credits + +The first prototype of lychee was built in [episode 10 of Hello +Rust](https://hello-rust.github.io/10/). Thanks to all GitHub and Patreon sponsors +for supporting the development since the beginning. Also, thanks to all the +great contributors who have since made this project more mature. + +## License + +lychee is licensed under either of + +- Apache License, Version 2.0, ([LICENSE-APACHE](https://github.com/lycheeverse/lychee/blob/master/LICENSE-APACHE) or + https://www.apache.org/licenses/LICENSE-2.0) +- MIT license ([LICENSE-MIT](https://github.com/lycheeverse/lychee/blob/master/LICENSE-MIT) or https://opensource.org/licenses/MIT) + +at your option. + +

+[🔼 Back to top](#back-to-top) diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md new file mode 100644 index 000000000..d44c3c95f --- /dev/null +++ b/docs/TROUBLESHOOTING.md @@ -0,0 +1,75 @@ +# Troubleshooting Guide + +This document describes common edge cases and workarounds for checking links to various sites. \ +Please add your own findings and send us a pull request if you can. + +## GitHub Rate Limiting + +GitHub has a fairly aggressive rate limiter. \ +If you see errors like: + +``` +GitHub token not specified. To check GitHub links reliably, use `--github-token` flag / `GITHUB_TOKEN` env var. +``` + +it means you are being rate-limited. As the message says, you can make lychee \ +use a GitHub personal access token to avoid this. + +For more details, see the ["GitHub token" section in README.md](https://github.com/lycheeverse/lychee#github-token). + +## Too Many Open Files + +The number of concurrent network requests (`MAX_CONCURRENCY`) is set to 128 by default. +Every network request maps to an open socket, which is represented as a file on UNIX systems. +If you see error messages like "error trying to connect: tcp open error: Too +many open files (os error 24)" then you have run out of file handles. + +You have two options: + +1. Lower the concurrency by setting `--max-concurrency` to something more + conservative like 32. This works, but it also comes with a performance + penalty. +2. Increase the maximum number of file handles. See instructions + [here](https://web.archive.org/web/20241127024709/https://wilsonmar.github.io/maximum-limits/) or + [here](https://synthomat.de/blog/2020/01/increasing-the-file-descriptor-limit-on-macos/). + +## Unexpected Status Codes + +Some websites don't respond with a `200` (OK) status code. \ +Instead they might send `204` (No Content), `206` (Partial Content), or +[something else entirely](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418). 
+ +If you run into such issues you can work around that by providing a custom \ +list of accepted status codes, such as `--accept 200,204,206`. + +## Website Expects Custom Headers + +Some sites expect one or more custom headers to return a valid response. \ +For example, crates.io expects an `Accept: text/html` header or else it \ +will [return a 404](https://github.com/rust-lang/crates.io/issues/788). + +To fix that you can pass additional headers like so: `--header "Accept: text/html"`. \ +You can use that argument multiple times to add more headers. \ +Or, you can accept all content/MIME types: `--header "Accept: */*"`. + +See more info about the Accept header +[over at MDN](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept). + +## Unreachable Mail Address + +You can check email addresses by providing the `--include-mail` flag. +We use https://github.com/reacherhq/check-if-email-exists for email checking. +You can test your mail address with curl: + +```bash + curl -X POST \ + 'https://api.reacher.email/v0/check_email' \ + -H 'content-type: application/json' \ + -H 'authorization: test_api_token' \ + -d '{"to_email": "box@domain.test"}' +``` + +Some settings on your mail server (such as `SPF` Policy, `DNSBL`) may prevent +your email from being verified. If you get an error when checking a working +email address, you can exclude specific addresses with the `--exclude` flag or skip +all email addresses by removing the `--include-mail` flag. diff --git a/docs/lychee.1 b/docs/lychee.1 new file mode 100644 index 000000000..b217a6e0a --- /dev/null +++ b/docs/lychee.1 @@ -0,0 +1,404 @@ +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.TH lychee 1 %cs "lychee 0.21.0" +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH NAME +lychee \- A fast, async link checker +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH SYNOPSIS +\fBlychee\fR [\fB\-\-files\-from\fR] [\fB\-c\fR|\fB\-\-config\fR] [\fB\-v\fR|\fB\-\-verbose\fR]... [\fB\-q\fR|\fB\-\-quiet\fR]... 
[\fB\-n\fR|\fB\-\-no\-progress\fR] [\fB\-\-extensions\fR] [\fB\-\-default\-extension\fR] [\fB\-\-cache\fR] [\fB\-\-max\-cache\-age\fR] [\fB\-\-cache\-exclude\-status\fR] [\fB\-\-dump\fR] [\fB\-\-dump\-inputs\fR] [\fB\-\-archive\fR] [\fB\-\-suggest\fR] [\fB\-m\fR|\fB\-\-max\-redirects\fR] [\fB\-\-max\-retries\fR] [\fB\-\-min\-tls\fR] [\fB\-\-max\-concurrency\fR] [\fB\-T\fR|\fB\-\-threads\fR] [\fB\-u\fR|\fB\-\-user\-agent\fR] [\fB\-i\fR|\fB\-\-insecure\fR] [\fB\-s\fR|\fB\-\-scheme\fR] [\fB\-\-offline\fR] [\fB\-\-include\fR] [\fB\-\-exclude\fR] [\fB\-\-exclude\-file\fR] [\fB\-\-exclude\-path\fR] [\fB\-E\fR|\fB\-\-exclude\-all\-private\fR] [\fB\-\-exclude\-private\fR] [\fB\-\-exclude\-link\-local\fR] [\fB\-\-exclude\-loopback\fR] [\fB\-\-include\-mail\fR] [\fB\-\-remap\fR] [\fB\-\-fallback\-extensions\fR] [\fB\-\-index\-files\fR] [\fB\-H\fR|\fB\-\-header\fR] [\fB\-a\fR|\fB\-\-accept\fR] [\fB\-\-include\-fragments\fR] [\fB\-t\fR|\fB\-\-timeout\fR] [\fB\-r\fR|\fB\-\-retry\-wait\-time\fR] [\fB\-X\fR|\fB\-\-method\fR] [\fB\-\-base\fR] [\fB\-b\fR|\fB\-\-base\-url\fR] [\fB\-\-root\-dir\fR] [\fB\-\-basic\-auth\fR] [\fB\-\-github\-token\fR] [\fB\-\-skip\-missing\fR] [\fB\-\-no\-ignore\fR] [\fB\-\-hidden\fR] [\fB\-\-include\-verbatim\fR] [\fB\-\-glob\-ignore\-case\fR] [\fB\-o\fR|\fB\-\-output\fR] [\fB\-\-mode\fR] [\fB\-f\fR|\fB\-\-format\fR] [\fB\-\-generate\fR] [\fB\-\-require\-https\fR] [\fB\-\-cookie\-jar\fR] [\fB\-\-include\-wikilinks\fR] [\fB\-h\fR|\fB\-\-help\fR] [\fB\-V\fR|\fB\-\-version\fR] [\fIinputs\fR] +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH DESCRIPTION +lychee is a fast, asynchronous link checker which detects broken URLs and mail addresses in local files and websites. It supports Markdown and HTML and works well with many plain text file formats. +.PP +lychee is powered by lychee\-lib, the Rust library for link checking. 
+.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH OPTIONS +.TP +\fB\-\-files\-from\fR \fI\fR +Read input filenames from the given file or stdin (if path is \*(Aq\-\*(Aq). + +This is useful when you have a large number of inputs that would be +cumbersome to specify on the command line directly. + +Examples: + lychee \-\-files\-from list.txt + find . \-name \*(Aq*.md\*(Aq | lychee \-\-files\-from \- + echo \*(AqREADME.md\*(Aq | lychee \-\-files\-from \- + +File Format: + Each line should contain one input (file path, URL, or glob pattern). + Lines starting with \*(Aq#\*(Aq are treated as comments and ignored. + Empty lines are also ignored. +.TP +\fB\-c\fR, \fB\-\-config\fR \fI\fR +Configuration file to use + +[default: lychee.toml] +.TP +\fB\-v\fR, \fB\-\-verbose\fR +Set verbosity level; more output per occurrence (e.g. `\-v` or `\-vv`) +.TP +\fB\-q\fR, \fB\-\-quiet\fR +Less output per occurrence (e.g. `\-q` or `\-qq`) +.TP +\fB\-n\fR, \fB\-\-no\-progress\fR +Do not show progress bar. +This is recommended for non\-interactive shells (e.g. for continuous integration) +.TP +\fB\-\-extensions\fR \fI\fR [default: md,mkd,mdx,mdown,mdwn,mkdn,mkdown,markdown,html,htm,txt] +Test the specified file extensions for URIs when checking files locally. + +Multiple extensions can be separated by commas. Note that if you want to check filetypes, +which have multiple extensions, e.g. HTML files with both .html and .htm extensions, you need to +specify both extensions explicitly. +.TP +\fB\-\-default\-extension\fR \fI\fR +This is the default file extension that is applied to files without an extension. + +This is useful for files without extensions or with unknown extensions. The extension will be used to determine the file type for processing. 
Examples: \-\-default\-extension md, \-\-default\-extension html +.TP +\fB\-\-cache\fR +Use request cache stored on disk at `.lycheecache` +.TP +\fB\-\-max\-cache\-age\fR \fI\fR [default: 1d] +Discard all cached requests older than this duration +.TP +\fB\-\-cache\-exclude\-status\fR \fI\fR +A list of status codes that will be ignored from the cache + +The following exclude range syntax is supported: [start]..[[=]end]|code. Some valid +examples are: + +\- 429 (excludes the 429 status code only) +\- 500.. (excludes any status code >= 500) +\- ..100 (excludes any status code < 100) +\- 500..=599 (excludes any status code from 500 to 599 inclusive) +\- 500..600 (excludes any status code from 500 to 600 excluding 600, same as 500..=599) + +Use "lychee \-\-cache\-exclude\-status \*(Aq429, 500..502\*(Aq ..." to provide a +comma\-separated list of excluded status codes. This example will not cache results +with a status code of 429, 500 and 501. +.TP +\fB\-\-dump\fR +Don\*(Aqt perform any link checking. Instead, dump all the links extracted from inputs that would be checked +.TP +\fB\-\-dump\-inputs\fR +Don\*(Aqt perform any link extraction and checking. Instead, dump all input sources from which links would be collected +.TP +\fB\-\-archive\fR \fI\fR +Specify the use of a specific web archive. Can be used in combination with `\-\-suggest` +.br + +.br +[\fIpossible values: \fRwayback] +.TP +\fB\-\-suggest\fR +Suggest link replacements for broken links, using a web archive. 
The web archive can be specified with `\-\-archive` +.TP +\fB\-m\fR, \fB\-\-max\-redirects\fR \fI\fR [default: 5] +Maximum number of allowed redirects +.TP +\fB\-\-max\-retries\fR \fI\fR [default: 3] +Maximum number of retries per request +.TP +\fB\-\-min\-tls\fR \fI\fR +Minimum accepted TLS Version +.br + +.br +[\fIpossible values: \fRTLSv1_0, TLSv1_1, TLSv1_2, TLSv1_3] +.TP +\fB\-\-max\-concurrency\fR \fI\fR [default: 128] +Maximum number of concurrent network requests +.TP +\fB\-T\fR, \fB\-\-threads\fR \fI\fR +Number of threads to utilize. Defaults to number of cores available to the system +.TP +\fB\-u\fR, \fB\-\-user\-agent\fR \fI\fR [default: lychee/0.21.0] +User agent +.TP +\fB\-i\fR, \fB\-\-insecure\fR +Proceed for server connections considered insecure (invalid TLS) +.TP +\fB\-s\fR, \fB\-\-scheme\fR \fI\fR +Only test links with the given schemes (e.g. https). Omit to check links with +any other scheme. At the moment, we support http, https, file, and mailto. +.TP +\fB\-\-offline\fR +Only check local files and block network requests +.TP +\fB\-\-include\fR \fI\fR +URLs to check (supports regex). Has preference over all excludes +.TP +\fB\-\-exclude\fR \fI\fR +Exclude URLs and mail addresses from checking. The values are treated as regular expressions +.TP +\fB\-\-exclude\-file\fR \fI\fR +Deprecated; use `\-\-exclude\-path` instead +.TP +\fB\-\-exclude\-path\fR \fI\fR +Exclude paths from getting checked. The values are treated as regular expressions +.TP +\fB\-E\fR, \fB\-\-exclude\-all\-private\fR +Exclude all private IPs from checking. 
+Equivalent to `\-\-exclude\-private \-\-exclude\-link\-local \-\-exclude\-loopback` +.TP +\fB\-\-exclude\-private\fR +Exclude private IP address ranges from checking +.TP +\fB\-\-exclude\-link\-local\fR +Exclude link\-local IP address range from checking +.TP +\fB\-\-exclude\-loopback\fR +Exclude loopback IP address range and localhost from checking +.TP +\fB\-\-include\-mail\fR +Also check email addresses +.TP +\fB\-\-remap\fR \fI\fR +Remap URI matching pattern to different URI +.TP +\fB\-\-fallback\-extensions\fR \fI\fR +When checking locally, attempts to locate missing files by trying the given +fallback extensions. Multiple extensions can be separated by commas. Extensions +will be checked in order of appearance. + +Example: \-\-fallback\-extensions html,htm,php,asp,aspx,jsp,cgi + +Note: This option takes effect on `file://` URIs which do not exist and on + `file://` URIs pointing to directories which resolve to themself (by the + \-\-index\-files logic). +.TP +\fB\-\-index\-files\fR \fI\fR +When checking locally, resolves directory links to a separate index file. +The argument is a comma\-separated list of index file names to search for. Index +names are relative to the link\*(Aqs directory and attempted in the order given. + +If `\-\-index\-files` is specified, then at least one index file must exist in +order for a directory link to be considered valid. Additionally, the special +name `.` can be used in the list to refer to the directory itself. + +If unspecified (the default behavior), index files are disabled and directory +links are considered valid as long as the directory exists on disk. + +Example 1: `\-\-index\-files index.html,readme.md` looks for index.html or readme.md + and requires that at least one exists. + +Example 2: `\-\-index\-files index.html,.` will use index.html if it exists, but + still accept the directory link regardless. 
+ +Example 3: `\-\-index\-files \*(Aq\*(Aq` will reject all directory links because there are + no valid index files. This will require every link to explicitly name + a file. + +Note: This option only takes effect on `file://` URIs which exist and point to a directory. +.TP +\fB\-H\fR, \fB\-\-header\fR \fI\fR +Set custom header for requests + +Some websites require custom headers to be passed in order to return valid responses. +You can specify custom headers in the format \*(AqName: Value\*(Aq. For example, \*(AqAccept: text/html\*(Aq. +This is the same format that other tools like curl or wget use. +Multiple headers can be specified by using the flag multiple times. +.TP +\fB\-a\fR, \fB\-\-accept\fR \fI\fR [default: 100..=103,200..=299] +A List of accepted status codes for valid links + +The following accept range syntax is supported: [start]..[[=]end]|code. Some valid +examples are: + +\- 200 (accepts the 200 status code only) +\- ..204 (accepts any status code < 204) +\- ..=204 (accepts any status code <= 204) +\- 200..=204 (accepts any status code from 200 to 204 inclusive) +\- 200..205 (accepts any status code from 200 to 205 excluding 205, same as 200..=204) + +Use "lychee \-\-accept \*(Aq200..=204, 429, 500\*(Aq ..." to provide a comma\- +separated list of accepted status codes. This example will accept 200, 201, +202, 203, 204, 429, and 500 as valid status codes. +.TP +\fB\-\-include\-fragments\fR +Enable the checking of fragments in links +.TP +\fB\-t\fR, \fB\-\-timeout\fR \fI\fR [default: 20] +Website timeout in seconds from connect to response finished +.TP +\fB\-r\fR, \fB\-\-retry\-wait\-time\fR \fI\fR [default: 1] +Minimum wait time in seconds between retries of failed requests +.TP +\fB\-X\fR, \fB\-\-method\fR \fI\fR [default: get] +Request method +.TP +\fB\-\-base\fR \fI\fR +Deprecated; use `\-\-base\-url` instead +.TP +\fB\-b\fR, \fB\-\-base\-url\fR \fI\fR +Base URL to use when resolving relative URLs in local files. 
If specified, +relative links in local files are interpreted as being relative to the given +base URL. + +For example, given a base URL of `https://example.com/dir/page`, the link `a` +would resolve to `https://example.com/dir/a` and the link `/b` would resolve +to `https://example.com/b`. This behavior is not affected by the filesystem +path of the file containing these links. + +Note that relative URLs without a leading slash become siblings of the base +URL. If, instead, the base URL ended in a slash, the link would become a child +of the base URL. For example, a base URL of `https://example.com/dir/page/` and +a link of `a` would resolve to `https://example.com/dir/page/a`. + +Basically, the base URL option resolves links as if the local files were hosted +at the given base URL address. + +The provided base URL value must either be a URL (with scheme) or an absolute path. +Note that certain URL schemes cannot be used as a base, e.g., `data` and `mailto`. +.TP +\fB\-\-root\-dir\fR \fI\fR +Root directory to use when checking absolute links in local files. This option is +required if absolute links appear in local files, otherwise those links will be +flagged as errors. This must be an absolute path (i.e., one beginning with `/`). + +If specified, absolute links in local files are resolved by prefixing the given +root directory to the requested absolute link. For example, with a root\-dir of +`/root/dir`, a link to `/page.html` would be resolved to `/root/dir/page.html`. + +This option can be specified alongside `\-\-base\-url`. If both are given, an +absolute link is resolved by constructing a URL from three parts: the domain +name specified in `\-\-base\-url`, followed by the `\-\-root\-dir` directory path, +followed by the absolute link\*(Aqs own path. +.TP +\fB\-\-basic\-auth\fR \fI\fR +Basic authentication support. E.g. 
`http://example.com username:password` +.TP +\fB\-\-github\-token\fR \fI\fR +GitHub API token to use when checking github.com links, to avoid rate limiting +.RS +May also be specified with the \fBGITHUB_TOKEN\fR environment variable. +.RE +.TP +\fB\-\-skip\-missing\fR +Skip missing input files (default is to error if they don\*(Aqt exist) +.TP +\fB\-\-no\-ignore\fR +Do not skip files that would otherwise be ignored by \*(Aq.gitignore\*(Aq, \*(Aq.ignore\*(Aq, or the global ignore file +.TP +\fB\-\-hidden\fR +Do not skip hidden directories and files +.TP +\fB\-\-include\-verbatim\fR +Find links in verbatim sections like `pre`\- and `code` blocks +.TP +\fB\-\-glob\-ignore\-case\fR +Ignore case when expanding filesystem path glob inputs +.TP +\fB\-o\fR, \fB\-\-output\fR \fI\fR +Output file of status report +.TP +\fB\-\-mode\fR \fI\fR [default: color] +Set the output display mode. Determines how results are presented in the terminal +.br + +.br +[\fIpossible values: \fRplain, color, emoji, task] +.TP +\fB\-f\fR, \fB\-\-format\fR \fI\fR [default: compact] +Output format of final status report +.br + +.br +[\fIpossible values: \fRcompact, detailed, json, markdown, raw] +.TP +\fB\-\-generate\fR \fI\fR +Generate special output (e.g. the man page) instead of performing link checking +.br + +.br +[\fIpossible values: \fRman] +.TP +\fB\-\-require\-https\fR +When HTTPS is available, treat HTTP links as errors +.TP +\fB\-\-cookie\-jar\fR \fI\fR +Tell lychee to read cookies from the given file. Cookies will be stored in the +cookie jar and sent with requests. New cookies will be stored in the cookie jar +and existing cookies will be updated. +.TP +\fB\-\-include\-wikilinks\fR +Check WikiLinks in Markdown files +.TP +\fB\-h\fR, \fB\-\-help\fR +Print help (see a summary with \*(Aq\-h\*(Aq) +.TP +\fB\-V\fR, \fB\-\-version\fR +Print version +.TP +[\fIinputs\fR] +Inputs for link checking (where to get links to check from). These can be: +files (e.g. `README.md`), file globs (e.g. 
`\*(Aq~/git/*/README.md\*(Aq`), remote URLs +(e.g. `https://example.com/README.md`), or standard input (`\-`). Alternatively, +use `\-\-files\-from` to read inputs from a file. + +NOTE: Use `\-\-` to separate inputs from options that allow multiple arguments. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH EXAMPLES +Check all links in supported files by specifying a directory + + $ lychee . + +Specify files explicitly or use glob patterns + + $ lychee README.md test.html info.txt + $ lychee \*(Aqpublic/**/*.html\*(Aq \*(Aq*.md\*(Aq + +Check all links on a website + + $ lychee https://example.com + +Check links from stdin + + $ cat test.md | lychee \- + $ echo \*(Aqhttps://example.com\*(Aq | lychee \- + +Links can be excluded and included with regular expressions + + $ lychee \-\-exclude \*(Aq^https?://blog\\.example\\.com\*(Aq \-\-exclude \*(Aq\\.(pdf|zip|png|jpg)$\*(Aq . + +Further examples can be found in the online documentation at + + +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH "EXIT CODES" + +0 Success. The operation was completed successfully as instructed. + +1 Missing inputs or any unexpected runtime failures or configuration errors + +2 Link check failures. At least one non\-excluded link failed the check. + +3 Encountered errors in the config file. + +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH "REPORTING BUGS" +Report any bugs or questions to + +Questions can also be asked on +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH VERSION +v0.21.0 +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.SH AUTHORS +Matthias Endler + +Thomas Zahner + +A huge thank you to all the wonderful contributors who helped make this project a success. diff --git a/src/main/resources/templates/welcome.html b/src/main/resources/templates/welcome.html index a0b8edf5f..08c137ec7 100644 --- a/src/main/resources/templates/welcome.html +++ b/src/main/resources/templates/welcome.html @@ -114,70 +114,70 @@
🚀 Ready to Start?
OWASP Project Leaders: Top Contributors: Contributors: Testers: Special mentions for helping out: diff --git a/static-site/pr-2125/pages/welcome.html b/static-site/pr-2125/pages/welcome.html index f1d3591b8..961c7abbc 100644 --- a/static-site/pr-2125/pages/welcome.html +++ b/static-site/pr-2125/pages/welcome.html @@ -626,70 +626,70 @@
🚀 Ready to Start?
OWASP Project Leaders: Top Contributors: Contributors: Testers: Special mentions for helping out: