# haul

> haul is a self-hosted web scraping and extraction API. One endpoint, four escalating tiers (from a plain HTTP fetch to a headless browser to residential proxies), plus structured LLM extraction, a shared cache, and built-in observability. It is uncapped and open core, and runs on your own Railway or Docker infrastructure.

You never choose a tier. haul starts with the cheapest fetch and escalates only when a site fights back, so easy pages stay fast and hard pages still come back.
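The escalation idea can be sketched in a few lines of Python. This is only an illustration, not haul's actual code: the tier names, the `FetchResult` type, the block-status list, and the 200-character "empty shell" threshold are all assumptions made for the example, and the tiers are stubs rather than real fetchers.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FetchResult:
    status: int
    html: str


# Hypothetical signature: a tier takes a URL and returns a FetchResult.
Tier = Callable[[str], FetchResult]

# Assumed heuristics, chosen only for this sketch.
BLOCK_STATUSES = {403, 429, 503}
MIN_BODY_CHARS = 200


def needs_escalation(result: FetchResult) -> bool:
    """Return True when the response looks blocked or like an unrendered shell."""
    blocked = result.status in BLOCK_STATUSES
    empty_shell = len(result.html.strip()) < MIN_BODY_CHARS
    return blocked or empty_shell


def scrape(url: str, tiers: List[Tuple[str, Tier]]) -> Tuple[str, FetchResult]:
    """Try tiers from cheapest to most expensive; stop at the first usable result."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, tier in tiers:
        result = tier(url)
        if not needs_escalation(result):
            return name, result
    # Every tier was tried; return whatever the last one produced.
    return name, result


# Stub tiers standing in for a plain fetch, a headless browser, and a proxy.
def stub_fetch(url: str) -> FetchResult:
    return FetchResult(200, "<div id='root'></div>")  # SPA shell, no content


def stub_browser(url: str) -> FetchResult:
    return FetchResult(200, "<main>" + "rendered text " * 30 + "</main>")


def stub_proxy(url: str) -> FetchResult:
    raise AssertionError("not reached when the browser tier succeeds")


LADDER = [("fetch", stub_fetch), ("browser", stub_browser), ("proxy", stub_proxy)]
tier_used, page = scrape("https://example.com/app", LADDER)
print(tier_used)
```

With these stubs the plain fetch returns an empty shell, so the loop moves on and prints `browser`; the proxy stub is never called. The caller passes only a URL, which is the point of the design: the decision to escalate lives in one place on the server side.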

## How it works

- **Fetch** — a fast HTTP GET with a real browser fingerprint. Handles the majority of static and server-rendered marketing, docs, and pricing pages, for free.
- **Browser** — renders JS-heavy pages and SPAs in real Chromium when a plain fetch comes back as an empty shell.
- **Proxy** — routes through rotating residential proxies when a site blocks datacenter IPs, without you changing a line of code.
- **Extract** — structured LLM extraction that turns a page into typed JSON against your schema.
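To make "typed JSON against your schema" concrete, here is a minimal sketch of checking a model's JSON output against a caller-supplied schema. It is not haul's API: the schema format (field name mapped to a Python type), the `PRICING_SCHEMA` example, and the `validate_extraction` helper are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical schema format: field name -> expected Python type.
PRICING_SCHEMA: Dict[str, type] = {
    "plan": str,
    "price_usd": float,
    "features": list,
}


def validate_extraction(raw_json: str, schema: Dict[str, type]) -> Dict[str, Any]:
    """Parse extractor output and keep only schema fields with the right types."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    typed: Dict[str, Any] = {}
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # JSON has one number type, so accept an integer where a float is expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {field!r} is not of type {expected.__name__}")
        typed[field] = value
    return typed


# Simulated model output; the extra key is dropped because it is not in the schema.
raw = '{"plan": "Pro", "price_usd": 29, "features": ["sso"], "note": "ignored"}'
record = validate_extraction(raw, PRICING_SCHEMA)
print(record)
```

Validating after extraction means a malformed or incomplete model response fails loudly with a `ValueError` instead of flowing downstream as loosely shaped data.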

## Why haul

- Self-hosted and uncapped: no per-page limits, no per-request billing. You run it on your own infrastructure.
- Open core: the engine is open source.
- Built-in shared cache and observability so repeat scrapes are cheap and every request is traceable.
- A drop-in replacement for hosted scraping APIs like Firecrawl.

## Links

- Homepage: https://haul.sh
- API base: https://api.haul.sh