The scraping API that doesn't rot

haul is a self-hosted scraping and extraction API. One endpoint, four escalating tiers — from a plain fetch to a managed unblocker — so you get the page whether it's static HTML or behind Cloudflare. No per-page credits, no silent caps.


$ curl https://api.haul.sh/v1/scrape \
   -H "Authorization: Bearer $HAUL_KEY" \
   -H "Content-Type: application/json" \
   -d '{"url":"https://stripe.com/pricing","formats":["markdown"]}'

{
  "status": 200,
  "tier_used": "fetch",
  "cache_hit": false,
  "markdown": "# Pricing\n\nStart building today..."
}
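
If you would rather call the endpoint from Python than from curl, the same request can be sketched with the standard library alone. The URL, bearer header, and body fields come from the curl example above; the `Content-Type` header, the 30-second timeout, and the function names are assumptions for illustration, not a documented client.

```python
import json
import urllib.request

API_URL = "https://api.haul.sh/v1/scrape"  # endpoint from the curl example


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def scrape(url: str, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_scrape_request(url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `scrape("https://stripe.com/pricing", key)` should return a dict shaped like the JSON response above, with keys such as `tier_used` and `markdown`.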

One endpoint. Four tiers.

You never choose a tier. haul starts with the cheapest fetch and escalates only when a site fights back — so easy pages stay fast and hard pages still come back.

Tier 1 · Plain fetch

A fast HTTP GET with a real browser fingerprint. Handles the majority of static and server-rendered marketing, docs, and pricing pages — for free.


Tier 2 · Headless browser

Renders JS-heavy pages and SPAs in real Chromium when a plain fetch comes back as an empty shell.


Tier 3 · Proxy pool

Routes through rotating residential proxies when a site blocks datacenter IPs — without you changing a single line of code.

Tier 4 · Managed unblocker

Hands the hostile few to a managed unblocker as a last resort. You never pick a tier — the gateway escalates automatically on real block signals.
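
To make the cheapest-first idea concrete, here is a minimal sketch of an escalation loop. It is not haul's gateway code: the `Page` type, the `looks_blocked` heuristic (a few status codes plus an empty body), and the tier callables are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Page:
    status: int
    body: str


def looks_blocked(page: Page) -> bool:
    """Hypothetical block signal: a block-like status code or an empty shell."""
    return page.status in (403, 429, 503) or not page.body.strip()


# A tier is a name plus a function that fetches a URL with that strategy.
Tier = Tuple[str, Callable[[str], Page]]


def scrape_with_escalation(url: str, tiers: Sequence[Tier]) -> Tuple[str, Page]:
    """Try tiers in order (cheapest first); stop at the first unblocked result.

    If every tier looks blocked, the last tier's result is returned as-is.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, fetch in tiers:
        page = fetch(url)
        if not looks_blocked(page):
            break
    return name, page
```

Passing the tiers in the order fetch, browser, proxy, unblocker reproduces the behaviour described above: an easy page returns from the first tier, and only a blocked result triggers the next one.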


Automatic tier escalation

Cheapest-first. Fetch → browser → proxy → unblocker, escalating only on real bot-block signals.

Structured extraction

Turn any page into JSON that matches your schema — LLM extraction with validation and a corrective retry.
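
One way to picture "validation and a corrective retry" is the sketch below. It assumes a deliberately simple schema (field name mapped to a Python type) and a hypothetical `extractor` callable standing in for the LLM call; haul's real schema format and prompts are not shown here.

```python
from typing import Callable, Dict, List, Optional

Schema = Dict[str, type]  # field name -> expected Python type (simplified)
# Hypothetical extractor: (page_text, schema, previous_errors) -> dict
Extractor = Callable[[str, Schema, Optional[List[str]]], dict]


def validate(data: dict, schema: Schema) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


def extract_with_retry(page_text: str, schema: Schema, extractor: Extractor) -> dict:
    """Extract once; on validation failure, retry once with the errors attached."""
    data = extractor(page_text, schema, None)
    errors = validate(data, schema)
    if errors:
        # Corrective retry: feed the validation errors back to the extractor.
        data = extractor(page_text, schema, errors)
        errors = validate(data, schema)
        if errors:
            raise ValueError("; ".join(errors))
    return data
```

The retry is capped at one attempt here so a page that never fits the schema fails loudly instead of looping.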

Shared cache

A cross-key Redis cache means a URL is scraped once and reused everywhere. Fast, and cheap.
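
The "cross-key" part is the interesting bit: the cache key depends on the URL, not on who asked. The sketch below uses an in-memory dict as a stand-in for Redis; the key format, the class name, and the absence of any expiry are assumptions made to keep the example short.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Tuple


class SharedCache:
    """URL-keyed cache shared by every API key (dict stands in for Redis)."""

    def __init__(self, scrape: Callable[[str], str]) -> None:
        self._scrape = scrape
        self._store: Dict[str, str] = {}
        self.usage: Counter = Counter()  # requests per API key

    @staticmethod
    def cache_key(url: str) -> str:
        # The API key is deliberately absent, so all callers share one entry.
        return "page:" + hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, api_key: str, url: str) -> Tuple[str, bool]:
        """Return (content, cache_hit); scrape only on a miss."""
        self.usage[api_key] += 1
        key = self.cache_key(url)
        if key in self._store:
            return self._store[key], True
        content = self._scrape(url)
        self._store[key] = content
        return content, False
```

Two different keys requesting the same URL cause a single scrape; the second caller gets a hit.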

Polite by default

Per-host concurrency, a min-delay between hits, and robots.txt respected out of the box.
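
Two of these politeness rules are easy to sketch with the standard library: a per-host minimum delay and a robots.txt check. The class and function names are hypothetical, the injectable clock exists only to make the example testable, and per-host concurrency limits are left out.

```python
import time
import urllib.robotparser
from typing import Callable, Dict
from urllib.parse import urlsplit


class PoliteGate:
    """Tracks, per host, how long the next request must wait."""

    def __init__(self, min_delay: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._min_delay = min_delay
        self._clock = clock
        self._next_allowed: Dict[str, float] = {}

    def wait_time(self, url: str) -> float:
        """Reserve the next slot for this URL's host and return seconds to wait."""
        host = urlsplit(url).netloc
        now = self._clock()
        slot = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = slot + self._min_delay
        return slot - now


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt content that was already downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A caller would sleep for `wait_time(url)` seconds before each request and skip any URL for which `allowed_by_robots` returns `False`.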

Batch & crawl

Async jobs: scrape a list of URLs or BFS-crawl a whole site, then poll one job id.
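
For readers unfamiliar with the term, a BFS crawl visits pages level by level from the start URL. The sketch below shows only that traversal: `get_links` is a hypothetical callable that returns the links found on a page, and the same-host rule and `max_pages` limit are assumptions, not documented haul behaviour.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urlsplit


def bfs_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Return same-host URLs in breadth-first order, visiting each at most once."""
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Stay on the starting host and never enqueue a URL twice.
            if link not in seen and urlsplit(link).netloc == host:
                seen.add(link)
                queue.append(link)
    return visited
```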

Change detection

Watch a URL and get a webhook only when the content actually changes — not on every poll.
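
A common way to notify "only when the content actually changes" is to compare a hash of each poll against the previous one. The sketch below assumes that approach; the `notify` callback stands in for the webhook call, and treating the first observation as a silent baseline is an illustrative choice.

```python
import hashlib
from typing import Callable, Dict


class ChangeWatcher:
    """Calls notify(url, digest) only when a URL's content hash changes."""

    def __init__(self, notify: Callable[[str, str], None]) -> None:
        self._notify = notify
        self._last_digest: Dict[str, str] = {}

    def observe(self, url: str, content: str) -> bool:
        """Record one poll result; return True if a change was reported."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._last_digest.get(url)
        self._last_digest[url] = digest
        # First sighting is a baseline; identical content is not a change.
        if previous is None or previous == digest:
            return False
        self._notify(url, digest)
        return True
```

Hashing the raw content means cosmetic churn (timestamps, rotating ad markup) would also count as a change, so a real watcher would likely normalize the page before hashing.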

Observability

Prometheus metrics and per-key usage counters. No more pipelines that rot in silence.

Self-hosted & uncapped

Runs on your own Railway project or Docker host. No per-page credits, no vendor cap to slam into.

Stop watching your data pipeline rot.

haul is in private beta. Request a key, point your scrapers at one endpoint, and get the freshness and visibility you never had on a metered API.

Request access

haul

Self-hosted web scraping & extraction. One endpoint, four tiers, no caps.
