The scraping API that doesn't rot

haul is a self-hosted scraping and extraction API. One endpoint, four escalating tiers — from a plain fetch to a managed unblocker — so you get the page whether it's static HTML or behind Cloudflare. No per-page credits, no silent caps.

$ curl https://api.haul.sh/v1/scrape \
   -H "Authorization: Bearer $HAUL_KEY" \
   -H "Content-Type: application/json" \
   -d '{"url":"https://stripe.com/pricing","formats":["markdown"]}'

{
  "status": 200,
  "tier_used": "fetch",
  "cache_hit": false,
  "markdown": "# Pricing\n\nStart building today..."
}
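
For callers who prefer Python to curl, the same request might look roughly like the sketch below. The endpoint, bearer token, and body fields are copied from the example above; the `Content-Type` header, the function names, and the timeout are assumptions, not documented haul behaviour.

```python
import json
import os
import urllib.request

HAUL_URL = "https://api.haul.sh/v1/scrape"  # endpoint from the curl example


def build_scrape_request(url, formats, api_key):
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        HAUL_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def scrape(url, formats=("markdown",), api_key=None, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    key = api_key or os.environ["HAUL_KEY"]
    request = build_scrape_request(url, list(formats), key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `scrape("https://stripe.com/pricing")` with `HAUL_KEY` set should return a dict with the fields shown in the response above, such as `tier_used` and `markdown`.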

One endpoint. Four tiers.

You never choose a tier. haul starts with the cheapest fetch and escalates only when a site fights back — so easy pages stay fast and hard pages still come back.

Tier 1 · Plain fetch

A fast HTTP GET with a real browser fingerprint. Handles the majority of static and server-rendered marketing, docs, and pricing pages — for free.

Tier 2 · Headless browser

Renders JS-heavy pages and SPAs in real Chromium when a plain fetch comes back as an empty shell.

Tier 3 · Proxy pool

Routes through rotating residential proxies when a site blocks datacenter IPs — without you changing a single line of code.

Tier 4 · Managed unblocker

Hands the hostile few to a managed unblocker as a last resort, and only after the cheaper tiers have hit real block signals.
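
The four tiers above amount to a cheapest-first ladder. Here is a minimal sketch of that idea, not haul's actual gateway code: the tier names follow the page, while the `Result` type, the `looks_blocked` heuristic, and the `fetchers` mapping are hypothetical stand-ins.

```python
from dataclasses import dataclass

# Tier names as used on this page, cheapest first.
TIERS = ["fetch", "browser", "proxy", "unblocker"]


@dataclass
class Result:
    status: int
    body: str


def looks_blocked(result):
    """Hypothetical block heuristic: a blocking status code or an empty body."""
    if result.status in (403, 429, 503):
        return True
    return len(result.body.strip()) == 0


def scrape_with_escalation(url, fetchers):
    """Try each tier in order; return (tier_used, result) for the first success.

    `fetchers` maps a tier name to a callable taking a URL and returning a
    Result. If every tier looks blocked, the last attempt is returned.
    """
    last = None
    for tier in TIERS:
        result = fetchers[tier](url)
        if not looks_blocked(result):
            return tier, result
        last = (tier, result)
    return last
```

Because the loop returns on the first tier that is not blocked, an easy page never touches the browser, proxy, or unblocker tiers.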

Automatic tier escalation

Cheapest-first. Fetch → browser → proxy → unblocker, escalating only on real bot-block signals.

Structured extraction

Turn any page into JSON that matches your schema — LLM extraction with validation and a corrective retry.
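
One way to picture "validation and a corrective retry" is the sketch below. It assumes a simplified schema (field name mapped to a Python type) and a hypothetical `llm` callable that takes a prompt and returns text; haul's real schema format and model calls are not described on this page.

```python
import json


def validate(data, schema):
    """Return a list of problems; `schema` maps field name -> expected type."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = []
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors


def extract(page_text, schema, llm, max_retries=1):
    """Ask the model for JSON, validate it, and retry with the errors fed back."""
    prompt = f"Extract {sorted(schema)} as JSON from:\n{page_text}"
    for _ in range(max_retries + 1):
        raw = llm(prompt)
        try:
            data = json.loads(raw)
            errors = validate(data, schema)
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc}"]
        if not errors:
            return data
        # Corrective retry: tell the model exactly what was wrong.
        prompt += f"\n\nYour last answer was rejected: {errors}. Return corrected JSON only."
    raise ValueError(f"extraction failed validation: {errors}")
```

The retry prompt carries the validation errors, so the second attempt is a correction rather than a blind re-roll.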

Shared cache

A cross-key Redis cache means a URL is scraped once and reused everywhere. Fast, and cheap.
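
The important property of a cross-key cache is that the cache key depends on the URL, never on the API key that asked for it. The sketch below uses an in-memory dict as a stand-in for Redis; the normalization rules, the `scrape:` prefix, and the TTL are illustrative assumptions.

```python
import hashlib
import time
from urllib.parse import urlsplit, urlunsplit


def cache_key(url):
    """Derive the key from the normalized URL only, so all API keys share it."""
    parts = urlsplit(url)
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )
    return "scrape:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedCache:
    """In-memory stand-in for a Redis cache with a time-to-live per entry."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._store = {}
        self._ttl = ttl_seconds
        self._clock = clock

    def get_or_scrape(self, url, scrape):
        """Return (value, cache_hit); call `scrape` only on a miss or expiry."""
        key = cache_key(url)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], True
        value = scrape(url)
        self._store[key] = (now, value)
        return value, False
```

The boolean in the return value plays the same role as the `cache_hit` field in the response example at the top of the page.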

Polite by default

Per-host concurrency, a min-delay between hits, and robots.txt respected out of the box.
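
As a rough illustration of two of these defaults, the sketch below tracks a per-host minimum delay and checks robots.txt with Python's standard `urllib.robotparser`. The class name, user agent string, and default delay are made up for the example, and per-host concurrency limits are left out for brevity.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PolitenessGate:
    """Tracks robots.txt rules and the time of the last hit for each host."""

    def __init__(self, min_delay=1.0, user_agent="haul-sketch", clock=time.monotonic):
        self.min_delay = min_delay
        self.user_agent = user_agent  # hypothetical user agent string
        self._clock = clock
        self._last_hit = {}
        self._robots = {}

    def load_robots(self, host, robots_txt):
        """Parse an already-downloaded robots.txt body for `host`."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._robots[host] = parser

    def allowed(self, url):
        """True if no rules are loaded for the host or the rules permit the URL."""
        parser = self._robots.get(urlsplit(url).netloc)
        return parser is None or parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        """Seconds to wait before the next request to this URL's host."""
        last = self._last_hit.get(urlsplit(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self._clock() - last))

    def record_hit(self, url):
        """Remember when this host was last requested."""
        self._last_hit[urlsplit(url).netloc] = self._clock()
```

A caller would check `allowed(url)`, sleep for `wait_time(url)`, make the request, then call `record_hit(url)`.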

Batch & crawl

Async jobs: scrape a list of URLs or BFS-crawl a whole site, then poll one job id.
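
For readers unfamiliar with the term, a BFS crawl visits pages level by level from the start URL. A minimal sketch, assuming a hypothetical `get_links` callable that returns the links found on a page and an illustrative `max_pages` limit:

```python
from collections import deque


def bfs_crawl(start_url, get_links, max_pages=100):
    """Visit pages breadth-first from `start_url`; return URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            # Queue each link once, the first time it is discovered.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

The `seen` set is what keeps the crawl from looping on pages that link back to each other.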

Change detection

Watch a URL and get a webhook only when the content actually changes — not on every poll.
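
One common way to fire only on real changes is to compare content fingerprints between polls. The sketch below assumes whitespace-only differences should not count as a change and uses a hypothetical `notify` callback in place of a real webhook; how haul itself decides that content changed is not specified here.

```python
import hashlib


def fingerprint(content):
    """Hash the content after collapsing whitespace, so reflows do not count."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Watcher:
    """Remembers the last fingerprint per URL and notifies only on a change."""

    def __init__(self, notify):
        self._notify = notify  # stand-in for sending a webhook
        self._last = {}

    def poll(self, url, content):
        """Record the latest content; return True if a notification was sent."""
        digest = fingerprint(content)
        previous = self._last.get(url)
        self._last[url] = digest
        if previous is not None and previous != digest:
            self._notify(url, digest)
            return True
        return False
```

The first poll only records a baseline, so a newly watched URL does not trigger a notification by itself.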

Observability

Prometheus metrics and per-key usage counters. No more pipelines that rot in silence.

Self-hosted & uncapped

Runs on your own Railway project or Docker host. No per-page credits, no vendor cap to slam into.

Stop watching your data pipeline rot.

haul is in private beta. Request a key, point your scrapers at one endpoint, and get the freshness and visibility you never had on a metered API.

Self-hosted web scraping & extraction. One endpoint, four tiers, no caps.
Copyright © 2026 haul
All rights reserved
