
Web Scraping With cURL: A Practical Guide

Marcus Reed
Founder & lead tester
the short version
  • cURL is a command-line tool for sending HTTP requests. It's the fastest way to inspect a page, test headers, and confirm what a server returns before you write a scraper.
  • The core flags: -A sets a user agent, -H adds headers, -L follows redirects, -o saves output, -d posts data.
  • Every command below was run with curl 8.19; the response codes and sizes are the real output.
  • cURL fetches raw HTML. It doesn't parse it or run JavaScript, so it pairs with a parser like BeautifulSoup for the extraction step.

cURL is the tool I open before writing a single line of scraper code. It tells me what a server actually returns: the status, the headers, whether my user agent gets blocked, and what the HTML looks like. This guide covers the commands that matter for scraping, each one run with curl 8.19 in June 2026 so the output is real.

What is cURL and why use it for scraping?

cURL is a command-line tool that sends HTTP requests and prints the response. For scraping it does two jobs: it fetches raw HTML you can pipe into a parser, and it’s the fastest way to debug why a scraper is failing. When a Python request returns a block page, I reproduce it in one cURL line and change headers until it works, then port that back to code.

cURL handles the transport only. It downloads bytes; it doesn’t parse HTML or run JavaScript. So in a real workflow it fetches and a library like BeautifulSoup parses.
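To make that split concrete, here is a minimal sketch of the parsing half. It uses Python's built-in html.parser as a stand-in for BeautifulSoup so it runs without extra installs. The span class "text" and the sample markup are assumptions for illustration, and QuoteTextParser and extract_quotes are hypothetical names.

```python
from html.parser import HTMLParser


class QuoteTextParser(HTMLParser):
    """Collect the text of every <span class="text"> element (no nested spans assumed)."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "text" in classes:
            self._inside = True
            self.quotes.append("")

    def handle_data(self, data):
        if self._inside:
            self.quotes[-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False


def extract_quotes(html):
    parser = QuoteTextParser()
    parser.feed(html)
    return [quote.strip() for quote in parser.quotes]


# Stand-in for the HTML that cURL would have saved to disk (assumed markup)
sample = (
    '<div class="quote"><span class="text">First quote</span>'
    '<small class="author">A. Author</small></div>'
)
print(extract_quotes(sample))
```

In practice the input would be the file cURL wrote, read with open(...).read(), rather than an inline string.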

How do you fetch a page with cURL?

Run curl followed by the URL to print the page. Add flags to control the request and capture useful information:

curl -s -A "Mozilla/5.0" "https://quotes.toscrape.com/page/1/" \
  -o page1.html -w "code=%{http_code} size=%{size_download} time=%{time_total}s\n"

That fetched the page and reported the result without dumping the HTML to the terminal:

code=200 size=11064 time=0.623561s
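If that summary line feeds a script, it splits cleanly into typed fields. A minimal Python sketch, assuming the exact -w format string used in the command above (parse_write_out is a hypothetical helper name):

```python
def parse_write_out(line):
    """Turn 'code=200 size=11064 time=0.623561s' into typed fields."""
    # Split on whitespace, then on the first "=" of each pair
    fields = dict(pair.split("=", 1) for pair in line.split())
    return {
        "code": int(fields["code"]),
        "size": int(fields["size"]),
        "time": float(fields["time"].rstrip("s")),  # drop the trailing unit
    }


result = parse_write_out("code=200 size=11064 time=0.623561s")
print(result)
```

A different -w format string would need a different parser, so treat this as tied to that one command.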

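The flags doing the work:

  • -s silences the progress meter and error messages.
  • -A "Mozilla/5.0" sets the User-Agent header.
  • -o page1.html writes the body to a file instead of the terminal.
  • -w prints the format string after the transfer, filling in %{http_code}, %{size_download}, and %{time_total}.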

How do you inspect response headers?

Use -I to send a HEAD request and fetch only the headers, which is the quickest way to read status, content type, and caching headers without downloading the body:

curl -s -I "https://quotes.toscrape.com/"
HTTP/1.1 200 OK
Date: Fri, 12 Jun 2026 11:48:09 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 11064
Connection: keep-alive

This tells me the page is HTML, returns 200, and is about 11 KB before I commit to parsing it. Use -i instead of -I to get headers and body together.
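The same check can be scripted. Below is a minimal sketch that parses one block of -I output into a status code and a header dict. It assumes a single response block (no redirect chain from -L), and parse_head_response is a hypothetical helper name.

```python
def parse_head_response(raw):
    """Parse one block of `curl -I` output into (status_code, headers)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    # Status line looks like "HTTP/1.1 200 OK"; the reason phrase may be absent
    _version, code, *_reason = lines[0].split(" ", 2)
    headers = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), headers


raw = """HTTP/1.1 200 OK
Date: Fri, 12 Jun 2026 11:48:09 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 11064
Connection: keep-alive
"""
status, headers = parse_head_response(raw)
print(status, headers["content-type"], headers["content-length"])
```

Lowercasing the names keeps lookups stable, since servers vary the capitalization of header names.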

The cURL flags that matter for scraping

These are the flags I use constantly, with the request behavior each one controls:

Flag          Does                Scraping use
-A            Sets user agent     Avoid default-cURL blocks
-H            Adds a header       Set Accept, Referer, cookies
-L            Follows redirects   Land on the final page
-o / -O       Saves to file       Keep the HTML for parsing
-s            Silent mode         Clean, scriptable output
-I            Headers only        Quick status and type check
-d            Sends POST body     Hit search and form endpoints
--compressed  Requests gzip       Match what browsers send

Stacking a few of these reproduces a realistic browser request: curl -sL --compressed -A "Mozilla/5.0" -H "Accept-Language: en-US" URL.
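When I want that stacked command inside a script, one option is to build the argument list in Python and hand it to subprocess. This is a sketch rather than a recommended wrapper: build_curl_args and fetch are hypothetical names, and fetch assumes curl is on the PATH and the network is reachable.

```python
import shlex
import subprocess


def build_curl_args(url, user_agent="Mozilla/5.0", headers=None):
    """Assemble the stacked flags as an argument list (no shell quoting needed)."""
    args = ["curl", "-sL", "--compressed", "-A", user_agent]
    for name, value in (headers or {}).items():
        args += ["-H", f"{name}: {value}"]
    args.append(url)
    return args


def fetch(url, **kwargs):
    """Run curl and return the body as text; raises if curl exits non-zero."""
    done = subprocess.run(
        build_curl_args(url, **kwargs), capture_output=True, text=True, check=True
    )
    return done.stdout


args = build_curl_args(
    "https://quotes.toscrape.com/", headers={"Accept-Language": "en-US"}
)
print(shlex.join(args))  # shell-safe rendering of the command for logging
```

Passing a list instead of a single string avoids quoting problems when a header value contains spaces.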

How do you POST data with cURL?

Use -d to send a request body, which is how you hit search endpoints and APIs that expect form or JSON input. Posting JSON to a test endpoint:

curl -s -X POST "https://httpbin.org/post" \
  -H "Content-Type: application/json" \
  -d '{"q":"scraping"}'

The endpoint echoed the payload back, confirming the POST went through:

  "data": "{\"q\":\"scraping\"}",
    "q": "scraping"

-d implies a POST, so -X POST is optional here, but I keep it for clarity. For form-encoded data, drop the JSON header and pass -d "q=scraping&page=1".
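To see the difference between the two body styles without sending anything, the sketch below builds both requests with Python's standard library. It uses urllib rather than Requests purely to stay dependency-free, and nothing goes over the network unless the request is passed to urlopen.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://httpbin.org/post"

# JSON body, matching: -H "Content-Type: application/json" -d '{"q":"scraping"}'
json_body = json.dumps({"q": "scraping"}, separators=(",", ":"))
json_req = Request(
    URL, data=json_body.encode(), headers={"Content-Type": "application/json"}
)

# Form body, matching: -d "q=scraping&page=1"
form_body = urlencode({"q": "scraping", "page": 1})
form_req = Request(
    URL,
    data=form_body.encode(),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# Supplying data switches the method to POST, much like -d does in cURL
print(json_req.get_method(), json_body)
print(form_req.get_method(), form_body)
```

The form content type is set explicitly here to mirror what cURL sends by default with -d.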

From cURL to a real scraper

cURL gets you a verified request. The next step is moving it into code so you can parse and loop. Each flag maps cleanly to Python Requests:

import requests

# the equivalent of: curl -A "Mozilla/5.0" -H "Accept-Language: en-US" URL
resp = requests.get(
    "https://quotes.toscrape.com/page/1/",
    headers={"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"},
)
print(resp.status_code, len(resp.text))

Once that works, hand resp.text to a parser and you have a scraper. When a request that works in cURL starts failing at scale, the cause is usually blocking rather than your command, and that’s the point where a scraper API like ChocoData takes over the fetch. The full tradeoff is in the web scraping guide.

FAQ

Can you scrape a website with cURL?

Yes, cURL fetches the raw HTML of a page, which is the first half of scraping. You then pass that HTML to a parser to extract data. cURL is also the best tool for debugging a scraper: it shows you exactly what the server returns for a given set of headers.

How do you set a user agent in cURL?

Use the -A flag followed by the user-agent string, for example curl -A "Mozilla/5.0" https://example.com. Many sites return different responses or block requests that use cURL's default user agent, so setting a realistic one is often the first fix when a request fails.

How do you convert a cURL command to Python?

Map each flag to a Requests argument: -A and -H become the headers dict, -d becomes data (form bodies) or json (JSON bodies), and -L maps to allow_redirects, which Requests turns on by default for GET. For example, curl -A "UA" -H "Accept: application/json" URL becomes requests.get(URL, headers={"User-Agent": "UA", "Accept": "application/json"}).
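That mapping is mechanical enough to sketch as code. The helper below is a hypothetical illustration, not a full converter: it handles only -A, -H, -d, -X, -L, and -s written as separate flags (so -sL would be rejected), and it returns keyword arguments shaped for requests.request without importing Requests itself.

```python
import shlex


def curl_to_requests_kwargs(command):
    """Map a small subset of curl flags to keyword arguments for requests.request."""
    tokens = shlex.split(command)
    if tokens and tokens[0] == "curl":
        tokens = tokens[1:]

    # curl does not follow redirects unless -L is given
    kwargs = {"headers": {}, "allow_redirects": False}
    explicit_method = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-A":
            kwargs["headers"]["User-Agent"] = tokens[i + 1]
            i += 2
        elif tok == "-H":
            name, _, value = tokens[i + 1].partition(":")
            kwargs["headers"][name.strip()] = value.strip()
            i += 2
        elif tok == "-d":
            kwargs["data"] = tokens[i + 1]
            i += 2
        elif tok == "-X":
            explicit_method = tokens[i + 1]
            i += 2
        elif tok == "-L":
            kwargs["allow_redirects"] = True
            i += 1
        elif tok == "-s":
            i += 1  # output-only flag, nothing to map
        elif not tok.startswith("-"):
            kwargs["url"] = tok
            i += 1
        else:
            raise ValueError(f"unsupported flag: {tok}")

    # -d implies POST unless -X says otherwise
    kwargs["method"] = explicit_method or ("POST" if "data" in kwargs else "GET")
    return kwargs


kw = curl_to_requests_kwargs('curl -A "UA" -H "Accept: application/json" https://example.com')
print(kw)
```

With Requests installed, the result can be passed along as requests.request(**kw).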

Marcus Reed
I've built and run web scrapers for the better part of a decade. On this site I put scraper APIs and scraping tools through real jobs against real targets, then write up what actually holds up.