Web Scraping With Scrapy: A Complete Beginner Tutorial

by Marcus Reed · Updated June 12, 2026 · code tested June 2026 · 9 min read

Marcus Reed, founder & lead tester

The short version

Scrapy is a Python framework for scraping at scale. It handles requests, concurrency, retries, and data export so you write only the extraction logic.
A spider defines where to start, how to parse a page, and how to follow links. The whole thing fits in about 15 lines.
I ran the spider below against quotes.toscrape.com and it scraped all 100 quotes across 10 pages into JSON.
Use Scrapy when you're crawling many pages. For a quick one-page pull, BeautifulSoup is less setup.

Scrapy is what you graduate to when a one-file script stops being enough. It runs requests concurrently, retries failures, respects delays, and exports your data, all from a small spider class you write. This tutorial builds a working spider from scratch and runs it against quotes.toscrape.com, confirmed in June 2026.

What is Scrapy?

Scrapy is a Python framework for crawling and scraping websites at scale. Where BeautifulSoup parses one page you fetched, Scrapy manages the whole operation: it queues requests, runs many in parallel, follows links, retries on errors, and writes the results to a file. You supply the extraction logic and the crawl rules; the framework runs the machine around them.

That structure is worth the setup cost when you’re crawling many pages and overkill when you’re grabbing one. Here’s how the two compare:

Feature	BeautifulSoup	Scrapy
Type	Parsing library	Full framework
Best for	One page, quick scripts	Crawling many pages
Concurrency	You add it	Built in
Retries and delays	You add them	Built in
Data export	You write it	Built in (`-o`)
Setup	Minimal	A spider class
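The "Built in" rows are driven by ordinary settings rather than code you write. As a rough sketch, a spider could set them through custom_settings like this. The setting names are Scrapy's own, but the dictionary name CRAWL_SETTINGS and the values are illustrative assumptions, not figures from the crawl in this tutorial.

```python
# Illustrative values only; tune them for the site you are crawling.
CRAWL_SETTINGS = {
    "CONCURRENT_REQUESTS": 8,  # upper bound on requests in flight at once
    "RETRY_ENABLED": True,     # retry failed requests automatically
    "RETRY_TIMES": 2,          # extra attempts after the first failure
    "DOWNLOAD_DELAY": 1,       # base pause, in seconds, between requests to a site
}

# In a spider you would assign the dict as a class attribute:
#     custom_settings = CRAWL_SETTINGS
print(sorted(CRAWL_SETTINGS))
```

With BeautifulSoup you would have to write the thread pool, the retry loop, and the sleep calls yourself to get the same behavior.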

How do you install Scrapy?

Install Scrapy with pip, ideally inside a virtual environment because it pulls in several dependencies:

pip install scrapy

Confirm it installed by checking the version:

scrapy version

Scrapy works on Windows, macOS, and Linux. On every platform it depends on the Twisted networking library, which pip installs automatically on current versions.
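If you would rather confirm the install from Python, for example at the top of a setup script, the standard library can report the installed version. This is a minimal sketch; the helper name installed_version is my own and not part of Scrapy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


scrapy_version = installed_version("scrapy")
if scrapy_version is None:
    print("Scrapy is not installed in this environment")
else:
    print(f"Scrapy {scrapy_version} is installed")
```

Running it inside the virtual environment you installed into tells you whether that environment, and not some other Python on your machine, has Scrapy available.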

How do you write a Scrapy spider?

A spider is a Python class that defines a start URL, a parse method to extract data, and rules for following links. Save this as quotes_spider.py:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]
    custom_settings = {"USER_AGENT": "my-project/1.0", "DOWNLOAD_DELAY": 1}

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

What each piece does:

name identifies the spider when you run it.
start_urls is where the crawl begins.
parse runs on every downloaded page. Yielding a dict emits a scraped item.
response.css("span.text::text").get() reads text; ::attr(href) reads an attribute; .getall() returns a list (used for the multiple tags).
response.follow(next_page, ...) queues the next page through the same parse method, which is how pagination works in Scrapy.
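custom_settings applies to this spider only; USER_AGENT tells the site who is crawling.
DOWNLOAD_DELAY: 1 waits about a second between requests to the same site (Scrapy randomizes the exact pause by default), which keeps the crawl polite.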

How do you run the spider and export data?

Run a single-file spider with scrapy runspider and use -o to export the scraped items. The file format follows the extension:

scrapy runspider quotes_spider.py -o quotes.json

When I ran it, Scrapy crawled all ten pages and wrote 100 items. Reading the output back confirmed it:

scraped items: 100
first author: Albert Einstein
first tags: ['change', 'deep-thoughts', 'thinking', 'world']
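A read-back check like that can be a few lines of standard-library Python. The sketch below is an assumption about how such a check might look, not the exact script used for the run above; the function name summarize is hypothetical, and the demo uses a tiny stand-in file so it runs without a crawl.

```python
import json
import tempfile
from pathlib import Path


def summarize(path):
    """Load a Scrapy JSON export and return the lines of a short report."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    lines = [f"scraped items: {len(items)}"]
    if items:
        lines.append(f"first author: {items[0]['author']}")
        lines.append(f"first tags: {items[0]['tags']}")
    return lines


# Stand-in data; after a real run you would call summarize("quotes.json").
sample = [{"text": "A sample quote.", "author": "Sample Author", "tags": ["one", "two"]}]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "quotes.json"
    demo_path.write_text(json.dumps(sample), encoding="utf-8")
    report = summarize(demo_path)

print("\n".join(report))
```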

Swap the extension to export differently: -o quotes.csv for CSV, or -o quotes.jsonl for line-delimited JSON that’s better for large crawls. With -o, Scrapy appends to an existing file, which leaves a .json export invalid after a second run, so delete the file first or use -O (capital) to overwrite between runs.
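The reason line-delimited JSON suits large crawls is that each line is a complete item, so you can process the file one record at a time. Here is a small sketch of that pattern; iter_jsonl is a hypothetical helper, and the rows are stand-in data rather than real crawl output.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def iter_jsonl(path):
    """Yield one item per non-empty line, so the whole file never sits in memory."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Stand-in rows; after a real run you would point this at quotes.jsonl.
rows = [
    {"author": "Author A", "tags": ["one"]},
    {"author": "Author B", "tags": []},
    {"author": "Author A", "tags": ["two"]},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "quotes.jsonl"
    demo_path.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    quotes_per_author = Counter(item["author"] for item in iter_jsonl(demo_path))

print(quotes_per_author.most_common(1))  # [('Author A', 2)]
```

Appending is also safe with this format, since adding more lines never breaks the lines already written.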

Scrapy selectors: css and xpath

Scrapy’s response object supports both CSS and XPath, the same split covered in the selectors guide. The only Scrapy-specific additions are the ::text and ::attr() pseudo-elements:

Goal	Scrapy CSS	Scrapy XPath
Element text	`.css("h1::text").get()`	`.xpath("//h1/text()").get()`
Attribute	`.css("a::attr(href)").get()`	`.xpath("//a/@href").get()`
All matches	`.css("a.tag::text").getall()`	`.xpath("//a[@class='tag']/text()").getall()`

.get() returns the first match, or None when nothing matches, and .getall() returns a list that is simply empty when nothing matches. Neither raises on a missing field, which keeps your parse code from crashing.

When to use Scrapy

Use Scrapy when the job is a crawl: many pages, many links to follow, and a need for speed, retries, and clean exports. For a single page or a quick experiment, BeautifulSoup is less ceremony. With either tool, getting blocked by a site is a separate problem from parsing. I cover it in the web scraping guide, where a scraper API like ChocoData handles the fetching so your spider keeps running.

FAQ

Is Scrapy better than BeautifulSoup?

They solve different problems. BeautifulSoup is a parsing library for small scripts. Scrapy is a full framework with crawling, concurrency, retries, and export pipelines built in. For a single page, BeautifulSoup is faster to write. For crawling thousands of pages, Scrapy's structure and speed win.

How do I run a Scrapy spider?

For a single-file spider, use scrapy runspider spider.py -o output.json. Inside a full Scrapy project, use scrapy crawl spidername. The -o flag exports scraped items to a file, with the format inferred from the extension (for example .json, .csv, or .jsonl).

Does Scrapy handle JavaScript?

Not by itself. Scrapy fetches HTML over HTTP and won't run page JavaScript. For JS-rendered sites, add scrapy-playwright to render pages with a real browser, or use a scraper API that returns rendered HTML to your spider.
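For the scrapy-playwright route, the wiring is mostly configuration. The sketch below shows the settings that plugin documents, held in plain dictionaries so it runs on its own; the names PLAYWRIGHT_SETTINGS and RENDERED_REQUEST_META are mine, and you should check the plugin's README for the version you install, since it also needs its own pip install and a browser download.

```python
# Route HTTP and HTTPS downloads through the scrapy-playwright handler.
PLAYWRIGHT_SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # The plugin runs on the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
}

# Per-request opt-in: only requests carrying this meta key are rendered in a browser.
RENDERED_REQUEST_META = {"playwright": True}

# In a spider (sketch, not run here):
#     custom_settings = PLAYWRIGHT_SETTINGS
#     yield scrapy.Request(url, meta=RENDERED_REQUEST_META, callback=self.parse)
print(PLAYWRIGHT_SETTINGS["TWISTED_REACTOR"])
```

Because rendering is opt-in per request, you can keep plain HTTP fetching for static pages and pay the browser cost only where the page needs it.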

Marcus Reed

I've built and run web scrapers for the better part of a decade. On this site I put scraper APIs and scraping tools through real jobs against real targets, then write up what actually holds up.
