Get started web scraping with Scrapy and Python

First up, install pip

Connor Leech
1 min read · Mar 24, 2022

Pip is the package manager for Python. There are a couple of strange gotchas. The first, covered in a highly active Stack Overflow question, is that the install script for pip on Mac doesn’t quite work. Instead, you need to run:

$ python -m ensurepip --upgrade --user

With that you should have the “pip3” command available to you. Why it isn’t just “pip” is a leftover of the Python 2 to Python 3 split: the “3” ties the command to Python 3, while plain “pip” can point at an older Python 2 install on some systems. We’ll just go with it.

$ pip3 --version
pip 21.3.1 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)

Install Scrapy

Now that we have pip, we can use it to install Scrapy:

$ pip3 install Scrapy

Then we can confirm the install by running the scrapy command:

$ scrapy --version
Scrapy 2.6.1 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
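
If you'd rather double check from Python itself that pip and Scrapy landed in the interpreter you expect, something like the sketch below should work. It assumes Python 3.8 or newer (for importlib.metadata), and the helper name installed_version is made up for this example, not part of pip or Scrapy.

```python
import importlib.metadata
import sys


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # sys.executable is the interpreter running this script; packages are
    # looked up in that interpreter's environment, not system-wide.
    print("interpreter:", sys.executable)
    for name in ("pip", "Scrapy"):
        print(f"{name}:", installed_version(name) or "not installed")
```

If Scrapy shows up as "not installed" here even though the scrapy command runs, pip3 and the python you ran this with are probably pointing at different Python installs.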

Create a project

Now that we’ve got Scrapy installed, we can follow Scrapy’s getting started instructions to make a new project.

This will create a project called “dataharvester”:

$ scrapy startproject dataharvester
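
To sanity check what the command generated, here is a rough sketch that looks for the files Scrapy's default project template usually creates. The file list is an assumption based on the default template and may differ between Scrapy versions, and missing_project_files is a hypothetical helper written for this post.

```python
from pathlib import Path

# Files the default Scrapy project template is assumed to generate.
# {name} is replaced with the project name, e.g. "dataharvester".
EXPECTED_FILES = [
    "scrapy.cfg",
    "{name}/__init__.py",
    "{name}/items.py",
    "{name}/middlewares.py",
    "{name}/pipelines.py",
    "{name}/settings.py",
    "{name}/spiders/__init__.py",
]


def missing_project_files(project_name, parent="."):
    """Return the expected files that are absent from the project directory."""
    root = Path(parent) / project_name
    expected = [root / entry.format(name=project_name) for entry in EXPECTED_FILES]
    return [str(path.relative_to(root)) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Run this from the directory where you ran "scrapy startproject".
    missing = missing_project_files("dataharvester")
    print("missing files:", missing or "none")
```

An empty result means the scaffold matches the assumed layout. The spiders folder is where the actual scraping code goes later.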
