Get started web scraping with Scrapy and Python
First up, install pip
Pip is a package manager for Python. There are a couple of strange gotchas. The first, covered by a highly active Stack Overflow question, is that the pip install script doesn't quite work on Mac. Instead, you need to run:
$ python -m ensurepip --upgrade --user
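With that, you should have the "pip3" command available. It isn't just "pip" because ensurepip only installs the versioned "pip3" and "pip3.x" scripts by default (a plain "pip" needs the --default-pip option), a convention that keeps Python 2 and Python 3 tools apart. But whatever, we'll just go with it.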
$ pip3 --version
pip 21.3.1 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)
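A small sketch that goes beyond the original setup: if you'd rather not care whether the command is called "pip" or "pip3", you can run pip as a module of a specific interpreter ("python -m pip"), which always targets that interpreter's packages. The helper name pip_command below is hypothetical, made up for illustration.

```python
import subprocess
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip invocation tied to the interpreter running this script."""
    # sys.executable is the path of the current Python, so "python -m pip"
    # uses that interpreter's pip no matter what the pip script is named on PATH.
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Roughly equivalent to "pip3 --version", but for this exact interpreter.
    result = subprocess.run(pip_command("--version"), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

The same idea works for the install step, e.g. pip_command("install", "Scrapy").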
Install Scrapy
Now that we have pip, we can use it to install Scrapy:
$ pip3 install Scrapy
Then check the install. Scrapy prints its version on the first line, followed by the available commands:
$ scrapy --version
Scrapy 2.6.1 - no active project
Usage:
  scrapy <command> [options] [args]
Available commands:
  bench         Run quick benchmark test
  commands
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
  [ more ]      More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
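If you want to confirm the install from Python itself (for instance, to make sure Scrapy landed in the same interpreter you'll run your code with), a minimal sketch using the standard library's importlib.metadata (Python 3.8+) could look like this. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    scrapy_version = installed_version("Scrapy")
    if scrapy_version is None:
        print("Scrapy is not installed for this interpreter; try: pip3 install Scrapy")
    else:
        print(f"Scrapy {scrapy_version} is installed")
```

If this reports that Scrapy is missing while the "scrapy" command works in your shell, the command and your Python are probably pointing at different interpreters.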
Create a project
Now that we've got Scrapy installed, we can follow the Scrapy tutorial's getting started instructions to make a new project.
This will create a project called “dataharvester”:
$ scrapy startproject dataharvester
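As a quick sanity check, the sketch below looks for the files that the default Scrapy 2.x project template generates (scrapy.cfg at the top, plus a package with items.py, middlewares.py, pipelines.py, settings.py and a spiders/ package). The file list reflects the default template as I understand it and could differ in other Scrapy versions; the function name missing_project_files is hypothetical.

```python
from pathlib import Path
from typing import List

# Files the default "scrapy startproject <name>" template creates (Scrapy 2.x).
EXPECTED_FILES = (
    "scrapy.cfg",
    "{name}/__init__.py",
    "{name}/items.py",
    "{name}/middlewares.py",
    "{name}/pipelines.py",
    "{name}/settings.py",
    "{name}/spiders/__init__.py",
)


def missing_project_files(project_dir: Path, name: str) -> List[str]:
    """Return the expected template files that are not present under project_dir."""
    expected = [template.format(name=name) for template in EXPECTED_FILES]
    return [rel for rel in expected if not (project_dir / rel).is_file()]


if __name__ == "__main__":
    # Run from the directory where you ran "scrapy startproject dataharvester".
    missing = missing_project_files(Path("dataharvester"), "dataharvester")
    print("Project looks complete" if not missing else f"Missing: {missing}")
```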