A simple Scrapy-based parser that collects data about PEPs from the Python documentation.
The parser writes information about all PEPs to two CSV files:
- the first file contains the numbers of all PEPs, their names and statuses
- the second one shows the number of PEPs with a particular status, as well as the total number of PEPs
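As a rough illustration of what the two files contain, here is a minimal standard-library sketch, not the project's actual pipeline code. The column names (`number`, `name`, `status`, `Status`, `Count`), the `Total` row label, the function names and the sample records are all assumptions made for the example; the real parser scrapes its records and may lay the files out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample records (statuses are illustrative and may be out of date).
SAMPLE_PEPS = [
    {"number": 8, "name": "Style Guide for Python Code", "status": "Active"},
    {"number": 20, "name": "The Zen of Python", "status": "Active"},
    {"number": 257, "name": "Docstring Conventions", "status": "Active"},
    {"number": 484, "name": "Type Hints", "status": "Final"},
]


def pep_list_csv(peps):
    """Return CSV text for the first file: one row per PEP."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["number", "name", "status"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(peps)
    return buffer.getvalue()


def status_summary_rows(peps):
    """Return rows for the second file: a count per status, then the total."""
    counts = Counter(pep["status"] for pep in peps)
    rows = [("Status", "Count")]
    rows.extend(sorted(counts.items()))  # statuses in alphabetical order
    rows.append(("Total", sum(counts.values())))
    return rows


def status_summary_csv(peps):
    """Return CSV text for the second file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(status_summary_rows(peps))
    return buffer.getvalue()


if __name__ == "__main__":
    print(pep_list_csv(SAMPLE_PEPS))
    print(status_summary_csv(SAMPLE_PEPS))
```

Keeping the counting in a separate function from the CSV writing makes the summary easy to check without touching the file system.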
- Python 3.9.5
- Scrapy 2.5.1
Clone the repository and change into its directory from the command line:
git clone
cd scrapy_parser_pep
Create and activate a virtual environment:
Windows:
py -3 -m venv env
. env/Scripts/activate
py -m pip install --upgrade pip
macOS/Linux:
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
Install the dependencies from requirements.txt:
pip install -r requirements.txt
Launch:
To run the parser, use the following command; it generates both files at once:
scrapy crawl pep
MIT