-
Create a virtual environment using virtualenv.
virtualenv venv
-
Activate the virtual environment
source venv/bin/activate
-
Install the requirements
pip install -r requirements.txt
-
Copy login_credentials_sample.py to a new file named login_credentials.py and enter your login information. If you don't have the permissions to see everything, ask me for mine.
-
Run the scraper with
python scraper.py
-
You will see HTML files written to a directory called page_html after Firefox is opened and closed to scrape one page at a time. If you want to scrape smaller chunks, you can modify the sheet range used in url_list.py.
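To illustrate the idea of scraping in smaller chunks, here is a minimal sketch. It is not the code in scraper.py or url_list.py. The names select_chunk, save_pages and fetch_html, the (ID, URL) row shape, and the <id>.html file naming are all assumptions made for illustration. The browser step is passed in as a function so the sketch runs without Firefox.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Hypothetical row shape: (page ID, page URL), as it might be read from the sheet.
Row = Tuple[str, str]


def select_chunk(rows: List[Row], start: int, stop: int) -> List[Row]:
    """Return rows[start:stop], mimicking a narrower sheet range."""
    return rows[start:stop]


def save_pages(
    rows: Iterable[Row],
    fetch_html: Callable[[str], str],
    out_dir: str = "page_html",
) -> List[Path]:
    """Fetch each URL in turn and write its HTML to out_dir/<id>.html.

    fetch_html is a stand-in for the step that opens the page in a browser
    and returns the page body HTML. The file naming is an assumption.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page_id, url in rows:
        html = fetch_html(url)  # one page at a time
        path = out / f"{page_id}.html"
        path.write_text(html, encoding="utf-8")
        written.append(path)
    return written
```

For example, save_pages(select_chunk(rows, 0, 50), fetch_html) would process only the first 50 rows, which keeps a failed run cheap to repeat.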
23koivisto / drupal-migration-1
This project is forked from tsboom/drupal-migration.
Script to scrape page body HTML from a list of Drupal URLs and IDs from a Google Sheet.