Create a solution that crawls articles from the news website BBC.com, cleanses the responses, and stores the results in a MongoDB database.
- A Scrapy-based crawler that traverses page links recursively, uses CSS selectors on the response to extract article details and text, and stores the results in an external MongoDB server
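The cleansing step is not spelled out above; as a minimal sketch, a helper like the hypothetical `clean_article` below could normalize the title and paragraph fragments returned by the CSS selectors before they are stored (both parameter names are illustrative, not taken from the actual spider):

```python
def clean_article(title, paragraphs):
    """Normalize scraped fragments into a flat document ready for MongoDB.

    `title` is the raw headline string (may be None); `paragraphs` is a
    list of raw text fragments extracted with CSS selectors.
    """
    # Drop empty fragments, trim whitespace, and join into one body string.
    body = " ".join(p.strip() for p in paragraphs if p and p.strip())
    return {
        "title": (title or "").strip(),
        "text": body,
    }


# Example: messy fragments become one clean document.
doc = clean_article(" Headline \n", ["  First para. ", "", "Second para."])
# → {"title": "Headline", "text": "First para. Second para."}
```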
- Download (or clone) the repository to your computer and, if downloaded as an archive, unzip it.
- Ensure Python 3.x is installed:
python -V
- Install required libraries:
pip install -r REQUIREMENTS.txt
Use the package manager pip to install pymongo (skip this step if it is already listed in REQUIREMENTS.txt):
pip install pymongo
Make sure you have a stable internet connection, since crawling is network-bound.
- Open a terminal.
- Navigate (change directory) to the BBC_News_article_web_scraper/NewsApp/ folder.
- Run the command:
scrapy crawl bbc