ETL Vivanuncios.com.mx Pipeline
The main purpose of this project is to build a SQLite database of listings extracted from vivanuncios.com.mx, specifically for Querétaro. The first step extracts the data by web scraping with Selenium and Scrapy; the second step appends it to a SQLite database by running a pipeline.
Installation
To run this spider, the following libraries must be installed:
pip install scrapy
pip install scrapy_selenium
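Besides installation, scrapy-selenium has to be wired into the Scrapy settings. A minimal sketch of the relevant settings.py entries follows; the browser name, driver path, and arguments are assumptions for illustration, not values taken from this repository:

```python
# settings.py (excerpt) -- hypothetical scrapy-selenium configuration
SELENIUM_DRIVER_NAME = "chrome"                            # assumed browser
SELENIUM_DRIVER_EXECUTABLE_PATH = "/usr/bin/chromedriver"  # assumed driver path
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]                 # run without a window

# Enable the scrapy-selenium downloader middleware
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```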
Files
- homes.py: The main Python file, containing the spider that fetches the data from vivanuncios.com.mx
- pipeline.py: A simple pipeline that runs the ETL process on the data scraped by homes.py
- settings.py: Contains only the settings considered important or commonly used.
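To illustrate how the load step into SQLite can work, here is a minimal sketch of a Scrapy item pipeline of the kind pipeline.py describes. The table name and field names (title, price, location) are assumptions for illustration; the actual schema used by this project may differ:

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical item pipeline that appends scraped listings to SQLite."""

    def __init__(self, db_path="homes.db"):
        self.db_path = db_path

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts: open the database
        # and create the table on first run.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS homes (
                   title TEXT,
                   price TEXT,
                   location TEXT
               )"""
        )

    def close_spider(self, spider):
        # Called when the spider finishes: commit and release the connection.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Append one scraped listing as a new row.
        self.conn.execute(
            "INSERT INTO homes (title, price, location) VALUES (?, ?, ?)",
            (item.get("title"), item.get("price"), item.get("location")),
        )
        return item
```

A pipeline like this would be registered in settings.py via ITEM_PIPELINES so Scrapy calls it for every scraped item.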
To run the spider:
scrapy crawl homes