Python Phones and Logos Scrapper Test

This repository contains my solution to the Python challenge from the CIAL dun&bradstreet recruitment process.

Built with the Scrapy framework, the contactinfo spider extracts phone numbers using a regex pattern and logo image URLs by analyzing HTML attributes.
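The two extraction ideas can be pictured with a minimal sketch. It is not the spider's actual code: it uses only the standard library (re, html.parser) instead of Scrapy selectors so it runs on its own. The phone regex, the "logo" keyword heuristic and the names PHONE_RE, find_phones, LogoParser and find_logos are all hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical phone pattern: optional "+", optional "(", a digit, then at least
# seven digits/spaces/dots/dashes/parentheses, ending on a digit. It is deliberately
# loose and can also match things like dates; the real spider's regex may differ.
PHONE_RE = re.compile(r"\+?\(?\d[\d\s().-]{7,}\d")


def find_phones(text):
    """Return unique phone-like strings in order of appearance, whitespace collapsed."""
    phones = []
    for match in PHONE_RE.findall(text):
        phone = " ".join(match.split())
        if phone not in phones:
            phones.append(phone)
    return phones


class LogoParser(HTMLParser):
    """Collect <img> URLs whose attribute values mention "logo" (hypothetical heuristic)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.logos = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if not src:
            return
        # Search every attribute value (src, alt, class, id, ...) for the keyword.
        haystack = " ".join(value or "" for value in attributes.values()).lower()
        if "logo" in haystack:
            # Resolve relative paths against the page URL.
            url = urljoin(self.base_url, src)
            if url not in self.logos:
                self.logos.append(url)


def find_logos(html, base_url):
    """Return absolute logo image URLs found in an HTML string."""
    parser = LogoParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.logos
```

Inside a Scrapy callback, the same functions could be fed response.text and response.url.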

Technical info

Python version: 3.8.0
Main framework: Scrapy

Running

You can run the scraper either from a terminal or with Docker.

Terminal

After cloning the repository and installing the correct Python version, activate a Python virtual environment (venv) inside the /pythonscrappertest folder.

Then, from that same folder, run the following command to install all required dependencies:

pip install -r requirements.txt

Pip installs every library this project uses, at the specific versions listed in requirements.txt.
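If you want to confirm that your environment matches those versions, a small check with importlib.metadata (available from Python 3.8) is one option. This is a hypothetical helper, not part of the repository, and it assumes requirements.txt pins packages with exact name==version lines.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Map package name -> pinned version for 'name==version' lines; skip comments and blanks."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ from their pins."""
    result = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            result[name] = (pinned, installed)
    return result
```

Calling mismatches(parse_pins(f)) on an open requirements.txt should return an empty dict once pip has installed every pinned package.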

Finally, to run the spider, go into the Scrapy project folder and run the main file, run.py, feeding it a .txt file with the websites to be scraped. An example file, sites_example.txt, is already included:

Terminal:

cd contactscrapper/
cat ../sites_example.txt | python run.py
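Piping the file means run.py reads the site list from standard input. The sketch below shows one plausible way to do that. The names read_sites and crawl, the contactscrapper.spiders.contactinfo module path and the ContactInfoSpider class name are assumptions; CrawlerProcess, crawl() and start() are Scrapy's documented API for running spiders from a script.

```python
def read_sites(stream):
    """Return stripped, non-empty lines from a text stream (one URL per line)."""
    return [line.strip() for line in stream if line.strip()]


def crawl(urls):
    """Start a hypothetical contactinfo spider on the given URLs and block until done."""
    # Imported lazily so read_sites can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    # Hypothetical module path and class name; check the project's spiders folder.
    from contactscrapper.spiders.contactinfo import ContactInfoSpider

    process = CrawlerProcess()
    # Keyword arguments go to the spider's __init__; scrapy.Spider stores them as attributes.
    process.crawl(ContactInfoSpider, start_urls=urls)
    process.start()  # runs the Twisted reactor until the crawl finishes
```

A script entry point would then call crawl(read_sites(sys.stdin)), so that each line of sites_example.txt becomes one start URL.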

You can then follow the contactinfo spider's execution logs, along with the scraped phone numbers and logo image URLs.

Docker

After cloning the repository and installing Docker, go to /pythonscrappertest, the project's root folder, which contains the Dockerfile. From there, run the following command in your terminal:

Terminal:

sudo docker build -t pythonscrappertest .

Once the Docker image is built, run it, feeding it a .txt file with the websites to be scraped; the -i flag keeps the container's standard input open so the piped file reaches the scraper. An example file, sites_example.txt, is already included:

Terminal:

cat sites_example.txt | docker run -i pythonscrappertest

You can customize the sites file; just remember to put one URL per line.
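Before piping a customized file, you could check that each line holds exactly one URL. This hypothetical helper assumes the spider expects absolute http(s) URLs; if your sites file uses bare domains, the check would need loosening.

```python
from urllib.parse import urlparse


def check_sites(lines):
    """Split lines into (valid, invalid); valid means one http(s) URL with a host."""
    valid, invalid = [], []
    for raw in lines:
        url = raw.strip()
        if not url:
            continue  # blank lines are ignored
        parts = urlparse(url)
        # A space inside the line usually means two URLs were put on the same line.
        if parts.scheme in ("http", "https") and parts.netloc and " " not in url:
            valid.append(url)
        else:
            invalid.append(url)
    return valid, invalid
```

Any entries in the second list are worth fixing before running the spider.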

You can then follow the contactinfo spider's execution logs, along with the scraped phone numbers and logo image URLs.

Contact

You can contact me through:

joaopedroguimaraes / pythonscrappertest