```sh
git clone https://github.com/KayvanShah1/thomasnet-scraper.git
```
Enter the repository's root directory:
```sh
cd thomasnet-scraper
```
Create a virtual environment and install the dependencies:
```sh
python -m venv ENV
source ENV/bin/activate
pip install -r requirements.txt
```
- Link to the source
- Find the heading for the product (to be passed as an argument when running the script below)
- Search for the product of interest on the website and locate the `heading` parameter in the resulting URL:
  ```
  https://www.thomasnet.com/nsearch.html?cov=NA&heading=21650809&searchsource=suppliers&searchterm=Hydraulic+Cylinders&searchx=true&what=Hydraulic+Cylinders&which=prod
  ```
- Here the heading is `21650809`
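The `heading` value can also be pulled out of the search URL programmatically; a minimal sketch using Python's standard `urllib.parse` (the URL is the example from above):

```python
from urllib.parse import urlparse, parse_qs

def extract_heading(url: str) -> str:
    """Return the value of the `heading` query parameter from a search URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list of values per key; take the first one
    return query["heading"][0]

url = ("https://www.thomasnet.com/nsearch.html?cov=NA&heading=21650809"
       "&searchsource=suppliers&searchterm=Hydraulic+Cylinders"
       "&searchx=true&what=Hydraulic+Cylinders&which=prod")
print(extract_heading(url))  # → 21650809
```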
```
/thomasnet-scraper> py src/main.py -h
usage: Thomasnet Data Scraper [-h] -k KEYWORD -hd HEADING [-f]

Scrape Suppliers Data from Thomas website

optional arguments:
  -h, --help            show this help message and exit
  -k KEYWORD, --keyword KEYWORD
                        Product Name to search
  -hd HEADING, --heading HEADING
                        Heading for the product from website
  -f, --fast            Fast Scraping
```
```sh
py src/main.py -k "hydraulic cylinder" -hd 21650809 -f
```
- Find the exported data as CSV files in the `data` folder
- The file of interest is `{abc}_clean_data.csv`
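The exported CSV can be loaded with Python's standard `csv` module; a minimal sketch, where the file path and column names are placeholders, since the actual schema comes from the scraper's output:

```python
import csv

def load_suppliers(path: str) -> list[dict]:
    """Read a *_clean_data.csv export into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        # DictReader uses the CSV header row as the dict keys
        return list(csv.DictReader(f))

# Hypothetical usage; substitute your actual export filename, e.g.
# rows = load_suppliers("data/hydraulic_cylinder_clean_data.csv")
```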
- Create an issue here
- For more details about the scraping process, read the Wiki documentation
Kayvan Shah - Data Engineer | NLP | ML/DL Enthusiast