
scrapehero-code / amazon-scraper


A simple web scraper to extract Product Data and Pricing from Amazon

Home Page: https://www.scrapehero.com/tutorial-how-to-scrape-amazon-product-details-using-python-and-selectorlib/

Python 100.00%
amazon-scraper page-scraper scrape-products web-scraping web-scraping-tutorials web-crawling

amazon-scraper's Issues

Can we use this for Vine products?

I am looking for a scraper for Amazon Vine products

Example link (you must be a Vine member to open it): https://www.amazon.it/vine/vine-items?queue=last_chance&size=60

Empty results

Running the script without changes gives me an output.jsonl that looks like this:

{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}
{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}
{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}

it works but not always

Hi,
I'm trying to scrape the PS5 page on Amazon.it:

https://www.amazon.it/dp/B08KKJ37F7

Most of the time I do not get the data; only sometimes do I get it.
I scheduled the script to run every 10 seconds. Below you can see six executions; only one of them gets the data properly.
What could be the reason? Maybe it is a timeout issue?

{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}

{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}

{"name": "Sony PlayStation 5", "price": null, "short_description": "Prova un caricamento ultra rapido con un'unit\u00e0 SSD ad altissima velocit\u00e0, un coinvolgimento ancora maggiore grazie al supporto per il feedback aptico, ai grilletti adattivi e all'audio 3D e scopri una nuova generazione di incredibili giochi PlayStation Lasciati stupire dalla grafica incredibile e prova le nuove funzionalit\u00e0 di PS5. Scopri un'esperienza di gioco pi\u00f9 profonda con supporto per feedback tattile, trigger adattivi e tecnologia audio 3D Ray Tracing - Immergiti in mondi che offrono un livello di realismo senza precedenti, con ogni raggio di luce simulato individualmente, creando effetti di ombre e riflessi ultra realistici sui giochi PS5 compatibili. Fino a 120 FPS con uscita a 120 Hz - Goditi un gameplay fluido con frame rate elevato fino a 120 FPS per giochi compatibili, con supporto per l'uscita a 120 Hz su display 4K Tecnologia HDR - Su una TV HDR, i giochi PS5 compatibili mostrano una gamma di colori vivaci e realistici Uscita 8K - Le console PS5 supportano l'uscita 8K, permettendoti di giocare sul tuo display 4320p Feedback tattile - Prova il feedback tattile tramite il controller wireless DualSense mentre giochi a determinati giochi per PS5 e senti gli effetti e l'impatto delle tue azioni di gioco attraverso il feedback sensoriale dinamico. Trigger adattivi: fai i conti con i trigger adattivi coinvolgenti e i loro livelli di resistenza dinamica che simulano l'impatto fisico delle tue azioni di gioco in alcuni titoli PS5 . 
Descrizione completa", "images": "{\"https://images-na.ssl-images-amazon.com/images/I/71PMC4DWWFL._AC_SX466_.jpg\":[350,466],\"https://images-na.ssl-images-amazon.com/images/I/71PMC4DWWFL._AC_SX679_.jpg\":[511,679],\"https://images-na.ssl-images-amazon.com/images/I/71PMC4DWWFL._AC_SX342_.jpg\":[257,342],\"https://images-na.ssl-images-amazon.com/images/I/71PMC4DWWFL._AC_SX522_.jpg\":[393,522],\"https://images-na.ssl-images-amazon.com/images/I/71PMC4DWWFL._AC_SX425_.jpg\":[320,425],\"https://images-na.ssl-images-amazon.com/images/I/71PMC4DWWFL._AC_SX385_.jpg\":[290,385],\"https://images-na.ssl-images-amazon.com/images/I/71PMC4DWWFL._AC_SX569_.jpg\":[428,569]}", "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": "/Sony-PlayStation-5/product-reviews/B08KKJ37F7?reviewerType=all_reviews"}

{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}

{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}

{"name": null, "price": null, "short_description": null, "images": null, "rating": null, "number_of_reviews": null, "variants": null, "product_description": null, "sales_rank": null, "link_to_all_reviews": null}
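The thread does not establish why most runs come back empty. One way to at least tell the two cases apart is to treat a record whose fields are all null as a failed attempt and try again after a growing pause. The sketch below assumes this; `is_empty_record` and `scrape_with_retry` are hypothetical names, and `scrape` stands for whatever function downloads a URL and returns the extracted dict. It is not part of the repository.

```python
import time
from typing import Callable, Optional


def is_empty_record(record: Optional[dict]) -> bool:
    """Return True when the record is missing or every field is None."""
    return not record or all(value is None for value in record.values())


def scrape_with_retry(
    scrape: Callable[[str], Optional[dict]],  # hypothetical: fetches and extracts one URL
    url: str,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call scrape(url) until it yields a non-empty record or attempts run out."""
    record = None
    for attempt in range(attempts):
        record = scrape(url)
        if not is_empty_record(record):
            return record
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    # Every attempt came back empty; return the last result so the caller can log it.
    return record
```

A partially filled record, like the third one above, counts as non-empty here, so individual fields such as `price` still need their own check.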

empty search results

You could make the scraper detect which type of page it has received, because Amazon serves several page layouts whose HTML differs from the one the result selectors expect. You could write a scraper for each layout, plus a check that decides which layout a given page uses.
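A minimal sketch of that suggestion: classify the downloaded HTML first, then hand it to the extractor written for that layout. The marker substrings in `PAGE_MARKERS` are assumptions for illustration and would need to be checked against the HTML Amazon actually returns; `detect_page_type` and `extract` are hypothetical names, not functions from the repository.

```python
from typing import Callable, Dict

# Assumed markers: one substring per layout. Order matters, because the
# first match wins, so the CAPTCHA check comes before the others.
PAGE_MARKERS: Dict[str, str] = {
    "captcha": "validateCaptcha",
    "search": 'data-component-type="s-search-result"',
    "product": 'id="productTitle"',
}


def detect_page_type(html: str) -> str:
    """Return the first layout whose marker occurs in the HTML, else 'unknown'."""
    for page_type, marker in PAGE_MARKERS.items():
        if marker in html:
            return page_type
    return "unknown"


def extract(html: str, extractors: Dict[str, Callable[[str], dict]]) -> dict:
    """Run the extractor registered for the detected layout, if there is one."""
    page_type = detect_page_type(html)
    extractor = extractors.get(page_type)
    if extractor is None:
        # No extractor for this layout: report the type instead of all-null fields.
        return {"page_type": page_type, "data": None}
    return {"page_type": page_type, "data": extractor(html)}
```

Returning the detected type alongside the data means an unsupported layout shows up as `"unknown"` or `"captcha"` in the output, which is easier to diagnose than a row of nulls.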

Page blocked by Amazon

I tried running the code and I got the following error:

Downloading https://www.amazon.com/s?k=laptops
Page https://www.amazon.com/s?k=laptops was blocked by Amazon. Please try using better proxies

Could you look into this issue?

empty results

Hi,
The code works well when I give it a search page URL, but it does not work with the URL https://www.amazon.com/gp/goldbox

Can you look into it?

Thanks.

Does not Work with Amazon.de / Amazon Europe

If we use the scraper for any other Amazon marketplace, such as Amazon.de, we get the following error:

Downloading https://www.amazon.de/s?k=laptop
Traceback (most recent call last):
  File "searchresults.py", line 43, in <module>
    for product in data['products']:
TypeError: 'NoneType' object is not iterable

When we use the SelectorLib Google Chrome plugin, the selectors look fine: we can see the prices, titles, ratings, etc.
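The traceback shows that `data['products']` is `None` at the point where the loop starts, meaning nothing on the downloaded page matched the products selector. The sketch below does not fix the extraction itself; it only avoids the crash so the empty case can be handled explicitly. `iter_products` is a hypothetical helper, not a function from `searchresults.py`.

```python
from typing import Iterator, Optional


def iter_products(data: Optional[dict]) -> Iterator[dict]:
    """Yield products from an extraction result, tolerating None at either level."""
    if not data:
        # The extraction returned nothing at all.
        return
    # data.get("products") may be None when the selector matched nothing,
    # so fall back to an empty list instead of iterating over None.
    for product in data.get("products") or []:
        yield product
```

With this in place, `for product in iter_products(data):` simply runs zero times on such a page, and the script can log the URL as having produced no results.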

Question: why is this URL not returning anything?

Firstly, I just wanted to say, great work and great article!

So I am making a web scraper based on yours that will search Amazon for a list of DVDs or Blu-rays based on title alone (hopefully with little error) and return the price. I have the test URL https://www.amazon.com/Sean-Connery/dp/B011MHCHTQ/, but it returns null in every field. Do you know why this is?

Thanks!
