cpatrickalves / scraping-ebay
Scraping Ebay's products using Scrapy Web Crawling Framework
License: MIT License
Hello,
Great app, but I tried to add a field for the auction end time and it doesn't work. Please correct me:
endedDate = product.xpath('.//*[@Class="s-item__time-end"]/text()').extract_first()
"EndedDate":endedDate,
Thanks!
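One possible cause, offered as a guess rather than a confirmed diagnosis: XPath attribute names are case-sensitive, so a predicate on @Class does not match an HTML class attribute. The sketch below illustrates this with Python's standard xml.etree.ElementTree on a made-up snippet. The markup and the first_text helper are hypothetical and are not taken from eBay or from this repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; real eBay pages may differ.
SNIPPET = (
    '<li class="s-item">'
    '<span class="s-item__time-end">(Today 10:15 AM)</span>'
    '</li>'
)

root = ET.fromstring(SNIPPET)


def first_text(node, xpath):
    """Return the text of the first element matching xpath, or None."""
    match = node.find(xpath)
    return match.text if match is not None else None


# Lowercase attribute name: matches the span.
lower = first_text(root, ".//*[@class='s-item__time-end']")

# Capitalized attribute name: no element has a "Class" attribute, so no match.
upper = first_text(root, ".//*[@Class='s-item__time-end']")

print(lower, upper)
```

If the lowercase selector still returns nothing in the spider, the element may simply be absent from the listing page being parsed, which would need checking against the live HTML.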
The files in scrapers/ contain almost the same code.
This means if features are added to one (like adding seller information) they aren't added to the other.
It would be good to pull similar code out of them and reuse it by reference, or pull out and parameterize the differences between them.
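A minimal sketch of the "parameterize the differences" idea, assuming the scrapers differ mainly in domain and a few settings. SiteConfig and build_search_url are hypothetical names, not part of this repository; the URL pattern is copied from the crawl logs quoted in the other reports in this thread.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class SiteConfig:
    """Per-site differences (hypothetical structure)."""
    name: str
    domain: str
    items_per_page: int = 200


def build_search_url(site: SiteConfig, search: str, trksid: str) -> str:
    """Build the search URL once, for any configured site."""
    query = urlencode({
        "_from": "R40",
        "_trksid": trksid,
        "_nkw": search,          # urlencode turns spaces into '+'
        "_ipg": site.items_per_page,
    })
    return f"https://www.{site.domain}/sch/i.html?{query}"


EBAY_COM = SiteConfig(name="ebay", domain="ebay.com")
EBAY_DE = SiteConfig(name="ebay_de", domain="ebay.de")

print(build_search_url(EBAY_COM, "Samsung galaxy s7", "m570.l1313"))
print(build_search_url(EBAY_DE, "MacBook Pro 13 2016", "m570.l1313"))
```

With shared helpers like this, a new field such as seller information would be added in one place and picked up by every scraper that uses the helper.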
Is there any way to also show the shipping price, import charges, and country/region?
Hi,
It seems when I run this excellent tool, I get values of "0", "Wat", and "Pre" for "Stars", and "Ratings" is always 0.
(additionally it would be nice if seller information were returned: number of items sold or % positive reviews)
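Values such as "Wat" and "Pre" suggest the "Stars" field is sometimes filled from text that is not a rating at all. A hedged sketch of a stricter parser follows. It assumes the rating text looks like "4.5 out of 5 stars." and the count looks like "1,234 product ratings"; both formats are assumptions, and parse_stars and parse_rating_count are hypothetical helpers, not functions from this repository.

```python
import re
from typing import Optional

# Assumed text formats (not confirmed against live eBay pages).
_STARS_RE = re.compile(r"(\d+(?:\.\d+)?)\s+out of\s+\d+\s+stars", re.IGNORECASE)
_COUNT_RE = re.compile(r"([\d,]+)\s+(?:product\s+)?ratings?", re.IGNORECASE)


def parse_stars(text: Optional[str]) -> Optional[float]:
    """Return the star value, or None when the text is not a rating."""
    if not text:
        return None
    match = _STARS_RE.search(text)
    return float(match.group(1)) if match else None


def parse_rating_count(text: Optional[str]) -> Optional[int]:
    """Return the number of ratings, or None when the text has no count."""
    if not text:
        return None
    match = _COUNT_RE.search(text)
    return int(match.group(1).replace(",", "")) if match else None


print(parse_stars("4.5 out of 5 stars."))
print(parse_stars("Watch"))
print(parse_rating_count("1,234 product ratings"))
```

Returning None instead of a sliced string or 0 makes it visible in the output when a selector picked up the wrong element.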
Hello, nice work! I tested it with several inputs, but the output is not working.
scrapy crawl ebay -o products.json -a search="Samsung galaxy s7"
I get this result:
2019-04-18 08:37:48 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scraping_ebay)
2019-04-18 08:37:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.4.8 (default, Feb 5 2018, 11:23:17) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)], pyOpenSSL 19.0.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.3, Platform Linux-3.10.0-957.10.1.el7.x86_64-x86_64-with-centos-7.6.1810-Core
2019-04-18 08:37:48 [scrapy.crawler] INFO: Overridden settings: {'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'scraping_ebay', 'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'SPIDER_MODULES': ['scraping_ebay.spiders'], 'FEED_FORMAT': 'json', 'FEED_URI': 'products.json'}
2019-04-18 08:37:48 [scrapy.extensions.telnet] INFO: Telnet Password: 4fde495aabaaad3c
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-04-18 08:37:48 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-04-18 08:37:48 [scrapy.core.engine] INFO: Spider opened
2019-04-18 08:37:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-04-18 08:37:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-04-18 08:37:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/robots.txt> (referer: None)
2019-04-18 08:37:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com> (referer: None)
2019-04-18 08:37:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200> from <GET http://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200>
2019-04-18 08:37:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200> (referer: None)
2019-04-18 08:37:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=2> (referer: https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=Samsung+galaxy+s7&_ipg=200)
2019-04-18 08:37:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=3> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=2)
2019-04-18 08:37:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=4> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=3)
2019-04-18 08:38:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=5> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=4)
2019-04-18 08:38:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=6> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=5)
2019-04-18 08:38:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=7> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=6)
2019-04-18 08:38:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=8> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=7)
2019-04-18 08:38:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=9> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=8)
2019-04-18 08:38:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=10> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=9)
2019-04-18 08:38:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=11> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=10)
2019-04-18 08:38:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=12> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=11)
2019-04-18 08:38:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=13> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=12)
2019-04-18 08:38:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=14> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=13)
2019-04-18 08:38:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=15> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=14)
2019-04-18 08:38:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=16> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=15)
2019-04-18 08:38:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=17> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=16)
2019-04-18 08:38:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=18> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=17)
2019-04-18 08:38:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=19> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=18)
2019-04-18 08:38:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=20> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=19)
2019-04-18 08:38:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=21> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=20)
2019-04-18 08:38:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=22> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=21)
2019-04-18 08:38:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=23> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=22)
2019-04-18 08:38:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=24> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=23)
2019-04-18 08:38:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=25> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=24)
2019-04-18 08:38:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=26> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=25)
2019-04-18 08:38:48 [scrapy.extensions.logstats] INFO: Crawled 28 pages (at 28 pages/min), scraped 0 items (at 0 items/min)
2019-04-18 08:38:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=27> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=26)
2019-04-18 08:38:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=28> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=27)
2019-04-18 08:38:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=29> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=28)
2019-04-18 08:38:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=30> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=29)
2019-04-18 08:38:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=31> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=30)
2019-04-18 08:39:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=32> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=31)
2019-04-18 08:39:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=33> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=32)
2019-04-18 08:39:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=34> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=33)
2019-04-18 08:39:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=35> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=34)
2019-04-18 08:39:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=36> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=35)
2019-04-18 08:39:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=37> (referer: https://www.ebay.com/sch/i.html?_from=R40&_nkw=Samsung+galaxy+s7&_ipg=200&_pgn=36)
2019-04-18 08:39:13 [ebay] DEBUG: eBay products collected successfully !!!
2019-04-18 08:39:13 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-18 08:39:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 29140, 'downloader/request_count': 40, 'downloader/request_method_count/GET': 40, 'downloader/response_bytes': 3222681, 'downloader/response_count': 40, 'downloader/response_status_count/200': 39, 'downloader/response_status_count/301': 1, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 4, 18, 12, 39, 13, 928618), 'log_count/DEBUG': 41, 'log_count/INFO': 10, 'memusage/max': 164339712, 'memusage/startup': 47673344, 'request_depth_max': 37, 'response_received_count': 39, 'robotstxt/request_count': 1, 'robotstxt/response_count': 1, 'robotstxt/response_status_count/200': 1, 'scheduler/dequeued': 39, 'scheduler/dequeued/memory': 39, 'scheduler/enqueued': 39, 'scheduler/enqueued/memory': 39, 'start_time': datetime.datetime(2019, 4, 18, 12, 37, 48, 939034)}
2019-04-18 08:39:13 [scrapy.core.engine] INFO: Spider closed (finished)
products.json is empty!!!
Thank you for the app! This helps me a lot!
By the way, I see the datetime request in the ebay.py file inside the spider folder. However, it seems the data isn't fetched by the .extract_first() call. Is there any way to fix it?
Hello, what a nice, comprehensive scraper. Unfortunately, when I run it the usual way, I get an empty CSV/JSON file in return. The script runs without errors on Debian and macOS, but it does not produce results like the example in the data folder.
Greets
Hey,
great work. I've created a new spider for ebay Germany (ebay.de) and I don't get any results.
Here are the changes I've made for the new spider compared to the original one for ebay.com:
name = "ebay_de"
allowed_domains = ["ebay.de"]
start_urls = ["https://www.ebay.de"]
...
yield scrapy.Request("http://www.ebay.de/sch/i.html?_from=R40&_trksid=" + trksid + "&_nkw=" + self.search_string.replace(' ','+') + "&_ipg=200", callback=self.parse_link)
Input:
scrapy crawl ebay_de -o products_de.csv -a search="MacBook Pro 13 2016"
Output:
2019-11-23 22:50:17 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: scraping_ebay)
2019-11-23 22:50:17 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 2.7.17rc1 (default, Oct 10 2019, 10:26:01) - [GCC 9.2.1 20191008], pyOpenSSL 19.1.0 (OpenSSL 1.1.1c 28 May 2019), cryptography 2.6.1, Platform Linux-5.3.0-23-generic-x86_64-with-Ubuntu-19.10-eoan
2019-11-23 22:50:17 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'FEED_FORMAT': 'csv', 'SPIDER_MODULES': ['scraping_ebay.spiders'], 'FEED_URI': 'products_de.csv', 'BOT_NAME': 'scraping_ebay'}
2019-11-23 22:50:17 [scrapy.extensions.telnet] INFO: Telnet Password: 383b88df45692b23
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.corestats.CoreStats']
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-11-23 22:50:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-11-23 22:50:17 [scrapy.core.engine] INFO: Spider opened
2019-11-23 22:50:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-11-23 22:50:17 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-11-23 22:50:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.de> (referer: None)
2019-11-23 22:50:18 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200> from <GET http://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200>
2019-11-23 22:50:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.de/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=MacBook+Pro+13+2016&_ipg=200> (referer: None)
2019-11-23 22:50:20 [ebay_de] DEBUG: eBay products collected successfully !!!
2019-11-23 22:50:20 [scrapy.core.engine] INFO: Closing spider (finished)
2019-11-23 22:50:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/request_bytes': 1327, 'downloader/request_count': 3, 'downloader/request_method_count/GET': 3, 'downloader/response_bytes': 108803, 'downloader/response_count': 3, 'downloader/response_status_count/200': 2, 'downloader/response_status_count/301': 1, 'elapsed_time_seconds': 2.400946, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 11, 23, 21, 50, 20, 273904), 'log_count/DEBUG': 4, 'log_count/INFO': 10, 'memusage/max': 54169600, 'memusage/startup': 54169600, 'request_depth_max': 1, 'response_received_count': 2, 'scheduler/dequeued': 3, 'scheduler/dequeued/memory': 3, 'scheduler/enqueued': 3, 'scheduler/enqueued/memory': 3, 'start_time': datetime.datetime(2019, 11, 23, 21, 50, 17, 872958)}
2019-11-23 22:50:20 [scrapy.core.engine] INFO: Spider closed (finished)
products_de.csv is empty
Thanks!
Hi @cpatrickalves ,
How can I add more information, such as the auction end time, whether the item is an auction listing or not, and the shipping fee? Thanks.
System: macOS 10.12.6, Anaconda and pip with required packages, Python 3.7
I have tried both the normal execution:
scrapy crawl ebay -o products.csv
and the search-string execution:
scrapy crawl ebay -o products.csv -a search="Xbox one X"
Both give me the following message and a 0 KB CSV file:
(base) COMPUTER:scraping-ebay-master USERNAME$ scrapy crawl ebay -o products.csv
2019-09-12 00:35:58 [scrapy.utils.log] INFO: Scrapy 1.7.3 started (bot: scraping_ebay)
2019-09-12 00:35:58 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.1, w3lib 1.20.0, Twisted 19.7.0, Python 3.7.3 (default, Mar 27 2019, 16:54:48) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b 26 Feb 2019), cryptography 2.6.1, Platform Darwin-16.7.0-x86_64-i386-64bit
2019-09-12 00:35:58 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'scraping_ebay', 'FEED_FORMAT': 'csv', 'FEED_URI': 'products.csv', 'NEWSPIDER_MODULE': 'scraping_ebay.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['scraping_ebay.spiders']}
2019-09-12 00:35:58 [scrapy.extensions.telnet] INFO: Telnet Password: be0a708b17f43b19
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.feedexport.FeedExporter', 'scrapy.extensions.logstats.LogStats']
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-09-12 00:35:58 [scrapy.middleware] INFO: Enabled item pipelines: []
2019-09-12 00:35:58 [scrapy.core.engine] INFO: Spider opened
2019-09-12 00:35:58 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-09-12 00:35:58 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-09-12 00:35:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com/robots.txt> (referer: None)
2019-09-12 00:35:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.ebay.com> (referer: None)
2019-09-12 00:35:59 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET http://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=nintendo+switch+console&_ipg=200>
2019-09-12 00:35:59 [scrapy.core.engine] INFO: Closing spider (finished)
2019-09-12 00:35:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats: {'downloader/exception_count': 1, 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 1, 'downloader/request_bytes': 663, 'downloader/request_count': 2, 'downloader/request_method_count/GET': 2, 'downloader/response_bytes': 45542, 'downloader/response_count': 2, 'downloader/response_status_count/200': 2, 'elapsed_time_seconds': 1.179395, 'finish_reason': 'finished', 'finish_time': datetime.datetime(2019, 9, 11, 22, 35, 59, 912975), 'log_count/DEBUG': 3, 'log_count/INFO': 10, 'memusage/max': 50532352, 'memusage/startup': 50532352, 'request_depth_max': 1, 'response_received_count': 2, 'robotstxt/forbidden': 1, 'robotstxt/request_count': 1, 'robotstxt/response_count': 1, 'robotstxt/response_status_count/200': 1, 'scheduler/dequeued': 2, 'scheduler/dequeued/memory': 2, 'scheduler/enqueued': 2, 'scheduler/enqueued/memory': 2, 'start_time': datetime.datetime(2019, 9, 11, 22, 35, 58, 733580)}
2019-09-12 00:35:59 [scrapy.core.engine] INFO: Spider closed (finished)
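The stats dictionaries in the logs above already hint at why the output file is empty in each report. As a rough sketch, the helper below reads a few Scrapy stats keys and suggests a likely cause. diagnose_empty_feed is a hypothetical name, and the sample values are copied from the logs in this thread; item_scraped_count is the counter Scrapy records when items are yielded, and it is absent from all three logs.

```python
def diagnose_empty_feed(stats: dict) -> str:
    """Suggest why a crawl produced no output, based on Scrapy stats keys."""
    if stats.get("item_scraped_count", 0) > 0:
        return "items were scraped"
    if stats.get("robotstxt/forbidden", 0) > 0:
        # The search request was dropped because ROBOTSTXT_OBEY is True.
        return "requests blocked by robots.txt"
    if stats.get("response_received_count", 0) > 0:
        # Pages were downloaded, but the parse callback yielded nothing,
        # which usually points at selectors that no longer match the HTML.
        return "pages fetched but no items yielded"
    return "no responses received"


# Values taken from the macOS report (Scrapy 1.7.3).
blocked = {"response_received_count": 2, "robotstxt/forbidden": 1}

# Values taken from the "Samsung galaxy s7" report (Scrapy 1.6.0).
no_items = {"response_received_count": 39, "request_depth_max": 37}

print(diagnose_empty_feed(blocked))
print(diagnose_empty_feed(no_items))
```

The two cases call for different follow-ups: the first is about the crawl being disallowed by the site's robots.txt, while the second is about the spider's selectors.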