/usr/local/bin/python3 /Users/duoluoluolin/PycharmProjects/untitled/随便写写/动态渲染页面爬取/sele


Could someone help? Line 34 raises an error and I can't figure out why; searching online for the cause turned up nothing. about taobaoproduct HOT 10 CLOSED

python3webspider commented on August 14, 2024

Could someone help? Line 34 raises an error and I can't figure out why; searching online for the cause turned up nothing.

from taobaoproduct.

Comments (10)

lllllllai27 commented on August 14, 2024

Taobao mainly identifies Selenium by the value of window.navigator.webdriver; the other navigator properties don't matter. So you can simply use execute_script to inject JS that sets window.navigator.webdriver to false.

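Following some material I found online:
https://www.kebook.cn/9329/
I added this line of code:
browser.execute_script("Object.defineProperties(navigator,{webdriver:{get:() => false}})")
It still can't scrape: when the program runs, it jumps to the login page. The article then suggests using a Fiddler proxy to swap out the JS, which I don't know how to do. I hope the author can rewrite this code so we can learn from it.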
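A likely reason the execute_script line above has no effect: it only changes the document that is currently loaded. The next browser.get() loads a fresh page where the override is gone, and by the time you could call it again, Taobao's own scripts have already read navigator.webdriver. One commonly used alternative (not confirmed in this thread to get past Taobao) is to register the override through the Chrome DevTools Protocol command Page.addScriptToEvaluateOnNewDocument, which runs it before any page script on every new document. This is a minimal sketch assuming Selenium with Chrome and ChromeDriver; STEALTH_JS and build_browser are hypothetical names.

```python
# Hypothetical names: STEALTH_JS and build_browser are not from the thread.
# JS override suggested in the thread: report navigator.webdriver as false.
STEALTH_JS = "Object.defineProperty(navigator, 'webdriver', {get: () => false})"


def build_browser(headless=False):
    """Start Chrome so the webdriver override is in place before page scripts run."""
    # Imported here so this file can be loaded without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    # Commonly used flag to stop Chrome from flagging itself as automated.
    options.add_argument("--disable-blink-features=AutomationControlled")
    # Drops the "Chrome is being controlled by automated test software" switch.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    browser = webdriver.Chrome(options=options)
    # Register the script for every new document, ahead of the page's own JS.
    browser.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": STEALTH_JS}
    )
    return browser
```

build_browser() would take the place of the webdriver.Chrome(...) call at the top of the script; afterwards browser.execute_script("return navigator.webdriver") should report False if the override was applied. Taobao may rely on other checks as well, so this alone may still end at the login page.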


Jenkin7 commented on August 14, 2024

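I ran into the same problem. The likely cause is that Taobao detects Selenium's WebDriver and applies its anti-scraping measures. I hope someone can share how to scrape Taobao product data.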


Germey commented on August 14, 2024

Taobao mainly identifies Selenium by the value of window.navigator.webdriver; the other navigator properties don't matter. So you can simply use execute_script to inject JS that sets window.navigator.webdriver to false.


pacluoluo commented on August 14, 2024

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from urllib.parse import quote
from pyquery import PyQuery as pq
import pymongo

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
browser = webdriver.Chrome(options=chrome_options)  # options= replaces the deprecated chrome_options=
wait = WebDriverWait(browser, 10)
KEYWORD = 'ipad'

def index_page(page, retries=3):
    """
    Crawl one search-result (index) page
    :param page: page number
    :param retries: remaining retries after a timeout (added so retrying cannot recurse forever)
    """
    print('Crawling page', page)
    try:
        url = 'https://s.taobao.com/search?q=' + quote(KEYWORD)
        browser.get(url)
        if page > 1:
            page_input = wait.until(  # renamed from input to avoid shadowing the builtin
                EC.presence_of_element_located((By.CSS_SELECTOR, '#mainsrp-pager div.form > input')))
            submit = wait.until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, '#mainsrp-pager div.form > span.btn.J_Submit')))
            page_input.clear()
            page_input.send_keys(page)
            submit.click()
        wait.until(
            EC.text_to_be_present_in_element((By.CSS_SELECTOR, '#mainsrp-pager li.item.active > span'), str(page)))
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.m-itemlist .items .item')))
        get_products()
    except TimeoutException:
        # Bounded retry: if Taobao keeps showing the login page, an unbounded retry never stops
        if retries > 0:
            index_page(page, retries - 1)
        else:
            print('Giving up on page', page)

def get_products():
    """
    Extract product data
    """
    html = browser.page_source
    doc = pq(html)
    items = doc('#mainsrp-itemlist .items .item').items()  # the id is mainsrp-itemlist, not #mainsrp.itemlist
    for item in items:
        product = {
            'image': item.find('.pic .img').attr('data-src'),
            'price': item.find('.price').text(),  # class selectors need the leading dot
            'deal': item.find('.deal-cnt').text(),
            'title': item.find('.title').text(),
            'shop': item.find('.shop').text(),
            'location': item.find('.location').text()
        }
        print(product)
        save_to_mongo(product)

MONGO_URL = 'localhost'
MONGO_DB = 'taobao'
MONGO_COLLECTION = 'products'
client = pymongo.MongoClient(MONGO_URL)
db = client[MONGO_DB]

def save_to_mongo(result):
    """
    Save to MongoDB
    :param result: result
    """
    try:
        if db[MONGO_COLLECTION].insert_one(result):  # Collection.insert() was removed in PyMongo 4
            print('Saved to MongoDB')
    except Exception:
        print('Failed to save to MongoDB')

MAX_PAGE = 2

def main():
    """
    Iterate over every page
    """
    for i in range(1, MAX_PAGE + 1):
        index_page(i)

if __name__ == '__main__':
    main()

I don't know why lines 34 and 35 of the source keep throwing the error above.
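The thread never shows the traceback, but in the posted code the wait.until(...) calls are the lines that fail when the expected element never appears, for example because Taobao redirected to the login page so the pager and item list are missing. WebDriverWait keeps re-checking its condition and raises TimeoutException once the timeout (10 seconds here) passes. This is a simplified, hypothetical imitation of that polling behaviour, not Selenium's actual implementation; wait_until and WaitTimeout are made-up names.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Simplified imitation of WebDriverWait(driver, timeout).until(condition).

    condition() is called repeatedly; the first truthy value it returns is
    handed back. If nothing truthy shows up before the deadline, raise.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)
```

So a timeout at those lines usually means the condition (pager text or item list present) was never met on the page the browser actually ended up on, rather than a bug in the wait call itself.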


clamyang commented on August 14, 2024

I looked at your code and the lookup selectors are correct. Try not putting a line break after the > on your line 34.
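One way to read this suggestion: a bare line break inside a single-quoted Python string is a SyntaxError, so if a long selector has to be wrapped, a safe option is to close the quotes and let Python join adjacent string literals inside parentheses. PAGER_INPUT and ACTIVE_PAGE are hypothetical names; the selector strings are the ones from the posted code.

```python
# Adjacent string literals inside parentheses are joined at compile time,
# so the selector can be wrapped without a line break inside the quotes.
PAGER_INPUT = (
    '#mainsrp-pager div.form > '
    'input'
)

ACTIVE_PAGE = (
    '#mainsrp-pager li.item.active > '
    'span'
)

print(PAGER_INPUT)  # #mainsrp-pager div.form > input
```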


CHN2017 commented on August 14, 2024

It's a tough problem for newcomers. Could the author solve this issue?


wjx1018960145 commented on August 14, 2024

What should I do when the crawler gets stuck on the login page?


Hfywtias commented on August 14, 2024

What should I do when the crawler gets stuck on the login page?

+1


wjx1018960145 commented on August 14, 2024

Just log in by scanning the QR code with your phone.


On December 20, 2018, at 10:30 AM, Hfywtias ***@***.***> wrote: What should I do when the crawler gets stuck on the login page? +1
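Scanning the QR code needs a visible browser window, so the --headless argument in the posted code would have to be removed for this to work. A minimal sketch, assuming Taobao sends unauthenticated visitors to a page on login.taobao.com (an assumption, not confirmed in this thread): is_login_page and wait_for_manual_login are hypothetical helpers that pause the crawler until the user has scanned the code and the browser has left the login host.

```python
from urllib.parse import urlparse

# Assumption: Taobao's login page is served from this host.
LOGIN_HOST = "login.taobao.com"


def is_login_page(url):
    """Return True if url points at the (assumed) Taobao login host."""
    return urlparse(url).hostname == LOGIN_HOST


def wait_for_manual_login(browser, timeout=120):
    """Block until the user has scanned the QR code and left the login page.

    browser must be a visible (non-headless) Selenium driver; otherwise
    there is no QR code on screen to scan.
    """
    # Imported here so is_login_page() works without Selenium installed.
    from selenium.webdriver.support.wait import WebDriverWait

    if is_login_page(browser.current_url):
        print("Scan the QR code in the browser window to log in...")
        # until() passes the driver to the callable and polls until it is truthy.
        WebDriverWait(browser, timeout).until(
            lambda driver: not is_login_page(driver.current_url)
        )
```

It could be called right after browser.get(url) in index_page; if nobody logs in within timeout seconds, WebDriverWait raises TimeoutException as usual.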


Germey commented on August 14, 2024

You can also refer to https://mp.weixin.qq.com/s/Iz-DY1UrSfVFRFh5CyHl3Q

