shenxiangzhuang / pythondataanalysis
The data and code used in my book.
Hello, what is this line meant to scrape: mydata = soup.select('#display')[0].get_text()? I checked the page source and could not find any element with id="display".
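This is the issue reported by reader 卡若米, and it was a mistake I made while revising the code. The cause: after Mymultithread(10) runs, the urls list has already been emptied by pop(), so the subsequent Myfutures(10) has no pages left to download, which is why its time is 0. The fix is simply to swap the order: run Myfutures(10) first, then Mymultithread(10).
I sincerely apologize for the oversight, and thank you to all the readers for the feedback.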
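The root cause is that one benchmark consumes a list that the other one also reads. The sketch below reproduces that under stated assumptions: run_with_threads is a hypothetical stand-in for Mymultithread that pops from a shared urls list, run_with_futures is a hypothetical stand-in for Myfutures that only iterates over it, and fake_download replaces the real download so nothing touches the network. Besides reordering, handing each run its own copy of the list also avoids the problem.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for fetching a page; sleeps instead of using the network."""
    time.sleep(0.01)
    return len(url)


def run_with_threads(urls, n_threads):
    """Consume `urls` with pop() from several threads (assumed Mymultithread behaviour)."""
    lock = threading.Lock()
    done = [0]

    def worker():
        while True:
            with lock:
                if not urls:          # list exhausted: this worker stops
                    return
                url = urls.pop()      # mutates the caller's list
            fake_download(url)
            with lock:
                done[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done[0]


def run_with_futures(urls, n_workers):
    """Download every URL with a thread pool without modifying `urls`."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return len(list(pool.map(fake_download, urls)))


URLS = [f"https://example.com/page/{i}" for i in range(20)]

shared = list(URLS)
print(run_with_threads(shared, 10))      # downloads all 20 and empties `shared`
print(run_with_futures(shared, 10))      # nothing left to download

# Safer: give each benchmark its own copy of the URL list.
print(run_with_threads(list(URLS), 10))
print(run_with_futures(list(URLS), 10))
```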
Section: 2.3.6 Introduction to Regular Expressions
Source code from the book:
import re
import requests
from fake_useragent import UserAgent
ua = UserAgent()
headers = {'User-Agent': ua.random}
html = requests.get('https://www.baidu.com/', headers=headers)
html.encoding = 'utf-8'
html = html.text
titles = re.findall(r'(\w{2})', html)
print(titles)
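With this code, the fetched page no longer yields the intended keywords: the printed output now consists of fragments like "贴吧" (Tieba), so the regular expression needs to be rewritten.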
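The pattern (\w{2}) matches any two consecutive word characters anywhere in the markup, so its output depends entirely on whatever the live page happens to contain. The sketch below runs it offline against a small hypothetical HTML fragment (not Baidu's real markup, which changes over time) and compares it with a narrower pattern that only captures link text. The narrower pattern is one illustrative option, not the book's method; for anything beyond quick experiments an HTML parser is usually more robust than regular expressions.

```python
import re

# Hypothetical navigation fragment; the real Baidu homepage markup differs.
html = ('<div id="nav"><a href="/news">新闻</a>'
        '<a href="/map">地图</a><a href="/tieba">贴吧</a></div>')

# The book's pattern: any two word characters. In Python 3, \w also matches
# CJK characters, so tag names, attribute names and URL fragments are captured
# together with the link text.
loose = re.findall(r'(\w{2})', html)
print(loose)

# A narrower pattern: only the text between <a ...> and </a>.
titles = re.findall(r'<a[^>]*>([^<]+)</a>', html)
print(titles)
```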
The email-sending part of Chapter 5 should use multithreading to avoid blocking.
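One way to act on this suggestion is to hand the blocking SMTP call to a worker thread and get a Future back immediately. The sketch below assumes an SSL SMTP server; the host, port, addresses and credentials are placeholders, and send_mail_async and slow_fake_sender are hypothetical names. The demo swaps in a fake sender so that it runs without a mail server.

```python
import smtplib
import time
from concurrent.futures import ThreadPoolExecutor
from email.mime.text import MIMEText


def send_mail(host, port, user, password, to_addr, subject, body):
    """Blocking send over SMTP with SSL. All arguments are placeholders."""
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = subject
    msg["From"] = user
    msg["To"] = to_addr
    with smtplib.SMTP_SSL(host, port, timeout=30) as server:
        server.login(user, password)
        server.sendmail(user, [to_addr], msg.as_string())


# A small pool shared by all sends; the main program keeps running meanwhile.
executor = ThreadPoolExecutor(max_workers=2)


def send_mail_async(*args, sender=send_mail):
    """Submit the send to a worker thread and return a Future right away."""
    return executor.submit(sender, *args)


def slow_fake_sender(*args):
    """Stand-in for send_mail that simulates a slow server."""
    time.sleep(0.5)
    return "sent"


start = time.perf_counter()
future = send_mail_async("smtp.example.com", 465, "me@example.com", "password",
                         "you@example.com", "Report", "Done",
                         sender=slow_fake_sender)
print(f"returned after {time.perf_counter() - start:.2f}s")
print(future.result())  # waits here only when the result is actually needed
```

Calling future.result() (or attaching future.add_done_callback) is where any SMTP exception raised in the worker thread resurfaces, so errors are not silently lost.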
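Switch to OpenJDK early and life will be much better.
So I suggest that on page 8 the author teach readers to install OpenJDK rather than Oracle JDK.
Oracle Technology Network License Agreement for Oracle Java SE
Further, You may not:
I have always used OpenJDK and never switched: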
AdoptOpenJDK
Traceback (most recent call last):
  File "E:/Python/2019.7.15/19_面向对象.py", line 143, in <module>
    Demo.get_fileinfo()
  File "E:/Python/2019.7.15/19_面向对象.py", line 21, in get_fileinfo
    self.wb = load_workbook(filename=self.filename)
  File "C:\Users\admin\AppData\Roaming\Python\Python37\site-packages\openpyxl\reader\excel.py", line 311, in load_workbook
    data_only, keep_links)
  File "C:\Users\admin\AppData\Roaming\Python\Python37\site-packages\openpyxl\reader\excel.py", line 126, in __init__
    self.archive = _validate_archive(fn)
  File "C:\Users\admin\AppData\Roaming\Python\Python37\site-packages\openpyxl\reader\excel.py", line 98, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "D:\Python_Edition\Python37\lib\zipfile.py", line 1204, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'pandas_simple.xlsx'
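The traceback ends in FileNotFoundError: openpyxl never got as far as reading the workbook because pandas_simple.xlsx could not be found. A relative filename is resolved against the current working directory, which is not necessarily the folder containing the script (here E:/Python/2019.7.15/). The sketch below makes the lookup explicit; resolve_data_file is a hypothetical helper, and it assumes the workbook is meant to sit next to the script unless another directory is passed in.

```python
from pathlib import Path


def resolve_data_file(filename, base_dir=None):
    """Return an absolute path for `filename` inside `base_dir`.

    Hypothetical helper: when `base_dir` is None, the directory of this
    script is used instead of the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path(__file__).resolve().parent
    path = (base / filename).resolve()
    if not path.exists():
        # Report both locations so it is clear where Python actually looked.
        raise FileNotFoundError(
            f"{path} does not exist (current working directory: {Path.cwd()})"
        )
    return path


# Usage with openpyxl (not executed here):
# from openpyxl import load_workbook
# wb = load_workbook(filename=str(resolve_data_file("pandas_simple.xlsx")))
```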
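Could you take a look? After sending the request to Douban and following the new code to log in with an email address and password, I get an error (https://user-images.githubusercontent.com/45935304/61353481-84576880-a8a2-11e9-8e85-2771706ea2b6.png)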
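This concerns the simulated login in section 2.3.7, on page 70, which logs in by submitting form data. The book logs in with an email account and password, but Douban no longer supports registering with an email address; I can only log in with an SMS verification code. I tried submitting the form data but could not log in, and the page source has also changed a lot compared with the book. I would appreciate your advice.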
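The general pattern behind a form-based login is a requests.Session that POSTs the form fields and then keeps the returned cookies for later requests. The sketch below shows only that pattern: LOGIN_URL and the field names username and password are hypothetical placeholders, not Douban's current endpoint or form, and a login that now requires an SMS code (and possibly other anti-bot checks) will not work with this code as-is. The real URL and field names have to be read from the browser's developer tools for the current page.

```python
import requests

# Placeholder endpoint; NOT Douban's real login URL.
LOGIN_URL = "https://accounts.example.com/login"


def build_login_request(session, username, password):
    """Prepare (but do not send) a form-encoded login POST on `session`."""
    form = {"username": username, "password": password}  # hypothetical field names
    req = requests.Request("POST", LOGIN_URL, data=form)
    return session.prepare_request(req)  # merges the session's headers and cookies


def login(username, password):
    """Send the login form; cookies set by the server stay on the session."""
    session = requests.Session()
    session.headers["User-Agent"] = "Mozilla/5.0"
    resp = session.send(build_login_request(session, username, password), timeout=10)
    resp.raise_for_status()
    return session  # reuse this session for pages that require login
```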
mydata = soup.select('#display')[0].get_text()
IndexError: list index out of range
This is the line that raises the error.
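soup.select() returns a list, and that list is empty when nothing on the page matches #display, so indexing it with [0] raises IndexError. One possible reason, not confirmed for this page, is that the markup changed or that the element is inserted by JavaScript, which requests does not execute; that would also match the earlier report that id="display" is missing from the page source. The sketch below checks for a missing element instead of crashing; extract_display_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def extract_display_text(html):
    """Return the text of the element with id="display", or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one("#display")  # None instead of an empty list
    if node is None:
        return None
    return node.get_text(strip=True)


print(extract_display_text('<div id="display"> 42 </div>'))
print(extract_display_text("<p>no display element here</p>"))
```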