shenxiangzhuang / pythondataanalysis


67.0 8.0 46.0 8.73 MB

The data and code used in my book.

Python 33.09% Jupyter Notebook 66.91%

python3 data-science webcrawler

pythondataanalysis's Issues

A question about the code

Hello, I'd like to know what mydata = soup.select('#display')[0].get_text() is meant to scrape. I checked the page source and couldn't find any element with id="display".

Crawler multithreading: Myfutures runs in 0 s

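This problem was reported by reader 卡若米, and it was caused by a mistake I made while reorganizing the code. After Mymultithread(10) runs, the urls list has already been emptied by pop(), so the subsequent Myfutures(10) has no pages left to download, which is why its time is 0. The fix is simply to swap the order of the two calls: run Myfutures(10) first, then Mymultithread(10).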

I sincerely apologize for my oversight, and I thank readers for their feedback.
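The pitfall described above can be reproduced without any network access. In this sketch, `run_with_threads` and `run_with_futures` are hypothetical stand-ins for the book's `Mymultithread` and `Myfutures` (their actual code is not shown in this issue), and `fake_download` replaces the HTTP request with a short sleep. The first function consumes the shared list with `pop()`, so the second finds nothing left to do. Giving each run its own copy (`list(all_urls)`) is an alternative to reordering the calls: it makes the two measurements independent of which one runs first.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Stand-in for an HTTP request: sleep briefly and return the URL."""
    time.sleep(0.01)
    return url


def run_with_threads(urls, n_threads):
    """Hypothetical Mymultithread-style crawler: workers pop() URLs off the shared list."""
    lock = threading.Lock()
    done = []

    def worker():
        while True:
            with lock:
                if not urls:          # list exhausted: this worker stops
                    return
                url = urls.pop()      # mutates the caller's list
            result = fake_download(url)
            with lock:
                done.append(result)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


def run_with_futures(urls, n_workers):
    """Hypothetical Myfutures-style crawler: maps over whatever is left in `urls`."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fake_download, urls))


def timed(fn, *args):
    """Return (result, elapsed seconds) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    all_urls = [f"http://example.com/page{i}" for i in range(20)]

    # Buggy pattern: both runs share one list, so the second run gets nothing.
    shared = list(all_urls)
    print(len(timed(run_with_threads, shared, 10)[0]))   # 20
    print(len(timed(run_with_futures, shared, 10)[0]))   # 0, in near-zero time

    # Safer pattern: each run gets its own copy, so the order no longer matters.
    print(len(timed(run_with_futures, list(all_urls), 10)[0]))  # 20
    print(len(timed(run_with_threads, list(all_urls), 10)[0]))  # 20
```

With copies, the timing comparison between the two approaches is also fairer, since both download the same 20 pages.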

Extracting keywords from the Baidu home page with regular expressions

Section: 2.3.6 Getting started with regular expressions
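Source code from the book:
import re
import requests
from fake_useragent import UserAgent

# Build request headers with a random browser User-Agent
ua = UserAgent()
headers = {'User-Agent': ua.random}

# Note: this reassignment discards the random User-Agent above, so the request
# goes out with requests' default headers; delete this line to send ua.random
headers = {}

html = requests.get('https://www.baidu.com/', headers=headers)
html.encoding = 'utf-8'
html = html.text

print(html)

# In Python 3, \w matches letters, digits, underscore and Chinese characters,
# so this returns every non-overlapping pair of such characters in the page
titles = re.findall(r'(\w{2})', html)
print(titles)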
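With this code, keywords can no longer be extracted correctly from the fetched page: the printed output now contains items such as 贴吧 (Tieba), so the regular expression needs to be rewritten.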
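Two things make the book's pattern fragile. First, in Python 3 `\w` matches Unicode word characters (ASCII letters, digits, underscore and CJK ideographs alike), so `(\w{2})` returns every pair of such characters anywhere in the page, including fragments of tags, URLs and scripts. Second, the page markup has evidently changed since the book was written, which is what this issue reports. The sketch below narrows the match to link text made of exactly two CJK ideographs (U+4E00 to U+9FFF). `SAMPLE_HTML`, `LOOSE`, `LINK_WORD` and `extract_link_keywords` are hypothetical names; the fragment is made up, not Baidu's real page, and the assumption that the keywords appear as `<a>` link text must be checked against the HTML you actually receive.

```python
import re

# Hypothetical fragment shaped like a navigation bar; NOT Baidu's real markup.
SAMPLE_HTML = (
    '<div id="nav">'
    '<a href="http://news.example.com">新闻</a>'
    '<a href="http://map.example.com" class="mnav">地图</a>'
    '<a href="http://tieba.example.com">贴吧</a>'
    '<a href="/more">更多产品</a>'
    '</div><script>var ab = 1;</script>'
)

# The book's pattern: any two word characters anywhere in the page.
LOOSE = re.compile(r'(\w{2})')

# Narrower sketch: an <a> element whose entire text is exactly two CJK ideographs.
LINK_WORD = re.compile(r'<a\b[^>]*>([\u4e00-\u9fff]{2})</a>')


def extract_link_keywords(html):
    """Return two-character Chinese link texts in the order they appear."""
    return LINK_WORD.findall(html)


if __name__ == "__main__":
    print(LOOSE.findall(SAMPLE_HTML)[:4])      # ['di', 'id', 'na', 'hr']: tag and attribute fragments
    print(extract_link_keywords(SAMPLE_HTML))  # ['新闻', '地图', '贴吧']
```

The four-character link 更多产品 is skipped because the pattern requires `</a>` right after two ideographs. For anything beyond a quick exercise, an HTML parser such as BeautifulSoup is usually more robust than a regular expression over raw markup.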

Switch to OpenJDK early and life will be better

I therefore suggest that, on page 8, the book teach readers to install OpenJDK rather than Oracle JDK.
Oracle Technology Network License Agreement for Oracle Java SE:
Further, You may not:

use the Programs for any data processing or any commercial, production, or internal business purposes other than developing, testing, prototyping, and demonstrating your Application;
remove or modify any Program markings or any notice of Oracle’s or a licensor’s proprietary rights;
make the Programs available in any manner to any third party (other than Contractors acting on Your behalf as set forth in this Agreement);
assign this Agreement or distribute, give, or transfer the Programs or an interest in them to any third party, except as expressly permitted in this Agreement for Contractors (the foregoing shall not be construed to limit the rights You may otherwise have with respect to Separately Licensed Third Party Technology);
cause or permit reverse engineering (unless required by law for interoperability), disassembly or decompilation of the Programs; and
create, modify, or change the behavior of, classes, interfaces, or subpackages that are in any way identified as "java", "javax", "sun", “oracle” or similar convention as specified by Oracle in any naming convention designation.

I have always used OpenJDK and never switched:
AdoptOpenJDK

Chapter 5 of the book: the URL for the custom news-service crawler, url = http://tech.baidu.com/, no longer works

Running MxlsxClass.py fails: pandas_simple.xlsx cannot be found

Traceback (most recent call last):
  File "E:/Python/2019.7.15/19_面向对象.py", line 143, in <module>
    Demo.get_fileinfo()
  File "E:/Python/2019.7.15/19_面向对象.py", line 21, in get_fileinfo
    self.wb = load_workbook(filename=self.filename)
  File "C:\Users\admin\AppData\Roaming\Python\Python37\site-packages\openpyxl\reader\excel.py", line 311, in load_workbook
    data_only, keep_links)
  File "C:\Users\admin\AppData\Roaming\Python\Python37\site-packages\openpyxl\reader\excel.py", line 126, in __init__
    self.archive = _validate_archive(fn)
  File "C:\Users\admin\AppData\Roaming\Python\Python37\site-packages\openpyxl\reader\excel.py", line 98, in _validate_archive
============================== FILE INFO ==============================
    archive = ZipFile(filename, 'r')
  File "D:\Python_Edition\Python37\lib\zipfile.py", line 1204, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'pandas_simple.xlsx'
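The traceback shows `load_workbook` receiving the bare relative name `'pandas_simple.xlsx'`, which Python resolves against the current working directory, not against the folder that contains the script. One plausible cause is therefore running the script from a directory that does not hold the workbook (or the workbook never having been created or downloaded). The sketch below, assuming you know which folders might contain the file, looks in each candidate location and raises an error that lists every path it checked. `candidate_paths` and `find_data_file` are hypothetical helpers, not part of the book's `MxlsxClass.py`.

```python
from pathlib import Path


def candidate_paths(name, search_dirs):
    """Absolute paths that will be tried for `name`, in order, without duplicates."""
    paths = []
    for directory in search_dirs:
        path = (Path(directory) / name).resolve()
        if path not in paths:
            paths.append(path)
    return paths


def find_data_file(name, search_dirs):
    """Return the first existing file called `name` in `search_dirs`.

    Raises a FileNotFoundError that lists every location checked, which is
    easier to act on than the bare message raised from inside zipfile.
    """
    tried = candidate_paths(name, search_dirs)
    for path in tried:
        if path.is_file():
            return path
    raise FileNotFoundError(
        f"{name!r} not found; looked in: " + ", ".join(str(p) for p in tried)
    )
```

For example, `load_workbook(filename=str(find_data_file('pandas_simple.xlsx', [Path.cwd(), Path(__file__).resolve().parent])))` would look first in the working directory and then next to the script; that search order is only an assumption, so adjust it to wherever the workbook actually lives.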

mydata = soup.select('#display')[0].get_text()

IndexError: list index out of range
This line raises the error.
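`soup.select('#display')` returns a list of every element matching the CSS selector `#display`, that is, elements whose id is `display`. If the fetched HTML contains no such element (for instance because the page layout has changed since the book was written, or because the content is filled in by JavaScript that requests does not execute), the list is empty and `[0]` raises `IndexError: list index out of range`. In BeautifulSoup, `soup.select_one('#display')` returns `None` in that case, so the result can be checked before calling `get_text()`. The sketch below shows the same check-before-use pattern using the standard library's `html.parser` as a stand-in, so it runs without bs4 installed; `IdTextExtractor` and `text_by_id` are hypothetical names, and the parser assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collect the text of the first element whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False   # did any element carry the id?
        self.depth = 0       # > 0 while inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, element_id):
    """Return the element's text, or None when no element has that id.

    Returning None instead of indexing into an empty result list lets the
    caller decide what to do when the page no longer contains the element.
    """
    parser = IdTextExtractor(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None
```

In bs4 terms the same guard reads `node = soup.select_one('#display')` followed by `mydata = node.get_text() if node is not None else None`. Printing the fetched HTML and searching it for `display` is a quick way to confirm whether the element is present at all.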
