remitchell / python-scraping

Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do

Python 0.69% Jupyter Notebook 92.30% JavaScript 2.54% Roff 4.46% HTML 0.01%

python-scraping's Introduction

Web Scraping with Python Code Samples

These code samples are for the book Web Scraping with Python 2nd Edition

If you're looking for the first edition code files, they can be found in the v1 directory.

Most code for the second edition is contained in Jupyter notebooks. Although these files can be viewed directly in your browser in Github, some formatting changes and oddities may occur. I recommend that you clone the repository, install Jupyter, and view them locally for the best experience.

The web changes, libraries update, and I make mistakes and typos more frequently than I'd like to admit! If you think you've spotted an error, please feel free to make a pull request against this repository.

python-scraping's People

Contributors

acpk, malonegod, mimibambino, remitchell, spwilson2, takesxisximada, tmrblr


python-scraping's Issues

Chapter 2 nameList problem: 'NoneType' object is not callable

Hey, I typed the same code as the book shows, and I got this error:

Traceback (most recent call last):
File "C:\Users\hongy\AppData\Local\Programs\Python\Python36\nameList.py", line 5, in
nameList = bsObj.findall('span', {'class':'green'})
TypeError: 'NoneType' object is not callable

I'm new to Python and can't figure this out by searching Google. Could you help me with this?
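A likely explanation, assuming BeautifulSoup 4: there is no method named findall, so bsObj.findall is treated as a search for a (nonexistent) <findall> tag and evaluates to None, and calling None raises exactly this error. A minimal sketch of the corrected code, using find_all (the findAll spelling from the first edition also works):

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.pythonscraping.com/pages/warandpeace.html')
bsObj = BeautifulSoup(html, 'html.parser')

# find_all (or the older alias findAll) is the real method; 'findall' is not,
# which is why the attribute lookup silently returned None.
nameList = bsObj.find_all('span', {'class': 'green'})
for name in nameList:
    print(name.get_text())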

Indentation Error (Storing Data, 4th code)

In the second loop there is a slight indentation issue that keeps the CSV from having the proper form:

csvFile = open('editors.csv', 'wt+')
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow) # this line is dedented one level (out of the inner loop) so each table row becomes one CSV row
finally:
    csvFile.close()

Simple inversion in Chapter 04 > 2 Crawling through sites with search

In Chapter 04 > 2 Crawling through sites with search,
in the function def search(self, topic, site):
there is: content = Content(topic, title, body, url)
instead of: content = Content(topic, url, title, body)
as defined in:
class Content: def __init__(self, topic, url, title, body)
This swaps the arguments, so the fields end up on the wrong attributes and the returned content is difficult to understand.
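A minimal sketch of the fix (the class shape is taken from the chapter; passing keyword arguments and the placeholder values are my additions, and sidestep the ordering problem entirely):

class Content:
    def __init__(self, topic, url, title, body):
        self.topic = topic
        self.url = url
        self.title = title
        self.body = body

# Keyword arguments make the intended mapping explicit, so a swapped
# positional order can no longer scramble the fields. The values here
# are placeholders for illustration only.
content = Content(topic='python', url='http://example.com/article',
                  title='Example title', body='Example body text')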

Chapter 10.4: Issue about Posting Image

Hello Everyone,

I am getting an error with the following code:

import requests
files = {'uploadFile': open('python.png', 'rb')}
r = requests.post('http://pythonscraping.com/pages/processing2.php', files=files)
print(r.text)

The response is:
Sorry, there was an error uploading your file.

Any help would be much appreciated.

Can't view files

Can't view any files. Just get a "Sorry, something went wrong. Reload?" message.

Possible improvement on the code of page 76

Hi author,

The lines of code are at the very top of page 76 in the book:

try:
    for row in tableRows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text())
            writer.writerow(csvRow)
finally:
    csvFile.close()

I tested it and believe that, if we intend to output a normal CSV file, this code should be modified as below:

try:
    for row in tableRows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)
finally:
    csvFile.close()

That is, this line is moved to the left by one indent:
writer.writerow(csvRow)

Otherwise, the CSV file will contain many partially built, incomplete rows.

Error occurred when running chapter-12/2-seleniumCookies.py

The output of this script is:

[{'expires': 'Sun, 18 Dec 2016 12:53:17 GMT', 'name': '_gat', 'path': '/', 'expiry': 1482065597, 'domain': '.pythonscraping.com', 'httponly': False, 'value': '1', 'secure': False}, {'expires': 'Tue, 18 Dec 2018 12:43:17 GMT', 'name': '_ga', 'path': '/', 'expiry': 1545136997, 'domain': '.pythonscraping.com', 'httponly': False, 'value': 'GA1.2.2049848913.1482064997', 'secure': False}, {'name': 'has_js', 'path': '/', 'domain': 'pythonscraping.com', 'httponly': False, 'value': '1', 'secure': False}]

WebDriverException Traceback (most recent call last)
in ()
12 driver2.delete_all_cookies()
13 for cookie in savedCookies:
---> 14 driver2.add_cookie(cookie)
15
16 driver2.get("http://pythonscraping.com")

/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py in add_cookie(self, cookie_dict)
669
670 """
--> 671 self.execute(Command.ADD_COOKIE, {'cookie': cookie_dict})
672
673 # Timeouts

/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
234 response = self.command_executor.execute(driver_command, params)
235 if response:
--> 236 self.error_handler.check_response(response)
237 response['value'] = self._unwrap_value(
238 response.get('value', None))

/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
190 elif exception_class == UnexpectedAlertPresentException and 'alert' in value:
191 raise exception_class(message, screen, stacktrace, value['alert'].get('text'))
--> 192 raise exception_class(message, screen, stacktrace)
193
194 def _value_or_default(self, obj, key, default):

WebDriverException: Message: {"errorMessage":"Can only set Cookies for the current domain","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"243","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:58537","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"POST","post":"{"cookie": {"expires": "Sun, 18 Dec 2016 12:53:17 GMT", "name": "_gat", "path": "/", "expiry": 1482065597, "domain": ".pythonscraping.com", "httponly": false, "value": "1", "secure": false}, "sessionId": "94eb85a0-c51f-11e6-badf-39312152c0b6"}","url":"/cookie","urlParsed":{"anchor":"","query":"","file":"cookie","directory":"/","path":"/cookie","relative":"/cookie","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/cookie","queryKey":{},"chunks":["cookie"]},"urlOriginal":"/session/94eb85a0-c51f-11e6-badf-39312152c0b6/cookie"}}
Screenshot: available via screen

How can I eliminate this error? Thank you in advance.
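As the message says, Selenium will only accept cookies for the domain currently loaded in the browser, and some drivers additionally reject extra fields such as 'expiry' or a leading dot in 'domain'. A minimal sketch of a workaround, reusing driver2 and savedCookies from the chapter's script (the field-stripping is an assumption about what the driver dislikes, not something from the book):

# Be on the target domain before touching its cookies.
driver2.get('http://pythonscraping.com')
driver2.delete_all_cookies()

for cookie in savedCookies:
    # Drop or normalize fields that some drivers refuse when re-adding a cookie.
    cookie.pop('expiry', None)
    if 'domain' in cookie:
        cookie['domain'] = cookie['domain'].lstrip('.')
    driver2.add_cookie(cookie)

driver2.get('http://pythonscraping.com')
print(driver2.get_cookies())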

csv Creation Failure

python --version: 3.7.0
chapter5/2-createCsv.py

Hello, I wrote the code following the content on GitHub, and afterwards even copied and pasted the code directly, but it still reports this error:
FileNotFoundError: [Errno 2] No such file or directory: '../files/test.csv'
Can you tell me how to solve the problem?
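The error usually means the relative '../files' directory does not exist where the script is being run, and open() will not create directories for you. A rough sketch of a workaround (loosely based on the chapter's CSV example, so the column contents are an approximation): create the directory first, or point the path somewhere that already exists.

import csv
import os

# Make sure the target directory exists before opening the file for writing.
os.makedirs('../files', exist_ok=True)

csvFile = open('../files/test.csv', 'w+')
try:
    writer = csv.writer(csvFile)
    writer.writerow(('number', 'number plus 2', 'number times 2'))
    for i in range(10):
        writer.writerow((i, i + 2, i * 2))
finally:
    csvFile.close()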

Error on first example

Hello, I just purchased your book. I am on the first example, "scrapetest.py", and I am getting an error.

Exception has occurred: NameError
name 'null' is not defined
File "C:\Users\mine\Documents\VSCode\Workspaces\PythonScrapping\scrapetest.py", line 114, in "execution_count": null,

It is a fresh install of VS Code and Python 3.1. This is my first time running them. Any help is appreciated.

chapter13/4-dragAndDrop.py Doesn't work

debian 7

from selenium import webdriver
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver import ActionChains

driver = webdriver.PhantomJS(executable_path='./phantomjs')
driver.get('http://pythonscraping.com/pages/javascript/draggableDemo.html')

print(driver.find_element_by_id("message").text)
Prove you are not a bot, by dragging the square from the blue area to the red area!

element = driver.find_element_by_id("draggable")
target = driver.find_element_by_id("div2")
actions = ActionChains(driver)
actions.drag_and_drop(element, target).perform()

print(driver.find_element_by_id("message").text)
Prove you are not a bot, by dragging the square from the blue area to the red area!

======
I saw the same result on Windows 7 too!
I also found this:
SeleniumHQ/selenium#2533

[SSL: CERTIFICATE_VERIFY_FAILED] on pythonscraping

When running the code from the first two chapters that uses pythonscraping.com, I receive the following error:

<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1125)>
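Until the certificate is renewed, one workaround for the urllib examples is to pass an unverified SSL context to urlopen. This is a sketch of a stopgap, not a recommendation: it disables certificate checking entirely, so only use it against a site you already trust.

import ssl
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Skips certificate verification; acceptable only as a temporary workaround
# for the expired certificate on the example site.
context = ssl._create_unverified_context()
html = urlopen('https://pythonscraping.com/pages/page1.html', context=context)
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.h1)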

Method missing

There is no method named getNextExternalLink() defined in Chapter 3 getExternalLinks.py

[SSL: CERTIFICATE_VERIFY_FAILED] on example site

Hi,

I can no longer use the example site.

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)>

Can you renew the certificate? ;)

Thx,


To bypass the problem I use requests instead of urllib (a full sketch follows the list):

  • requests.get("url", verify=False) instead of urlopen()
  • BeautifulSoup(html.content, 'html.parser') instead of BeautifulSoup(html.read(), 'html.parser')
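Putting those two substitutions together, a minimal sketch of the requests-based workaround (verify=False also skips certificate checks, so treat it as a stopgap and expect an InsecureRequestWarning):

import requests
from bs4 import BeautifulSoup

# verify=False disables certificate verification to get past the expired cert.
html = requests.get('https://pythonscraping.com/pages/page1.html', verify=False)
bs = BeautifulSoup(html.content, 'html.parser')
print(bs.h1)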

Question in ch14 v1

What is the meaning of the sentence "horrify web administrators by sending their website traffic from Internet Explorer 5.0"? What special feature does Internet Explorer 5.0 have?

Chapter3.Article"Crawling with Scrapy". NotImplementedError

Hi everybody! I am a newbie trying to implement my first test Scrapy project, wikiSpider. I did everything as described in the book and got this result: NotImplementedError: ArticleSpider.parse callback is not defined.

files:
-articleSpider.py:
from scrapy.selector import Selector
from scrapy import Spider
from wikiSpider.items import Article

class ArticleSpider(Spider):
    name = "article"
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Main_Page",
                  "http://en.wikipedia.org/wiki/Python_%28programming_language%29"]

    def parse(self, response):
        item = Article()
        title = response.xpath('//h1/text()')[0].extract()
        print("Title is: " + title)
        item['title'] = item
        return item

-items.py

from scrapy import Item, Field

class Article(Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = Field()

You can see more details here:
![mypic1](https://user-images.githubusercontent.com/29522625/37275003-b2660ff0-25e6-11e8-9b35-1a3afe85ebdd.jpg)

Can anybody help me? Thanks in advance.

chapter11/3-readWebImages.py

driver.find_element_by_id("sitbReaderRightPageTurner").click()

"Element is not currently visible and may not be manipulated"

findAll vs find_all

the text uses find_all

Using this BeautifulSoup object, you can use the find_all function...

whereas the example uses findAll instead:

nameList = bs.findAll('span', {'class':'green'})
for name in nameList:
    print(name.get_text())

which doesn't make a difference when the code is run, but the code example and the text should remain consistent with one another.
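For what it's worth, findAll survives in BeautifulSoup 4 as a backwards-compatible alias for find_all, so the two spellings return the same results; a minimal check, assuming the bs object from the example:

# findAll is an alias for find_all in BeautifulSoup 4, so both calls return
# the same set of tags from the same parse tree.
assert bs.findAll('span', {'class': 'green'}) == bs.find_all('span', {'class': 'green'})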

Error on "urllib.request import urlopen" from Chapter01_BeginningToScrape.ipynb

Hi,

I am getting the error below after running this code:

from urllib.request import urlopen

html = urlopen('http://pythonscraping.com/pages/page1.html')

Traceback (most recent call last):
File "C:\Anaconda3\envs\py38\lib\http\client.py", line 871, in _get_hostport
port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: 'port'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 1379, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 1319, in do_open
h = http_class(host, timeout=req.timeout, **http_conn_args)
File "C:\Anaconda3\envs\py38\lib\http\client.py", line 833, in init
(self.host, self.port) = self._get_hostport(host, port)
File "C:\Anaconda3\envs\py38\lib\http\client.py", line 876, in _get_hostport
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: 'port'

I am using the latest version of Python (3.8.5). What could be the problem?

Thank you.
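One possible cause of "nonnumeric port: 'port'" is a proxy environment variable that still contains a literal placeholder such as http://proxyhost:port; urllib routes the request through that proxy and then fails to parse the fake port. A minimal check (this is a guess about your environment, not a confirmed diagnosis):

import urllib.request

# Shows any proxy settings urllib will pick up from the environment
# (HTTP_PROXY / HTTPS_PROXY etc.); a value containing a literal ':port'
# placeholder would explain the nonnumeric-port error.
print(urllib.request.getproxies())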

chapter 10 - handling redirects

Hi,
I'm running the 3-javascriptRedirect.py file from Chapter 10, and StaleElementReferenceException is never thrown, so the full 10-second timeout always elapses. Is this a bug?
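For comparison, a common explicit-wait pattern for detecting a client-side redirect is to grab an element from the original page and wait for it to go stale. This is a sketch under my own assumptions (Chrome instead of the driver used in the book, and the redirectDemo1.html page from the chapter), not the book's exact code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes a locally available ChromeDriver
driver.get('http://pythonscraping.com/pages/javascript/redirectDemo1.html')

# The <html> element of the original page becomes stale as soon as the
# browser navigates to the redirect target.
old_page = driver.find_element(By.TAG_NAME, 'html')
WebDriverWait(driver, 10).until(EC.staleness_of(old_page))
print(driver.page_source)
driver.quit()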

Chapter 8: 2-countUncommon2Grams.py

First of all, I really enjoy working through all the examples in the book. However, on this specific chapter I am lost. You have a function isCommon but never use it in the program.


Also, the output that you have in the book does not match with what you have in this repo.


I am confused can you please advised? Thank you!

Can you provide the CAPTCHA 'captchaExample.png' of Chapter 11?

I want to test the CAPTCHA recognition with Tesseract from the code in Chapter 11, but I can't find the CAPTCHA file captchaExample.png. I can take a screenshot of the picture in the book and save it as a PNG, but the result of the command tesseract captchaExample.png output is:

Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Empty page!!
Empty page!!

So I want to obtain the original CAPTCHA file, can you provide it? Thanks.
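While waiting for the original file, "Empty page!!" from a book-page screenshot can sometimes be fixed by cleaning the image up before OCR. A rough sketch, assuming Pillow and the pytesseract wrapper are installed (both are assumptions; the book invokes the tesseract command directly):

from PIL import Image
import pytesseract

# Convert to grayscale and apply a hard threshold so the letters stand out
# from the scanned-page background noise.
image = Image.open('captchaExample.png').convert('L')
image = image.point(lambda px: 0 if px < 143 else 255)
image.save('captchaExample_clean.png')

print(pytesseract.image_to_string(Image.open('captchaExample_clean.png')))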

Chapter 16: error occurred while running the multiprocess crawling code

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import random

from multiprocessing import Process
import os
import time

visited = []
def get_links(bs):
    print('Getting links in {}'.format(os.getpid()))
    links = bs.find('div', {'id':'bodyContent'}).find_all('a', href=re.compile('^(/wiki/)((?!:).)*$'))
    return [link for link in links if link not in visited]

def scrape_article(path):
    visited.append(path)
    html = urlopen('http://en.wikipedia.org{}'.format(path))
    time.sleep(3)
    bs = BeautifulSoup(html, 'html.parser')
    title = bs.find('h1').get_text()
    print('Scraping {} in process {}'.format(title, os.getpid()))
    links = get_links(bs)
    if len(links) > 0:
        newArticle = links[random.randint(0, len(links)-1)].attrs['href']
        print(newArticle)
        scrape_article(newArticle)

processes = []
processes.append(Process(target=scrape_article, args=('/wiki/Kevin_Bacon', )))
processes.append(Process(target=scrape_article, args=('/wiki/Monty_Python', )))

for p in processes:
    p.start()

The following error occurred while running it:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I changed the code to this:

# inserted if __name__ == '__main__':
if __name__ == '__main__':
    processes = []
    processes.append(Process(target=scrape_article, args=('/wiki/Kevin_Bacon', )))
    processes.append(Process(target=scrape_article, args=('/wiki/Monty_Python', )))
    
    for p in processes:
        p.start()

Question in ch2

from urllib.request import urlopen
from bs4 import BeautifulSoup
html=urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bs=BeautifulSoup(html,"html.parser")
nameList = bs.find_all(text='the prince')
print(len(nameList))

I ran the code above and the result is 7. However, when I use Ctrl+F to search for 'the prince' in the browser, the result is 11. I'm confused about why the results are inconsistent.
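A likely explanation: find_all(text='the prince') matches only text nodes whose entire string is exactly 'the prince', while the browser's Ctrl+F counts every (usually case-insensitive) substring occurrence, including 'the prince' buried inside longer sentences. A small sketch that counts substrings instead, for comparison:

import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.pythonscraping.com/pages/warandpeace.html')
bs = BeautifulSoup(html, 'html.parser')

# Text nodes that merely *contain* the phrase, not just exact-match nodes.
print(len(bs.find_all(text=re.compile('the prince'))))

# Browser-style total: every occurrence anywhere in the visible text,
# ignoring case.
print(len(re.findall('the prince', bs.get_text(), re.IGNORECASE)))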

seleniumBasic.py in Chapter10 - 'Service' object has no attribute 'process'

Sorry, I am missing the executable path; specifically, executable_path='' is left empty.

My code is the same as in the book:

#!/usr/bin/env python3

import time
from selenium import webdriver

driver = webdriver.PhantomJS(executable_path='')
driver.get("http://www.pythonscraping.com/pages/javascript/ajaxDemo.html")
time.sleep(3)
print(driver.find_element_by_id("content").text)
driver.close()

But when I run this code, I get this:

Traceback (most recent call last):
File "use_selenium.py", line 6, in
driver = webdriver.PhantomJS(executable_path='')
File "D:\Python34\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 52, in init
self.service.start()
File "D:\Python34\lib\site-packages\selenium\webdriver\common\service.py", line 64, in start
stdout=self.log_file, stderr=self.log_file)
File "D:\Python34\lib\subprocess.py", line 848, in init
restore_signals, start_new_session)
File "D:\Python34\lib\subprocess.py", line 1104, in _execute_child
startupinfo)
OSError: [WinError 87] ▒▒▒▒▒▒▒▒
Exception ignored in: <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x000000000307A940>>
Traceback (most recent call last):
File "D:\Python34\lib\site-packages\selenium\webdriver\common\service.py", line 163, in del
self.stop()
File "D:\Python34\lib\site-packages\selenium\webdriver\common\service.py", line 135, in stop
if self.process is None:
AttributeError: 'Service' object has no attribute 'process'

I am using Windows 8.1 + Python 3.4, with Selenium installed via pip3. The same error also occurs on my CentOS 6.5 server platform.
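A likely cause is the empty executable_path: Selenium tries to launch '' as a subprocess, the launch fails (WinError 87), and the Service object never gets a process attribute, which produces the secondary AttributeError. A minimal sketch with the path filled in; the path below is hypothetical, so point it at wherever the PhantomJS binary actually lives, or put it on your PATH and omit the argument:

import time
from selenium import webdriver

# Hypothetical location; replace with the real path to phantomjs(.exe).
driver = webdriver.PhantomJS(executable_path=r'C:\phantomjs\bin\phantomjs.exe')
driver.get('http://www.pythonscraping.com/pages/javascript/ajaxDemo.html')
time.sleep(3)
print(driver.find_element_by_id('content').text)
driver.close()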

Chapter5 articles.py

I tried to run the code from the file articles.py both in Jupyter and on the Anaconda command line, but every time I run it I get the error "No module named 'scrapy.contrib'", and so far I haven't been able to solve this issue. I'd be glad to get any help with it. Thank you so much.
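scrapy.contrib was deprecated and then removed in newer Scrapy releases, so code written against it fails with exactly this error. A minimal sketch of the modern import locations (an assumption about which names articles.py actually needs; adjust to the ones in your copy of the file):

# Modern Scrapy moved these classes out of scrapy.contrib:
from scrapy.spiders import CrawlSpider, Rule        # was scrapy.contrib.spiders
from scrapy.linkextractors import LinkExtractor     # was scrapy.contrib.linkextractors

class ArticleSpider(CrawlSpider):
    name = 'articles'
    allowed_domains = ['en.wikipedia.org']
    start_urls = ['https://en.wikipedia.org/wiki/Main_Page']
    rules = [Rule(LinkExtractor(allow=r'.*'), callback='parse_items', follow=True)]

    def parse_items(self, response):
        # Placeholder callback; replace with the parsing logic from articles.py.
        print('Visited: {}'.format(response.url))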

Chapter 05 mySQLBasicExample

Thanks for writing this book. As a non-programmer it has been a fun project working through the various examples. I've run into a wall trying to run the mySQLBasicExample. Running on Windows 7; MySQL installed; PyMySQL installed; the MySQL command prompt is open. I copied the subject code into Notepad and saved it as mySQL01.py. Running it results in the following:

[screenshot of the error omitted]

I have tried resetting my root password based on the instructions here: http://dev.mysql.com/doc/refman/5.7/en/resetting-permissions.html but have been unsuccessful. It seems that there are two possible passwords: the one set during install, and ''. To enter mysql I just hit Enter, so it would seem that the password is ''.

What am I missing here?

Thanks!

HTTPError 403:Forbidden or no internal links

I am from China, so as you know, Google, Facebook, and Twitter are not available here.
Many of the external links found by the crawler point to these social websites.

The code should also take the number of internal links into account, since some websites have no internal links at all, which leads to very strange behavior.
Thanks

chapter3 question

Hi,
In Chapter 3, "Collect a list of all External Links", the code in the book raises an error in the splitAddress function:

即将获取链接的URL是:/

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-4254ccc1a2c3> in <module>()
     64             getAllExternalLinks(link)
     65 
---> 66 getAllExternalLinks("http://oreilly.com")

<ipython-input-2-4254ccc1a2c3> in getAllExternalLinks(siteUrl)
     62             print("即将获取链接的URL是:" + link)
     63             allIntLinks.add(link)
---> 64             getAllExternalLinks(link)
     65 
     66 getAllExternalLinks("http://oreilly.com")

<ipython-input-2-4254ccc1a2c3> in getAllExternalLinks(siteUrl)
     62             print("即将获取链接的URL是:" + link)
     63             allIntLinks.add(link)
---> 64             getAllExternalLinks(link)
     65 
     66 getAllExternalLinks("http://oreilly.com")

<ipython-input-2-4254ccc1a2c3> in getAllExternalLinks(siteUrl)
     62             print("即将获取链接的URL是:" + link)
     63             allIntLinks.add(link)
---> 64             getAllExternalLinks(link)
     65 
     66 getAllExternalLinks("http://oreilly.com")

<ipython-input-2-4254ccc1a2c3> in getAllExternalLinks(siteUrl)
     49 allIntLinks = set()
     50 def getAllExternalLinks(siteUrl):
---> 51     html = urlopen(siteUrl)
     52     bs = BeautifulSoup(html,"html.parser")
     53     internalLinks = getInternalLinks(bs,splitAddress(siteUrl)[0])

~/anaconda3/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224 
    225 def install_opener(opener):

~/anaconda3/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
    509         # accept a URL or a Request object
    510         if isinstance(fullurl, str):
--> 511             req = Request(fullurl, data)
    512         else:
    513             req = fullurl

~/anaconda3/lib/python3.6/urllib/request.py in __init__(self, url, data, headers, origin_req_host, unverifiable, method)
    327                  origin_req_host=None, unverifiable=False,
    328                  method=None):
--> 329         self.full_url = url
    330         self.headers = {}
    331         self.unredirected_hdrs = {}

~/anaconda3/lib/python3.6/urllib/request.py in full_url(self, url)
    353         self._full_url = unwrap(url)
    354         self._full_url, self.fragment = splittag(self._full_url)
--> 355         self._parse()
    356 
    357     @full_url.deleter

~/anaconda3/lib/python3.6/urllib/request.py in _parse(self)
    382         self.type, rest = splittype(self._full_url)
    383         if self.type is None:
--> 384             raise ValueError("unknown url type: %r" % self.full_url)
    385         self.host, self.selector = splithost(rest)
    386         if self.host:

ValueError: unknown url type: '/'

Using the code you provide on GitHub, I get a different error:

Traceback (most recent call last):
  File "/home/kongnian/PycharmProjects/Scraping/getAllExternalLinks.py", line 81, in <module>
    getAllExternalLinks("http://oreilly.com")
  File "/home/kongnian/PycharmProjects/Scraping/getAllExternalLinks.py", line 76, in getAllExternalLinks
    getAllExternalLinks(link)
  File "/home/kongnian/PycharmProjects/Scraping/getAllExternalLinks.py", line 76, in getAllExternalLinks
    getAllExternalLinks(link)
  File "/home/kongnian/PycharmProjects/Scraping/getAllExternalLinks.py", line 76, in getAllExternalLinks
    getAllExternalLinks(link)
  [Previous line repeated 15 more times]
  File "/home/kongnian/PycharmProjects/Scraping/getAllExternalLinks.py", line 63, in getAllExternalLinks
    html = urlopen(siteUrl)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 756, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/kongnian/anaconda3/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Is this the end of the collection?

thanks!
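A likely cause of the first traceback: the crawl eventually hands a relative link such as '/' straight to urlopen, which cannot parse a bare path. A common fix (my sketch, not the book's exact code) is to resolve every link against the page it was found on with urljoin before opening it; the later HTTP Error 404 is just a dead link, which a try/except around urlopen can skip so the crawl keeps going.

from urllib.parse import urljoin
from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

def fetch(base_url, link):
    # Turn relative links like '/' or '/about' into absolute URLs first.
    absolute = urljoin(base_url, link)
    try:
        return BeautifulSoup(urlopen(absolute), 'html.parser')
    except HTTPError:
        # Dead external link (e.g. 404): skip it instead of crashing the crawl.
        return None

bs = fetch('http://oreilly.com', '/')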

Can't login to Yahoo! for scraping

I'm going through the 2nd ed. of the book now and it's great. I've spent hours upon hours trying to log into Yahoo! with a POST request but I'm being thwarted. First, the program throws a TooManyRedirects error. When I add the keyword arg of allow_redirects=False, apparently I am being redirected anyway to a site with no content:

Output of response_obj.text:
'<p>Found. Redirecting to <a href="https://guce.yahoo.com/consent?gcrumb=F12tPO4&amp;trapType=login&amp;done=https%3A%2F%2Fwww.yahoo.com%2F&amp;src=">https://guce.yahoo.com/consent?gcrumb=F12tPO4&amp;trapType=login&amp;done=https%3A%2F%2Fwww.yahoo.com%2F&amp;src=</a></p>'

I am passing my browser headers and just about every other piece of data I can identify under normal login circumstances along with the request.
If anyone can successfully log into Yahoo, please spread the knowledge!

Typo on page 221 of the book

The line

In the second scenario, the load your Internet connection and home machine can place on a site like Wikipedia....

Should be

In the third scenario, the load your Internet connection and home machine can place on a site like Wikipedia....

CAPTCHA sample available?

Hi, I'm editing the Korean translation of the book.
In Chapter 11, readers are supposed to obtain hundreds of CAPTCHA images,
but there's no explanation of how or where to get those images.
Could you provide the images you used when writing the book?
Please help me! Thank you.

[Question] How to use splitAddress

Hello.
On page 65 of "Getting Started with Crawling" there is:

splitAddress(startingPage)[0]

What is the [0] used for here?

Thank you.

UserWarning: No parser was explicitly specified

All instances of BeautifulSoup([your markup]) need to be updated to BeautifulSoup([your markup], "html.parser").

For example, the current usage bsObj = BeautifulSoup(html)
is fixed as bsObj = BeautifulSoup(html, "html.parser")

Full error being returned:

bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 5 of the file 5-findParents.py. To get rid of this warning, change code that looks like this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "html.parser")

markup_type=markup_type))

chapter3

No "getNextExternalLink" method

ModuleNotFoundError: No module named 'stem'

I installed stem with pip install stem and also with conda install -c conda-forge stem.

If I run pip freeze in the terminal, stem shows up at its latest version, 1.7.1, but when I try to import it in my code I get:
ModuleNotFoundError: No module named 'stem'

I've tried installing and uninstalling stem versions 1.6.0 and 1.7.0, but it doesn't work!
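When pip freeze lists a package but the import still fails, the usual culprit is that pip and the interpreter running your script belong to different environments (for instance the conda environment versus the system Python). A minimal check that makes no assumptions about your setup:

import sys

# The interpreter actually executing this script; compare it with the Python
# that `pip --version` / `pip freeze` reports, and with `conda env list`.
print(sys.executable)
for path in sys.path:
    print(path)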
