Comments (13)

kevinoden commented on May 2, 2024

I had problems with the fetch_mldata function as well. I tried the following, which worked for me:

from sklearn.datasets import fetch_mldata
import tempfile

# Download into a fresh temporary directory instead of the default cache
test_data_home = tempfile.mkdtemp()
mnist = fetch_mldata('MNIST original', data_home=test_data_home)
mnist

from handson-ml.

kevinoden commented on May 2, 2024

ageron commented on May 2, 2024

Hey, thanks a lot @youngsoul ! Could you please give more details about the SSL error you got? I want to limit the number of packages to install if I can, but if this bug is too frequent, I'll update the notebooks to use your solution instead.

This mldata.org issue is really annoying; let's hope it gets resolved shortly. For other people who might be interested, there are more details about this issue at scikit-learn/scikit-learn#8588.

youngsoul commented on May 2, 2024

Hi @ageron, I will definitely do that as soon as I get back home tonight. Just a little background: I was running Python 3.6 on MacOS 10.10.x.

I had a similar issue with the pandas read_csv function when it is given a URL. The only way I could get it to work was via requests. I will get you the details later on.

youngsoul commented on May 2, 2024

Hi @ageron
My setup is:
Python 3.6
Mac OS X El Capitan 10.11.6

Here is the error I receive (I know it's lengthy):
`Could not download MNIST data from mldata.org, trying alternative...

HTTPError Traceback (most recent call last)
in ()
3 try:
----> 4 mnist = fetch_mldata('MNIST original')
5 except urllib.error.HTTPError as ex:

/Users/youngsoul/Documents/Development/PythonDev/VirtualEnvs/SciKitSandboxEnv/lib/python3.6/site-packages/sklearn/datasets/mldata.py in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
141 try:
--> 142 mldata_url = urlopen(urlname)
143 except HTTPError as e:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in http_response(self, request, response)
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in error(self, proto, *args)
563 args = (dict, proto, meth_name) + args
--> 564 result = self._call_chain(*args)
565 if result:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in http_error_302(self, req, fp, code, msg, headers)
755
--> 756 return self.parent.open(new, timeout=req.timeout)
757

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
531 meth = getattr(processor, meth_name)
--> 532 response = meth(req, response)
533

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in http_response(self, request, response)
641 response = self.parent.error(
--> 642 'http', request, response, code, msg, hdrs)
643

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in error(self, proto, *args)
569 args = (dict, 'default', 'http_error_default') + orig_args
--> 570 return self._call_chain(*args)
571

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
649 def http_error_default(self, req, fp, code, msg, hdrs):
--> 650 raise HTTPError(req.full_url, code, msg, hdrs, fp)
651

HTTPError: HTTP Error 500: INTERNAL SERVER ERROR

During handling of the above exception, another exception occurred:

SSLError Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1317 h.request(req.get_method(), req.selector, req.data, headers,
-> 1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py in connect(self)
1399 self.sock = self._context.wrap_socket(self.sock,
-> 1400 server_hostname=server_hostname)
1401 if not self._context.check_hostname and self._check_hostname:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
400 server_hostname=server_hostname,
--> 401 _context=self, _session=session)
402

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context, _session)
807 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 808 self.do_handshake()
809

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py in do_handshake(self, block)
1060 self.settimeout(None)
-> 1061 self._sslobj.do_handshake()
1062 finally:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py in do_handshake(self)
682 """Start the SSL/TLS handshake."""
--> 683 self._sslobj.do_handshake()
684 if self.context.check_hostname:

SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)

During handling of the above exception, another exception occurred:

URLError Traceback (most recent call last)
in ()
10 mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
11 mnist_path = "./mnist-original.mat"
---> 12 response = urllib.request.urlopen(mnist_alternative_url)
13 with open(mnist_path, "wb") as f:
14 content = response.read()

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 else:
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
225 def install_opener(opener):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
524 req = meth(req)
525
--> 526 response = self._open(req, data)
527
528 # post-process response

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in _open(self, req, data)
542 protocol = req.type
543 result = self._call_chain(self.handle_open, protocol, protocol +
--> 544 '_open', req)
545 if result:
546 return result

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in https_open(self, req)
1359 def https_open(self, req):
1360 return self.do_open(http.client.HTTPSConnection, req,
-> 1361 context=self._context, check_hostname=self.check_hostname)
1362
1363 https_request = AbstractHTTPHandler.do_request

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
-> 1320 raise URLError(err)
1321 r = h.getresponse()
1322 except:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>`

ageron commented on May 2, 2024

Hi @youngsoul, it looks like you ran into a specific OSX issue which is described in this StackOverflow Question (check out Craig Glennie's answer).

Please let me know if this fixes the problem.
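
For readers hitting the same CERTIFICATE_VERIFY_FAILED error, one way to see which CA certificates Python is using is sketched below. It relies only on the standard `ssl` module. The helper names `describe_ca_paths` and `make_context` are hypothetical, and whether a missing CA bundle is actually the cause on a given machine is an assumption.

```python
import os
import ssl


def describe_ca_paths():
    """Report where this Python build looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        # If neither location exists, certificate verification may have nothing to check against.
        "cafile_exists": bool(paths.cafile) and os.path.exists(paths.cafile),
        "capath_exists": bool(paths.capath) and os.path.isdir(paths.capath),
    }


def make_context(cafile=None):
    """Build a verifying SSL context, optionally from an explicit CA bundle."""
    # With cafile=None the system defaults are loaded; verification stays enabled.
    return ssl.create_default_context(cafile=cafile)
```

The resulting context could be passed as `urllib.request.urlopen(url, context=make_context(path_to_bundle))`, where `path_to_bundle` is a placeholder for a PEM file such as the one returned by `certifi.where()` if the `certifi` package is installed.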

ageron commented on May 2, 2024

It seems that mldata.org is back up (at last!), so I can finally close this issue. I might remove the fallback function to avoid any confusion.

youngsoul commented on May 2, 2024

👍

ageron commented on May 2, 2024

Thanks for your feedback @kevinoden. By default, fetch_mldata() downloads the data into $HOME/scikit_learn_data. If it cannot create this directory (or its subdirectories) for some reason (e.g., an access rights issue or a full disk), it will fail. Your solution creates a new temporary directory every time. This works around the issue, but it also means that the data will be downloaded again every time you run fetch_mldata() instead of being cached in $HOME/scikit_learn_data. It may not be an issue for you, but I just wanted to point this out.
Cheers
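
A middle ground between the two approaches can be sketched as follows: use the cache directory when it is writable and fall back to a temporary directory only when it is not. The helper name `pick_data_home` is hypothetical, and the default path is taken from the description above rather than read from scikit-learn itself.

```python
import os
import tempfile


def pick_data_home(preferred=None):
    """Return a writable cache directory, falling back to a temporary one."""
    if preferred is None:
        # Assumed default location, as described in the comment above.
        preferred = os.path.join(os.path.expanduser("~"), "scikit_learn_data")
    try:
        os.makedirs(preferred, exist_ok=True)
        # Creating and deleting a temp file checks that the directory is writable.
        with tempfile.NamedTemporaryFile(dir=preferred):
            pass
        return preferred
    except OSError:
        # Not cached across runs: data downloaded here is fetched again next time.
        return tempfile.mkdtemp()
```

The result could then be passed as `data_home=pick_data_home()`, so the download is reused whenever the default location is usable.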

kevinoden commented on May 2, 2024

MeghashreeRaghav commented on May 2, 2024

Hi @youngsoul,

How did you get past the HTTPError: HTTP Error 500: INTERNAL SERVER ERROR issue? I am facing a similar issue too :(

ageron commented on May 2, 2024

Any HTTP error between 500 and 599 is a server-side error. This probably means that the server has a temporary problem, and you should try again later. This is the most likely cause, IMO. However, if the problem persists, it might be due to a network issue on your computer or your local network: perhaps the DNS is pointing you to the wrong server, or something is modifying the requests. You could try again on a different computer and/or a different network.
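
The "try again later" advice can be automated for short outages. Below is a minimal sketch that retries only on 5xx responses and re-raises everything else. The function name `fetch_with_retry` and its `opener` and `sleep` parameters are hypothetical (they exist so the logic can be exercised without a network), and the delay values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying only on HTTP 5xx errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            server_side = 500 <= err.code <= 599
            if not server_side or attempt == attempts:
                raise  # client errors and the final failure are not retried
            sleep(delay * attempt)  # wait a little longer after each failure
```

This only helps with temporary server problems. If every attempt fails, the last HTTPError is raised unchanged, which is the point at which trying another network or machine becomes worthwhile.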

C-H-I-R-A-G commented on May 2, 2024

I got a "fetch_mldata not defined" error. I used this code instead and it works; I hope it's useful to you.

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')
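
Newer scikit-learn releases no longer ship fetch_mldata, which would explain the "not defined" error. One difference to be aware of when switching is that fetch_openml returns the MNIST labels as strings. The sketch below wraps the call and converts the labels to integers. The name `load_mnist` and its `fetch` parameter are hypothetical; the parameter exists only so the function can be tried without downloading anything.

```python
def load_mnist(fetch=None):
    """Return (data, labels) for MNIST with labels converted to integers."""
    if fetch is None:
        # Imported lazily so this module loads even without scikit-learn installed.
        from sklearn.datasets import fetch_openml
        fetch = fetch_openml
    mnist = fetch("mnist_784", version=1)
    # Assumption: labels arrive as strings such as "5", so convert them.
    labels = [int(label) for label in mnist.target]
    return mnist.data, labels
```

Calling `load_mnist()` with no arguments downloads the dataset on first use, so it needs network access and scikit-learn installed.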
