davidmcneil / mnist
MNIST data set parser https://crates.io/crates/mnist
The README has:
let Mnist { trn_img, trn_lbl, .. } = MnistBuilder::new()
    .image_format_28x28()
    .label_format_1x1()
    [...]
but both image_format_28x28() and label_format_1x1() are invalid methods.
They should probably be removed from the example (as is the case in the documentation).
MnistBuilder expects the MNIST data to be at, for example, data/train-images.idx3-ubyte. But the MNIST website provides train-images-idx3-ubyte.gz. Note the .idx3 vs -idx3.
This means that I have to rename the files after having downloaded the original ones, which can be confusing.
I guess MnistBuilder should expect files named with -idx3, as distributed, and not .idx3?
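Until the expected names change, the renaming can be scripted. The following is a minimal sketch of the name mapping only; dotted_name is a hypothetical helper, not part of the mnist crate, and it assumes the label and test files follow the same dotted pattern as the image file quoted above.

```rust
/// Map a file name as distributed on the MNIST site (e.g. "train-images-idx3-ubyte.gz")
/// to the dotted form this issue says the builder looks for ("train-images.idx3-ubyte").
/// Assumption: all four files follow the same pattern as the one quoted in the issue.
fn dotted_name(site_name: &str) -> String {
    // Drop a trailing ".gz" if the caller passes the archive name.
    let stem = site_name.strip_suffix(".gz").unwrap_or(site_name);
    // Only the separator in front of "idx" differs between the two schemes.
    stem.replacen("-idx", ".idx", 1)
}

fn main() {
    let names = [
        "train-images-idx3-ubyte.gz",
        "train-labels-idx1-ubyte.gz",
        "t10k-images-idx3-ubyte.gz",
        "t10k-labels-idx1-ubyte.gz",
    ];
    for name in names.iter() {
        println!("{} -> {}", name, dotted_name(name));
    }
}
```

The resulting names could then be passed to std::fs::rename after the archives have been decompressed.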
Here is some test code:
use mnist::MnistBuilder;

#[test]
fn test_mnist() {
    let mnist = MnistBuilder::new()
        .label_format_digit()
        .training_set_length(50_000)
        .validation_set_length(10_000)
        .test_set_length(10_000)
        .finalize();
    dbg!(mnist.trn_img);
    dbg!(mnist.trn_lbl);
}
[dependencies]
mnist = "0.5.0"
and I got this error:
running 1 test
thread 'test_mnist' panicked at 'Unable to find path to images at "data/train-images-idx3-ubyte".'
How can I fix it?
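The panic message suggests the builder is looking for the decompressed files under data/, relative to the directory the test runs in. A quick way to see what is missing is a check like the sketch below. It uses only the standard library and does not call the mnist crate; missing_files is a hypothetical helper, and the four file names are the ones that appear in the panic message and in the download log elsewhere in this thread.

```rust
use std::path::{Path, PathBuf};

/// Decompressed MNIST file names, as seen in the panic message and download log.
const EXPECTED: [&str; 4] = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
];

/// Return the expected files that are not present in `base`.
fn missing_files(base: &Path) -> Vec<PathBuf> {
    EXPECTED
        .iter()
        .map(|name| base.join(name))
        .filter(|path| !path.is_file())
        .collect()
}

fn main() {
    let missing = missing_files(Path::new("data"));
    if missing.is_empty() {
        println!("all four MNIST files are present");
    } else {
        for path in &missing {
            println!("missing: {}", path.display());
        }
    }
}
```

If files are reported missing, downloading the four .gz archives, decompressing them, and placing the results in data/ should let the builder find them.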
I would like the library to use the log crate, so that its messages can be suppressed by configuring the log level.
The library is using reqwest in blocking mode at the moment.
│ └── reqwest v0.10.10
│ ├── base64 v0.13.0
│ ├── bytes v0.5.6
│ ├── encoding_rs v0.8.28
│ │ └── cfg-if v1.0.0
│ ├── futures-core v0.3.15
│ ├── futures-util v0.3.15
│ │ ├── futures-core v0.3.15
│ │ ├── futures-io v0.3.15
│ │ ├── futures-task v0.3.15
│ │ ├── memchr v2.4.0
│ │ ├── pin-project-lite v0.2.6
│ │ ├── pin-utils v0.1.0
│ │ └── slab v0.4.3
│ │ [build-dependencies]
│ │ └── autocfg v1.0.1
│ ├── http v0.2.4
│ │ ├── bytes v1.0.1
│ │ ├── fnv v1.0.7
│ │ └── itoa v0.4.7
│ ├── http-body v0.3.1
│ │ ├── bytes v0.5.6
│ │ └── http v0.2.4 (*)
│ ├── hyper v0.13.10
│ │ ├── bytes v0.5.6
│ │ ├── futures-channel v0.3.15
│ │ │ └── futures-core v0.3.15
│ │ ├── futures-core v0.3.15
│ │ ├── futures-util v0.3.15 (*)
│ │ ├── h2 v0.2.7
│ │ │ ├── bytes v0.5.6
│ │ │ ├── fnv v1.0.7
│ │ │ ├── futures-core v0.3.15
│ │ │ ├── futures-sink v0.3.15
│ │ │ ├── futures-util v0.3.15 (*)
│ │ │ ├── http v0.2.4 (*)
│ │ │ ├── indexmap v1.6.2 (*)
│ │ │ ├── slab v0.4.3
│ │ │ ├── tokio v0.2.25
│ │ │ │ ├── bytes v0.5.6
│ │ │ │ ├── fnv v1.0.7
│ │ │ │ ├── futures-core v0.3.15
│ │ │ │ ├── iovec v0.1.4
│ │ │ │ │ └── libc v0.2.97
│ │ │ │ ├── lazy_static v1.4.0
│ │ │ │ ├── memchr v2.4.0
│ │ │ │ ├── mio v0.6.23
│ │ │ │ │ ├── cfg-if v0.1.10
│ │ │ │ │ ├── iovec v0.1.4 (*)
│ │ │ │ │ ├── libc v0.2.97
│ │ │ │ │ ├── log v0.4.14
│ │ │ │ │ │ └── cfg-if v1.0.0
│ │ │ │ │ ├── net2 v0.2.37
│ │ │ │ │ │ ├── cfg-if v0.1.10
│ │ │ │ │ │ └── libc v0.2.97
│ │ │ │ │ └── slab v0.4.3
│ │ │ │ ├── num_cpus v1.13.0 (*)
│ │ │ │ ├── pin-project-lite v0.1.12
│ │ │ │ └── slab v0.4.3
│ │ │ ├── tokio-util v0.3.1
│ │ │ │ ├── bytes v0.5.6
│ │ │ │ ├── futures-core v0.3.15
│ │ │ │ ├── futures-sink v0.3.15
│ │ │ │ ├── log v0.4.14 (*)
│ │ │ │ ├── pin-project-lite v0.1.12
│ │ │ │ └── tokio v0.2.25 (*)
│ │ │ ├── tracing v0.1.26
│ │ │ │ ├── cfg-if v1.0.0
│ │ │ │ ├── log v0.4.14 (*)
│ │ │ │ ├── pin-project-lite v0.2.6
│ │ │ │ └── tracing-core v0.1.18
│ │ │ │ └── lazy_static v1.4.0
│ │ │ └── tracing-futures v0.2.5
│ │ │ ├── pin-project v1.0.7
│ │ │ │ └── pin-project-internal v1.0.7 (proc-macro)
│ │ │ │ ├── proc-macro2 v1.0.27 (*)
│ │ │ │ ├── quote v1.0.9 (*)
│ │ │ │ └── syn v1.0.73 (*)
│ │ │ └── tracing v0.1.26 (*)
│ │ ├── http v0.2.4 (*)
│ │ ├── http-body v0.3.1 (*)
│ │ ├── httparse v1.4.1
│ │ ├── httpdate v0.3.2
│ │ ├── itoa v0.4.7
│ │ ├── pin-project v1.0.7 (*)
│ │ ├── socket2 v0.3.19
│ │ │ ├── cfg-if v1.0.0
│ │ │ └── libc v0.2.97
│ │ ├── tokio v0.2.25 (*)
│ │ ├── tower-service v0.3.1
│ │ ├── tracing v0.1.26 (*)
│ │ └── want v0.3.0
│ │ ├── log v0.4.14 (*)
│ │ └── try-lock v0.2.3
│ ├── hyper-tls v0.4.3
│ │ ├── bytes v0.5.6
│ │ ├── hyper v0.13.10 (*)
│ │ ├── native-tls v0.2.7
│ │ │ ├── log v0.4.14 (*)
│ │ │ ├── openssl v0.10.35
│ │ │ │ ├── bitflags v1.2.1
│ │ │ │ ├── cfg-if v1.0.0
│ │ │ │ ├── foreign-types v0.3.2
│ │ │ │ │ └── foreign-types-shared v0.1.1
│ │ │ │ ├── libc v0.2.97
│ │ │ │ ├── once_cell v1.8.0
│ │ │ │ └── openssl-sys v0.9.65 (*)
│ │ │ ├── openssl-probe v0.1.4
│ │ │ └── openssl-sys v0.9.65 (*)
│ │ ├── tokio v0.2.25 (*)
│ │ └── tokio-tls v0.3.1
│ │ ├── native-tls v0.2.7 (*)
│ │ └── tokio v0.2.25 (*)
│ ├── ipnet v2.3.1
│ ├── lazy_static v1.4.0
│ ├── log v0.4.14 (*)
│ ├── mime v0.3.16
│ ├── mime_guess v2.0.3
│ │ ├── mime v0.3.16
│ │ └── unicase v2.6.0
│ │ [build-dependencies]
│ │ └── version_check v0.9.3
│ │ [build-dependencies]
│ │ └── unicase v2.6.0 (*)
│ ├── native-tls v0.2.7 (*)
│ ├── percent-encoding v2.1.0
│ ├── pin-project-lite v0.2.6
│ ├── serde v1.0.126 (*)
│ ├── serde_urlencoded v0.7.0
│ │ ├── form_urlencoded v1.0.1
│ │ │ ├── matches v0.1.8
│ │ │ └── percent-encoding v2.1.0
│ │ ├── itoa v0.4.7
│ │ ├── ryu v1.0.5
│ │ └── serde v1.0.126 (*)
│ ├── tokio v0.2.25 (*)
│ ├── tokio-tls v0.3.1 (*)
│ └── url v2.2.2
│ ├── form_urlencoded v1.0.1 (*)
│ ├── idna v0.2.3
│ │ ├── matches v0.1.8
│ │ ├── unicode-bidi v0.3.5
│ │ │ └── matches v0.1.8
│ │ └── unicode-normalization v0.1.19 (*)
│ ├── matches v0.1.8
│ └── percent-encoding v2.1.0
Could you consider switching to curl, which has a considerably smaller dependency footprint?
├── curl v0.4.38
│ ├── curl-sys v0.4.44+curl-7.77.0
│ │ ├── libc v0.2.97
│ │ ├── libz-sys v1.1.3
│ │ │ └── libc v0.2.97
│ │ │ [build-dependencies]
│ │ │ ├── cc v1.0.68
│ │ │ └── pkg-config v0.3.19
│ │ └── openssl-sys v0.9.65
│ │ └── libc v0.2.97
│ │ [build-dependencies]
│ │ ├── autocfg v1.0.1
│ │ ├── cc v1.0.68
│ │ └── pkg-config v0.3.19
│ │ [build-dependencies]
│ │ ├── cc v1.0.68
│ │ └── pkg-config v0.3.19
│ ├── libc v0.2.97
│ ├── openssl-probe v0.1.4
│ ├── openssl-sys v0.9.65 (*)
│ └── socket2 v0.4.0
│ └── libc v0.2.97
download_and_extract iterates over the archives, but download then ignores its archive parameter and iterates over all of them again. So the first iteration of the outer loop downloads every needed archive, and each subsequent iteration only re-checks, needlessly, whether every file needs to be downloaded:
Download directory /tmp/mnist/ does not exists. Creating....
Attempting to download and extract train-images-idx3-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/train-images-idx3-ubyte.gz
9912422 / 9912422 ╢==================================================================================================================================================================================================╟ 100.00 % 51537113.46/s
- Downloading from file from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/train-labels-idx1-ubyte.gz
28881 / 28881 ╢========================================================================================================================================================================================================╟ 100.00 % 473278.10/s
- Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/t10k-images-idx3-ubyte.gz
1648877 / 1648877 ╢==================================================================================================================================================================================================╟ 100.00 % 20324122.82/s
- Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/t10k-labels-idx1-ubyte.gz
4542 / 4542 ╢===========================================================================================================================================================================================================╟ 100.00 % 74586.06/s
Extracting archive "/tmp/mnist/train-images-idx3-ubyte.gz" to "/tmp/mnist/train-images-idx3-ubyte"...
Attempting to download and extract train-labels-idx1-ubyte.gz...
File "/tmp/mnist/train-images-idx3-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/train-labels-idx1-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/t10k-images-idx3-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/t10k-labels-idx1-ubyte.gz" already exists, skipping downloading.
Extracting archive "/tmp/mnist/train-labels-idx1-ubyte.gz" to "/tmp/mnist/train-labels-idx1-ubyte"...
Attempting to download and extract t10k-images-idx3-ubyte.gz...
File "/tmp/mnist/train-images-idx3-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/train-labels-idx1-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/t10k-images-idx3-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/t10k-labels-idx1-ubyte.gz" already exists, skipping downloading.
Extracting archive "/tmp/mnist/t10k-images-idx3-ubyte.gz" to "/tmp/mnist/t10k-images-idx3-ubyte"...
Attempting to download and extract t10k-labels-idx1-ubyte.gz...
File "/tmp/mnist/train-images-idx3-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/train-labels-idx1-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/t10k-images-idx3-ubyte.gz" already exists, skipping downloading.
File "/tmp/mnist/t10k-labels-idx1-ubyte.gz" already exists, skipping downloading.
Extracting archive "/tmp/mnist/t10k-labels-idx1-ubyte.gz" to "/tmp/mnist/t10k-labels-idx1-ubyte"...
See how the handling of train-images-idx3-ubyte.gz triggers the 4 downloads, and then the other 3 re-check for the 4 files.
With my proposed patch:
Download directory /tmp/mnist/ does not exists. Creating....
Attempting to download and extract train-images-idx3-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/train-images-idx3-ubyte.gz
9912422 / 9912422 ╢==================================================================================================================================================================================================╟ 100.00 % 46574828.67/s
Extracting archive "/tmp/mnist/train-images-idx3-ubyte.gz" to "/tmp/mnist/train-images-idx3-ubyte"...
Attempting to download and extract train-labels-idx1-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/train-labels-idx1-ubyte.gz
28881 / 28881 ╢========================================================================================================================================================================================================╟ 100.00 % 714115.69/s
Extracting archive "/tmp/mnist/train-labels-idx1-ubyte.gz" to "/tmp/mnist/train-labels-idx1-ubyte"...
Attempting to download and extract t10k-images-idx3-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/t10k-images-idx3-ubyte.gz
1648877 / 1648877 ╢==================================================================================================================================================================================================╟ 100.00 % 16212662.04/s
Extracting archive "/tmp/mnist/t10k-images-idx3-ubyte.gz" to "/tmp/mnist/t10k-images-idx3-ubyte"...
Attempting to download and extract t10k-labels-idx1-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/t10k-labels-idx1-ubyte.gz
4542 / 4542 ╢==========================================================================================================================================================================================================╟ 100.00 % 111722.52/s
Extracting archive "/tmp/mnist/t10k-labels-idx1-ubyte.gz" to "/tmp/mnist/t10k-labels-idx1-ubyte"...
Each archive is now checked exactly once, in its own loop iteration.
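To make the shape of the bug concrete, here is a small model of it. This is not the crate's code: download_buggy, download_fixed and run are hypothetical stand-ins that only record which archive would be checked, with the archive names taken from the log above.

```rust
/// Archive names taken from the log output in this issue.
const ARCHIVES: [&str; 4] = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
];

/// Buggy shape: the argument is ignored and every archive is visited again.
fn download_buggy(_archive: &str, checks: &mut Vec<String>) {
    for a in ARCHIVES.iter() {
        checks.push(a.to_string());
    }
}

/// Patched shape: only the archive that was passed in is handled.
fn download_fixed(archive: &str, checks: &mut Vec<String>) {
    checks.push(archive.to_string());
}

/// Outer loop, standing in for download_and_extract; returns every check performed.
fn run(download: fn(&str, &mut Vec<String>)) -> Vec<String> {
    let mut checks = Vec::new();
    for archive in ARCHIVES.iter() {
        download(archive, &mut checks);
    }
    checks
}

fn main() {
    println!("buggy: {} checks", run(download_buggy).len()); // buggy: 16 checks
    println!("fixed: {} checks", run(download_fixed).len()); // fixed: 4 checks
}
```

The 16 checks in the buggy variant correspond to the 4 downloads plus the 3 x 4 "already exists" lines in the first log.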
I am getting 'Unable to read whole file in memory (data/train-images-idx3-ubyte)'. Is it possible to load data partially?
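Independently of the crate, the IDX image format is simple enough to read partially by hand: a big-endian magic number (0x00000803 for unsigned-byte images), then the image count, row count and column count as big-endian u32 values, followed by the raw pixels. The sketch below reads only the first few images; read_first_images is a hypothetical helper written for this thread, not an API of the mnist crate.

```rust
use std::io::{self, Read};

/// Read one big-endian u32 from the reader.
fn read_u32_be<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

/// Read at most `limit` images from an IDX3 image stream.
/// Returns (rows, cols, pixels), where `pixels` holds the images back to back.
fn read_first_images<R: Read>(mut r: R, limit: usize) -> io::Result<(usize, usize, Vec<u8>)> {
    let magic = read_u32_be(&mut r)?;
    if magic != 0x0000_0803 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "not an IDX3 unsigned-byte file",
        ));
    }
    let count = read_u32_be(&mut r)? as usize;
    let rows = read_u32_be(&mut r)? as usize;
    let cols = read_u32_be(&mut r)? as usize;

    // Never ask for more images than the header says the file contains.
    let n = limit.min(count);
    let mut pixels = vec![0u8; n * rows * cols];
    r.read_exact(&mut pixels)?;
    Ok((rows, cols, pixels))
}

fn main() -> io::Result<()> {
    // Tiny in-memory stand-in for a real file: 3 images of 2x2 pixels.
    let mut data: Vec<u8> = vec![0, 0, 8, 3, 0, 0, 0, 3, 0, 0, 0, 2, 0, 0, 0, 2];
    data.extend(1u8..=12);
    let (rows, cols, pixels) = read_first_images(&data[..], 2)?;
    println!("{}x{} images, {} bytes read", rows, cols, pixels.len()); // 2x2 images, 8 bytes read
    Ok(())
}
```

For the real data set, the same function could be given a std::io::BufReader wrapped around std::fs::File::open("data/train-images-idx3-ubyte"), so that only the requested images are held in memory.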
Follow-up to the linfa compilation problem caused by mnist 0.5.
It comes from the use of cfg! in download.rs, which, unlike #[cfg], does not remove any code, hence the error on Windows.
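For readers unfamiliar with the difference, here is a generic illustration. It is not the code from download.rs; platform_name and file_mode are made-up examples. cfg! expands to a boolean constant and both branches of the if are still compiled on every target, whereas #[cfg] removes the annotated item entirely when the predicate is false.

```rust
/// `cfg!` is only a compile-time boolean: BOTH branches are type-checked on
/// every platform, so neither may mention an item that exists on one OS only.
fn platform_name() -> &'static str {
    if cfg!(windows) {
        "windows"
    } else {
        "other"
    }
}

/// `#[cfg]` drops the whole item when the predicate is false, so this version
/// can use a Unix-only trait without breaking the build elsewhere.
#[cfg(unix)]
fn file_mode(meta: &std::fs::Metadata) -> Option<u32> {
    use std::os::unix::fs::PermissionsExt; // this module exists only on Unix targets
    Some(meta.permissions().mode())
}

/// Fallback for every other target; the Unix version above is not compiled there.
#[cfg(not(unix))]
fn file_mode(_meta: &std::fs::Metadata) -> Option<u32> {
    None
}

fn main() -> std::io::Result<()> {
    let meta = std::fs::metadata(std::env::temp_dir())?;
    println!("{}: mode = {:?}", platform_name(), file_mode(&meta));
    Ok(())
}
```

Moving the body of the Unix file_mode into an if cfg!(unix) { ... } block would fail to compile on Windows, because std::os::unix does not exist there; that is the general class of error described above.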