
mnist's People

Contributors

davidmcneil, nbigaouette, quietlychris, sufflope

mnist's Issues

README example uses non-existent methods

The README has:

    let Mnist { trn_img, trn_lbl, .. } = MnistBuilder::new()
         .image_format_28x28()
         .label_format_1x1()
[...]

but both image_format_28x28() and label_format_1x1() are invalid methods.

They should probably be removed from the example (as is the case in the documentation).

Expected filename issue

MnistBuilder expects the MNIST data to be at, for example, data/train-images.idx3-ubyte, but the MNIST web site provides train-images-idx3-ubyte.gz. Note the .idx3 vs -idx3.

This means that I have to rename the files after having downloaded the original ones, which can be confusing.

I guess MnistBuilder should expect files with -idx3 and not .idx3?
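
Until the expected names change, one workaround is to rename the decompressed files to the dotted form described in this issue. The following is a minimal sketch using only the standard library; `dotted_name` and `rename_to_dotted` are hypothetical helpers, not part of the mnist crate, and the sketch assumes the `.gz` archives have already been decompressed into the directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: maps "train-images-idx3-ubyte" to
/// "train-images.idx3-ubyte" (only the first "-idx" is changed).
fn dotted_name(name: &str) -> String {
    name.replacen("-idx", ".idx", 1)
}

/// Renames every decompressed file in `dir` whose name contains "-idx"
/// to the dotted form. Archives ending in ".gz" are left untouched.
/// Returns the number of files renamed.
fn rename_to_dotted(dir: &Path) -> io::Result<usize> {
    // Collect the entries first so renaming does not disturb the listing.
    let entries: Vec<fs::DirEntry> = fs::read_dir(dir)?.collect::<io::Result<_>>()?;
    let mut renamed = 0;
    for entry in entries {
        let path = entry.path();
        let name = match path.file_name().and_then(|n| n.to_str()) {
            Some(n) if n.contains("-idx") && !n.ends_with(".gz") => n.to_string(),
            _ => continue,
        };
        fs::rename(&path, dir.join(dotted_name(&name)))?;
        renamed += 1;
    }
    Ok(renamed)
}
```

Calling `rename_to_dotted(Path::new("data"))` would cover the `data/` directory mentioned above. Whether the dotted or the dashed form is the right one depends on the crate version in use.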

panicked at 'Unable to find path to images at "data/train-images-idx3-ubyte".'

Here is some test code:

use mnist::MnistBuilder;

#[test]
fn test_mnist() {

    let mnist = MnistBuilder::new()
        .label_format_digit()
        .training_set_length(50_000)
        .validation_set_length(10_000)
        .test_set_length(10_000)
        .finalize();
    
    dbg!(mnist.trn_img);
    dbg!(mnist.trn_lbl);
}
Cargo.toml:
[dependencies]
mnist = "0.5.0"

and I got this error:

running 1 test
thread 'test_mnist' panicked at 'Unable to find path to images at "data/train-images-idx3-ubyte".'

How can I fix it?
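
The panic indicates that the builder looked for decompressed data files under `data/` and did not find them. A pre-flight check along the following lines can make the missing pieces explicit before the builder runs. This is a minimal sketch, not part of the mnist crate: `missing_files` is a hypothetical helper, and the four file names are assumed from the panic message and from the download log quoted elsewhere in this thread.

```rust
use std::path::Path;

/// File names assumed from the panic message and download log in this thread.
const EXPECTED: [&str; 4] = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
];

/// Hypothetical helper: returns the expected data files that are
/// not present as regular files inside `dir`.
fn missing_files(dir: &Path) -> Vec<&'static str> {
    EXPECTED
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}
```

If `missing_files(Path::new("data"))` returns a non-empty list, the listed files need to be downloaded and decompressed into `data/` (relative to the directory the test runs from) before the builder is finalized.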

Different HTTP backend

The library is using reqwest in blocking mode at the moment.

Dependencies for reqwest
│   └── reqwest v0.10.10
│       ├── base64 v0.13.0
│       ├── bytes v0.5.6
│       ├── encoding_rs v0.8.28
│       │   └── cfg-if v1.0.0
│       ├── futures-core v0.3.15
│       ├── futures-util v0.3.15
│       │   ├── futures-core v0.3.15
│       │   ├── futures-io v0.3.15
│       │   ├── futures-task v0.3.15
│       │   ├── memchr v2.4.0
│       │   ├── pin-project-lite v0.2.6
│       │   ├── pin-utils v0.1.0
│       │   └── slab v0.4.3
│       │   [build-dependencies]
│       │   └── autocfg v1.0.1
│       ├── http v0.2.4
│       │   ├── bytes v1.0.1
│       │   ├── fnv v1.0.7
│       │   └── itoa v0.4.7
│       ├── http-body v0.3.1
│       │   ├── bytes v0.5.6
│       │   └── http v0.2.4 (*) 
│       ├── hyper v0.13.10
│       │   ├── bytes v0.5.6
│       │   ├── futures-channel v0.3.15
│       │   │   └── futures-core v0.3.15
│       │   ├── futures-core v0.3.15
│       │   ├── futures-util v0.3.15 (*) 
│       │   ├── h2 v0.2.7
│       │   │   ├── bytes v0.5.6
│       │   │   ├── fnv v1.0.7
│       │   │   ├── futures-core v0.3.15
│       │   │   ├── futures-sink v0.3.15
│       │   │   ├── futures-util v0.3.15 (*) 
│       │   │   ├── http v0.2.4 (*) 
│       │   │   ├── indexmap v1.6.2 (*) 
│       │   │   ├── slab v0.4.3
│       │   │   ├── tokio v0.2.25
│       │   │   │   ├── bytes v0.5.6
│       │   │   │   ├── fnv v1.0.7
│       │   │   │   ├── futures-core v0.3.15
│       │   │   │   ├── iovec v0.1.4
│       │   │   │   │   └── libc v0.2.97
│       │   │   │   ├── lazy_static v1.4.0
│       │   │   │   ├── memchr v2.4.0
│       │   │   │   ├── mio v0.6.23
│       │   │   │   │   ├── cfg-if v0.1.10
│       │   │   │   │   ├── iovec v0.1.4 (*) 
│       │   │   │   │   ├── libc v0.2.97
│       │   │   │   │   ├── log v0.4.14
│       │   │   │   │   │   └── cfg-if v1.0.0
│       │   │   │   │   ├── net2 v0.2.37
│       │   │   │   │   │   ├── cfg-if v0.1.10
│       │   │   │   │   │   └── libc v0.2.97
│       │   │   │   │   └── slab v0.4.3
│       │   │   │   ├── num_cpus v1.13.0 (*) 
│       │   │   │   ├── pin-project-lite v0.1.12
│       │   │   │   └── slab v0.4.3
│       │   │   ├── tokio-util v0.3.1
│       │   │   │   ├── bytes v0.5.6
│       │   │   │   ├── futures-core v0.3.15
│       │   │   │   ├── futures-sink v0.3.15
│       │   │   │   ├── log v0.4.14 (*) 
│       │   │   │   ├── pin-project-lite v0.1.12
│       │   │   │   └── tokio v0.2.25 (*) 
│       │   │   ├── tracing v0.1.26
│       │   │   │   ├── cfg-if v1.0.0
│       │   │   │   ├── log v0.4.14 (*) 
│       │   │   │   ├── pin-project-lite v0.2.6
│       │   │   │   └── tracing-core v0.1.18
│       │   │   │       └── lazy_static v1.4.0
│       │   │   └── tracing-futures v0.2.5
│       │   │       ├── pin-project v1.0.7
│       │   │       │   └── pin-project-internal v1.0.7 (proc-macro)
│       │   │       │       ├── proc-macro2 v1.0.27 (*) 
│       │   │       │       ├── quote v1.0.9 (*) 
│       │   │       │       └── syn v1.0.73 (*) 
│       │   │       └── tracing v0.1.26 (*) 
│       │   ├── http v0.2.4 (*) 
│       │   ├── http-body v0.3.1 (*) 
│       │   ├── httparse v1.4.1
│       │   ├── httpdate v0.3.2
│       │   ├── itoa v0.4.7
│       │   ├── pin-project v1.0.7 (*) 
│       │   ├── socket2 v0.3.19
│       │   │   ├── cfg-if v1.0.0
│       │   │   └── libc v0.2.97
│       │   ├── tokio v0.2.25 (*) 
│       │   ├── tower-service v0.3.1
│       │   ├── tracing v0.1.26 (*) 
│       │   └── want v0.3.0
│       │       ├── log v0.4.14 (*) 
│       │       └── try-lock v0.2.3
│       ├── hyper-tls v0.4.3
│       │   ├── bytes v0.5.6
│       │   ├── hyper v0.13.10 (*) 
│       │   ├── native-tls v0.2.7
│       │   │   ├── log v0.4.14 (*) 
│       │   │   ├── openssl v0.10.35
│       │   │   │   ├── bitflags v1.2.1
│       │   │   │   ├── cfg-if v1.0.0
│       │   │   │   ├── foreign-types v0.3.2
│       │   │   │   │   └── foreign-types-shared v0.1.1
│       │   │   │   ├── libc v0.2.97
│       │   │   │   ├── once_cell v1.8.0
│       │   │   │   └── openssl-sys v0.9.65 (*) 
│       │   │   ├── openssl-probe v0.1.4
│       │   │   └── openssl-sys v0.9.65 (*) 
│       │   ├── tokio v0.2.25 (*) 
│       │   └── tokio-tls v0.3.1
│       │       ├── native-tls v0.2.7 (*) 
│       │       └── tokio v0.2.25 (*) 
│       ├── ipnet v2.3.1
│       ├── lazy_static v1.4.0
│       ├── log v0.4.14 (*) 
│       ├── mime v0.3.16
│       ├── mime_guess v2.0.3
│       │   ├── mime v0.3.16
│       │   └── unicase v2.6.0
│       │       [build-dependencies]
│       │       └── version_check v0.9.3
│       │   [build-dependencies]
│       │   └── unicase v2.6.0 (*) 
│       ├── native-tls v0.2.7 (*) 
│       ├── percent-encoding v2.1.0
│       ├── pin-project-lite v0.2.6
│       ├── serde v1.0.126 (*) 
│       ├── serde_urlencoded v0.7.0
│       │   ├── form_urlencoded v1.0.1
│       │   │   ├── matches v0.1.8
│       │   │   └── percent-encoding v2.1.0
│       │   ├── itoa v0.4.7
│       │   ├── ryu v1.0.5
│       │   └── serde v1.0.126 (*) 
│       ├── tokio v0.2.25 (*) 
│       ├── tokio-tls v0.3.1 (*) 
│       └── url v2.2.2
│           ├── form_urlencoded v1.0.1 (*) 
│           ├── idna v0.2.3
│           │   ├── matches v0.1.8
│           │   ├── unicode-bidi v0.3.5
│           │   │   └── matches v0.1.8
│           │   └── unicode-normalization v0.1.19 (*) 
│           ├── matches v0.1.8
│           └── percent-encoding v2.1.0

Could you consider switching to curl, which has a considerably smaller dependency footprint?

Dependencies for curl
├── curl v0.4.38
│   ├── curl-sys v0.4.44+curl-7.77.0
│   │   ├── libc v0.2.97
│   │   ├── libz-sys v1.1.3
│   │   │   └── libc v0.2.97
│   │   │   [build-dependencies]
│   │   │   ├── cc v1.0.68
│   │   │   └── pkg-config v0.3.19
│   │   └── openssl-sys v0.9.65
│   │       └── libc v0.2.97
│   │       [build-dependencies]
│   │       ├── autocfg v1.0.1
│   │       ├── cc v1.0.68
│   │       └── pkg-config v0.3.19
│   │   [build-dependencies]
│   │   ├── cc v1.0.68
│   │   └── pkg-config v0.3.19
│   ├── libc v0.2.97
│   ├── openssl-probe v0.1.4
│   ├── openssl-sys v0.9.65 (*)
│   └── socket2 v0.4.0
│       └── libc v0.2.97
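
To put a number on the footprint comparison, each tree can be reduced to its set of distinct crates. The following is a minimal sketch, assuming `cargo tree`-style text like the listings above; `unique_crates` is a hypothetical helper written for illustration, not part of cargo or of either HTTP library.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collects the distinct "name version" pairs
/// found in `cargo tree`-style output.
fn unique_crates(tree: &str) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    for line in tree.lines() {
        // Drop the tree-drawing prefix and indentation.
        let entry = line.trim_start_matches(|c: char| {
            matches!(c, '│' | '├' | '└' | '─') || c.is_whitespace()
        });
        // Skip blank lines and section markers such as "[build-dependencies]".
        if entry.is_empty() || entry.starts_with('[') {
            continue;
        }
        // Keep only "name vX.Y.Z"; trailing markers like "(*)" are ignored.
        let mut parts = entry.split_whitespace();
        if let (Some(name), Some(version)) = (parts.next(), parts.next()) {
            if version.starts_with('v') {
                seen.insert(format!("{} {}", name, version));
            }
        }
    }
    seen
}
```

Running it over each listing and comparing the `len()` of the results gives a like-for-like crate count. It says nothing about build time or native library requirements, which also matter when choosing a backend.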

Redundant iteration when downloading archives

download_and_extract iterates over the archives, but download then ignores its archive parameter and iterates over all of them again. As a result, the first outer-loop iteration downloads every needed archive, and each subsequent iteration triggers a useless re-check of whether each file needs to be downloaded:

Download directory /tmp/mnist/ does not exists. Creating....
Attempting to download and extract train-images-idx3-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/train-images-idx3-ubyte.gz
9912422 / 9912422 ╢==================================================================================================================================================================================================╟ 100.00 % 51537113.46/s
 - Downloading from file from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/train-labels-idx1-ubyte.gz
28881 / 28881 ╢========================================================================================================================================================================================================╟ 100.00 % 473278.10/s
 - Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/t10k-images-idx3-ubyte.gz
1648877 / 1648877 ╢==================================================================================================================================================================================================╟ 100.00 % 20324122.82/s
 - Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/t10k-labels-idx1-ubyte.gz
4542 / 4542 ╢===========================================================================================================================================================================================================╟ 100.00 % 74586.06/s
 Extracting archive "/tmp/mnist/train-images-idx3-ubyte.gz" to "/tmp/mnist/train-images-idx3-ubyte"...
Attempting to download and extract train-labels-idx1-ubyte.gz...
  File "/tmp/mnist/train-images-idx3-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/train-labels-idx1-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/t10k-images-idx3-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/t10k-labels-idx1-ubyte.gz" already exists, skipping downloading.
Extracting archive "/tmp/mnist/train-labels-idx1-ubyte.gz" to "/tmp/mnist/train-labels-idx1-ubyte"...
Attempting to download and extract t10k-images-idx3-ubyte.gz...
  File "/tmp/mnist/train-images-idx3-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/train-labels-idx1-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/t10k-images-idx3-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/t10k-labels-idx1-ubyte.gz" already exists, skipping downloading.
Extracting archive "/tmp/mnist/t10k-images-idx3-ubyte.gz" to "/tmp/mnist/t10k-images-idx3-ubyte"...
Attempting to download and extract t10k-labels-idx1-ubyte.gz...
  File "/tmp/mnist/train-images-idx3-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/train-labels-idx1-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/t10k-images-idx3-ubyte.gz" already exists, skipping downloading.
  File "/tmp/mnist/t10k-labels-idx1-ubyte.gz" already exists, skipping downloading.
Extracting archive "/tmp/mnist/t10k-labels-idx1-ubyte.gz" to "/tmp/mnist/t10k-labels-idx1-ubyte"...

Note how handling train-images-idx3-ubyte.gz triggers all 4 downloads, and then each of the other 3 archives re-checks all 4 files.

With my proposed patch:

Download directory /tmp/mnist/ does not exists. Creating....
Attempting to download and extract train-images-idx3-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/train-images-idx3-ubyte.gz
9912422 / 9912422 ╢==================================================================================================================================================================================================╟ 100.00 % 46574828.67/s
 Extracting archive "/tmp/mnist/train-images-idx3-ubyte.gz" to "/tmp/mnist/train-images-idx3-ubyte"...
Attempting to download and extract train-labels-idx1-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/train-labels-idx1-ubyte.gz
28881 / 28881 ╢========================================================================================================================================================================================================╟ 100.00 % 714115.69/s
 Extracting archive "/tmp/mnist/train-labels-idx1-ubyte.gz" to "/tmp/mnist/train-labels-idx1-ubyte"...
Attempting to download and extract t10k-images-idx3-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz and saving to file as: /tmp/mnist/t10k-images-idx3-ubyte.gz
1648877 / 1648877 ╢==================================================================================================================================================================================================╟ 100.00 % 16212662.04/s
 Extracting archive "/tmp/mnist/t10k-images-idx3-ubyte.gz" to "/tmp/mnist/t10k-images-idx3-ubyte"...
Attempting to download and extract t10k-labels-idx1-ubyte.gz...
- Downloading from file from http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz and saving to file as: /tmp/mnist/t10k-labels-idx1-ubyte.gz
4542 / 4542 ╢==========================================================================================================================================================================================================╟ 100.00 % 111722.52/s
 Extracting archive "/tmp/mnist/t10k-labels-idx1-ubyte.gz" to "/tmp/mnist/t10k-labels-idx1-ubyte"...

Each archive is checked once, in its own loop iteration.
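
The difference between the two traces can be modeled with a small control-flow sketch. This is not the crate's source: the function names below are hypothetical, and each "check" simply records which archive was looked at, standing in for the real download-or-skip logic.

```rust
/// Archive names taken from the log output in this issue.
const ARCHIVES: [&str; 4] = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
];

/// Models the reported behavior: the parameter is ignored and
/// every archive is checked on each call.
fn download_ignoring_param(_archive: &str, checks: &mut Vec<String>) {
    for &a in ARCHIVES.iter() {
        checks.push(a.to_string());
    }
}

/// Models the patched behavior: only the requested archive is checked.
fn download_one(archive: &str, checks: &mut Vec<String>) {
    checks.push(archive.to_string());
}

/// Models the outer loop: calls `download` once per archive and
/// returns the archives that were checked, in order.
fn run(download: fn(&str, &mut Vec<String>)) -> Vec<String> {
    let mut checks = Vec::new();
    for &a in ARCHIVES.iter() {
        download(a, &mut checks);
    }
    checks
}
```

With four archives, `run(download_ignoring_param)` records 16 checks (4 downloads plus 12 re-checks, as in the first log), while `run(download_one)` records 4, one per archive.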
