Comments (9)
Hey @marisbasha! Sorry you're running into this issue. It's probably something we haven't explained clearly enough.
> If that's the case, where should I download the data from?
Just checking, did you already download the "source" test data as described here?
https://vak.readthedocs.io/en/latest/development/contributors.html#download-test-data
To do that you would run `nox -s test-data-download-source`.
> Just to clarify, should I use my own "toy data", or does running `vak prep tests/data_for_tests/configs/ConvEncoderUMAP_train_audio_cbin_annot_notmat.toml` generate the "toy data"?
You are right that these are basically "toy" datasets, as small as possible. I tried to define the two different types in that section of the development set-up page, but in case it's not clear: the "source" data is the inputs to vak, like audio and annotation files. You create the other type, the "generated" test data, when you run `nox -s test-data-generate`. This "generated" test data consists of (small) prepared datasets and results, some of which are used by the unit tests.
You don't actually need to generate this test data to be able to develop. I just suggested it as a fairly painless way to check that you were able to set up the environment correctly. The script that generates the test data should be able to run to completion without any errors.
I am almost finished with the feature branch that will fix the unit tests, so you can run them to test what you are developing. That branch will also speed up the script that generates the test data considerably, and reduce the size of the generated test data.
#693
Does that help?
Discussed this with @marisbasha and @yardencsGitHub today. Updating here with some thoughts I've had:
- We should add a `VAEModel` as a new model family. I would suggest that class look something like the `VAExperiment` class here, as far as the training/validation steps go: https://github.com/AntixK/PyTorch-VAE/blob/master/experiment.py#L15
  - We might want to additionally have the methods their base class has, `encode`, `decode`, and `sample`, for all VAE models: https://github.com/AntixK/PyTorch-VAE/blob/master/models/base.py
  - Let's avoid directly adapting their code though, since a lot of the logic is specific to their framework (and also because the license is Apache)
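To make that idea concrete, here is a rough mock-up of what such a `VAEModel` family class could look like, loosely following the `VAExperiment` pattern linked above. This is my own sketch, not vak's actual API: the constructor signature, the `"frames"` batch key, and the way the wrapped network exposes `encode`/`decode` are all assumptions.

```python
# Hypothetical sketch of a VAE "model family" class, in the spirit of the
# VAExperiment class discussed above. All names and signatures are assumed.
import torch


class VAEModel(torch.nn.Module):
    """Sketch of a VAE model family: encode/decode/sample plus training steps."""

    def __init__(self, network, loss_func, latent_dim):
        super().__init__()
        self.network = network      # any net exposing ``encode`` and ``decode``
        self.loss_func = loss_func  # a factored-out VAE loss function
        self.latent_dim = latent_dim

    def encode(self, x):
        # returns (mu, log_var) of the approximate posterior q(z|x)
        return self.network.encode(x)

    def decode(self, z):
        return self.network.decode(z)

    def sample(self, n_samples):
        # draw from the prior N(0, I) and decode to data space
        z = torch.randn(n_samples, self.latent_dim)
        return self.decode(z)

    def forward(self, x):
        mu, log_var = self.encode(x)
        std = torch.exp(0.5 * log_var)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decode(z), mu, log_var

    def training_step(self, batch, batch_idx):
        x = batch["frames"]  # assumed key for vak-prepared batches
        x_rec, mu, log_var = self(x)
        return self.loss_func(x_rec, x, mu, log_var)

    def validation_step(self, batch, batch_idx):
        return self.training_step(batch, batch_idx)
```

The point of the wrapper is that any network with `encode`/`decode` methods can be dropped in, and interactive use (`model.encode(x)`, `model.sample(16)`) stays convenient.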
- Add a model `AVA` that uses `VAEModel` as the model family
- Make it so that we can load weights from AVA
  - Check if there are weights in the shared data, and test on those: https://research.repository.duke.edu/concern/datasets/9k41zf38g?locale=en -- my initial impression is that there are no model weights / checkpoints in the shared dataset, just the audio data and possibly low-D features
  - If not, use the original repo to generate weights and test that we can load those
- Add an initial walkthrough / how-to in the docs on using this model
I don't think we need this for the initial implementation but noting for future work:
- I think we can use `WindowDataset` for the Shotgun VAE model (which trains on randomly drawn windows)
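For illustration, random-window sampling of the kind Shotgun VAE training needs can be sketched as a map-style dataset. This is my own toy example, not vak's actual `WindowDataset` implementation; the class name and constructor are made up.

```python
# Illustrative sketch of drawing random fixed-width windows from a
# spectrogram, Shotgun-VAE style. Not vak's actual WindowDataset.
import numpy as np
import torch
from torch.utils.data import Dataset


class RandomWindowDataset(Dataset):
    """Toy dataset that returns randomly drawn spectrogram windows."""

    def __init__(self, spect, window_size, n_draws):
        self.spect = spect            # (freq_bins, time_bins) array
        self.window_size = window_size
        self.n_draws = n_draws        # how many windows make one "epoch"

    def __len__(self):
        return self.n_draws

    def __getitem__(self, idx):
        # draw a random start time, then slice out a fixed-width window
        max_start = self.spect.shape[1] - self.window_size
        start = np.random.randint(0, max_start + 1)
        window = self.spect[:, start:start + self.window_size]
        return torch.as_tensor(window, dtype=torch.float32)
```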
Tentative / rough to-do list for @marisbasha after our meeting today:
- Initial implementation
  - Set up the vak development environment as described here
  - To test the set-up and make sure you have the configs for generating test data (next step), run `nox -s test-data-generate` (this may crash because I introduced a bug in the parametric UMAP branch--I might have fixed it by the time you start--but it will run far enough for you to get the config files you need)
  - Now that you have the configs you need, generate a toy dataset to test the VAE model on: `vak prep tests/data_for_tests/configs/ConvEncoderUMAP_train_audio_cbin_annot_notmat.toml`
  - Using that toy dataset, get the initial implementation working in a notebook, as described in the next section
- Add the implementation to vak (these steps work in a notebook too, but this part describes adding them to vak in separate modules)
  - Declare the VAE-specific loss function in `vak.nn.loss.vae`
    - Not clear to me yet if there's something specific to the loss they use; it's defined in the `forward` method instead of a factored-out loss function, here -- I would prefer to implement an `AVALoss` that verbatim repeats their logic, just to get closer to numerical reproducibility
  - Add `vak.models.vae_model.VAEModel` that defines the model family, using the `vak.models.model_family` decorator and sub-classing `vak.base.Model`. For an example see https://github.com/vocalpy/vak/blob/a96ff976283ccdc34852fcf2ba5bb51808b6b25e/src/vak/models/frame_classification_model.py#L22
    - I think we will need a `training_step` and a `validation_step` with logic specific to datasets prepared by vak
    - We will probably also want VAE-specific methods that make it easier to inspect model behavior interactively, e.g. `encode` and `decode`
  - Add `vak.nets.ava` with the architecture here (slightly refactored, e.g. with for loops?). For an example see https://github.com/vocalpy/vak/blob/main/src/vak/nets/tweetynet.py
    - I would favor building the model inside `__init__` instead of using a separate method, and I would favor using for loops + separate `encoder` and `decoder` attributes, so that methods like `encode` can just do `return self.encoder(x)` -- see for example the VanillaVAE implementation in PyTorch-VAE: https://github.com/AntixK/PyTorch-VAE/blob/master/models/vanilla_vae.py
  - Add `vak.models.AVA` that defines the `AVA` model, using `@model(family='VAEModel')` -- for an example see https://github.com/vocalpy/vak/blob/main/src/vak/models/tweetynet.py
  - Add docstrings
  - Add unit tests
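To illustrate the two design points above (a factored-out loss rather than loss logic buried in `forward`, and `encoder`/`decoder` attributes built with for loops in `__init__`), here is a minimal toy sketch. It is my own illustration, not AVA's actual architecture or loss: the layer sizes, the MSE reconstruction term, and the names `elbo_loss` and `TinyVAENet` are all assumptions.

```python
# Toy sketch: a factored-out VAE loss + a net whose encoder/decoder
# stacks are built with for loops, so ``encode`` just calls an attribute.
# All names and layer sizes here are illustrative assumptions.
import torch
from torch import nn


def elbo_loss(x_rec, x, mu, log_var):
    """Negative ELBO: reconstruction error + KL(q(z|x) || N(0, I))."""
    recon = nn.functional.mse_loss(x_rec, x, reduction="sum")
    # analytic KL divergence for a diagonal Gaussian posterior
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl


class TinyVAENet(nn.Module):
    """Toy VAE net with encoder/decoder built via for loops in __init__."""

    def __init__(self, in_dim=64, hidden_dims=(32, 16), latent_dim=8):
        super().__init__()
        # build the encoder stack with a for loop
        layers, dim = [], in_dim
        for h in hidden_dims:
            layers += [nn.Linear(dim, h), nn.ReLU()]
            dim = h
        self.encoder = nn.Sequential(*layers)
        self.to_mu = nn.Linear(dim, latent_dim)
        self.to_log_var = nn.Linear(dim, latent_dim)
        # the decoder mirrors the encoder, also built with a for loop
        layers, dim = [], latent_dim
        for h in reversed(hidden_dims):
            layers += [nn.Linear(dim, h), nn.ReLU()]
            dim = h
        layers.append(nn.Linear(dim, in_dim))
        self.decoder = nn.Sequential(*layers)

    def encode(self, x):
        h = self.encoder(x)
        return self.to_mu(h), self.to_log_var(h)

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, log_var = self.encode(x)
        std = torch.exp(0.5 * log_var)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decode(z), mu, log_var
```

An `AVALoss` for vak would replace `elbo_loss` with a verbatim port of the terms from the original AVA repo, to get closer to numerical reproducibility as noted above.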
@NickleDave I am having trouble with `nox -s test-data-generate`. I receive the following error:

```
NotADirectoryError: Path specified for ``data_dir`` not found: tests/data_for_tests/source/audio_cbin_annot_notmat/gy6or6/032312
```

On inspection I see that `tests/data_for_tests/source/` is an empty directory. I checked the code for `gy6or6` and saw a script to download it. I put the data inside the `audio_cbin_annot_notmat` folder, but then I get an error saying there's no `.not.mat` file in the directory, and I cannot find a link to download the data elsewhere.

Just to clarify, should I use my own "toy data", or does running `vak prep tests/data_for_tests/configs/ConvEncoderUMAP_train_audio_cbin_annot_notmat.toml` generate the "toy data"?

If that's the case, where should I download the data from?
Everything fine now. Thanks!
🙌 awesome, glad to hear it!
Will ping you here as soon as I get that branch merged; it fixes a couple of minor bugs, so you'll probably want to `git pull` them in along with the fixed tests.
@NickleDave I have pushed to my fork again, with the parts divided by file.
I am having trouble configuring the trainer. Could we have a brief discussion?
Ah whoops, sorry I missed this @marisbasha.
What you have so far looks great. I am reading through your code now to make sure I understand where you're at.
We can definitely discuss what to do with the trainer when we meet tomorrow.