Variable Batch Size
OK, I implemented variable batch size. Let me know if this is what you're looking for, since I'm a little confused. Example:
from torchsample import TensorDataset
import torch
import numpy as np
x = torch.ones(10,1,30,30)
y = torch.from_numpy(np.arange(10))
loader = TensorDataset(x, y, batch_size=2)
xbatch, ybatch = loader.next_batch() # uses default batch_size=2
print(ybatch.numpy())
# [out] : [0, 1]
xbatch, ybatch = loader.next_batch(3) # uses batch_size=3 for this batch only
print(ybatch.numpy())
# [out] : [2, 3, 4]
xbatch, ybatch = loader.next_batch() # goes back to default
print(ybatch.numpy())
# [out] : [5, 6]
Not sure what you mean by epoch agnostic? If you call loader.next_batch(), it will continuously sample from the data without stopping (note this is unique to torchsample and isn't possible in pytorch), and it will reset the iterator and shuffle again when it reaches the end of the data. For example, this next snippet is totally valid and could be used without any epoch loop to sample a fixed number of batches instead:
for i in range(10000):
    xbatch, ybatch = loader.next_batch()
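To make the wrap-around behavior described above concrete, here is a minimal, dependency-free sketch. It is not torchsample's code: the class name CyclingBatcher and its internals are hypothetical, and it works on plain Python lists rather than tensors. It only illustrates the idea of a per-call batch size plus an automatic reset (and optional reshuffle) once the data runs out.

```python
import random


class CyclingBatcher:
    """Hypothetical stand-in for a loader with next_batch(); not torchsample code."""

    def __init__(self, data, batch_size, shuffle=False, seed=None):
        self.data = list(data)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.batches_seen = 0
        self._reset()

    def _reset(self):
        # Start a new pass: rebuild the index order and rewind the cursor.
        self.order = list(range(len(self.data)))
        if self.shuffle:
            self.rng.shuffle(self.order)
        self.cursor = 0

    def next_batch(self, batch_size=None):
        # A per-call batch_size overrides the default for this batch only.
        size = self.batch_size if batch_size is None else batch_size
        if self.cursor >= len(self.order):
            self._reset()
        idx = self.order[self.cursor:self.cursor + size]
        self.cursor += len(idx)
        self.batches_seen += 1
        return [self.data[i] for i in idx]


loader = CyclingBatcher(range(10), batch_size=2)
first = loader.next_batch()    # default size
second = loader.next_batch(3)  # size 3 for this call only
third = loader.next_batch()    # back to the default
print(first, second, third)
# [0, 1] [2, 3, 4] [5, 6]
```

Note that in this sketch the last batch of a pass can be shorter than requested, because the reset only happens on the following call.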
Check the following to see it re-shuffles each epoch as well:
loader = TensorDataset(x, y, batch_size=2, shuffle=True)
# 10 loops = 2 epochs (n=10 / batch_size = 2 --> 5 loops per "epoch")
for i in range(10):
    if i == 5:
        print('\n')
    xbatch, ybatch = loader.next_batch()
    print(ybatch.numpy())
#[out]:
#[5 9]
#[7 2]
#[4 1]
#[6 8]
#[3 0]
#[8 5]
#[2 1]
#[9 6]
#[0 7]
#[3 4]
I'm not going to include an epoch counter, but you can easily work out how many passes the sampler has made through the data from the loader.batches_seen counter.
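As a rough illustration of that arithmetic, the helper below converts a batch count into completed passes. The function name passes_completed is made up for this example, and it assumes every batch used the default batch size (it will be off if next_batch was called with other sizes).

```python
import math


def passes_completed(batches_seen, n_samples, batch_size):
    # Batches needed for one full pass, counting a final partial batch.
    batches_per_pass = math.ceil(n_samples / batch_size)
    # Whole passes finished so far, assuming a constant batch size.
    return batches_seen // batches_per_pass


# n=10 samples with batch_size=2 gives 5 batches per pass, so 10 batches = 2 passes.
print(passes_completed(10, n_samples=10, batch_size=2))
# 2
```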
The only way it'll stop at the end of the data is if you use the loader directly as an iterator:
# for loop will return after one pass through the data
for xbatch, ybatch in loader:
    pass
Stratified Sampling
OK, I think I implemented what you wanted, but it adds a scikit-learn dependency (only if you use this class, though). It's called StratifiedSampler and can either be instantiated directly and passed into a dataset, or you can pass sampler='stratified' to the dataset. Here's an example:
import torch
import numpy as np
from torchsample import TensorDataset
x = torch.randn(8,2)
y = torch.from_numpy(np.array([0, 0, 1, 1, 0, 0, 1, 1]))
loader = TensorDataset(x, y, batch_size=2, sampler='stratified')
for xbatch, ybatch in loader:
    print(ybatch.numpy())
# [out]:
#[0 1]
#[1 0]
#[1 0]
#[0 1]
or
x = torch.randn(8,2)
y = torch.from_numpy(np.array([0, 0, 1, 1, 0, 0, 1, 1]))
sampler = StratifiedSampler(y, batch_size=2)
loader = TensorDataset(x, y, batch_size=2, sampler=sampler)
And it works for more than two classes:
import torch
import numpy as np
from torchsample import TensorDataset
x = torch.randn(8,2)
y = torch.from_numpy(np.array([0, 0, 1, 1, 2, 2, 3, 3]))
loader = TensorDataset(x, y, batch_size=4, sampler='stratified')
for xbatch, ybatch in loader:
    print(ybatch.numpy())
#[out]:
#[0 3 2 1]
#[3 1 2 0]
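For readers who want to see the idea without the scikit-learn dependency, here is a small pure-Python sketch of class-balanced batching. It is not how StratifiedSampler is implemented; the function stratified_batches is hypothetical and uses a simple round-robin interleave. It only gives exactly balanced batches when the classes are the same size and batch_size is a multiple of the number of classes.

```python
import random
from collections import defaultdict
from itertools import zip_longest


def stratified_batches(labels, batch_size, seed=None):
    """Return lists of sample indices whose classes are spread across batches."""
    rng = random.Random(seed)
    # Group sample indices by class label, then shuffle within each class.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for indices in by_class.values():
        rng.shuffle(indices)
    # Interleave the classes round-robin so neighbouring indices differ in class.
    missing = object()
    interleaved = [
        i
        for group in zip_longest(*by_class.values(), fillvalue=missing)
        for i in group
        if i is not missing
    ]
    # Cut the interleaved order into consecutive batches.
    return [interleaved[s:s + batch_size]
            for s in range(0, len(interleaved), batch_size)]


labels = [0, 0, 1, 1, 2, 2, 3, 3]
for batch in stratified_batches(labels, batch_size=4, seed=0):
    print(sorted(labels[i] for i in batch))
# [0, 1, 2, 3]
# [0, 1, 2, 3]
```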
Here's the scikit-learn reference. Any of these could be used:
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection
This is wonderful, thanks.
I was not aware that you reset the iterator internally, I must have overlooked that when I skimmed the code.
One last thing, though it's minor: if you ask for a batch size greater than the number of samples left before a reset, it will return an incomplete batch containing only what it has left over. Maybe have a flag that would guarantee the batch size by watching for that special case?
Also, there is some odd behavior (basically bugs):
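One possible way to guarantee the batch size is sketched below: when a pass runs out mid-batch, start the next pass and keep filling. This is only an illustration of the suggestion, not a proposed patch; fixed_size_batches is a hypothetical generator over sample indices, and a batch can contain repeated indices if batch_size exceeds the dataset size.

```python
import random


def fixed_size_batches(n_samples, batch_size, n_batches, shuffle=False, seed=None):
    """Yield n_batches index lists of exactly batch_size, wrapping across passes."""
    if n_samples <= 0 or batch_size <= 0:
        raise ValueError("n_samples and batch_size must be positive")
    rng = random.Random(seed)
    order, cursor = [], 0
    for _ in range(n_batches):
        batch = []
        while len(batch) < batch_size:
            if cursor >= len(order):
                # Pass exhausted: start a new one, reshuffling if requested.
                order = list(range(n_samples))
                if shuffle:
                    rng.shuffle(order)
                cursor = 0
            # Take only as many indices as the batch still needs.
            take = order[cursor:cursor + batch_size - len(batch)]
            batch.extend(take)
            cursor += len(take)
        yield batch


batches = list(fixed_size_batches(n_samples=10, batch_size=4, n_batches=3))
print(batches)
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1]]
```

The trade-off is that a batch can straddle two passes, so "one epoch" no longer lines up with a whole number of batches.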
x = torch.ones(10,1,30,30)
y = torch.from_numpy(np.arange(10))
loader = TensorDataset(x, y, batch_size=3)
xbatch, ybatch = loader.next_batch(5)
print(ybatch.numpy())
xbatch, ybatch = loader.next_batch(7)
print(ybatch.numpy())
xbatch, ybatch = loader.next_batch()
print(ybatch.numpy())
The last fetch causes an error:
anaconda/lib/python2.7/site-packages/torchsample/dataset_iter.py", line 151, in __next__
raise StopIteration
StopIteration
The error goes away with TensorDataset(x, y, batch_size=5) for some reason.
The following
x = torch.ones(10,1,30,30)
y = torch.from_numpy(np.arange(10))
loader = TensorDataset(x, y, batch_size=6)
xbatch, ybatch = loader.next_batch(2)
print(ybatch.numpy())
xbatch, ybatch = loader.next_batch(3)
print(ybatch.numpy())
xbatch, ybatch = loader.next_batch(2)
print(ybatch.numpy())
xbatch, ybatch = loader.next_batch()
print(ybatch.numpy())
results in
[0 1]
[2 3 4]
[0 1]
[2 3 4 5 6 7]
It's not clear why it skipped; in some other scenarios it didn't. Just some peculiar behavior.