Comments (9)
For extractive summarization, batch_size is the maximum number of sentences from the source documents in a batch.
For abstractive summarization, batch_size is the maximum number of tokens in the target summaries in a batch.
It is designed to use memory more effectively than a fixed sample count.
from presumm.
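The budget-based batching described above can be sketched as a simple generator. This is a minimal illustration of the idea (hypothetical helper name, not the actual PreSumm data_loader code), assuming batch_size acts as a budget on n_samples * max_sample_length, i.e. the padded memory footprint of a batch:

```python
def token_budget_batches(samples, batch_size):
    """Group token lists into batches so that, for each batch,
    len(batch) * max(len(s) for s in batch) <= batch_size.

    The number of samples per batch is therefore dynamic: batches of
    short samples hold many items, batches with a long sample hold few.
    A single sample longer than the budget still forms a batch of one.
    """
    batch, max_len = [], 0
    for sample in samples:
        new_max = max(max_len, len(sample))
        # adding this sample would blow the padded-size budget: flush
        if batch and (len(batch) + 1) * new_max > batch_size:
            yield batch
            batch = []
            new_max = len(sample)
        batch.append(sample)
        max_len = new_max
    if batch:
        yield batch
```

With a budget of 100, two length-10 samples fit together, but a length-50 sample forces a new batch, mirroring the "dynamic batch size" behaviour discussed below.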
As I understand it so far:
At each step, the Trainer class runs accum_count mini-batches.
Each mini-batch holds N samples x_1, ..., x_N, where N varies subject to:
N * max(len(x_i) for i in 1..N) <= batch_size
Please let me know if I understood correctly!
I think some comments on this function:
PreSumm/src/models/data_loader.py
Lines 112 to 124 in fa69433
might be helpful!
Then, what is the exact batch size in the real (original) sense?
Batch size (by its traditional definition) is not a fixed number here. This is designed to use memory much more efficiently than a fixed number would.
@jdh3577 As I said, this is dynamic during training.
@nlpyang would decreasing the batch size to 512 in Extractive summarization affect performance?
Hi, does the batch size here have something to do with the number of GPUs? Since this uses distributed training, how does the model update its parameters? Do all GPUs merge their gradients and then update?
Maybe this is the relevant code:

if self.grad_accum_count > 1:
    if self.n_gpu > 1:
        grads = [p.grad.data for p in self.model.parameters()
                 if p.requires_grad
                 and p.grad is not None]
        distributed.all_reduce_and_rescale_tensors(
            grads, float(1))
    for o in self.optims:
        o.step()
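In the snippet above, each parameter's gradient is summed across GPUs, after which every replica steps its own optimizer on identical gradients. A minimal single-process sketch of that reduce-and-rescale idea follows; plain Python lists stand in for per-GPU gradient tensors, and the function name only mirrors, but is not, the actual distributed helper:

```python
def all_reduce_and_rescale(per_gpu_grads, rescale=1.0):
    """Element-wise sum of gradients across workers, then rescale.

    With rescale=1.0 (float(1) in the snippet above) gradients are
    summed rather than averaged; loss normalization or the learning
    rate is assumed to absorb the world-size factor.
    """
    summed = [sum(vals) * rescale for vals in zip(*per_gpu_grads)]
    # every worker receives the same reduced gradient, so all model
    # replicas stay in sync after the optimizer step
    return [list(summed) for _ in per_gpu_grads]
```

Because every GPU ends up with the same gradient, each replica's optimizer step produces identical parameters, which is what keeps the data-parallel copies consistent without a separate parameter broadcast.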
@nlpyang could you please shed some light on the meaning of this parameter, it clearly isn't the number of documents in the batch, but something related to the number of word-pieces multiplied by a funny factor 300. Is the latter a typo or a magic number inserted on purpose ?
Thanks