Comments (23)
from transformers.
Could it be a corpus problem? BERT was trained on Wikipedia. I trained a mini BERT on kp20k, and its accuracy on the test set is currently 80%. Would you like to try using mine as the encoder?
Hi guys,
I would like to keep the issues of this repository focused on the package itself.
I also think it's better to keep the conversation in English so everybody can participate.
Please move this conversation to your repository: https://github.com/memray/seq2seq-keyphrase-pytorch or emails.
Thanks, I am closing this discussion.
Best,
Have you tried a Transformer decoder instead of the RNN decoder?
Not yet, but I will try. Still, I don't think the RNN decoder should be that bad.
Hmm, maybe you should use the mean of the last layer to initialize the decoder, not the last token's representation from the last layer.
I am also very interested in the results of using a Transformer decoder. When you are done, could you let me know? Thank you.
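The suggestion above can be sketched in PyTorch. All shapes are hypothetical, and the GRU decoder is a stand-in for the thread's actual model:

```python
import torch
import torch.nn as nn

# Hypothetical shapes for BERT's last hidden layer on a batch of sequences.
batch, seq_len, hidden = 2, 512, 768
last_hidden = torch.randn(batch, seq_len, hidden)

# Option 1 (what the thread describes): only the last token's representation.
init_last_token = last_hidden[:, -1, :]   # (batch, hidden)

# Option 2 (the suggestion): mean-pool over all token representations.
init_mean = last_hidden.mean(dim=1)       # (batch, hidden)

# Either vector can initialize an RNN decoder's hidden state.
decoder = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)
h0 = init_mean.unsqueeze(0)               # (num_layers=1, batch, hidden)
tgt_emb = torch.randn(batch, 8, hidden)   # hypothetical target-side embeddings
out, hn = decoder(tgt_emb, h0)
```

The mean carries information from every position, whereas the last token's vector is a single position's view of the sequence.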
I think the batch size of RNN with BERT is too small. Please see
https://github.com/memray/seq2seq-keyphrase-pytorch/blob/master/pykp/dataloader.py
lines 377-378.
I don't understand what you mean by giving me this link. I set it to 10 precisely because of the memory problem. Actually, when the sentence length is 512, the maximum batch size is only 5; at 6 or larger my GPU runs out of memory.
You are right, maybe the mean is better; I will try that as well. Thanks.
May I ask a question: are you Chinese? Hahaha
Because each example has N targets, and we want to put all of an example's targets in the same batch. 10 is so small that one example's targets would probably end up split across batches.
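The batching constraint described above can be sketched in plain Python. The example format, function name, and limit are made up for illustration:

```python
def batch_by_example(examples, max_batch):
    """Group (src, target) pairs so all targets of one example share a batch.

    `examples` is a hypothetical list of (src, [targets]) tuples, as in
    keyphrase generation where one document has several target phrases.
    """
    batches, cur = [], []
    for src, targets in examples:
        pairs = [(src, t) for t in targets]
        if len(pairs) > max_batch:
            raise ValueError("one example has more targets than max_batch")
        # Start a new batch rather than splitting this example's targets.
        if len(cur) + len(pairs) > max_batch:
            batches.append(cur)
            cur = []
        cur.extend(pairs)
    if cur:
        batches.append(cur)
    return batches

examples = [("doc1", ["kp1", "kp2", "kp3"]), ("doc2", ["kp4", "kp5"])]
batches = batch_by_example(examples, max_batch=4)
```

With a hard memory cap of 5 or 10 pairs per batch, an example with many targets forces either a near-empty batch or a split, which is the concern raised here.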
I know, but it's the same problem ... my memory is limited, so ...
PS. I am Chinese.
I am as well, hahaha.
Sorry, I didn't make this clear: the accuracy is for the masked LM and next-sentence prediction tasks, not for key phrase generation. My compute is limited (two P100s); it has been almost a month and training still hasn't finished. 80% is the current performance.
What do you mean by the "mini BERT" you mentioned?
I roughly see what you mean: you essentially re-pretrained a BERT on kp20k. But doing that ... honestly feels like quite a lot of trouble.
Yes, I used Junseong Kim's code: https://github.com/codertimo/BERT-pytorch . The model is much smaller than Google's BERT-Base Uncased; this one is L-8 H-256 A-8. I'll send you the current training checkpoint and the vocab file.
But can my version use your checkpoint directly, or do I have to install your version of the code?
You can send it to my email [email protected], thanks.
You can create a BERT model from Junseong Kim's code and then load the parameters; you don't necessarily have to install it.
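The generic PyTorch pattern behind that reply is: rebuild the architecture from the model-definition code, then restore the checkpoint with `load_state_dict`. A minimal sketch with a stand-in module (with BERT-pytorch you would construct its BERT class the same way; the module and buffer here are purely illustrative):

```python
import io
import torch
import torch.nn as nn

def build_model():
    # Stand-in for rebuilding the mini BERT (L-8 H-256 A-8) from its code.
    return nn.Sequential(nn.Embedding(30000, 256), nn.Linear(256, 256))

model = build_model()

# Simulate "checkpoint file -> fresh model" with an in-memory buffer;
# in practice you would torch.save/torch.load a file path instead.
buf = io.BytesIO()
torch.save(model.state_dict(), buf)
buf.seek(0)

fresh = build_model()
fresh.load_state_dict(torch.load(buf))
```

Only the code that defines the architecture is needed; no package installation beyond that.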
OK then. Send me the checkpoint and I'll give it a try.
Hi, could you send me the mini model too? [email protected], thanks a lot.
Hi @whqwill, I have some doubts about how BERT is used with the RNN.
In the BERT-with-RNN method, I see you only use the last token's representation (I mean T_N's) as the input to the RNN decoder. Why not use the other tokens' representations, like T_1 to T_{N-1}? I think the last token carries too little information to represent the whole context.
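The alternative being suggested (letting the decoder read all of T_1..T_N rather than conditioning only on T_N) can be sketched as a single dot-product attention step over the encoder outputs; all shapes are hypothetical:

```python
import torch
import torch.nn.functional as F

batch, src_len, hidden = 2, 10, 16
enc_states = torch.randn(batch, src_len, hidden)  # T_1 .. T_N from the encoder
dec_hidden = torch.randn(batch, hidden)           # current decoder state

# Score every encoder position against the decoder state, normalize,
# and take the weighted sum as a context vector for this decoding step.
scores = torch.bmm(enc_states, dec_hidden.unsqueeze(2)).squeeze(2)  # (batch, src_len)
weights = F.softmax(scores, dim=1)
context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)    # (batch, hidden)
```

Concatenating `context` with the decoder state at each step gives the decoder access to the full sequence instead of a single summary vector.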