
bert-entity-extraction's Introduction

Hi there 👋

I'm a data scientist / machine learning engineer.

bert-entity-extraction's People

Contributors

abhishekkrthakur, zenithexpo


bert-entity-extraction's Issues

Dataset of bert-entity-extraction

Hello
I am interested in bert-entity-extraction and want to run your code, but I could not find the dataset or a path to it anywhere in your repository.
Could you kindly tell me how I can get the dataset?
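
A note for anyone with the same question: the code expects a token-level NER CSV, and the Kaggle "Entity Annotated Corpus" (ner_dataset.csv) is the dataset typically used with this project. The sketch below shows one way to load it; the file path and column names are assumptions, so adjust them to wherever you place the CSV.

import pandas as pd

# Hypothetical location; point this at your own copy of ner_dataset.csv.
df = pd.read_csv("input/ner_dataset.csv", encoding="latin-1")

# "Sentence #" is only filled on the first token of each sentence, so forward-fill it.
df["Sentence #"] = df["Sentence #"].ffill()

sentences = df.groupby("Sentence #")["Word"].apply(list).values
tags = df.groupby("Sentence #")["Tag"].apply(list).values
print(len(sentences), "sentences loaded")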

Doubt about CUDA

Hello,

I have a question about the use of CUDA. The computer where I run the code has an RTX 3050, and I don't know whether that lets me use the device you propose in your code. If it can't, can the code still be run, or can I install cudatoolkit=11 anyway? I am very new to this and have many questions.

Thanks!
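
A quick way to answer this yourself is to ask PyTorch directly; the sketch below only assumes a PyTorch install with a CUDA build. Ampere cards such as the RTX 3050 need a CUDA 11 (or newer) build of PyTorch, so installing cudatoolkit=11 alongside it is the usual route, and if no GPU build is available the code can still fall back to the CPU device.

import torch

print(torch.cuda.is_available())           # True only if a usable GPU + CUDA build is present
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. an RTX 3050
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)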

OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']

Hi:

I am very new to using BERT and just followed along with your YouTube video describing how to build an entity extraction workflow. I downloaded the uncased_L-12_H-768_A-12 BERT model to use with the tutorial; however, when I run the train.py script I get the following error:

OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']

Would you be able to advise me on how to get past this?

v/r,
L
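
A likely explanation: the uncased_L-12_H-768_A-12 archive is Google's original TensorFlow checkpoint, whose files are named bert_model.ckpt.*, so transformers' from_pretrained() finds none of pytorch_model.bin / tf_model.h5 / model.ckpt.index in that folder. One workaround (a sketch, not the author's exact setup; the local path below is an assumption) is to download the Hugging Face copy of the same model and point the config's base model path at it:

from transformers import BertModel, BertTokenizer

model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Writes config.json plus the weights (pytorch_model.bin or model.safetensors,
# depending on the transformers version) and vocab.txt into a local folder.
model.save_pretrained("input/bert-base-uncased")
tokenizer.save_pretrained("input/bert-base-uncased")
# Then set the base model path in the project's config to input/bert-base-uncased.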

I keep getting an error

Hello, I am trying to study Abhishek's code but keep getting an error. At first I tried to modify the training and validation functions so that the _, _, loss = model() call receives all the parameters explicitly, but that still gave me the same error. Please note that the error arises without any modification whatsoever.
Here's my notebook (note: it's the exact same code provided):

TypeError                                 Traceback (most recent call last)
<ipython-input-34-2952253b6321> in <module>()
     87         optimizer,
     88         device,
---> 89         scheduler
     90     )
     91     test_loss = eval_fn(

5 frames
<ipython-input-26-42378f43f4c8> in train_fn(data_loader, model, optimizer, device, scheduler)
     16 
     17     optimizer.zero_grad()
---> 18     _, _, loss = model(ids = ids, mask=mask, token_type_ids = token_type_ids, target_pos = target_pos, target_tag = target_tag)
     19     loss.backward()
     20     optimizer.step()

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

<ipython-input-27-6b85467d75fd> in forward(self, ids, mask, token_type_ids, target_pos, target_tag)
     39         )
     40 
---> 41         bo_tag = self.bert_drop_1(o1)
     42         bo_pos = self.bert_drop_2(o1)
     43 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/dropout.py in forward(self, input)
     56 
     57     def forward(self, input: Tensor) -> Tensor:
---> 58         return F.dropout(input, self.p, self.training, self.inplace)
     59 
     60 

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in dropout(input, p, training, inplace)
    981     return (_VF.dropout_(input, p, training)
    982             if inplace
--> 983             else _VF.dropout(input, p, training))
    984 
    985 

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str
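
A likely cause (this assumes a transformers v4+ install, where models return a dict-like ModelOutput by default): tuple-unpacking that output yields its keys, which are strings, so o1 ends up as "last_hidden_state" and dropout() then receives a str. The self-contained sketch below reproduces the symptom and shows two common fixes; the toy token ids are only for illustration.

import torch
import transformers

bert = transformers.BertModel.from_pretrained("bert-base-uncased")
ids = torch.tensor([[101, 7592, 102]])   # toy input: [CLS] hello [SEP]
mask = torch.ones_like(ids)
token_type_ids = torch.zeros_like(ids)

# The problematic pattern under v4 defaults: o1 is the *key* "last_hidden_state".
o1, _ = bert(ids, attention_mask=mask, token_type_ids=token_type_ids)
print(type(o1))   # <class 'str'>

# Fix 1: ask for the old tuple output again.
o1, _ = bert(ids, attention_mask=mask, token_type_ids=token_type_ids, return_dict=False)

# Fix 2: keep the ModelOutput and take the hidden states explicitly.
o1 = bert(ids, attention_mask=mask, token_type_ids=token_type_ids).last_hidden_state
print(o1.shape)   # torch.Size([1, 3, 768])

Either change inside the model's forward() makes o1 a Tensor again, and the dropout call goes through.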

Purpose of loss_fn()?

Hello Abhishek and fellow NLP enthusiasts, I have a question. I read the documentation for BertModel, and as far as I know it already returns a cross-entropy loss by default, so I don't understand the reason for creating a loss_fn() for this purpose. Also, the attention mask already handles the padded sequence, so it should allow the loss to be calculated only on the non-padded tokens. Can someone explain why a separate function is created here? Sorry if I am missing something :)
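
For context: plain BertModel only returns hidden states (the loss-returning behaviour belongs to heads like BertForTokenClassification when labels are passed), and the attention mask controls what self-attention looks at, not which positions count toward the loss, so a masked cross-entropy is still needed. The sketch below shows the standard pattern such a loss_fn implements; it mirrors the usual recipe rather than being the repo's exact code.

import torch
import torch.nn as nn

def loss_fn(output, target, mask, num_labels):
    lfn = nn.CrossEntropyLoss()
    active_loss = mask.view(-1) == 1                  # True for real (non-padded) tokens
    active_logits = output.view(-1, num_labels)
    active_labels = torch.where(                      # padded positions -> ignore_index
        active_loss,
        target.view(-1),
        torch.tensor(lfn.ignore_index).type_as(target),
    )
    return lfn(active_logits, active_labels)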

Error while loading model.bin

Hi, has anyone faced the error below while loading model.bin?


---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/usr/lib/python3.7/pickle.py in load_persid(self)
   1119         try:
-> 1120             pid = self.readline()[:-1].decode("ascii")
   1121         except UnicodeDecodeError:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 63: ordinal not in range(128)

During handling of the above exception, another exception occurred:

UnpicklingError                           Traceback (most recent call last)
4 frames
/usr/lib/python3.7/pickle.py in load_persid(self)
   1121         except UnicodeDecodeError:
   1122             raise UnpicklingError(
-> 1123                 "persistent IDs in protocol 0 must be ASCII strings")
   1124         self.append(self.persistent_load(pid))
   1125     dispatch[PERSID[0]] = load_persid

UnpicklingError: persistent IDs in protocol 0 must be ASCII strings
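
One common cause of this particular UnpicklingError (offered as a guess, not a confirmed diagnosis): model.bin was written with torch.save(), whose pickle stream uses persistent IDs for tensor storage, so opening it with plain pickle.load(), joblib, or pandas fails like this; a truncated or partially downloaded .bin can produce the same error. Loading it with torch.load is the usual first thing to try:

import torch

# Hypothetical path; point this at your trained checkpoint.
state_dict = torch.load("model.bin", map_location="cpu")
print(list(state_dict.keys())[:5])   # sanity check: parameter names should appear
# Then, on an instantiated model of the same architecture:
# model.load_state_dict(state_dict)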

Getting ValueError while running through the code

The following is the traceback for your reference:

Traceback (most recent call last):
  File "train.py", line 102, in <module>
    train_loss = engine.train_fn(train_data_loader, model, optimizer, device, scheduler)
  File "/home/jeetkarshus/Notebooks/v1/Python_Scripts/src/engine.py", line 8, in train_fn
    for data in tqdm(data_loader, total=len(data_loader)):
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/tqdm/std.py", line 1104, in __iter__
    for obj in iterable:
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
    return self._process_data(data)
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jeetkarshus/Notebooks/v1/Python_Scripts/src/dataset.py", line 27, in __getitem__
    add_special_tokens=False
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1425, in encode
    **kwargs,
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1737, in encode_plus
    **kwargs,
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 454, in _encode_plus
    first_ids = get_input_ids(text)
  File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 442, in get_input_ids
    f"Input {text} is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input nan is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

I tried using a smaller batch size, but I still get the same issue.
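
The error message points at the actual problem: the tokenizer is being handed a float NaN from the dataframe rather than a string, so the batch size is not the issue. A common fix (a sketch that assumes the Kaggle-style columns "Sentence #", "Word" and "Tag"; the path is an assumption) is to clean the frame before building the dataset:

import pandas as pd

df = pd.read_csv("input/ner_dataset.csv", encoding="latin-1")
df["Sentence #"] = df["Sentence #"].ffill()    # sentence id appears only on the first token
df = df.dropna(subset=["Word", "Tag"])         # drop rows with missing words or tags
df["Word"] = df["Word"].astype(str)            # guard against stray non-string cells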
