I'm a data scientist / machine learning engineer.
bert-entity-extraction's Introduction
bert-entity-extraction's Issues
dataset of bert-entity-extraction
Hello
I am interested in "bert-entity-extraction" and want to run your code, but I could not find the dataset or any path to it in your repository.
Could you kindly tell me how I can get the dataset?
Doubt about cuda
Hello,
I have a question about using CUDA. The computer I run the code on has an RTX 3050, and I don't know whether it can use the device you specify in your code. If it can't, can the code still be run, or can I install cudatoolkit=11 anyway? I am very new to this and have many questions.
Thanks!
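A minimal sketch of how the device question can be checked from Python, assuming PyTorch is the framework in use (the tracebacks in the other issues suggest so). As far as I know, an RTX 3050 is an Ampere-generation card and generally needs a PyTorch build compiled against CUDA 11 or newer. `pick_device` is a hypothetical helper name, not something from the repository.

```python
# Guarded import so the sketch also runs where PyTorch is not installed.
try:
    import torch
except ImportError:
    torch = None


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a usable GPU, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
if device == "cuda":
    # Report what PyTorch detected; useful when checking a new card.
    print(torch.cuda.get_device_name(0), "CUDA build:", torch.version.cuda)
```

If this prints "cpu" on a machine with a GPU, the installed PyTorch build or the driver is the first thing to look at. The code should still run on the CPU, only more slowly.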
TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']
Hi:
I am very new to using BERT and just followed along with your youtube.com video describing how to build an entity extraction workflow. I downloaded the uncased_L-12_H-768_A-12 BERT model to use with the tutorial; however, when I run the train.py script I get the following error:
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index']
Would you be able to direct me on how to overcome this?
v/r,
L
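The error lists the weight file names the loader looks for, so a quick check of the model directory can narrow this down. The sketch below only uses the three names from the error message. `find_weight_files` and `describe_model_dir` are hypothetical helpers. As far as I know, the original uncased_L-12_H-768_A-12 download contains TensorFlow checkpoint files with a `bert_model.ckpt` prefix, which would not match any of these names.

```python
from pathlib import Path

# File names taken from the error message above.
EXPECTED_WEIGHT_FILES = ("pytorch_model.bin", "tf_model.h5", "model.ckpt.index")


def find_weight_files(model_dir):
    """Return the expected weight files that actually exist in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_WEIGHT_FILES if (root / name).is_file()]


def describe_model_dir(model_dir):
    """Print what the directory contains and whether a weight file was found."""
    root = Path(model_dir)
    present = sorted(p.name for p in root.iterdir()) if root.is_dir() else []
    found = find_weight_files(root)
    print(f"Files in {root}: {present}")
    print(f"Recognised weight files: {found or 'none'}")
    return found
```

Calling `describe_model_dir` on the path the training config points to shows whether the directory holds a checkpoint under one of the expected names. If it does not, a PyTorch-format copy of the same model (or a converted checkpoint) is probably what the loader needs.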
i keep getting an Error code
Hello, I am trying to study Abhishek's code but keep getting an error. At first I tried to modify the training and validation functions so that the call _, _, loss = model() passes all the parameters explicitly, but that still gave me the same error. Please note that the error arises without any modification whatsoever.
Here's my notebook (NOTE: it's the exact same code provided):
TypeError Traceback (most recent call last)
<ipython-input-34-2952253b6321> in <module>()
87 optimizer,
88 device,
---> 89 scheduler
90 )
91 test_loss = eval_fn(
5 frames
<ipython-input-26-42378f43f4c8> in train_fn(data_loader, model, optimizer, device, scheduler)
16
17 optimizer.zero_grad()
---> 18 _, _, loss = model(ids = ids, mask=mask, token_type_ids = token_type_ids, target_pos = target_pos, target_tag = target_tag)
19 loss.backward()
20 optimizer.step()
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
<ipython-input-27-6b85467d75fd> in forward(self, ids, mask, token_type_ids, target_pos, target_tag)
39 )
40
---> 41 bo_tag = self.bert_drop_1(o1)
42 bo_pos = self.bert_drop_2(o1)
43
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/dropout.py in forward(self, input)
56
57 def forward(self, input: Tensor) -> Tensor:
---> 58 return F.dropout(input, self.p, self.training, self.inplace)
59
60
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in dropout(input, p, training, inplace)
981 return (_VF.dropout_(input, p, training)
982 if inplace
--> 983 else _VF.dropout(input, p, training))
984
985
TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str
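A likely cause, stated as an assumption: newer releases of the transformers library return a dict-like output object from the BERT model instead of a plain tuple, and unpacking a dict-like object yields its keys. The sketch below reproduces that behaviour with a plain `OrderedDict` standing in for the model output. `fake_output` and its placeholder values are made up for illustration.

```python
from collections import OrderedDict

# Stand-in for a dict-like model output (assumption: newer transformers
# releases return such an object instead of a plain tuple).
fake_output = OrderedDict(
    last_hidden_state=[[0.1, 0.2]],  # placeholder for a tensor
    pooler_output=[0.3],             # placeholder for a tensor
)

# Tuple-unpacking a mapping iterates over its KEYS, so o1 becomes a string.
o1, _ = fake_output
print(type(o1).__name__, o1)  # str last_hidden_state

# Indexing by key returns the actual value.
hidden = fake_output["last_hidden_state"]
print(type(hidden).__name__)  # list
```

If this is what happens in the notebook, the string `o1` is what reaches the dropout layer. With transformers itself, passing `return_dict=False` when calling the model should restore tuple output, or the hidden states can be read by key as above.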
Purpose of loss_fn()?
Hello Abhishek and my fellow NLP enthusiasts, I have a question. I read the documentation for BERTmodel and, as far as I know, it already returns cross-entropy loss by default, so I don't understand the reason for creating a loss_fn() for this purpose. Also, the attention mask already handles the padded sequence, so it allows the loss to be calculated only on the non-padded tokens. Can someone let me know why a separate function is created here? Sorry if I am missing something :)
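For context, a hedged note: as far as I know, the plain BERT encoder returns hidden states rather than a loss, and the attention mask controls which tokens are attended to, not which tokens count in the loss. The tracebacks in the other issues show a model with two heads (tag and POS), so a custom loss over the non-padded positions is one plausible reason for loss_fn(). The pure-Python sketch below illustrates the idea only and is not the repository's implementation. `cross_entropy` and `masked_loss` are hypothetical names.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one token: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]


def masked_loss(token_logits, targets, mask):
    """Mean cross-entropy over positions where mask == 1 only."""
    losses = [
        cross_entropy(logits, target)
        for logits, target, keep in zip(token_logits, targets, mask)
        if keep == 1
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Two real tokens followed by one padding position.
logits = [[2.0, 0.0], [0.0, 2.0], [5.0, -5.0]]
targets = [0, 1, 1]
mask = [1, 1, 0]
print(masked_loss(logits, targets, mask))
```

With the mask applied, the badly predicted padding position does not contribute. Setting the mask to all ones would pull that position into the average and raise the loss.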
error while loading model.bin
Hi, has anyone faced the error below while loading model.bin?
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
/usr/lib/python3.7/pickle.py in load_persid(self)
1119 try:
-> 1120 pid = self.readline()[:-1].decode("ascii")
1121 except UnicodeDecodeError:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 63: ordinal not in range(128)
During handling of the above exception, another exception occurred:
UnpicklingError Traceback (most recent call last)
4 frames
/usr/lib/python3.7/pickle.py in load_persid(self)
1121 except UnicodeDecodeError:
1122 raise UnpicklingError(
-> 1123 "persistent IDs in protocol 0 must be ASCII strings")
1124 self.append(self.persistent_load(pid))
1125 dispatch[PERSID[0]] = load_persid
UnpicklingError: persistent IDs in protocol 0 must be ASCII strings
Getting ValueError while running through the code
The following is the traceback for your reference:
Traceback (most recent call last):
File "train.py", line 102, in <module>
train_loss = engine.train_fn(train_data_loader, model, optimizer, device, scheduler)
File "/home/jeetkarshus/Notebooks/v1/Python_Scripts/src/engine.py", line 8, in train_fn
for data in tqdm(data_loader, total=len(data_loader)):
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/tqdm/std.py", line 1104, in __iter__
for obj in iterable:
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
return self._process_data(data)
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/jeetkarshus/Notebooks/v1/Python_Scripts/src/dataset.py", line 27, in __getitem__
add_special_tokens=False
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1425, in encode
**kwargs,
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1737, in encode_plus
**kwargs,
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 454, in _encode_plus
first_ids = get_input_ids(text)
File "/home/jeetkarshus/anaconda3/envs/NER/lib/python3.6/site-packages/transformers/tokenization_utils.py", line 442, in get_input_ids
f"Input {text} is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input nan is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
I tried using a smaller batch size, but I still get the same issue.
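The message "Input nan is not valid" suggests that one of the values handed to the tokenizer is a float NaN rather than a string, so batch size would not be expected to matter. This typically happens when the word column has empty cells, or cells such as "NA" or "null" that pandas reads as missing by default. A minimal sketch of filtering such entries before tokenization follows. `is_missing` and `clean_sentence` are hypothetical helpers, and the sample words and tags are made up.

```python
import math


def is_missing(value):
    """True for None and for float NaN (how pandas marks empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_sentence(words, tags):
    """Drop (word, tag) pairs whose word is missing; cast the rest to str."""
    kept = [(str(w), t) for w, t in zip(words, tags) if not is_missing(w)]
    return [w for w, _ in kept], [t for _, t in kept]


words = ["London", float("nan"), "calling"]
tags = ["B-geo", "O", "O"]
print(clean_sentence(words, tags))  # (['London', 'calling'], ['B-geo', 'O'])
```

Dropping the word and its tag together keeps the two lists aligned. If the "missing" cells are really literal tokens such as "NA", reading the file with default NaN parsing turned off is probably the better fix.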