Comments (12)
I use the original tag type from conll03 and didn't change the format of the tag column. It works. Conceptually, I don't think the format of the tag column affects whether the code runs; it's just a human-readable labeling convention.
from neuronlp2.
@jk78346 @udion I guess the original tagging type from conll03 works, but converting it to BIO (or more advanced BIOES) can improve the performance.
After some tracing, I added one print line inside getNext() in ./neuronlp2/io/reader.py (and another print(inst...) line outside it):
class CoNLL03Reader(object):
    def __init__(self, file_path, word_alphabet, char_alphabet, pos_alphabet, chunk_alphabet, ner_alphabet):
        self.__source_file = open(file_path, 'r')
        self.__word_alphabet = word_alphabet
        self.__char_alphabet = char_alphabet
        self.__pos_alphabet = pos_alphabet
        self.__chunk_alphabet = chunk_alphabet
        self.__ner_alphabet = ner_alphabet

    def close(self):
        self.__source_file.close()

    def getNext(self, normalize_digits=True):
        line = self.__source_file.readline()
        print("line = ", line, ", self.__source_file = ", self.__source_file)
        # skip multiple blank lines.
        while len(line) > 0 and len(line.strip()) == 0:
            line = self.__source_file.readline()
        if len(line) == 0:
            return None
And I got the following terminal output (partial):
Reading data from data/conll2003/NeuroNLP2_sep=s_eng_train.txt
('line = ', '1 EU NNP I-NP I-ORG\n', ', self.__source_file = ', <open file 'data/conll2003/NeuroNLP2_sep=s_eng_train.txt', mode 'r' at 0x7f18ff9af8a0>)
('inst = ', <neuronlp2.io.instance.NERInstance object at 0x7f18f623d750>)
('line = ', '', ', self.__source_file = ', <open file 'data/conll2003/NeuroNLP2_sep=s_eng_train.txt', mode 'r' at 0x7f18ff9af8a0>)
('inst = ', None)
Total number of data: 1
So, does this mean that Python's built-in readline() function has a problem reading the second line of a .txt file?
Or, more likely, is something wrong with my data format?
It looks weird because the reader only got one training instance from your data.
Would you please paste more training instances so I can check your data format?
@XuezheMax The following is part of my train data format:
1 EU NNP I-NP I-ORG
2 rejects VBZ I-VP O
3 German JJ I-NP I-MISC
4 call NN I-NP O
5 to TO I-VP O
6 boycott VB I-VP O
7 British JJ I-NP I-MISC
8 lamb NN I-NP O
9 . . O O
10 Peter NNP I-NP I-PER
11 Blackburn NNP I-NP I-PER
12 BRUSSELS NNP I-NP I-LOC
13 1996-08-22 CD I-NP O
14 The DT I-NP O
15 European NNP I-NP I-ORG
16 Commission NNP I-NP I-ORG
17 said VBD I-VP O
18 on IN I-PP O
19 Thursday NNP I-NP O
20 it PRP B-NP O
21 disagreed VBD I-VP O
22 with IN I-PP O
23 German JJ I-NP I-MISC
24 advice NN I-NP O
25 to TO I-PP O
26 consumers NNS I-NP O
27 to TO I-VP O
28 shun VB I-VP O
29 British JJ I-NP I-MISC
30 lamb NN I-NP O
31 until IN I-SBAR O
32 scientists NNS I-NP O
33 determine VBP I-VP O
34 whether IN I-SBAR O
35 mad JJ I-NP O
36 cow NN I-NP O
37 disease NN I-NP O
38 can MD I-VP O
39 be VB I-VP O
40 transmitted VBN I-VP O
41 to TO I-PP O
42 sheep NN I-NP O
43 . . O O
44 Germany NNP I-NP I-LOC
45 's POS B-NP O
46 representative NN I-NP O
47 to TO I-PP O
48 the DT I-NP O
"./NeuroNLP2_sep=s_eng_train.txt" 204566L, 4589269C
@rsb3060 I'm wondering whether the data format really affects whether the model code can run at all, as long as the file has five columns?
Hi, as this terminal output line indicates:
'1 EU NNP I-NP I-ORG\n'
I put '\n' at the end of each line, and separate each column with one space.
Is this correct?
@jk78346
There should be a blank line between two sentences. Otherwise, the reader will treat them as a single sentence.
The following is the correct format for your examples:
1 EU NNP I-NP I-ORG
2 rejects VBZ I-VP O
3 German JJ I-NP I-MISC
4 call NN I-NP O
5 to TO I-VP O
6 boycott VB I-VP O
7 British JJ I-NP I-MISC
8 lamb NN I-NP O
9 . . O O

1 Peter NNP I-NP I-PER
2 Blackburn NNP I-NP I-PER
3 BRUSSELS NNP I-NP I-LOC
4 1996-08-22 CD I-NP O
...
Moreover, your NER tagging scheme is not BIO (an entity should start with a B- tag, but yours start with I-). Please convert it accordingly.
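For reference, the conversion could look roughly like the sketch below. It assumes the tags follow the original CoNLL-2003 convention (IOB1), where an entity normally starts with I- and B- only appears when two entities of the same type are adjacent. The function name iob1_to_bio is made up for illustration and is not part of NeuroNLP2; the function works on the tag column of one sentence at a time.

```python
def iob1_to_bio(tags):
    """Convert one sentence's IOB1 tags to BIO (IOB2) tags.

    Hypothetical helper for illustration; not part of NeuroNLP2.
    """
    bio = []
    prev = "O"
    for tag in tags:
        if tag == "O":
            bio.append(tag)
        else:
            prefix, label = tag.split("-", 1)
            prev_label = prev.split("-", 1)[1] if prev != "O" else None
            # A new entity starts when the tag is already B-, or when the
            # previous tag is O or carries a different entity type.
            if prefix == "B" or prev_label != label:
                bio.append("B-" + label)
            else:
                bio.append("I-" + label)
        prev = tag
    return bio


# NER column of "EU rejects German call"
print(iob1_to_bio(["I-ORG", "O", "I-MISC", "O"]))
# ['B-ORG', 'O', 'B-MISC', 'O']
```

The same function can be applied to the chunk column (I-NP, B-NP, ...) if that column should also be in BIO.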
@rsb3060
Thank you so much for your answer!
Really appreciate all of your answers. I now realize that you meant a blank line between consecutive sentences, not a '\n' at the end of each line of the .txt file. Thanks so much. Now it works.
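For anyone hitting the same "Total number of data: 1" symptom, here is a minimal sketch of how a blank-line-delimited reader groups token lines into sentences. It is not the actual CoNLL03Reader code, only an illustration of the behavior described in this thread; read_sentences is a hypothetical name.

```python
def read_sentences(lines):
    """Group token lines into sentences, splitting on blank lines.

    Hypothetical illustration; not the NeuroNLP2 implementation.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence (if any).
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split())
    if current:
        sentences.append(current)
    return sentences


no_blank = [
    "1 EU NNP I-NP I-ORG\n",
    "2 rejects VBZ I-VP O\n",
    "3 Peter NNP I-NP I-PER\n",
]
with_blank = [
    "1 EU NNP I-NP I-ORG\n",
    "2 rejects VBZ I-VP O\n",
    "\n",
    "1 Peter NNP I-NP I-PER\n",
]
print(len(read_sentences(no_blank)))    # 1
print(len(read_sentences(with_blank)))  # 2
```

Without any blank lines, the whole file collapses into a single sentence, which matches the one instance reported above.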
@jk78346 did it work without converting it to BIO?
thanks