varun196 / knowledge_graph_from_unstructured_text
Building knowledge graph from input data
Traceback (most recent call last):
File "knowledge_graph.py", line 292, in <module>
main()
File "knowledge_graph.py", line 287, in main
doc = resolve_coreferences(doc,stanford_core_nlp_path,named_entities,verbose)
File "knowledge_graph.py", line 217, in resolve_coreferences
result = coref_obj.resolve_coreferences(corefs,doc,ner,verbose)
File "knowledge_graph.py", line 200, in resolve_coreferences
replaced_sent = words[i] + " "+ replaced_sent
IndexError: list index out of range
Data file added for reproducing the error
input_data (1).txt
Preliminary analysis suggests the cause: the file contains tokens such as "North-East" and "third-largest". The Stanford coreference tokenizer splits these across the hyphen (['North', '-', 'East', 'third', '-', 'largest']), while NLTK does not. As a result, the NLTK token count for the corresponding sentence is 37, which does not match the coreference indices, which assume 41 tokens.
Hello all,
Can someone help me please with this error
Traceback (most recent call last):
File "relation_extractor.py", line 25, in <module>
Stanford_Relation_Extractor()
File "relation_extractor.py", line 16, in Stanford_Relation_Extractor
p = subprocess.Popen(['./process_large_corpus.sh',f,f + '-out.csv'], stdout=subprocess.PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: './process_large_corpus.sh': './process_large_corpus.sh'
I get this error when I run the 'relation_extractor.py' file. There is no process_large_corpus.sh file in the repo.
Any help or hint will be appreciated.
Thanks very much
If tokenization leaves a ',' inside a token (e.g. "$80,000"), the program fails, complaining there are 4 fields instead of 3. Refer to #1.
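One way to avoid this is to strip commas that appear inside numbers before the text reaches the CSV-producing step. A sketch, not part of the repo (the helper name `strip_number_commas` is an assumption):

```python
import re

def strip_number_commas(text):
    """Remove thousands separators ("$80,000" -> "$80000") so a comma
    inside a number cannot add a spurious 4th field to the CSV output."""
    # Only delete commas with a digit on both sides; ordinary
    # punctuation like "Yes, indeed" is untouched.
    return re.sub(r'(?<=\d),(?=\d)', '', text)

print(strip_number_commas("John paid $80,000 in 2019"))
# -> John paid $80000 in 2019
```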
I can run the first and second commands successfully, but get an error when running
python3 create_structured_csv.py
The error is like below:
input_data
Traceback (most recent call last):
File "create_structured_csv.py", line 58, in <module>
main()
File "create_structured_csv.py", line 30, in main
df = pd.read_csv(curr_dir +"/data/output/kg/"+file_name+".txt-out.csv")
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 435, in _read
data = parser.read(nrows)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1139, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1995, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 30, saw 4
create_structured_csv.py needs the input_data.txt-out.csv file to be created in an intermediate step, but that file is not being generated/saved. The program only works if input_data.txt-out.csv is downloaded from GitHub (where it is already present), which is not the case for new data.
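Until the generation step is fixed, a small guard before the pd.read_csv call in create_structured_csv.py makes the failure mode explicit instead of a pandas ParserError or FileNotFoundError. A sketch (`load_relations_csv` is a hypothetical helper, not in the repo):

```python
import os
import sys

def load_relations_csv(path):
    """Fail with a clear message when the Stanford relation-extraction
    step has not produced the intermediate *-out.csv file."""
    if not os.path.isfile(path):
        sys.exit(
            "Missing intermediate file: %s\n"
            "Run relation_extractor.py (which calls process_large_corpus.sh) "
            "first so the *-out.csv file is generated for your input." % path
        )
    return path  # hand the verified path on to pd.read_csv(...)
```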
Hello guys,
Can someone please help me with this error? The 'process_large_corpus.sh' file is in the right directory, so I do not understand what the issue is.
Traceback (most recent call last):
File "relation_extractor.py", line 27, in <module>
Stanford_Relation_Extractor()
File "relation_extractor.py", line 17, in Stanford_Relation_Extractor
p = subprocess.Popen(['./process_large_corpus.sh',f,f + '-out.csv'], stdout=subprocess.PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: './process_large_corpus.sh': './process_large_corpus.sh'
Thanks very much..!!
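A relative './' path only resolves if the process's current working directory is the repo root, and CRLF line endings in the script's shebang produce the same ENOENT even when the file exists. A sketch that builds the command with an absolute path and invokes bash explicitly, sidestepping both causes (`build_script_command` is a hypothetical helper):

```python
import os

def build_script_command(repo_dir, input_path):
    """Build the Popen argument list for process_large_corpus.sh using
    an absolute path, and run it through bash explicitly so a missing
    execute bit or a CRLF-damaged shebang cannot raise
    FileNotFoundError for './process_large_corpus.sh'."""
    script = os.path.join(repo_dir, 'process_large_corpus.sh')
    if not os.path.isfile(script):
        raise FileNotFoundError(script)
    return ['bash', script, input_path, input_path + '-out.csv']

# usage sketch:
#   cmd = build_script_command(os.path.dirname(os.path.abspath(__file__)), f)
#   p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
```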
When I tried to give input: "John transferred 5000 dollars to Rohan. John committed fraud of 2 million dollars. While going through serious debt he also took loan from Bank Of Baroda. USA allegedly provided John money of $8 million."
I got nothing (result/named_entity_input.csv was empty) after executing create_structured_csv.py (tried both the "optimized verbose nltk spacy" and "spacy" options).
Can I know what patterns this module captures in a paragraph, or what kind/structure of input is optimal for this module to handle?
Btw, I really liked your approach and thanks in advance !
When I try to use the relation extractor I get this error:
Traceback (most recent call last):
File "C:\Users\Siyavash\Desktop\maliheh\knowledge_graph_from_unstructured_text\relation_extractor.py", line 28, in <module>
Stanford_Relation_Extractor()
File "C:\Users\Siyavash\Desktop\maliheh\knowledge_graph_from_unstructured_text\relation_extractor.py", line 19, in Stanford_Relation_Extractor
p = subprocess.Popen(['./process_large_corpus.sh',f,f + '-out.csv'], stdout=subprocess.PIPE)
File "C:\Users\Siyavash\anaconda3\envs\mynlp\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\Siyavash\anaconda3\envs\mynlp\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid Win32 application
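WinError 193 means Windows' CreateProcess was asked to execute a shell script directly, which it cannot do. Invoking bash explicitly works on Linux and, if Git Bash or WSL is installed, on Windows as well. A sketch (the helper `build_popen_args` is hypothetical, not part of the repo):

```python
def build_popen_args(script, *args):
    """Prefix the command with 'bash' so the .sh script runs under a
    shell on every platform, instead of being handed to Windows'
    CreateProcess, which rejects it with WinError 193."""
    return ['bash', script, *args]

# usage sketch:
#   p = subprocess.Popen(
#       build_popen_args('./process_large_corpus.sh', f, f + '-out.csv'),
#       stdout=subprocess.PIPE)
```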