recski / brise-plandok
Information extraction from text documents of the zoning plan of the City of Vienna
License: MIT License
This command works fine if data/gold* files don't exist:
python create_dataset.py -d ~/sandbox/brise-nlp/annotation/2021_09/full_data -g fourlang -o -n gold
But if I rerun it to regenerate the graphs, the same command fails with this error:
/home/recski/miniconda3/envs/brise/lib/python3.7/site-packages/pandas/core/indexing.py:845: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[key] = _infer_fill_value(value)
/home/recski/miniconda3/envs/brise/lib/python3.7/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[item] = s
Emptying the data folder solves the problem.
@Eszti please have a look when you can
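Until the root cause is found, the workaround of clearing stale output can be scripted. The following is only a sketch: the helper name remove_gold_files and its defaults are made up for illustration and are not part of brise-plandok, and it assumes the stale outputs are regular files matching data/gold*. Directories that match the pattern are left alone.

```python
import glob
import os


def remove_gold_files(data_dir="data", pattern="gold*"):
    """Delete regular files matching data_dir/pattern; return the removed paths.

    Hypothetical helper: the directory and pattern mirror the data/gold*
    files mentioned in the report, not any option of create_dataset.py.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(data_dir, pattern))):
        # Skip directories so nothing is deleted recursively by accident.
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling remove_gold_files() before rerunning create_dataset.py would reproduce the "files don't exist" state in which the command is reported to work.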
Add attributes gold_exists and gold_attributes
NOTE: this issue was detected on dev
Executing
python brise_plandok/convert.py \
-i XLSX \
-if ~/research/data/brise/ann/$1.xlsx \
-o JSON \
-of ~/research/data/brise/ann/$1.json
results in
Traceback (most recent call last):
File "/home/eszter/research/brise-plandok/brise_plandok/convert.py", line 391, in <module>
main()
File "/home/eszter/research/brise-plandok/brise_plandok/convert.py", line 387, in main
converter.convert(input_stream, output_stream)
File "/home/eszter/research/brise-plandok/brise_plandok/convert.py", line 358, in convert
self.write(doc, output_stream)
File "/home/eszter/research/brise-plandok/brise_plandok/convert.py", line 348, in write
self.write_json(doc, stream)
File "/home/eszter/research/brise-plandok/brise_plandok/convert.py", line 324, in write_json
stream.write(json.dumps(doc))
AttributeError: 'str' object has no attribute 'write'
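The traceback suggests that write_json receives the output file name as a plain string where it expects an open stream. One possible fix, sketched here without knowledge of how convert.py builds its streams, is to accept either a path or a file-like object. The function below is a standalone illustration and not the actual method in convert.py; only the json.dumps(doc) call is taken from the traceback.

```python
import json


def write_json(doc, stream_or_path):
    """Write doc as JSON to an open stream or to a file path.

    Assumption: a str argument is a file path; anything else is treated
    as an object with a write() method.
    """
    data = json.dumps(doc)
    if isinstance(stream_or_path, str):
        # A path was passed: open it here, which avoids calling .write on a str.
        with open(stream_or_path, "w", encoding="utf-8") as f:
            f.write(data)
    else:
        stream_or_path.write(data)
```

An alternative would be to open the file given by -of in main() before calling converter.convert, so that write_json only ever sees streams.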
See label changes here: https://github.com/recski/brise-nlp/issues/49 (Changes TBD in brise-plandok)
The following preprocessing step leaves the sections.sens.text attribute null if nlp_cache.json is used.
python brise_plandok/plandok.py sample_data/txt/*.txt > sample_data/json/sample.jsonl
If the cache file is deleted, and therefore regenerated on the next run, the json output is complete.
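To confirm the difference between a cached and an uncached run, the null texts in the JSONL output can be counted. This is a diagnostic sketch only: it assumes each line is one document with a "sections" collection whose entries hold a "sens" collection of sentences with a "text" field, as the attribute path sections.sens.text suggests. Whether these collections are lists or dicts is not known here, so both are handled.

```python
import json


def _items(container):
    """Return the elements of a list, or the values of a dict."""
    if isinstance(container, dict):
        return list(container.values())
    return list(container)


def count_null_texts(jsonl_lines):
    """Return (sentences_with_null_text, total_sentences) over JSONL lines.

    The keys "sections", "sens" and "text" are assumed from the attribute
    path named in the report.
    """
    missing = total = 0
    for line in jsonl_lines:
        if not line.strip():
            continue  # ignore blank lines
        doc = json.loads(line)
        for section in _items(doc.get("sections", [])):
            for sen in _items(section.get("sens", [])):
                total += 1
                if sen.get("text") is None:
                    missing += 1
    return missing, total
```

For example, opening sample_data/json/sample.jsonl and passing the file object to count_null_texts once with and once without nlp_cache.json should show whether the missing texts come only from the cached run.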
Two possible reasons identified: read_dataset expects a list but gets a numpy array, and potato now requires data files to have a .pickle extension. However, using pickle is now deprecated, so I suggest updating create_dataset.py to use the new Dataset class in potato and save the data as csv.
see #13 (comment)
After executing pip install .
I have the following line in my .bash_profile:
export ALTO_JAR=/home/eszter/tuw_nlp_resources/alto-2.3.6-SNAPSHOT-all.jar