micro-portraits's Introduction

micro-portraits

This repository contains code to extract microportraits.

Language covered: Dutch

Required input: NAF files containing the following layers:

  • terms
  • deps
  • coreference (entities)

Output: CSV files with descriptions. Descriptions of the same entity share an identifier.

This work is described in:

Fokkens, Antske, Nel Ruigrok, Camiel J. Beukeboom, Sarah Gagestein, and Wouter van Atteveldt. "Studying Muslim Stereotyping through Microportrait Extraction." In LREC. 2018.

Running the code:

python -m microportraits inputfile.naf > outputfile.csv
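
For batches of NAF files, the command above can be wrapped in a small script. This is a minimal sketch and not part of the repository: the helper names `build_command` and `run_extractor` are hypothetical, and it assumes the `microportraits` package is importable by the interpreter that runs the script.

```python
import subprocess
import sys
from pathlib import Path


def build_command(naf_path):
    """Build the documented command: python -m microportraits inputfile.naf"""
    return [sys.executable, "-m", "microportraits", str(naf_path)]


def run_extractor(naf_path, csv_path, command=None):
    """Run the extractor on one file and return (ok, stderr_text).

    The CSV is written only when the process exits with status 0, so a
    crash does not leave a partial output file behind.
    """
    command = command or build_command(naf_path)
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if result.returncode != 0:
        return False, result.stderr
    Path(csv_path).write_text(result.stdout, encoding="utf-8")
    return True, ""
```

Calling `run_extractor("inputfile.naf", "outputfile.csv")` mirrors the shell redirect shown above, and the returned stderr text can be logged for files that fail.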

micro-portraits's Issues

Error extracting MP

Extraction fails with the error message below. The NAF file is attached underneath.

OSError: Error reading file '/tmp/tmp/Rtmp8Vc6UD/file42b4634a88.naf': failed to load external entity "/tmp/tmp/Rtmp8Vc6UD/file42b4634a88.naf"
nel@toob:~/micro-portraits$ ~/micro-portraits/env/bin/python -m microportraits --no-coref /tmp/Rtmp8Vc6UD/file42b4634a88.naf > /tmp/check.csv
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/nel/micro-portraits/microportraits/__main__.py", line 2, in <module>
    main()
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1499, in main
    extract_microportraits(args.inputfile, sys.stdout, args.surface, not args.no_coref)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1469, in extract_microportraits
    sentence_level_portraits = extract_sentence_level_portraits(nafobj)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1263, in extract_sentence_level_portraits
    term_portrait = extract_sentence_portrait(nafobj, term)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1163, in extract_sentence_portrait
    get_activity_relations(nafobj, term_portrait)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1009, in get_activity_relations
    investigate_relations(nafobj, tid, term_portrait)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 970, in investigate_relations
    if duplicate_heads(heads):
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 880, in duplicate_heads
    refence_rel = heads[0][1]
IndexError: list index out of range

NAF file attached:
naf.zip
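
The traceback ends at `heads[0][1]`, which raises `IndexError` whenever `heads` is empty. One possible guard is sketched below. It is not the repository's code: the helper name `first_head_relation` is hypothetical, and the assumption that `heads` is a list of `(head_id, relation)` pairs is inferred only from the indexing in the traceback.

```python
def first_head_relation(heads):
    """Return the relation of the first head, or None when there are no heads.

    Assumption: `heads` is a list of (head_id, relation) pairs.
    """
    if not heads:
        # Empty list: nothing to compare against, so signal that to the caller.
        return None
    return heads[0][1]
```

How a caller should treat the `None` case (skip the term, log it, or fall back to another relation) is a design decision for the maintainers.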

"Heeft gezegd dat" duplicated in MP

Example sentence: "Dat heeft minister Blok gezegd in het radioprogramma Nieuws en Co op NPO Radio 1" ("Minister Blok said that on the radio programme Nieuws en Co on NPO Radio 1").
To be filed as an issue: the extractor produces both a past-participle version and a "hebben" + past participle version.
In addition, in the version where the verb has become "zeggen" (instead of "heeft gezegd"), the hd/obj1 relation for "dat" appears to be marked correctly. In the "heeft gezegd" version it has become a constituent.
For the article, see:
https://amcat.nl/navigator/projects/1/articlesets/77805/191539176/

NAF:
naf.zip
File:
mp_debug.zip

output option for "long" format (one word per line)

The current output places multiple words on one line, which makes it difficult to match them with term-level information. Would it be possible to offer a long (tidy, token-list, one-word-per-line) output format that gives the mention id and the role of each word? For example:

portrait id  mention id  role         word        term_id  word role
17           17.1        Tijdens-rol  bewindsman  t0       head
17           17.1        Tijdens-rol  beduusd     t1       direct modifier
17           17.1        Tijdens-rol  nog         t3       constituent
17           17.1        Tijdens-rol  van         t4       constituent
17           17.1        Tijdens-rol  dag         t5       constituent
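
The requested layout could be produced along the following lines. This is a sketch under assumptions, not the tool's actual output code: the function `write_long_format`, the `mention` dictionary layout, and the English column names are all hypothetical, and the example rows are copied from the table above.

```python
import csv
import io

# Hypothetical column names, following the table in this issue.
LONG_HEADER = ["portrait_id", "mention_id", "role", "word", "term_id", "word_role"]


def write_long_format(mentions, out):
    """Write one CSV row per word instead of one row per mention."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(LONG_HEADER)
    for mention in mentions:
        for word, term_id, word_role in mention["words"]:
            writer.writerow([
                mention["portrait_id"],
                mention["mention_id"],
                mention["role"],
                word,
                term_id,
                word_role,
            ])


example = [{
    "portrait_id": "17",
    "mention_id": "17.1",
    "role": "Tijdens-rol",
    "words": [
        ("bewindsman", "t0", "head"),
        ("beduusd", "t1", "direct modifier"),
        ("nog", "t3", "constituent"),
    ],
}]

buffer = io.StringIO()
write_long_format(example, buffer)
print(buffer.getvalue())
```

Because every row carries a `term_id`, the result can be joined directly against the terms layer of the NAF file.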

Error: myhead = heads[0] gives IndexError: list index out of range

About 10% of documents produce this error:

~/micro-portraits/env/bin/python -m microportraits /tmp/RtmpuddLVE/file386e3f8880f8.naf > /tmp/RtmpuddLVE/file386e39b89bb7.csv
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/nel/micro-portraits/microportraits/__main__.py", line 2, in <module>
    main()
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1394, in main
    extract_microportraits(args.inputfile, sys.stdout, args.surface)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1367, in extract_microportraits
    sentence_level_portraits = extract_sentence_level_portraits(nafobj)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1161, in extract_sentence_level_portraits
    term_portrait = extract_sentence_portrait(nafobj, term)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1061, in extract_sentence_portrait
    get_activity_relations(nafobj, term_portrait)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 908, in get_activity_relations
    investigate_relations(nafobj, tid, term_portrait)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 888, in investigate_relations
    analyze_coord_relations(nafobj, head_rel[0], term_portrait)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 832, in analyze_coord_relations
    myhead = heads[0]
IndexError: list index out of range

Example document: naf.zip
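
To see which failures account for the roughly 10% of affected documents, the final line of each traceback can be tallied across a batch. The sketch below is only an illustration: `final_error_line` and `tally_errors` are hypothetical helpers, and it assumes the stderr text of each run has been collected in a dictionary keyed by file name.

```python
from collections import Counter


def final_error_line(stderr_text):
    """Return the last non-empty line of a traceback, e.g. 'IndexError: ...'."""
    lines = [line.strip() for line in stderr_text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def tally_errors(stderr_by_file):
    """Count how often each final error line occurs; empty stderr is ignored."""
    return Counter(
        final_error_line(text)
        for text in stderr_by_file.values()
        if text.strip()
    )
```

`tally_errors(...).most_common()` then lists the most frequent error messages first.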

extracting error

While extracting the portraits from the NAF files, we see this error for about 1 in 10 documents:

/tmp/RtmpAu7A30/file1958137fa441.naf > /tmp/RtmpAu7A30/file19583b234aac.csv
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/nel/micro-portraits/microportraits/__main__.py", line 2, in <module>
    main()
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1500, in main
    extract_microportraits(args.inputfile, sys.stdout, args.surface, not args.no_coref)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1470, in extract_microportraits
    sentence_level_portraits = extract_sentence_level_portraits(nafobj)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1264, in extract_sentence_level_portraits
    term_portrait = extract_sentence_portrait(nafobj, term)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1164, in extract_sentence_portrait
    get_activity_relations(nafobj, term_portrait)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1010, in get_activity_relations
    investigate_relations(nafobj, tid, term_portrait)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 970, in investigate_relations
    if duplicate_heads(heads):
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 880, in duplicate_heads
    refence_rel = heads[0][1]
IndexError: list index out of range
Using ',' as decimal

Parsing failures?

No idea what causes this, but about one in 10 files gives this output when the NAF files are run from R:

/tmp/RtmpTqUbHM/file5d5e118e3ef6.naf > /tmp/RtmpTqUbHM/file5d5e18cad741.csv
Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
Warning: 1805 parsing failures.
row  col  expected   actual     file
1    --   1 columns  8 columns  '/tmp/RtmpTqUbHM/file5d5e18cad741.csv'
2    --   1 columns  8 columns  '/tmp/RtmpTqUbHM/file5d5e18cad741.csv'
3    --   1 columns  8 columns  '/tmp/RtmpTqUbHM/file5d5e18cad741.csv'
4    --   1 columns  8 columns  '/tmp/RtmpTqUbHM/file5d5e18cad741.csv'
5    --   1 columns  8 columns  '/tmp/RtmpTqUbHM/file5d5e18cad741.csv'
...  ...  .........  .........  ......................................
See problems(...) for more details.
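
The report above says the reader expected 1 column but found 8. That may point to a mismatch between the delimiter the reader assumed and the one used in the file, or to a first line that is not a real header; this is a guess, not a confirmed diagnosis. One way to investigate is to count the fields per row under different delimiters. In the sketch below, `field_count_profile` is a hypothetical helper.

```python
import csv
from collections import Counter


def field_count_profile(lines, delimiter=","):
    """Count how many non-empty rows have each number of fields."""
    reader = csv.reader(lines, delimiter=delimiter)
    return Counter(len(row) for row in reader if row)
```

For a file on disk, pass an open handle, for example `field_count_profile(open(path, newline=""), delimiter=";")`. A profile with a single field count suggests that the delimiter fits; a mix such as one row with 1 field and the rest with 8 suggests a stray first line.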

for dep in head2deps.get(head_id): TypeError: 'NoneType' object is not iterable

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/nel/micro-portraits/microportraits/__main__.py", line 2, in <module>
    main()
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1394, in main
    extract_microportraits(args.inputfile, sys.stdout, args.surface)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1367, in extract_microportraits
    sentence_level_portraits = extract_sentence_level_portraits(nafobj)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1161, in extract_sentence_level_portraits
    term_portrait = extract_sentence_portrait(nafobj, term)
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1048, in extract_sentence_portrait
    add_rows_for_description(dep[0],nafobj,head2deps,term_portrait,'property')
  File "/home/nel/micro-portraits/microportraits/microportraits.py", line 1014, in add_rows_for_description
    for dep in head2deps.get(head_id):
TypeError: 'NoneType' object is not iterable
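
`dict.get` returns `None` for a missing key, and iterating over `None` raises the `TypeError` shown above. A common remedy is to fall back to an empty list. The sketch below assumes that `head2deps` maps a head term id to a list of its dependents; the helper name `dependents_of` and the sample data are hypothetical.

```python
def dependents_of(head2deps, head_id):
    """Return the dependents of head_id, or an empty list if it has none.

    `or []` covers both a missing key and a key explicitly mapped to None.
    """
    return head2deps.get(head_id) or []


# Hypothetical sample data: term t1 has one dependent, t9 has none.
head2deps = {"t1": [("t2", "mod")]}
for dep in dependents_of(head2deps, "t9"):
    print(dep)  # never reached: the loop body is skipped for an empty list
```

With this fallback, a head without dependents contributes no description rows instead of aborting the whole document.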
