germaparl2's People

Contributors

christophleonhardt

germaparl2's Issues

Wrong attribution of party in Session #1 1949

The first speaker in the first session of the first legislative period should be a CDU speaker. However, the following kwic query returns an SPD speaker.

library(polmineR)

corpus("GERMAPARL2") %>% 
  subset(protocol_date == "1949-09-07") %>% 
  kwic(query = "Zukunft", s_attributes = c("protocol_date", "speaker_party"))

Yielding the result:

protocol_date | speaker_party | left                                          | node    | right
1949-09-07    | SPD           | Gesetz unseres gesetzgeberischen Handelns in  | Zukunft | sein . Geistige und politische
1949-09-07    | SPD           | für eine glücklichere Entwicklung der         | Zukunft | schöpfen wird . Lassen Sie

This is the XML of the protocol:
https://github.com/PolMine/GermaParlTEI/blob/main/01/BT_01_001.xml

This is the pdf:
https://dserver.bundestag.de/btp/01/01001.pdf

The term "Zukunft" is part of a speech by "Präsident Dr. Köhler".

This is the Wikipedia entry for Erich Köhler (CDU): https://de.wikipedia.org/wiki/Erich_Köhler#:~:text=Erich%20Köhler%20(*%2027.,erster%20Präsident%20des%20Deutschen%20Bundestages.

In the XML, Dr. Köhler is not recognized as a speaker at this line:
https://github.com/PolMine/GermaParlTEI/blob/d81fdf431efec3dbc0fb007993b9c936a84d1600/01/BT_01_001.xml#L175

The end of debates is missed in a number of protocols

Issue

Occasionally, the end of the debates is not recognized. As a consequence, appendices or speeches which were not held during the debate but entered into the minutes later are accidentally included in the final TEI and thus in the CWB corpus. This does not conform to the regular approach of including only speeches held during the debate. The extent of this extra content varies widely between protocols, from single lines to multiple additional speeches.

Example of the Issue

A preliminary analysis suggests that this affects quite a few sessions in the 2nd, the 17th and in particular the 18th legislative period. It also occurs in other legislative periods, albeit to a lesser extent.

As an example, see protocol 18/200.

Discussion

At first glance, there seem to be multiple reasons for this:

  • Missing line breaks in the original file. In this case, the line containing the comment that indicates the end of the session does not end with the comment itself, so the regular expression does not match.
  • Plausible patterns for the end-of-debate comment that have not been covered so far.
  • Remaining encoding issues. In this case, the regular expression used to detect the end of the debate does not match because the apparent whitespace is not a whitespace character matched by "\s".
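
As a minimal sketch of how the first and third points could be addressed, the R code below first maps a few whitespace-like characters to a plain space and then searches for the end-of-debate comment without anchoring the pattern to the end of the line. The function names, the list of characters and the pattern itself (modelled on comments such as "(Schluss: 15.05 Uhr)") are assumptions made for illustration and are not taken from the actual processing pipeline.

```r
# Hypothetical helper: map whitespace-like characters that "\\s" may not
# match to a plain ASCII space. The character list is an assumption.
normalize_whitespace <- function(x) {
  odd_spaces <- c("\u00a0", "\u2009", "\u202f", "\u200b")
  for (ch in odd_spaces) {
    x <- gsub(ch, " ", x, fixed = TRUE)
  }
  x
}

# Hypothetical end-of-debate pattern, e.g. "(Schluss: 15.05 Uhr)".
# It is deliberately not anchored with "$", so it still matches when a
# missing line break leaves further text on the same line.
end_pattern <- "\\(Schlu(ss|\u00df)[^):]*:\\s+\\d+[^)]*Uhr[^)]*\\)"

# Keep the lines up to and including the first end-of-debate comment and
# cut off anything that follows the comment on the same line.
truncate_at_end <- function(lines) {
  lines <- normalize_whitespace(lines)
  hit <- regexpr(end_pattern, lines, perl = TRUE)
  idx <- which(hit > 0)
  if (length(idx) == 0) return(lines)  # no end marker found: keep all lines
  i <- idx[1]
  end_char <- hit[i] + attr(hit, "match.length")[i] - 1
  lines[i] <- substr(lines[i], 1, end_char)
  lines[seq_len(i)]
}

# Usage with made-up input: a no-break space after the colon and
# appendix text on the same line as the end-of-debate comment.
protocol <- c(
  "Die Sitzung ist geschlossen.",
  "(Schluss:\u00a015.05 Uhr) Anlage 1",
  "Liste der entschuldigten Abgeordneten"
)
truncate_at_end(protocol)
```

Using the first match rather than the last is a design choice in this sketch. Whether that is appropriate depends on whether such comments can also occur inside appendices or earlier in a protocol, which would need to be checked against the data.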
