License: Apache License 2.0

Lysander: Finding political journalists on Twitter

This directory contains software developed in the pilot of the project Automated Analysis of Online Behaviour on Social Media, a cooperation of the University of Groningen and the Netherlands eScience Center. The main project software repository is called machine learning.

The goal of the project is to analyze tweets of politicians and political journalists. Names of relevant politicians can be collected from documents like parliament member lists and ballots, but finding the names of relevant journalists is much harder. The software in this directory aims to find such journalists by examining the follower links between politicians and other users on Twitter.

Usage

Run like:

python getFollowers.py markrutte sybrandbuma apechtold > getFollowers.out

to collect the accounts that are followed by the users in your seed list. The script needs your Twitter account data to be stored in a file named definitions.py, in the format:

# twitter.com authentication keys
token = "???"
token_secret = "???"
consumer_key = "???"
consumer_secret = "???"

Replace the "???" strings with the key information from https://apps.twitter.com. See https://www.slickremix.com/docs/how-to-get-api-keys-and-tokens-for-twitter/ for instructions.
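Before running the script, it can help to confirm that every placeholder has been replaced. The following is a minimal sketch and not part of the repository: the helper name missing_keys is hypothetical, and it only assumes that definitions.py defines the four variables listed above.

```python
import importlib

# Names that definitions.py is expected to define, per the format above.
REQUIRED_KEYS = ("token", "token_secret", "consumer_key", "consumer_secret")


def missing_keys(settings):
    """Return the names of keys that are absent, empty, or still "???".

    `settings` is any object with attributes (for example an imported module).
    This helper is hypothetical and not part of the repository.
    """
    return [
        name
        for name in REQUIRED_KEYS
        if getattr(settings, name, "???") in ("", "???")
    ]


if __name__ == "__main__":
    try:
        definitions = importlib.import_module("definitions")
    except ImportError:
        print("definitions.py not found in the current directory")
    else:
        problems = missing_keys(definitions)
        print("missing keys:", ", ".join(problems) if problems else "none")
```

Run it from the directory that contains definitions.py. It only checks that the values were filled in, not that Twitter accepts them.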

In order to find more relevant users, you can run this command after getFollowers.py is finished:

python makevec.py markrutte sybrandbuma apechtold < getFollowers.out > makevec.out 2> makevec.err

It generates a selection of relevant users (makevec.err) and vector representations for these users (makevec.out).
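The idea of representing users by follower links can be illustrated with a small sketch. It is not the code of makevec.py, whose selection criteria and file formats are not described here. The function name follow_vectors, the min_seeds threshold, the dictionary layout, and the non-seed account names are all assumptions made for illustration.

```python
def follow_vectors(followed_by_seed, min_seeds=2):
    """Build one binary vector per candidate account.

    followed_by_seed maps each seed user to the set of accounts it follows.
    Position i of a vector is 1 if the i-th seed (in sorted order) follows
    the candidate. Only candidates followed by at least `min_seeds` seeds
    are kept. Both the threshold and the data layout are assumptions.
    """
    seeds = sorted(followed_by_seed)
    # Every followed account is a candidate, except the seeds themselves.
    candidates = set().union(*followed_by_seed.values()) - set(seeds)
    vectors = {}
    for candidate in sorted(candidates):
        vector = [int(candidate in followed_by_seed[seed]) for seed in seeds]
        if sum(vector) >= min_seeds:
            vectors[candidate] = vector
    return seeds, vectors


# Made-up example data; the account names other than the seeds are fictional.
example = {
    "markrutte": {"reporter_a", "reporter_b"},
    "sybrandbuma": {"reporter_a", "markrutte"},
    "apechtold": {"reporter_a", "reporter_b", "someone_else"},
}
seeds, vectors = follow_vectors(example)
for name, vector in vectors.items():
    print(name, vector)
```

In this sketch, an account followed by several seed politicians gets a vector with several ones. Accounts followed by only one seed are dropped as less likely to be relevant.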

Input data files for Dutch politicians can be requested from Erik Tjong Kim Sang e.tjong.kim.sang(at)esciencenter.nl

Information added by the Python template

Badges

(Customize these badges with your own links, and check https://shields.io/ or https://badgen.net/ to see which other badges are available.)

fair-software.eu recommendations
(1/5) code repository
(2/5) license
(3/5) community registry
(4/5) citation
(5/5) checklist
howfairis
Other best practices
Static analysis
Coverage
Documentation
GitHub Actions
Build
Metadata consistency
Lint
SonarCloud
MarkDown link checker

How to use find_journalists

The project setup is documented in project_setup.md. Feel free to remove this document (and/or the link to this document) if you don't need it.

Installation

To install find_journalists from the GitHub repository, run:

git clone https://github.com/online-behaviour/find-journalists.git
cd find-journalists
python3 -m pip install .

Documentation

Include a link to your project's full documentation here.

Contributing

If you want to contribute to the development of find-journalists, have a look at the contribution guidelines.

Credits

This package was created with Cookiecutter and the NLeSC/python-template.

find-journalists's Issues

Next step: Linting

Linting instructions

Next step: Read the Docs

Your Python package should have publicly available documentation, including API documentation for your users.
Read the Docs can host your user documentation for you.

To host the documentation of this repository, follow these steps:

1. Go to Read the Docs.
2. Log in with your GitHub account.
3. Find online-behaviour/find-journalists in the list and press the + button.
   - If the repository is not listed:
     1. Go to the Read the Docs GitHub app.
     2. Make sure online-behaviour has been granted access.
     3. Reload the repository list on the Read the Docs import page.
4. Wait for the first build to complete at https://readthedocs.org/projects/find-journalists/builds
5. Check that the link of the documentation badge in README.md works.

See README.dev.md for how to build the documentation site locally.

Next step: Sonarcloud integration

Continuous code quality can be handled by Sonarcloud. This repository is configured to use Sonarcloud to perform quality analysis and report code coverage on each push.

To configure the Sonarcloud analysis GitHub Action workflow, follow the steps below:

1. Go to Sonarcloud to create a new Sonarcloud project.
2. Log in with your GitHub account.
3. Add a Sonarcloud organization or reuse an existing one.
4. Set up a repository.
5. Go to the new code definition administration page and select the Number of days option.
6. To be able to run the analysis:
   1. Create a token in your Sonarcloud account.
   2. Add the created token as SONAR_TOKEN to the secrets on GitHub.

Next step: Citation data

It is likely that your CITATION.cff currently doesn't pass validation. The error messages you get from the cffconvert GitHub Action are unfortunately a bit cryptic, but doing the following helps:

Check if the given-name and family-name keys need updating. If your family name has a name particle like von or van or de, use the name-particle key; if your name has a suffix like Sr or IV, use name-suffix. For details, refer to the schema description: https://github.com/citation-file-format/citation-file-format
Update the value of the orcid key. If you do not have an ORCID yet, you can get one at https://orcid.org/.
Add more authors if needed.
Update date-released using the YYYY-MM-DD format.
Update the doi key with the conceptDOI for your repository (see https://help.zenodo.org for more information on what a conceptDOI is). If your project doesn't have a DOI yet, you can use the string 10.0000/FIXME to pass validation.
Update the keywords array with some keywords of your own that describe your project.

Once you do all the steps above, the cffconvert workflow will tell you what content it expected to see in .zenodo.json. Copy-paste from the GitHub Action log into a new file .zenodo.json. Afterwards, the cffconvert GitHub Action should be green.
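The date-released format mentioned in the steps above can be checked before committing. This is a minimal sketch under the assumption that you have the value as a plain string. The function name is_valid_release_date is hypothetical and is not part of cffconvert.

```python
import re
from datetime import datetime


def is_valid_release_date(value):
    """Return True if `value` is a real calendar date written as YYYY-MM-DD."""
    # Require zero-padded fields; strptime alone would also accept "2021-3-9".
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(is_valid_release_date("2021-03-09"))  # well-formed date
print(is_valid_release_date("09-03-2021"))  # wrong field order
```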

To help you keep the citation metadata up to date and synchronized, the cffconvert GitHub Action checks the following 6 aspects:

1. Whether your repository includes a CITATION.cff file.
   By including this file, authors of the software can receive credit for the work they put in.

2. Whether your CITATION.cff is valid YAML.
   Visit http://www.yamllint.com/ to see if the contents of your CITATION.cff are valid YAML.

3. Whether your CITATION.cff adheres to the schema (as listed in the CITATION.cff file itself under key cff-version).
   The Citation File Format schema can be found at https://github.com/citation-file-format/citation-file-format, along with an explanation of all the keys. You're advised to use the latest available schema version.

4. Whether your repository includes a .zenodo.json file.
   With this file, you can control what metadata should be associated with any future releases of your software on Zenodo: things like the author names, along with their affiliations and their ORCIDs, the license under which the software has been released, as well as the name of your software and a short description. If your repository doesn't have a .zenodo.json file, Zenodo will take a somewhat crude guess to assign these metadata.
   The cffconvert GitHub Action will tell you what it expects to find in .zenodo.json; copy and paste that text into a new file named .zenodo.json. The suggested text ignores CITATION.cff's version, commit, and date-released. cffconvert considers these keys suspect in the sense that they are often out of date, and there is little purpose to telling Zenodo about these properties: Zenodo already knows.

5. Whether .zenodo.json is valid JSON.
   Currently unimplemented, but you can check for yourself on https://jsonlint.com/.

6. Whether CITATION.cff and .zenodo.json contain equivalent data.
   This final check verifies that the two files are in sync. The check ignores CITATION.cff's version, commit, and date-released.
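Because the check on whether .zenodo.json is valid JSON is not performed by the workflow, a local check may be useful. The following is a minimal sketch using only the Python standard library. The function name json_error is hypothetical and the sample strings are made up.

```python
import json


def json_error(text):
    """Return None if `text` is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return None


# Made-up snippets for illustration; real content comes from the cffconvert log.
print(json_error('{"title": "find-journalists"}'))
print(json_error('{"title": "find-journalists",}'))
```

To check the actual file, pass its contents to the function, for example by reading .zenodo.json with open() first.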

Next step: Zenodo integration

Zenodo integration instructions.