This project aims to download and analyze publications that list an affiliation with "The Alan Turing Institute", using DataCite's API and other resources. Note that Zenodo and arXiv both use DataCite as their DOI provider.
The project is implemented in Python and uses Poetry for dependency management.
- Python 3.x
- Poetry (for dependency management)
- Clone the repository:
git clone https://github.com/thealanturinginstitute/turing_publications.git
- Navigate to the project directory:
cd turing_publications
- Install Poetry. If you haven't installed Poetry yet, follow the official Poetry installation instructions.
- Install the dependencies:
poetry install
- Activate the Poetry environment (on Poetry 2.x, `poetry shell` is no longer built in and requires the poetry-plugin-shell plugin):
poetry shell
- Run the scripts.
Download data from DataCite:
python src/datacite_api.py
Parse the downloaded data into a CSV file:
python src/datacite2csv.py
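The CSV step amounts to flattening each DataCite JSON:API record into one row. The sketch below is illustrative only and is not the contents of `src/datacite2csv.py`: the column set (`FIELDS`) and the helpers `record_to_row` and `records_to_csv` are hypothetical, and it assumes records shaped like DataCite's `/dois` responses, with metadata under an `attributes` key.

```python
from __future__ import annotations

import csv
import io

# Hypothetical column set; the real datacite2csv.py may export different fields.
FIELDS = ["doi", "title", "publisher", "year", "url"]


def record_to_row(record: dict) -> dict:
    """Flatten one DataCite JSON:API record into a CSV row (hypothetical helper)."""
    attrs = record.get("attributes", {})
    # Use the first title if there is one; fall back to an empty title otherwise.
    titles = attrs.get("titles") or [{}]
    publisher = attrs.get("publisher")
    # The publisher is usually a plain string, but some responses return an object with a "name".
    if isinstance(publisher, dict):
        publisher = publisher.get("name")
    return {
        "doi": attrs.get("doi", record.get("id")),
        "title": titles[0].get("title", ""),
        "publisher": publisher or "",
        "year": attrs.get("publicationYear", ""),
        "url": attrs.get("url", ""),
    }


def records_to_csv(records: list[dict]) -> str:
    """Render a list of DataCite records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record_to_row(record))
    return buffer.getvalue()
```

To write to disk instead of a string, open the output file with `open(path, "w", newline="")` and pass it to `csv.DictWriter` directly, as the `csv` module documentation recommends.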
If you would like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.
MIT License. See LICENSE for details.
The DataCite download works by searching for "Alan Turing Institute" anywhere in the text of each record, because different systems put affiliations in different metadata fields. In practice, most of the records it returns appear to come from Zenodo.
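A minimal sketch of this "search everywhere" approach follows, assuming the public `https://api.datacite.org/dois` endpoint and that its `query` parameter performs a full-text search (a quoted string for a phrase match). It also shows a local check that walks every string in a record, since the affiliation may sit in creators, contributors, descriptions, or elsewhere. The names `build_query_url`, `iter_strings` and `mentions_phrase` are hypothetical and are not taken from `src/datacite_api.py`.

```python
from __future__ import annotations

from collections.abc import Iterator
from urllib.parse import urlencode

# Public DataCite REST API endpoint for DOI search (assumed; see the DataCite API docs).
DATACITE_DOIS_URL = "https://api.datacite.org/dois"
PHRASE = "Alan Turing Institute"


def build_query_url(phrase: str = PHRASE, page_size: int = 100) -> str:
    """Build a DataCite search URL for records mentioning `phrase` (hypothetical helper)."""
    # Quoting the phrase asks for an exact phrase match rather than any of the words.
    params = {"query": f'"{phrase}"', "page[size]": page_size}
    return f"{DATACITE_DOIS_URL}?{urlencode(params)}"


def iter_strings(value: object) -> Iterator[str]:
    """Yield every string nested anywhere inside a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def mentions_phrase(record: dict, phrase: str = PHRASE) -> bool:
    """Return True if `phrase` occurs, case-insensitively, in any text field of the record."""
    needle = phrase.lower()
    return any(needle in text.lower() for text in iter_strings(record))
```

The search results could then be fetched with `json.load(urllib.request.urlopen(build_query_url()))` and the returned `data` list filtered with `mentions_phrase`. Checking every string rather than a fixed `affiliation` field is what lets the match succeed regardless of where a given repository stores the institute's name.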
Below is a collection of examples of outputs by The Alan Turing Institute that this search does not pick up, kept for reference and debugging. It is biased and incomplete:
- Most arXiv papers!
- Some arXiv papers that mention the Turing somewhere in their text
- A random arXiv example that came out of the Turing, where the Turing is listed as an affiliation only in the paper itself
- Some Zenodo outputs: