Awesome-Biomolecule-Language-Cross-Modeling: a curated list of resources for the paper "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey"
BC5CDR: 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases, and 3116 chemical-disease interactions (named entity recognition).
BioCreative V: The BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases, and 3116 chemical-disease interactions.
Elsevier Corpus: A corpus of 40k (40,001) open access (OA) CC-BY articles from across Elsevier’s journals, representing the first cross-discipline body of research data at this scale to support NLP and ML research.
Europe PMC: Bulk download of full text and supplementary information (SI) for more than 5 million articles.
...
Thanks for the great work. Would the list be easier to follow if the articles in each field were arranged in chronological order? If you need help, I'd be glad to join this project.