caltechlibrary / pubarchiver
Package up microPublication.org and other journals for archiving into Portico and PMC
License: Other
A simple implementation would be to periodically ask the journal API for the list of what's available and compare that to the recorded uploads, then email someone if there's a discrepancy.
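The comparison described above could look roughly like the following. This is only a sketch: the function names are hypothetical, and it assumes the journal's list and the upload records can both be reduced to plain lists of DOI strings. Fetching the list from the journal API and sending the email are left out.

```python
def find_missing(published, uploaded):
    """Return the DOIs the journal lists that have no recorded upload."""
    uploaded_set = set(uploaded)
    return sorted(doi for doi in set(published) if doi not in uploaded_set)


def discrepancy_report(published, uploaded):
    """Return the text of a notification, or None if nothing is missing."""
    missing = find_missing(published, uploaded)
    if not missing:
        return None
    lines = [f"{len(missing)} article(s) listed by the journal have no recorded upload:"]
    lines += [f"  {doi}" for doi in missing]
    return "\n".join(lines)
```

A periodic job would call discrepancy_report() and send mail only when the result is not None.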
The volume number portion of the PMC file names is set based on the current year, while it's supposed to be set based on the year of the article's publication.
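A possible shape for the fix, as a sketch only. It assumes one volume per calendar year, and both the function name and the first_volume_year parameter are hypothetical; the real mapping from year to volume is whatever the journal and PMC have agreed on.

```python
from datetime import date


def volume_number(publication_date, first_volume_year):
    """Derive the volume from the article's publication year, not today's date.

    Assumes one volume per calendar year, with volume 1 published in
    first_volume_year (a hypothetical parameter).
    """
    if publication_date.year < first_volume_year:
        raise ValueError("publication date precedes the first volume")
    return publication_date.year - first_volume_year + 1
```

The point is that the publication date is passed in explicitly, so rerunning the archiver in a later year yields the same file name for the same article.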
If datacite.org returns no content for a given URL, you get an exception:
microarchiver messages.py error(): No content found for https://api.datacite.org/dois/10.17912/micropub.biology.000167
Traceback (most recent call last):
  File "/home/mhucka/system/lib/python3.6/site-packages/microarchiver/__main__.py", line 186, in main
    MainBody(source, after, output_dir, do_zip, report, get_xml, preview, say).run()
  File "/home/mhucka/system/lib/python3.6/site-packages/microarchiver/__main__.py", line 283, in run
    self.write_articles(dest_dir, articles)
  File "/home/mhucka/system/lib/python3.6/site-packages/microarchiver/__main__.py", line 394, in write_articles
    xml = self._metadata_xml(article)
  File "/home/mhucka/system/lib/python3.6/site-packages/microarchiver/__main__.py", line 421, in _metadata_xml
    raise error
microarchiver.exceptions.NoContent: No content found for https://api.datacite.org/dois/10.17912/micropub.biology.000167
This should be handled more gracefully.
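One way to handle it, sketched under assumptions: the NoContent class below is a local stand-in for microarchiver.exceptions.NoContent, and collect_metadata and fetch_metadata are hypothetical names rather than functions from the package. The idea is to skip the affected article and report it at the end instead of aborting the whole run.

```python
class NoContent(Exception):
    """Local stand-in for microarchiver.exceptions.NoContent."""


def collect_metadata(dois, fetch_metadata):
    """Fetch metadata for each DOI, skipping those that return no content.

    fetch_metadata is a hypothetical callable that returns the metadata
    for one DOI or raises NoContent.
    """
    results = {}
    skipped = []
    for doi in dois:
        try:
            results[doi] = fetch_metadata(doi)
        except NoContent as ex:
            # Remember the failure and carry on with the remaining articles.
            skipped.append((doi, str(ex)))
    return results, skipped
```

The skipped list can then be written to the report or logged as warnings.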
Request via email:
Would it be possible to have a different output or output format so that I can find information in the attachment without having to launch another app? Currently, when I click on the report, the columns are too narrow to see the specific article DOI value, so I have to open it in Sheets or scroll over to see the link. While it seems minor, I am logging these data in another sheet so I can track the article through to PubMed upload, which means I'm entering things on another sheet, so the less mousing I have to do, the better.
The use of tail at this line in the code is very suspicious. I strongly suspect this should be cat, but it needs more careful analysis.
The mail command at the end of the script unconditionally sends mail, even if the earlier mail command has already sent mail due to a failure. It needs to be rewritten so that only one message is sent.
_save_articles doesn't test whether saving an article succeeded, and thus can't deal with cases where an article is listed in article.xml but something goes wrong when the actual download is attempted. It needs to catch the error, then write out the exceptions so they can be retried next time.
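A rough sketch of that approach follows. All names here are hypothetical (save_articles, load_failures, the download callable, and the JSON failures file), and the real _save_articles may need a narrower exception type than Exception.

```python
import json
from pathlib import Path


def save_articles(articles, download, failures_file):
    """Try to download every article and record the ones that fail.

    download is a hypothetical callable that raises on failure.
    The failures are written to failures_file as JSON.
    """
    failed = []
    for article in articles:
        try:
            download(article)
        except Exception as ex:
            failed.append({"article": article, "error": str(ex)})
    Path(failures_file).write_text(json.dumps(failed, indent=2))
    return failed


def load_failures(failures_file):
    """Return the articles recorded as failed by the previous run."""
    path = Path(failures_file)
    if not path.exists():
        return []
    return [entry["article"] for entry in json.loads(path.read_text())]
```

On the next run, the articles returned by load_failures() would be added to the list of articles to fetch.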
The microPublication team requested the ability to write to a Google sheet the info about what has been uploaded.