manubot / rootstock

Clone me to create your Manubot manuscript

Home Page: https://manubot.github.io/rootstock/

License: Other

Shell 13.49% HTML 86.51%

markdown manuscript publishing manubot doi citation continuous-publication

rootstock's Issues

MIT licensed JS code

The anchor script (https://github.com/greenelab/manubot-rootstock/blob/master/build/assets/anchors.js) is not yet listed in the licensing section of the Readme.

I think it would be good to mention this somehow, maybe with "unless noted explicitly" or something similar?

Page numbers in the print / PDF output

Some consider the lack of page numbers to be disturbing.

SVG images fail to export to PDF

Refs jgm/pandoc#265

Add functionality to add banner in HTML template

It could be useful to have a simple way to add info text in a highly visible banner like "Work in progress" or "Published, peer-reviewed version at [...]" to the head of the HTML file with some simple config.

(From #127)

Consider wkhtmltopdf alternatives for HTML to PDF export

I was browsing recent pandoc commits and saw jgm/pandoc@c7e3c1e, refs jgm/pandoc#3909 and jgm/pandoc#3906.

We should look into WeasyPrint and Prince.

This could help with the lack of SVG image export in wkhtmltopdf as well as some of the aesthetic issues. In addition, our conda install of wkhtmltopdf is Linux-only.

Update CSS to left justify table captions

I suggest changing the CSS to left justify table captions, since the main text is left justified and the figure captions are as well. For example, only the table caption in this document is center justified: https://greenelab.github.io/meta-review/v/b8eeea542ce238bbcaf2023add2aecb86ef726bd/

It's not immediately obvious where to change the CSS to accomplish this, but I didn't look thoroughly.

Forking a new manuscript documentation

I started thinking about more detailed documentation for someone who wanted to create a new manuscript using this repository as a template. They could fork through GitHub, but that would only support a single manuscript per user.

The process I'm trying is roughly:

Clone https://github.com/greenelab/manubot-rootstock locally and rename manubot-rootstock to my desired new manuscript name
Create a new repository on GitHub with that name
Set my new GitHub remote as origin and manubot-rootstock as upstream
Enable Travis CI and change the readme links
Change the github.io links
Execute some (or all?) of initialize.sh to create the necessary remote branches
Generate and configure the GitHub SSH deploy key for deploy.sh

Is there a more streamlined process we could recommend? Am I missing any steps?

Renaming repository from manubot-rootstock

I don't think manubot-rootstock is a terrible repo name, but it may not be the best. The two biggest problems I see are that it's:

long
confusing since many users may not be familiar with rootstocks

Here are some other names I jotted down:

manubot-init
manubot-model
manubot-stemcell
manubot-template
manucat
manucross
manumorph
manuroot
manusource
manustart

Alphabetically sorted so I don't bias others with my ranking. Since this would be a slightly disruptive change, we should only make it if we feel any of these names is much better.

CC @cgreene @agitter @slochower. I do like the name "Manubot" for the overall system and the python package. That's why all of these names stick to the manu* theme.

Integrating Manubot and Idyll

Idyll stands for Interactive Document Language and is a "markup language for interactive documents." The current description reads:

Idyll extends the ubiquitous Markdown format to enable the creation of dynamic, interactive narratives for the web. The language and toolchain aim to empower journalists, researchers, and technical experts to create compelling content using familiar tools and processes.

Idyll can be used to create explorable explanations, to power blog engines and content management systems, and to generate dynamic technical reports. The tool can generate standalone webpages or be embedded inside of your existing site.

Taking a look at an example was helpful. See Idyll on GitHub at idyll-lang/idyll.

@cgreene met the Idyll folks recently and wondered whether it'd be helpful for the Deep Review in greenelab/deep-review#842.

This issue is for discussing whether there is synergy between Idyll and Manubot, and whether there's an opportunity to integrate them in some form.

CC @mathisonian @AndrewGYork @marciovm.

@AndrewGYork is also working on interactive papers hosted via GitHub (example).

Allow additional information in metadata.yaml

Based on some of the points already discussed on deep review and greenelab/meta-review#75 (comment), I think adding a few additional variables to the metadata would help Manubot be a little more flexible. Some ideas of what we might want to allow:

Address information (in addition to affiliation)
Corresponding author status
First/co-first author status
Specification of author symbol

For the last three, I think we could implement sensible defaults in the jinja template to use if not specified. For example, corresponding author status may be set to "no" unless it is explicitly set to "yes."
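
As a rough illustration of the "sensible defaults" idea, here is a minimal Python sketch that fills in unspecified author fields before they reach a template. The field names (`corresponding`, `co_first`, `symbol`, `address`) are hypothetical and are not Manubot's actual metadata schema.

```python
# Hypothetical defaults; these field names are illustrative, not Manubot's schema.
AUTHOR_DEFAULTS = {
    "corresponding": False,
    "co_first": False,
    "symbol": None,
    "address": None,
}


def apply_author_defaults(author):
    """Return a copy of an author record with unspecified fields set to defaults."""
    merged = dict(AUTHOR_DEFAULTS)
    merged.update(author)
    return merged


# Placeholder author records for demonstration only.
authors = [
    {"name": "Author One", "corresponding": True},
    {"name": "Author Two"},
]
resolved = [apply_author_defaults(a) for a in authors]
```

With this approach the template only ever sees complete records, so it would not need its own fallback logic for missing keys.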

Increase top page margin in print media

The printed page margin was a bit too small on the top for the Sci-Hub manuscript. PeerJ applied their own banner which overlaps with some of the text. See https://peerj.com/preprints/3100v1.pdf

The other margins looked fine.

Archiving metadata (issues, pull request, etc.)

In deep review, the issues and pull requests were a critical part of the manuscript. I'd like to discuss strategies for archiving some of this metadata.

One initial thought would be to have the build script take a snapshot of the issues and pull requests at the time of the build, ideally with some caching. The deploy script could push them to a new branch, perhaps adding a timestamp. I haven't thought through the technical aspects of this. I expect it is feasible using some of the tools or APIs here.

cc @cgreene

Reference numbering with misspecified citation

In deep review (greenelab/deep-review#845), we had a pair of citations without a ; separator [@url:https://eprint.iacr.org/2017/281.pdf @tag:Papernot2017_pate]. The second paper was numbered in the reference list but not actually cited in text, which led to inconsistent reference numbering:

The skipped reference number 161 is @tag:Papernot2017_pate. See the permalink for more context. As a reader, I would expect that @tag:Papernot2017_pate is numbered based on the first appearance in the text.
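
To make the expected behavior concrete, the following sketch numbers citation keys by their first appearance in the text, whether or not they are separated by a semicolon. The regular expression is a simplification chosen for illustration and is not how Manubot or pandoc actually parse citations.

```python
import re

# Matches keys such as @tag:Papernot2017_pate or @url:https://example.com/x.pdf.
# Simplified for illustration: a key ends at whitespace, ']' or ';'.
CITATION_PATTERN = re.compile(r"@[a-zA-Z]+:[^\s\];]+")


def number_by_first_appearance(text):
    """Map each citation key to a 1-based number in order of first appearance."""
    numbers = {}
    for key in CITATION_PATTERN.findall(text):
        if key not in numbers:
            numbers[key] = len(numbers) + 1
    return numbers


text = (
    "First [@doi:10.1/a; @tag:b]. "
    "Then [@url:https://eprint.iacr.org/2017/281.pdf @tag:Papernot2017_pate] "
    "and again [@tag:b]."
)
numbering = number_by_first_appearance(text)
```

Under this scheme the second key in the misspecified pair would receive the number that follows the first key, with no gap in the sequence.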

Add "Send Feedback" button that creates an issue

Jake VDP wrote an astronomy paper (github source) that published to gh-pages (http://jakevdp.github.io/multiband_LS/) via gh-publisher. While each of those steps is a little clunky, one awesome feature of this page is that it has a "Send Feedback" button which then opens up a GitHub issue! This is a great way to create a dialogue with the manuscript authors and readers.

EDIT: Added link to gh-publisher

Bake hypothesis into the HTML versions of the manuscripts

Add:

<script src="https://hypothes.is/embed.js" async></script>

Doesn't work natively with the PDF files, sadly.

Simplifying authors.tsv to manuscript conversion

Currently author parsing is disabled in this repo. I'm thinking of simplifying the TSV format and how it gets added to the manuscript. Basically, here would be the columns:

github_username
full_name
initials (possibly)
orcid
affiliations
funding
email
symbols (superscript symbols to add next to name). Could be symbol for corresponding or contributed equally. Or anything. The symbols would be manually defined in the markdown doc.

I was thinking of removing the approve column, and going for each author submits a PR to add their name, hence approving.

Unlike the system for the deep review, the build system would not try to condense affiliations or funding across authors. In other words, each author would get their details printed next to their name. There would be more duplication of text, but this system will be more reliable. Additionally, we may eventually move to putting much of this info in tooltips for the HTML version.
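
A minimal sketch of what the simplified conversion could look like, assuming the columns listed above and a tab-separated file. The sample row, the `format_author` helper, and the output layout are all hypothetical.

```python
import csv
import io

# Illustrative TSV content; the names and values are placeholders.
AUTHORS_TSV = (
    "github_username\tfull_name\tinitials\torcid\taffiliations\tfunding\temail\tsymbols\n"
    "octocat\tMona Octocat\tMO\t0000-0000-0000-0000\tExample University\t\tmona@example.org\t*\n"
)


def format_author(row):
    """Render one author with their own details, skipping empty fields."""
    details = [row[field] for field in ("orcid", "affiliations", "funding") if row[field]]
    name = row["full_name"] + row["symbols"]
    return name + " (" + " · ".join(details) + ")" if details else name


rows = list(csv.DictReader(io.StringIO(AUTHORS_TSV), delimiter="\t"))
lines = [format_author(row) for row in rows]
```

Each author is rendered independently, so nothing is condensed across authors, which matches the reliability trade-off described above.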

@agitter what do you think? Feel free to disagree!

Show context for references

Building on @dhimmel's post on author versus numeric citation styles, another advantage of author-based citations in the current version of Manubot is that it is easier to find where a reference is cited. I can search for Pantcheva, 2018 more easily than 13, for instance, especially if 13 is cited as 12-14 or appears in numeric parts of the text.

A nice feature for numeric citations might be a form of "show context" that some journals use. https://www.nature.com/articles/ncomms12989#references is an arbitrary example. The context consists of snippets of the manuscript where the reference was used plus links back to those locations.

This would also give us one way to address #117. We could assert that the reference number is an increasing function of the reference's first context.

Generic URL ––> archive.org persistent ID/URL ––> Manubot

When you cite a news or blog URL, you might want to reference the archive.org snapshot of the URL.

Can the @url: identifier send a request to archive.org and get that URL to cite in Manubot?

See blog post: https://medium.com/@RaoOfPhysics/89bd3f2ce0fd
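
One possible route is the Wayback Machine availability endpoint, which returns the closest snapshot for a URL as JSON. The sketch below only builds the query and parses a response, so it runs without network access. The sample response uses placeholder values, and how this would plug into the @url: handling is left open.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the availability query URL for a cited page."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response_text):
    """Return the closest snapshot URL from an availability response, or None."""
    payload = json.loads(response_text)
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Example response shape, with placeholder values rather than a real snapshot.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20170101000000/http://example.com/",
            "timestamp": "20170101000000",
            "status": "200",
        }
    }
})
```

If no snapshot exists, `closest_snapshot` returns None, in which case the citation could fall back to the original URL.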

Adopting the pandoc-citeproc markdown citation format

Currently we cite multiple documents like:

Several groups [@doi:10.1371/journal.pone.0032235 @doi:10.1109/TCBB.2014.2343960 @doi:10.1038/srep11476] initiated

Prior to pandoc, this gets converted to:

Several groups [@1AlhRKQbe; @ZzaRyGuJ; @UpFrhdJf] initiated

Then post pandoc conversion, it will look like:

Several groups [30,192,193] initiated

Note how we have to add semicolons to separate each reference. We figured this out at lierdakil/pandoc-crossref#110. It would be nice to align our format with the pandoc-citeproc format. This presumably would also allow us to make non-bracketed citations like:

@doi:10.1371/journal.pone.0032235 was the first group

This would presumably render to

Qi et al 2012 was the first group

However, I haven't found the actual docs for the markdown citation formatting supported by pandoc-citeproc (docs). Tagging @lierdakil and @slochower in case they have any insights.
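
For reference, the semicolon insertion described above can be sketched as a small text transformation. This is an illustration only: it assumes a bracketed group contains nothing but citation keys (no page locators or prefixes), and it does not perform the conversion of keys to short identifiers.

```python
import re


def add_citation_separators(text):
    """Insert '; ' between space-separated citation keys inside square brackets."""
    def fix_group(match):
        keys = match.group(1).split()
        # Strip any existing trailing semicolons so they are not doubled.
        return "[" + "; ".join(key.rstrip(";") for key in keys) + "]"

    # Only touch bracketed groups whose content starts with a citation key.
    return re.sub(r"\[(@[^\]]+)\]", fix_group, text)


before = (
    "Several groups [@doi:10.1371/journal.pone.0032235 "
    "@doi:10.1109/TCBB.2014.2343960 @doi:10.1038/srep11476] initiated"
)
after = add_citation_separators(before)
```

Groups that already use semicolons pass through unchanged, so the transformation would be safe to run on manuscripts that mix both styles.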

Add BUILD_PDF flag

To work around PDF build issues (#120) and for quicker local development, a BUILD_PDF flag like BUILD_DOCX might be useful.

This would require skipping "manuscript.pdf" in webpage.py. Would that be a problem?

Automated figure & table numbering

@slochower welcome to manubot-rootstock... which is meant to be forked when creating a new manuscript. Still a work in progress.

See previous discussions at greenelab/deep-review#354 (comment) and greenelab/deep-review#558.

It seems like the best way to number and reference tables and figures will be with pandoc-tablenos and pandoc-fignos, which are both python packages by @tomduck that we can add to the environment:

They can be enabled in the pandoc conversion script with:

--filter pandoc-fignos
--filter pandoc-tablenos

Since we're also using jinja2 templating, we could do the conversion prior to pandoc if there is a compelling reason.

@slochower do you want to submit the PR? I'm thinking the initial use case we should target is markdown tables and figures embedded via absolute URL (let's save the relative image path case for later).

Also @slochower, any idea how figure and table captions work?

CC @agitter.

Error during build if there are zero references

I've been playing around with manually building a manuscript based off this template and noticed that if I have absolutely zero references in my document, I get a build error. If I add a reference in any section (e.g., putting [@doi:10.1126/science.1127344] as a placeholder in my abstract), then the error goes away.

$ bash build/build.sh 
Retrieving and processing reference metadata
Using metadata cache: True
Traceback (most recent call last):
  File "references.py", line 111, in <module>
    ref_df['standard_citation'], ref_df['citation_id'] = zip(*result)
ValueError: not enough values to unpack (expected 2, got 0)

I haven't debugged the code, but I think result (calculated on line 109, just above the error) is empty when there are no references. Would a simple check if result not None: ... before line 111 be a workaround?

result = ref_df.citation.apply(
    get_standard_citatation, cache=metadata_cache, override=overrides)

(FWIW, I do get the "potentially misformatted references" error in any case, but the build continues successfully after I add the placeholder. The warning comes from the templates in the front matter.)
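
One caveat on the proposed workaround: if `result` is an empty sequence rather than None, a `not None` check would pass and the unpacking would still fail, because `zip(*result)` yields nothing when there are no pairs. The sketch below guards on emptiness instead. It uses plain lists rather than the data frame in references.py, and the helper name is hypothetical, so treat it as an illustration of the failure mode rather than a patch.

```python
def split_citation_pairs(result):
    """Split (standard_citation, citation_id) pairs into two lists, tolerating no pairs."""
    pairs = list(result)
    if not pairs:
        # zip(*[]) yields nothing, so unpacking into two names would raise ValueError.
        return [], []
    standard_citations, citation_ids = zip(*pairs)
    return list(standard_citations), list(citation_ids)
```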

PDF formatting is not ideal

In several places, the PDF rendering looks (subjectively) worse than the HTML output. (I'm not sure if I'll have time to work on this during the week, but I wanted to drop this here in case someone else has time before me.)

Overall, I think the margins of the PDF could be adjusted. The relatively short title already wraps in the PDF.
There are places where the HTML has spaces between the text and the references, but the PDF output does not. I'm not sure why this happens.
Code style could be formatted as monospaced in the PDF output.
Tables look much better in HTML than PDF (shading and banding).
The SVG example figure is missing (known problem: #14).

Tracking Manubot usage

It may be nice in the future to produce statistics about how many documents have been authored with Manubot and this rootstock, or to refer to more examples. @dhimmel has https://github.com/dhimmel/rephetio-manuscript/ and there were examples listed in #62.

I haven't been able to think of a non-invasive way to track this. Does anyone else have ideas? Is this worthwhile?

Support Alternate Themes

It is difficult to read a long manuscript with the current style settings.

It might be useful to build on the work of other projects which convert Markdown into the usual academic style:

https://github.com/ickc/markdown-latex-css
https://github.com/thomaspark/pubcss/ // https://thomaspark.co/project/pubcss/demo/acm-sig-sample-web-theme.html
https://gist.github.com/killercup/5917178
etc

Enable more advanced math rendering by default

The current default math used in our pandoc build command is severely limited: see the "TeX math in HTML" section of the pandoc demos. Pandoc has support for several more advanced methods for math rendering in HTML.

The question is which one to choose? I've seen MathJax used before in scholarly publishing. However, KaTeX is faster to render. There are also several more options.

@slochower did you look into the math options at all for b03e1c3?

Creating a Manubot CSL that perfects the format of bibliographic entries

Currently, Manubot uses style.csl, a slightly modified version of proceedings-of-the-royal-society-b.csl. While this style is decent, I have some ideas for an optimal style. And of course, authors can always switch the style to that of whatever journal they'd like.

The style I envision uses numbers for citations, i.e. renders like blah blah [1-5,7]. Non-bracketed citations could show author names like: Pippi, Hippi, et al [7] wrote.

Bibliographic entries would look something like:

Sci-Hub provides access to nearly all scholarly literature
_{^{Daniel S Himmelstein, Ariel R Romero, Stephen R McLaughlin, Bastian Greshake Tzovaras, Casey S Greene}}
PeerJ Preprints (2017-07-20) _{^{DOI: 10.7287/peerj.preprints.3100v1}}

Ideally, author names would be in smaller text and hyperlink to ORCID records when available. The smallness of text here is an exaggeration (limited formatting options).

Compared to historical bibliographic formats, the following points are stressed:

Unique identification is the most important aspect of a reference. A hyperlink or DOI is the single most important piece of information.
The title is the most salient human-readable piece of information
Just having a year for the date is too imprecise. The month and day are important for placing works in the proper historical context.
Authorship information is important, but often takes up too much of a reference. Having authorship information in smaller or lighter text would be nice.
Unless a reference is only available in a physical format, the volume / issue / page information is irrelevant.
Historical reference styles adopt vastly different styles based on the type of record (article, interview, etcetera). This is largely unnecessary. If anything, a badge can display the type of record.

There's a webapp to generate a custom CSL style. I've found it a bit difficult to use, but it's probably the way to go.

One question is whether to print out the URL rather than hyperlink the title. The benefit of showing the URL would be for readers who have printed the PDF. However, if a reader is at a computer, they could always go back to the digital version with the hyperlink.

Suggestions welcome.

gh-publisher: lessons to learn?

Have you seen https://github.com/ewanmellor/gh-publisher? What lessons can we learn from them?

EDIT. Example: http://drphilmarshall.github.io/Ideas-for-Citizen-Science-in-Astronomy/

SETUP.md repo sed substitution is failing

At the OpenCon do-a-thon, we've had 2 users experience potentially faulty substitutions. Rather than rebranding their README to USER/REPO, their README.md is rebranded to USER/USER. Possibly introduced in #84?

The two examples are https://github.com/zambujo/manubot/commit/10397d6a05235c3517ac981b9b3c67920c226b9a and broadwym/manu1@64954e5.

Interestingly one user did not have the issue: https://github.com/schliebs/open_manuscript/commit/77da6c844ac061061c03b93721e7eade90fabd99, making me wonder whether it's user error or not.

SETUP.md currently uses:

sed "s/greenelab/$OWNER/g" README.md > tmp && mv -f tmp README.md
sed "s/manubot-rootstock/$REPO/g" README.md > tmp && mv -f tmp README.md

@vsmalladi any ideas what could be happening?

Replace OpenTimestamps submodule with pip install

OpenTimestamps is now on PyPI (announcement). Install with:

pip install opentimestamps-client

We should also update python-bitcoinlib to v0.8.0.

ReScience

The ReScience journal could be a potential use case for manubot-rootstock. From https://arxiv.org/abs/1707.04393:

The main inconvenience of the GitHub platform is its almost complete lack of support for the publishing steps, once a submission has successfully passed the reviewing process. At this point, the submission consists of an article text in Markdown format plus a set of code and data files in a git repository. The desired archival form is an article in PDF format plus a permanent archive of the submitted code and data, with a Digital Object Identifier (DOI) providing a permanent reference. The Zenodo platform allows straightforward archiving of snapshots of a repository hosted on GitHub, and issues a DOI for the archive. This leaves the task of producing a PDF version of the article, which is currently handled by the managing editor of the submission, in order to ease the technical burden on our authors

Toggle Annotate/Highlight Popup // Add Unhighlight Ability

I highlight text with my mouse as I'm reading this. Apparently, there are lots of us who do this.

This is really annoying on Manubot HTML outputs because the highlight popup comes up every time. One time I clicked it by mistake and now there's no way for me to get rid of my highlight and I feel like a jackass who highlighted some unimportant text.

It'd be great if I could a) toggle the highlight-popup and b) un-highlight.

Setting up manubot-rootstock for GitLab with GitLab-CI

Just a suggestion: https://about.gitlab.com/features/gitlab-ci-cd/ :)

Creating a diff between two manuscript versions

Oftentimes, it's important (and required in scholarly publishing) to show the changes between two versions of a manuscript. It would be ideal if Manubot users could "track changes" between two manuscript versions.

Pandoc doesn't have builtin support for diffs: jgm/pandoc#2374. Other options would be:

Exporting to latex and using latexdiff
Exporting to docx and using LibreOffice's Compare Document feature. Currently, not accessible via command line.
Export to ODT and use oodiff
Diffing manuscript.md as a text file (perhaps using diff, prettydiff, or rich-text-diff)
Use GitHub's rich diff view preview or react-rich-diff
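
As a small illustration of the plain-text option, Python's standard `difflib` module can produce a unified diff of two versions of manuscript.md. The file labels and sample text below are placeholders.

```python
import difflib


def manuscript_diff(old_text, new_text):
    """Return a unified diff between two versions of manuscript.md as a string."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="old/manuscript.md",
        tofile="new/manuscript.md",
    )
    return "".join(diff)


old = "# Title\n\nFirst sentence.\n"
new = "# Title\n\nFirst sentence, revised.\n"
changes = manuscript_diff(old, new)
```

A line-based diff like this flags a whole paragraph as changed when one word is edited, which is one reason the rich-diff options in the list may read better for prose.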

Instructions for manual references

It would be helpful to describe the usage of manual-references.json in references/README.md. I can make a pull request myself (eventually).

Bitcoin sign (₿, U+20BF) doesn't render in PDF and some browsers

As commented by @arielsvn in greenelab/scihub-manuscript#51 (comment):

there seems to be an encoding issue with the bitcoin symbol on the Discussion section. I noticed it on the pdf, and the same happens with the markdown file, at least on my computer.

This is likely due to the unicode character (₿, U+20BF) being a recent addition as part of Unicode 10.0, released June 2017. Note this release has other important symbols/emojis such as 🧟 (Zombie) and 🧖 (Person in Steamy Room).

For me, on Chrome on Ubuntu 17.10, the bitcoin sign renders in the HTML but not the PDF. I'm assuming the PDF gets a certain font embedded on Travis CI, which doesn't have the latest characters. Note that when I generate the PDF locally, the bitcoin signs do render.

So @arielsvn, I think we may want to look into the following solutions:

Updating the font used by the Travis CI build
Specifying a font to use that is up to date

@arielsvn you probably know best what to do here.

Setup commands fail on macOS

Some of the commands in SETUP.md fail on macOS. IIRC, these commands are:

TRAVIS_ENCRYPT_ID=`grep \
  --only-matching --perl-regexp \
  --regexp='(?<=encrypted_)[a-zA-Z0-9]+(?=_key)' \
  travis-encrypt-file.log`
sed --in-place "s/f2f00aaf6402/$TRAVIS_ENCRYPT_ID/g" deploy.sh

sed --in-place "s/greenelab/$OWNER/g" README.md
sed --in-place "s/manubot-rootstock/$REPO/g" README.md

The issue is likely that the mac versions of these utilities don't support the same long arguments. What a shame.
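
One way to sidestep the GNU-only long options would be to do the same extraction and substitution in Python, which behaves the same on Linux and macOS. This is a sketch under assumptions: the helper names are made up, the sample log line is invented to match the regular expression, and reading or writing the actual files is left out.

```python
import re

# Same pattern as the grep command: the characters between "encrypted_" and "_key".
ENCRYPT_ID_PATTERN = re.compile(r"(?<=encrypted_)[a-zA-Z0-9]+(?=_key)")


def find_encrypt_id(log_text):
    """Return the first Travis encryption ID found in the log text, or None."""
    match = ENCRYPT_ID_PATTERN.search(log_text)
    return match.group(0) if match else None


def personalize(text, replacements):
    """Apply plain-string replacements in order, like successive sed s/old/new/g calls."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


# Invented log line for demonstration; a real travis-encrypt-file.log may differ.
log = "openssl aes-256-cbc -K $encrypted_0123abcd4567_key -iv $encrypted_0123abcd4567_iv"
encrypt_id = find_encrypt_id(log)
readme = personalize(
    "greenelab/manubot-rootstock",
    [("greenelab", "alice"), ("manubot-rootstock", "my-paper")],
)
```

File contents could be loaded and saved with `pathlib.Path.read_text` and `write_text`, which would replace the in-place editing that `sed --in-place` performs.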

macOS PDF build issues: long arguments not accepted & missing fonts in PDF

sh build/build.sh fails on macOS as follows:

ln --symbolic and rm --recursive do not work. When I changed them to ln -s and rm -r, respectively, they are fine.

However, then it complains about pango. I manually installed it using homebrew and pango was not an issue anymore.

Then the build was completed with no errors but warnings:

WARNING: Ignored `-ms-text-size-adjust: 100%` at 78:5, unknown property.
WARNING: Ignored `-webkit-text-size-adjust: 100%` at 79:5, unknown property.
WARNING: Ignored `-moz-box-sizing: content-box` at 204:5, unknown property.
WARNING: Ignored `-webkit-appearance: button` at 379:5, unknown property.
WARNING: Ignored `cursor: pointer` at 380:5, the property does not apply for the print media.
WARNING: Ignored `cursor: default` at 389:5, the property does not apply for the print media.
WARNING: Ignored `-webkit-appearance: textfield` at 410:5, unknown property.
WARNING: Ignored `-moz-box-sizing: content-box` at 411:5, unknown property.
WARNING: Ignored `-webkit-box-sizing: content-box` at 412:5, unknown property.
WARNING: Ignored `-webkit-appearance: none` at 423:5, unknown property.
WARNING: Invalid or unsupported selector 'button::-moz-focus-inner,
input::-moz-focus-inner ', Unknown pseudo-element: -moz-focus-inner
WARNING: Invalid or unsupported selector '*:not("#mkdbuttons") ', (<FunctionBlock not( … )>, ':not() only accepts a simple selector')
WARNING: Ignored `-webkit-font-smoothing: subpixel-antialiased` at 486:5, unknown property.
WARNING: Ignored `-moz-border-radius: 3px` at 491:5, unknown property.
WARNING: Ignored `-webkit-border-radius: 3px
` at 492:5, unknown property.
WARNING: Ignored `-webkit-font-smoothing: subpixel-antialiased` at 528:5, unknown property.
WARNING: Ignored `cursor: text
` at 529:5, the property does not apply for the print media.
WARNING: Ignored `word-break: break-all` at 733:5, unknown property.
WARNING: Ignored `word-break: break-word` at 734:5, unknown property.
WARNING: Ignored `-webkit-hyphens: auto` at 735:5, unknown property.
WARNING: Ignored `-moz-hyphens: auto` at 736:5, unknown property.

And the generated PDF has squares only.

Do you have any idea on why this might be happening?

Deploy with windows generated keys fails

Ran into a deploy error when setting up a manuscript at the OpenCon doathon:

bad decrypt
140040671200928:error:0606506D:digital envelope routines:EVP_DecryptFinal_ex:wrong final block length:evp_enc.c:520:

Prepended file numbers

It seems like it would be better to specify the ordering of the markdown files by having a separate file.

As it is now it looks like people would have to rename several files if they wanted to change the ordering or add some content in the middle.
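
A sketch of how a separate ordering file might be consumed, assuming one filename per line with blank lines and `#` comments ignored. The file format and the `ordered_sections` helper are hypothetical and are not part of the current build.

```python
def ordered_sections(order_lines, available):
    """Return section filenames in the listed order, ignoring blanks and comments."""
    names = [line.strip() for line in order_lines]
    names = [name for name in names if name and not name.startswith("#")]
    missing = [name for name in names if name not in available]
    if missing:
        raise FileNotFoundError("Listed but not found: " + ", ".join(missing))
    return names


# Placeholder filenames for demonstration.
order = ["abstract.md", "", "# methods come before results", "methods.md", "results.md"]
sections = ordered_sections(order, {"abstract.md", "methods.md", "results.md", "notes.md"})
```

Reordering or inserting a section would then mean editing one line in the ordering file instead of renaming several files.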

Journal compatibility

I'm excited to see this standalone manuscript repository!

I have a general question in regards to journal submissions. Many journals require Word or LaTeX formats for submission. Have you thought about how manuscripts written in this markdown format can be submitted to a journal with those requirements? Would one use pandoc outside of the automatic build to do a one time conversion to Word or LaTeX?

Webpage prints to A4 dimensions rather than Letter

See for example this Sci-Hub Manuscript PDF. The Paper Size according to the PDF's properties is A4, Portrait (8.26 × 11.68 inch). This caused an issue when I printed the PDF where some final lines on a page were omitted.

This StackOverflow answer notes how to change the page to Letter (8.5 × 11). I just want to confirm this is a change we want to make. I didn't realize there were multiple paper sizes, both prevalent, in this unstandardized world!

Preserving old versions of files on the gh-pages branch via directories

The gh-pages branch is responsible for the GitHub Pages site and contains output HTML, PDF, CSS, image, and OTS files. Currently, new manuscript builds overwrite the files, which are in the root directory of this branch:

https://github.com/greenelab/manubot-rootstock/blob/f165f609f33b11fdf71a0db6435d4dd159f23973/ci/deploy.sh#L62-L68

I propose instead creating a directory structure, so all past outputs on gh-pages are preserved through versioned directories. The version would be the master commit that the build was based on (i.e. $TRAVIS_COMMIT). For example, I commit f165f60 to master. The outputs that currently go to the root directory of gh-pages would instead go to the v/f165f609f33b11fdf71a0db6435d4dd159f23973 directory (v for version). The latest HTML and PDF manuscript would stay available at their current URLs, probably via symbolic links (see here for how symlinks act with GitHub Pages).

We could use redirects, so v/freeze redirects to the latest versioned directory.

The benefits of this change are twofold:

You can view outdated versions of the HTML manuscript. Right now, you can only see the rendered HTML for the latest version.
The OpenTimestamp .ots files need to be upgraded. Until they're upgraded, they depend on a calendar service for verification. Currently, we haven't upgraded timestamps, which creates the possibility that we may be unable to prove existence if the calendar goes down. Note that the timestamps can only be upgraded after the bitcoin transaction confirms, which could be days. That's why we don't specify --wait in our builds. Anyways, previously I was planning on rewriting the gh-pages history to upgrade timestamps in past commits. However, rewriting history is dangerous. It would be preferable to be able to upgrade past timestamps without rewriting history, which this proposal would enable.

The main disadvantage I can think of is repository size, since more files are being tracked. However, I'm not sure it'd be any bigger, since all files are currently in the git history at some point. According to this:

even if you have multiple files with the same contents but different names or in different locations or from different commits only one copy would ever be saved but with several pointers to it in each commit tree.

Shallow cloning would lose its savings, but I'm not sure we care.

One final point to consider is that a single commit will sometimes be deployed multiple times (say if the CI build is rerun). They will not always be the same. For the same source commit, I think we'd use the latest build.

Symlink CSS to output directory for local viewing of the HTML

I propose we symlink github-pandoc.css to the output/ directory so that local building and viewing of webpage/index.html or output/manuscript.html (I know those are symlinks of each other) loads the CSS. Viewing the HTML from either webpage/ or output/ currently can't find the CSS because the browser follows the symlink into output/ and therefore doesn't find webpage/github-pandoc.css. Does that make sense? A simple ln -s ../webpage/github-pandoc.css in output/ fixes the issue.

Markdown proofer

This CircleCI blog describes Markdown Proofer for validating YAML blocks in Markdown files. It is written in Go, which we could get from conda, but it may not cleanly integrate into our test environment. I'm also uncertain whether it could be applied directly to YAML files like metadata.yaml.

Nevertheless I thought it was worth monitoring.

Update manual reference guidelines with link to examples

This page provides some nice examples of the CSL metadata for different document types. Would be nice to add to docs.

Check out Pandoc Scholar

Described in Formatting Open Science: agilely creating multiple document formats for academic manuscripts with Pandoc Scholar:

In this article we demonstrate the feasibility of writing scientific manuscripts in plain markdown (MD) text files, which can be easily converted into common publication formats, such as PDF, HTML or EPUB, using Pandoc. The simple syntax of Markdown assures the long-term readability of raw files and the development of software and workflows. We show the implementation of typical elements of scientific manuscripts—formulas, tables, code blocks and citations—and present tools for editing, collaborative writing and version control. We give an example on how to prepare a manuscript with distinct output formats, a DOCX file for submission to a journal, and a LATEX/PDF version for deposition as a PeerJ preprint. Further, we implemented new features for supporting ‘semantic web’ applications, such as the ‘journal article tag suite’—JATS, and the ‘citation typing ontology’—CiTO standard.

The GitHub repo for this project is pandoc-scholar/pandoc-scholar. Created by @tarleb.

Let's see if there's anything from Pandoc Scholar we should incorporate here or learn from.

nan (missing) author fields

Quoting from https://greenelab.github.io/scihub-manuscript/

0000-0002-9925-9623 · Department of Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt · Funded by nan

Would be better to have jinja2 omit blank fields entirely. In other words remove " · Funded by nan"
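
The same effect could also be achieved before the values reach the template. The sketch below treats None, empty strings, and float NaN as blank and drops them when joining author details. The helper names are hypothetical, and the example values are taken from the quoted line.

```python
import math


def is_blank(value):
    """True for None, empty strings, and float NaN (as produced by missing table cells)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def author_details(orcid, affiliation, funding):
    """Join the non-blank author fields with the ' · ' separator."""
    parts = [orcid, affiliation, None if is_blank(funding) else "Funded by " + str(funding)]
    return " · ".join(str(part) for part in parts if not is_blank(part))


details = author_details("0000-0002-9925-9623", "Goethe University Frankfurt", float("nan"))
```

Here `details` contains only the ORCID and affiliation, with no trailing "Funded by nan".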

Manubot vs. alternatives

I just found out about Manubot, can you tell the differences between Manubot and alternatives, like:

Authorea
Gitlab
Overleaf
Google docs...

Autogenerate DOIs (with Zenodo?) based on releases/tags

At the moment, PDFs get pushed to PeerJ. But you could use the GitHub-Zenodo integration to snapshot the whole repo and give it a DOI.