poldrack / autocv

automatic generation of CV

License: MIT License

autocv's Introduction

autoCV

NOTE: This project has been deprecated in favor of AcademicDB and is no longer being maintained.

A tool for automatic generation of a LaTeX-based curriculum vitae (CV)

Motivation

I recently wanted to update my CV to include all of my open science activities, such as links to open access papers and code/data, and to include DOIs. Rather than doing this by hand for each publication, I decided to build an automated tool that generates a CV using PubMed and ORCID to download the publication information, plus a set of text files containing other info. It's still a work in progress but it might be helpful for you; so far I have only tested it on my own CV, and it will almost certainly need work for others (especially if you have a common name that is not uniquely identified with a simple PubMed query). It will be most useful for more advanced researchers whose CV may be many pages long.

The project takes advantage of the very nice LaTeX CV template from Dario Taraborelli.

Structure

The idea behind this package is to use PubMed and ORCID to obtain an up-to-date CV in a relatively automated way. Using it requires that the user first enter some relevant information into their ORCID account:

Education
Employment
Invited Positions and Distinctions
Membership and Service

In addition, it requires generating several CSV files containing other information that is not well organized or available within ORCID:

conference.csv: Conference presentations
talks.csv: Colloquium and other talks
funding.csv: Grants and other funding
editorial.csv: Editorial duties and reviewing
additional_pubs.csv: Publications that are not found in PubMed/ORCID (including books, book chapters, and conference proceedings - note that ORCID allows addition of books but the metadata are a bit screwy, so I prefer entering them manually in this file)
teaching.csv: Courses taught

It also allows addition of links to any reference using a csv file called links.csv.

Finally, you will need to generate a json file called params.json that contains some metadata about you - see example here.

You will need to take a look at the examples of these files in the repository to see their structure.
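
Before running the tool, it can help to confirm that the working directory contains the files listed above. The following is a minimal sketch, assuming that every listed file is required (the README does not say which are optional). `missing_inputs` is a hypothetical helper for illustration and is not part of autocv.

```python
from pathlib import Path

# File names taken from the README; treating every one of them as
# required is an assumption made for this sketch.
EXPECTED_FILES = [
    "conference.csv",
    "talks.csv",
    "funding.csv",
    "editorial.csv",
    "additional_pubs.csv",
    "teaching.csv",
    "links.csv",
    "params.json",
]


def missing_inputs(directory="."):
    """Return the expected input files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All expected input files are present.")
```

Running this from the CV directory lists whatever still needs to be created before invoking the tool.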

You can see an example of the output here.

Running the code

First you should install the package:

pip install -U autocv

You will also need a working LaTeX installation; on a Mac I use MacTeX.

Then you can run the full process by simply typing autoCV from the directory that contains all of the necessary files. Type autoCV -h for additional options.

NSF Collaborators list

You can use bin/get_NSF_collaborators to generate a spreadsheet of raw material from which you can assemble your NSF collaborators list. To use this you will need to create a json file that contains details about you; here is an example of what mine looks like:

{
    "lastname": "poldrack",
    "middlename": "alan",
    "firstname": "russell",
    "email": "[email protected]",
    "orcid": "0000-0001-6755-0259",
    "query": "poldrack-r",
    "url": "http://www.poldracklab.org",
    "twitter": "@russpoldrack",
    "github": "http://github.com/poldrack",
    "phone": "xxx-xxx-xxxx",
    "address": [ "Stanford University", "Department of Psychology", "Building 420", "450 Jane Stanford Way", "Stanford, CA, 94305-2130" ]
}
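
A quick sanity check of such a file can catch typos before the tool runs. This is a sketch only: the set of keys treated as required is an assumption based on the example above, and `check_params` and `load_params` are hypothetical helpers, not autocv functions.

```python
import json
import re

# Keys taken from the example above; whether autocv requires all of
# them is an assumption made for this sketch.
EXPECTED_KEYS = {"lastname", "firstname", "email", "orcid", "query"}

# ORCID iDs are four hyphen-separated groups of four characters;
# the final character may be a digit or "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def check_params(params):
    """Return a list of human-readable problems found in a params dict."""
    missing = sorted(EXPECTED_KEYS - set(params))
    problems = [f"missing key: {key}" for key in missing]
    orcid = params.get("orcid", "")
    if orcid and not ORCID_PATTERN.match(orcid):
        problems.append(f"orcid does not look like an ORCID iD: {orcid}")
    return problems


def load_params(path):
    """Read the JSON file, using UTF-8 regardless of the platform default."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `check_params(load_params("params.json"))` returns an empty list when nothing looks wrong.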

Testing

Tests for many of the components can be run using this command:

python -m pytest tests

autocv's Issues

Biosketch format

Maybe a suggestion for future add-ons: would it be possible to export this to the various biosketch formats used by NIH or other government/funding agencies?

fix testing strategy

The current testing strategy has unwanted dependencies between tests. It should be rewritten using a strategy that decouples the individual tests.

[idea] Semantic Scholar API

List of papers authored by a person and number of citations per article can be obtained via Semantic Scholar API: https://api.semanticscholar.org/

This could be used to, for example, highlight most cited first/last author articles.

Test failure due to Unicode character

Running the test suite on Windows 10 with TeXMaker led to the following failure:

______________________________________________ test_latex_cv_write_latex ______________________________________________

latexcv = <autocv.latex.LatexCV object at 0x0000013D971690D0>
tmpdir_factory = TempdirFactory(_tmppath_factory=TempPathFactory(_given_basetemp=None, _trace=<pluggy._tracing.TagTracerSub object at 0x0000013D93018100>, _basetemp=WindowsPath('C:/Users/xxx/AppData/Local/Temp/pytest-of-mph648/pytest-0')))

    def test_latex_cv_write_latex(latexcv, tmpdir_factory):
        outfile = tmpdir_factory.mktemp("data").join("test.tex")
>       latexcv.write_latex(outfile)

tests\test_latex.py:45:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
autocv\latex.py:56: in write_latex
    f.write(getattr(self, section))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <encodings.cp1252.IncrementalEncoder object at 0x0000013D97169220>
input = '\\section*{Publications (Google Scholar H-index = 129)}\\subsection*{2022}Adebimpe A, Bertolero M, Dolui S et al. (20...ations. \\textit{Psychon Bull Rev, 3}, 434-48. \\href{http://dx.doi.org/10.3758/bf03214547}{DOI} \\vspace{2mm}\r\n\r\n'
final = False

    def encode(self, input, final=False):
>       return codecs.charmap_encode(input,self.errors,encoding_table)[0]
E       UnicodeEncodeError: 'charmap' codec can't encode character '\u0144' in position 34449: character maps to <undefined>

C:\ProgramData\Anaconda3\lib\encodings\cp1252.py:19: UnicodeEncodeError

I also got the following warning:

..\..\..\..\..\ProgramData\Anaconda3\lib\site-packages\pyreadline\py3k_compat.py:8
  C:\ProgramData\Anaconda3\lib\site-packages\pyreadline\py3k_compat.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
    return isinstance(x, collections.Callable)

..\..\..\..\..\ProgramData\Anaconda3\lib\site-packages\win32\lib\pywintypes.py:2
  C:\ProgramData\Anaconda3\lib\site-packages\win32\lib\pywintypes.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp, sys, os

autocv\utils.py:174
  C:\Users\mph648\Documents\GitHub\autoCV\autocv\utils.py:174: DeprecationWarning: invalid escape sequence \&
    setattr(pub, field, value.replace(' &', ' \&') ) # noqa

-- Docs: https://docs.pytest.org/en/stable/warnings.html
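
The traceback above shows the write going through the cp1252 codec, which is the default text encoding on many Windows setups and has no mapping for '\u0144' (ń). One likely fix, assuming `write_latex` opens its output file without an explicit `encoding` argument (the traceback does not show that line), is to request UTF-8 explicitly. The `invalid escape sequence \&` warning can be avoided with a raw string. Both helper names below are hypothetical and only illustrate the two changes.

```python
def write_text_utf8(path, text):
    """Write text with an explicit encoding instead of the platform default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def escape_ampersands(value):
    """Escape ampersands for LaTeX, as the flagged line in utils.py does."""
    # The raw string keeps the backslash literal without triggering the
    # "invalid escape sequence" DeprecationWarning.
    return value.replace(" &", r" \&")
```

With the explicit encoding, the same text is written identically on Windows, macOS, and Linux.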

Example output missing

The file linked to at the end of the README (https://github.com/poldrack/autoCV/blob/master/tests/autocv_template.pdf) does not exist.

separate out content download and rendering

Per https://twitter.com/choldgraf/status/1264941141364899840?s=20 from @choldgraf:

One thought: it might be nice to cleanly separate out the "pull my credentials from various open sources" step and the "render my cv with latex" step. They seem easily separable and the first could be used as an entry into other kinds of outputs (eg html)
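
The separation suggested here could look roughly like the sketch below: a collection step that produces plain, JSON-serializable data, and independent renderers that consume it. All function names are hypothetical, and the record returned by `collect_data` is a placeholder rather than real data or autocv's actual data model.

```python
import json
from html import escape


def collect_data():
    """Stand-in for the download step; a real version would query ORCID/PubMed."""
    # Placeholder record for illustration only.
    return {
        "name": "Example Researcher",
        "publications": [{"title": "An example title", "year": 2020}],
    }


def render_latex(data):
    """Render the publication list as a LaTeX fragment."""
    lines = [r"\section*{Publications}"]
    for pub in data["publications"]:
        lines.append(rf"{pub['title']} ({pub['year']}). \vspace{{2mm}}")
    return "\n".join(lines)


def render_html(data):
    """Render the same data as an HTML fragment."""
    items = "".join(
        f"<li>{escape(pub['title'])} ({pub['year']})</li>"
        for pub in data["publications"]
    )
    return f"<h2>Publications</h2><ul>{items}</ul>"


if __name__ == "__main__":
    # Round-trip through JSON to show that the renderers only need the
    # serialized snapshot, not the download code.
    snapshot = json.dumps(collect_data())
    data = json.loads(snapshot)
    print(render_latex(data))
    print(render_html(data))
```

Because the renderers depend only on the intermediate data, a new output format needs no changes to the download step.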

Use either docker or local LaTeX install

It would probably not be too difficult to set it up so that it can be run either inside a Docker container with auto-installation of LaTeX, or directly on the host, where you're responsible for setting up LaTeX yourself.

Originally posted by @thesamovar in #6 (comment)

can't find '__main__' module

The package builds and installs correctly here, but when I try to run it I get the error "can't find '__main__' module in ". This is after following the instructions on the ReadMe and help pages.

allow arbitrary template

https://twitter.com/neuralreckoning/status/1264944287600476160?s=20

Feature suggestion: some simple templating to format CVs to different requirements. I've often seen very specific requirements for different jobs/grants/etc.

[idea] Publons API

Editorial board membership and reviewer service might be possible to acquire automatically via Publons API: https://publons.com/api/v2/

add configurable target directory

Currently the code requires that all of the necessary files are in the current working directory. It would be useful to add a command line setting for make_cv.py that allows specification of an arbitrary directory. In particular, this would make adding tests much easier.
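
A minimal sketch of such an option using `argparse` follows. The `-d/--basedir` flag and the helper names are hypothetical and do not describe an existing autoCV option; the default keeps the current behavior of reading from the working directory.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command line arguments; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Generate a CV from local input files"
    )
    # Hypothetical option, shown for illustration only.
    parser.add_argument(
        "-d", "--basedir",
        type=Path,
        default=Path.cwd(),
        help="directory containing params.json and the CSV files",
    )
    return parser.parse_args(argv)


def input_path(basedir, filename):
    """Resolve an input file relative to the chosen base directory."""
    return Path(basedir) / filename
```

Tests could then pass a temporary directory, for example `parse_args(["--basedir", str(tmp_path)])`, instead of changing the working directory.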

refactoring plans

The initial release of the code (1.0a0) leaves a lot to be desired. In the next few weeks I am planning to refactor it, with a few goals in mind:

define Publications class
- move away from using DOIs as index - use a hash instead
define Researcher class
separate downloading from rendering
- ultimately would like to be able to render to generic markup (e.g. using jinja2) but probably won't get there for this next version.
better command line interface
- allow specification of:
  - base directory
  - template file
rethink the Docker workflow: right now, the calls to Docker use the version of the package that is installed within the container, rather than the current version on the host system, which could lead to version mismatches.
Modularity (suggested by @chrisgorgo): try driving the refactor by making it easier for people to contribute new modules/components. For example layouts, styles, data sources etc.
tests
- what kinds of tests should we include (beyond an end-to-end smoke test)?
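
The first item in the list above (a Publications class indexed by a hash rather than a DOI) might be sketched as follows. The field names, the normalization, and the choice of a truncated SHA-1 are all assumptions made for illustration; the source does not specify a design.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Publication:
    title: str
    year: int
    authors: tuple
    doi: Optional[str] = None  # optional, so books and chapters still fit

    def key(self):
        """Stable index derived from the content rather than from the DOI."""
        # Normalize case and whitespace so trivial formatting differences
        # do not change the key.
        normalized = " ".join(self.title.lower().split())
        raw = f"{normalized}|{self.year}|{'|'.join(self.authors)}"
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:12]
```

A collection can then be indexed as `{pub.key(): pub for pub in pubs}`, which also works for entries from additional_pubs.csv that have no DOI.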