Giter Site home page Giter Site logo

papertool's Introduction

Papertool

Papertool is (will be) a multi-datasource client for viewing and navigating scientific publications written in Rust. It allows to navigate through the references and citations of a publication, query metadata and even retrieve a (link to) the PDF file in most cases.

It's extensible and can support multiple data sources. It can deal with wrong metadata, ultimately by asking the user on how to fix e.g. a situation where two publications have the same DOI (which cannot be right, DOIs are unique).

All queried data is locally cached in a SQLite3 database to speed up subsequent accesses.

This is still in an early stage of development. The "hard stuff" seems to work quite well already, though this is certainly not usable yet for real-world usage.

Data sources are currently implemented or planned:

  • SemanticScholar for citations/references and PDF link (implemented)
  • DBLP for metadata (planned)
  • CrossRef for metadata (planned)

The philosophy is to use as raw as possible data from the sources, and to cook them into nice BibTeX records using heuristics afterwards. This has the advantage that changes in heuristics can easily be implemented and affect all data, not just those retrieved after the change.

Usage

You can run the tool by just typing cargo run in a terminal.

This will currently query all papers that cite at least one of 5 publications I deemed "interesting" (hardcoded DOIs in main.rs), sort them by their count of references to "interesting" papers. The top few of that list show a link to a PDF version of that paper by using an experimental semanticscholar feature.

Results are cached to /tmp/bla.sqlite for the moment. If any weird panics occur, delete that cache file.

Note that the example query will fail two times, because of conflicting DOIs, with the following message:

##########################
# NEED USER INTERVENTION #
##########################
Got #1 from API:
'Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars' by Chenfanfu Jiang et al. (doi:10.1007/s11263-018-1103-5, semanticscholar:d24ff1cde1673ba1ecb4890dd7eeaa85d464221c)
Got #2 from database due to same doi:
'Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes' by Chenfanfu Jiang et al. (doi:10.1007/s11263-018-1103-5, arxiv:1704.00112, semanticscholar:6597649cf4b45fac4ae62d455d271c1ce9e829f3)

Obviously, one of these doi fields is wrong. Please enter a correction using the following syntax:
        > doi1=<insert new doi here>           to change #1's doi to something else or
        > arxiv2=-,doi2=-                      to change #2's arxiv id and doi to <none>
        > 1=<...>                              to change #1's id, if there is only one conflicting id
> 

This is because SemanticScholar wrongly gives the same DOI for two different papers. Resolve this conflict e.g. by typing 2= (or doi2=-) to permanently and persistently clear #2's doi entry.

Then re-run the program, and another similar error will be shown. After handling that one, in the third run, the programm will successfully retrieve all data and show:

...
    (
        "Automatic Generation of Vivid LEGO Architectural Sculptures",
        "didn\'t bother",
        1,
    ),
    (
        "Attribute And-Or Grammar for Joint Parsing of Human Attributes, Part and Pose",
        "didn\'t bother",
        1,
    ),
    (
        "Partial Procedural Geometric Model Fitting for Point Clouds",
        "https://arxiv.org/pdf/1610.04936.pdf",
        2,
    ),
    (
        "Guided Proceduralization: Optimizing Geometry Processing and Grammar Extraction for Architectural Models",
        "https://arxiv.org/pdf/1807.02578.pdf",
        2,
    ),
    (
        "3D Semantic Segmentation of Modular Furniture Using rjMCMC",
        "http://web-info8.informatik.rwth-aachen.de/media/papers/egpaper_final_GX7r76o.pdf",
        2,
    ),
    (
        "Neural Procedural Reconstruction for Residential Buildings",
        "http://openaccess.thecvf.com/content_ECCV_2018/papers/Huayi_Zeng_Neural_Procedural_Reconstruction_ECCV_2018_paper.pdf",
        2,
    ),
    (
        "A Generalized Proceduralization Framework for Urban Models with Applications in Procedural Modeling, Synthesis, and Reconstruction",
        "no pdf",
        2,
    ),

papertool's People

Contributors

windfisch avatar

Watchers

 avatar  avatar  avatar  avatar

papertool's Issues

BibTeX record generation

This is not so simple. Metadata is usually not in a nice format and tons of heuristics need to be thrown against.

Examples:

  • "Proceedings of" vs "Proc." and similar abbreviations
  • "Texas" vs "TX" etc.
  • Conference names often redundantly contain the name or the publisher
  • Conference abbreviations ("3DUI" vs "3DUI'09" vs "3DUI '09")

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.