ppwwyyxx / sopaper

Automatically Search and Download Papers

Home Page: https://pypi.python.org/pypi/sopaper/

License: Other

Python 54.11% Shell 1.98% Makefile 0.38% TeX 7.86% CSS 3.68% HTML 13.56% JavaScript 18.42%

sopaper's Introduction

SoPaper, So Easy

This is a project designed for researchers to conveniently access papers they need.

The command line tool sopaper can automatically search for and download a paper from the Internet, given its title. The downloaded paper gets a readable file name (I wrote this tool because I was tired of seeing random strings as file names). It mainly supports searching for papers in computer science.

How to Use

Install command line dependencies:

  • pdftk command line executable.
    • Using pdftk on OS X 10.11 might lead to hangs. See here for more info.
  • poppler-utils (optional)

Install python package: pip install --user sopaper

Usage:

$ sopaper --help
$ sopaper "Distinctive image features from scale-invariant keypoints"
$ sopaper "https://arxiv.org/abs/1606.06160"

NOTE: If you are not on a school network, you may need to configure a proxy through the environment variables http_proxy and https_proxy to be able to download from certain sites (such as 'dl.acm.org').
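As a rough illustration of the note above, the following sketch sets both variables from Python and prints what the standard library would read back from them. The proxy address is a made-up placeholder, and whether sopaper's own HTTP layer honours these variables in exactly this way is an assumption here.

```python
import os
import urllib.request

# Hypothetical proxy address; replace it with the proxy you actually use.
PROXY = "http://proxy.example.edu:3128"

# The same variables named in the note above.
os.environ["http_proxy"] = PROXY
os.environ["https_proxy"] = PROXY

# getproxies_environment() reports the proxies derived from *_proxy variables.
proxies = urllib.request.getproxies_environment()
print(proxies.get("http"), proxies.get("https"))
```

In a shell session, exporting the two variables before running sopaper has the same effect for processes started from that shell.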

Features

The searcher module performs a fuzzy search and analyses the results from

  • Google Scholar
  • Google

and the fetcher module will further analyse the results and download papers from the following possible sources:

Searcher and Fetcher are extensible to support more websites.

The command line tool will directly download the paper with a clean filename. All downloaded papers will be compressed using ps2pdf from poppler-utils, if available.
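One way a "clean filename" could be derived from a title is sketched below. `clean_filename`, its `max_len` parameter, and the exact character set it strips are illustrative assumptions, not the project's actual implementation.

```python
import re


def clean_filename(title, ext=".pdf", max_len=150):
    """Turn a paper title into a file name (hypothetical helper)."""
    # Replace characters that are invalid on common filesystems with spaces.
    name = re.sub(r'[\\/:*?"<>|\x00-\x1f]', " ", title)
    # Collapse whitespace runs and drop trailing dots, then truncate.
    name = re.sub(r"\s+", " ", name).strip().rstrip(".")
    return name[:max_len].rstrip() + ext


print(clean_filename("Distinctive image features from scale-invariant keypoints"))
```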

TODO

  • Fetcher dedup: when both the arXiv abs and pdf links appear in the search results, the page is downloaded twice (maybe add a cache for requests)
  • Don't trust arxiv link from google scholar
  • Is title correctly updated for dlacm?
  • Extract title from bibtex -- more accurate?
  • Fetcher for other sites
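The first TODO item could be approached by mapping arXiv abs and pdf links to a shared identifier before fetching. This is only a sketch: `arxiv_id` and `dedup_urls` are hypothetical names, and the URL patterns handled are the common `/abs/<id>` and `/pdf/<id>[vN][.pdf]` forms.

```python
import re
from urllib.parse import urlparse


def arxiv_id(url):
    """Return the arXiv identifier without version suffix, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "arxiv.org" and not host.endswith(".arxiv.org"):
        return None
    match = re.match(r"^/(?:abs|pdf)/(.+?)(?:\.pdf)?$", parsed.path)
    if not match:
        return None
    # Drop a trailing version marker such as "v2".
    return re.sub(r"v\d+$", "", match.group(1))


def dedup_urls(urls):
    """Keep the first URL for each paper, preserving the original order."""
    seen = set()
    unique = []
    for url in urls:
        paper = arxiv_id(url)
        key = ("arxiv", paper) if paper else ("url", url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


results = [
    "https://arxiv.org/abs/1606.06160",
    "https://arxiv.org/pdf/1606.06160v2.pdf",
    "http://dl.acm.org/citation.cfm?id=18500",
]
print(dedup_urls(results))
```

Treating different versions of a paper as duplicates is a design choice of this sketch; a fetcher that cares about specific versions would keep the suffix in the key.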

sopaper's People

Contributors

leetz, ppwwyyxx, wangycthu, zxytim

sopaper's Issues

Oops!

$ paper-downloader -t "Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and"
Searching with Google Scholar
Searching with Google
Found item on google: Efficient algorithms for finding minimum spanning trees in ... - Springer at link.springer.com
Found item on google: Efficient algorithms for finding minimum spanning trees in undirected ... at link.springer.com
Found item on google: Efficient algorithms for finding minimum spanning trees in undirected ... at dl.acm.org
Found item on google: Efficient Algorithms for Finding Minimum Spanning Tree in ... at www.researchgate.net
Found item on google: Efficient algorithms for finding minimum spanning trees in undirected ... at citeseer.uark.edu:8080
Found item on google: Efficient Algorithms for Finding Minimum Spanning Trees in ... at citeseer.uark.edu:8080
Found item on google: Efficient algorithms for finding minimum spanning trees in undirected ... at www.bibsonomy.org
Directly Download to ./Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and.pdf...
URL is http://link.springer.com/content/pdf/10.1007%2FBF02579168.pdf
--2014-03-23 11:51:05-- http://link.springer.com/content/pdf/10.1007%2FBF02579168.pdf
Resolving link.springer.com (link.springer.com)... 211.155.87.20, 211.155.87.26
Connecting to link.springer.com (link.springer.com)|211.155.87.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and.pdf’

[ <=>                                   ] 52,589      --.-K/s   in 0.01s   

2014-03-23 11:51:07 (4.65 MB/s) - ‘./Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and.pdf’ saved [52589]

./Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and.pdf: HTML document, UTF-8 Unicode text, with very long lines

Format is not PDF!
Analyzing http://dl.acm.org/citation.cfm?id=18500
Download error: list index out of range
Traceback (most recent call last):
  File "/home/tim/software/Paper-Downloader/resources/resource.py", line 38, in download
    self.do_download(filename)
  File "/home/tim/software/Paper-Downloader/resources/dlacm.py", line 25, in do_download
    url = pdf[0].get('href')
IndexError: list index out of range
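The IndexError above is raised because `pdf[0]` is evaluated on an empty list. A minimal sketch of a defensive lookup follows; `first_href` is a hypothetical helper, not part of the project, and it assumes only that each element offers a `.get` method (as dicts and BeautifulSoup tags do).

```python
def first_href(links):
    """Return the first non-empty href among the links, or None."""
    for link in links:
        href = link.get("href")
        if href:
            return href
    return None


# An empty result list no longer raises; the caller can report "no PDF link".
print(first_href([]))
print(first_href([{"href": "paper.pdf"}]))
```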

Test for file type on windows

The code

    s = Popen('file "{0}"'.format(f.name),
              stdout=PIPE, shell=True).stdout.read()

is platform-specific. I suggest adding a switch to the config and using:

    if ukconfig.USE_PYPDF2:
        # Pure-Python check that does not depend on the `file` command.
        try:
            with open(f.name, "rb") as fo:
                PyPDF2.PdfFileReader(fo)
            s = "PDF document"
        except PyPDF2.utils.PdfReadError:
            s = "invalid PDF file"
    else:
        # Fall back to the `file` utility.
        s = Popen('file "{0}"'.format(f.name),
                  stdout=PIPE, shell=True).stdout.read()

Syntax Error

Hi,
I get the following error whenever I try to run sopaper:

Traceback (most recent call last):
  File "/home/clw/.local/bin/sopaper", line 11, in <module>
    load_entry_point('sopaper==0.8', 'console_scripts', 'sopaper')()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2843, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2434, in load
    return self.resolve()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2440, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/clw/.local/lib/python3.7/site-packages/sopaper/__main__.py", line 23, in <module>
    from sopaper import searcher
  File "/home/clw/.local/lib/python3.7/site-packages/sopaper/searcher/__init__.py", line 8, in <module>
    from ..lib.ukutil import import_all_modules
  File "/home/clw/.local/lib/python3.7/site-packages/sopaper/lib/ukutil.py", line 76
    print check_filetype(open("./ukconfig.py").read(), 'PDF')
                       ^
SyntaxError: invalid syntax

I think I have all packages installed (see below), and have had this error now on two independent systems (Ubuntu 16.04 and ArchLinux). Any help would be appreciated.

Some more info on packages:

Package        Version 
-------------- --------
beautifulsoup4 4.7.1   
certifi        2019.3.9
chardet        3.0.4   
idna           2.8     
requests       2.21.0  
sopaper        0.8     
soupsieve      1.9.1   
termcolor      1.1.0   
urllib3        1.24.3  

extra/poppler 0.76.0-1 [installed]
    PDF rendering library based on xpdf 3.0
extra/poppler-data 0.4.9-1 [installed]
    Encoding data for the poppler PDF rendering library
extra/poppler-glib 0.76.0-1 [installed]
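The caret in the traceback sits under a Python 2 style print statement, which Python 3 rejects. The sketch below reproduces this with the line from the traceback. That sopaper 0.8 targets Python 2 is an inference from this error, not something confirmed in the thread.

```python
def compiles(source):
    """Return True if the running interpreter accepts the source text."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# The offending line from the traceback (Python 2 print statement).
py2_line = "print check_filetype(open('./ukconfig.py').read(), 'PDF')"
# The same call written as a function call, which Python 3 accepts.
py3_line = "print(check_filetype(open('./ukconfig.py').read(), 'PDF'))"

print(compiles(py2_line), compiles(py3_line))
```

Under Python 3 the first line fails to compile and the second succeeds, so running the tool with a Python 2 interpreter is one possible workaround.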

Feature Request

  • title auto-completion: e.g.
    • given title: 'Object count area graphs for the evaluation'
    • should complete to: 'Object count area graphs for the evaluation of object detection and segmentation algorithms'
  • meta-info extraction: such as conference, year, etc.
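The title auto-completion request could be prototyped as a case-insensitive prefix match against candidate titles. In this sketch `complete_title` is a hypothetical helper, and the candidate list is assumed to come from search results.

```python
def _normalize(text):
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def complete_title(partial, candidates):
    """Return all candidate titles that start with the partial title."""
    prefix = _normalize(partial)
    return [c for c in candidates if _normalize(c).startswith(prefix)]


candidates = [
    "Object count area graphs for the evaluation of object detection and segmentation algorithms",
    "Distinctive image features from scale-invariant keypoints",
]
print(complete_title("Object count area graphs for the evaluation", candidates))
```

Returning every match rather than a single title leaves it to the caller to decide what to do when a prefix is ambiguous.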

paper-downloader.py is not stand-alone

$ ./paper-downloader.py
Traceback (most recent call last):
  File "./paper-downloader.py", line 21, in <module>
    import fetcher
  File "/home/zxytim/software/SoPaper/common/fetcher/__init__.py", line 16, in <module>
    from dbsearch import search_exact
  File "/home/zxytim/software/SoPaper/common/dbsearch.py", line 89, in <module>
    init_title_for_similar_search()
  File "/home/zxytim/software/SoPaper/common/dbsearch.py", line 84, in init_title_for_similar_search
    db = get_mongo('paper')
  File "/home/zxytim/software/SoPaper/common/ukdbconn.py", line 25, in get_mongo
    _db = MongoClient(*ukconfig.mongo_conn)[ukconfig.mongo_db]
  File "/home/zxytim/.local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 352, in __init__
    raise ConnectionFailure(str(e))
pymongo.errors.ConnectionFailure: could not connect to 127.0.0.1:27018: [Errno 111] Connection refused

SyntaxError: invalid syntax

sopaper "Distinctive image features from scale-invariant keypoints"
Traceback (most recent call last):
  File "/Users/jpope/miniconda3/envs/tf10/bin/sopaper", line 7, in <module>
    from sopaper.__main__ import main
  File "/Users/jpope/miniconda3/envs/tf10/lib/python3.7/site-packages/sopaper/__main__.py", line 23, in <module>
    from sopaper import searcher
  File "/Users/jpope/miniconda3/envs/tf10/lib/python3.7/site-packages/sopaper/searcher/__init__.py", line 8, in <module>
    from ..lib.ukutil import import_all_modules
  File "/Users/jpope/miniconda3/envs/tf10/lib/python3.7/site-packages/sopaper/lib/ukutil.py", line 76
    print check_filetype(open("./ukconfig.py").read(), 'PDF')
                       ^
SyntaxError: invalid syntax

I'm using python2

brew install poppler
Error: poppler 0.56.0 is already installed
To upgrade to 0.71.0, run brew upgrade poppler

seems to fix things.

failed to download paper

hello,
as the title says, sopaper fails to download papers
image
unfortunately, the error message does not give further specifics.

it apparently fails to find it:

(sopaper) nfg@NI-CA-107962:~$ sopaper -u "Distinctive image features from scale-invariant keypoints"
INFO Searching 'Distinctive Image Features from Scale-invariant Keypoints' with searcher: 'Google Scholar' ...
INFO Searching 'Distinctive Image Features from Scale-invariant Keypoints' with searcher: 'Google' ...
Results for Distinctive Image Features from Scale-invariant Keypoints:

my env specs are:

(sopaper) nfg@NI-CA-107962:~$ conda list
# packages in environment at /home/nfg/anaconda3/envs/sopaper:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
ca-certificates           2023.08.22           h06a4308_0
certifi                   2020.6.20          pyhd3eb1b0_3
libffi                    3.4.4                h6a678d5_0
libgcc-ng                 13.2.0               h807b86a_2    conda-forge
libgomp                   13.2.0               h807b86a_2    conda-forge
libsqlite                 3.43.0               h2797004_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_2    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
python                    2.7.18               h42bf7aa_3
readline                  8.2                  h8228510_1    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
sqlite                    3.43.0               h2c6b66d_0    conda-forge
tk                        8.6.13               h2797004_0    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge

any hope would be appreciated!

best,
stas
