
johnmartinsson / crawl-ml-proceedings

A tool to (i) crawl machine learning proceedings for titles, authors, abstracts, pdf urls, bibtex et cetera, and (ii) make a similarity ranking to a supplied abstract of your own.

License: Apache License 2.0

Python 2.43% HTML 97.57%

crawl-ml-proceedings's Introduction

Crawl ICLR, ICML, NeurIPS and arXiv

A tool to crawl machine learning proceedings for abstracts, PDFs, keywords, BibTeX entries, et cetera, and populate a database with them. You can then match the abstracts of the papers in the database against one of your own abstracts and rank them by similarity.

How to use

python3 crawl.py --venue=arxiv --query_term='"active learning"' --database=papers.db

This crawls 'arxiv' for papers with "active learning" in the title and inserts them into papers.db. The query functionality is limited, but if you stick to lowercase characters and the format shown above it should be fine. It can also handle

'"words to match" AND "otherword"'

which requires that "words to match" appears verbatim in the title of the paper, and that "otherword" also appears in the title. (I hope.)
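
To make the query format concrete, here is a minimal sketch of how such a query could be interpreted. It is not taken from crawl.py: the function names `parse_query` and `title_matches` are hypothetical, and the sketch assumes that every quoted phrase must occur in the title and that matching is case-insensitive.

```python
import re


def parse_query(query):
    """Extract the quoted phrases from a query like '"a b" AND "c"'."""
    # Assumption: only quoted phrases matter and all of them are AND-ed.
    return [phrase.lower() for phrase in re.findall(r'"([^"]+)"', query)]


def title_matches(title, query):
    """Return True if every quoted phrase occurs verbatim in the title."""
    phrases = parse_query(query)
    lowered = title.lower()
    # An empty query matches nothing rather than everything.
    return bool(phrases) and all(phrase in lowered for phrase in phrases)


if __name__ == "__main__":
    query = '"active learning" AND "audio"'
    print(title_matches("Active Learning for Audio Event Detection", query))
    print(title_matches("Deep Learning for Audio", query))
```

With these assumptions the first title matches and the second does not, because it lacks the phrase "active learning".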

Supported venues:

  • iclr, back to 2018
  • icml, back to 2013
  • neurips, back to 1988
  • arxiv

Run this command for every venue and query term you want to crawl, passing the same --database=papers.db argument each time, and the database will be populated with all the papers you want.
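
Running the crawler for several venues and query terms can be scripted. The sketch below only builds and prints the command lines, using the flags shown above. The helper name `build_crawl_commands` and the example query terms are hypothetical, and the `subprocess.run` call is left commented out so that nothing is crawled by accident.

```python
import itertools
import shlex


def build_crawl_commands(venues, query_terms, database="papers.db"):
    """Build one crawl.py argument list per (venue, query term) pair."""
    commands = []
    for venue, term in itertools.product(venues, query_terms):
        commands.append([
            "python3", "crawl.py",
            f"--venue={venue}",
            f"--query_term={term}",
            f"--database={database}",  # same database for every run
        ])
    return commands


if __name__ == "__main__":
    venues = ["iclr", "neurips", "arxiv"]
    query_terms = ['"active learning"', '"few-shot" AND "audio"']
    for command in build_crawl_commands(venues, query_terms):
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(command))
        # To actually run it: import subprocess; subprocess.run(command, check=True)
```

Since every run writes to the same database file, the commands are best run one after another rather than in parallel.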

ICLR limits the number of queries, so crawling takes time if you have many hits. The crawler is also still a bit buggy and may crash.

Rank abstract similarity to a pre-defined weighted sentence list

python find_papers.py --database=papers.db --random_papers=0 --sentence_list_name=default

This ranks all the abstracts against the predefined weighted sentences in the sentence list "default" in find_papers.py; change the list to your liking to get relevant hits. If you set --random_papers to a value greater than 0, a random selection of the papers is used for the ranking instead. This can be useful when building a new sentence list, since it lets you iterate quickly and get a feeling for the type of matches it produces.
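
The idea of ranking against a weighted sentence list can be illustrated with a small, self-contained sketch. It is not the implementation in find_papers.py: it assumes a sentence list is a list of (weight, sentence) pairs, and it swaps in plain bag-of-words cosine similarity as the similarity measure, whatever the tool actually uses. All function names and the example data are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_abstracts(abstracts, weighted_sentences):
    """Score each abstract by the weighted sum of its sentence similarities.

    abstracts: dict mapping paper title to abstract text.
    weighted_sentences: list of (weight, sentence) pairs (assumed format).
    Returns a list of (score, title) pairs, best match first.
    """
    sentence_vectors = [(w, bag_of_words(s)) for w, s in weighted_sentences]
    scored = []
    for title, abstract in abstracts.items():
        vector = bag_of_words(abstract)
        score = sum(w * cosine(vector, sv) for w, sv in sentence_vectors)
        scored.append((score, title))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    abstracts = {
        "Paper A": "We study active learning for sound event detection.",
        "Paper B": "A new optimizer for large language models.",
    }
    sentences = [(1.0, "active learning for audio"), (0.5, "sound event detection")]
    for score, title in rank_abstracts(abstracts, sentences):
        print(f"{score:.3f}  {title}")
```

A higher weight makes a sentence count more toward the final score, so the list can be tuned by raising the weights of the sentences that best describe the papers you are looking for.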
