Giter Site home page Giter Site logo

arxiv_tool's Introduction

ArXiv Article Extractor

This Python script is designed to extract and compile a list of articles from the arXiv repository within a specified date range. It specifically targets articles related to artificial intelligence in the Computer Science (cs) category. The script generates a Markdown file containing a table of articles with details such as the arXiv ID, submission date, title, authors, and a full abstract.

Features

  • Extracts articles from arXiv using BeautifulSoup4 for HTML parsing.
  • Filters articles based on a date range provided by the user.
  • Outputs a Markdown file with a table of articles.
  • Each article entry includes metadata such as the title, authors, arXiv ID, URL, tags, and abstracts.
  • Offers command-line interface options to specify the date range and output directory.

Requirements

Before running the script, ensure that you have the following packages installed:

  • requests
  • bs4 (BeautifulSoup4)
  • pydantic
  • rich
  • click

You can install the required packages using pip:

pip install requests beautifulsoup4 pydantic rich click

Usage

To use the script, execute it from the command line with the following options:

python arxiv_extractor.py extract --query "Artifical Intelligence" --from-date YYYY-MM-DD --to-date YYYY-MM-DD --target-dir /path/to/directory
  • --query: The query
  • --from-date: The start date for the articles to be extracted. (format: YYYY-MM-DD)
  • --to-date: The end date for the articles. (format: YYYY-MM-DD)
  • --target-dir: The directory where the Markdown file will be saved.

Example

python arxiv_extractor.py extract --from-date 2023-03-01 --to-date 2023-03-31 --target-dir ./articles

This will extract articles from March 1, 2023, to March 31, 2023, and save the Markdown file in the ./articles directory.

Installation with pipx

pipx install git+https://github.com/raphaelmansuy/arxiv_tool.git --force --include-dep

Output

The script creates a Markdown file named with the date range of the articles, for example, "2023-03-01_to_2023-03-31-arxiv.md". The file contains a table with the following columns:

  • Arxiv Number
  • Submitted
  • Title
  • Authors
  • Abstract

Each row corresponds to an article, and the Arxiv Number is linked to the article's page on arXiv.org.

Contributing

Contributions are welcome! Feel free to submit pull requests or create issues for bugs and feature requests.

License

This script is provided as-is, with no warranty. You are free to use, modify, and distribute it under the terms of your choosing. Please check with the arXiv terms of service regarding the use of their data.

Keep in mind that this README.md is a template and should be modified to fit your project's specifics and additional details as needed. If you have any particular sections or details you'd like to add or modify, please let me know, and I can assist you further.

arxiv_tool's People

Contributors

raphaelmansuy avatar

Stargazers

 avatar Bsh avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.