korenyoni / opus-api

OPUS (opus.nlpl.eu) Python3 API

Home Page: https://k0ren.com

License: MIT License

Makefile 14.62% Python 85.38%
python machine-learning opus api language-model parallel-corpus parallel-corpora corporate corpora corpus

opus-api's Introduction

OPUS (opus.nlpl.eu) Python API

Requirements

Download PhantomJS and make sure it's in your PATH, e.g.:

$ wget -qO- https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 | tar xvj -C ~/.local/bin --strip 2 phantomjs-2.1.1-linux-x86_64/bin
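
To confirm that the binary can actually be found on PATH before running the tool, a check along these lines may help. This is a minimal sketch using only the Python standard library; the helper name find_phantomjs is hypothetical and is not part of opus_api.

```python
import shutil
from typing import Optional


def find_phantomjs(executable: str = "phantomjs") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; opus_api does not provide it.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_phantomjs()
    if path is None:
        print("phantomjs was not found on PATH")
    else:
        print(f"phantomjs found at {path}")
```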

Installation

Stable release

To install Opus API, run this command in your terminal:

$ pip install opus_api

This is the preferred method to install Opus API, as it will always install the most recent stable release.

If you don't have pip installed, this Python installation guide can guide you through the process.

From sources

The sources for Opus API can be downloaded from the GitHub repo.

You can either clone the public repository:

$ git clone https://github.com/yonkornilov/opus_api

Or download the tarball:

$ curl -OL https://github.com/yonkornilov/opus_api/tarball/master

Once you have a copy of the source, you can install it with:

$ make install

Usage

Find your languages:

$ opus_api langs

[
...
  {
    "description": "en (English)", 
    "id": 69, 
    "name": "en"
  },
  ...
  {
    "description": "ru (Russian)", 
    "id": 198, 
    "name": "ru"
  }...
...
]
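
The listing above can be consumed from a script. The sketch below assumes the real command prints valid JSON in the shape shown (the "..." lines are elisions in this README, not part of the output). It looks up a language id by its name. SAMPLE_LANGS and language_id are hypothetical names introduced for illustration.

```python
import json
from typing import Optional

# Abbreviated sample in the shape shown above; the real output
# is assumed to list many more languages.
SAMPLE_LANGS = """
[
  {"description": "en (English)", "id": 69, "name": "en"},
  {"description": "ru (Russian)", "id": 198, "name": "ru"}
]
"""


def language_id(langs_json: str, name: str) -> Optional[int]:
    """Return the id of the language called `name`, or None if absent."""
    for entry in json.loads(langs_json):
        if entry["name"] == name:
            return entry["id"]
    return None


if __name__ == "__main__":
    print(language_id(SAMPLE_LANGS, "ru"))
```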

Find corpora:

$ opus_api get en ru --maximum 300 --minimum 3

{
  "corpora": [
    {
      "id": 1, 
      "name": "OpenSubtitles2016", 
      "src_tokens": "157.5M", 
      "trg_tokens": "133.6M", 
      "url": "http://opus.nlpl.eu/download.php?f=OpenSubtitles2016%2Fen-ru.txt.zip"
    },
  ...
    {
      "id": 13, 
      "name": "KDE4", 
      "src_tokens": "1.8M", 
      "trg_tokens": "1.4M", 
      "url": "http://opus.nlpl.eu/download.php?f=KDE4%2Fen-ru.txt.zip"
    }
  ]
}
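
A result like the one above can be filtered in a script before downloading. The following sketch assumes the output is valid JSON in the shape shown and that a trailing "M" in the token counts means millions. The "K" and "G" suffixes are further assumptions that do not appear in the sample, and all names are hypothetical.

```python
import json
from decimal import Decimal
from typing import List

# Sample copied from the output above, with the elided entries left out.
SAMPLE_RESULT = """
{
  "corpora": [
    {"id": 1, "name": "OpenSubtitles2016",
     "src_tokens": "157.5M", "trg_tokens": "133.6M",
     "url": "http://opus.nlpl.eu/download.php?f=OpenSubtitles2016%2Fen-ru.txt.zip"},
    {"id": 13, "name": "KDE4",
     "src_tokens": "1.8M", "trg_tokens": "1.4M",
     "url": "http://opus.nlpl.eu/download.php?f=KDE4%2Fen-ru.txt.zip"}
  ]
}
"""

# Assumed meaning of the suffixes; only "M" is seen in the sample.
SUFFIXES = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def parse_tokens(text: str) -> int:
    """Convert a count such as '157.5M' into an integer number of tokens."""
    text = text.strip()
    suffix = text[-1].upper() if text else ""
    if suffix in SUFFIXES:
        # Decimal avoids binary floating-point rounding on values like 1.8.
        return int(Decimal(text[:-1]) * SUFFIXES[suffix])
    return int(Decimal(text))


def corpus_urls(result_json: str, min_src_tokens: int = 0) -> List[str]:
    """Return the URLs of corpora with at least `min_src_tokens` source tokens."""
    corpora = json.loads(result_json)["corpora"]
    return [
        corpus["url"]
        for corpus in corpora
        if parse_tokens(corpus["src_tokens"]) >= min_src_tokens
    ]


if __name__ == "__main__":
    for url in corpus_urls(SAMPLE_RESULT, min_src_tokens=100_000_000):
        print(url)
```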

TODO

  1. Get: parallel corpora for formats other than MOSES and TMX
  2. New feature: query available languages for corpora set

Credits

This package's CLI is powered by click.

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

opus-api's People

Contributors

frankier, korenyoni, pyup-bot

opus-api's Issues

Python 3?

Hi. I'm kind of interested in using this, but I'm using Python 3 for everything nowadays. It looks like this is intentionally Python 2 only. Would you be interested in moving to Python 3?

URLs are a bit wrong

  • Opus API version: 0.6.2
  • Python version: 3.6.12
  • Operating System: OS X 10.15.6

Description

I want to pipe the output of opus_api to wget after extracting the URL from the resulting JSON object, but the URLs are wrong because they are prefixed with http://opus.nlpl.eu/. Stripping this prefix yields a valid URL.

What I Did

$ opus_api get en fa | jq '.corpora[].url'
/Users/erippeth/miniconda3/envs/opus-dl/lib/python3.6/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2018/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-Tanzil/v1/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-TEP/v1/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-QED/v2.0a/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-Wikipedia/v1.0/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-GNOME/v1/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-TED2013/v1.1/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-infopankki/v1/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-KDE4/v2/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-Ubuntu/v14.10/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-GlobalVoices/v2017q3/moses/en-fa.txt.zip"
"http://opus.nlpl.eu/https://object.pouta.csc.fi/OPUS-ELRC_2922/v1/moses/en-fa.txt.zip"

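Until the prefix is fixed in the package itself, the workaround described in this issue could be applied on the caller's side. The sketch below is one possible approach, not the maintainers' fix. It strips the prefix only when what remains is itself an absolute URL, so that URLs of the older download.php form are left alone. The names BAD_PREFIX and fix_url are hypothetical.

```python
BAD_PREFIX = "http://opus.nlpl.eu/"


def fix_url(url: str) -> str:
    """Drop the erroneous prefix when it is followed by another absolute URL."""
    if url.startswith(BAD_PREFIX):
        rest = url[len(BAD_PREFIX):]
        # Strip only if the remainder is a complete URL on its own.
        if rest.startswith(("http://", "https://")):
            return rest
    return url


if __name__ == "__main__":
    broken = (
        "http://opus.nlpl.eu/"
        "https://object.pouta.csc.fi/OPUS-KDE4/v2/moses/en-fa.txt.zip"
    )
    print(fix_url(broken))
```
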
Unpin dependencies

Currently, setup.py uses requirements.txt, which pins a specific version of every requirement. This causes problems when the package is combined with other software, since that software may require a version range that is incompatible with the exact version pinned here.

Or consider another scenario: if I need a newer version of, e.g., requests, I'm stuck because this package is pinned to an older version.

This article goes into more depth about why setup.py should be treated differently from requirements.txt: https://caremad.io/posts/2013/07/setup-vs-requirement/

My suggestion is that the dependencies should be moved into setup.py and the version should be changed to minimum versions (using >=) rather than exact version requirements.
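
As a rough illustration of this suggestion, the sketch below turns exact pins into minimum-version specifiers. The resulting list could then be passed as install_requires in setup.py. The helper relax_pins is hypothetical, the version numbers in the example are made up, and the parsing is deliberately simple (it ignores extras, environment markers and the rarely used === operator).

```python
from typing import Iterable, List


def relax_pins(requirement_lines: Iterable[str]) -> List[str]:
    """Rewrite 'name==X' requirement lines as 'name>=X'.

    Blank lines and comments are dropped; lines without '==' pass through.
    """
    relaxed = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # remove comments
        if not line:
            continue
        relaxed.append(line.replace("==", ">=", 1))
    return relaxed


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    pinned = ["requests==2.0.0", "# a comment", "", "click>=6.0"]
    print(relax_pins(pinned))
```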
