wilvk / githubdl

A tool for downloading individual files/directories from Github or Github Enterprise.

License: MIT License


githubdl's Introduction

Github Path Downloader

A tool for downloading individual files/directories from Github or Github Enterprise.

This circumvents the requirement to clone a complete repository.

Requirements:

  • Python 3.4+
  • A Github or Github Enterprise Account

Installation:

pip:

$ pip install githubdl

http:

$ pip install git+https://github.com/wilvk/githubdl.git

ssh:

$ pip install git+ssh://git@github.com/wilvk/githubdl.git

from clone:

$ git clone git@github.com:wilvk/githubdl.git
$ cd githubdl
$ pip install -e .

Usage:

Obtaining a Github token:

You will need a token from either Github Enterprise or Github as this package works with the Github v3 API.

To do this:

  • Log into your Github account
  • Click the Avatar Menu in the top-right corner, and select Settings
  • On the Settings page, from the menu on the left-hand side, select Developer Settings
  • From the Developer Settings page, from the menu, select Personal access tokens
  • Click the Generate new token button
  • Enter a name for the token. The token should only need the read:org permission.

GitHub's own documentation also describes how to create a personal access token.

Usage (from the command line):

With your new Github token, export it as the environment variable GIT_TOKEN.

On Unix/Linux:

$ export GIT_TOKEN=1234567890123456789012345678901234567890123

On Windows:

C:\> set GIT_TOKEN=1234567890123456789012345678901234567890123
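
If you drive githubdl from a script, it can help to fail early when the variable is missing. The following is a minimal sketch; read_git_token is a hypothetical helper written for this example and is not part of githubdl:

```python
import os


def read_git_token(environ=os.environ):
    """Return the token stored in GIT_TOKEN, or raise if it is missing.

    Hypothetical helper for illustration; not part of githubdl.
    """
    token = environ.get("GIT_TOKEN", "").strip()
    if not token:
        raise RuntimeError("GIT_TOKEN is not set; export it before running githubdl")
    return token
```

The environ parameter defaults to the real environment and exists only so the function can be exercised with a plain dictionary.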

Single file:

Then, for example, to download a file called README.md from the repository http://github.com/wilvk/pbec:

$ githubdl -u "http://github.com/wilvk/pbec" -f "README.md"
2018-05-12 07:19:16,934 - root         - INFO     - Requesting file: README.md at url: https://api.github.com/repos/wilvk/pbec/contents/README.md
2018-05-12 07:19:18,165 - root         - INFO     - Writing to file: README.md
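
The log output above shows the repository URL being translated into a Github v3 "contents" API URL. A rough sketch of that mapping is below. The function name contents_api_url and its api_root parameter are assumptions made for this example, not githubdl's actual internals, and unlike the logged URLs this sketch percent-encodes spaces in the path:

```python
from urllib.parse import quote, urlparse


def contents_api_url(repo_url, path, api_root="https://api.github.com"):
    """Build a contents API URL from a repository web URL and a file path.

    Hypothetical helper for illustration; not githubdl's implementation.
    """
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected a URL of the form https://host/owner/repo")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]  # tolerate clone-style URLs
    # quote() keeps "/" intact and percent-encodes spaces and similar characters
    return "{}/repos/{}/{}/contents/{}".format(
        api_root, owner, repo, quote(path.strip("/")))


print(contents_api_url("http://github.com/wilvk/pbec", "README.md"))
```

For Github Enterprise the API root differs from the public one, which is why it is a parameter in this sketch.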

Entire directory:

$ githubdl -u "http://github.com/wilvk/pbec" -d "support"
2018-05-12 07:19:41,667 - root         - INFO     - Retrieving a list of files for directory: support
2018-05-12 07:19:41,668 - root         - INFO     - Requesting file: support at url: https://api.github.com/repos/wilvk/pbec/contents/support
2018-05-12 07:19:42,978 - root         - INFO     - Requesting file: support/Screen Shot 2017-12-10 at 9.27.56 pm.png at url: https://api.github.com/repos/wilvk/pbec/contents/support/Screen Shot 2017-12-10 at 9.27.56 pm.png
2018-05-12 07:19:46,274 - root         - INFO     - Writing to file: support/Screen Shot 2017-12-10 at 9.27.56 pm.png
2018-05-12 07:19:46,286 - root         - INFO     - Retrieving a list of files for directory: support/docker
...

Entire repository:

$ githubdl -u "http://github.com/wilvk/pbec" -d "/" -t "."

...

Note: if -t is not set, output will go to your / directory.

By commit hash:

Single file from a specific commit:

$ githubdl -u "http://github.com/wilvk/pbec" -f "README.md" -r "c29eb5a5d364870a55c0c22f203f8c4e2ce1c638"

...

Entire directory from a specific commit:

$ githubdl -u "http://github.com/wilvk/pbec" -d "support" -r "c29eb5a5d364870a55c0c22f203f8c4e2ce1c638"

...

Entire repository from a specific commit:

$ githubdl -u "http://github.com/wilvk/pbec" -d "/" -r "c29eb5a5d364870a55c0c22f203f8c4e2ce1c638" -t "."

...

Note: if -t is not set, output will go to your / directory.

Entire repository from a specific commit, with submodules (as specified in .gitmodules):

$ githubdl -u "http://github.com/wilvk/pbec" -d "/" -r "c29eb5a5d364870a55c0c22f203f8c4e2ce1c638" -t "." -s

...

List all tags for a repository in JSON:

$ githubdl -u "http://github.com/wilvk/pbec" -a

List all branches for a repository in JSON:

$ githubdl -u "http://github.com/wilvk/pbec" -b

Options:

Current options are:

$ githubdl --help             or     -h
           --file                    -f
           --dir                     -d
           --url (required)          -u
           --target                  -t
           --git_token               -g
           --log_level               -l
           --reference               -r
           --tags                    -a
           --branches                -b
           --submodules              -s

Logging:

Valid log levels are: DEBUG, INFO, WARN, ERROR, CRITICAL
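
These names correspond to constants in Python's standard logging module (WARN is an alias for WARNING). A small sketch of turning a level name into the matching constant follows; level_from_name is a hypothetical helper for this example, not a githubdl function:

```python
import logging

VALID_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "CRITICAL")


def level_from_name(name):
    """Translate a level name such as "debug" into a logging constant.

    Hypothetical helper for illustration; not part of githubdl.
    """
    upper = name.upper()
    if upper not in VALID_LEVELS:
        raise ValueError("invalid log level: {!r}".format(name))
    return getattr(logging, upper)
```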

References:

References apply only to file and directory downloads. A reference can be any valid:

  • repository tags
  • commit SHAs
  • branch names.
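
The Github contents API accepts a tag, commit SHA, or branch name through its ref query parameter. The sketch below shows one way a reference could be appended to an API URL; with_reference is a hypothetical helper for this example and may not match how githubdl builds its requests:

```python
from urllib.parse import urlencode


def with_reference(api_url, reference=""):
    """Append ?ref=<reference> to a contents API URL when a reference is given.

    Hypothetical helper for illustration; not githubdl's implementation.
    """
    if not reference:
        return api_url  # no reference: the API uses the default branch
    separator = "&" if "?" in api_url else "?"
    return api_url + separator + urlencode({"ref": reference})


print(with_reference(
    "https://api.github.com/repos/wilvk/pbec/contents/README.md",
    "c29eb5a5d364870a55c0c22f203f8c4e2ce1c638"))
```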

Usage (as a package):

Loading the package (in a REPL):

$ python
Python 3.4.8 (default, Feb  7 2018, 02:31:08)
[GCC 5.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import githubdl

Downloading a directory:

Passing in a token:

>>> githubdl.dl_dir("https://github.com/wilvk/pbec", "support", github_token="1234567890123456789012345678901234567890123")

Token as an environment variable:

In bash:

$ export GIT_TOKEN=1234567890123456789012345678901234567890123

In Python:

>>> githubdl.dl_dir("https://github.com/wilvk/pbec", "support")

Saving to a different path:

>>> githubdl.dl_dir("https://github.com/wilvk/pbec", "support", "support_new")

Saving to a different path with submodules:

>>> githubdl.dl_dir("https://github.com/wilvk/pbec", "support", "support_new", submodules=True)

Downloading a file:

Passing in a token:

>>> githubdl.dl_file("https://github.com/wilvk/pbec", "README.md", github_token="1234567890123456789012345678901234567890123")

Token as an environment variable:

In bash:

$ export GIT_TOKEN=1234567890123456789012345678901234567890123

In Python:

>>> githubdl.dl_file("https://github.com/wilvk/pbec", "README.md")

Saving with a different filename:

>>> githubdl.dl_file("https://github.com/wilvk/pbec", "README.md", "NEW_README.md")

Extended options:

File download options:

Only repo_url and file_name are required.

  def dl_file(repo_url, file_name, target_filename='', github_token='', log_level='', reference=''):

Directory download options:

Only repo_url and base_path are required.

  def dl_dir(repo_url, base_path, target_path='', github_token='', log_level='', reference='', submodules=''):

Tags download options:

Only repo_url is required.

  def dl_tags(repo_url, github_token='', log_level=''):

Branches download options:

Only repo_url is required.

  def dl_branches(repo_url, github_token='', log_level=''):

A note on logging:

The log level is passed in as a constant from the logging module, e.g.:

>>> import logging
>>> import githubdl
>>> githubdl.dl_file("http://github.com/wilvk/pbec", "README.md", log_level=logging.DEBUG)

Tests:

$ auto/run-tests

Note: You will need a Github token exported as GIT_TOKEN to run the tests.

githubdl's Issues

Exception: 'str' object has no attribute 'get'

When I download file by file I don't have any problem, but when I tried to download a directory I got this error.

The command was
githubdl -u "https://github.com/eugenp/tutorials.git" -d "spring-cloud-data-flow/"

also tried
githubdl -u "https://github.com/eugenp/tutorials.git" -d "spring-cloud-data-flow/" -t "."

Fix Readme.md to be Readme.rtf

Allows readme to show correctly in Pypi.

Will also need to move existing readme.md to docs path and point githubdl.seso.io to /docs/readme.md

Error when downloading

Might be related to #4 (though I am on OSX).
Or maybe it doesn't work when downloading from the "history" of a repo.

Trying to download the directory "raptor" from the directory/url
https://github.com/EsotericSoftware/spine-runtimes/tree/1c717e2d4402e69dd6fe81bf1bb3d8d6d0a58447/example

gives me this error:

(37) ➜  ~ githubdl -u "https://github.com/EsotericSoftware/spine-runtimes/tree/1c717e2d4402e69dd6fe81bf1bb3d8d6d0a58447/example" -d "raptor"
2018-11-20 11:06:01,007 - root         - INFO     - Retrieving a list of files for directory: raptor
2018-11-20 11:06:01,007 - root         - INFO     - repo_name: EsotericSoftware/spine-runtimes/tree/1c717e2d4402e69dd6fe81bf1bb3d8d6d0a58447/example api_path: contents request_string: /raptor
2018-11-20 11:06:01,007 - root         - INFO     - Requesting file: raptor at url: https://api.github.com/repos/EsotericSoftware/spine-runtimes/tree/1c717e2d4402e69dd6fe81bf1bb3d8d6d0a58447/example/contents/raptor
2018-11-20 11:06:02,421 - root         - CRITICAL - Unable to retrieve list of files from response.
 Exception: 'str' object has no attribute 'get'
 Response: {'message': 'Not Found', 'documentation_url': 'https://developer.github.com/v3'}
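
The "'str' object has no attribute 'get'" message is what Python raises when code iterates over a dictionary (which yields its string keys) as if it were a list of dictionaries. The contents API returns a list for a directory but a single object for an error or a non-directory item, so that would explain the failure here. The sketch below distinguishes those cases; directory_entries is a hypothetical helper for this example, not githubdl's code:

```python
def directory_entries(response_json):
    """Return the list of entries for a directory response, or raise clearly.

    Hypothetical helper for illustration; not githubdl's implementation.
    """
    if isinstance(response_json, list):
        return response_json  # a directory listing: one dict per entry
    if isinstance(response_json, dict):
        if "message" in response_json and "type" not in response_json:
            # error payloads such as {'message': 'Not Found', ...}
            raise LookupError(
                "GitHub API error: {}".format(response_json["message"]))
        raise ValueError(
            "expected a directory listing but got a single item of type "
            "{!r}".format(response_json.get("type")))
    raise TypeError("unexpected response type: {}".format(type(response_json)))
```

With the response shown above, this raises LookupError mentioning "Not Found" instead of failing on a string key.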

bug

I executed:

githubdl -u https://github.com/jrowberg/i2cdevlib/tree/master/Arduino/MPU6050 -d i2cdev

and the output was like this:

2018-11-06 15:13:32,057 - root         - INFO     - Retrieving a list of files for directory: i2cdev
2018-11-06 15:13:32,057 - root         - INFO     - repo_name: jrowberg/i2cdevlib/tree/master/Arduino/MPU6050 api_path: contents request_string: /i2cdev
2018-11-06 15:13:32,057 - root         - INFO     - Requesting file: i2cdev at url: https://api.github.com/repos/jrowberg/i2cdevlib/tree/master/Arduino/MPU6050/contents/i2cdev
2018-11-06 15:13:32,060 - requests.packages.urllib3.connectionpool - INFO     - Starting new HTTPS connection (1): api.github.com
2018-11-06 15:13:35,848 - root         - CRITICAL - Unable to retrieve list of files from response.
 Exception: 'str' object has no attribute 'get'
 Response: {'message': 'Not Found', 'documentation_url': 'https://developer.github.com/v3'}

feature_request(enhancement): submodules support

1. Summary

It would be nice if GitHub Path Downloader supported downloading GitHub submodules.

2. Argumentation

I use Pelican, a static site generator. It doesn't have a proper manager for downloading plugins. Currently, for my site to build automatically:

  1. Users must clone the entire, large https://github.com/getpelican/pelican-plugins repository.
  2. Or I need to add third-party plugins, which are not mine, to my repository.

In my opinion, a better solution is to download specific plugins. But I don't see an easy way to download submodules via GitHub Path Downloader.

3. Data

  • main.py
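"""githubdl folders."""
import githubdl
import yaml

# safe_load avoids executing arbitrary YAML tags; the file is closed automatically.
with open('pelicanvariables.yaml', encoding='utf-8') as config_file:
    YAMLCONFIG = yaml.safe_load(config_file)

# Download each listed plugin directory into pelican-plugins/<plugin>.
for plugin in YAMLCONFIG["variables"]:
    githubdl.dl_dir(
        "https://github.com/getpelican/pelican-plugins",
        plugin,
        "pelican-plugins/" + plugin)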

4. Expected behavior

If pelicanvariables.yaml:

variables:
- random_article
- slim

Output:

Python 3.6.1 (default, Dec 2015, 13:05:11)
[GCC 4.8.2] on linux
2019-01-18 06:15:11,572 - root         - INFO     - Retrieving a list of files for directory: random_article
2019-01-18 06:15:11,573 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /random_article
2019-01-18 06:15:11,573 - root         - INFO     - Requesting file: random_article at url:https://api.github.com/repos/getpelican/pelican-plugins/contents/random_article
2019-01-18 06:15:11,791 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /random_article/Readme.md
2019-01-18 06:15:11,791 - root         - INFO     - Requesting file: random_article/Readme.md at url: https://api.github.com/repos/getpelican/pelican-plugins/contents/random_article/Readme.md
2019-01-18 06:15:12,047 - root         - INFO     - Writing to file: pelican-plugins/random_article/Readme.md
2019-01-18 06:15:12,047 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /random_article/__init__.py
2019-01-18 06:15:12,047 - root         - INFO     - Requesting file: random_article/__init__.py at url: https://api.github.com/repos/getpelican/pelican-plugins/contents/random_article/__init__.py
2019-01-18 06:15:12,257 - root         - INFO     - Writing to file: pelican-plugins/random_article/__init__.py
2019-01-18 06:15:12,261 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /random_article/random_article.py
2019-01-18 06:15:12,263 - root         - INFO     - Requesting file: random_article/random_article.py at url: https://api.github.com/repos/getpelican/pelican-plugins/contents/random_article/random_article.py
2019-01-18 06:15:12,504 - root         - INFO     - Writing to file: pelican-plugins/random_article/random_article.py
2019-01-18 06:15:12,504 - root         - INFO     - Retrieving a list of files for directory: slim
2019-01-18 06:15:12,504 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /slim
2019-01-18 06:15:12,504 - root         - INFO     - Requesting file: slim at url: https://api.github.com/repos/getpelican/pelican-plugins/contents/slim
2019-01-18 06:15:12,707 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /slim/README.md
2019-01-18 06:15:12,707 - root         - INFO     - Requesting file: slim/README.md at url:https://api.github.com/repos/getpelican/pelican-plugins/contents/slim/README.md
2019-01-18 06:15:12,923 - root         - INFO     - Writing to file: pelican-plugins/slim/README.md
2019-01-18 06:15:12,923 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /slim/__init__.py
2019-01-18 06:15:12,923 - root         - INFO     - Requesting file: slim/__init__.py at url: https://api.github.com/repos/getpelican/pelican-plugins/contents/slim/__init__.py
2019-01-18 06:15:13,123 - root         - INFO     - Writing to file: pelican-plugins/slim/__init__.py
2019-01-18 06:15:13,124 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /slim/slim.py
2019-01-18 06:15:13,124 - root         - INFO     - Requesting file: slim/slim.py at url: https://api.github.com/repos/getpelican/pelican-plugins/contents/slim/slim.py
2019-01-18 06:15:13,357 - root         - INFO     - Writing to file: pelican-plugins/slim/slim.py

The plugins (folders) random_article and slim were downloaded successfully to the pelican-plugins folder.

5. Actual behavior

With a different pelicanvariables.yaml:

variables:
- replacer

Output:

Python 3.6.1 (default, Dec 2015, 13:05:11)
[GCC 4.8.2] on linux
2019-01-18 06:12:27,374 - root         - INFO     - Retrieving a list of files for directory: replacer
2019-01-18 06:12:27,375 - root         - INFO     - repo_name: getpelican/pelican-plugins api_path: contents request_string: /replacer
2019-01-18 06:12:27,375 - root         - INFO     - Requesting file: replacer at url: https://api.github.com/repos/getpelican/pelican-plugins/contents/replacer
2019-01-18 06:12:27,568 - root         - CRITICAL - Unable to retrieve list of files from response.
 Exception: 'str' object has no attribute 'get'
 Response: {'name': 'replacer', 'path': 'replacer', 'sha': '7881a1d838b599dd69c8b83a90f00fee96a3ae44', 'size': 0, 'url': 'https://api.github.com/repos/getpelican/pelican-plugins/contents/replacer?ref=master', 'html_url': 'https://github.com/narusemotoki/replacer/tree/7881a1d838b599dd69c8b83a90f00fee96a3ae44', 'git_url': 'https://api.github.com/repos/narusemotoki/replacer/git/trees/7881a1d838b599dd69c8b83a90f00fee96a3ae44', 'download_url': None, 'type': 'submodule', 'submodule_git_url': 'https://github.com/narusemotoki/replacer', '_links': {'self': 'https://api.github.com/repos/getpelican/pelican-plugins/contents/replacer?ref=master', 'git': 'https://api.github.com/repos/narusemotoki/replacer/git/trees/7881a1d838b599dd69c8b83a90f00fee96a3ae44', 'html': 'https://github.com/narusemotoki/replacer/tree/7881a1d838b599dd69c8b83a90f00fee96a3ae44'}}

It would be nice if, when 'type' is 'submodule', GitHub Path Downloader downloaded the repository given by the submodule_git_url key.
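
As an illustration of the check being requested, the sketch below inspects a contents API item and returns the submodule's repository URL when the item is a submodule. The field names come from the response above; submodule_url is a hypothetical helper for this example and does not describe how githubdl handles submodules:

```python
def submodule_url(item):
    """Return submodule_git_url if the item is a submodule, else None.

    Hypothetical helper for illustration; not githubdl's implementation.
    """
    if isinstance(item, dict) and item.get("type") == "submodule":
        return item.get("submodule_git_url")
    return None


response = {
    "name": "replacer",
    "path": "replacer",
    "type": "submodule",
    "submodule_git_url": "https://github.com/narusemotoki/replacer",
}
print(submodule_url(response))
```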

Thanks.

feature request: specify branch

Hello,

is there any way to specify the branch from where I want to download an entire folder?
I only found the "entire branch download" option.

Add ability to interpret path and reference from full url

As per issue #6, it can be confusing to look at a Github web URL and translate it into the -u, -d and -r format.

for example:

from:

https://github.com/EsotericSoftware/spine-runtimes/tree/1c717e2d4402e69dd6fe81bf1bb3d8d6d0a58447/examples

to:

githubdl -u "https://github.com/EsotericSoftware/spine-runtimes" -d "examples/raptor" -r 1c717e2d4402e69dd6fe81bf1bb3d8d6d0a58447

This should be detected and handled automatically from the URL provided.
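
A rough sketch of the translation described in this issue is below. split_web_url is a hypothetical name for this example, not an existing githubdl function. It assumes the reference is a single path segment directly after /tree/ or /blob/, so branch names that contain slashes would be split incorrectly:

```python
from urllib.parse import urlparse


def split_web_url(web_url):
    """Split a Github web URL into (repo_url, reference, path).

    Hypothetical helper for illustration; assumes the reference is the
    single path segment that follows /tree/ or /blob/.
    """
    parsed = urlparse(web_url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected a URL of the form https://host/owner/repo")
    repo_url = "{}://{}/{}/{}".format(
        parsed.scheme, parsed.netloc, parts[0], parts[1])
    if len(parts) >= 4 and parts[2] in ("tree", "blob"):
        return repo_url, parts[3], "/".join(parts[4:])
    return repo_url, "", ""  # plain repository URL: no reference or path
```

The three returned values line up with the -u, -r and -d options respectively.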
