notglop / docker-drag

Download image from the Docker Hub HTTPS API

License: GNU General Public License v3.0

Python 100.00%

docker-drag's Issues

Rerunning the script fails after the first run was interrupted by a network problem

The script can't resume downloading the target image when a request fails, and a rerun stops at os.mkdir because the directory from the first run still exists.

Traceback (most recent call last):
  File "docker_pull.py", line 93, in <module>
    os.mkdir(imgdir)
FileExistsError: [Errno 17] File exists: 'tmp_imagename_latest'
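
One possible way to make a rerun tolerate the leftover directory is sketched below. The helper name `ensure_imgdir` is hypothetical and not part of docker_pull.py, and the sketch only avoids the FileExistsError; it does not resume partially downloaded layers.

```python
import os


def ensure_imgdir(imgdir):
    """Create the image working directory unless it already exists.

    Hypothetical helper: the traceback shows docker_pull.py calling
    os.mkdir(imgdir) directly, which raises FileExistsError on a rerun.
    """
    os.makedirs(imgdir, exist_ok=True)
    return imgdir
```

Calling `ensure_imgdir('tmp_imagename_latest')` twice in a row should succeed both times.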

How to change images source?

Hi, I am wondering which part of the code identifies the image source. The default image source has a slow download speed.
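
As a rough illustration of how an image reference can be split into a registry host, a repository and a tag, here is a minimal sketch. The function name `parse_image_ref` and its `default_registry` parameter are assumptions for illustration, not the script's actual code; the rule that the first component is a registry only if it contains '.' or ':' (or is 'localhost') follows the usual Docker convention, and digests (`@sha256:...`) are not handled.

```python
def parse_image_ref(ref, default_registry="registry-1.docker.io"):
    """Split an image reference into (registry, repository, tag).

    Illustrative sketch only; digest references are not supported.
    """
    tag = "latest"
    parts = ref.split("/")
    registry = default_registry
    # The first component is treated as a registry host only if it looks like one.
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0] or parts[0] == "localhost"):
        registry = parts[0]
        parts = parts[1:]
    name = "/".join(parts)
    # A ':' after the last '/' separates the tag from the repository.
    repo_path, sep, maybe_tag = name.rpartition(":")
    if sep and "/" not in maybe_tag:
        name, tag = repo_path, maybe_tag
    # Official Docker Hub images live under the "library/" namespace.
    if registry == default_registry and "/" not in name:
        name = "library/" + name
    return registry, name, tag
```

Under these assumptions, pointing `default_registry` at a mirror host would change where unqualified names such as `nginx` are fetched from.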

Please add opensource license

Can you please add an open source license so that people may use this software?

Fake layer ids

Hi,

I am currently porting this tool to the .NET Core tool platform, so I have experimented with some cases.
The layer hashes are calculated as sha256(layer.tar).

They are also recalculated when image.tar is loaded.

Could you check this ?
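
The calculation described above, sha256 over the uncompressed layer.tar, can be sketched as follows. The helper name `sha256_file` is hypothetical, and whether Docker uses exactly this value for the layer directory names is not confirmed here.

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large layers are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```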

Repository full name is not preserved

This is a good tool. The Docker host Linux machine in my office doesn't have internet access, so I used this tool to download the Docker image from Docker Hub on a Windows machine that does. It works great.
However, I would like the tool to keep the full repository name, so that after I load the tar file into the Docker engine the repository name is the same as on Docker Hub. Maybe the tool could apply a consistent repository naming policy?
Here is what I did:

> python docker_pull.py library/nginx:1.19.3-alpine
Creating image structure in: tmp_nginx_1.19.3-alpine
188c0c94c7c5: Pull complete [2796860]
61c2c0635c35: Pull complete [6761747]
378d0a9d4d5f: Pull complete [601]
2fe865f77305: Pull complete [897]
b92535839843: Pull complete [664]
Docker image pulled: library_nginx.tar

Then I renamed library_nginx.tar to library_nginx-1.19.3-alpine.tar and uploaded it to the Linux machine that has Docker Engine installed.

$ sudo docker load -i library_nginx-1.19.3-alpine2.tar
ace0eda3e3be: Loading layer [==================================================>]  5.843MB/5.843MB
4daeb7840e4d: Loading layer [==================================================>]  17.45MB/17.45MB
835f5b67679c: Loading layer [==================================================>]  3.072kB/3.072kB
d0e26daf1f58: Loading layer [==================================================>]  4.096kB/4.096kB
8d6d1951ab0a: Loading layer [==================================================>]  3.584kB/3.584kB
Loaded image: nginx:1.19.3-alpine
$ sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nginx               1.19.3-alpine       4efb29ff172a        13 days ago         21.8MB

As you can see, the "library/" part is lost from the REPOSITORY name.

How to install downloaded images to docker?

How can I save them to /var/lib/docker/overlay2 and make them visible in docker images?

error with python 3.7.4

Cannot fetch manifest

$ python docker_pull.py bkimminich/juice-shop
[-] Cannot fetch manifest for bkimminich/juice-shop [HTTP 404]
b'{"errors":[{"code":"MANIFEST_UNKNOWN","message":"OCI index found, but accept header does not support OCI indexes"}]}\n'
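
The error message says the request's Accept header does not cover OCI indexes. A minimal sketch of an Accept value that lists both the Docker and the OCI manifest media types is shown below; the names `MANIFEST_MEDIA_TYPES` and `manifest_accept_header` are hypothetical. Note that a caller sending this header must then be prepared to receive an index (a list of per-platform manifests) instead of a single manifest.

```python
# Media types for single-image manifests and multi-platform lists/indexes.
MANIFEST_MEDIA_TYPES = [
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
]


def manifest_accept_header(media_types=MANIFEST_MEDIA_TYPES):
    """Join media types into a single HTTP Accept header value."""
    return ", ".join(media_types)
```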

[Feature Request] Progress bar using tqdm

It would be good to have a progress bar in this for loop:

docker-drag/docker_pull.py

Lines 98 to 144 in 249fc4f

    
	for layer in layers:
		ublob = layer['digest']
		# FIXME: Creating fake layer ID. Don't know how Docker generates it
		fake_layerid = hashlib.sha256((parentid+'\n'+ublob+'\n').encode('utf-8')).hexdigest()
		layerdir = imgdir + '/' + fake_layerid
		os.mkdir(layerdir)

		# Creating VERSION file
		file = open(layerdir + '/VERSION', 'w')
		file.write('1.0')
		file.close()

		# Creating layer.tar file
		sys.stdout.write(ublob[7:19] + ': Downloading...')
		sys.stdout.flush()
		bresp = requests.get('https://{}/v2/{}/blobs/{}'.format(registry, repository, ublob), headers=auth_head, verify=False)
		if (bresp.status_code != 200):
			bresp = requests.get(layer['urls'][0], headers=auth_head, verify=False)
			if (bresp.status_code != 200):
				print('\rERROR: Cannot download layer {} [HTTP {}]'.format(ublob[7:19], bresp.status_code, bresp.headers['Content-Length']))
				print(bresp.content)
				exit(1)
		print("\r{}: Pull complete [{}]".format(ublob[7:19], bresp.headers['Content-Length']))
		content[0]['Layers'].append(fake_layerid + '/layer.tar')
		file = open(layerdir + '/layer.tar', "wb")
		mybuff = BytesIO(bresp.content)
		unzLayer = gzip.GzipFile(fileobj=mybuff)
		file.write(unzLayer.read())
		unzLayer.close()
		file.close()

		# Creating json file
		file = open(layerdir + '/json', 'w')
		# last layer = config manifest - history - rootfs
		if layers[-1]['digest'] == layer['digest']:
			# FIXME: json.loads() automatically converts to unicode, thus decoding values whereas Docker doesn't
			json_obj = json.loads(confresp.content)
			del json_obj['history']
			del json_obj['rootfs']
		else: # other layers json are empty
			json_obj = json.loads(empty_json)
		json_obj['id'] = fake_layerid
		if parentid:
			json_obj['parent'] = parentid
		parentid = json_obj['id']
		file.write(json.dumps(json_obj))
		file.close()

Maybe something like tqdm? https://github.com/tqdm/tqdm
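
A minimal sketch of what a tqdm-based progress bar could look like for one layer is given below. The helper `copy_with_progress` is hypothetical and not part of docker_pull.py; it takes any iterable of byte chunks so it can be tried without a network connection, and it falls back to no bar if tqdm is not installed.

```python
try:
    from tqdm import tqdm  # third-party; optional in this sketch
except ImportError:
    tqdm = None


def copy_with_progress(chunks, out, total=None, label=""):
    """Write an iterable of byte chunks to `out`, updating a progress bar.

    Returns the number of bytes written.
    """
    bar = tqdm(total=total, unit="B", unit_scale=True, desc=label) if tqdm else None
    written = 0
    try:
        for chunk in chunks:
            out.write(chunk)
            written += len(chunk)
            if bar is not None:
                bar.update(len(chunk))
    finally:
        if bar is not None:
            bar.close()
    return written
```

With requests, the chunks could come from a response opened with `stream=True` and iterated with `iter_content(chunk_size=...)`, with `label` set to something like `ublob[7:19]`.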

Downloading container image from Azure Container Registry

Hello,

I am trying to download a container image from my azure container registry. The azure container registry has username and password authentication.

But I am facing the below error.

Traceback (most recent call last):
  File "/home/user/docker_pull.py", line 72, in <module>
    auth_head = get_auth_head('application/vnd.docker.distribution.manifest.v2+json')
  File "/home/user/docker_pull.py", line 54, in get_auth_head
    access_token = resp.json()['token']
KeyError: 'token'
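
The traceback shows that the token endpoint's JSON has no 'token' key. Some registries return the bearer token under 'access_token' instead; whether that is the case for this Azure registry is an assumption, since the response body is not shown and could also be an error document. A defensive sketch (the helper name `extract_token` is hypothetical):

```python
def extract_token(payload):
    """Return the bearer token from a token-endpoint JSON payload.

    Accepts either 'token' or 'access_token'; raises ValueError with the
    available keys if neither is present, instead of a bare KeyError.
    """
    token = payload.get("token") or payload.get("access_token")
    if not token:
        raise ValueError(
            "token response has neither 'token' nor 'access_token'; keys: %s"
            % sorted(payload)
        )
    return token
```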

Can't find velero's docker image

Hello, thank you for the useful script, super appreciate that.

I got a 404 when downloading velero. The Docker Hub URL is https://hub.docker.com/r/velero/velero

$ python3 docker_pull.py velero/velero
target is  https://registry-1.docker.io/v2/velero/velero/manifests/latest
[-] Cannot fetch manifest for velero/velero [HTTP 404]
b'{"errors":[{"code":"MANIFEST_UNKNOWN","message":"OCI index found, but accept header does not support OCI indexes"}]}

How `docker` calculates the layer ids

docker-drag/docker_pull.py

Lines 118 to 119 in 5413165

    
	# FIXME: Creating fake layer ID. Don't know how Docker generates it
	fake_layerid = hashlib.sha256((parentid+'\n'+ublob+'\n').encode('utf-8')).hexdigest()

For each layer docker creates a v1 config, and a layer id is basically a digest of the v1 config, another layer id, and the parent layer id. If you're interested I can probably describe it more precisely. And possibly how other parts of docker save work.

python docker_pull.py centos -> KeyError: 'layers'

Thanks for this wonderful script.
But I have encountered the following problem.

(python2) C:\Users\red.suh>python docker_pull.py centos
Traceback (most recent call last):
  File "docker_pull.py", line 40, in <module>
    layers = resp.json()['layers']
KeyError: 'layers'

For nginx or sonatype/nexus3, there is no problem.
How can I solve the problem?
Thanks
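
A KeyError on 'layers' suggests the response was not a single-image manifest. One likely cause, assumed here rather than confirmed from the report, is that the registry returned a manifest list, which carries a 'manifests' array of per-platform entries instead of 'layers'. A sketch of choosing one entry (the helper name `pick_manifest_digest` and the sample digests are made up):

```python
def pick_manifest_digest(manifest_list, os_name="linux", arch="amd64"):
    """Return the digest of the first entry matching the platform, or None."""
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return entry["digest"]
    return None


# Made-up sample shaped like a manifest list response.
sample = {
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "arm64"}},
        {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "amd64"}},
    ]
}
chosen = pick_manifest_digest(sample)
```

The chosen digest would then be requested from the same manifests endpoint in place of the tag to obtain a manifest that does contain 'layers'.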

KeyError: 'content-length'

I am calling the script as follows:

python docker_pull.py registry.access.redhat.com/ubi8/ubi

The script will output the following:

78afc5364ad2: Downloading...

And then it fails:

Traceback (most recent call last):
  File "docker_pull.py", line 155, in <module>
    unit = int(bresp.headers['Content-Length']) / 50
  File "C:\Users\<NTID>\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'

I've made a few adjustments to your script, primarily adding the proxies= option for calls to requests functions as I'm stuck behind a proxy. This allows me to communicate with the Redhat registry. So line 155 is actually line 141 as referenced below.

docker-drag/docker_pull.py

Line 141 in 5413165

unit = int(bresp.headers['Content-Length']) / 50

I've added a print statement to dump the bresp.headers and Content-Length is not in there:

{
  'Accept-Ranges': 'bytes',
  'Content-Type': 'text/plain',
  'ETag': '"ced9e6d20e1ac931689399b68b0dd6a4:1588112355"',
  'Last-Modified': 'Tue, 28 Apr 2020 22:03:16 GMT',
  'Server': 'AkamaiNetStorage',
  'Vary': 'Accept-Encoding',
  'Date': 'Mon, 18 May 2020 17:57:51 GMT',
  'Transfer-Encoding': 'chunked',
  'X-Docker-Size': '-1',
  'Cache-Control': 'proxy-revalidate',
  'Connection': 'Keep-Alive',
  'Content-Encoding': 'gzip'
}

I've confirmed with a machine at home (different OS and no proxy) that the same error is experienced. I understand that the script relies on the Docker HTTPS API v2 so I apologize if the redhat registry is actually v1. I'm new to docker and do not know how to check for this.
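
The dumped headers show 'Transfer-Encoding: chunked', and a chunked response carries no Content-Length. A small sketch of reading the size defensively is shown below; the helper name `content_length` is hypothetical, and a caller would need to skip or simplify the progress display when it returns None.

```python
def content_length(headers):
    """Return Content-Length as an int, or None if absent or not a number."""
    value = headers.get("Content-Length")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
```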

Cannot pull OCI images - KeyError: 'layers'

python docker_pull.py postgres

I run the script in Windows 11/ Python 3.11 and get the following error:

Traceback (most recent call last):
  File "D:\share\docker_pull.py", line 87, in <module>
    layers = resp.json()['layers']
             ~~~~~~~~~~~^^^^^^^^^^
KeyError: 'layers'

very useful, thank you

Big files (e.g. gitlab/gitlab-ce:12.3.7-ce.0) aren't supported

Logs:

Creating image structure in: tmp_gitlab-ce_12.3.7-ce.0
e80174c8b43b: Pull complete [44144090]
d1072db285cc: Pull complete [529]
858453671e67: Pull complete [849]
3d07b1124f98: Pull complete [170]
655fb0f51b08: Pull complete [26257584]
063c37e78c5c: Pull complete [141]
a0398d68068f: Pull complete [146]
f41e790a20a6: Pull complete [236]
8eb8c4ceb762: Pull complete [4095]
e3a502127d8c: Pull complete [705893899]
Traceback (most recent call last):
  File "docker_pull.py", line 129, in <module>
    file.write(unzLayer.read())
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\gzip.py", line 276, in read
    return self._buffer.read(size)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37-32\lib\gzip.py", line 471, in read
    uncompress = self._decompressor.decompress(buf, size)
MemoryError
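
The traceback shows the whole layer being decompressed in memory by `unzLayer.read()`, on what appears from the path to be a 32-bit Python build. A minimal sketch of decompressing in fixed-size chunks instead is given below; the helper name `gunzip_stream` is hypothetical.

```python
import gzip
import shutil


def gunzip_stream(src, dst, chunk_size=1024 * 1024):
    """Decompress gzip data from file object `src` into file object `dst`.

    Only about `chunk_size` bytes of decompressed data are held at a time.
    """
    with gzip.GzipFile(fileobj=src, mode="rb") as unzipped:
        shutil.copyfileobj(unzipped, dst, chunk_size)
```

This only addresses the decompression step. The download itself would also need to be streamed to disk rather than held in `bresp.content` for the memory use to stay bounded.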

docker login question

I tried to use OAuth2 to get a token, but it failed. Do you have any suggestions?

Is there documentation on how fake_layerid is calculated?

	fake_layerid = hashlib.sha256((parentid+'\n'+ublob+'\n').encode('utf-8')).hexdigest()
	layerdir = imgdir + '/' + fake_layerid
	os.mkdir(layerdir)

I ran docker save -o test and found that the layer ID differs from the fake_layerid the script generates.

[Feature Request] Update documentation to provide a sample for how to use the container after downloading it.

First, this is a super useful tool. I made use of it at work after finding it through a Stack Overflow post. Thank you for making such a straightforward but functional tool.

Next, a minor nitpick: while docker load is covered in the main Docker documentation, it might not hurt to include the basic use case in this project's documentation, so people have some context for how to use the tool effectively.
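
For context, the basic use case is loading the generated tar file with `docker load -i`. A small sketch of driving that from Python follows; the helper names are hypothetical, and `docker_load` assumes the docker CLI is installed and on PATH.

```python
import subprocess


def docker_load_command(tar_path):
    """Build the argument list for loading an image tarball."""
    return ["docker", "load", "-i", tar_path]


def docker_load(tar_path):
    """Run `docker load -i <tar_path>`; raises CalledProcessError on failure."""
    return subprocess.run(docker_load_command(tar_path), check=True)
```

For example, `docker_load("library_nginx.tar")` would load the tarball produced by `python docker_pull.py library/nginx:1.19.3-alpine`.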

A slight detail

Hi,

In 'docker_pull.py', lines 164 and 165, I changed the code to:

	# FIXME: json.loads() automatically converts to unicode, thus decoding values whereas Docker doesn't
	json_obj = json.loads(confresp.content.decode("utf8"))

instead of:

	json_obj = json.loads(confresp.content)

That fixed the failure for me.

pulls from nvidia gpu cloud

First of all, thanks for the script!

However, I couldn't make it work for

docker pull nvcr.io/nvidia/tensorflow:19.10-py3

via:

python docker_pull.py nvcr.io/nvidia/tensorflow:19.05-py3

Error message:

Traceback (most recent call last):
  File "docker_pull.py", line 47, in <module>
    reg_service = resp.headers['WWW-Authenticate'].split('"')[3]
IndexError: list index out of range
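
The failing line indexes into `split('"')` and so assumes the WWW-Authenticate header always has at least two quoted values in a fixed order. The header nvcr.io actually returned is not shown in the report, so the following is only a sketch of a more tolerant parse; the helper name `parse_www_authenticate` is hypothetical.

```python
import re


def parse_www_authenticate(header):
    """Parse a challenge such as 'Bearer realm="...",service="..."'.

    Returns (scheme, params), where params maps each key="value" pair.
    Missing parameters are simply absent from the dict.
    """
    scheme, _, rest = header.strip().partition(" ")
    params = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return scheme, params
```

A caller can then use `params.get("service")` and handle the case where it is None, rather than failing with an IndexError.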

Use with a registry that requires authorization

Hi, is there any possibility of using this script with a registry that requires authorization? Thanks

UNAUTHORIZED, authentication required, when running against a private image

$ python docker_pull.py foo/bar:latest
[-] Cannot fetch manifest for foo/bar [HTTP 401]
b'{"errors":[{"code":"UNAUTHORIZED","message":"authentication required","detail":[{"Type":"repository","Class":"","Name":"foo/bar","Action":"pull"}]}]}\n'

But if I run:
docker pull foo/bar:latest

it works.

I'm not sure how to pass my creds to the script.
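
One approach, assuming the registry's token endpoint accepts HTTP Basic credentials as is common with token-based registry authentication, is to send the username and password with the token request. The sketch below builds the header by hand using only the standard library; the helper name `basic_auth_header` is hypothetical, and the script itself does not currently take credentials.

```python
import base64


def basic_auth_header(username, password):
    """Return an Authorization header dict for HTTP Basic authentication."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

With requests, passing `auth=(username, password)` to `requests.get` on the token URL has the same effect.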

Re-authenticate upon token expiration

trying

524$ python2  docker_pull.py "jupyter/datascience-notebook"
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:852: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
Creating image structure in: tmp_datascience-notebook_latest
a48c500ed24e: Pull complete [30957448]
1e1de00ff7e1: Pull complete [841]
0330ca45a200: Pull complete [412]
471db38bcfbf: Pull complete [849]
0b4aba487617: Pull complete [162]
1bac85b3a63e: Pull complete [19696719]
245be47b44f6: Pull complete [424313]
ef168d10cf08: Pull complete [666]
3f40baab49e8: Pull complete [1891]
1074310668a8: Pull complete [6005]
acab6d938518: Pull complete [147]
d3c413e667b9: Pull complete [72063649]
63b84d46215a: Pull complete [11195]
e2aa43484a2e: Pull complete [93156146]
e45a3ec35504: Pull complete [2160]
b91bbc043eab: Pull complete [434]
8842220992fc: Pull complete [691]
fc8f34d51deb: Pull complete [1006]
5f6edb450186: Pull complete [1015]
44257488fae5: Pull complete [824729788]
540df7774880: Pull complete [71414708]
178f3a1a18b4: Pull complete [271621569]
03528a45986d: Pull complete [456599]
5c52a47b5569: Pull complete [10279]
1f67b31a20f8: Pull complete [17277944]
70de4b41273e: Pull complete [171]
Traceback (most recent call last):
  File "docker_pull.py", line 86, in <module>
    file.write(unzLayer.read())
  File "/usr/lib/python2.7/gzip.py", line 261, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 303, in _read
    self._read_gzip_header()
  File "/usr/lib/python2.7/gzip.py", line 197, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file
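
If the token expired partway through this long pull, the registry would answer the next blob request with an error body instead of gzip data, which would be consistent with "Not a gzipped file"; that reading follows the issue title and is not confirmed by the log. A sketch of retrying once with a fresh token is given below. The names `fetch_with_reauth`, `fetch` and `new_auth_head` are hypothetical, and both callables are injected so the logic can be tried without a network.

```python
def fetch_with_reauth(fetch, new_auth_head, url, max_attempts=2):
    """Call fetch(url, headers), requesting a fresh auth header on HTTP 401.

    `fetch` must return an object with a `status_code` attribute;
    `new_auth_head` must return a headers dict with a current token.
    """
    resp = None
    for _ in range(max_attempts):
        resp = fetch(url, new_auth_head())
        if resp.status_code != 401:
            break
    return resp
```

The caller should still check `status_code` on the result before treating the body as a gzipped layer.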

Fails when downloading Windows images

This program fails when downloading Windows Server Core images. For example,
python docker_pull.py mcr.microsoft.com/windows/nanoserver
will fail.

I believe the issue lies in the fact that Microsoft uses https://mcr.microsoft.com/v2/ as its base API URL. I made a workaround by setting repo = 'windows' in docker_pull.py and using the microsoft.com link as the base API URL, which seems to be working for now. Authentication still goes through https://auth.docker.io and works the same way you wrote it.

	for layer in layers:
		ublob = layer['digest']
		# FIXME: Creating fake layer ID. Don't know how Docker generates it
		fake_layerid = hashlib.sha256((parentid+'\n'+ublob+'\n').encode('utf-8')).hexdigest()
		layerdir = imgdir + '/' + fake_layerid
		os.mkdir(layerdir)

		# Creating VERSION file
		file = open(layerdir + '/VERSION', 'w')
		file.write('1.0')
		file.close()

		# Creating layer.tar file
		sys.stdout.write(ublob[7:19] + ': Downloading...')
		sys.stdout.flush()
		bresp = requests.get('https://{}/v2/{}/blobs/{}'.format(registry, repository, ublob), headers=auth_head, verify=False)
		if (bresp.status_code != 200):
			bresp = requests.get(layer['urls'][0], headers=auth_head, verify=False)
			if (bresp.status_code != 200):
				print('\rERROR: Cannot download layer {} [HTTP {}]'.format(ublob[7:19], bresp.status_code, bresp.headers['Content-Length']))
				print(bresp.content)
				exit(1)
		print("\r{}: Pull complete [{}]".format(ublob[7:19], bresp.headers['Content-Length']))
		content[0]['Layers'].append(fake_layerid + '/layer.tar')
		file = open(layerdir + '/layer.tar', "wb")
		mybuff = BytesIO(bresp.content)
		unzLayer = gzip.GzipFile(fileobj=mybuff)
		file.write(unzLayer.read())
		unzLayer.close()
		file.close()

		# Creating json file
		file = open(layerdir + '/json', 'w')
		# last layer = config manifest - history - rootfs
		if layers[-1]['digest'] == layer['digest']:
			# FIXME: json.loads() automatically converts to unicode, thus decoding values whereas Docker doesn't
			json_obj = json.loads(confresp.content)
			del json_obj['history']
			del json_obj['rootfs']
		else: # other layers json are empty
			json_obj = json.loads(empty_json)
		json_obj['id'] = fake_layerid
		if parentid:
			json_obj['parent'] = parentid
		parentid = json_obj['id']
		file.write(json.dumps(json_obj))
		file.close()
