pyflit's Introduction

README

Pyflit is a simple Python HTTP downloader. The features it supports are shown below.

Features

HTTP GET
multi-threaded fetching of multiple URLs
multi-segment file fetching
gzip/deflate/bzip2 compression support
a simple progress bar
download pause and resume
proxy support

Don't use this package in production!

This package is not well tested and is buggy. It should not be used in production!

Simple Tutorial

HTTP GET

First, create a custom URL opener object. You can pass handlers to support cookies, authentication and other advanced HTTP features. To change the User-Agent or add a Referer to the HTTP request headers, pass a dictionary of custom headers as an argument. You can also enable a proxy by passing a dictionary of proxy addresses. See the API reference for details.

Example:

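from pyflit import flit

# cookie_handler and redirect_handler are handler objects created beforehand.
handlers = [cookie_handler, redirect_handler]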
headers = {'User-Agent': 'Mozilla/5.0 '
           '(Macintosh; Intel Mac OS X 10_9_4) '
           'AppleWebKit/537.77.4 (KHTML, like Gecko) '
           'Version/7.0.5 Safari/537.77.4'}
proxies = {'http': 'http://someproxy.com:8080'}

opener = flit.get_opener(handlers, headers, proxies)
u = opener.open("http://www.python.org")
resp = u.read()
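
For readers without pyflit at hand, the same idea can be sketched with the standard library alone. The helper below, build_opener_sketch, is a hypothetical name introduced for illustration. It assumes Python 3's urllib.request and is not pyflit's implementation of get_opener(); it only shows one plausible way handlers, headers and proxies can be combined into an opener.

```python
import urllib.request


def build_opener_sketch(handlers=None, headers=None, proxies=None):
    """Build a urllib opener from optional handlers, headers and proxies.

    Hypothetical stand-in for illustration; not pyflit's get_opener().
    """
    all_handlers = list(handlers or [])
    if proxies:
        # ProxyHandler maps a URL scheme to a proxy address.
        all_handlers.append(urllib.request.ProxyHandler(proxies))
    opener = urllib.request.build_opener(*all_handlers)
    if headers:
        # addheaders is a list of (name, value) pairs sent with each request.
        opener.addheaders = list(headers.items())
    return opener


# Building the opener performs no network access.
opener = build_opener_sketch(
    handlers=[urllib.request.HTTPCookieProcessor()],
    headers={'User-Agent': 'example-agent/1.0'},
    proxies={'http': 'http://someproxy.com:8080'},
)
```

Calling opener.open(url) on the result would then send the custom headers through the configured proxy.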

Multiple URLs fetching

Call flit.flit_tasks() to fetch multiple URLs with a specified number of worker threads. It returns a generator that you can iterate over to process the data chunks.

Example:

from pyflit import flit

def chunk_process(chunk):
    """Output chunk information."""
    # A single parenthesized argument prints the same on Python 2 and 3.
    print("Status_code: %s\n%s\n%s \nRead-Size: %s\nHistory: %s\n" % (
        chunk['status_code'],
        chunk['url'],
        chunk['headers'],
        len(chunk['content']),
        chunk.get('history', None)))

links = ['http://www.domain.com/post/%d/' % i for i in range(100, 200)]
thread_number = 5
# handlers, headers and proxies are optional arguments of get_opener().
opener = flit.get_opener()
chunks = flit.flit_tasks(links, thread_number, opener)
for chunk in chunks:
    chunk_process(chunk)
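
The "worker threads plus generator" pattern described above can be sketched independently of pyflit. The names fetch_all and fake_fetch are hypothetical and introduced only for this illustration; the sketch assumes Python 3's concurrent.futures and replaces the real HTTP request with a placeholder, so it says nothing about how flit_tasks() is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, thread_number, fetch):
    """Yield one result dict per URL as each worker finishes.

    `fetch` is a caller-supplied function taking a URL and returning a dict;
    it stands in for the real HTTP request.
    """
    with ThreadPoolExecutor(max_workers=thread_number) as pool:
        futures = [pool.submit(fetch, url) for url in urls]
        # Results arrive in completion order, not submission order.
        for future in as_completed(futures):
            yield future.result()


def fake_fetch(url):
    # Placeholder that performs no network access.
    return {'url': url, 'status_code': 200, 'content': b'x' * len(url)}


links = ['http://www.domain.com/post/%d/' % i for i in range(100, 105)]
results = list(fetch_all(links, 5, fake_fetch))
```

Because results are yielded as workers complete, a consumer should not rely on them matching the order of the input list.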

Multiple segment file downloading

Multi-segment downloading uses multiple threads, each fetching a separate part of the file at the given URL. You can simply pass two arguments: the URL and the number of segments.

Example:

from pyflit import flit

url = "http://www.gnu.org/software/emacs/manual/pdf/emacs.pdf"
segment_number = 2
# handlers, headers and proxies are optional arguments of get_opener().
opener = flit.get_opener()
flit.flit_segments(url, segment_number, opener)
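
To make the segmenting step concrete, here is a minimal sketch of how a file of known size might be divided into byte ranges, one per thread. The function split_ranges is a hypothetical helper, not part of pyflit, and it assumes the inclusive, zero-based byte positions used by the HTTP Range header.

```python
def split_ranges(total_size, segment_number):
    """Split total_size bytes into inclusive (start, end) byte ranges."""
    if total_size <= 0 or segment_number <= 0:
        raise ValueError("total_size and segment_number must be positive")
    # Never create more segments than there are bytes.
    segment_number = min(segment_number, total_size)
    base, extra = divmod(total_size, segment_number)
    ranges = []
    start = 0
    for index in range(segment_number):
        # The first `extra` segments carry one additional byte each.
        length = base + (1 if index < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


ranges = split_ranges(100331, 2)
# Each pair can be sent as a Range request header value.
range_headers = ['bytes=%d-%d' % pair for pair in ranges]
```

Each worker would request its own range and write the received bytes at the matching offset of the output file.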

Contributing

You can send pull requests via GitHub or help fix the bugs in the issues list.

pyflit's Issues

fix suggestion.

When using the following script:

from pyflit import flit
url = "my_url.com"
segment_number = 4
headers = {'User-Agent': 'Mozilla/5.0 '
           '(Macintosh; Intel Mac OS X 10_9_4) '
           'AppleWebKit/537.77.4 (KHTML, like Gecko) '
           'Version/7.0.5 Safari/537.77.4'}
opener = flit.get_opener(headers=headers)
flit.flit_segments(url, segment_number, opener)

It returns the following error:

C:\Anaconda3\lib\site-packages\pyflit\utils.py in progressbar(total_volume, completed_volume, progress)
97 already * '█',
98 head * '▎',
---> 99 (left - head) * ' ',
100 total_volume / float(1024 * 1024),
101 progress)

TypeError: can't multiply sequence by non-int of type 'float'

The error can be fixed by changing line 99 to:
99 int(left - head) * ' ',
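
The reason the cast helps is that Python only allows a string to be repeated by an integer, and on Python 3 the / operator returns a float. The sketch below is a simplified progress bar written for illustration. It borrows the names already, head and left from the traceback, but how they are computed here is an assumption and does not reproduce pyflit's utils.py.

```python
def render_bar(total_volume, completed_volume, width=40):
    """Return a text progress bar; every repeat count is an integer."""
    fraction = completed_volume / float(total_volume) if total_volume else 0.0
    filled = fraction * width              # may be fractional, e.g. 13.3
    already = int(filled)                  # number of whole blocks
    head = 1 if filled > already else 0    # one partial block if needed
    left = width - already                 # cells not yet fully filled
    return '[%s%s%s] %.2f MB' % (
        already * '█',
        head * '▎',
        int(left - head) * ' ',            # int() guards against float counts
        total_volume / float(1024 * 1024))


bar = render_bar(2 * 1024 * 1024, 1024 * 1024)
```

Without the integer conversion, an expression such as 2.0 * ' ' raises the TypeError shown in the report.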

If I try to download over HTTPS, only garbage is returned.

Example

from pyflit import flit

url = 'https://wiiu.titlekeys.com/json'

opener = flit.get_opener()
u = opener.open(url)
print u.read()
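
The report does not say what the garbage looks like, so the cause is unknown. One thing worth checking, purely as an assumption, is whether the body is still compressed when it is printed. The sketch below shows how a response body could be decoded according to its Content-Encoding header using only the standard library; decode_body is a hypothetical helper, not a pyflit function.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Decompress a response body according to its Content-Encoding value."""
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        return gzip.decompress(raw)
    if encoding == 'deflate':
        try:
            # Standard zlib-wrapped deflate stream.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # Unknown or absent encoding: return the bytes unchanged.
    return raw


sample = b'{"ok": true}'
decoded = decode_body(gzip.compress(sample), 'gzip')
```

With a urllib response object, the header value would typically come from the response headers and the raw bytes from read().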

Stops downloading after some time

After a while (10-15 min) of downloading a huge file, the script stops downloading and hangs for hours. Could you please tell me what I am doing wrong?
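
Nothing in this report identifies the cause of the stall. A common precaution for long transfers, offered here only as a sketch, is to let a stalled read raise a timeout and then continue from the last byte received. The function download_with_resume and the read_from callback are hypothetical names; in real use read_from(offset) would open the URL with a socket timeout and a "Range: bytes=<offset>-" header.

```python
import socket


def download_with_resume(read_from, total_size, max_retries=3):
    """Collect total_size bytes, resuming from the last offset after a stall.

    `read_from(offset)` is a caller-supplied function that yields byte chunks
    starting at `offset` and may raise socket.timeout when the link stalls.
    """
    data = bytearray()
    retries = 0
    while len(data) < total_size:
        before = len(data)
        try:
            for chunk in read_from(before):
                data.extend(chunk)
        except socket.timeout:
            pass  # handled by the progress check below
        if len(data) == before:
            # No new bytes in this attempt: count it as a failed retry.
            retries += 1
            if retries > max_retries:
                raise IOError("no progress after %d retries" % max_retries)
        else:
            retries = 0
    return bytes(data[:total_size])
```

The retry counter resets whenever an attempt makes progress, so only consecutive attempts without new data cause the download to give up.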

function returns incorrect range

The function returns an incorrect range: if the file size is 100331, the last range tuple ends at 100330!

https://github.com/galeo/pyflit/blob/master/pyflit/flit.py#L420
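
Whether the reported value is a bug depends on the convention the function uses, which the report does not state. HTTP Range headers use zero-based, inclusive byte positions, so under that convention the last byte of a 100331-byte file is number 100330. The sketch below is a hypothetical checker, not pyflit code, that tests whether a list of inclusive (start, end) tuples covers a file exactly once.

```python
def covers_whole_file(ranges, total_size):
    """Return True if inclusive (start, end) ranges tile bytes 0..total_size-1."""
    expected_start = 0
    for start, end in sorted(ranges):
        # Each range must begin right after the previous one and be non-empty.
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total_size


inclusive_ok = covers_whole_file([(0, 50165), (50166, 100330)], 100331)
one_past_end = covers_whole_file([(0, 50165), (50166, 100331)], 100331)
```

If the function instead uses half-open ranges, where the end value is excluded, the expected last value would be 100331 and this checker would not apply.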
