littlecodersh / trip

Async HTTP for Humans, coroutine Requests :tent:

License: Other

Python 100.00%

forhumans coroutine trip python

trip's Introduction

Trip: Async HTTP for Humans

TRIP, Tornado & Requests In Pair, an async HTTP library for Python.

As simple as Requests, Trip lets you get rid of annoying network blocking.

Coroutines in Python 2.7+ can be this simple:

import trip

def main():
    r = yield trip.get('https://httpbin.org/get', auth=('user', 'pass'))
    print(r.content)

trip.run(main)

With Trip, you can finish one hundred requests in about the time it takes to make one.
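The gain comes from overlapping network waits, not from making any single request faster. The following is a minimal sketch of that effect using only the standard library's asyncio (Python 3.7+), not Trip itself; `fake_request`, `fetch_all`, and the 0.02-second delay are hypothetical stand-ins for a real HTTP call.

```python
import asyncio
import time

DELAY = 0.02  # assumed latency of one simulated request, in seconds


async def fake_request(i):
    """Hypothetical stand-in for an HTTP call: waits, then returns its index."""
    await asyncio.sleep(DELAY)
    return i


async def fetch_all(n):
    """Start n simulated requests at once and wait for all of them."""
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(fetch_all(100))
elapsed = time.perf_counter() - start

# Run one after another, 100 requests would need about 100 * DELAY seconds.
print(len(results), "requests finished in %.2f s" % elapsed)
```

In a Trip coroutine, the same overlap is obtained by yielding a list of request futures, as the image-download issue further down does with `yield [download(i) for i in range(1, 50)]`.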

Trip gets its name from two powerful site packages and aims to combine them: TRIP stands for 'Tornado & Requests In Pair'. To put them together, I reused much of their code for structure and request handling, so my own effort in mixing them was small. Thanks to Tornado and Requests.

By using Trip, you can take full advantage of Requests, including sessions with cookie persistence, browser-style SSL verification, automatic content decoding, basic/digest authentication, and elegant key/value cookies. Meanwhile, your requests are coroutines, as with Tornado's AsyncHTTPClient, so network blocking will not be a problem.

Finding it difficult to cut your spiders' running time? Finding asyncio HTTP packages tricky to use? Finding it heavy to customize a big spider framework? Try Trip, you will not regret it!

Installation

Paste it into your console and enjoy:

python -m pip install trip

Documentation

The documentation is here: http://trip.readthedocs.io/zh/latest/

Advanced usage

Some of the advanced features are listed here:

Using async and await in Python 3

import trip

async def main():
    r = await trip.get('https://httpbin.org/get', auth=('user', 'pass'))
    print(r.content)

trip.run(main)

Sessions with Cookie persistence

import trip

def main():
    s = trip.Session()
    r = yield s.get(
        'https://httpbin.org/cookies/set',
        params={'name': 'value'},
        allow_redirects=False)
    r = yield s.get('https://httpbin.org/cookies')
    print(r.content)

trip.run(main)

Event hooks

import trip

def main():
    def print_url(r, *args, **kwargs):
        print(r.url)
    def record_hook(r, *args, **kwargs):
        r.hook_called = True
        return r
    url = 'http://httpbin.org/get'
    r = yield trip.get(url, hooks={'response': [print_url, record_hook]})
    print(r.hook_called)

trip.run(main)

Timeouts

import trip

def main():
    r = yield trip.get('http://github.com', timeout=0.001)
    print(r)

trip.run(main)

Proxy

import trip

proxies = {
    'http': '127.0.0.1:8080',
    'https': '127.0.0.1:8081',
}

def main():
    r = yield trip.get('https://httpbin.org/get', proxies=proxies)
    print(r.content)

trip.run(main)

How to contribute

You may open an issue to share your ideas with me.
Or fork this project and work on the master branch of your own fork.
Please include demo code for bugs or new features. You know, code helps.
Finally, if you finish your work and open a pull request, I will merge it promptly after essential tests.

Similar projects

curequests: Curio + Requests, Async HTTP for Humans.
grequests: Gevent + Requests.
requests-threads: Twisted Deferred Thread backend for Requests.
requests-futures: Asynchronous Python HTTP Requests for Humans using Futures.

trip's Issues

0.0.10 tarball missing

https://github.com/littlecodersh/trip/releases doesn't have 0.0.10

Which version of Requests does Trip depend on?

>>> import trip
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/trip/__init__.py", line 12, in <module>
    from .api import (
  File "/usr/local/lib/python2.7/site-packages/trip/api.py", line 10, in <module>
    from . import sessions
  File "/usr/local/lib/python2.7/site-packages/trip/sessions.py", line 21, in <module>
    from requests.sessions import (
ImportError: cannot import name preferred_clock

>>> requests.__version__
'2.13.0'

As far as I can see, the sessions module in Requests has no preferred_clock function...
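A quick way to confirm this kind of mismatch is to check whether the installed module actually exposes the name being imported. The sketch below uses only the standard library; `module_has` is a hypothetical helper, and it makes no claim about which Requests release introduced `preferred_clock`.

```python
import importlib


def module_has(module_name, attr):
    """Return True/False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# The name that trip/sessions.py fails to import in the traceback above.
print(module_has('requests.sessions', 'preferred_clock'))
```

If this prints False, the installed Requests is probably older than the one this Trip release was written against, and upgrading Requests is a reasonable first thing to try.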

RuntimeError: IOLoop is already running

In Jupyter, this error is always raised, although the result can still be obtained.
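A likely cause, stated here as an assumption rather than a confirmed diagnosis, is that the Jupyter kernel already runs an event loop, so a second blocking "run" call on the same loop is rejected. The sketch below reproduces the analogous situation with the standard library's asyncio (Python 3.7+) instead of Tornado; `inner` and `outer` are hypothetical names.

```python
import asyncio


async def inner():
    return 42


async def outer():
    """Try to start a blocking run from inside a loop that is already running."""
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return None


message = asyncio.run(outer())
print(message)
```

Inside a running loop, the usual alternative is to await the coroutine directly instead of calling a blocking runner.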

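Image downloads sometimes succeed and sometimes fail

The images download and the script runs without errors, but only some of the images can be opened. The rest are corrupted, and the corrupted ones make up more than 70%. I am using the latest version, 0.0.10.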

import datetime
import traceback

import trip

today_date = datetime.datetime.now().strftime('%Y-%m-%d')
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
    'Referer': 'http://iir.circ.gov.cn/ipq/',
    'Connection': 'close',
    'content-type': 'application/json',
}


@trip.coroutine
def download(i):
    try:
        s = trip.Session()
        res = yield s.get(
            'http://iir.circ.gov.cn/ipq/captchacn.svl?Mon%20May%2013%202019%2017:14:14%20GMT+0800%20(%E4%B8%AD%E5%9B%BD%E6%A0%87%E5%87%86%E6%97%B6%E9%97%B4)',
            stream=True, headers=headers, timeout=25)
        file_name = 'E://captcha2//{0}_{1}.jpg'.format(today_date, i)
        if 200 == res.status_code:
            print(file_name)
            with open(file_name, 'wb') as fd:
                # Each chunk is yielded before it is written.
                for chunk in res.iter_content(1024):
                    chunk = yield chunk
                    fd.write(chunk)
    except Exception:
        print(traceback.format_exc())


def run():
    # Start all downloads at once and wait for them together.
    yield [download(i) for i in range(1, 50)]


trip.run(run)
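To measure how many of the saved files are actually damaged, one option is to check each file's start and end markers. The sketch below assumes the server returns JPEG data (the issue only shows a .jpg file name, so this is an assumption); `looks_like_complete_jpeg` is a hypothetical helper, and a passing check does not guarantee that the image decodes.

```python
import os
import tempfile

JPEG_START = b'\xff\xd8'  # SOI (start of image) marker
JPEG_END = b'\xff\xd9'    # EOI (end of image) marker


def looks_like_complete_jpeg(path):
    """Return True if the file starts with SOI and ends with EOI."""
    with open(path, 'rb') as fd:
        data = fd.read()
    return len(data) >= 4 and data[:2] == JPEG_START and data[-2:] == JPEG_END


# Demonstration with two synthetic files instead of real downloads.
tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, 'good.jpg')
bad = os.path.join(tmp_dir, 'bad.jpg')
with open(good, 'wb') as fd:
    fd.write(JPEG_START + b'payload' + JPEG_END)
with open(bad, 'wb') as fd:
    fd.write(JPEG_START + b'pay')  # truncated: no end marker

good_ok = looks_like_complete_jpeg(good)
bad_ok = looks_like_complete_jpeg(bad)
print(good_ok, bad_ok)
```

Running such a check over the output directory would show whether the damaged files are truncated, which would point at incomplete streamed reads rather than at the server.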

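The documentation URL linked from GitHub is wrong

The correct address should be http://trip.readthedocs.io/zh_CN/latest/

Is there a comparison with grequests?

As the title says.

grequests is Requests + gevent, written by the author of Requests, and it is also very short.

Could Trip be compared with aiohttp?

The comparison could cover speed, ease of use, richness of features, and so on.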

import os

import trip
from bs4 import BeautifulSoup

# `s` (presumably a trip.Session(); this is an assumption) and `downUrl`
# are globals defined elsewhere in the reporter's script.


@trip.coroutine
def downloadImg():
    if not os.path.exists('./data'):
        os.mkdir('./data')
    q = 0
    for i in downUrl:
        # if q < 1100:
        #     q += 1
        #     continue
        imgUrl = []
        lenOfData = 0
        r = yield s.get(i[0])
        Soup = BeautifulSoup(r.text, 'html.parser')
        u = Soup.select('#main > article > div > div')
        if not os.path.exists('./data/{}'.format(i[-1])):
            os.mkdir('./data/{}'.format(i[-1]))
        else:
            lenOfData = len(os.listdir('./data/{}'.format(i[-1])))
        for t in u:
            if t.select('a') == []:
                continue
            imgUrl.append(t.select('a')[0].get('href'))
        # Skip galleries whose images have all been downloaded already.
        if len(imgUrl) == lenOfData:
            print("Skipping <{}>".format(i[-1]))
            continue
        t = 0
        for url in imgUrl:
            r = yield s.get(url, stream=True)
            with open('./data/{}/{}.jpg'.format(i[-1], t), 'wb') as fd:
                for chunk in r.iter_content(256):
                    chunk = yield chunk
                    fd.write(chunk)
            print("Downloading <{}>: image {}".format(i[-1], t))
            t += 1

The function is shown above. The global variable downUrl is an array that holds the URLs.

How to use trip gracefully?

Here is my code:

import trip

PROXIES = {
    'http': 'socks5://127.0.0.1:1086',
    'https': 'socks5://127.0.0.1:1086'
}


@trip.coroutine
def test():
    r = yield trip.get(url='https://github.com/shadowsocks/shadowsocks-android/releases/download/v4.2.5/shadowsocks-nightly-4.2.5.apk',
                       proxies=PROXIES,
                       stream=True).iter_content(chunk_size=1024)


def main():
    with open('shadowsocks-nightly-4.2.5.apk', 'wb') as f:
        for chunk in trip.run(test):
            f.write(chunk)


if __name__ == "__main__":
    main()

I want to download a file. The output is:

$ python2 test.py
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    main()
  File "test.py", line 18, in main
    for chunk in trip.run(test):
  File "/usr/local/lib/python2.7/site-packages/trip/api.py", line 153, in run
    IOLoop.current().run_sync(fn)
  File "/usr/local/lib/python2.7/site-packages/tornado/ioloop.py", line 458, in run_sync
    return future_cell[0].result()
  File "/usr/local/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 307, in wrapper
    yielded = next(result)
  File "test.py", line 13, in test
    stream=True).iter_content(chunk_size=1024)
AttributeError: 'Future' object has no attribute 'iter_content'
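The traceback arises because `trip.get(...)` returns a Future, and `.iter_content` is looked up on that Future before the response exists. Inside a Trip coroutine, the pattern used by the other issues in this list is to write `r = yield trip.get(..., stream=True)` first and only then loop over `r.iter_content(...)`, yielding each chunk. The sketch below illustrates the same ordering with the standard library's `concurrent.futures.Future` rather than Trip; `FakeResponse` is a hypothetical stand-in for a response object.

```python
from concurrent.futures import Future


class FakeResponse(object):
    """Hypothetical stand-in for a response that can be read in chunks."""

    def __init__(self, body):
        self.body = body

    def iter_content(self, chunk_size):
        for start in range(0, len(self.body), chunk_size):
            yield self.body[start:start + chunk_size]


future = Future()
future.set_result(FakeResponse(b'abcdefgh'))

# Wrong order: the Future itself has no response methods.
try:
    future.iter_content(3)
    error = None
except AttributeError as exc:
    error = str(exc)

# Right order: resolve the Future first, then use the response.
response = future.result()
chunks = list(response.iter_content(3))
print(error)
print(chunks)
```

The file-writing loop also belongs inside the coroutine, since `trip.run(test)` in the code above is not shown to return the chunks.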

Large page responses fail in chunked mode (trip 0.08)

Large page responses fail in chunked mode. Python 2.7.13, trip 0.08.

For example:

def xxx_page():
    r = yield trip.get('http://www.163.com')
    print(r.text)

trip.run(xxx_page)

The page is incomplete, whereas Requests returns it correctly.

"C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\tornado\gen.py", line 1069, in run 
    yielded = self.gen.send(value)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\trip\sessions.py", line 430, in send
    r = self.prepare_response(req, r)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\trip\sessions.py", line 292, in prepare_response
    MockResponse(headerDict), MockRequest(req))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\http\cookiejar.py", line 1664, in extract_cookies
    for cookie in self.make_cookies(response, request):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\http\cookiejar.py", line 1581, in make_cookies
    rfc2965_hdrs = headers.get_all("Set-Cookie2", [])
AttributeError: 'HTTPHeaderDict' object has no attribute 'get_all'
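The last frame shows Python 3's `http.cookiejar` calling `headers.get_all(...)`, a method of `email.message.Message` that the `HTTPHeaderDict` passed in here does not provide. The sketch below is not a patch for Trip; it only demonstrates, with the standard library, the interface the cookie jar expects. `FakeResponse` and the example URL are hypothetical.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse(object):
    """Hypothetical stand-in exposing only info(), which CookieJar calls."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


# email.message.Message provides get_all(), which make_cookies() relies on.
headers = Message()
headers['Set-Cookie'] = 'name=value; Path=/'

jar = CookieJar()
jar.extract_cookies(FakeResponse(headers), Request('http://example.com/'))

names = [cookie.name for cookie in jar]
print(names)
```

A headers object that offers an equivalent `get_all(name, default)` method would satisfy the same call, which suggests where an adapter could go.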