
trip's Introduction

Trip: Async HTTP for Humans


TRIP, Tornado & Requests In Pair, an async HTTP library for Python.

As simple as Requests, Trip lets you get rid of annoying network blocking.

A coroutine in Python 2.7+ can be this simple:

import trip

def main():
    r = yield trip.get('https://httpbin.org/get', auth=('user', 'pass'))
    print(r.content)

trip.run(main)

With Trip, you can finish one hundred requests in about the time it takes to finish one.
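The claim above rests on overlapping the waiting time of many requests. Here is a minimal sketch of that effect, using only the standard library's asyncio rather than Trip itself and assuming Python 3.7+. `fake_request` is a hypothetical stand-in that sleeps instead of touching the network, and the 0.05 second latency is an arbitrary value chosen for illustration.

```python
import asyncio
import time


async def fake_request(i, latency=0.05):
    # Hypothetical stand-in for a network call: it only sleeps, then
    # returns its index so results can be matched to requests.
    await asyncio.sleep(latency)
    return i


async def fetch_many(count):
    # Start all the waits at once; gather returns results in call order.
    return await asyncio.gather(*(fake_request(i) for i in range(count)))


def timed_run(count):
    start = time.perf_counter()
    results = asyncio.run(fetch_many(count))
    return results, time.perf_counter() - start


results, elapsed = timed_run(100)
print(len(results), "simulated requests finished")
```

Run one after another, the same hundred waits would take about 100 x 0.05 = 5 seconds. Overlapped, the total stays close to a single latency.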

Trip gets its name from two powerful site packages and aims to combine them: TRIP stands for 'Tornado & Requests In Pair'. To put them together, I reused much of their code for structure and request handling, so I only had to make a small effort to mix the two. Thanks to Tornado and Requests.

By using Trip, you can take full advantage of Requests, including sessions with cookie persistence, browser-style SSL verification, automatic content decoding, basic/digest authentication, and elegant key/value cookies. Meanwhile, your requests are coroutines, as with Tornado's AsyncHTTPClient, so network blocking will not be a problem.

Finding it difficult to cut down the time your spiders take? Finding asyncio HTTP packages tricky to use? Finding big spider frameworks heavy to customize? Try Trip, you will not regret it!

Installation

Paste it into your console and enjoy:

python -m pip install trip

Documentation

Documentation is here: http://trip.readthedocs.io/zh/latest/

Advanced usage

Some of the advanced features are listed here:

Using async and await in Python 3

import trip

async def main():
    r = await trip.get('https://httpbin.org/get', auth=('user', 'pass'))
    print(r.content)

trip.run(main)

Sessions with Cookie persistence

import trip

def main():
    s = trip.Session()
    r = yield s.get(
        'https://httpbin.org/cookies/set',
        params={'name': 'value'},
        allow_redirects=False)
    r = yield s.get('https://httpbin.org/cookies')
    print(r.content)

trip.run(main)

Event hooks

import trip

def main():
    def print_url(r, *args, **kwargs):
        print(r.url)
    def record_hook(r, *args, **kwargs):
        r.hook_called = True
        return r
    url = 'http://httpbin.org/get'
    r = yield trip.get(url, hooks={'response': [print_url, record_hook]})
    print(r.hook_called)

trip.run(main)

Timeouts

import trip

def main():
    r = yield trip.get('http://github.com', timeout=0.001)
    print(r)

trip.run(main)

Proxy

import trip

proxies = {
    'http': '127.0.0.1:8080',
    'https': '127.0.0.1:8081',
}

def main():
    r = yield trip.get('https://httpbin.org/get', proxies=proxies)
    print(r.content)

trip.run(main)

How to contribute

  1. You may open an issue to share your ideas with me.
  2. Or fork this project and make your changes on the master branch.
  3. Please write demo code for bugs or new features. Code helps.
  4. Finally, once you finish your work and open a pull request, I will merge it promptly after the essential tests.

Similar projects

trip's People

Contributors

chyroc, littlecodersh


trip's Issues

Which version of requests does trip depend on?

>>> import trip
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/trip/__init__.py", line 12, in <module>
    from .api import (
  File "/usr/local/lib/python2.7/site-packages/trip/api.py", line 10, in <module>
    from . import sessions
  File "/usr/local/lib/python2.7/site-packages/trip/sessions.py", line 21, in <module>
    from requests.sessions import (
ImportError: cannot import name preferred_clock

>>> requests.__version__
'2.13.0'

As far as I can see, the sessions module in requests has no preferred_clock function...
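The traceback shows that trip/sessions.py imports preferred_clock from requests.sessions, so the import fails whenever the installed requests does not define that name. Below is a small sketch for checking this before importing trip. `module_has_name` is a hypothetical helper, not part of trip or requests, and the sketch makes no claim about which requests release introduced the name.

```python
import importlib


def module_has_name(module_name, attr):
    """Return True if the module can be imported and defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing, so the name cannot be there.
        return False
    return hasattr(module, attr)


if module_has_name("requests.sessions", "preferred_clock"):
    print("requests.sessions provides preferred_clock")
else:
    print("requests.sessions lacks preferred_clock; a newer requests may be needed")
```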

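Image downloads sometimes succeed and sometimes fail

The images download and the run reports no errors, but only some of the images can be opened. The rest are corrupted, and the corrupted ones make up over 70%. I am using the latest version, 0.0.10.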

import datetime
import trip
today_date = datetime.datetime.now().strftime('%Y-%m-%d')
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
    'Referer': 'http://iir.circ.gov.cn/ipq/',
    'Connection': 'close',
    'content-type': 'application/json'
}

@trip.coroutine
def download(i):
    try:
        s = trip.Session()
        res = yield s.get(
            'http://iir.circ.gov.cn/ipq/captchacn.svl?Mon%20May%2013%202019%2017:14:14%20GMT+0800%20(%E4%B8%AD%E5%9B%BD%E6%A0%87%E5%87%86%E6%97%B6%E9%97%B4)',
            stream=True, headers=headers, timeout=25)
        file_name = 'E://captcha2//{0}_{1}.jpg'.format(today_date, i)
        if 200 == res.status_code:
            print (file_name)
            with open(file_name, 'wb') as fd:
                for chunk in res.iter_content(1024):
                    chunk = yield chunk
                    fd.write(chunk)
    except:
        import traceback
        print(traceback.format_exc())
def run():
    yield [download(i) for i in range(1, 50)]

trip.run(run)

When I read a JSON file, the output is garbled.

When I scrape JSON from Toutiao (今日头条), I get raise JSONDecodeError("Expecting value", s, err.value) from None
���������
Garbled output like the line above appears.

Coroutines run slower than multiprocessing

@trip.coroutine
def downloadImg():
    if not os.path.exists('./data'):
        os.mkdir('./data')
    q = 0
    for i in downUrl:
#        if q < 1100:
#        	q += 1
#        	continue
        imgUrl = []
        lenOfData = 0
        r = yield s.get(i[0])
        Soup = BeautifulSoup(r.text,'html.parser')
        u = Soup.select('#main > article > div > div')
        if not os.path.exists('./data/{}'.format(i[-1])):
            os.mkdir('./data/{}'.format(i[-1]))
        else:
            lenOfData = len(os.listdir('./data/{}'.format(i[-1])))
        for t in u:
            if t.select('a') == []:
                continue
            imgUrl.append(t.select('a')[0].get('href'))
        if len(imgUrl) == lenOfData:
            print("Skipping < {} >".format(i[-1]))
            continue
        t = 0
        for url in imgUrl:
            r = yield s.get(url,stream=True)
            with open('./data/{}/{}.jpg'.format(i[-1],t), 'wb') as fd:
                for chunk in r.iter_content(256):
                    chunk = yield chunk
                    fd.write(chunk)
            print("Downloading <{}>, image number {}".format(i[-1],t))
            t += 1
  • The function is shown above. The global variable downUrl is an array that holds the URLs.
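In the function above, every `yield s.get(...)` waits for its response before the loop issues the next request, so only one request is ever in flight and the coroutine gains nothing over a plain sequential loop. Here is a minimal sketch of the difference, using the standard library's asyncio rather than Trip and assuming Python 3.7+. `fake_download` and `InFlight` are hypothetical helpers that count concurrent waits instead of touching the network.

```python
import asyncio


class InFlight:
    # Tracks how many simulated downloads are waiting at the same time.
    def __init__(self):
        self.current = 0
        self.peak = 0


async def fake_download(counter):
    counter.current += 1
    counter.peak = max(counter.peak, counter.current)
    await asyncio.sleep(0.01)  # stands in for network latency
    counter.current -= 1


async def one_at_a_time(count):
    counter = InFlight()
    for _ in range(count):
        # Each await finishes before the next download starts.
        await fake_download(counter)
    return counter.peak


async def all_at_once(count):
    counter = InFlight()
    # gather schedules every download before waiting on any of them.
    await asyncio.gather(*(fake_download(counter) for _ in range(count)))
    return counter.peak


sequential_peak = asyncio.run(one_at_a_time(10))
concurrent_peak = asyncio.run(all_at_once(10))
print(sequential_peak, concurrent_peak)
```

The image-download report earlier on this page issues its requests together by yielding a list of coroutines (`yield [download(i) for i in range(1, 50)]`), which is the pattern to compare against here.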

how to use trip gracefully?

Here is my code:

import trip

PROXIES = {
    'http': 'socks5://127.0.0.1:1086',
    'https': 'socks5://127.0.0.1:1086'
}


@trip.coroutine
def test():
    r = yield trip.get(url='https://github.com/shadowsocks/shadowsocks-android/releases/download/v4.2.5/shadowsocks-nightly-4.2.5.apk',
                       proxies=PROXIES,
                       stream=True).iter_content(chunk_size=1024)


def main():
    with open('shadowsocks-nightly-4.2.5.apk', 'wb') as f:
        for chunk in trip.run(test):
            f.write(chunk)


if __name__ == "__main__":
    main()

I want to download a file. The output is:

$ python2 test.py
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    main()
  File "test.py", line 18, in main
    for chunk in trip.run(test):
  File "/usr/local/lib/python2.7/site-packages/trip/api.py", line 153, in run
    IOLoop.current().run_sync(fn)
  File "/usr/local/lib/python2.7/site-packages/tornado/ioloop.py", line 458, in run_sync
    return future_cell[0].result()
  File "/usr/local/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 307, in wrapper
    yielded = next(result)
  File "test.py", line 13, in test
    stream=True).iter_content(chunk_size=1024)
AttributeError: 'Future' object has no attribute 'iter_content'
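The AttributeError comes from operator precedence: in `yield trip.get(...).iter_content(...)`, the attribute lookup runs on the Future that `trip.get` returns, before `yield` has a chance to resolve it. Here is a minimal sketch of the same mistake using only the standard library and Python 3. `get`, `Response`, and `drive` are hypothetical stand-ins for an async call, its response, and a coroutine runner; they are not Trip's API.

```python
from concurrent.futures import Future


class Response:
    # Hypothetical response object holding an in-memory body.
    def __init__(self, body):
        self.body = body

    def iter_content(self, chunk_size):
        for start in range(0, len(self.body), chunk_size):
            yield self.body[start:start + chunk_size]


def get(body):
    # Hypothetical async call: returns a Future already holding a Response.
    future = Future()
    future.set_result(Response(body))
    return future


def drive(gen):
    # Tiny stand-in for a coroutine runner: each yielded Future is
    # resolved and its result is sent back into the generator.
    try:
        yielded = next(gen)
        while True:
            yielded = gen.send(yielded.result())
    except StopIteration as stop:
        return stop.value


def wrong():
    # .iter_content is looked up on the Future itself, so this raises.
    chunks = yield get(b"abcdef").iter_content(2)
    return list(chunks)


def right():
    # Resolve the Future first, then use the response.
    response = yield get(b"abcdef")
    return list(response.iter_content(2))


print(drive(right()))
try:
    drive(wrong())
except AttributeError as exc:
    print(exc)
```

The same ordering would apply with Trip: yield the request first, then call iter_content on the result, as the streaming loops in the other reports on this page do.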

Chunked transfer fails on large page responses (trip 0.08)

Large page responses fail when transferred in chunks. Python 2.7.13, trip 0.08.

For example:

def xxx_page():
    r = yield trip.get('http://www.163.com', )
    print r.text

trip.run(xxx_page)

The page comes back incomplete, whereas requests returns it correctly.

The proxies parameter is ignored

Following the trip.get call down to adapters.send, proxies appears only as a parameter and in the comments. It is never used.

AttributeError: 'HTTPHeaderDict' object has no attribute 'get_all'

It worked fine the day before yesterday.
Today it raised an error.

  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\tornado\gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\trip\sessions.py", line 430, in send
    r = self.prepare_response(req, r)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\trip\sessions.py", line 292, in prepare_response
    MockResponse(headerDict), MockRequest(req))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\http\cookiejar.py", line 1664, in extract_cookies
    for cookie in self.make_cookies(response, request):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\http\cookiejar.py", line 1581, in make_cookies
    rfc2965_hdrs = headers.get_all("Set-Cookie2", [])
AttributeError: 'HTTPHeaderDict' object has no attribute 'get_all'
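The last two frames show the standard library's cookiejar calling `headers.get_all(...)` on whatever the response's `info()` returns, so any headers object passed to it needs that method. Here is a minimal sketch of that contract using only the standard library. `FakeResponse` and `extract_cookie_names` are hypothetical, and nothing here is a claim about how trip or HTTPHeaderDict is implemented.

```python
from email.message import Message
from http.cookiejar import CookieJar
from urllib.request import Request


class FakeResponse:
    # Hypothetical response: cookiejar only needs an info() method.
    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


def extract_cookie_names(headers, url="http://example.com/"):
    jar = CookieJar()
    jar.extract_cookies(FakeResponse(headers), Request(url))
    return sorted(cookie.name for cookie in jar)


# email.message.Message implements get_all, so extraction works.
good_headers = Message()
good_headers["Set-Cookie"] = "name=value; Path=/"
print(extract_cookie_names(good_headers))

# A plain dict has no get_all, which reproduces the AttributeError.
try:
    extract_cookie_names({"Set-Cookie": "name=value; Path=/"})
except AttributeError as exc:
    print(exc)
```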
