License: GNU General Public License v3.0

Python 100.00%

laeproxy's Introduction

Lantern App Engine Proxy

Free proxy anyone can deploy to App Engine for use with Lantern desktop clients.

Overview

laeproxy is a proxy designed to run on Google App Engine (GAE). To stay within GAE's limits, it only accepts requests up to a certain size and, for GET requests, only returns content up to a certain size (via the Range header). The local proxy in Lantern desktop clients has built-in support for this: it automatically converts regular GET requests from the browser into one or more range requests to laeproxy and combines the responses into a single response back to the browser.
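The client-side splitting described above can be sketched roughly as follows. This is an illustration only, not Lantern's actual code: `plan_ranges`, `fetch_in_ranges`, and the `fetch_range` callback are hypothetical names, and the sketch assumes the total size is already known (in practice a client would typically learn it from the Content-Range header of the first 206 response).

```python
def plan_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def fetch_in_ranges(fetch_range, total_size, chunk_size):
    """Call fetch_range(start, end) for each planned range and join the parts.

    fetch_range is a hypothetical callback standing in for one range request
    to the proxy; it must return the bytes for that inclusive range.
    """
    parts = [fetch_range(start, end)
             for start, end in plan_ranges(total_size, chunk_size)]
    return b"".join(parts)


if __name__ == "__main__":
    body = b"Google is built by a large team of engineers."

    def fake_fetch(start, end):
        # Stand-in for a real request with 'Range: bytes=start-end'.
        return body[start:end + 1]

    assert fetch_in_ranges(fake_fetch, len(body), 10) == body
```

HTTP byte ranges are inclusive at both ends, which is why each chunk ends at `start + chunk_size - 1` rather than `start + chunk_size`.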

Getting Started

Install the App Engine Python SDK (e.g. brew install google-app-engine).

Clone laeproxy:

git clone https://github.com/getlantern/laeproxy.git

Run from App Engine's development server:

cd laeproxy
dev_appserver.py .

Make a test request:

curl -H'Range: bytes=0-300' -v localhost:8080/http/www.google.com/humans.txt
...
< HTTP/1.1 206 Partial Content
< Server: Development/1.0
< Date: Wed, 30 Jan 2013 06:46:36 GMT
< X-laeproxy-result: Retrieved from network 2013-01-30 06:46:36.328209
< X-laeproxy-upstream-status-code: 206
< X-laeproxy-upstream-server: sffe
...
<
Google is built by a large team of engineers, designers, researchers...

Running tests

Install the requirements for running the functional tests:

sudo pip install unittest2 gaedriver multiprocessing webob==1.1.1

Configure gaedriver.conf appropriately, make sure laeproxy is running locally if you're testing it in the dev_appserver, and then run ./test.py.

laeproxy's Issues

accept http connect traffic through POSTs

expect no Content-Length information loss from App Engine once they fix it

http://code.google.com/p/googleappengine/issues/detail?id=4878 was recently escalated. We can take advantage of the fix once it's available.

switch to sockets api

App Engine released a sockets API in April 2013. As of March 2014, it is still in Preview and only available to paid apps.

The sockets API could allow HTTP CONNECT support as well as streaming the response to the client without storing it in memory first.

If the sockets API is ever made available to free apps as well as paid apps, we could make it very easy for censored users to create their own laeproxy instances, which would add a lot more capacity to the network. In the meantime, some censored users might be willing and able to pay for their own laeproxy instances, so a first pass could be to write up docs on how to deploy your own instance and then make it easy to tell Lantern to use it.

Would obsolete #3, #4, #5, and #10.

memory limit exceeded

$ curl -v -b's_cc=true; s_nr=1333471500554; gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk-6u31-download-1501634.html; s_sq=%5B%5BB%5D%5D' -O https://laeproxyhr1.appspot.com/http/download.oracle.com/otn-pub/java/jdk/6u31-b05/jdk-6u31-windows-i586.exe?AuthParam=1333471621_3f95a4d6452a0b2dd621774ba7ea30f1
* About to connect() to laeproxyhr1.appspot.com port 443 (#0)
*   Trying 72.14.204.141...   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0connected
* Connected to laeproxyhr1.appspot.com (72.14.204.141) port 443 (#0)
* SSLv3, TLS handshake, Client hello (1):
} [data not shown]
* SSLv3, TLS handshake, Server hello (2):
{ [data not shown]
* SSLv3, TLS handshake, CERT (11):
{ [data not shown]
* SSLv3, TLS handshake, Server finished (14):
{ [data not shown]
* SSLv3, TLS handshake, Client key exchange (16):
} [data not shown]
* SSLv3, TLS change cipher, Client hello (1):
} [data not shown]
* SSLv3, TLS handshake, Finished (20):
} [data not shown]
* SSLv3, TLS change cipher, Client hello (1):
{ [data not shown]
* SSLv3, TLS handshake, Finished (20):
{ [data not shown]
* SSL connection using RC4-SHA
* Server certificate:
*    subject: C=US; ST=California; L=Mountain View; O=Google Inc; CN=*.appspot.com
*    start date: 2012-03-08 08:43:36 GMT
*    expire date: 2013-03-08 08:53:36 GMT
*    subjectAltName: laeproxyhr1.appspot.com matched
*    issuer: C=US; O=Google Inc; CN=Google Internet Authority
*    SSL certificate verify ok.
> GET /http/download.oracle.com/otn-pub/java/jdk/6u31-b05/jdk-6u31-windows-i586.exe?AuthParam=1333471621_3f95a4d6452a0b2dd621774ba7ea30f1 HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: laeproxyhr1.appspot.com
> Accept: */*
> Cookie: s_cc=true; s_nr=1333471500554; gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk-6u31-download-1501634.html; s_sq=%5B%5BB%5D%5D
> 
  0     0    0     0    0     0      0      0 --:--:--  0:00:16 --:--:--     0< HTTP/1.1 500 Internal Server Error
< Date: Tue, 03 Apr 2012 16:46:21 GMT
< Content-Type: text/html; charset=UTF-8
< Server: Google Frontend
< Content-Length: 466
< 
{ [data not shown]
100   466  100   466    0     0     28      0  0:00:16  0:00:16 --:--:--   115* Connection #0 to host laeproxyhr1.appspot.com left intact

* Closing connection #0
* SSLv3, TLS alert, Client hello (1):
} [data not shown]

2012-04-03 12:46:21.788 /http/download.oracle.com/otn-pub/java/jdk/6u31-b05/jdk-6u31-windows-i586.exe?AuthParam=1333471621_3f95a4d6452a0b2dd621774ba7ea30f1 500 15688ms 0kb curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
66.234.239.195 - - [03/Apr/2012:09:46:21 -0700] "GET /http/download.oracle.com/otn-pub/java/jdk/6u31-b05/jdk-6u31-windows-i586.exe?AuthParam=1333471621_3f95a4d6452a0b2dd621774ba7ea30f1 HTTP/1.1" 500 0 - "curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5" "laeproxyhr1.appspot.com" ms=15689 cpu_ms=855 api_cpu_ms=0 cpm_usd=3.148707 exit_code=105 instance=00c61b117c64bd15bcee87c8452f05d6f5d3ed
C 2012-04-03 12:46:21.692
Exceeded soft private memory limit with 167.906 MB after servicing 29 requests total
W 2012-04-03 12:46:21.692
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.

memcache

When laeproxy forwards a range request to a server that doesn't support them, it converts any 200 response it gets back into a 206 if possible, and just discards the portion of the response outside the requested range. laeproxy should cache this portion for future requests if possible, as recommended by http://tools.ietf.org/html/rfc2616#section-14.35.2 (last paragraph).
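A minimal sketch of the 200-to-206 conversion described above, assuming the full upstream body is already in memory. `slice_to_range` is a hypothetical helper, not a function from laeproxy; the bytes it leaves out of the returned payload are the portion this issue proposes caching.

```python
def slice_to_range(body, start, end=None):
    """Return (status, headers, payload) for an inclusive byte range of body.

    end=None means "through the last byte". An end past the last byte is
    clamped to the last byte.
    """
    total = len(body)
    if start >= total:
        # Nothing to serve: the requested range starts past the end.
        return 416, {"Content-Range": "bytes */%d" % total}, b""
    last = total - 1 if end is None else min(end, total - 1)
    payload = body[start:last + 1]
    headers = {
        "Content-Range": "bytes %d-%d/%d" % (start, last, total),
        "Content-Length": str(len(payload)),
    }
    return 206, headers, payload
```

For example, `slice_to_range(b"0123456789", 2, 5)` yields status 206, the payload `b"2345"`, and a Content-Range of `bytes 2-5/10`.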

header_msg attribute doesn't always exist on urlfetches

See this log:

2012-09-27 18:24:14.049 /http/blog.idv.tw/ 500 1293ms 0kb Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4
216.3.159.67 - - [27/Sep/2012:18:24:14 -0700] "GET /http/blog.idv.tw/ HTTP/1.1" 500 225 - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/537.4" "rlanternz.appspot.com" ms=1294 cpu_ms=20 cpm_usd=0.000119 instance=00c61b117cc5037559c5bc00151a0e94b3799884
D 2012-09-27 18:24:12.788
Target url: http://blog.idv.tw/
D 2012-09-27 18:24:12.788
Stripped request headers: [('host', 'rlanternz.appspot.com')]
D 2012-09-27 18:24:14.043
urlfetch response status: 301
D 2012-09-27 18:24:14.044
urlfetch response headers: {'content-length': '0', 'via': 'HTTP/1.1 GWA', 'x-powered-by': 'PHP/5.3.8', 'x-google-cache-control': 'remote-fetch', 'server': 'Apache', 'connection': 'close', 'location': 'http://www.blog.idv.tw/', 'date': 'Fri, 28 Sep 2012 01:24:12 GMT', 'content-type': 'text/html; charset=UTF-8', 'x-pingback': 'http://www.blog.idv.tw/journal/xmlrpc.php'}
E 2012-09-27 18:24:14.044
_CaselessDict instance has no attribute 'header_msg'
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~rlanternz/dev.362038828774706183/laeproxy.py", line 282, in wrapper
    return handler(self, *args, **kw)
  File "/base/data/home/apps/s~rlanternz/dev.362038828774706183/laeproxy.py", line 215, in handler
    fheaders.header_msg.getheaders('connection') if i.strip()) \
AttributeError: _CaselessDict instance has no attribute 'header_msg'
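One defensive way to avoid this AttributeError is to fall back to plain dict access when `header_msg` is missing. The sketch below is an assumption-laden illustration, not the project's fix: `connection_tokens` is a hypothetical helper, the only assumption about `header_msg` is that it offers the `getheaders(name)` call seen in the traceback, and splitting values on commas is a choice made here rather than something taken from laeproxy.

```python
def connection_tokens(headers):
    """Return the non-empty, stripped tokens of all Connection header values.

    Uses headers.header_msg.getheaders() when that attribute exists and
    falls back to a single dict lookup when it does not.
    """
    msg = getattr(headers, "header_msg", None)
    if msg is not None:
        values = msg.getheaders("connection")
    else:
        value = headers.get("connection", "")
        values = [value] if value else []
    # Connection is a comma-separated token list, so split each value.
    return [tok.strip()
            for value in values
            for tok in value.split(",")
            if tok.strip()]
```

With the response headers from the log above, which contain `'connection': 'close'` and no `header_msg`, this returns `['close']` instead of raising.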

authentication

laeproxy should only accept authenticated requests from Lantern instances.

correct location headers with relative uris

https://tools.ietf.org/html/rfc2616#section-14.30 specifies that the Location header sent in, e.g., a redirect response must contain an absolute URI, but many servers violate the spec:

$ curl -I http://www.dailymotion.com
HTTP/1.1 302 Found
Server: DMS/1.0.42
Date: Wed, 19 Sep 2012 16:18:42 GMT
Location: /us
...

In this case, laeproxy will faithfully act as a transparent proxy:

Non-200 or 206 response to range request, returning response as-is

but when Google Frontend receives the Location header with the relative URI, it attempts to correct it by prepending the laeproxy instance's address, producing an absolute but now broken URI:

$ curl -H'Range: bytes=0-1999999' -I https://laeproxyhr1.appspot.com/http/www.dailymotion.com/
HTTP/1.1 302 Found
X-laeproxy-result: Retrieved from network 2012-09-19 16:21:13.379890
X-laeproxy-upstream-status-code: 302
X-laeproxy-upstream-server: DMS/1.0.42
via: HTTP/1.1 GWA
location: https://laeproxyhr1.appspot.com/us
Date: Wed, 19 Sep 2012 16:21:13 GMT
Server: Google Frontend
...
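One possible fix is for laeproxy to resolve a relative Location against the upstream URL before returning the response, so that the header is already absolute by the time Google Frontend sees it. The sketch below uses the standard library's `urljoin`; `absolutize_location` is a hypothetical helper name, and the idea that Google Frontend leaves an already absolute Location alone is an assumption based on the behavior described above.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2


def absolutize_location(target_url, location):
    """Resolve a possibly relative Location value against the upstream URL.

    Absolute Location values are returned unchanged.
    """
    return urljoin(target_url, location)


if __name__ == "__main__":
    fixed = absolutize_location("http://www.dailymotion.com/", "/us")
    assert fixed == "http://www.dailymotion.com/us"
```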

automatically purge logs

Currently there is no way to configure an App Engine application to not log any requests. The log retention UI under Application Settings looks like:

Logs Retention
Google App Engine will store logs up to X days in the past, where X is specified below, or up to the storage limit size set below, whichever limit is reached first.
___ GBytes of logs storage or ___ days of logs. (Maximum number of days: 365)

and the minimum values it accepts appear to be 1 and 0 respectively.

Barring a way to configure zero log retention, we should somehow automate frequent purging of logs.

Kaleidoscope integration

Each laeproxy instance should be associated with a Lantern user and should only service requests from nodes it can reach through the Kaleidoscope trust graph.

automate deployment

Currently deploying updated versions of laeproxy to App Engine (and pointing Lantern Controller to the new instances if their addresses changed) is a manual process.

use new urlfetch.headers.header_msg API

From http://googleappengine.blogspot.com/2012/08/app-engine-171-released.html:

URLFetch
We’ve updated the way URLFetch handles multiple headers in response to one of our public issues. When a response contains the same header multiple times, these values will now be returned as a list.

implement automated tests

Set up comprehensive tests that cover the edge cases and App Engine limits and exercise all branches of the code.

repeated redirects to youtube captcha

We have encountered repeated redirects to http://www.youtube.com/das_captcha in the past.

hook up loggly to production laeproxy instances

This would let us track exceptions, errors, warnings, etc. that happen in production.

Download large files in smaller chunks

This would likely get around the memory limit issues as well as request timeout issues. Here's an example of a request that has timed out for me:

curl -v -x127.0.0.1:8787 -O http://lantern.s3.amazonaws.com/cometd.tgz
* About to connect() to proxy 127.0.0.1 port 8787 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 8787 (#0)
> GET http://lantern.s3.amazonaws.com/cometd.tgz HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8r zlib/1.2.3
> Host: lantern.s3.amazonaws.com
> Accept: */*
> Proxy-Connection: Keep-Alive
> 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:01:00 --:--:--     0< HTTP/1.1 503 Service Unavailable
< Content-Type: text/html; charset=utf-8
< Cache-Control: no-cache
< X-laeproxy: Missed GAE deadline
< Date: Thu, 05 Apr 2012 21:58:39 GMT
< Server: Google Frontend
< Content-Length: 0
< 
  0     0    0     0    0     0      0      0 --:--:--  0:01:00 --:--:--     0* Connection #0 to host 127.0.0.1 left intact

Looks like it's just not getting a response for 32 MB in time, but likely would if it chunked at say 2 MB or 1 MB.

tune RANGE_REQ_SIZE

When a Range header is not sent by the upstream requester, laeproxy requests a range of ~32MB, the current maximum App Engine response size. Because App Engine apps cannot send urlfetch responses until they've been completely downloaded into memory, the agent downstream can experience lag (in particular when the destination server is slow).

Relevant App Engine feature requests:
http://code.google.com/p/googleappengine/issues/detail?id=1903
http://code.google.com/p/googleappengine/issues/detail?id=4888
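
The default-range behavior described in this issue can be sketched as below. This is not laeproxy's code: `upstream_range` is a hypothetical helper, and the 2 MB value is only an example of a smaller setting to experiment with, not the project's actual `RANGE_REQ_SIZE`.

```python
# Hypothetical 2 MB default; the issue says the current value is ~32 MB.
RANGE_REQ_SIZE = 2 * 1024 * 1024


def upstream_range(request_headers, default_size=RANGE_REQ_SIZE):
    """Return the Range header value to send to the destination server.

    A Range supplied by the requester is passed through unchanged;
    otherwise the first default_size bytes are requested.
    """
    for name, value in request_headers.items():
        if name.lower() == "range":
            return value
    # Byte ranges are inclusive, so the last offset is default_size - 1.
    return "bytes=0-%d" % (default_size - 1)
```

A smaller default means less data has to be buffered in memory before the first byte reaches the downstream agent, at the cost of more round trips for large resources.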