Comments (19)
Non-UTF-8 requests are supposed to be transcoded first with req = req.decode() so that webob can assume utf-8 encoding throughout. Can you confirm if this is an area that req.decode misses?
from webob.
My understanding is that each field of a multipart body can have its own charset, supplied in that field's Content-Type header, whereas req.decode() seems to assume that the whole request uses a single character encoding.
The tentative fix uses cgi.FieldStorage.type_options['charset'] to decode each field of the multipart data independently. Should that be moved into the request.decode implementation somehow?
I think doing it in decode would make things simpler -- all of the
non-utf-8 code is isolated there.
On 8 August 2012 23:13, proppy wrote:
My understanding is that each field of a multipart body can have its own
charset supplied along content-type and req.decode() seems to assume that
the whole request is encoded with the same character encoding.
The tentative fix uses cgi.FieldStorage.type_options['charset'] to decode
each field of the multipart data independently, should that be moved
somehow into request.decode implementation?
Reply to this email directly or view it on GitHub: https://github.com//issues/64#issuecomment-7595889.
So the charset argument to decode would specify the default charset, and each field of a multipart body would honor its own charset if one is present?
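The behaviour proposed here (a default charset, overridden per field when a part declares its own) could be sketched roughly as follows. This is only an illustration built on the standard-library email parser, not webob's implementation; the function name decode_multipart_fields, its default_charset parameter, and the boundary XYZ are hypothetical.

```python
import email


def decode_multipart_fields(content_type, body, default_charset="utf-8"):
    """Decode the text fields of a multipart/form-data body.

    Each part is decoded with the charset from its own Content-Type
    header, falling back to default_charset when none is declared.
    """
    # Prepend the request's Content-Type so the parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = email.message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart body")

    fields = {}
    for part in message.get_payload():
        if part.get_filename() is not None:
            continue  # file uploads are binary; this sketch skips them
        name = part.get_param("name", header="content-disposition")
        charset = part.get_content_charset() or default_charset
        fields[name] = part.get_payload(decode=True).decode(charset)
    return fields


# Hypothetical request body: one field declares ISO-2022-JP, the other
# declares nothing and therefore uses the default charset.
example_body = (
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=ISO-2022-JP\r\n"
    b'Content-Disposition: form-data; name="title"\r\n'
    b"\r\n"
    b"\x1b$B$3$s$K$A$O\x1b(B\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="submit"\r\n'
    b"\r\n"
    b"Submit\r\n"
    b"--XYZ--\r\n"
)

fields = decode_multipart_fields("multipart/form-data; boundary=XYZ", example_body)
print(fields)
```

The point of the design is that the caller-supplied charset is consulted only when a part says nothing about its own encoding.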
Exactly. I don't think I've ever seen a multipart body from a browser with
parts that have encodings specified. Are you getting this from
some user-agent, or is it a synthetic test case?
On 9 August 2012 00:58, proppy wrote:
So the charset argument to decode would specify the default charset, and
each field of a multipart body would honor their specific charset if
present?
We are getting this in the App Engine Python upload handler when input[type=text] fields containing non-UTF-8 data are submitted along with a file upload.
See: http://code.google.com/p/googleappengine/issues/detail?id=2749
So, the content-type: ...; charset on a section of a multipart
body differs from the same header on the whole request? An actual full
request (whatever went over the wire) would be great to inspect.
On 9 August 2012 01:06, proppy wrote:
We are getting this on App Engine Python upload handler when
input[type=text] with non utf-8 charset data are submitted along with file
upload.
Here is the content of os.environ[wsgi.input] in a manual test case reproducing the failure:
--000e0ce0b196b4ee6804c6c8af94
Content-Type: text/plain; charset=ISO-2022-JP
Content-Disposition: form-data; name=title
Content-Transfer-Encoding: 7bit
$B$3$s$K$A$O (B
--000e0ce0b196b4ee6804c6c8af94
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: form-data; name=submit
Submit
--000e0ce0b196b4ee6804c6c8af94
Content-Type: message/external-body; charset=ISO-8859-1; blob-key=AMIfv94TgpPBtKTL3a0U9Qh1QCX7OWSsmdkIoD2ws45kP9zQAGTOfGNz4U18j7CVXzODk85WtiL5gZUFklTGY3y4G0Jz3KTPtJBOFDvQHQew7YUymRIpgUXgENS_fSEmInAIQdpSc2E78MRBVEZY392uhph3r-In96t8Z58WIRc-Yikx1bnarWo
Content-Disposition: form-data; name=file; filename="photo.jpg"
Content-Type: image/jpeg
Content-Length: 38491
X-AppEngine-Upload-Creation: 2012-08-08 15:32:29.035959
Content-MD5: ZjRmNGRhYmNhZTkyNzcyOWQ5ZGUwNDgzOWFkNDAxN2Y=
Content-Disposition: form-data; name=file; filename="photo.jpg"
--000e0ce0b196b4ee6804c6c8af94--
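For reference, the title field in this body looks like the ISO-2022-JP encoding of こんにちは with the escape bytes lost in the display. A small check, assuming that ESC $ B preceded the payload and ESC ( B followed it on the wire:

```python
# Assumption: the unprintable ESC (0x1b) bytes were dropped when the body
# was pasted, so they are restored here before decoding.
raw_title = b"\x1b$B$3$s$K$A$O\x1b(B"

# "iso2022_jp" is Python's codec name for the ISO-2022-JP charset.
title = raw_title.decode("iso2022_jp")
print(title)
```

Decoding the same bytes as UTF-8 or ISO-8859-1 would not raise an error, because every byte is 7-bit, but it would yield the escape sequences rather than the Japanese text.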
Can you provide pprint.pformat(os.environ) as well?
On 9 August 2012 01:35, proppy wrote:
Here is the content of os.environ[wsgi.input] in my manual test case
reproducing the failure:
--000e0ce0b196b4ee6804c6c8af94
Content-Type: text/plain; charset=ISO-2022-JP
Content-Disposition: form-data; name=title
Content-Transfer-Encoding: 7bit
$B$3$s$K$A$O (B
--000e0ce0b196b4ee6804c6c8af94
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: form-data; name=submit
Submit
--000e0ce0b196b4ee6804c6c8af94
Content-Type: message/external-body; charset=ISO-8859-1; blob-key=AMIfv94TgpPBtKTL3a0U9Qh1QCX7OWSsmdkIoD2ws45kP9zQAGTOfGNz4U18j7CVXzODk85WtiL5gZUFklTGY3y4G0Jz3KTPtJBOFDvQHQew7YUymRIpgUXgENS_fSEmInAIQdpSc2E78MRBVEZY392uhph3r-In96t8Z58WIRc-Yikx1bnarWo
Content-Disposition: form-data; name=file; filename="photo.jpg"
Content-Type: image/jpeg
Content-Length: 38491
X-AppEngine-Upload-Creation: 2012-08-08 15:32:29.035959
Content-MD5: ZjRmNGRhYmNhZTkyNzcyOWQ5ZGUwNDgzOWFkNDAxN2Y=
Content-Disposition: form-data; name=file; filename="photo.jpg"
--000e0ce0b196b4ee6804c6c8af94--
I stripped all the envars that don't start with HTTP_ for convenience:
{'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'HTTP_ACCEPT_CHARSET': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.8,ja;q=0.6',
'HTTP_CACHE_CONTROL': 'max-age=0',
'HTTP_CONTENT_TYPE': 'multipart/form-data; boundary=20cf3054a8cd0693cc04c6c90173',
'HTTP_HOST': '3113448.proppy-bugs.appspot.com',
'HTTP_ORIGIN': 'http://3113448.proppy-bugs.appspot.com',
'HTTP_USER_AGENT': 'Mozilla/5.0 (X11; CrOS x86_64 2694.0.0) AppleWebKit/537.3 (KHTML, like Gecko) Chrome/22.0.1222.0 Safari/537.3'}
Seems like the fix in your pull request is the correct one; requiring req.decode(..) would be silly in this case. The only problem is the unicode literal, which will not work on py3, but I'll fix it myself.
Thank you for the bug report and the patch.
Oh, I was working on adding a test to test_request.py :) I guess I could drop it now.
Thanks a lot for merging.
Added the extra test just in case you want to merge it too.
Here's the correction that was necessary on py3: 27de7f9
Another test case is always helpful. Can you please merge the correction above into the new test as well?
Basically:
- self.assertEqual(req.POST['title'].encode('utf-8'),
- u'こんにちは'.encode('utf-8'))
+ self.assertEqual(req.POST['title'], text_('こんにちは', 'utf8'))
FYI, it seems that nose fails to report the failure correctly when comparing non-ASCII strings (that's why I was encoding things to UTF-8 before comparing them in my previous patch):
File "/home/proppy/webob/nose-1.1.2-py2.7.egg/nose/plugins/failuredetail.py", line 43, in formatFailure
return (ec, '\n'.join([str(ev), tbinfo]), tb)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 74-78: ordinal not in range(128)
Well, in that case, try self.assertEqual(req.POST['title'].encode('utf8'), text_('こんにちは', 'utf8').encode('utf8'))
The text_('こんにちは', 'utf8') thingy is necessary for things to work on py3 as well.
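To illustrate why such a helper makes one test line work on both py2 and py3, here is a minimal sketch of a text_-style function. It is an assumption about the general shape of the helper, not necessarily webob's exact code.

```python
def text_(s, encoding="latin-1", errors="strict"):
    """Return s as text, decoding it first if it is a byte string."""
    if isinstance(s, bytes):
        # py2: a plain 'こんにちは' literal is a byte string and gets decoded.
        return s.decode(encoding, errors)
    # py3: the literal is already text and is returned unchanged.
    return s


from_text = text_("こんにちは", "utf8")
from_bytes = text_("こんにちは".encode("utf8"), "utf8")
print(from_text == from_bytes)
```

Either way the caller ends up with a text value, so the comparison against req.POST['title'] does not depend on a u'' literal.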
Done.
Merged