Description
I was getting a `UserWarning: http.cookiejar bug!` for a Unicode URL, and after 4 hours of debugging I seem to have found the culprit.
After Webtest calls `req.get_response` for a Django WSGI application, the environ has been modified: `PATH_INFO` no longer contains the raw, URL-unquoted path (bytes stored as latin-1 characters, as PEP 3333 requires) but its UTF-8-decoded equivalent. However, when extracting the cookies, Webtest still assumes that `PATH_INFO` holds the raw value, which results in the following stack trace:
```
  File "/usr/lib/python3.12/http/cookiejar.py", line 1628, in make_cookies
    ns_cookies = self._cookies_from_attrs_set(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 1583, in _cookies_from_attrs_set
    cookie = self._cookie_from_cookie_tuple(tup, request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 1532, in _cookie_from_cookie_tuple
    req_host, erhn = eff_request_host(request)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 642, in eff_request_host
    erhn = req_host = request_host(request)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 627, in request_host
    url = request.get_full_url()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webtest/utils.py", line 125, in get_full_url
    return self._request.url
           ^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 497, in url
    url = self.path_url
          ^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 469, in path_url
    bpath_info = bytes_(self.path_info, self.url_encoding)
                        ^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/descriptors.py", line 70, in fget
    return req.encget(key, encattr=encattr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 167, in encget
    return bytes_(val, 'latin-1').decode(encoding)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/compat.py", line 33, in bytes_
    return s.encode(encoding, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0109' in position 1: ordinal not in range(256)
```
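The failure at the bottom of the trace can be reproduced with the standard library alone. The sketch below builds both representations of the same path and runs them through `encget_like`, a hypothetical stand-in for the expression on the `encget` frame (`bytes_(val, 'latin-1').decode(encoding)`); it is not WebOb's actual code. It also suggests why the debug output further down shows `/abÄÄÄf`: in the latin-1 form, the bytes 0x87, 0x8F and 0x99 become C1 control characters, which typically print as nothing.

```python
from urllib.parse import unquote_to_bytes

# Bytes the client sent, after percent-unquoting.
path_bytes = unquote_to_bytes("/ab%C4%87%C4%8F%C4%99f")

# PEP 3333 form: every byte becomes the code point with the same value
# (latin-1 decoding). This is what WebOb expects to find in PATH_INFO.
raw_form = path_bytes.decode("latin-1")

# The form Django writes back into PATH_INFO: the bytes decoded as UTF-8.
decoded_form = path_bytes.decode("utf-8")


def encget_like(val: str, encoding: str = "UTF-8") -> str:
    """Hypothetical stand-in for the failing expression in WebOb's encget:
    bytes_(val, 'latin-1').decode(encoding). Assumes val is latin-1 form."""
    return val.encode("latin-1").decode(encoding)


print(repr(raw_form))         # '/abÄ\x87Ä\x8fÄ\x99f'
print(encget_like(raw_form))  # /abćďęf

try:
    # 'ć' (U+0107) has no latin-1 encoding, so this raises.
    encget_like(decoded_form)
except UnicodeEncodeError as exc:
    # 'latin-1' codec can't encode character '\u0107' in position 3: ordinal not in range(256)
    print(exc)
```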
The environ is modified because of how Django’s `WSGIRequest` handles it. In particular, this line decodes `PATH_INFO`:
```python
path_info = get_path_info(environ) or "/"
```
and these two lines write the decoded value back into the environ:
```python
self.META = environ
self.META["PATH_INFO"] = path_info
```
For example, a GET request for "/ab%C4%87%C4%8F%C4%99f" produces the following debug output:
```
inside get(), url='/ab%C4%87%C4%8F%C4%99f'
req.url_encoding=UTF-8, req.environ['PATH_INFO']=/abÄÄÄf, req.path_info=/abćďęf
inside do_request(), PATH_INFO=/abÄÄÄf
before get response, PATH_INFO=/abÄÄÄf
inside TestRequest.call_application, environ['PATH_INFO']=/abÄÄÄf
...
inside TestRequest.call_application, environ['PATH_INFO']=/abćďęf
after get response, PATH_INFO=/abćďęf
```
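The change between "before" and "after" comes from `self.META = environ` binding the caller's dict rather than a copy, so the next assignment mutates the environ Webtest still holds. Below is a minimal, Django-free sketch of that aliasing. `get_path_info_sketch` and `WSGIRequestSketch` are hypothetical names, and the decoding step is a simplified assumption of what Django's `get_path_info` does (re-encode as latin-1, decode as UTF-8), leaving out its handling of invalid UTF-8.

```python
def get_path_info_sketch(environ: dict) -> str:
    # Hypothetical simplification: turn the PEP 3333 string back into bytes,
    # then decode those bytes as UTF-8.
    return environ.get("PATH_INFO", "/").encode("latin-1").decode("utf-8")


class WSGIRequestSketch:
    """Hypothetical stand-in for the two Django lines quoted above."""

    def __init__(self, environ: dict) -> None:
        path_info = get_path_info_sketch(environ) or "/"
        self.META = environ                 # same dict object, not a copy
        self.META["PATH_INFO"] = path_info  # therefore mutates the caller's environ


environ = {"PATH_INFO": "/ab\xc4\x87\xc4\x8f\xc4\x99f"}  # latin-1 form
request = WSGIRequestSketch(environ)
print(environ["PATH_INFO"])  # /abćďęf
```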
I don’t understand the internals well enough to say what the correct course of action is:
- Should Webtest copy the environ before calling `req.get_response` and restore it after the call?
- Should Webtest save only `PATH_INFO` and restore it after getting the response?
- Should Webtest examine `PATH_INFO` after getting the response and, if it has been decoded, encode it back into its raw form?
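Until there is a fix in Webtest itself, something along the lines of the first option could be tried on the application side. The sketch below is a hypothetical WSGI wrapper (`IsolatedEnvironMiddleware` is not part of Webtest or Django) that hands the wrapped app a shallow copy of the environ. `mutating_app` is a hypothetical stand-in for Django's rewrite of `PATH_INFO`.

```python
from typing import Callable, Iterable

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


class IsolatedEnvironMiddleware:
    """Hypothetical wrapper: the wrapped app receives a shallow copy of
    environ, so rebinding top-level keys (such as PATH_INFO) does not leak
    back into the caller's dict."""

    def __init__(self, app: WSGIApp) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # dict() copies the mapping only; values such as wsgi.input are shared.
        return self.app(dict(environ), start_response)


def mutating_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    # Stand-in for Django: overwrite PATH_INFO with its UTF-8-decoded form.
    environ["PATH_INFO"] = environ["PATH_INFO"].encode("latin-1").decode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


environ = {"PATH_INFO": "/ab\xc4\x87\xc4\x8f\xc4\x99f"}
body = IsolatedEnvironMiddleware(mutating_app)(environ, lambda status, headers: None)
print(body)                        # [b'ok']
print(repr(environ["PATH_INFO"]))  # '/abÄ\x87Ä\x8fÄ\x99f' (unchanged)
```

With Webtest this would presumably be applied as `TestApp(IsolatedEnvironMiddleware(application))`, where `application` is the project's WSGI callable; this is untested against Webtest. One trade-off is that any new top-level keys the app adds to its environ are no longer visible to the caller. The third option is harder to do reliably: a value containing a code point above U+00FF has clearly been decoded, but a value such as `/é` is valid in both forms.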