
Attempt at double decoding of PATH_INFO when using Django and CookieJar #274

@interDist


I was getting a “UserWarning: http.cookiejar bug!” for a Unicode URL and, after 4 hours of debugging, seem to have found the culprit.

After WebTest calls req.get_response for a Django WSGI application, the environ has been modified: PATH_INFO no longer holds the raw, URL-unquoted path but its UTF-8-decoded equivalent. However, when extracting the cookies, WebTest still assumes that PATH_INFO holds the raw value, which results in the following stack trace:

  File "/usr/lib/python3.12/http/cookiejar.py", line 1628, in make_cookies
    ns_cookies = self._cookies_from_attrs_set(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 1583, in _cookies_from_attrs_set
    cookie = self._cookie_from_cookie_tuple(tup, request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 1532, in _cookie_from_cookie_tuple
    req_host, erhn = eff_request_host(request)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 642, in eff_request_host
    erhn = req_host = request_host(request)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/cookiejar.py", line 627, in request_host
    url = request.get_full_url()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webtest/utils.py", line 125, in get_full_url
    return self._request.url
           ^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 497, in url
    url = self.path_url
          ^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 469, in path_url
    bpath_info = bytes_(self.path_info, self.url_encoding)
                        ^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/descriptors.py", line 70, in fget
    return req.encget(key, encattr=encattr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/request.py", line 167, in encget
    return bytes_(val, 'latin-1').decode(encoding)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/venvs/p312/lib/python3.12/site-packages/webob/compat.py", line 33, in bytes_
    return s.encode(encoding, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0109' in position 1: ordinal not in range(256)
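The last frames show WebOb's encget re-encoding the environ value as latin-1 and then decoding it with the request's url_encoding. The stdlib-only sketch below mimics just that step. encget_like is a hypothetical stand-in, not WebOb's API, and the path “/ĉ” is made up to match the character in the error. The step only succeeds while PATH_INFO still holds the raw WSGI form:

```python
def encget_like(environ, key, encoding="utf-8"):
    """Hypothetical stand-in for the failing frame: bytes_(val, 'latin-1').decode(encoding)."""
    val = environ[key]
    return val.encode("latin-1").decode(encoding)


# Raw WSGI form: the UTF-8 bytes of "/ĉ" carried in a str via latin-1
raw_env = {"PATH_INFO": "/ĉ".encode("utf-8").decode("latin-1")}
print(encget_like(raw_env, "PATH_INFO"))  # /ĉ

# Already-decoded form, as left behind by Django
decoded_env = {"PATH_INFO": "/ĉ"}
err = None
try:
    encget_like(decoded_env, "PATH_INFO")
except UnicodeEncodeError as exc:
    err = exc
    # 'latin-1' codec can't encode character '\u0109' in position 1: ordinal not in range(256)
    print(exc)
```

The decoded value cannot be encoded as latin-1 at all, so the lookup fails before the second decode is ever attempted.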

The environ is modified by Django’s WSGIRequest itself. In particular, this line in WSGIRequest.__init__ decodes PATH_INFO:

        path_info = get_path_info(environ) or "/"

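and these two lines write the decoded value back. Since self.META is bound to the very same dict as environ rather than a copy, the caller’s environ is changed as well: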

        self.META = environ
        self.META["PATH_INFO"] = path_info
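The aliasing effect can be seen in isolation with a hypothetical FakeWSGIRequest class that imitates only the three quoted lines. It is not Django's code, and it assumes get_path_info amounts to re-decoding the latin-1 str as UTF-8 (the real function also handles invalid UTF-8):

```python
class FakeWSGIRequest:
    """Hypothetical stand-in that mimics only the Django lines quoted above."""

    def __init__(self, environ):
        # Assumption: get_path_info() amounts to re-decoding the latin-1 str as UTF-8
        raw = environ.get("PATH_INFO", "")
        path_info = raw.encode("latin-1").decode("utf-8") or "/"
        self.META = environ                  # binds the caller's dict, no copy
        self.META["PATH_INFO"] = path_info   # so this mutates the caller's environ


environ = {"PATH_INFO": "/ab\xc4\x87f"}
FakeWSGIRequest(environ)
print(environ["PATH_INFO"])        # /abćf (caller's dict was modified)

environ = {"PATH_INFO": "/ab\xc4\x87f"}
FakeWSGIRequest(dict(environ))     # pass a shallow copy instead
print(repr(environ["PATH_INFO"]))  # '/abÄ\x87f' (unchanged)
```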

For example, a GET request for “/ab%C4%87%C4%8F%C4%99f” produces the following debug output:

inside get(), url='/ab%C4%87%C4%8F%C4%99f'
   req.url_encoding=UTF-8, req.environ['PATH_INFO']=/abÄÄÄf, req.path_info=/abćďęf
inside do_request(), PATH_INFO=/abÄÄÄf
before get response, PATH_INFO=/abÄÄÄf
   inside TestRequest.call_application, environ['PATH_INFO']=/abÄÄÄf
   ...
   inside TestRequest.call_application, environ['PATH_INFO']=/abćďęf
after get response, PATH_INFO=/abćďęf 
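The “/abÄÄÄf” lines are not corrupted output: they are the raw WSGI value, in which each two-byte UTF-8 sequence appears as Ä (byte 0xC4) followed by an invisible C1 control character. A stdlib-only sketch, assuming PATH_INFO is built by percent-decoding the URL and carrying the bytes as latin-1 (as PEP 3333 describes for environ strings), reproduces both values from the example URL:

```python
from urllib.parse import unquote

url_path = "/ab%C4%87%C4%8F%C4%99f"

# WSGI form: percent-decoded bytes carried in a str via latin-1
raw = unquote(url_path, encoding="latin-1")
print(repr(raw))  # '/abÄ\x87Ä\x8fÄ\x99f'

# What Django writes back into environ["PATH_INFO"]
decoded = raw.encode("latin-1").decode("utf-8")
print(decoded)    # /abćďęf

# Same result as decoding the percent-escapes as UTF-8 directly
assert decoded == unquote(url_path, encoding="utf-8")
```

Decoding the raw value a second time is exactly the step that fails once it has already been done.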

I don’t understand the internals well enough to say what the best course of action is:

  • should WebTest copy the environ before calling req.get_response and restore it after the call?
  • should WebTest save only PATH_INFO and restore it after getting the response?
  • should WebTest examine PATH_INFO after getting the response and, if it has already been decoded (e.g. it contains characters outside latin-1), encode it back into its raw form?
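The second and third options could look roughly like the sketch below. Both helpers (call_preserving_path_info, reencode_path_info) and the fake_django_app callable are hypothetical names, not part of WebTest; app_call stands in for whatever WebTest actually invokes (e.g. req.get_response), and the code assumes the application decodes PATH_INFO as UTF-8:

```python
def call_preserving_path_info(environ, app_call):
    """Option 2 (hypothetical): remember PATH_INFO and restore it afterwards."""
    saved = environ.get("PATH_INFO")
    try:
        return app_call(environ)
    finally:
        if saved is None:
            environ.pop("PATH_INFO", None)
        else:
            environ["PATH_INFO"] = saved


def reencode_path_info(environ, encoding="utf-8"):
    """Option 3 (hypothetical): turn an already-decoded PATH_INFO back into WSGI form."""
    path = environ.get("PATH_INFO", "")
    try:
        path.encode("latin-1")  # representable in latin-1: assume it is still raw
    except UnicodeEncodeError:
        environ["PATH_INFO"] = path.encode(encoding).decode("latin-1")


def fake_django_app(environ):
    # Hypothetical app that decodes PATH_INFO in place, like Django's WSGIRequest
    environ["PATH_INFO"] = environ["PATH_INFO"].encode("latin-1").decode("utf-8")
    return "200 OK"


env = {"PATH_INFO": "/ab\xc4\x87f"}
call_preserving_path_info(env, fake_django_app)
print(repr(env["PATH_INFO"]))  # '/abÄ\x87f'

env = {"PATH_INFO": "/abćf"}
reencode_path_info(env)
print(repr(env["PATH_INFO"]))  # '/abÄ\x87f'
```

Option 3 is only a heuristic: a decoded path whose characters all fit in latin-1 (for example “/é”) passes the check unchanged even though its raw form would be “/Ã©”, so restoring the saved value is the more reliable of the two.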
