
Elevated latency, instance count and new errors/warnings compared to Python 2 runtime #105

@pnico

We are using F1 instances (2 cores). On the Python 3 runtime, with no meaningful code changes, we are seeing 1.5-2x the median latency across the app, an even larger increase in created/active/billed instances with the same App Engine settings, and new errors and warnings we never saw before. For example, we now see this a lot with datastore operations:

...
  File "/layers/google.python.pip/pip/lib/python3.11/site-packages/google/appengine/datastore/datastore_query.py", line 2949, in __query_result_hook
    self._batch_shared.conn.check_rpc_success(rpc)
  File "/layers/google.python.pip/pip/lib/python3.11/site-packages/google/appengine/datastore/datastore_rpc.py", line 1365, in check_rpc_success
    rpc.check_success()
  File "/layers/google.python.pip/pip/lib/python3.11/site-packages/google/appengine/api/apiproxy_stub_map.py", line 614, in check_success
    self.__rpc.CheckSuccess()
  File "/layers/google.python.pip/pip/lib/python3.11/site-packages/google/appengine/api/apiproxy_rpc.py", line 149, in CheckSuccess
    raise self.exception
  File "/layers/google.python.pip/pip/lib/python3.11/site-packages/google/appengine/runtime/default_api_stub.py", line 276, in _CaptureTrace
    f(**kwargs)
  File "/layers/google.python.pip/pip/lib/python3.11/site-packages/google/appengine/runtime/default_api_stub.py", line 261, in _SendRequest
    raise self._ErrorException(*_DEFAULT_EXCEPTION)
google.appengine.runtime.apiproxy_errors.RPCFailedError: The remote RPC to the application server failed for call datastore_v3.RunQuery().

^^ (we see similar traces with datastore_v3.Get() as well)

We also see a huge volume of the following warning, which isn't related to any outgoing web calls our app makes. I know it's only a warning and may be harmless, but the sheer volume in our logs is worrying:

Connection pool is full, discarding connection: appengine.googleapis.internal. Connection pool size: 10
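
If I understand urllib3 correctly, this warning is logged when a connection is returned to a pool that is already at maxsize and the pool was created with block=False: the connection is simply discarded. A minimal standalone sketch of that mechanism (my own code, not the SDK's; the host and sizes are arbitrary):

import concurrent.futures
import logging

import urllib3

logging.basicConfig(level=logging.WARNING)

# maxsize=2 stands in for the size-10 pool the runtime seems to use for
# appengine.googleapis.internal; block=False means any connection handed back
# to an already-full pool is thrown away, which is what logs the warning.
pool = urllib3.HTTPConnectionPool("example.com", maxsize=2, block=False)

def fetch(_):
    return pool.request("GET", "/").status

# More worker threads than pool slots: the surplus connections are opened,
# used once, and discarded, emitting "Connection pool is full, discarding
# connection" on every discard.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    print(list(executor.map(fetch, range(16))))

If that's what is happening inside the runtime's API client, the warning itself may be harmless, but it would mean connections to the API server are constantly being opened and torn down instead of reused.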

I've been tweaking app.yaml in various ways, but I can't find a configuration that solves, or even significantly reduces, either issue. The app is performing exactly the same operations as before under the same load. What can we try? This isn't sustainable: we can't afford the jump in cost, and the app performs much worse now. Our unit tests run about 30% faster than on Python 2, so I didn't expect this; it seems like it could be I/O related?

Not sure whether this is relevant, but I noticed this in google/appengine/api/apiproxy_rpc.py:

_MAX_CONCURRENT_API_CALLS = 100

_THREAD_POOL = futures.ThreadPoolExecutor(_MAX_CONCURRENT_API_CALLS)

But the constructor in the standard library's concurrent/futures/thread.py has this comment:

class ThreadPoolExecutor(_base.Executor):

    # Used to assign unique thread names when thread_name_prefix is not supplied.
    _counter = itertools.count().__next__

    def __init__(self, max_workers=None, thread_name_prefix='',
                 initializer=None, initargs=()):
        """Initializes a new ThreadPoolExecutor instance.

        Args:
            max_workers: The maximum number of threads that can be used to
                execute the given calls.
            thread_name_prefix: An optional name prefix to give our threads.
            initializer: A callable used to initialize worker threads.
            initargs: A tuple of arguments to pass to the initializer.
        """
        if max_workers is None:
            # ThreadPoolExecutor is often used to:
            # * CPU bound task which releases GIL
            # * I/O bound task (which releases GIL, of course)
            #
            # We use cpu_count + 4 for both types of tasks.
            # But we limit it to 32 to avoid consuming surprisingly large resource
            # on many core machine.
            max_workers = min(32, (os.cpu_count() or 1) + 4)

So on our 2-core F1 instances the stdlib default would be min(32, 2 + 4) = 6 workers, yet apiproxy_rpc.py pins the pool at 100 regardless of how many CPU cores are actually available, and the stdlib cap of 32 exists precisely to avoid "consuming surprisingly large resource" on many-core machines. What led to the decision to use 100 here? (Edit: is this perhaps fine because the RPC calls are I/O-bound rather than CPU-bound? If so, should the urllib3 connection pool size be increased to match it?)
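
Purely to illustrate what I mean by "match" (this is not the SDK's actual wiring, and I don't know where the internal pool for appengine.googleapis.internal is built): with 100 RPC worker threads, a urllib3 pool would need roughly maxsize=100, or block=True so surplus threads wait for a free connection instead of opening throwaway ones.

import urllib3

MAX_CONCURRENT_API_CALLS = 100  # mirrors _MAX_CONCURRENT_API_CALLS above

# Hypothetical sizing only: keep one reusable connection per worker thread,
# and/or make workers block for a free slot rather than discarding connections.
pool = urllib3.PoolManager(
    maxsize=MAX_CONCURRENT_API_CALLS,
    block=True,
)

Even if this is the wrong layer to change, the mismatch between a 100-thread executor and a 10-connection pool looks like it could account for the warning volume.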
