Start times in the Cloud service have been slower than normal for the past few days.
This is caused by a combination of slow Docker image pull times and a large increase in executing runs. Before we can execute your code, we need to pull its image onto the server that will run it.
The pull times are slower for two reasons:
- The DigitalOcean Container Registry has been slower. We're still investigating why.
- Our container caching system doesn't work well now that we have a lot of worker servers. It relies on local caching on each server, which gives very fast results on a cache hit. However, there's limited disk space on the servers executing your code, so as the service has become more popular the cache hit ratio has gone down. Growth over the past month has been significant enough that this cache now only helps customers doing a high volume of runs (so their images remain in the cache).
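To make the cache effect concrete, here is a toy simulation of a single worker's local image cache. It is only a sketch under stated assumptions: all images are the same size, eviction is least-recently-used (the eviction policy of the real cache is not described here), and the names `simulate`, `workload`, `heavy`, and `light-N` are hypothetical.

```python
from collections import OrderedDict


def simulate(pulls, capacity):
    """Replay image pulls through an LRU cache; return the hit ratio per image."""
    cache = OrderedDict()  # image name -> None, ordered oldest to newest use
    hits, total = {}, {}
    for image in pulls:
        total[image] = total.get(image, 0) + 1
        if image in cache:
            cache.move_to_end(image)  # cache hit: mark as most recently used
            hits[image] = hits.get(image, 0) + 1
        else:
            cache[image] = None  # cache miss: image is pulled, then stored
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used image
    return {image: hits.get(image, 0) / total[image] for image in total}


def workload(light_customers, rounds):
    """One high-volume customer runs between every low-volume customer's run."""
    pulls = []
    for _ in range(rounds):
        for c in range(light_customers):
            pulls += [f"light-{c}", "heavy"]
    return pulls


# Same disk capacity (4 images), before and after growth in customer count.
few = simulate(workload(light_customers=3, rounds=20), capacity=4)
many = simulate(workload(light_customers=10, rounds=20), capacity=4)
print("few customers: ", few["light-0"], few["heavy"])
print("many customers:", many["light-0"], many["heavy"])
```

In this model, once the number of distinct images exceeds what fits on disk, the low-volume customers miss on every run, while the high-volume customer's image is used often enough that it is never evicted.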
Solutions
From SyncLinear.com | TRI-4519
wait functions at scale. This will work by snapshotting the running container as part of the deploy, just before it would start executing. We can then do a fast restore of this snapshot for every run (even the very first run after a deploy). This will give consistently fast start times, but it is significantly more complex and won't ship until at least April.