route bake targets across pods with random loadbalance#3861
route bake targets across pods with random loadbalance#3861areebahmeddd wants to merge 1 commit into
Conversation
What about loadbalance=sticky ? |
|
Also, please squash the commits |
no change. it reuses the same shared connection path
sure 🙂 |
793a2f6 to
fb5e0c0
Compare
With loadbalance=random, all bake targets were landing on the same pod. The client connection was cached after the first dial, so pod selection only happened once. Fix by dialing a fresh connection per target when loadbalance=random. Also skip the shared local-mount session optimisation in this mode since targets connect to different pods and cannot share a session. Fixes docker#3382 Signed-off-by: Areeb Ahmed <areebahmed0709@gmail.com>
fb5e0c0 to
1050ee5
Compare
|
gentle ping! @AkihiroSuda the ci failure is not related to our changes. |
crazy-max
left a comment
There was a problem hiding this comment.
I was looking at
buildx/driver/kubernetes/podchooser/podchooser.go
Lines 29 to 45 in 0b358c7
Seems to still looks capable of picking the same pod for all targets in one bake no? ChoosePod() creates a new RNG on every call and seeds it with time.Now().Unix(). With this PR, multiple targets can create fresh clients and dial within the same second, so each call can get the same seed and produce the same first random value iiuc.
Maybe we should seed the RNG once at the package level, with concurrency handled, or use another safe random selection path? Otherwise this PR may still reproduce the original issue.
| func (d *DriverHandle) NewClient(ctx context.Context) (*client.Client, error) { | ||
| return d.Driver.Client(ctx, d.getClientOptions()...) | ||
| } | ||
|
|
||
| func (d *DriverHandle) NeedsNewClientPerBuild() bool { | ||
| type perBuildClientDriver interface { | ||
| NeedsNewClientPerBuild() bool | ||
| } | ||
| if p, ok := d.Driver.(perBuildClientDriver); ok { | ||
| return p.NeedsNewClientPerBuild() | ||
| } | ||
| return false | ||
| } |
There was a problem hiding this comment.
Actually after new review this doesn't feel like the right shape for the driver API. NeedsNewClientPerBuild() is added as an optional hidden interface inside DriverHandle, but the behavior is broader than the name says. The build code also uses it to disable shared local mount sessions.
Can we make that contract explicit near the Driver interface instead of hiding it in manager.go? Something like an optional RequiresUncachedClient() interface would make the build-side behavior easier to reason about. I would also rename NewClient() to something like UncachedClient() so it is clear this is deliberately bypassing DriverHandle.Client() caching, not just another constructor.
|
Hm, let me take a look |
loadbalance=randomon the kubernetes driver was sending all bake targets to the same pod.the client connection was cached after the first dial, so pod selection only happened once.
fix this by creating a new connection per target when
loadbalance=randomis enabled.also disable the shared local mount session optimization in this mode.
Closes #3382