The luigi scheduler doesn't seem to cope well with a very large number of scheduled tasks (300,000 in my test) that are siblings in the dependency graph. Scheduling becomes the bottleneck, and the load is not distributed across the available workers.
Attempting to bypass this issue by grouping tasks in batches...
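To make the batching idea concrete, here is a minimal sketch in plain Python of splitting the work items into fixed-size groups, so that the scheduler would see one task per group instead of one task per item. The helper name `batched`, the batch size of 1,000, and the use of integers as stand-ins for the work items are assumptions for illustration, not something taken from luigi. How each batch is then wrapped in a luigi task is left out.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull the next batch_size items; an empty list means we are done.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Assumed numbers: 300,000 work items (as in the test above), 1,000 per batch.
batches = list(batched(range(300_000), 1_000))
print(len(batches))  # 300 batches instead of 300,000 sibling tasks
```

The batch size is a trade-off: larger batches mean fewer tasks for the scheduler to track, but coarser units of work to spread over the workers.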