This isn't really a bug, but when I run a single-threaded computation in Julia with JULIA_NUM_THREADS=16, I get, e.g., ~200 samples from my own code plus ~3000 samples in the functions poptask/wait/task_done_hook. This is correct in the sense that that is where the samples were taken, but when producing the flame graph almost all of the space in the graph (15/16 ≈ 94% of it) is occupied by sleeping threads that were sampled.
This is less of an issue with the printed output from Profile because it is not as intrusive there: the same samples are just printed at the bottom on three lines.
Would you accept a pull request to optionally filter profiling data by a regexp to remove sleeping threads from the flame graph?
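To make the idea concrete, here is a rough sketch of the kind of filtering I have in mind. It deliberately does not touch the real Profile data layout: `Sample` and `drop_matching` are hypothetical names, and each sample is modelled as a plain vector of function names. It also assumes that a sample should be dropped when any of its frames matches the pattern, which is only one possible rule.

```julia
# Hypothetical, simplified model: one sample = one call stack,
# stored as a vector of function names. This is NOT the format
# returned by Profile.retrieve(); it only illustrates the filter.
const Sample = Vector{String}

# Keep only the samples in which no frame matches `pattern`.
function drop_matching(samples::Vector{Sample}, pattern::Regex)
    return filter(s -> !any(f -> occursin(pattern, f), s), samples)
end

samples = Sample[
    ["mykernel", "main"],                    # real work
    ["poptask", "wait", "task_done_hook"],   # sleeping thread
    ["wait", "task_done_hook"],              # sleeping thread
]

# Anchored pattern so that e.g. "wait_for_input" would not be dropped.
kept = drop_matching(samples, r"^(poptask|wait|task_done_hook)$")
# kept == [["mykernel", "main"]]
```

Matching on any frame would also discard user code that legitimately calls wait, so an actual implementation might prefer to match only the outermost frames of each stack.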
This is what the flame graph currently looks like:
