Hi, thank you for the PR! I haven't checked the code in detail yet, but at a glance it looks fine. I'll answer the questions from #1016 (comment) here:
There is also kind of a second level of utilization that we could somehow show visually - which CPUs are assigned to the given worker. Because right now we show all CPUs of the given PC, and the CPUs assigned to tasks, but in theory the worker might be managing only a subset of the CPUs (though that is probably relatively rare? hard to tell). In fact, maybe we should just get rid of "global" CPU utilization and always just deal with the assigned resources (at least for CPUs, for memory it's more complicated). It could be useful follow-up work to 1) only show the worker-assigned CPUs here, and 2) only gather utilization for those CPUs in the worker collection loop. Another follow-up could be to separate memory utilization across tasks (based on the memory utilization of their processes), but that can be very tricky to pull off, as tasks can spawn subprocesses, etc. CC @spirali about the color scheme (blue vs. graying out unassigned CPUs) and whether you think that we should show all CPUs on the worker, or only those managed by the worker. The latter would be more consistent with how we treat non-CPU resources.
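The follow-up idea mentioned above - gathering utilization only for the CPUs the worker actually manages - could be sketched roughly like this. The function name and data shapes are hypothetical, not HyperQueue's actual types:

```rust
// Hedged sketch: keep only per-CPU utilization samples for worker-managed CPUs.
// `(cpu_id, utilization)` pairs and the `managed` id list are illustrative.
fn managed_utilization(all_cpu_util: &[(u32, f64)], managed: &[u32]) -> Vec<(u32, f64)> {
    all_cpu_util
        .iter()
        .filter(|(id, _)| managed.contains(id)) // drop CPUs the worker does not manage
        .copied()
        .collect()
}
```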
The majority of our users probably have the whole node, but I am aware of some users that do sub-node SLURM allocations. AFAIK they do not usually have multiple workers on a single node; they just have a single worker on a node that manages just a subset of it. Graying out non-managed resources seems good to me. But it should always be possible to see the non-managed CPUs (or ideally all node resources). There are usually two use cases:
Let us say that we will eventually have some "top"-like utility. For the first question, we want to see only the processes spawned by our task. For the second question, we want to see all processes. So we should probably support both views. I have no strong opinion on what a good default is.
Well, the thing is that knowing all the resources of the node is not really something that we can robustly determine. Currently, we sort of try to guess it for CPUs, but even there it might not be accurate. And for GPUs and other resources, we often might only have access to a subset of the resources. I think that from our point of view, we cannot reliably say what the "whole node" is, and should mostly talk only about resources managed by HQ workers. What we could do, though, is say something like "we detected N additional CPUs on the node".
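The auxiliary message suggested above could come from a small helper like this sketch; the function name and exact wording are made up for illustration:

```rust
// Illustrative helper: report CPUs detected on the node beyond those the worker
// manages, or nothing when the worker manages everything we can see.
fn extra_cpus_note(detected: usize, managed: usize) -> Option<String> {
    let extra = detected.saturating_sub(managed);
    (extra > 0).then(|| format!("Detected {extra} additional CPU(s) on the node"))
}
```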
I wrote "in the ideal case"; I know that we cannot provide a generic HW monitoring tool, but other CPUs are quite easy to get, so if we can provide them then I think that we should show them.
I guess that depends on how SLURM is configured; it is definitely possible to hide some CPUs from HyperQueue. But yeah, I guess that we can show all CPUs (that are visible to us), especially since we already do this today 😆 Just that I would treat it more as auxiliary information rather than the main thing that we want to present.
I do agree that not showing the worker-assigned CPU utilization by default is not ideal, but I wanted to keep the original intention and add this as an extra feature. But as you mention above, I agree that showing it as the default makes more sense, and opting out to see "all CPUs" could be an alternative. Also, thinking about it now, if the user wants to see how the whole node (all CPUs in the list) is doing, they surely can do it without the fancy colors. So maybe even keeping it simple, without the switch, might be a good option. So I will change the color scheme to the classic green -> red for the assigned CPUs, gray out the other CPUs, and set that as the default. If you come to a final decision about the possibility to switch between all CPUs <> assigned CPUs, I can remove it or keep it as is. I can try to brainstorm a bit to come up with a new layout to represent the usage statistics. Alternatively, we can discuss it when your schedule frees up.
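The color scheme described here (a green -> red gradient for assigned CPUs, gray for everything else) might look roughly like the following. The RGB mapping is an illustrative sketch, not the actual dashboard code:

```rust
// Illustrative color mapping: assigned CPUs blend linearly from green (idle)
// to red (fully loaded); unassigned CPUs are grayed out regardless of load.
fn cpu_color(utilization: f64, assigned: bool) -> (u8, u8, u8) {
    if !assigned {
        return (128, 128, 128); // gray out CPUs not assigned to the worker's tasks
    }
    let u = utilization.clamp(0.0, 1.0);
    // red channel grows with load, green channel shrinks
    ((255.0 * u) as u8, (255.0 * (1.0 - u)) as u8, 0)
}
```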
👍 on all you said. Regarding the all CPUs <> assigned CPUs switch: you can keep it, but please make the assigned CPUs view the default.
I vote for keeping the "assigned only" view, or at least sorting the assigned CPUs to the beginning. The original motivation for all this was when I needed to find the utilization of "my" 4 CPUs on a 256+ CPU machine.
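The "assigned first" ordering can be done with a stable sort keyed on assignment status, so IDs stay in ascending order within each group. A minimal sketch with hypothetical names:

```rust
// Stable-sort CPU ids so the assigned ones come first; the stable sort
// preserves ascending id order inside each group.
fn sort_assigned_first(cpu_ids: &mut [u32], assigned: &[u32]) {
    // `false` sorts before `true`, so assigned CPUs (negated membership) lead
    cpu_ids.sort_by_key(|id| !assigned.contains(id));
}
```

This way a user's handful of CPUs appears at the top of a long listing instead of being scattered across it.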
As we discussed above, I have grayed out the unused CPUs in the worker CPU utilization view, and it is set as the default. I also added the sorting of the used CPUs, as was mentioned by Ada, because I thought it makes sense on nodes bigger than mine to be able to quickly locate and read the data that I want. In addition, the worker utilization view now shows the usage of the assigned CPUs, and also their count. Let me know if it is OK or not. Idea: it might be misleading to show "(5 CPUS)" if the worker can work with all of them. So maybe {num_used}/{num_available}, as in "(5/16 CPUS)", would be clearer. I am not sure about the change in the title, whether "Node" is the correct thing to call it, as you discussed above. Demonstration of the current state below:
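The {num_used}/{num_available} title proposed here could be formatted with a small helper along these lines; the title text and function name are purely illustrative:

```rust
// Hypothetical title formatter: show assigned vs. available CPU counts
// ("5/16 CPUs") instead of just the assigned count.
fn worker_title(num_assigned: usize, num_available: usize) -> String {
    format!("Worker CPU utilization ({num_assigned}/{num_available} CPUs)")
}
```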
I didn't like the sorting at first, but I guess it makes sense, there's not really much point in seeing the CPUs ordered by their ID (except for hyper-threading). Node utilization sounds good. In the worker utilization, there indeed might be some ways of separating assigned vs managed CPUs, but the space in the title is quite limited. Now that I see the output, I wonder if we should actually have three views:
@spirali What do you think of the three views idea? :) One problem with that is that we cannot easily determine the memory utilization of the worker based on tasks and CPUs. But showing the memory utilization in the title is a bit of a hack anyway, maybe we could show that elsewhere, and keep the CPU view only for CPUs, and not memory?
Adding the extra view shouldn't be a problem. The question is how I get the information about the managed CPUs vs. the node CPUs. This seems to be a hard thing to acquire, AFAIK. Correct me if I am wrong, but right now only the assigned CPUs are 'easy' to distinguish.
I think you already have all the information. The node's list of CPUs is returned from the HW overview. The CPU resources managed by the worker are known; they are stored in the worker configuration. And you also know the CPUs of the tasks that are currently assigned to the worker.
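Given those three sources (HW overview, worker configuration, current task assignments), each detected CPU can be classified for rendering. A sketch with illustrative types, not HyperQueue's actual ones:

```rust
// Hypothetical classification of a CPU for the dashboard view.
#[derive(Debug, PartialEq)]
enum CpuState {
    AssignedToTask, // colored by utilization (green -> red)
    ManagedIdle,    // managed by the worker, currently free
    Unmanaged,      // visible on the node but not given to the worker (grayed out)
}

fn classify_cpu(id: u32, managed: &[u32], assigned: &[u32]) -> CpuState {
    if assigned.contains(&id) {
        CpuState::AssignedToTask
    } else if managed.contains(&id) {
        CpuState::ManagedIdle
    } else {
        CpuState::Unmanaged
    }
}
```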


This PR adds an option to show utilization with distinguished CPUs that are used by running pinned tasks assigned to a worker. Without pinning, the feature currently doesn't work well.
Currently, a change of color scheme is used to distinguish the CPUs, but graying out the other CPUs might be better.
The 'c' keybinding was added, which enables the user to toggle between global and worker-specific CPU usage.
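The state behind that toggle might be modeled as in this illustrative sketch (the enum and its variants are made up, not the PR's actual types):

```rust
// Hypothetical view state flipped by the 'c' keybinding.
#[derive(Debug, PartialEq, Clone, Copy)]
enum CpuView {
    Global,         // all CPUs visible on the node
    WorkerAssigned, // only CPUs assigned to the worker's tasks
}

fn toggle_view(view: CpuView) -> CpuView {
    match view {
        CpuView::Global => CpuView::WorkerAssigned,
        CpuView::WorkerAssigned => CpuView::Global,
    }
}
```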
Features that might be connected with this:
This is part of the feature requested in #1016