feat: timeouts #457
Conversation
Force-pushed from 7182a85 to 111e53f.
This is a problem; if this is something we are going to use for the benchmarking study, we need it to work with Singularity, because CHTC only supports Singularity/Apptainer.

I originally assumed this was a more esoteric PR to test the error-handling workflow. However, based on the meeting just now, I'll look into a nice way to get this working with Singularity.
Here is my first-round review. I like the empty files when an error occurs, and it is good to have an associated log explaining why.
(Adding this comment again here) My only issue with this is the definition of "error." If a parameter combination fails the heuristics check, I do not want the output to be empty. I want the output to reflect what was actually produced, so I do not have to rerun that combination even though it "failed" the heuristics. I should be able to freely update the heuristics and have that output counted if it now passes, without rerunning combinations that previously produced empty output. In short, a parameter combination failing the heuristics should not be classified as an error as defined in this PR.
This PR not working with Singularity is a very big problem because of the CHTC integration.
"ML requires at least one pathway, and failing pathways can break ML-work. How do we want to handle downstream analysis when errors occur." This seems like a separate problem that I will fix internally in the ML code (I want to still make figures if we only have one pathway or a set of empty pathways).
This perplexes me, but from my tests we do not need `--keep-going`. I do not know my original intent here.
Co-Authored-By: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
The small part here that is adaptable is that errors, in this PR, are only made in the …
This works with Apptainer now, but it now touches an untested part of profiling, so that part needs a review from @jhiemstrawisc.
ntalluri left a comment:

Here is half of a review on this PR.
> By default, whenever SPRAS runs into a container error (i.e. an internal
> algorithm error), the full workflow will fail. However, there are
> certain designated errors where we don't want this to be the case (at
Can we make this a separate section that explains each error, instead of putting it in parentheses (even though right now it is only timeouts)? We can expand it in the future.
> timeout: 1d

> The timeout string parsing is delegated to `pytimeparse
> <https://pypi.org/project/pytimeparse/>`__, which allows for more
Suggested change:
- <https://pypi.org/project/pytimeparse/>`__, which allows for more
+ <https://pypi.org/project/pytimeparse/>`__, (examples linked here). This allows for more
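For reference, a quick sketch of the duration formats pytimeparse accepts (values per the pytimeparse README; this snippet is illustrative and not part of the PR):

```python
# pytimeparse.parse converts a human-readable duration string into
# seconds, returning None when the string cannot be parsed.
from pytimeparse import parse

parse("1d")          # 86400
parse("1h32m")       # 5520
parse("5.6 days")    # 483840
parse("not a time")  # None
```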
> Timeouts
> ##########

> SPRAS allows for per-algorithm timeouts, specified under the global
Suggested change:
- SPRAS allows for per-algorithm timeouts, specified under the global
+ SPRAS allows for optional per-algorithm timeouts, specified under the global
Can you also add a sentence on what happens when timeout is not included?
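For what it's worth, one way the missing-key case could behave (a hypothetical sketch, not SPRAS's actual internals; `get_timeout_seconds` is an invented helper): treat an absent `timeout` as "no limit".

```python
# Hypothetical helper (not SPRAS's actual code): read an optional
# per-algorithm timeout from a params dict. A missing "timeout" key
# means no limit; an unparseable value is a configuration error.
from pytimeparse import parse

def get_timeout_seconds(params: dict) -> int | None:
    raw = params.get("timeout")
    if raw is None:
        return None  # no timeout configured; run without a time limit
    seconds = parse(str(raw))
    if seconds is None:
        raise ValueError(f"Unparseable timeout value: {raw!r}")
    return int(seconds)

get_timeout_seconds({"timeout": "1d"})  # 86400
get_timeout_seconds({})                 # None
```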
> def __init__(self, timeout: int, *args):
>     self.timeout = timeout
>     self.message = f"Timed out after {timeout}s."
Could we convert this back into other time scales? (If I put in 5.6 days, I don't want to have to read that as 483840 seconds.)
Suggested change:
- self.message = f"Timed out after {timeout}s."
+ self.message = f"Timed out after {timeout} seconds."
f"Timed out after {timeout} seconds; {hours} hours; {days} days ."
Something like this?
(Only do this if it is easy to do.)
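If it is easy, here is a minimal sketch of that conversion (the class name and base are placeholders; only the divmod arithmetic is the point):

```python
# Placeholder exception class demonstrating the suggested breakdown of
# seconds into days/hours/minutes; divmod peels off each unit in turn.
class ContainerTimeoutError(Exception):
    def __init__(self, timeout: int, *args):
        self.timeout = timeout
        days, rem = divmod(timeout, 86400)   # 86400 seconds per day
        hours, rem = divmod(rem, 3600)       # 3600 seconds per hour
        minutes, seconds = divmod(rem, 60)
        human = f"{days}d {hours}h {minutes}m {seconds}s"
        self.message = f"Timed out after {timeout} seconds ({human})."
        super().__init__(self.message, *args)

# ContainerTimeoutError(483840).message
# -> "Timed out after 483840 seconds (5d 14h 24m 0s)."
```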
Adds `timeout` to algorithms as a demonstration of passing through errors. Closes #316.

Caveat: ML requires at least one pathway, and failing pathways can break the ML work. How do we want to handle downstream analysis when errors occur (including, in the future, heuristic errors)?