Skip to content

Throw from readDeck() instead of std::exit()#6935

Merged
blattms merged 3 commits intoOPM:masterfrom
hakonhagland:fix_readdeck_exit
Mar 24, 2026
Merged

Throw from readDeck() instead of std::exit()#6935
blattms merged 3 commits intoOPM:masterfrom
hakonhagland:fix_readdeck_exit

Conversation

@hakonhagland
Copy link
Copy Markdown
Contributor

@hakonhagland hakonhagland commented Mar 13, 2026

Alternative to #6933.

Problem

When readDeck() encounters unrecoverable parse errors, it calls MPI_Finalize() + std::exit(EXIT_FAILURE). This is problematic because:

  • Unit tests fail silentlystd::exit() bypasses Boost.Test, producing exit code 1 with no test failure report
  • Not idiomatic C++ — functions should signal errors via exceptions, letting callers decide how to handle them
  • MPI cleanup is wrongMPI_Finalize() requires cooperative shutdown from all ranks; on an error path MPI_Abort() is more appropriate

Solution

  • readDeck.cpp: Replace MPI_Finalize() + std::exit(EXIT_FAILURE) with throw std::runtime_error(failureMessage)
  • Main.hpp: Widen catch in Main::initialize_() from std::invalid_argument to std::exception so it catches both existing and new exception types. The existing MPI_Abort() call in the catch block handles MPI cleanup correctly.

Impact

Context Before After
Production MPI_Finalize + std::exit(1) Exception caught by Main::initialize_(), MPI_Abort, return false
Unit tests Silent exit code 1 Boost.Test catches std::runtime_error, reports failure with message
Normal runs No change No change (error path only)

Closes #6933

@hakonhagland hakonhagland added the manual:irrelevant This PR is a minor fix and should not appear in the manual label Mar 13, 2026
@hakonhagland
Copy link
Copy Markdown
Contributor Author

jenkins build this please

@bska
Copy link
Copy Markdown
Member

bska commented Mar 13, 2026

I still maintain that the unit test justification is just wrong. You should never add a unit test that has parse errors.

@hakonhagland
Copy link
Copy Markdown
Contributor Author

You should never add a unit test that has parse errors.

Yes, but in my opinion during development it will be helpful to have this to catch deck errors quickly.

@bska
Copy link
Copy Markdown
Member

bska commented Mar 13, 2026

You should never add a unit test that has parse errors.

Yes, but in my opinion during development it will be helpful to have this to catch deck errors quickly.

And you're making everything more difficult for the slight convenience of something that happens very infrequently. You should use other tools when developing.

@atgeirr
Copy link
Copy Markdown
Member

atgeirr commented Mar 16, 2026

The implementation here is better than in #6933 as it is less intrusive, but the MPI cleanup is worse: We do want cooperative cleanup and MPI_Finalize() when we can, instead of MPI_Abort(). So the code inside readDeck.cpp doing this is exactly as is should be, unlike the code in Main.hpp that just aborts!

I agree that we SHOULD move the finalize and exit out of readDeck.cpp, that function may instead return a status flag or similar. But in the end all processes must communicate to cooperatively finalize and end, as would happen in the current master.

The implementation in this PR would cooperate to establish that there is an error, all would throw, then whichever process reaches MPI_Abort() first gets everyone killed.

@atgeirr
Copy link
Copy Markdown
Member

atgeirr commented Mar 16, 2026

One more on this: The destructor of Main performs MPI_Finalize(), and lots of similar cleanup. I believe we should not call MPI_Finalize() or MPI_Abort() anywhere else if we can help it. I think the proper way to change this would be to just return EXIT_FAILURE from Main::initialize_().

If you look at the no-MPI version, that is exactly what happens! I believe that is actually correct and that we should remove the MPI_Abort() call, letting control propagate all the way out in the regular way. We will not simulate since initialize_() returns EXIT_FAILURE and then we go out of scope, Main::~Main() will finalize, and every rank end nicely with return EXIT_FAILURE.

Replace MPI_Finalize + std::exit(EXIT_FAILURE) with
throw std::runtime_error in readDeck() on parse failure.
This lets callers handle the error: Main::initialize_()
catches it and calls MPI_Abort(), while Boost.Test catches
it and reports a proper test failure instead of a silent
exit. Widen the catch clause in Main::initialize_() from
std::invalid_argument to std::exception to cover both
existing and new exception types.
Remove MPI_Abort() from Main::initialize_() catch block.
All exceptions from readDeck() are synchronized across
ranks via comm.min(), so returning false is safe and lets
Main::~Main() call MPI_Finalize() for cooperative shutdown.
@hakonhagland
Copy link
Copy Markdown
Contributor Author

We do want cooperative cleanup and MPI_Finalize() when we can

@atgeirr Good suggestion. I've added a commit that removes MPI_Abort() from the catch block and just returns false, letting Main::~Main() handle MPI_Finalize() for cooperative shutdown. This is safe since all exceptions from readDeck() are synchronized across ranks via comm.min().

Let me know if I should squash the two commits into one before merging.

@hakonhagland
Copy link
Copy Markdown
Contributor Author

jenkins build this please

@atgeirr
Copy link
Copy Markdown
Member

atgeirr commented Mar 16, 2026

This is better, although I think we should try one more improvement: the catch in Main.hpp does not have an associated communication to ensure all ranks stop if only one throws. The mechanism in place in readDeck.cpp ensure that for those failures, all ranks throw and will enter the catch block. However, other exceptions may still only be thrown on only one or some ranks (I do not have a particular candidate, but it would include anything the catch on (old) line 322 was designed to handle). Therefore that code must also have a comm.min() etc. to ensure cooperative shutdown, just as the code in readDeck.cpp has.

Maybe that also allows us to remove that handling from readDeck.cpp, in total essentially moving it to Main.hpp? I have not thought it through completely, but it would be nice to handle the communication of failures as far out as possible.

@hakonhagland
Copy link
Copy Markdown
Contributor Author

hakonhagland commented Mar 17, 2026

the catch in Main.hpp does not have an associated communication to ensure all ranks stop if only one throws

Note that the catch block is skipped by ranks that don't throw, so we can't synchronize inside it. Non-throwing ranks would never reach the comm.min() call.

The natural synchronization point is inside readDeck(), right after the rank-0-only parsing code. That's where the rank asymmetry is introduced (rank 0 parses, others wait), so that's where it needs to be resolved. The existing comm.min(parseSuccess) at line 783 does exactly this, it ensures all ranks agree on the outcome before any rank throws.

The rank-0-only exceptions during parsing (e.g., schedule consistency checks, satellite group validation) are caught by an inner try-catch inside readDeck() (line 745), which converts them to parseSuccess = 0. So by the time we reach comm.min(), all failure modes have been funneled into the same synchronized path.

Comment thread opm/simulators/flow/Main.hpp Outdated
// exceptions are caught inside readDeck() and converted to a
// synchronized throw via comm.min()). Returning false lets
// Main::~Main() call MPI_Finalize() for cooperative shutdown.
catch (const std::exception& e)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer two catch clauses with different specific error messages. std::exception can really be anything and does not have to be related to EclipseState.

Comment on lines -328 to -330
#if HAVE_MPI
MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
#endif
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is problematic (in combination with std::exception) as long as we do not know what state the system is in. With our current exception handling we do not know that as we only catch vanilla exceptions from the std library. E.g. memory might be exhausted, a fatal unrecoverable mpi problem has happened, a network problem etc.

If you only catch std::exception then actually MPI_Abort makes sense. If you you have no information about what happened, then you need to take the last available weapon that might somehow kill the application. If you want to get partly rid of this then you need to carefully refactor to use custom exceptions and act upon them accordingly. This would be a rather huge task and you would still use MPI_Abort on occasions.

Even if we argue that on an ideal system nothing will happen and the current code does makes sure exceptions are propagated to all ranks, the latter might easily change and boom we have a deadlock.

I have read all the arguments of the two different PRs. To me the other one seems less risky as it just adds some special casing. I have spent quite some time fixing deadlocks and problems in this area and would rather avoid us from spending more time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer two catch clauses with different specific error messages. std::exception can really be anything and does not have to be related to EclipseState.

@blattms Good point. I've added a commit that splits the catch into two clauses:

  1. catch (const std::runtime_error& e) : for synchronized readDeck() parse failures. These are safe for cooperative shutdown via return false since all ranks throw together (synchronized via comm.min() inside readDeck()).

  2. catch (const std::exception& e) : safety net for unexpected exceptions (e.g., std::bad_alloc, network errors) that may not be synchronized across ranks. This retains MPI_Abort() to prevent deadlocks.

Each clause also has its own error message: "Failed to create valid EclipseState object" for parse failures vs. "Unexpected error during initialization" for other exceptions.

Use std::runtime_error catch for synchronized readDeck()
parse failures with cooperative shutdown via return false.
Keep std::exception catch with MPI_Abort() as safety net
for unexpected unsynchronized exceptions (e.g. bad_alloc)
that could cause deadlocks if not forcefully terminated.
@hakonhagland
Copy link
Copy Markdown
Contributor Author

jenkins build this please

@blattms
Copy link
Copy Markdown
Member

blattms commented Mar 24, 2026

Seems like everything has been addressed here. Still some reservations but no real objections.
Will do another jenkins run and merge.

@blattms
Copy link
Copy Markdown
Member

blattms commented Mar 24, 2026

jenkins build this serial please

@blattms blattms merged commit 1bb13fa into OPM:master Mar 24, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

manual:irrelevant This PR is a minor fix and should not appear in the manual

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants