Throw from readDeck() instead of std::exit() by hakonhagland · Pull Request #6935 · OPM/opm-simulators

hakonhagland · 2026-03-13T11:21:10Z

Alternative to #6933.

Problem

When readDeck() encounters unrecoverable parse errors, it calls MPI_Finalize() + std::exit(EXIT_FAILURE). This is problematic because:

Unit tests fail silently — std::exit() bypasses Boost.Test, producing exit code 1 with no test failure report
Not idiomatic C++ — functions should signal errors via exceptions, letting callers decide how to handle them
MPI cleanup is wrong — MPI_Finalize() requires cooperative shutdown from all ranks; on an error path MPI_Abort() is more appropriate

Solution

readDeck.cpp: Replace MPI_Finalize() + std::exit(EXIT_FAILURE) with throw std::runtime_error(failureMessage)
Main.hpp: Widen catch in Main::initialize_() from std::invalid_argument to std::exception so it catches both existing and new exception types. The existing MPI_Abort() call in the catch block handles MPI cleanup correctly.

Impact

Context	Before	After
Production	`MPI_Finalize` + `std::exit(1)`	Exception caught by `Main::initialize_()`, `MPI_Abort`, return false
Unit tests	Silent exit code 1	Boost.Test catches `std::runtime_error`, reports failure with message
Normal runs	No change	No change (error path only)

Closes #6933

hakonhagland · 2026-03-13T11:21:27Z

jenkins build this please

bska · 2026-03-13T11:33:14Z

I still maintain that the unit test justification is just wrong. You should never add a unit test that has parse errors.

hakonhagland · 2026-03-13T11:39:35Z

You should never add a unit test that has parse errors.

Yes, but in my opinion during development it will be helpful to have this to catch deck errors quickly.

bska · 2026-03-13T11:41:15Z

You should never add a unit test that has parse errors.

Yes, but in my opinion during development it will be helpful to have this to catch deck errors quickly.

And you're making everything more difficult for the slight convenience of something that happens very infrequently. You should use other tools when developing.

atgeirr · 2026-03-16T12:45:46Z

The implementation here is better than in #6933 as it is less intrusive, but the MPI cleanup is worse: We do want cooperative cleanup and MPI_Finalize() when we can, instead of MPI_Abort(). So the code inside readDeck.cpp doing this is exactly as is should be, unlike the code in Main.hpp that just aborts!

I agree that we SHOULD move the finalize and exit out of readDeck.cpp, that function may instead return a status flag or similar. But in the end all processes must communicate to cooperatively finalize and end, as would happen in the current master.

The implementation in this PR would cooperate to establish that there is an error, all would throw, then whichever process reaches MPI_Abort() first gets everyone killed.

atgeirr · 2026-03-16T13:06:44Z

One more on this: The destructor of Main performs MPI_Finalize(), and lots of similar cleanup. I believe we should not call MPI_Finalize() or MPI_Abort() anywhere else if we can help it. I think the proper way to change this would be to just return EXIT_FAILURE from Main::initialize_().

If you look at the no-MPI version, that is exactly what happens! I believe that is actually correct and that we should remove the MPI_Abort() call, letting control propagate all the way out in the regular way. We will not simulate since initialize_() returns EXIT_FAILURE and then we go out of scope, Main::~Main() will finalize, and every rank end nicely with return EXIT_FAILURE.

Replace MPI_Finalize + std::exit(EXIT_FAILURE) with throw std::runtime_error in readDeck() on parse failure. This lets callers handle the error: Main::initialize_() catches it and calls MPI_Abort(), while Boost.Test catches it and reports a proper test failure instead of a silent exit. Widen the catch clause in Main::initialize_() from std::invalid_argument to std::exception to cover both existing and new exception types.

Remove MPI_Abort() from Main::initialize_() catch block. All exceptions from readDeck() are synchronized across ranks via comm.min(), so returning false is safe and lets Main::~Main() call MPI_Finalize() for cooperative shutdown.

hakonhagland · 2026-03-16T15:08:46Z

We do want cooperative cleanup and MPI_Finalize() when we can

@atgeirr Good suggestion. I've added a commit that removes MPI_Abort() from the catch block and just returns false, letting Main::~Main() handle MPI_Finalize() for cooperative shutdown. This is safe since all exceptions from readDeck() are synchronized across ranks via comm.min().

Let me know if I should squash the two commits into one before merging.

hakonhagland · 2026-03-16T15:09:02Z

jenkins build this please

atgeirr · 2026-03-16T22:19:59Z

This is better, although I think we should try one more improvement: the catch in Main.hpp does not have an associated communication to ensure all ranks stop if only one throws. The mechanism in place in readDeck.cpp ensure that for those failures, all ranks throw and will enter the catch block. However, other exceptions may still only be thrown on only one or some ranks (I do not have a particular candidate, but it would include anything the catch on (old) line 322 was designed to handle). Therefore that code must also have a comm.min() etc. to ensure cooperative shutdown, just as the code in readDeck.cpp has.

Maybe that also allows us to remove that handling from readDeck.cpp, in total essentially moving it to Main.hpp? I have not thought it through completely, but it would be nice to handle the communication of failures as far out as possible.

hakonhagland · 2026-03-17T07:11:37Z

the catch in Main.hpp does not have an associated communication to ensure all ranks stop if only one throws

Note that the catch block is skipped by ranks that don't throw, so we can't synchronize inside it. Non-throwing ranks would never reach the comm.min() call.

The natural synchronization point is inside readDeck(), right after the rank-0-only parsing code. That's where the rank asymmetry is introduced (rank 0 parses, others wait), so that's where it needs to be resolved. The existing comm.min(parseSuccess) at line 783 does exactly this, it ensures all ranks agree on the outcome before any rank throws.

The rank-0-only exceptions during parsing (e.g., schedule consistency checks, satellite group validation) are caught by an inner try-catch inside readDeck() (line 745), which converts them to parseSuccess = 0. So by the time we reach comm.min(), all failure modes have been funneled into the same synchronized path.

blattms · 2026-03-17T14:32:26Z

+        // exceptions are caught inside readDeck() and converted to a
+        // synchronized throw via comm.min()). Returning false lets
+        // Main::~Main() call MPI_Finalize() for cooperative shutdown.
+        catch (const std::exception& e)


I would prefer two catch clauses with different specific error messages. std::exception can really be anything and does not have to be related to EclipseState.

blattms · 2026-03-17T14:57:12Z

-#if HAVE_MPI
-            MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
-#endif


I think this is problematic (in combination with std::exception) as long as we do not know what state the system is in. With our current exception handling we do not know that as we only catch vanilla exceptions from the std library. E.g. memory might be exhausted, a fatal unrecoverable mpi problem has happened, a network problem etc.

If you only catch std::exception then actually MPI_Abort makes sense. If you you have no information about what happened, then you need to take the last available weapon that might somehow kill the application. If you want to get partly rid of this then you need to carefully refactor to use custom exceptions and act upon them accordingly. This would be a rather huge task and you would still use MPI_Abort on occasions.

Even if we argue that on an ideal system nothing will happen and the current code does makes sure exceptions are propagated to all ranks, the latter might easily change and boom we have a deadlock.

I have read all the arguments of the two different PRs. To me the other one seems less risky as it just adds some special casing. I have spent quite some time fixing deadlocks and problems in this area and would rather avoid us from spending more time.

I would prefer two catch clauses with different specific error messages. std::exception can really be anything and does not have to be related to EclipseState.

@blattms Good point. I've added a commit that splits the catch into two clauses:

catch (const std::runtime_error& e) : for synchronized readDeck() parse failures. These are safe for cooperative shutdown via return false since all ranks throw together (synchronized via comm.min() inside readDeck()).

catch (const std::exception& e) : safety net for unexpected exceptions (e.g., std::bad_alloc, network errors) that may not be synchronized across ranks. This retains MPI_Abort() to prevent deadlocks.

Each clause also has its own error message: "Failed to create valid EclipseState object" for parse failures vs. "Unexpected error during initialization" for other exceptions.

Use std::runtime_error catch for synchronized readDeck() parse failures with cooperative shutdown via return false. Keep std::exception catch with MPI_Abort() as safety net for unexpected unsynchronized exceptions (e.g. bad_alloc) that could cause deadlocks if not forcefully terminated.

hakonhagland · 2026-03-17T15:26:27Z

jenkins build this please

blattms · 2026-03-24T15:11:22Z

Seems like everything has been addressed here. Still some reservations but no real objections.
Will do another jenkins run and merge.

blattms · 2026-03-24T15:11:34Z

jenkins build this serial please

hakonhagland added the manual:irrelevant This PR is a minor fix and should not appear in the manual label Mar 13, 2026

hakonhagland mentioned this pull request Mar 13, 2026

Improve unit test parser error handling #6933

Closed

hakonhagland added 2 commits March 16, 2026 14:36

Use cooperative MPI shutdown on parse failure

98dafe2

Remove MPI_Abort() from Main::initialize_() catch block. All exceptions from readDeck() are synchronized across ranks via comm.min(), so returning false is safe and lets Main::~Main() call MPI_Finalize() for cooperative shutdown.

hakonhagland force-pushed the fix_readdeck_exit branch from 9a8e11b to 98dafe2 Compare March 16, 2026 15:08

blattms reviewed Mar 17, 2026

View reviewed changes

hakonhagland mentioned this pull request Mar 23, 2026

Reservoir coupling: Fix master hang when slave fails to parse deck #6959

Draft

blattms merged commit 1bb13fa into OPM:master Mar 24, 2026
2 checks passed

Conversation

hakonhagland commented Mar 13, 2026 • edited by blattms Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Impact

Uh oh!

hakonhagland commented Mar 13, 2026

Uh oh!

bska commented Mar 13, 2026

Uh oh!

hakonhagland commented Mar 13, 2026

Uh oh!

bska commented Mar 13, 2026

Uh oh!

atgeirr commented Mar 16, 2026

Uh oh!

atgeirr commented Mar 16, 2026 • edited by bska Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hakonhagland commented Mar 16, 2026

Uh oh!

hakonhagland commented Mar 16, 2026

Uh oh!

atgeirr commented Mar 16, 2026

Uh oh!

hakonhagland commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

blattms Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

blattms Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

hakonhagland Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

hakonhagland commented Mar 17, 2026

Uh oh!

blattms commented Mar 24, 2026

Uh oh!

blattms commented Mar 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hakonhagland commented Mar 13, 2026 •

edited by blattms

Loading

atgeirr commented Mar 16, 2026 •

edited by bska

Loading

hakonhagland commented Mar 17, 2026 •

edited

Loading