Skip to content

Improve unit test parser error handling#6933

Closed
hakonhagland wants to merge 1 commit intoOPM:masterfrom
hakonhagland:improve_ut_parser_err
Closed

Improve unit test parser error handling#6933
hakonhagland wants to merge 1 commit intoOPM:masterfrom
hakonhagland:improve_ut_parser_err

Conversation

@hakonhagland
Copy link
Copy Markdown
Contributor

Problem

When readDeck() encounters parse errors in unit tests, it calls std::exit(1) which bypasses Boost.Test entirely. This produces a silent failure (exit code 1, no error message) because:

  1. OpmLog has no registered backend in test fixtures, so error messages are discarded
  2. std::exit() skips Boost.Test's exception handling, so no test failure is reported

Solution

  • Add throwOnError parameter to readDeck() and FlowGenericVanguard::readDeck() (defaults to false for backward compatibility)
  • When enabled, throws std::runtime_error with the failure message instead of calling std::exit(1)
  • Register an OpmLog stderr backend in test fixture constructors so parser diagnostics are printed
  • Enable both in SimulatorFixture, test_equil, and test_RestartSerialization

Before / after

Before After
Error message Silently discarded Printed to stderr
Test result Silent exit code 1 Boost.Test reports std::runtime_error with message
Production behavior Unchanged Unchanged (throwOnError defaults to false)

@hakonhagland hakonhagland added the manual:irrelevant This PR is a minor fix and should not appear in the manual label Mar 12, 2026
@hakonhagland
Copy link
Copy Markdown
Contributor Author

jenkins build this please

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves unit-test diagnostics when Eclipse deck parsing fails by avoiding std::exit() (which bypasses Boost.Test) and ensuring parser errors are actually emitted to stderr during tests.

Changes:

  • Add an optional throwOnError parameter to Opm::readDeck() and FlowGenericVanguard::readDeck() to throw std::runtime_error instead of calling std::exit().
  • Register an OpmLog stderr backend in relevant test fixtures so parser diagnostics are visible in unit test output.
  • Enable throwOnError in SimulatorFixture, test_equil, and test_RestartSerialization.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_equil.cpp Enables throwOnError and adds stderr logging backend in the test fixture.
tests/test_RestartSerialization.cpp Enables throwOnError across restart serialization tests and adds stderr logging backend.
tests/SimulatorFixture.hpp Adds stderr logging backend and enables throwOnError for simulator-based tests.
opm/simulators/utils/readDeck.hpp Extends readDeck() API with throwOnError (default false).
opm/simulators/utils/readDeck.cpp Implements throwOnError by throwing instead of finalizing + exiting.
opm/simulators/flow/FlowGenericVanguard.hpp Extends FlowGenericVanguard::readDeck() API with throwOnError (default false).
opm/simulators/flow/FlowGenericVanguard.cpp Plumbs throwOnError through to Opm::readDeck().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread tests/SimulatorFixture.hpp
Comment thread opm/simulators/utils/readDeck.cpp
@bska
Copy link
Copy Markdown
Member

bska commented Mar 12, 2026

I'll be honest, I think this is a lot of complexity for a very narrow use case. There aren't supposed to be parse errors in the unit tests.

@hakonhagland
Copy link
Copy Markdown
Contributor Author

There aren't supposed to be parse errors in the unit tests.

@bska It has happened to me several times when adding unit tests with new datafiles to PRs, and then it is very confusing to track down what is happening.

@bska
Copy link
Copy Markdown
Member

bska commented Mar 12, 2026

It has happened to me several times when adding unit tests with new datafiles to PR

Then don't add them as unit test until you've verified that there are no parse errors. You don't need MPI support for that.

When readDeck() encounters parse errors in unit tests, it calls
std::exit(1) which bypasses Boost.Test entirely, producing a silent
failure with no error message (OpmLog has no backend in tests).

Add a throwOnError parameter to readDeck() that throws
std::runtime_error instead of calling std::exit(1). Register an
OpmLog stderr backend in test fixtures so parser diagnostics are
visible. Enable both in SimulatorFixture, test_equil, and
test_RestartSerialization.
@hakonhagland hakonhagland force-pushed the improve_ut_parser_err branch from 3104e26 to a9d58b3 Compare March 12, 2026 17:07
@hakonhagland
Copy link
Copy Markdown
Contributor Author

jenkins build this please

@hakonhagland
Copy link
Copy Markdown
Contributor Author

@bska Thanks for the feedback. A few thoughts:

  1. The change is small (50 insertions) and fully backward-compatible, throwOnError defaults to false, so existing behavior is unchanged.

  2. To clarify: this is not related to MPI. The throwOnError parameter is only used in single-process unit tests. The issue is that readDeck() calls std::exit(1) on parse errors, which bypasses Boost.Test's error reporting. Without the OpmLog backend, the error message is also silently discarded, so you get exit code 1 with no indication of what went wrong.

  3. "Verify first" is good practice, but during development you iterate on both the test code and the data file. When a typo in e.g WELLDIMS causes a silent exit, you end up reaching for GDB with break exit to find the error message. That's the workflow this PR improves.

  4. Even without throwOnError, the OpmLog backend setup alone is valuable: it surfaces parser warnings during test runs that are currently invisible.

If you feel throwOnError is too much, I could reduce the PR to just the OpmLog backend registration in the test fixtures. That alone would make parse errors visible before std::exit(1), which is the most important improvement.

@bska
Copy link
Copy Markdown
Member

bska commented Mar 12, 2026

The change is small (50 insertions) and fully backward-compatible, throwOnError defaults to false, so existing behavior is unchanged.

The number of lines is the only salient point of this.

You're adding a conditional, to a function that's already heavily nested, that will never be needed in a real simulation run yet you're asking every simulation run to pay for that complexity.

"Verify first" is good practice, but during development you iterate on both the test code and the data file. When a typo in e.g WELLDIMS causes a silent exit, you end up reaching for GDB with break exit to find the error message. That's the workflow this PR improves.

Checking if the data file is okay amounts to having something like

BOOST_AUTO_TEST_CASE(Check_Data_File)
{
    const auto deck = Parser{}.parseFile(your_filename_here);
}

and if you're developing the test you're already in the context of a set of unit tests it doesn't cost anything to have that. You can even put logging there if needed.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@atgeirr
Copy link
Copy Markdown
Member

atgeirr commented Mar 13, 2026

Not judging the merits of this PR here, but it made me wonder: why do we exit() instead of throw from readDeck() in the first place? Normally I would assume that throwing is what we want to do.

Edit: ... and would this not be dealt with by the ParseContext and ErrorGuard?

@hakonhagland
Copy link
Copy Markdown
Contributor Author

Normally I would assume that throwing is what we want to do.

@atgeirr Good question. I looked into this, ParseContext and ErrorGuard handle errors during parsing, but they don't cover the final termination step. Here's the flow:

  1. ParseContext::handleError() dispatches per error category: THROW_EXCEPTION throws immediately, DELAYED_EXIT1 accumulates in ErrorGuard, WARN/IGNORE just log.
  2. After parsing completes, readDeck() checks if (*errorGuard) for accumulated errors, extracts the messages, clears the guard (to prevent its destructor from also exiting), and then calls MPI_Finalize() + std::exit(EXIT_FAILURE).

So ErrorGuard collects the errors but readDeck() handles the actual termination. And it chose std::exit() rather than throwing, presumably because of the MPI cleanup requirement. But MPI_Finalize() is actually wrong on an error path (it requires cooperative shutdown from all ranks); MPI_Abort() would be more appropriate.

I agree that throwing is the right thing to do here. I've opened #6935 as a simpler alternative to this PR. It just replaces MPI_Finalize() + std::exit() with throw std::runtime_error() in readDeck(), and widens the catch in Main::initialize_() from std::invalid_argument to std::exception (which already calls MPI_Abort()).

@bska
Copy link
Copy Markdown
Member

bska commented Mar 13, 2026

why do we exit() instead of throw from readDeck() in the first place? Normally I would assume that throwing is what we want to do.

All of this was introduced in #2785 and whatever we do we need to be sure that we don't reintroduce the problems that prompted that change.

@hakonhagland
Copy link
Copy Markdown
Contributor Author

we need to be sure that we don't reintroduce the problems that prompted that change

@bska Thanks for the pointer to #2785. I've reviewed it and the underlying issue #2764.

The problem #2785 solved was that failureMessage was never printed before MPI_Abort(). The error ended up in failureMessage (via the catch block), not in ErrorGuard, so errorGuard->dump() printed nothing. #2785 fixed this by adding OpmLog::error(failureMessage) before termination and moving to MPI_Finalize() + std::exit().

Both #6933 and the simpler alternative #6935 preserve all of #2785's improvements. #6933 adds an optional throwOnError flag before the existing exit path (default false, so production is unchanged). #6935 takes the cleaner approach of replacing std::exit() with throw unconditionally.

@blattms
Copy link
Copy Markdown
Member

blattms commented Mar 17, 2026

why do we exit() instead of throw from readDeck() in the first place? Normally I would assume that throwing is what we want to do.

All of this was introduced in #2785

No, it was not. there was a hard MPI_Abort there before, which caused logging multiple times and with that PR this is done only once via MPI_Finalize and exit. The exit s there to have a proper return code from the program for calling scripts.

@blattms
Copy link
Copy Markdown
Member

blattms commented Mar 17, 2026

I do think that the subject of this PR does not reflect the real problem that this PR reveals:

MPI_Finalize needs to be called on all processes and this is not the case anymore for reservoir coupling which parses different decks where only some of them might have errors. Maybe it is time to test coupling with some decks being erroneous now before really fixing and merging this.

Hence I think we need to resolve this and this PR might be a good first step.

@bska
Copy link
Copy Markdown
Member

bska commented Mar 17, 2026

All of this was introduced in #2785

No, it was not. there was a hard MPI_Abort there before, which caused logging multiple times and with that PR this is done only once via MPI_Finalize and exit. The exit s there to have a proper return code from the program for calling scripts.

As far as this PR is concerned which talks about removing exit, "this" (cooperative finalisation and shutdown) was indeed all introduced in #2785.

@bska
Copy link
Copy Markdown
Member

bska commented Mar 17, 2026

Maybe it is time to test coupling with some decks being erroneous now before really fixing and merging this.

We're not merging this. The follow-up alternative in #6935 is a much better approach.

@blattms
Copy link
Copy Markdown
Member

blattms commented Mar 17, 2026

But MPI_Finalize() is actually wrong on an error path (it requires cooperative shutdown from all ranks); MPI_Abort() would be more appropriate.

I disagree. If you know that all the processes get there MPI_Finalize is the correct and clean way. This was the case here.
MPI_Abort should only be used as a last resort if something really goes wrong. The problem is the amount of MPI error messages that you get with MPI_Abort (one for each process). This has annoyed users quite a bit. You will hardly catch the real problem because of the all the noise.

@hakonhagland
Copy link
Copy Markdown
Contributor Author

MPI_Finalize needs to be called on all processes and this is not the case anymore for reservoir coupling which parses different decks where only some of them might have errors. Maybe it is time to test coupling with some decks being erroneous now before really fixing and merging this.

@blattms I've created a test case for this scenario in #6951. It runs a reservoir coupling master that spawns a slave whose DATA file has a deliberate parse error (missing INCLUDE file). The slave exits immediately after the parse failure, but the master hangs on MPI_Recv waiting for the slave's initial data exchange.

The test currently passes by detecting the expected (broken) behavior:

  • MPICH (v4.3.2): hydra kills the master immediately when the slave exits (exit code 9)
  • OpenMPI (v5.0.8) / system MPICH (v4.2.1): master hangs until a 10s timeout (exit code 124)

The plan for a follow-up PR is to have the slave send a "parse failed" signal as its first MPI message after spawn (before sendAndReceiveInitialData()), so the master can detect the failure and shut down cleanly. Once that fix lands, the test will be updated to expect a clean exit with code 1 on all MPI implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

manual:irrelevant This PR is a minor fix and should not appear in the manual

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants