GH-33823: [C++] Improve err msgs when opening files that are the wrong format#49771
GH-33823: [C++] Improve err msgs when opening files that are the wrong format#49771RobertLD wants to merge 2 commits intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
d7f5423 to
abd2b9c
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves user-facing error messages in the C++ IPC readers/decoders when callers attempt to open an IPC File as a Stream (or vice versa), providing more actionable guidance aligned with GH-33823.
Changes:
- Update the IPC file reader footer/magic check failure message to suggest the streaming reader when appropriate.
- Add a heuristic in the IPC stream message decoder to detect an IPC file magic prefix being misinterpreted as a stream message length, and emit a more instructive error.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
cpp/src/arrow/ipc/reader.cc |
Improves the “not an Arrow file” error when the footer magic doesn’t match, suggesting the streaming reader. |
cpp/src/arrow/ipc/message.cc |
Adds detection for IPC File magic bytes when decoding a stream message and returns a more targeted “wrong format” error. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "Not an Arrow file. If this is an Arrow IPC Streaming format file, try " | ||
| "open_stream() instead."); |
There was a problem hiding this comment.
The error message suggests calling open_stream(), but this is not a C++ API entry point (C++ uses RecordBatchStreamReader::Open / stream reader types). Since this Status message is emitted from the C++ library and may be surfaced to C++ callers as well as bindings, consider making the guidance API-neutral (e.g., refer to the IPC stream reader) or mention the C++ API name explicitly.
| "Not an Arrow file. If this is an Arrow IPC Streaming format file, try " | |
| "open_stream() instead."); | |
| "Not an Arrow file. If this is an Arrow IPC streaming format file, use " | |
| "the IPC stream reader instead."); |
| ". This appears to be an Arrow IPC File format file. " | ||
| "Try open_file() instead of open_stream()."); |
There was a problem hiding this comment.
This error text is emitted from the C++ IPC decoder but recommends open_file() / open_stream(), which are binding-specific names (they don't exist in the C++ API). Consider API-neutral guidance (IPC file reader vs IPC stream reader) or include the C++ entry points (e.g., RecordBatchFileReader::Open / RecordBatchStreamReader::Open). Also, "Arrow IPC File format file" repeats "file"; reword to avoid the redundancy.
| ". This appears to be an Arrow IPC File format file. " | |
| "Try open_file() instead of open_stream()."); | |
| ". This appears to be an Arrow IPC file. " | |
| "Try the IPC file reader instead of the IPC stream reader."); |
|
Decent feedback Mr. Bot, I'll make changes shortly. Honestly impressed the LLM review had such good context. Good bot |
Rationale for this change
The original error messages did not provide instruction to users on how to best correct their usage
What changes are included in this PR?
Improving the messages
Are these changes tested?
Current unit tests were executed, no new tests were introduced
Are there any user-facing changes?
Yes, error messages have been updated that are surfaced to users