[FLINK-39401][formats] Port raw line-delimiter option to release-1.15#28048
[FLINK-39401][formats] Port raw line-delimiter option to release-1.15#28048featzhang wants to merge 3 commits intoapache:release-1.15from
Conversation
Port PR apache#27897 from master to release-1.15. Extends the raw format to support an optional 'raw.line-delimiter' configuration option: - Deserialization: splits each incoming message by the delimiter using a pre-compiled Pattern and emits one RowData per segment. Null messages with delimiter produce zero rows. Trailing delimiter is stripped to ensure round-trip compatibility. - Serialization: appends delimiter bytes (pre-computed) after each serialized value. - Backward compatible: all existing behavior preserved when raw.line-delimiter is not set. Changes: - RawFormatOptions: add LINE_DELIMITER ConfigOption (no default value) - RawFormatFactory: read option, pass to schema builders, register in optionalOptions() - RawFormatDeserializationSchema: add lineDelimiter + lineDelimiterPattern fields, new 5-arg constructor, override deserialize(byte[], Collector) - RawFormatSerializationSchema: add lineDelimiter + delimiterBytes fields, new 4-arg constructor, append delimiter in serialize() - RawFormatFactoryTest: add testLineDelimiterOption() - RawFormatLineDelimiterTest: new test class with 9 tests (JUnit 4)
|
@featzhang flinkbot is acting weirdly. Try making a empty comment. git commit -m "trigger" --allow-empty. |
CI failures on previous build are unrelated to this PR: - Azure agent pool image label missing for release-1.15 pipeline - Pre-existing flaky WikipediaEditsSourceTest (external IRC dependency) Empty commit to re-trigger Azure CI.
CI Failure Root-Cause Analysis (Build #74590)Quick summary: all 11 failing tasks trace back to Azure infrastructure issues on the 1. Primary cause — Azure agent pool image label no longer exists for
|
| Job | Error message |
|---|---|
e2e_1_ci |
No image label found to route agent pool Azure Pipelines. |
e2e_2_ci |
No image label found to route agent pool Azure Pipelines. |
docs_404_check |
No image label found to route agent pool Azure Pipelines. |
These jobs never start executing — they die before git clone. The Azure Pipelines agent pool used by the release-1.15 pipeline definition no longer advertises the image label those jobs request. This is a pipeline-configuration / agent-pool issue on the infra side, not something a PR change can fix.
2. Secondary cause — Test - connectors flaky + cascade
Test - connectors→Bash exited with code '1'(no Java stack trace surfaced in the task log — the failure is in the outer bash wrapper, consistent with the pre-existing flakiness seen on other recentrelease-1.15PRs).PublishTestResults→No test result files matching '**/TEST-*.xml' were found.— this is a direct cascade of the above: the test step exited before Surefire produced the XML reports.test_ci connectors(Job) /test_ci(Phase) /CI build (custom builders)(Stage) — all have empty messages, they are simply the Stage/Phase wrappers propagating the child task's failure.
3. Relationship to this PR's changes
This PR only touches 6 files under flink-table/flink-table-runtime/src/{main,test}/java/org/apache/flink/{formats,table/formats}/raw/:
RawFormatDeserializationSchema.java,RawFormatSerializationSchema.java,RawFormatFactory.java,RawFormatOptions.javaRawFormatFactoryTest.java,RawFormatLineDelimiterTest.java(new)
Zero overlap with:
- the e2e test suites (
e2e_1_ci,e2e_2_ci), - the connectors module (
Test - connectors), - the docs pipeline (
docs_404_check).
The only module that actually exercises this change is flink-table-runtime, which was not among the reported failures. I also re-ran the targeted unit tests locally:
mvn test -pl flink-table/flink-table-runtime -Dtest='RawFormat*'
→ Tests run: 51, Failures: 0, Errors: 0, Skipped: 0
(11 new RawFormatLineDelimiterTest + 7 RawFormatFactoryTest + 33 pre-existing SerDe tests.)
4. Context: release-1.15 is EOL
As I noted in the PR description, release-1.15 is past its community maintenance window. The Azure agent-pool image label removal and the connectors-test flakiness are both manifestations of the EOL infrastructure no longer being actively maintained, and re-running CI (I already pushed an empty commit d5efa5f to retrigger — same failure pattern) will not produce a green build on this branch under the current configuration.
Request
Given (a) the change is identical to the already-merged master PR #27897, (b) the failures are infrastructure-only and unrelated to the diff, and (c) local tests pass cleanly — would a committer with release-1.15 merge access be willing to merge this via a no-CI-required path? Happy to add the two docs/**/raw.md doc files from #27897 as well if that helps.
cc @spuru9 (already approved), and anyone still active on release-1.15 maintenance.
What is the purpose of the change
Port PR #27897 (FLINK-39401) from
mastertorelease-1.15. Extends therawformat with an optionalraw.line-delimiterconfiguration that lets each Kafka/file message encode multiple records separated by a delimiter.Brief change log
RawFormatOptions: addLINE_DELIMITERConfigOption(no default, supports Java escape sequences like\n,\r\n).RawFormatFactory: read the option, register it inoptionalOptions(), and pass it to the (de)serialization schemas.RawFormatDeserializationSchema:@Nullable String lineDelimiter; the previous 4-arg constructor delegates withnullfor backward compatibility.Patternfor splitting; newdeserialize(byte[], Collector<RowData>)override emits oneRowDataper segment.RawFormatSerializationSchema:@Nullable String lineDelimiter; old 3-arg constructor delegates withnull.delimiterBytes;serialize()appends them after the value bytes.nullrow still returnsnull.Verifying this change
Added tests:
RawFormatFactoryTest.testLineDelimiterOption— verifies the factory wires the option through correctly.RawFormatLineDelimiterTest(new, 11 tests) — covers:\n/ with multi-char / with GBK charset delimiters\n/ with custom delimiter, null rowRun:
Result:
Tests run: 51, Failures: 0, Errors: 0, Skipped: 0(11 new + 7 factory + 33 existing SerDe).Does this pull request potentially affect one of the following parts:
@Public(Evolving): no (new option is additive, behavior unchanged when unset)deserialize/serialize. When the option is unset, the behavior and allocations are unchanged; when set, a pre-compiledPatternand pre-computedbyte[]avoid per-record allocation.Documentation
LINE_DELIMITERoption; not porting the website docs update from [FLINK-39401] Extend raw format to support line-delimiter option #27897 sincerelease-1.15docs are frozen. Happy to add if desired.