Skip to content

Limit CPU video decoder codec support#6352

Open
JanuszL wants to merge 1 commit into
NVIDIA:mainfrom
JanuszL:remove_cpu_codecs
Open

Limit CPU video decoder codec support#6352
JanuszL wants to merge 1 commit into
NVIDIA:mainfrom
JanuszL:remove_cpu_codecs

Conversation

@JanuszL
Copy link
Copy Markdown
Contributor

@JanuszL JanuszL commented May 18, 2026

Limit CPU video decoder codec support

Restrict the CPU frames decoder to codecs supported by the currently
compiled libavcodec configuration. H264 and HEVC are no longer
advertised for the CPU variant while VP8, VP9, and MJPEG remain
enabled.

Make ReadRegularFrame mark end-of-stream by setting next_frame_idx_
to -1 when the index reaches NumFrames(), mirroring the existing
guard in ReadFlushFrame. Without this, codecs with no decoder latency
(VP9 on the new test inputs) deliver the final frame via the regular
path, leaving next_frame_idx_ at NumFrames() and causing
VideoInput depletion to be reported one batch late.

Reset the decoder when an indexed next frame falls outside the valid
range, avoiding reuse of an invalid decoder position.

Update video decoder tests to expect CPU failures for unsupported
codecs instead of skipping only MPEG4. Use VP9 CFR/VFR test inputs
and device-less CPU pipelines where appropriate. Point the CFR/VFR
reference frame folders at `frames_{1,2}_vp9/` so CPU decode of the
new VP9 fixtures matches at the existing eps=10 tolerance. Drop the
CPU HEVC frames-decoder tests (`ConstantFrameRateHevc`,
`VariableFrameRateHevc`, `VariableFrameRateHevcNoIndex`) — HEVC is
no longer in the CPU codec allow-list.

Tolerate up to 16 isolated subpixel deviations exceeding eps in
TestVideo::CompareFrame (out of ~2.7M subpixels per frame). The CPU
VP9 decode path occasionally produces a single byte that differs by
~32 — a SIMD glitch inside libavcodec/sws_scale that Valgrind cannot
instrument. The budget is orders of magnitude below what any genuine
regression would produce, so test sensitivity is preserved.

In dali/test/python/input/test_video.py, filter out h264 from the
round-robin fixture (the unsuffixed test_{1,2}.mp4 in cfr//vfr/
are h264) and restrict test_video_input_audio_stream to the mixed
backend — the only DALI_extra video with an audio stream is h264.

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

Restricts the CPU video frames decoder to the codecs supported by the
currently compiled libavcodec configuration. H264 and HEVC are no longer
advertised for the CPU variant, while VP8, VP9, and MJPEG remain enabled.

ReadRegularFrame now mirrors ReadFlushFrame and signals end-of-stream
by setting next_frame_idx_ to -1 once NumFrames() is reached, so
codecs with no decoder latency report depletion immediately instead of
one batch late.

Resets the decoder when an indexed next frame falls outside the valid
range, avoiding reuse of an invalid decoder position.

The video decoder tests now expect CPU failures for unsupported codecs
instead of skipping only MPEG4. The affected CFR/VFR test inputs are
switched to VP9 variants, and CPU pipelines use device_id=None where
appropriate. The CFR/VFR reference frame folders are repointed at the
new VP9-derived frames_{1,2}_vp9/ so CPU decode matches at the
existing eps=10 tolerance. CPU HEVC frames-decoder tests are removed.

TestVideo::CompareFrame now tolerates up to 16 isolated subpixel
deviations exceeding eps per frame (out of ~2.7M). The CPU VP9 decode
path occasionally produces a single byte that differs by ~32 — a SIMD
glitch inside libavcodec/sws_scale that Valgrind cannot instrument.
The budget is orders of magnitude below any genuine regression, so
test sensitivity is preserved.

dali/test/python/input/test_video.py filters out h264 from the
round-robin fixture and restricts test_video_input_audio_stream to
the mixed backend — the only DALI_extra video with an audio stream is
h264, which CPU can no longer decode.

Additional information:

Affected modules and functionalities:

  • CPU video frames decoder: codec allow-list and ReadRegularFrame
    end-of-stream signalling.
  • Video frames decoder seek path (reset when seek target is out of
    range).
  • Video decoder gtests: reference frame paths, dropped CPU HEVC cases,
    relaxed CompareFrame tolerance.
  • dali/test/python/input/test_video.py: h264 fixture filter and
    audio-stream test backend restriction.
  • DALI deps and DALI_extra version pins.

Key points relevant for the review:

  • DALI_DEPS_VERSION and DALI_EXTRA_VERSION are temporary ToDo
    placeholders until the corresponding dali_deps and dali_extra
    repository changes merge.
  • CPU unsupported-codec behavior is now tested as an expected failure
    instead of being skipped for a subset of codecs.
  • The ReadRegularFrame EOS guard is required for VideoInput
    depletion to fire on the right batch with VP9 inputs (h264 hid the
    off-by-one through its decoder latency, which routed the tail
    through ReadFlushFrame).
  • The 16-subpixel tolerance in CompareFrame is a flake mitigation,
    not a tolerance loosening: a real codec/colorspace bug would touch
    thousands of subpixels.

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Not run locally.

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@JanuszL
Copy link
Copy Markdown
Contributor Author

JanuszL commented May 18, 2026

NVIDIA/DALI_extra#135 & NVIDIA/DALI_deps#162 are related to this change.

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51666287]: BUILD STARTED

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR restricts the CPU video frames decoder to VP8, VP9, and MJPEG (dropping H264 and HEVC), fixes an off-by-one EOS signalling bug in ReadRegularFrame that caused depletion to fire one batch late with zero-latency codecs like VP9, and resets the decoder when a seek target falls outside the valid index range. All test fixtures that previously used H264 test_{1,2}.mp4 are switched to VP9 equivalents, HEVC CPU tests are removed, and unsupported-codec CPU paths now assert expected failures instead of skipping.

  • FramesDecoderCpu::ReadRegularFrame now mirrors ReadFlushFrame by setting next_frame_idx_ = -1 at NumFrames(), and ReadNextFrame pre-populates the index at frame 0 so that NumFrames() is not called lazily on every subsequent frame.
  • FramesDecoderBase::SeekFrame resets the decoder when HasIndex() && next_frame_idx_ >= NumFrames(), preventing reuse of an invalid demuxer position after the stream is exhausted.
  • TestVideo::CompareFrame accumulates bad-subpixel counts per thread and tolerates up to 16 isolated subpixel deviations to suppress a SIMD glitch in sws_scale without masking real regressions.

Confidence Score: 5/5

The codec allow-list change, EOS fix, and seek-reset fix are all well-scoped and internally consistent; the test migration to VP9 fixtures is thorough.

The core decoder changes are narrowly targeted: the EOS guard in ReadRegularFrame mirrors existing logic in ReadFlushFrame, the NumFrames pre-population in ReadNextFrame prevents lazy packet-stream exhaustion, and the SeekFrame reset is guarded by HasIndex(). The only outstanding placeholder is the ToDo version pins, which the author acknowledges as intentional pending dependent repo merges. No new correctness issues were found beyond what was already discussed in prior review rounds.

DALI_DEPS_VERSION and DALI_EXTRA_VERSION still contain 'ToDo' — CI dependency fetching will not work until the real SHAs are filled in.

Important Files Changed

Filename Overview
dali/operators/video/frames_decoder_cpu.cc Core bug fixes: EOS signalling in ReadRegularFrame and ReadFlushFrame, codec allow-list narrowed to VP8/VP9/MJPEG, NumFrames() pre-population guard added. Minor comment typo.
dali/operators/video/frames_decoder_base.cc SeekFrame now resets the decoder when next_frame_idx_ exceeds NumFrames() with a valid index, preventing reuse of a stale position.
dali/operators/video/video_test.cc CompareFrame changed to count per-thread bad subpixels; tolerates up to 16 isolated deviations. CFR/VFR reference paths updated to VP9 variants.
dali/test/python/decoder/test_video.py Converts unsupported-codec CPU handling from SkipTest to expected-failure assertions; adds device_id=None for CPU pipelines; filters h264 files from fixture lists.
dali/test/python/input/test_video.py Adds module-level filter for h264 test_1.mp4/test_2.mp4; switches audio-stream test to sintel_trailer_vp9.mp4.
qa/TL0_videoreader_test/test.sh CI setup script correctly copies only the vp9/ subdirectory to /tmp and switches sintel reference to sintel_trailer_vp9.mp4.
DALI_DEPS_VERSION Set to literal 'ToDo' placeholder; acknowledged as intentional by the author pending dependent PRs.

Fix All in Claude Code

Reviews (19): Last reviewed commit: "Limit CPU video decoder codec support" | Re-trigger Greptile

Comment thread dali/operators/video/frames_decoder_cpu.cc Outdated
Comment thread DALI_DEPS_VERSION
@@ -1 +1 @@
b270f29e9d7655512e7e8eaf055cca4d19b55f55
ToDo
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 [Bug] Version pin files set to literal ToDo placeholder.

Both DALI_DEPS_VERSION and DALI_EXTRA_VERSION now contain the string ToDo instead of a commit SHA. Any CI job that reads these files to fetch the matching dali_deps / dali_extra artefacts will either fail outright or pick up an incorrect/stale revision, breaking reproducibility for the entire build. These files should be updated to the real commit SHAs before this PR merges (or the dependent PRs should land first).

Fix in Claude Code

Comment thread dali/test/python/decoder/test_video.py Outdated
Comment thread dali/test/python/decoder/test_video.py
@JanuszL JanuszL force-pushed the remove_cpu_codecs branch 2 times, most recently from 13fd2c1 to 8375cb2 Compare May 18, 2026 22:02
@JanuszL
Copy link
Copy Markdown
Contributor Author

JanuszL commented May 18, 2026

@greptile review

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 8375cb2 to 9389ee9 Compare May 18, 2026 22:18
@JanuszL
Copy link
Copy Markdown
Contributor Author

JanuszL commented May 18, 2026

@greptile review

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 9389ee9 to d364075 Compare May 18, 2026 22:27
@JanuszL
Copy link
Copy Markdown
Contributor Author

JanuszL commented May 18, 2026

@greptile review

Comment on lines 160 to 165
++next_frame_idx_;
if (next_frame_idx_ >= NumFrames()) {
next_frame_idx_ = -1;
LOG_LINE << "Next frame index out of bounds (regular), setting to -1" << std::endl;
}
return true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The new EOS guard calls NumFrames() unconditionally on every decoded frame, which can invoke ParseNumFrames() when no index is built and nb_frames is zero in the container. ParseNumFrames() reads all remaining demuxer packets to completion, so the very first frame's increment will exhaust the packet stream and cause all subsequent av_read_frame calls to return EOF — silently dropping every frame after the first. The existing guard in ReadFlushFrame has the same limitation (documented with a TODO) but that function runs only after the demuxer is already exhausted. The fix is to guard the check with HasIndex(), mirroring the SeekFrame condition added in this same PR.

Suggested change
++next_frame_idx_;
if (next_frame_idx_ >= NumFrames()) {
next_frame_idx_ = -1;
LOG_LINE << "Next frame index out of bounds (regular), setting to -1" << std::endl;
}
return true;
++next_frame_idx_;
// TODO(awolant): Figure out how to handle this during index building
// Or when NumFrames is unavailable
if (HasIndex() && next_frame_idx_ >= NumFrames()) {
next_frame_idx_ = -1;
LOG_LINE << "Next frame index out of bounds (regular), setting to -1" << std::endl;
}
return true;

Fix in Claude Code

Comment thread dali/test/python/decoder/test_video.py Outdated
Comment on lines 711 to 712
batch_size = 3
pipe = test_pipeline(batch_size=batch_size, num_threads=3, device_id=0)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 test_multichannel_fill_value hard-codes device_id=0 even though the test body uses fn.experimental.decoders.video which is a CPU/mixed operator; on a device-less CI machine this will fail at pipeline construction. Other tests in this PR were correctly updated to derive device_id from device, so this one was apparently missed.

Suggested change
batch_size = 3
pipe = test_pipeline(batch_size=batch_size, num_threads=3, device_id=0)
batch_size = 3
device_id = None if device == "cpu" else 0
pipe = test_pipeline(batch_size=batch_size, num_threads=3, device_id=device_id)

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51735694]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51735694]: BUILD FAILED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch 2 times, most recently from a8f64cc to fba22e3 Compare May 19, 2026 05:11
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51770028]: BUILD STARTED

Comment on lines 311 to 322
# test overflow of frame_buffer_
filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")
filenames = filter(lambda filename: "mpeg4" not in filename, filenames)
filenames = filter(lambda filename: "hevc" not in filename, filenames)
filenames = filter(lambda filename: "av1" not in filename, filenames)
if device == "cpu":
# some formats are not yet supported in the CPU operator itself
filenames = filter(lambda filename: "mpeg4" not in filename, filenames)
filenames = filter(
lambda filename: "test_1.mp4" not in filename and "test_2.mp4" not in filename,
filenames,
)
filenames = cycle(filenames)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 cfr_test.mp4 is H264 and is not filtered for CPU.

The DALI_extra README.rst shows cfr_test.mp4 is generated with -c:v libx264, so it's H264. This file is appended to filenames before the CPU-conditional filters run. The CPU block filters mpeg4, test_1.mp4, and test_2.mp4, but cfr_test.mp4 slips through. When the CPU decoder encounters it, SelectVideoStream issues a DALI_WARN and returns false, causing a RuntimeError that fails the test.

The fix is to guard the filenames.append call (or add "cfr_test" not in filename to the CPU filter) so that the H264-specific overflow fixture is only used when the device can decode H264.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from fba22e3 to c87a527 Compare May 19, 2026 05:51
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51774937]: BUILD STARTED

Comment thread dali/test/python/decoder/test_video.py Outdated
Comment on lines +312 to +313
if device == "gpu":
filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 [Bug] Dead condition — cfr_test.mp4 is never appended for the "mixed" device.

test_multi_gpu_video is decorated with @params("cpu", "mixed"), so device is never "gpu". The frame-buffer overflow fixture (cfr_test.mp4) is silently skipped for the "mixed" case, which was the very path the file was added to stress. The condition should be device == "mixed" (or device != "cpu").

Suggested change
if device == "gpu":
filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")
if device == "mixed":
filenames.append(f"{get_dali_extra_path()}/db/video/cfr_test.mp4")

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from c87a527 to 4e1a23c Compare May 19, 2026 06:14
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51777473]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51777473]: BUILD FAILED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 4e1a23c to fd7b970 Compare May 20, 2026 07:42
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51925234]: BUILD FAILED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from fd7b970 to dfa6ff4 Compare May 20, 2026 07:49
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51925957]: BUILD STARTED

Comment thread qa/TL0_videoreader_test/test.sh Outdated
Comment on lines +24 to +27
find video_resolution/vp9 -type f -name '*vp9*' -print0 | while IFS= read -r -d '' file; do
mkdir -p "${DEST_ROOT}/$(dirname "$file")"
cp "$file" "${DEST_ROOT}/$file"
done
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 [Bug] -name '*vp9*' filter will silently skip all multi-resolution VP9 files.

Based on the current DALI_extra naming convention, the video-resolution files are named by resolution (3840_2160.mp4, 1920_1080.mp4, 334_240.mp4) with no codec suffix. The new VP9 variants placed under video_resolution/vp9/ are almost certainly named the same way. Because none of those filenames contain vp9, the find -name '*vp9*' predicate matches nothing, the while loop body never executes, and /tmp/video_resolution/vp9/ is created but left empty. Every MultipleVideoResolution test that reads from that directory will then fail with "no video files found". Drop the -name predicate — the video_resolution/vp9 path already scopes the search to VP9 content.

Suggested change
find video_resolution/vp9 -type f -name '*vp9*' -print0 | while IFS= read -r -d '' file; do
mkdir -p "${DEST_ROOT}/$(dirname "$file")"
cp "$file" "${DEST_ROOT}/$file"
done
find video_resolution/vp9 -type f -print0 | while IFS= read -r -d '' file; do

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from dfa6ff4 to 416da1d Compare May 20, 2026 08:51
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51931118]: BUILD STARTED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 416da1d to 8812da4 Compare May 20, 2026 19:34
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51994441]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [51994441]: BUILD FAILED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 8812da4 to 218a982 Compare May 21, 2026 07:22
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52057797]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52057797]: BUILD FAILED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch 2 times, most recently from d53209a to ce13c1e Compare May 22, 2026 04:56
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52186630]: BUILD STARTED

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from ce13c1e to 22563c3 Compare May 22, 2026 05:08
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52188114]: BUILD STARTED

Comment thread dali/test/python/decoder/test_video.py Outdated
}

unsupported_cpu_codec_error = r"is not supported by the CPU variant of this operator\."
unsupported_cpu_codecs = {"h264", "hevc", "mpeg4"}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Maintainability] This Python-side set duplicates the implicit allow-list in frames_decoder_cpu.cc:218-227 (everything not in {VP8, VP9, MJPEG}). When the C++ array changes, this set has to track it manually with nothing to enforce the link. Consider either:

  • A short comment here pointing to the C++ definition (# Must match the inverse of frames_decoder_cpu.cc SelectVideoStream's codec table), or
  • Defining the supported codecs once and deriving unsupported_cpu_codecs (the latter is harder across C++/Python).

Lowest-cost option is the cross-reference comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the point. It list the codecs which are not supported as in C++ definition. I don't think we can track this easily besides adding a comment that this comes from the CPU decoder implementation.

Comment thread dali/test/python/decoder/test_video.py Outdated
# some formats are not yet supported in the CPU operator itself
filenames = filter(lambda filename: "mpeg4" not in filename, filenames)
filenames = filter(
lambda filename: "test_1.mp4" not in filename and "test_2.mp4" not in filename,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Robustness] The substring filter "test_1.mp4" not in filename and "test_2.mp4" not in filename is fragile — any future file containing test_1.mp4 anywhere in its path (e.g. prefix_test_1.mp4, or a directory named test_1.mp4_backup) would be unintentionally dropped. Prefer an exact basename check:

excluded = {"test_1.mp4", "test_2.mp4"}
filenames = filter(lambda f: os.path.basename(f) not in excluded, filenames)

Same applies to the equivalent block at lines 356-358.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


ret = avcodec_send_packet(codec_ctx_, nullptr);
DALI_ENFORCE(ret >= 0,
DALI_ENFORCE(ret >= 0 || ret == AVERROR_EOF,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nit/Wording] Now that AVERROR_EOF is accepted as a non-error, the message "Failed to send packet to decoder" is misleading for ret == AVERROR_EOF — except, of course, that branch no longer triggers the ENFORCE. Still, if the ENFORCE does fire, the message reads better as "avcodec_send_packet (flush) failed: ..." so it's clear which call site it is.

More importantly: dropping into flush_state_ = true on AVERROR_EOF is the right thing, but worth a 1-line comment here explaining why EOF is acceptable (the decoder has already drained — no more packets to send) so this doesn't look like accidental error-swallowing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment

// Tolerate a tiny number of isolated subpixel deviations exceeding eps. The CPU VP9
// decode path occasionally produces a single byte that differs by ~32 (suspected SIMD
// glitch in libavcodec/sws_scale that Valgrind cannot instrument). A genuine regression
// produces orders of magnitude more bad subpixels, so this budget still catches real
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question/Design] Tolerating 16 deviating subpixels per frame to mask a suspected libavcodec/sws_scale SIMD glitch is pragmatic, but it does mean a regression that introduces ≤16 wrong subpixels per frame will be silently absorbed. Given the suspected upstream cause, is there any plan to file an upstream bug and/or pin a known-good FFmpeg version, or to scope this tolerance only to the VP9 paths it actually affects (e.g. an extra arg to CompareFrame rather than a global cap)?

Not blocking — the rationale in the comment is clear and the 16/2.7M ratio is tiny — just want to make sure we don't lose the thread on the underlying glitch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The threshold is set empirically based on the failures in tests observed. I don't know if there is a good FFmpeg we can pin. I still believe that 16 off pixels is still something we can tolerate, if something is seriously off the number if bigger.

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52188114]: BUILD FAILED


CopyToOutput(data);
++next_frame_idx_;
if (next_frame_idx_ >= NumFrames()) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug — concrete failure mode for greptile's NumFrames P1] Tracing this end-to-end on a container with no nb_frames (typical for VP9 in WebM, possible elsewhere):

  1. ReadRegularFrame decodes frame 0; ++next_frame_idx_ → 1.
  2. NumFrames() sees num_frames_ < 0 and nb_frames == 0 → calls ParseNumFrames().
  3. ParseNumFrames reads packets from the current demux position to EOF (off-by-one: it counts only the remaining frames, not the one already consumed), then calls Reset().
  4. Reset() (frames_decoder_base.cc:472) reopens the file and sets next_frame_idx_ = 0.
  5. NumFrames() returns the (off-by-one) N.
  6. 0 >= N → false → don't set -1 → return true with frame-0 data.
  7. Next ReadRegularFrame re-reads frame 0 (decoder was silently rewound).

User-visible: frame_0 is delivered twice, total frame count is wrong, and VideoInput's NextFrameIdx() != -1 loop (input/video_input.h:323) overruns or wraps unpredictably. Even when num_frames_ is correctly populated from container metadata, this code path is one cache-miss away from being hit.

Mirror the HasIndex() && guard from SeekFrame (frames_decoder_base.cc:512) here and on line 195 in ReadFlushFrame, or compute NumFrames() exactly once up front.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added early initialization of NumFrames. We cannot call HasIndex as sometimes we don't have index, still we want to know the number of frames available.

}

if (next_frame_idx_ < 0) {
if (next_frame_idx_ < 0 || (HasIndex() && next_frame_idx_ >= NumFrames())) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consistency] This is the right shape for the check — guarded by HasIndex(). But the new EOS check added in the same PR (frames_decoder_cpu.cc:161) is not guarded the same way, and neither is the pre-existing one at frames_decoder_cpu.cc:195. Asymmetric guards in the same change are suspicious — either this guard is unnecessary here (in which case those callsites are correct), or it's necessary in all three (in which case the cpu.cc callsites need fixing). The data-corruption trace in the comment on frames_decoder_cpu.cc:161 argues for the latter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frames_decoder_cpu.cc:161 should not have negative next_frame_idx_ as in we progressing forward only. dali/operators/video/frames_decoder_base.cc has next_frame_idx_ externally controlled so it could be negative as well here.
Same for frames_decoder_cpu.cc:195



@params("cpu", "gpu")
@params("gpu")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Coverage] Dropping CPU from @params here is a real regression in test surface, not just a fixture-availability artifact. The test uses /db/video/full_dynamic_range/video.mp4 (H.264), which the new CPU codec policy rejects — so the only fix is a VP9 version of that fixture in DALI_extra. Worth filing a follow-up so this CPU path doesn't quietly stay uncovered, or extending the PR #135 set with video_vp9.mp4 here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code path that detect full range videos applies only to h264/h265, and for VP9 we would need to implement if separately. I will add this to our ToDo list.

Comment thread dali/test/python/input/test_video.py Outdated
@params(*device_values)
# The only available test video with an audio stream (sintel) is h264, which is not supported by
# the CPU variant of the operator. Run this test only on the mixed (GPU) backend.
@params("mixed")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Coverage / Question] Same shape as the test_full_range_video_in_memory coverage loss — CPU dropped because the only audio-stream fixture is H.264. However: NVIDIA/DALI_extra#135 adds sintel_trailer_vp9.mp4. Does that file retain the audio stream from the original sintel? If yes, this test can stay parameterized over both devices by switching the input to the VP9 variant. If no, worth re-encoding to preserve audio so we don't lose this CPU-side audio-stream regression check permanently.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread dali/test/python/decoder/test_video.py Outdated
}

unsupported_cpu_codec_error = r"is not supported by the CPU variant of this operator\."
unsupported_cpu_codecs = {"h264", "hevc", "mpeg4"}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Drift risk — separate from the maintainability comment on the original review] This set claims to enumerate "unsupported on CPU" codecs but is missing AV1, which is also unsupported on the CPU path (no decoder in libavcodec build + commented out in frames_decoder_cpu.cc:223). It works today only because both test_decoder_operator_codec_support and test_reader_operator_codec_support early-skip AV1 with raise SkipTest("...Ampere+ GPUs..."). If/when AV1 becomes generally available and that skip is removed, the codec-rejection branch will be silently bypassed and the test will attempt a real decode on CPU. Add "av1" here defensively, or guard the skip with if codec == "av1" and not has_ampere(): raise SkipTest(...).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch 2 times, most recently from c200f32 to 63ae75a Compare May 22, 2026 12:09
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52225500]: BUILD STARTED

Comment on lines 261 to 263
@params(*device_values)
def test_video_input_audio_stream(device):
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Comment contradicts decorator — and audio-stream coverage may be lost.

The block comment says "Run this test only on the mixed (GPU) backend", but the decorator @params(*device_values) expands to @params("cpu", "mixed"), so the CPU case still runs. The file was switched from sintel_trailer-720p.mp4 (h264 + audio) to sintel_trailer_vp9.mp4: if the VP9 re-encode preserved the audio track, CPU can handle it and the comment is simply stale. If the audio track was dropped during re-encoding, neither device is actually testing the "video decoding when audio stream is present" scenario described in the docstring — the test would pass vacuously. Please confirm whether sintel_trailer_vp9.mp4 retains an audio stream, and update the comment (or restrict the decorator to @params("mixed")) accordingly.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 63ae75a to 8428bfe Compare May 22, 2026 12:19
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52226023]: BUILD STARTED

Restrict the CPU frames decoder to codecs supported by the currently
compiled libavcodec configuration. H264 and HEVC are no longer
advertised for the CPU variant while VP8, VP9, and MJPEG remain
enabled.

Make `ReadRegularFrame` mark end-of-stream by setting `next_frame_idx_`
to -1 when the index reaches `NumFrames()`, mirroring the existing
guard in `ReadFlushFrame`. Without this, codecs with no decoder latency
(VP9 on the new test inputs) deliver the final frame via the regular
path, leaving `next_frame_idx_` at `NumFrames()` and causing
`VideoInput` depletion to be reported one batch late.

Reset the decoder when an indexed next frame falls outside the valid
range, avoiding reuse of an invalid decoder position.

Update video decoder tests to expect CPU failures for unsupported
codecs instead of skipping only MPEG4. Use VP9 CFR/VFR test inputs
and device-less CPU pipelines where appropriate. Point the CFR/VFR
reference frame folders at `frames_{1,2}_vp9/` so CPU decode of the
new VP9 fixtures matches at the existing eps=10 tolerance. Drop the
CPU HEVC frames-decoder tests (`ConstantFrameRateHevc`,
`VariableFrameRateHevc`, `VariableFrameRateHevcNoIndex`) — HEVC is no
longer in the CPU codec allow-list.

Tolerate up to 16 isolated subpixel deviations exceeding eps in
`TestVideo::CompareFrame` (out of ~2.7M subpixels per frame). The CPU
VP9 decode path occasionally produces a single byte that differs by
~32 — a SIMD glitch inside libavcodec/sws_scale that Valgrind cannot
instrument. The budget is orders of magnitude below what any genuine
regression would produce, so test sensitivity is preserved.

In `dali/test/python/input/test_video.py`, filter out h264 from the
round-robin fixture (the unsuffixed `test_{1,2}.mp4` in `cfr/`/`vfr/`
are h264) and restrict `test_video_input_audio_stream` to the mixed
backend — the only DALI_extra video with an audio stream is h264.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL JanuszL force-pushed the remove_cpu_codecs branch from 8428bfe to 5c1c613 Compare May 22, 2026 13:20
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52231011]: BUILD STARTED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [52231011]: BUILD PASSED

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants