merge ffmpeg master by mraulet77 · Pull Request #4 · tbiat/FFmpeg

mraulet77 · 2021-10-13T11:37:52Z

No description provided.

(This also stops zero-allocating er_temp_buffer for H.264, reverting back to the behavior from before commit 0a1dc81.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

Ensures samples where a missing Frame Header is handled by a subsequent Redundant one are parsed correctly. Signed-off-by: James Almer <jamrial@gmail.com>

The code was evidently designed at one point in time to support "direct" execution (not via a thread pool) for num_threads == 1, but this was never implemented. As a side benefit, reduces context creation overhead in single threaded mode (relevant e.g. inside the libswscale self test), due to not needing to spawn and destroy several thousand worker threads. Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Niklas Haas <git@haasn.dev>

This was meant to accumulate int64_t timestamp values. Fixes: b8daba4
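As a rough illustration of why the accumulator width matters, the sketch below sums timestamp values in an `int64_t`. The helper names `sum_timestamps` and `exceeds_int_range` are hypothetical and do not come from the commit; they only show that a realistic sum can exceed what an `int` holds.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums timestamps in a 64-bit accumulator. */
static int64_t sum_timestamps(const int64_t *ts, size_t n)
{
    int64_t total = 0; /* an `int` accumulator could overflow here */

    assert(ts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += ts[i];
    return total;
}

/* Returns 1 if the value cannot be represented in an `int`. */
static int exceeds_int_range(int64_t v)
{
    return v > INT_MAX || v < INT_MIN;
}
```

With two inputs of 3000000000 each, the sum is 6000000000, which is outside the range of a 32-bit `int`.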

This (arbitrarily) returns -1, which happens to be AVERROR(EPERM) on my machine. Return the more descriptive AVERROR(EIO) instead. Also add a log message to explain what's going on.
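To see why a bare -1 can read as AVERROR(EPERM), here is a minimal sketch. `MY_AVERROR` is a stand-in defined locally for illustration, assuming a platform with positive errno values, where FFmpeg's macro negates the errno code; the two function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-in for AVERROR() on platforms with positive errno values. */
#define MY_AVERROR(e) (-(e))

/* Returns 1 where EPERM is 1, i.e. where a bare -1 looks like EPERM. */
static int minus_one_looks_like_eperm(void)
{
    return MY_AVERROR(EPERM) == -1;
}

/* The more descriptive error code chosen for the offset mismatch. */
static int offset_mismatch_error(void)
{
    assert(EIO > 0);
    return MY_AVERROR(EIO);
}
```

Whether `minus_one_looks_like_eperm()` returns 1 depends on the platform's errno numbering, which matches the "on my machine" caveat in the commit message.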

The retry path restores this offset, but the failure path does not. This is especially important for the case of the continuation handler in http_read_stream(), which may result in subsequent loop iterations (after repeated failures to read additional data) seeking to the wrong offset.

Move this closer to the corresponding `goto`. From the PoV of the control flow, these placements are completely identical.

If http_seek_internal() gives us an unexpected position, we should close the connection to avoid reading incorrect bytes on subsequent reads.

This was calling atoi() on `p + offset`, which is nonsense (p may point to the start of the cache-control string, which does not necessarily coincide with the location of the max-age value). Based on the code, the intent was clearly to parse the value *after* the matched substring.
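The intent described above can be sketched as follows. This is not the code from the commit: `parse_max_age` is a hypothetical helper, and it assumes the directive is spelled `max-age=` followed by a decimal value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns max-age in seconds, or -1 if absent. */
static long parse_max_age(const char *cache_control)
{
    static const char key[] = "max-age=";
    const char *p;

    assert(cache_control != NULL);
    p = strstr(cache_control, key);
    if (!p)
        return -1;
    /* Parse from the end of the matched key, not from the string start. */
    return strtol(p + strlen(key), NULL, 10);
}
```

For the header value `no-cache, max-age=3600`, parsing from the start of the string would see `no-cache`, while parsing after the match sees `3600`.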

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

Add checkasm coverage for rgb24tobgr24 with test widths exercising all code tiers (scalar, 8-pixel NEON, 16-pixel NEON). Signed-off-by: David Christle <dev@christle.is>

Add a NEON rgb24tobgr24 using ld3/st3 to swap R and B channels in packed 24bpp RGB buffers. Handles all input sizes with a 16-pixel NEON fast path, 8-pixel NEON cleanup, and scalar tail.

checkasm --bench on Apple M3 Max (1920*3 = 5760 bytes):
rgb24tobgr24_c: 722.0 ( 1.00x)
rgb24tobgr24_neon: 94.9 ( 7.61x)

Signed-off-by: David Christle <dev@christle.is>
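For reference, the operation being accelerated can be written as a plain scalar loop. This is an illustrative sketch, not the C routine from the tree; the name `rgb24_to_bgr24_ref` and its signature are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch: swap the first and third byte of every 3-byte pixel. */
static void rgb24_to_bgr24_ref(const uint8_t *src, uint8_t *dst,
                               size_t src_size)
{
    assert(src_size % 3 == 0);
    for (size_t i = 0; i + 2 < src_size; i += 3) {
        /* Read the whole pixel first so src == dst also works. */
        uint8_t c0 = src[i], c1 = src[i + 1], c2 = src[i + 2];

        dst[i]     = c2;
        dst[i + 1] = c1;
        dst[i + 2] = c0;
    }
}
```

A vectorized version does the same swap on 8 or 16 pixels per iteration and falls back to a loop like this for the remaining pixels.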

Add checkasm coverage for rgb32tobgr24 (alpha drop) and rgb24tobgr32 (alpha insert) with test widths exercising all code tiers and overwrite detection via sentinel bytes. Signed-off-by: David Christle <dev@christle.is>

Add NEON alpha drop/insert using ldp+tbl+stp instead of ld4/st3 and ld3/st4 structure operations. Both use a 2-register sliding-window tbl with post-indexed addressing. Instruction scheduling targets narrow in-order cores (A55) while remaining neutral on wide OoO.

Scalar tails use coalesced loads/stores (ldr+strh+lsr+strb for alpha drop, ldrh+ldrb+orr+str for alpha insert) to reduce per-pixel instruction count. Independent instructions placed between loads and dependent operations to fill load-use latency on in-order cores.

checkasm --bench on Apple M3 Max (decicycles, 1920px):
rgb32tobgr24_c: 114.4 ( 1.00x)
rgb32tobgr24_neon: 64.3 ( 1.78x)
rgb24tobgr32_c: 128.9 ( 1.00x)
rgb24tobgr32_neon: 80.9 ( 1.59x)

C baseline is clang auto-vectorized; speedup is over compiler NEON.

Signed-off-by: David Christle <dev@christle.is>
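A byte-wise sketch of the two operations may help when reading the description above. It assumes a little-endian layout in which alpha is the fourth byte of each 32-bit pixel and an inserted alpha of 0xFF; the function names are hypothetical and the code is not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alpha drop: copy 3 of every 4 source bytes. Returns bytes written. */
static size_t drop_alpha(const uint8_t *src, uint8_t *dst, size_t src_size)
{
    size_t out = 0;

    assert(src_size % 4 == 0);
    for (size_t i = 0; i < src_size; i += 4) {
        dst[out++] = src[i];
        dst[out++] = src[i + 1];
        dst[out++] = src[i + 2]; /* src[i + 3] (alpha) is skipped */
    }
    return out;
}

/* Alpha insert: append an opaque alpha byte after every 3 source bytes. */
static size_t insert_alpha(const uint8_t *src, uint8_t *dst, size_t src_size)
{
    size_t out = 0;

    assert(src_size % 3 == 0);
    for (size_t i = 0; i < src_size; i += 3) {
        dst[out++] = src[i];
        dst[out++] = src[i + 1];
        dst[out++] = src[i + 2];
        dst[out++] = 0xFF; /* assumed opaque alpha */
    }
    return out;
}
```

The coalesced scalar tails mentioned in the commit perform the same per-pixel work with fewer, wider loads and stores.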

Use PRIu32/PRIX32 format specifiers instead of %d/%u/%X for uint32_t variables in av_log calls. On some platforms (e.g. NuttX), uint32_t is typedef'd as unsigned long rather than unsigned int, which triggers -Wformat warnings despite both types being 4 bytes. Using PRI macros is the portable way to match the actual underlying type of uint32_t. Signed-off-by: zengshuang <zengshuang@xiaomi.com>
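A minimal sketch of the pattern, using `snprintf` in place of av_log so that it stands alone; `format_u32` is a hypothetical helper. The `PRIu32` and `PRIX32` macros from `<inttypes.h>` expand to whichever conversion matches the platform's `uint32_t`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Formats a uint32_t portably, whatever its underlying integer type is. */
static int format_u32(char *buf, size_t size, uint32_t value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "value=%" PRIu32 " hex=0x%" PRIX32,
                    value, value);
}
```

The macros are string literals, so they are concatenated with the surrounding format string at compile time.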

By excluding the Vulkan makefile entirely when --disable-unstable is passed. This also correctly avoids compiling e.g. unused GLSL compilers. Fixes: #22295 See-Also: #22366 Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>

This is a bit more forward-facing than a bare allocation, and importantly, allows the `swscale/utils.c` code to remain agnostic about how to correctly uninit this struct. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>

Non-local includes before local includes. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>

Apple M2:
put_luma_hv_10_4x4_c: 36.3 ( 1.00x)
put_luma_hv_10_8x8_c: 82.9 ( 1.00x)
put_luma_hv_10_8x8_neon: 34.9 ( 2.37x)
put_luma_hv_10_16x16_c: 239.2 ( 1.00x)
put_luma_hv_10_16x16_neon: 119.0 ( 2.01x)
put_luma_hv_10_32x32_c: 900.3 ( 1.00x)
put_luma_hv_10_32x32_neon: 429.3 ( 2.10x)
put_luma_hv_10_64x64_c: 2984.7 ( 1.00x)
put_luma_hv_10_64x64_neon: 1736.2 ( 1.72x)
put_luma_hv_10_128x128_c: 11194.2 ( 1.00x)
put_luma_hv_10_128x128_neon: 6357.3 ( 1.76x)
put_luma_hv_12_4x4_c: 35.9 ( 1.00x)
put_luma_hv_12_8x8_c: 82.6 ( 1.00x)
put_luma_hv_12_8x8_neon: 34.3 ( 2.41x)
put_luma_hv_12_16x16_c: 240.2 ( 1.00x)
put_luma_hv_12_16x16_neon: 115.3 ( 2.08x)
put_luma_hv_12_32x32_c: 787.7 ( 1.00x)
put_luma_hv_12_32x32_neon: 414.2 ( 1.90x)
put_luma_hv_12_64x64_c: 3058.4 ( 1.00x)
put_luma_hv_12_64x64_neon: 1592.3 ( 1.92x)
put_luma_hv_12_128x128_c: 11350.8 ( 1.00x)
put_luma_hv_12_128x128_neon: 6378.3 ( 1.78x)

RPi4:
put_luma_hv_10_4x4_c: 637.8 ( 1.00x)
put_luma_hv_10_8x8_c: 1044.9 ( 1.00x)
put_luma_hv_10_8x8_neon: 483.7 ( 2.16x)
put_luma_hv_10_16x16_c: 3098.0 ( 1.00x)
put_luma_hv_10_16x16_neon: 1603.1 ( 1.93x)
put_luma_hv_10_32x32_c: 10054.8 ( 1.00x)
put_luma_hv_10_32x32_neon: 5843.6 ( 1.72x)
put_luma_hv_10_64x64_c: 40506.2 ( 1.00x)
put_luma_hv_10_64x64_neon: 24384.0 ( 1.66x)
put_luma_hv_10_128x128_c: 130604.2 ( 1.00x)
put_luma_hv_10_128x128_neon: 99746.6 ( 1.31x)
put_luma_hv_12_4x4_c: 638.2 ( 1.00x)
put_luma_hv_12_8x8_c: 1074.6 ( 1.00x)
put_luma_hv_12_8x8_neon: 482.6 ( 2.23x)
put_luma_hv_12_16x16_c: 3094.0 ( 1.00x)
put_luma_hv_12_16x16_neon: 1602.5 ( 1.93x)
put_luma_hv_12_32x32_c: 10034.4 ( 1.00x)
put_luma_hv_12_32x32_neon: 5843.3 ( 1.72x)
put_luma_hv_12_64x64_c: 40447.5 ( 1.00x)
put_luma_hv_12_64x64_neon: 24377.2 ( 1.66x)
put_luma_hv_12_128x128_c: 130610.4 ( 1.00x)
put_luma_hv_12_128x128_neon: 99765.8 ( 1.31x)

The deblocking filter is enabled by default. This behavior is the same as priv->deblock == 1. Signed-off-by: Tong Wu <wutong1208@outlook.com>

Clarify the behavior of seek keyboard shortcuts in both the documentation and command-line help text. Specifically:
- left/right: mention custom interval option support
- page down/up: improve wording for chapter seeking fallback

Newer revisions of WinSDK 10.0.26100.0 have exposed more flags for IsProcessorFeaturePresent; now there is a separate one for detecting specifically I8MM and not just SVE-I8MM. Switch to using this flag instead.

…8-d16

Apple M4:
vvc_alf_filter_luma_8x8_8_c: 347.3 ( 1.00x)
vvc_alf_filter_luma_8x8_8_neon: 138.7 ( 2.50x)
vvc_alf_filter_luma_8x8_8_sme2: 134.5 ( 2.58x)
vvc_alf_filter_luma_8x8_10_c: 299.8 ( 1.00x)
vvc_alf_filter_luma_8x8_10_neon: 129.8 ( 2.31x)
vvc_alf_filter_luma_8x8_10_sme2: 128.6 ( 2.33x)
vvc_alf_filter_luma_8x8_12_c: 293.0 ( 1.00x)
vvc_alf_filter_luma_8x8_12_neon: 126.8 ( 2.31x)
vvc_alf_filter_luma_8x8_12_sme2: 126.3 ( 2.32x)
vvc_alf_filter_luma_16x16_8_c: 1386.1 ( 1.00x)
vvc_alf_filter_luma_16x16_8_neon: 560.3 ( 2.47x)
vvc_alf_filter_luma_16x16_8_sme2: 540.1 ( 2.57x)
vvc_alf_filter_luma_16x16_10_c: 1200.3 ( 1.00x)
vvc_alf_filter_luma_16x16_10_neon: 515.6 ( 2.33x)
vvc_alf_filter_luma_16x16_10_sme2: 531.3 ( 2.26x)
vvc_alf_filter_luma_16x16_12_c: 1223.8 ( 1.00x)
vvc_alf_filter_luma_16x16_12_neon: 510.7 ( 2.40x)
vvc_alf_filter_luma_16x16_12_sme2: 524.9 ( 2.33x)
vvc_alf_filter_luma_32x32_8_c: 5488.8 ( 1.00x)
vvc_alf_filter_luma_32x32_8_neon: 2233.4 ( 2.46x)
vvc_alf_filter_luma_32x32_8_sme2: 1093.6 ( 5.02x)
vvc_alf_filter_luma_32x32_10_c: 4738.0 ( 1.00x)
vvc_alf_filter_luma_32x32_10_neon: 2057.5 ( 2.30x)
vvc_alf_filter_luma_32x32_10_sme2: 1053.6 ( 4.50x)
vvc_alf_filter_luma_32x32_12_c: 4808.3 ( 1.00x)
vvc_alf_filter_luma_32x32_12_neon: 1981.2 ( 2.43x)
vvc_alf_filter_luma_32x32_12_sme2: 1047.7 ( 4.59x)
vvc_alf_filter_luma_64x64_8_c: 22116.8 ( 1.00x)
vvc_alf_filter_luma_64x64_8_neon: 8951.0 ( 2.47x)
vvc_alf_filter_luma_64x64_8_sme2: 4225.2 ( 5.23x)
vvc_alf_filter_luma_64x64_10_c: 19072.8 ( 1.00x)
vvc_alf_filter_luma_64x64_10_neon: 8448.1 ( 2.26x)
vvc_alf_filter_luma_64x64_10_sme2: 4225.8 ( 4.51x)
vvc_alf_filter_luma_64x64_12_c: 19312.6 ( 1.00x)
vvc_alf_filter_luma_64x64_12_neon: 8270.9 ( 2.34x)
vvc_alf_filter_luma_64x64_12_sme2: 4245.4 ( 4.55x)
vvc_alf_filter_luma_128x128_8_c: 88530.5 ( 1.00x)
vvc_alf_filter_luma_128x128_8_neon: 35686.3 ( 2.48x)
vvc_alf_filter_luma_128x128_8_sme2: 16961.2 ( 5.22x)
vvc_alf_filter_luma_128x128_10_c: 76904.9 ( 1.00x)
vvc_alf_filter_luma_128x128_10_neon: 32439.5 ( 2.37x)
vvc_alf_filter_luma_128x128_10_sme2: 16845.6 ( 4.57x)
vvc_alf_filter_luma_128x128_12_c: 77363.3 ( 1.00x)
vvc_alf_filter_luma_128x128_12_neon: 32907.5 ( 2.35x)
vvc_alf_filter_luma_128x128_12_sme2: 17018.1 ( 4.55x)

…ly_float Accept up to 15 ULP difference. This fixes running "checkasm --test=ac3dsp <seed>" for the seeds 2043066705, 24168 and 111972 on ARM, and the seeds 40552 and 209754 on aarch64. This is the same change as 8e4c904, increasing the tolerance further. With this change, checkasm passes for over 500 000 seeds on both ARM and aarch64.
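For readers unfamiliar with ULP-based tolerances, a comparison of this kind can be sketched as below. This is an illustrative version, not checkasm's own helper: `near_ulp` is a hypothetical name, it assumes 32-bit IEEE 754 `float`, and it does not treat NaN specially.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a and b are at most max_ulp representable floats apart. */
static int near_ulp(float a, float b, unsigned max_ulp)
{
    uint32_t ia, ib;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    /* Different signs: only +0.0 and -0.0 are considered equal. */
    if ((ia >> 31) != (ib >> 31))
        return a == b;

    /* Same sign: adjacent floats have adjacent bit patterns. */
    return llabs((long long)ia - (long long)ib) <= (long long)max_ulp;
}
```

A call such as `near_ulp(ref, out, 15)` would accept results that differ by up to 15 ULP, the tolerance named in the commit message.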

Benchmark Results (1024 iterations, Raspberry Pi 5 - Cortex-A76):
add_int16_128_c: 914.0 ( 1.00x)
add_int16_128_neon: 516.9 ( 1.77x)
add_int16_rnd_width_c: 914.0 ( 1.00x)
add_int16_rnd_width_neon: 517.5 ( 1.77x)

Co-Authored-By: Martin Storsjö <martin@martin.st>

TimothyGu force-pushed the master branch 9 times, most recently from 0914e3a to 580fb6a Compare May 4, 2022 19:01

TimothyGu force-pushed the master branch 14 times, most recently from b5aa514 to 30e2bb0 Compare May 12, 2022 16:31

TimothyGu force-pushed the master branch 7 times, most recently from dd99d34 to b8ede4d Compare May 19, 2022 09:00

mkver and others added 30 commits March 3, 2026 13:07

avcodec/h264dec,mpeg_er: Move allocating er buffers to ff_er_init()

23a58c6

(This also stops zero-allocating er_temp_buffer for H.264, reverting back to the behavior from before commit 0a1dc81.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/av1_parser: also decompose Redundant Frame Headers

264283b

Ensures samples where a missing Frame Header is handled by a subsequent Redundant one are parsed correctly. Signed-off-by: James Almer <jamrial@gmail.com>

libswscale/utils: prevent division by zero for chroma width 8

04fe984

avutil/cpu: add static CPU feature flag for AArch64 CRC32

22d484f

avformat/http: avoid int overflow

fb7558d

This was meant to accumulate int64_t timestamp values. Fixes: b8daba4

avformat/http: fix http_connect() offset mismatch error code

f5ddf1c

This (arbitrarily) returns -1, which happens to be AVERROR(EPERM) on my machine. Return the more descriptive AVERROR(EIO) instead. Also add a log message to explain what's going on.

avformat/http: move retry label (cosmetic)

fcc1a03

Move this closer to the corresponding `goto`. From the PoV of the control flow, these placements are completely identical.

avformat/http: close stale connection on wrong seek

7a348f6

If http_seek_internal() gives us an unexpected position, we should close the connection to avoid reading incorrect bytes on subsequent reads.

avformat/mov: check for duplicate stsd before changing state

8d3b044

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

tests/checkasm: add rgb24tobgr24 test

78f6cec

Add checkasm coverage for rgb24tobgr24 with test widths exercising all code tiers (scalar, 8-pixel NEON, 16-pixel NEON). Signed-off-by: David Christle <dev@christle.is>

tests/checkasm: add rgb32tobgr24 and rgb24tobgr32 tests

86a6238

Add checkasm coverage for rgb32tobgr24 (alpha drop) and rgb24tobgr32 (alpha insert) with test widths exercising all code tiers and overwrite detection via sentinel bytes. Signed-off-by: David Christle <dev@christle.is>

swscale/vulkan: fix include order (cosmetic)

cef2fbf

Non-local includes before local includes. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>

avcodec/d3d12va_encode_h264: simplify deblock default option

5b8a4a0

The deblocking filter is enabled by default. This behavior is the same as priv->deblock == 1. Signed-off-by: Tong Wu <wutong1208@outlook.com>

fftools/ffplay: improve keyboard shortcut documentation

e686f53

Clarify the behavior of seek keyboard shortcuts in both the documentation and command-line help text. Specifically:
- left/right: mention custom interval option support
- page down/up: improve wording for chapter seeking fallback

aarch64: Switch to a more correct Windows flag for detecting I8MM

a0d2370

Newer revisions of WinSDK 10.0.26100.0 have exposed more flags for IsProcessorFeaturePresent; now there is a separate one for detecting specifically I8MM and not just SVE-I8MM. Switch to using this flag instead.

configure: add detection of assembler support for SME2

70691bb

configure: add detection of SME-I16I64 extension

905348d

aarch64/asm.S: to support SME added macro to save/restore registers d…

0edd75e

…8-d16

mraulet77 wants to merge 10000 commits into tbiat:master from FFmpeg:master


20 participants
