Skip to content

merge ffmpeg master#4

Open
mraulet77 wants to merge 10000 commits intotbiat:masterfrom
FFmpeg:master
Open

merge ffmpeg master#4
mraulet77 wants to merge 10000 commits intotbiat:masterfrom
FFmpeg:master

Conversation

@mraulet77
Copy link
Collaborator

No description provided.

@TimothyGu TimothyGu force-pushed the master branch 9 times, most recently from 0914e3a to 580fb6a Compare May 4, 2022 19:01
@TimothyGu TimothyGu force-pushed the master branch 14 times, most recently from b5aa514 to 30e2bb0 Compare May 12, 2022 16:31
@TimothyGu TimothyGu force-pushed the master branch 7 times, most recently from dd99d34 to b8ede4d Compare May 19, 2022 09:00
mkver and others added 30 commits March 3, 2026 13:07
(This also stops zero-allocating er_temp_buffer for H.264,
reverting back to the behavior from before commit
0a1dc81.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Ensures samples where a missing Frame Header is handled by a subsequent
Redundant one are parsed correctly.

Signed-off-by: James Almer <jamrial@gmail.com>
The code was evidently designed at one point in time to support "direct"
execution (not via a thread pool) for num_threads == 1, but this was never
implemented.

As a side benefit, reduces context creation overhead in single threaded
mode (relevant e.g. inside the libswscale self test), due to not needing to
spawn and destroy several thousand worker threads.

Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
This was meant to accumulate int64_t timestamp values.

Fixes: b8daba4
This (arbitrarily) returns -1, which happens to be AVERROR(EPERM) on my
machine. Return the more descriptive AVERORR(EIO) instead.

Also add a log message to explain what's going on.
The retry path restores this offset, but the failure path does not. This
is especially important for the case of the continuation handler in
http_read_stream(), which may result in subsequent loop iterations (after
repeated failures to read additional data) seeking to the wrong offset.
Move this closer to the corresponding `goto`. From the PoV of the control
flow, these placements are completely identical.
If http_seek_internal() gives us an unexpected position, we should
close the connection to avoid leaking reading incorrect bytes on subsequent
reads.
This was calling atoi() on `p + offset`, which is nonsense (p may point to
the start of the cache-control string, which does not necessarilly coincide
with the location of the max-age value). Based on the code, the intent
was clearly to parse the value *after* the matched substring.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add checkasm coverage for rgb24tobgr24 with test widths exercising
all code tiers (scalar, 8-pixel NEON, 16-pixel NEON).

Signed-off-by: David Christle <dev@christle.is>
Add a NEON rgb24tobgr24 using ld3/st3 to swap R and B channels in
packed 24bpp RGB buffers. Handles all input sizes with a 16-pixel
NEON fast path, 8-pixel NEON cleanup, and scalar tail.

checkasm --bench on Apple M3 Max (1920*3 = 5760 bytes):
  rgb24tobgr24_c:    722.0 ( 1.00x)
  rgb24tobgr24_neon:  94.9 ( 7.61x)

Signed-off-by: David Christle <dev@christle.is>
Add checkasm coverage for rgb32tobgr24 (alpha drop) and rgb24tobgr32
(alpha insert) with test widths exercising all code tiers and
overwrite detection via sentinel bytes.

Signed-off-by: David Christle <dev@christle.is>
Add NEON alpha drop/insert using ldp+tbl+stp instead of ld4/st3 and
ld3/st4 structure operations. Both use a 2-register sliding-window
tbl with post-indexed addressing. Instruction scheduling targets
narrow in-order cores (A55) while remaining neutral on wide OoO.

Scalar tails use coalesced loads/stores (ldr+strh+lsr+strb for alpha
drop, ldrh+ldrb+orr+str for alpha insert) to reduce per-pixel
instruction count. Independent instructions placed between loads and
dependent operations to fill load-use latency on in-order cores.

checkasm --bench on Apple M3 Max (decicycles, 1920px):
  rgb32tobgr24_c:    114.4 ( 1.00x)
  rgb32tobgr24_neon:  64.3 ( 1.78x)
  rgb24tobgr32_c:    128.9 ( 1.00x)
  rgb24tobgr32_neon:  80.9 ( 1.59x)

C baseline is clang auto-vectorized; speedup is over compiler NEON.

Signed-off-by: David Christle <dev@christle.is>
Use PRIu32/PRIX32 format specifiers instead of %d/%u/%X for uint32_t
variables in av_log calls. On some platforms (e.g. NuttX), uint32_t is
typedef'd as unsigned long rather than unsigned int, which triggers
-Wformat warnings despite both types being 4 bytes. Using PRI macros
is the portable way to match the actual underlying type of uint32_t.

Signed-off-by: zengshuang <zengshuang@xiaomi.com>
By excluding the Vulkan makefile entirely when --disable-unstable is passed.
This also correctly avoids compiling e.g. unused GLSL compilers.

Fixes: #22295
See-Also: #22366

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a bit more forward-facing than a bare allocation, and importantly,
allows the `swscale/utils.c` code to remain agnostic about how to correctly
uninit this struct.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Non-local includes before local includes.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Apple M2:
put_luma_hv_10_4x4_c:                                   36.3 ( 1.00x)
put_luma_hv_10_8x8_c:                                   82.9 ( 1.00x)
put_luma_hv_10_8x8_neon:                                34.9 ( 2.37x)
put_luma_hv_10_16x16_c:                                239.2 ( 1.00x)
put_luma_hv_10_16x16_neon:                             119.0 ( 2.01x)
put_luma_hv_10_32x32_c:                                900.3 ( 1.00x)
put_luma_hv_10_32x32_neon:                             429.3 ( 2.10x)
put_luma_hv_10_64x64_c:                               2984.7 ( 1.00x)
put_luma_hv_10_64x64_neon:                            1736.2 ( 1.72x)
put_luma_hv_10_128x128_c:                            11194.2 ( 1.00x)
put_luma_hv_10_128x128_neon:                          6357.3 ( 1.76x)
put_luma_hv_12_4x4_c:                                   35.9 ( 1.00x)
put_luma_hv_12_8x8_c:                                   82.6 ( 1.00x)
put_luma_hv_12_8x8_neon:                                34.3 ( 2.41x)
put_luma_hv_12_16x16_c:                                240.2 ( 1.00x)
put_luma_hv_12_16x16_neon:                             115.3 ( 2.08x)
put_luma_hv_12_32x32_c:                                787.7 ( 1.00x)
put_luma_hv_12_32x32_neon:                             414.2 ( 1.90x)
put_luma_hv_12_64x64_c:                               3058.4 ( 1.00x)
put_luma_hv_12_64x64_neon:                            1592.3 ( 1.92x)
put_luma_hv_12_128x128_c:                            11350.8 ( 1.00x)
put_luma_hv_12_128x128_neon:                          6378.3 ( 1.78x)

RPi4:
put_luma_hv_10_4x4_c:                                  637.8 ( 1.00x)
put_luma_hv_10_8x8_c:                                 1044.9 ( 1.00x)
put_luma_hv_10_8x8_neon:                               483.7 ( 2.16x)
put_luma_hv_10_16x16_c:                               3098.0 ( 1.00x)
put_luma_hv_10_16x16_neon:                            1603.1 ( 1.93x)
put_luma_hv_10_32x32_c:                              10054.8 ( 1.00x)
put_luma_hv_10_32x32_neon:                            5843.6 ( 1.72x)
put_luma_hv_10_64x64_c:                              40506.2 ( 1.00x)
put_luma_hv_10_64x64_neon:                           24384.0 ( 1.66x)
put_luma_hv_10_128x128_c:                           130604.2 ( 1.00x)
put_luma_hv_10_128x128_neon:                         99746.6 ( 1.31x)
put_luma_hv_12_4x4_c:                                  638.2 ( 1.00x)
put_luma_hv_12_8x8_c:                                 1074.6 ( 1.00x)
put_luma_hv_12_8x8_neon:                               482.6 ( 2.23x)
put_luma_hv_12_16x16_c:                               3094.0 ( 1.00x)
put_luma_hv_12_16x16_neon:                            1602.5 ( 1.93x)
put_luma_hv_12_32x32_c:                              10034.4 ( 1.00x)
put_luma_hv_12_32x32_neon:                            5843.3 ( 1.72x)
put_luma_hv_12_64x64_c:                              40447.5 ( 1.00x)
put_luma_hv_12_64x64_neon:                           24377.2 ( 1.66x)
put_luma_hv_12_128x128_c:                           130610.4 ( 1.00x)
put_luma_hv_12_128x128_neon:                         99765.8 ( 1.31x)
The deblocking filter is enabled by default. This behavior is the same
as priv->deblock == 1.

Signed-off-by: Tong Wu <wutong1208@outlook.com>
Clarify the behavior of seek keyboard shortcuts in both the
documentation and command-line help text. Specifically:
- left/right: mention custom interval option support
- page down/up: improve wording for chapter seeking fallback
Newer revisions of WinSDK 10.0.26100.0 have exposed more flags for
IsProcessorFeaturePresent; now there is a separate one for
detecting specifically I8MM and not just SVE-I8MM. Switch to using
this flag instead.
Apple M4:
vvc_alf_filter_luma_8x8_8_c:                           347.3 ( 1.00x)
vvc_alf_filter_luma_8x8_8_neon:                        138.7 ( 2.50x)
vvc_alf_filter_luma_8x8_8_sme2:                        134.5 ( 2.58x)
vvc_alf_filter_luma_8x8_10_c:                          299.8 ( 1.00x)
vvc_alf_filter_luma_8x8_10_neon:                       129.8 ( 2.31x)
vvc_alf_filter_luma_8x8_10_sme2:                       128.6 ( 2.33x)
vvc_alf_filter_luma_8x8_12_c:                          293.0 ( 1.00x)
vvc_alf_filter_luma_8x8_12_neon:                       126.8 ( 2.31x)
vvc_alf_filter_luma_8x8_12_sme2:                       126.3 ( 2.32x)
vvc_alf_filter_luma_16x16_8_c:                        1386.1 ( 1.00x)
vvc_alf_filter_luma_16x16_8_neon:                      560.3 ( 2.47x)
vvc_alf_filter_luma_16x16_8_sme2:                      540.1 ( 2.57x)
vvc_alf_filter_luma_16x16_10_c:                       1200.3 ( 1.00x)
vvc_alf_filter_luma_16x16_10_neon:                     515.6 ( 2.33x)
vvc_alf_filter_luma_16x16_10_sme2:                     531.3 ( 2.26x)
vvc_alf_filter_luma_16x16_12_c:                       1223.8 ( 1.00x)
vvc_alf_filter_luma_16x16_12_neon:                     510.7 ( 2.40x)
vvc_alf_filter_luma_16x16_12_sme2:                     524.9 ( 2.33x)
vvc_alf_filter_luma_32x32_8_c:                        5488.8 ( 1.00x)
vvc_alf_filter_luma_32x32_8_neon:                     2233.4 ( 2.46x)
vvc_alf_filter_luma_32x32_8_sme2:                     1093.6 ( 5.02x)
vvc_alf_filter_luma_32x32_10_c:                       4738.0 ( 1.00x)
vvc_alf_filter_luma_32x32_10_neon:                    2057.5 ( 2.30x)
vvc_alf_filter_luma_32x32_10_sme2:                    1053.6 ( 4.50x)
vvc_alf_filter_luma_32x32_12_c:                       4808.3 ( 1.00x)
vvc_alf_filter_luma_32x32_12_neon:                    1981.2 ( 2.43x)
vvc_alf_filter_luma_32x32_12_sme2:                    1047.7 ( 4.59x)
vvc_alf_filter_luma_64x64_8_c:                       22116.8 ( 1.00x)
vvc_alf_filter_luma_64x64_8_neon:                     8951.0 ( 2.47x)
vvc_alf_filter_luma_64x64_8_sme2:                     4225.2 ( 5.23x)
vvc_alf_filter_luma_64x64_10_c:                      19072.8 ( 1.00x)
vvc_alf_filter_luma_64x64_10_neon:                    8448.1 ( 2.26x)
vvc_alf_filter_luma_64x64_10_sme2:                    4225.8 ( 4.51x)
vvc_alf_filter_luma_64x64_12_c:                      19312.6 ( 1.00x)
vvc_alf_filter_luma_64x64_12_neon:                    8270.9 ( 2.34x)
vvc_alf_filter_luma_64x64_12_sme2:                    4245.4 ( 4.55x)
vvc_alf_filter_luma_128x128_8_c:                     88530.5 ( 1.00x)
vvc_alf_filter_luma_128x128_8_neon:                  35686.3 ( 2.48x)
vvc_alf_filter_luma_128x128_8_sme2:                  16961.2 ( 5.22x)
vvc_alf_filter_luma_128x128_10_c:                    76904.9 ( 1.00x)
vvc_alf_filter_luma_128x128_10_neon:                 32439.5 ( 2.37x)
vvc_alf_filter_luma_128x128_10_sme2:                 16845.6 ( 4.57x)
vvc_alf_filter_luma_128x128_12_c:                    77363.3 ( 1.00x)
vvc_alf_filter_luma_128x128_12_neon:                 32907.5 ( 2.35x)
vvc_alf_filter_luma_128x128_12_sme2:                 17018.1 ( 4.55x)
…ly_float

Accept up to 15 ULP difference.

This fixes running "checkasm --test=ac3dsp <seed>" for the seeds
2043066705, 24168 and 111972 on ARM, and the seeds 40552 and
209754 on aarch64.

This is the same change as 8e4c904,
increasing the tolerance further.

With this change, checkasm passes for over 500 000 seeds on both
ARM and aarch64.
Benchmark Results (1024 iterations, Raspberry Pi 5 - Cortex-A76):
add_int16_128_c:                                       914.0 ( 1.00x)
add_int16_128_neon:                                    516.9 ( 1.77x)
add_int16_rnd_width_c:                                 914.0 ( 1.00x)
add_int16_rnd_width_neon:                              517.5 ( 1.77x)

Co-Authored-By: Martin Storsjö <martin@martin.st>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.