(This also stops zero-allocating er_temp_buffer for H.264, reverting back to the behavior from before commit 0a1dc81.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Ensures samples where a missing Frame Header is handled by a subsequent Redundant one are parsed correctly. Signed-off-by: James Almer <jamrial@gmail.com>
The code was evidently designed at one point in time to support "direct" execution (not via a thread pool) for num_threads == 1, but this was never implemented. As a side benefit, reduces context creation overhead in single threaded mode (relevant e.g. inside the libswscale self test), due to not needing to spawn and destroy several thousand worker threads. Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Niklas Haas <git@haasn.dev>
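A minimal sketch of the "direct" execution idea described above, under stated assumptions: `Executor`, `executor_run`, `add_index`, and the `dispatch` callback are hypothetical names used only for illustration and are not the libswscale API. When only one thread is requested, the jobs run inline on the caller's thread and no worker threads are created.

```c
#include <stddef.h>

/* Hypothetical job signature: one call per job index. */
typedef void (*job_fn)(void *priv, int jobnr);

/* Hypothetical executor; `dispatch` stands in for a thread-pool path. */
typedef struct Executor {
    int num_threads;
    void (*dispatch)(struct Executor *e, job_fn fn, void *priv, int nb_jobs);
} Executor;

/* Run all jobs. With num_threads == 1 the jobs execute directly on the
 * calling thread, so no pool has to be spawned or torn down. */
static void executor_run(Executor *e, job_fn fn, void *priv, int nb_jobs)
{
    if (e->num_threads == 1) {
        for (int i = 0; i < nb_jobs; i++)
            fn(priv, i);
        return;
    }
    e->dispatch(e, fn, priv, nb_jobs);
}

/* Example job: adds the job index to an int accumulator. */
static void add_index(void *priv, int jobnr)
{
    *(int *)priv += jobnr;
}
```

Calling `executor_run()` with `num_threads` set to 1 and `add_index` as the job never touches `dispatch`, which is the property the commit relies on to avoid thread creation overhead.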
This was meant to accumulate int64_t timestamp values. Fixes: b8daba4
This (arbitrarily) returns -1, which happens to be AVERROR(EPERM) on my machine. Return the more descriptive AVERROR(EIO) instead. Also add a log message to explain what's going on.
The retry path restores this offset, but the failure path does not. This is especially important for the case of the continuation handler in http_read_stream(), which may result in subsequent loop iterations (after repeated failures to read additional data) seeking to the wrong offset.
Move this closer to the corresponding `goto`. From the PoV of the control flow, these placements are completely identical.
If http_seek_internal() gives us an unexpected position, we should close the connection to avoid reading incorrect bytes on subsequent reads.
This was calling atoi() on `p + offset`, which is nonsense (p may point to the start of the cache-control string, which does not necessarily coincide with the location of the max-age value). Based on the code, the intent was clearly to parse the value *after* the matched substring.
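To illustrate the intended behavior, here is a small sketch that parses the number following the matched substring rather than at a fixed offset from the start of the header value. `parse_max_age` is a hypothetical helper written for this example; it is not the code from the HTTP protocol handler.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the max-age value in seconds,
 * or -1 if the directive is not present in the header value. */
static long parse_max_age(const char *cache_control)
{
    static const char key[] = "max-age=";
    const char *hit = strstr(cache_control, key);

    if (!hit)
        return -1;

    /* Parse the digits that follow the match. The match may sit anywhere
     * in the string, so the start of the string is not a valid base. */
    return strtol(hit + strlen(key), NULL, 10);
}
```

With this shape, `"public, max-age=3600"` and `"max-age=60, public"` both yield the expected number regardless of where the directive appears.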
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add checkasm coverage for rgb24tobgr24 with test widths exercising all code tiers (scalar, 8-pixel NEON, 16-pixel NEON). Signed-off-by: David Christle <dev@christle.is>
Add a NEON rgb24tobgr24 using ld3/st3 to swap R and B channels in packed 24bpp RGB buffers. Handles all input sizes with a 16-pixel NEON fast path, 8-pixel NEON cleanup, and scalar tail.

checkasm --bench on Apple M3 Max (1920*3 = 5760 bytes):
    rgb24tobgr24_c:    722.0 ( 1.00x)
    rgb24tobgr24_neon:  94.9 ( 7.61x)

Signed-off-by: David Christle <dev@christle.is>
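For reference, the operation being accelerated can be sketched in scalar C as follows. This is an illustrative reference loop, not the libswscale C implementation; `rgb24tobgr24_ref` is a name chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the first and third byte of every packed 3-byte pixel.
 * src_size is in bytes; a trailing partial pixel is left untouched. */
static void rgb24tobgr24_ref(const uint8_t *src, uint8_t *dst, size_t src_size)
{
    for (size_t i = 0; i + 2 < src_size; i += 3) {
        dst[i + 0] = src[i + 2];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 0];
    }
}
```

A vector version performs the same swap on many pixels per iteration, which is why a de-interleaving load such as ld3 is a natural fit.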
Add checkasm coverage for rgb32tobgr24 (alpha drop) and rgb24tobgr32 (alpha insert) with test widths exercising all code tiers and overwrite detection via sentinel bytes. Signed-off-by: David Christle <dev@christle.is>
Add NEON alpha drop/insert using ldp+tbl+stp instead of ld4/st3 and ld3/st4 structure operations. Both use a 2-register sliding-window tbl with post-indexed addressing. Instruction scheduling targets narrow in-order cores (A55) while remaining neutral on wide OoO.

Scalar tails use coalesced loads/stores (ldr+strh+lsr+strb for alpha drop, ldrh+ldrb+orr+str for alpha insert) to reduce per-pixel instruction count. Independent instructions placed between loads and dependent operations to fill load-use latency on in-order cores.

checkasm --bench on Apple M3 Max (decicycles, 1920px):
    rgb32tobgr24_c:    114.4 ( 1.00x)
    rgb32tobgr24_neon:  64.3 ( 1.78x)
    rgb24tobgr32_c:    128.9 ( 1.00x)
    rgb24tobgr32_neon:  80.9 ( 1.59x)

C baseline is clang auto-vectorized; speedup is over compiler NEON.

Signed-off-by: David Christle <dev@christle.is>
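The two conversions can be sketched in scalar C as below. This sketch assumes that alpha is the fourth byte of each 32-bit pixel, that the other three bytes keep their order, and that the inserted alpha is opaque (255); these are assumptions for illustration, and the function names are hypothetical rather than the libswscale ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Alpha drop: 4 bytes in, 3 bytes out per pixel (assumed layout: X, Y, Z, A). */
static void drop_alpha_ref(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        dst[3 * i + 0] = src[4 * i + 0];
        dst[3 * i + 1] = src[4 * i + 1];
        dst[3 * i + 2] = src[4 * i + 2];
    }
}

/* Alpha insert: 3 bytes in, 4 bytes out per pixel, alpha set to 255. */
static void insert_alpha_ref(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        dst[4 * i + 0] = src[3 * i + 0];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 2];
        dst[4 * i + 3] = 255;
    }
}
```

Because source and destination strides differ (4 versus 3 bytes), a table-lookup shuffle over a sliding register window is one way to express the byte selection without structure loads.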
Use PRIu32/PRIX32 format specifiers instead of %d/%u/%X for uint32_t variables in av_log calls. On some platforms (e.g. NuttX), uint32_t is typedef'd as unsigned long rather than unsigned int, which triggers -Wformat warnings despite both types being 4 bytes. Using PRI macros is the portable way to match the actual underlying type of uint32_t. Signed-off-by: zengshuang <zengshuang@xiaomi.com>
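A short sketch of the portable pattern, using standard `snprintf` rather than av_log so that it is self-contained; `format_u32` is a hypothetical helper for this example. The `PRIu32` and `PRIX32` macros from `<inttypes.h>` expand to the conversion specifier that matches whatever type `uint32_t` is on the target.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a uint32_t in decimal and upper-case hex. The PRI macros pick
 * the right length modifier whether uint32_t is unsigned int or
 * unsigned long on the current platform. */
static int format_u32(char *buf, size_t size, uint32_t value)
{
    return snprintf(buf, size, "value=%" PRIu32 " hex=0x%" PRIX32,
                    value, value);
}
```

Writing `"%u"` here would compile cleanly where `uint32_t` is `unsigned int` but trigger -Wformat where it is `unsigned long`, which is the situation the commit describes.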
By excluding the Vulkan makefile entirely when --disable-unstable is passed. This also correctly avoids compiling e.g. unused GLSL compilers. Fixes: #22295 See-Also: #22366 Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>
This is a bit more forward-facing than a bare allocation, and importantly, allows the `swscale/utils.c` code to remain agnostic about how to correctly uninit this struct. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>
Non-local includes before local includes. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>
Apple M2:
    put_luma_hv_10_4x4_c: 36.3 ( 1.00x)
    put_luma_hv_10_8x8_c: 82.9 ( 1.00x)
    put_luma_hv_10_8x8_neon: 34.9 ( 2.37x)
    put_luma_hv_10_16x16_c: 239.2 ( 1.00x)
    put_luma_hv_10_16x16_neon: 119.0 ( 2.01x)
    put_luma_hv_10_32x32_c: 900.3 ( 1.00x)
    put_luma_hv_10_32x32_neon: 429.3 ( 2.10x)
    put_luma_hv_10_64x64_c: 2984.7 ( 1.00x)
    put_luma_hv_10_64x64_neon: 1736.2 ( 1.72x)
    put_luma_hv_10_128x128_c: 11194.2 ( 1.00x)
    put_luma_hv_10_128x128_neon: 6357.3 ( 1.76x)
    put_luma_hv_12_4x4_c: 35.9 ( 1.00x)
    put_luma_hv_12_8x8_c: 82.6 ( 1.00x)
    put_luma_hv_12_8x8_neon: 34.3 ( 2.41x)
    put_luma_hv_12_16x16_c: 240.2 ( 1.00x)
    put_luma_hv_12_16x16_neon: 115.3 ( 2.08x)
    put_luma_hv_12_32x32_c: 787.7 ( 1.00x)
    put_luma_hv_12_32x32_neon: 414.2 ( 1.90x)
    put_luma_hv_12_64x64_c: 3058.4 ( 1.00x)
    put_luma_hv_12_64x64_neon: 1592.3 ( 1.92x)
    put_luma_hv_12_128x128_c: 11350.8 ( 1.00x)
    put_luma_hv_12_128x128_neon: 6378.3 ( 1.78x)

RPi4:
    put_luma_hv_10_4x4_c: 637.8 ( 1.00x)
    put_luma_hv_10_8x8_c: 1044.9 ( 1.00x)
    put_luma_hv_10_8x8_neon: 483.7 ( 2.16x)
    put_luma_hv_10_16x16_c: 3098.0 ( 1.00x)
    put_luma_hv_10_16x16_neon: 1603.1 ( 1.93x)
    put_luma_hv_10_32x32_c: 10054.8 ( 1.00x)
    put_luma_hv_10_32x32_neon: 5843.6 ( 1.72x)
    put_luma_hv_10_64x64_c: 40506.2 ( 1.00x)
    put_luma_hv_10_64x64_neon: 24384.0 ( 1.66x)
    put_luma_hv_10_128x128_c: 130604.2 ( 1.00x)
    put_luma_hv_10_128x128_neon: 99746.6 ( 1.31x)
    put_luma_hv_12_4x4_c: 638.2 ( 1.00x)
    put_luma_hv_12_8x8_c: 1074.6 ( 1.00x)
    put_luma_hv_12_8x8_neon: 482.6 ( 2.23x)
    put_luma_hv_12_16x16_c: 3094.0 ( 1.00x)
    put_luma_hv_12_16x16_neon: 1602.5 ( 1.93x)
    put_luma_hv_12_32x32_c: 10034.4 ( 1.00x)
    put_luma_hv_12_32x32_neon: 5843.3 ( 1.72x)
    put_luma_hv_12_64x64_c: 40447.5 ( 1.00x)
    put_luma_hv_12_64x64_neon: 24377.2 ( 1.66x)
    put_luma_hv_12_128x128_c: 130610.4 ( 1.00x)
    put_luma_hv_12_128x128_neon: 99765.8 ( 1.31x)
The deblocking filter is enabled by default. This behavior is the same as priv->deblock == 1. Signed-off-by: Tong Wu <wutong1208@outlook.com>
Clarify the behavior of seek keyboard shortcuts in both the documentation and command-line help text. Specifically:
- left/right: mention custom interval option support
- page down/up: improve wording for chapter seeking fallback
Newer revisions of WinSDK 10.0.26100.0 have exposed more flags for IsProcessorFeaturePresent; now there is a separate one for detecting specifically I8MM and not just SVE-I8MM. Switch to using this flag instead.
Apple M4:
    vvc_alf_filter_luma_8x8_8_c: 347.3 ( 1.00x)
    vvc_alf_filter_luma_8x8_8_neon: 138.7 ( 2.50x)
    vvc_alf_filter_luma_8x8_8_sme2: 134.5 ( 2.58x)
    vvc_alf_filter_luma_8x8_10_c: 299.8 ( 1.00x)
    vvc_alf_filter_luma_8x8_10_neon: 129.8 ( 2.31x)
    vvc_alf_filter_luma_8x8_10_sme2: 128.6 ( 2.33x)
    vvc_alf_filter_luma_8x8_12_c: 293.0 ( 1.00x)
    vvc_alf_filter_luma_8x8_12_neon: 126.8 ( 2.31x)
    vvc_alf_filter_luma_8x8_12_sme2: 126.3 ( 2.32x)
    vvc_alf_filter_luma_16x16_8_c: 1386.1 ( 1.00x)
    vvc_alf_filter_luma_16x16_8_neon: 560.3 ( 2.47x)
    vvc_alf_filter_luma_16x16_8_sme2: 540.1 ( 2.57x)
    vvc_alf_filter_luma_16x16_10_c: 1200.3 ( 1.00x)
    vvc_alf_filter_luma_16x16_10_neon: 515.6 ( 2.33x)
    vvc_alf_filter_luma_16x16_10_sme2: 531.3 ( 2.26x)
    vvc_alf_filter_luma_16x16_12_c: 1223.8 ( 1.00x)
    vvc_alf_filter_luma_16x16_12_neon: 510.7 ( 2.40x)
    vvc_alf_filter_luma_16x16_12_sme2: 524.9 ( 2.33x)
    vvc_alf_filter_luma_32x32_8_c: 5488.8 ( 1.00x)
    vvc_alf_filter_luma_32x32_8_neon: 2233.4 ( 2.46x)
    vvc_alf_filter_luma_32x32_8_sme2: 1093.6 ( 5.02x)
    vvc_alf_filter_luma_32x32_10_c: 4738.0 ( 1.00x)
    vvc_alf_filter_luma_32x32_10_neon: 2057.5 ( 2.30x)
    vvc_alf_filter_luma_32x32_10_sme2: 1053.6 ( 4.50x)
    vvc_alf_filter_luma_32x32_12_c: 4808.3 ( 1.00x)
    vvc_alf_filter_luma_32x32_12_neon: 1981.2 ( 2.43x)
    vvc_alf_filter_luma_32x32_12_sme2: 1047.7 ( 4.59x)
    vvc_alf_filter_luma_64x64_8_c: 22116.8 ( 1.00x)
    vvc_alf_filter_luma_64x64_8_neon: 8951.0 ( 2.47x)
    vvc_alf_filter_luma_64x64_8_sme2: 4225.2 ( 5.23x)
    vvc_alf_filter_luma_64x64_10_c: 19072.8 ( 1.00x)
    vvc_alf_filter_luma_64x64_10_neon: 8448.1 ( 2.26x)
    vvc_alf_filter_luma_64x64_10_sme2: 4225.8 ( 4.51x)
    vvc_alf_filter_luma_64x64_12_c: 19312.6 ( 1.00x)
    vvc_alf_filter_luma_64x64_12_neon: 8270.9 ( 2.34x)
    vvc_alf_filter_luma_64x64_12_sme2: 4245.4 ( 4.55x)
    vvc_alf_filter_luma_128x128_8_c: 88530.5 ( 1.00x)
    vvc_alf_filter_luma_128x128_8_neon: 35686.3 ( 2.48x)
    vvc_alf_filter_luma_128x128_8_sme2: 16961.2 ( 5.22x)
    vvc_alf_filter_luma_128x128_10_c: 76904.9 ( 1.00x)
    vvc_alf_filter_luma_128x128_10_neon: 32439.5 ( 2.37x)
    vvc_alf_filter_luma_128x128_10_sme2: 16845.6 ( 4.57x)
    vvc_alf_filter_luma_128x128_12_c: 77363.3 ( 1.00x)
    vvc_alf_filter_luma_128x128_12_neon: 32907.5 ( 2.35x)
    vvc_alf_filter_luma_128x128_12_sme2: 17018.1 ( 4.55x)
…ly_float Accept up to 15 ULP difference. This fixes running "checkasm --test=ac3dsp <seed>" for the seeds 2043066705, 24168 and 111972 on ARM, and the seeds 40552 and 209754 on aarch64. This is the same change as 8e4c904, increasing the tolerance further. With this change, checkasm passes for over 500 000 seeds on both ARM and aarch64.
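As background for the tolerance being raised, a ULP-distance comparison for single-precision floats can be sketched as follows. This is an illustrative helper under the assumption of IEEE 754 binary32 `float`; `float_near_ulp_sketch` is a hypothetical name and the code is not asserted to match checkasm's own comparison routine.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if a and b are at most max_ulp representable floats apart.
 * Assumes IEEE 754 binary32; NaN handling is deliberately omitted. */
static int float_near_ulp_sketch(float a, float b, unsigned max_ulp)
{
    int32_t ia, ib;

    /* Reinterpret the bit patterns without violating aliasing rules. */
    memcpy(&ia, &a, sizeof(ia));
    memcpy(&ib, &b, sizeof(ib));

    /* Values with different signs are only "near" if they compare equal,
     * which covers +0.0 versus -0.0. */
    if ((ia < 0) != (ib < 0))
        return a == b;

    /* For same-sign floats, the integer distance between bit patterns
     * equals the distance in ULPs. */
    return llabs((int64_t)ia - (int64_t)ib) <= (long long)max_ulp;
}
```

Under this measure, a tolerance of 15 means the two results may differ by at most fifteen adjacent representable values.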
Benchmark Results (1024 iterations, Raspberry Pi 5 - Cortex-A76):
    add_int16_128_c: 914.0 ( 1.00x)
    add_int16_128_neon: 516.9 ( 1.77x)
    add_int16_rnd_width_c: 914.0 ( 1.00x)
    add_int16_rnd_width_neon: 517.5 ( 1.77x)

Co-Authored-By: Martin Storsjö <martin@martin.st>