Add BCn encoders/decoders with RDO support#1167
Conversation
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*Add encode/decode tests (that use both CompressBCn/DecodeBCn) *Add BCn ktx2 test files (transcoded from tests/resources/ktx2/color_grid_uastc_zstd_5.ktx2) *Cleanup BCn test fixtures *Remove `std::cout` statement Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
|
There are also a lot of compiler warnings from bc7enc_rdo dependency. These should be straightforward to address directly in copied files from bc7enc_rdo. |
*Add BC1, BC3, BC4, BC5, and BC7 encoding support to "ktx encode" command. *Cleanup ktxBCnParams and add BC1/BC3 quality and mode params. *Add docstrings/documentation to newly added enums/structs in ktx.h. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
|
@walcht, Thank you for this. Can you view the build logs? One issue I notice immediately is that there are no changes to Regarding RDO and multi-threading, the UASTC encoder also has separate multi-threading options for the encoder and for the RDO step. You can follow the same model. If we need to fix warnings in the encoder I suggest forking the encoder then incorporating the fork here by way of Re. SIMD and ISPC, be careful how you support this. We need to support building and running on arm64 processors. Also compile flags to enable SSE or other SIMD options are not compatible with straightforward use of universal build tool chains, which is why this project does not do universal builds. Better is use of compiler pre-defined macros and run-time queries to discover what the software is being compiled for and running on. However since we aren't doing universal builds there is no need to obsess over this last detail. |
|
@MarkCallow - Concerning the Concerning the build/CI logs: they are mostly failing because of bc7enc_rdo compiler warnings which should be suppressed. They will also fail because I haven't re-generated the golden files yet for ktx CTS (e.g.,
Please do so (as far as I understood, this will be forked under the KhronosGroup and any updates here will be pushed there via the Concerning SIMD and ISPC: I think it makes sense to leave this for another PR, do you agree? (reasoning is this: I have to get the basics working properly, add proper testsuite that covers all supported BCn formats, etc. Once that is done, I can open another PR for SIMD performance improvements). Or I will leave this to very end (last in TODO list above). |
*Suppress bc7enc_rdo warnings like unused-variables, memset'ing a non-trivial class (in this case the class is obviously trivial hence a void* cast is used to suppress this warning). Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*Some CIs report further unused variables/functions that are not reported when building locally. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
|
Updates about CI jobs:
|
|
You will need to rebase to or merge current main to get the fix for the NSIS issue on Windows. I have created a fork of https://github.com/richgel999/bc7enc_rdo. To incorporate it in your add-BCn-decoder branch do the following in the repo root directory: You can make changes in this subdirectory, as you are now, and when everything is working I can push the changes to the fork.
I agree. I wanted to make you aware that arm64 is a build target. Re. reuse, you have to add an entry to REUSE.toml to get external/bc7enc_rdo ignored. It is better to do that than add SPDX comments to all the files. The entry in REUSE.toml will have to mention a license. Use the MIT license option. Here are a few high level points.
|
|
Re the macOS build failure, because the output from the Xcode build is so voluminous it is run through a script, xcpretty, to prettify it. On a past CI service, without this, the logs exceeded the maximum allowed. Recently, for reasons I have yet to investigate, it has started swallowing compile errors. You can turn it off by editing scripts/install_macos.sh and commenting out the line that installs |
*Add BCn encoder support for `ktx create` command. *Add missing tests for `ktx encode`, `ktx extract`, and `ktx create`. *Expose BC1/BC3 approximation mode option to ktx CLIs *Misc cleanups (still early-stage PR) Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Agree. I wasn't initially aware of this. Apparently bc7f is significantly faster that the bc7 encoder used here (also the added benefit of being continuously maintained). I will integrate this right now since this seems to be straightforward (bc7enc_rdo also seems a bit not-longer-maintained so l think it's better to just integrate it now rather than waiting for it to be integrated into bc7enc_rdo repo). Concerning the decoding API: I added BCn decoders for VkUpload/GLUpload as a TODO (will follow same API as in etcunpack). Might also open a PR to add it for ASTC since I have already spent some time getting familiar with this code base.
Will add it in this PR. Will add it at the very end though since I have to finalize current formats.
I will try to address CI issues now (It is fine if I incrementally commit here to see if certain jobs pass?). |
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
…Group/bc7enc_rdo.git external/bc7enc_rdo subrepo: subdir: "external/bc7enc_rdo" merged: "dbe416d2" upstream: origin: "https://github.com/KhronosGroup/bc7enc_rdo.git" branch: "changes_for_ktx" commit: "dbe416d2" git-subrepo: version: "0.4.9" origin: "https://github.com/ingydotnet/git-subrepo" commit: "5e0f401"
*Before this commit, bc7enc_rdo dependency was manually copied to external/bc7enc_rdo directory (only needed files were copied). This was not ideal for a lot of reasons (mainly that we are introducing changes that may be streamed back to the original repo and having a subrepo/submodule is better suited for that than manually copying dependency files). *Add bc7enc_rdo to REUSE.toml with MIT license. *Git ignore compile_commands.json file (used by clangd LSP) Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
|
The integrated basisu_transcoder is a bit outdated (actually, significantly) and doesn't contain |
|
I was not aware that bc7f is included in basisu_transcoder. By pure coincidence I have just completed integration of Basis Universal release 2.1.0. See the Neither the KTX-Software code nor our golden files required any updates for our extensive test suite to pass with BU 2.1.0. I am therefore amenable to merging it now but will have to discuss within the Khronos WG and can't make any promises. The single image decoders used by GLUpload/VkUpload should be exposed in the library API but must be independent of the ktxTexture* classes. Software reading a KTX file incrementally will find them useful. Regarding the current ETC decoder, please note that it does not have a recognized open source license so might not be suitable for your OIIO work. |
*Add initial RDO post processing step for BC1, BC3, and BC7 but without ultrasmooth blocks support see: https://richg42.blogspot.com/2021/02/updated-bc7encrdo-with-improved-smooth.html *Fix encoder in case input texture does not have multiple-of-4 dimensions. This was reading beyond std::vector size before this commit. *Add initial RDO params to BCnParams with verbose explanation/description comments. *Misc refactoring/adjustments. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
|
@MarkCallow - please don't approve the CI workflow yet (it will fail because I haven't updated the golden test files yet).
Will add bc7f once I finish RDO post processing. |
|
I have just merged PR #1170 so you will now find bc7f in One other thing re. |
*Number of components was set to 4 which is incorrect. Changed to 3 which is what BC1 input (i.e., RGB) expects. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
|
I have a question:
I will push some commits that finalize BCn encoder documentation and adds a lot of more tests that ensure:
I also added some option to CI scripts to print out compiler options flags. This is extremely important for us since I want to know if libktx + ktxtools are compiled used |
|
I also found out the issue why the runners fail with MT RDO (
This is why when omitting the Let me guess, your MacOS machine certainly has to have 12 physical threads (lscpu shows 12)? Otherwise this explanation is wrong. This is also incorrect in the original IMO, the work has to be distributed to RDO runners in a more clever way where:
I think this is also why UASTC RDO is producing non-deterministic output (minus some details about hashes and std::unordered_map, etc.). Might also investigate that. |
MarkCallow
left a comment
There was a problem hiding this comment.
I still have to review the changes under the tools directory. I want to submit these comments now because they become disassociated from their files when the files are updated. I have had to make the same comments 3 times in some cases while they have been pending.
Lots of nitpicks. Nothing major. Thanks for your perseverance through all this.
I don't think we are doing it with UASTC LDR. But it does make sense. Please add it for both.
I think it is only set for the basisu_encoder target. I didn't see those options in the review I've conducted so far. Who is "us" in this context? |
Fantastic. Congratulations.
Well lscpu is not available in macOS, but
This is almost certainly the reason why UASTC RDO too produces non-deterministic results. The std::unordered_map issue led to non-determinism in encoding across platforms due to differing implementations. I think the basisu code has been fixed to use its own map. Certainly my tests show deterministic results across platforms for all but one very edge case and any case using zlib supercompression. |
|
Thanks for the detailed comments/reviews. I will address this after I push the commit that fixes RDO multithreading.
Typo from my part. I meant "important for this PR" so that I know whether to watch out for UBs caused by pointer castings then dereferences. I will figure this out at the very end after I address all other issues/reviews.
If true then a command like this Also, another funny observation, if you do that (ktx create with different --threads) for typical-sized images (e.g., 1920x1080 and definitely not something extremely small like 16x16), on a typical system (e.g., 12 threads) you will probably get deterministic output. |
*Previously, RDO MT was non deterministic in the sense that a ktx create/encode with --threads 4 (or any >1) may not necessarily produce the same output as --threads 1. Not only that, the higher the --threads parameter and the lower the size of the image (in number of blocks) the more likely that the actually used RDO window size will be lower than the one that is provided! This commit addresses this by potentially using a lower-than-requested thread count to make sure that RDO MT produces exact same output as RDO ST. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*Why did I even add that in the first place... Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
|
There are still some reviews/comments I haven't addressed yet. Before doing the final (re-)review:
|
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
MarkCallow
left a comment
There was a problem hiding this comment.
Thanks for the changes. These comments are from reviewing the changes. I have yet to review the tools changes.
Regarding clang_format, we haven't turned it on it. Before I do I want to create a smudge/clean filter that will apply the formatting on commit. (The smudge filter would be a nop.) I have not had the time. If you are already using clang_format in your workflow I'd love to hear how you do it.
| VkFormat decompressedVkFormat; | ||
| KTX_error_code result; | ||
|
|
||
| if (This->supercompressionScheme != KTX_SS_NONE) |
There was a problem hiding this comment.
No. You will have to load the data. Since there is no KTX_TEXTURE_CREATE_SKIP_INFLATE flag if the data has been loaded, it will have been inflated and the SS changed. If you see an SS other than NONE, you will need to check for a pending load. If available, do it. If not raise an error. Check the ASTC decoder for an example.
|
I forgot to push my last commit before your view. Will push it shortly and will request some reviews to some changes I made. |
If you want I can open a PR for that (+ build step clang-format formatting check).
I have it set up via Vim/NeoVim. On some custom key press I set, it just invokes the |
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
That would be wonderful but let's wait until we get the BCn related stuff finished. There is already a workflow to run clang-format but the calls to it in other workflows are commented out. This is because several changes are needed in the formatting for some minor tweaks and to avoid breaking some files. We had a contractor who put in place these format-related items but the contract came to an end before the various issues could be addressed. I will have to find the closed PR where I listed the issues and refresh my memory.
Thanks for the this. I am a Vim user too. Is this a custom thing you did or is it a standard Vim extension? |
Agree. Either way, adding clang-format checks to CI should be fairly simple.
Only using built-in commands/cli-invocations. I'm mostly using NeoVim but this should also work on Vim: To format a selection of lines:
Or for formatting the whole file (this is what I usually do, then I revert hunks that I didn't change):
|
|
While adding actual robust testing to non-CTS tests (i.e., don't just check that decoding worked fine but compare the underlying data with original input within 0.08 tolerance), I found 2 more issues:
|
*Without this, ktxdiff fails with "Mismatching header" on RGB16 SFLOAT input. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*So that tests can compare decoded BCn KTX2 files with original files without failing because of different KVD. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*Previously these unorm PNG BCn inputs had sRGB colorspace and were converted via --assign-tf in ktx create calls. *Remove (i.e., set to 0) unused channels (e.g., for rg8_unorm_bc5, set B to 0 using oiiotool). See genktx2 for exact command. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*Add option to write failed ktxdiff BCn decode tests to temporary location (in build/Testing/Temporary/GeneratedFiles/). *Update CTS submodule to always use supercompression in tests when RDO is used. *Significantly improve BCn decoder and encoder testing via differential testing with UASTC -> BCn transcode files. Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
As discussed in #1159:
ktxTexture2_CompressBCnandktxTexture2_DecodeBCnare introduced in this PR to allow libktx users/consumers to encode/decode BCn textures from/to raw decompressed formats.https://github.com/richgel999/bc7enc_rdo does not support BC6HU/BC6HS encoding/decoding and also no BC2 (this format is essentially dead since BC3 replaces it).
Please feel free to give feedback, edit, and nitpick as much as possible.
Some context: I am adding KTX2 support to OIIO (PR: AcademySoftwareFoundation/OpenImageIO#5185) and having libktx encode/decode BCn formats significantly simplifies things (also ETC encoding/decoding which I can also open a PR for - if approved).
I haven't updated the KTX-Software-CTS with the added BCn test files.
Once this is finalized, this will fix #587.
Current TODOs:
[x] BC2 mode RDOBC2 is rarely used (I don't see any reason why to use it instead of BC3). Postponed or not planned to be implemented at all.BC6H mode RDO (HDR). I don't know if this is possible. If not, then I will not adjust the ert::reduce_entropy function for the moment.postponed (too much work/overload to include in this PR).ktx encodecommandktx extractcommandktx createcommandktxBCnParamsstruct (apparently there are many and I am not qualified to know which subset to expose or to expose them all). For the moment I will just expose them all.We agreed on using same parameters as UASTC RDO and adding additional ones if they make sense (I haven't benchmarked skip 0 MSE error option so I might remove it if it useless).
enable SIMD acceleration for BC7 encoder (using ISPC - see https://github.com/ispc/ispc) (this should be straightforward and should be enabled by default via a CMake flag; e.g., BC7_SIMD).=> since we are planning to use bc7f we will be planning to use bc7g once it is released (this is the SIMD equivalent of bc7f).LIBKTX_FEATURE_BCN_DECODERCMake flag optioncopied from MIT Licensed https://github.com/iOrange/bcdecfrom basisu)[x] add BC2 encoderpostponed/aborted (rarely used).Note1: no LLMs/AI coding tools were used in any capacity whatsoever in writing or aiding in the writing of this PR.
Note2: I am an individual contributor (main reason I am contributing here is to add support for KTX2 in Blender).
Edit: TODO list edits