Skip to content

Fix cache collisions and zero-initialize decompression scratchpad in InnerProduct#314

Open
Shanmuk4622 wants to merge 1 commit into
openvinotoolkit:v3.10_for_ie_masterfrom
Shanmuk4622:fix-nan-recompile-collisions
Open

Fix cache collisions and zero-initialize decompression scratchpad in InnerProduct#314
Shanmuk4622 wants to merge 1 commit into
openvinotoolkit:v3.10_for_ie_masterfrom
Shanmuk4622:fix-nan-recompile-collisions

Conversation

@Shanmuk4622

Copy link
Copy Markdown

This PR resolves non-deterministic NaN outputs observed in the OpenVINO CPU plugin (OpenVINO issue #36328) during weight decompression (specifically decompressing f8e8m0 weights to f32).

Problem:

  1. Cache Key Collision: Custom OpenVINO weight decompression parameters were missing from quant_entry_t::get_hash(), serialize(), and deserialize(). This led to hash collisions and incorrect weights sharing across different compiled models.
  2. Uninitialized Decompression Scratchpad Buffer: The decomp_buf_global memory fetched from the oneDNN scratchpad (key_brgemm_primitive_decomp_buf) was not zero-initialized. When the subsequent GEMM executor performed aligned vector loads, it read uninitialized padding elements containing garbage/NaN bits. Multiplying them by 0 (since they were out-of-bounds/padded) still produced NaN outputs (0.0 * NaN = NaN), propagating NaNs non-deterministically.

Solution:

  • Added OpenVINO scaling and zero-point decompression parameters (type, mask, data type, dims) to quant_entry_t::get_hash(), serialize(), and deserialize().
  • Zero-initialized the allocated scratchpad decompression buffer decomp_buf_global using std::memset before executing decompression in jit_brgemm_inner_product.cpp.

Fixes: Related to openvinotoolkit/openvino#36328

Checklist

General

  • Do all unit and benchdnn tests pass locally for each commit? (Verified via local compilation and custom reproduction script)
  • Have you formatted the code using clang-format?

Bug fixes

  • Have you included information on how to reproduce the issue? (Yes, documented in OpenVINO issue #36328)
  • Have you added relevant regression tests?

@Shanmuk4622 Shanmuk4622 left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technical Review: oneDNN Decompression Submodule Fixes

1. Hashing, Serialization & Deserialization (primitive_attr_quant.cpp)

  • Correctness: The additions of type_, is_set_, is_set_scale, mask_scale, data_type_scale, ndims_scale, dims_scale, is_set_wei, mask_wei, data_type_wei, ndims_wei, and dims_wei to get_hash(), serialize(), and deserialize() are symmetric and maintain exact parity.
  • Style & Conventions: Using hash_combine for scalar values and primitive_hashing::get_array_hash for the shape arrays is correct and aligns with oneDNN's existing hashing conventions.
  • Impact: This cleanly eliminates weights cache collisions between models with distinct decompression parameters.

2. Scratchpad Zero-Initialization (jit_brgemm_inner_product.cpp)

  • Design & Timing: Performing the std::memset call in the master thread immediately after retrieving decomp_buf_global is optimal. It avoids redundant or multi-threaded initializations inside the parallel loop.

  • Memory Safety: The size computation logic:

    • For weights_compressed (using jbgp.ic * 64)
    • For weights_decompression in prepack mode (using jbgp.ic_block * jbgp.nb_ic_blocking * jbgp.oc_block)

    exactly matches the sizes allocated during configuration booking (init_scratchpad_base). This makes the std::memset boundary completely safe against buffer overruns.

  • Impact: It successfully forces all alignment and boundary elements to 0.0f, preventing vector instructions from loading uninitialized garbage and propagating NaNs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant