lowmem: Unpack z lazily in verification #1025

Open
mkannwischer wants to merge 4 commits into main from verify-stream-z

@mkannwischer
Contributor

Introduce mld_zvec following the lazy polyvec pattern (eager / lazy
variants with #define dispatch on MLD_CONFIG_REDUCE_RAM):

  • mld_zvec_init: in eager mode unpacks the full polyvecl, performs the
    polyvecl-wide infinity-norm bound check, and NTTs in place. In lazy
    mode it just stores a pointer to the packed signature bytes.
  • mld_zvec_get_poly: in eager mode copies a single polynomial from
    the precomputed vector. In lazy mode unpacks one polynomial,
    performs the per-poly infinity-norm bound check, and NTTs into
    the caller-provided buffer.

The norm check thus moves out of mld_sign_verify_internal into the
zvec init / get_poly accessors, so the verify body no longer has to
sequence chknorm explicitly.
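
The eager / lazy split described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `toy_` names, the one-byte-per-coefficient packing, the small parameters, the `TOY_REDUCE_RAM` macro, and the error-return convention are assumptions made for illustration and are not the actual mld_zvec API. The NTT is replaced by a placeholder that doubles each coefficient.

```c
#include <stdint.h>

/* Toy parameters for illustration only, not the ML-DSA values. */
#define TOY_N 4       /* coefficients per polynomial */
#define TOY_L 3       /* polynomials in z */
#define TOY_BOUND 100 /* accept only |coeff| < TOY_BOUND */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Hypothetical packing: one byte per coefficient, offset by 128. */
static void toy_unpack(toy_poly *p, const uint8_t *packed)
{
  for (int i = 0; i < TOY_N; i++) p->coeffs[i] = (int32_t)packed[i] - 128;
}

/* Returns 1 if some coefficient violates the infinity-norm bound. */
static int toy_chknorm(const toy_poly *p)
{
  for (int i = 0; i < TOY_N; i++)
    if (p->coeffs[i] >= TOY_BOUND || p->coeffs[i] <= -TOY_BOUND) return 1;
  return 0;
}

/* Placeholder for the NTT (just doubles each coefficient). */
static void toy_ntt(toy_poly *p)
{
  for (int i = 0; i < TOY_N; i++) p->coeffs[i] *= 2;
}

#ifdef TOY_REDUCE_RAM
/* Lazy: keep only a pointer; unpack, check, and transform on access. */
typedef struct { const uint8_t *packed; } toy_zvec;

static int toy_zvec_init(toy_zvec *z, const uint8_t *packed)
{
  z->packed = packed;
  return 0;
}

static int toy_zvec_get_poly(toy_poly *out, const toy_zvec *z, unsigned l)
{
  toy_unpack(out, z->packed + l * TOY_N);
  if (toy_chknorm(out)) return -1;
  toy_ntt(out);
  return 0;
}
#else
/* Eager: unpack, check, and transform everything up front. */
typedef struct { toy_poly vec[TOY_L]; } toy_zvec;

static int toy_zvec_init(toy_zvec *z, const uint8_t *packed)
{
  for (unsigned l = 0; l < TOY_L; l++) {
    toy_unpack(&z->vec[l], packed + l * TOY_N);
    if (toy_chknorm(&z->vec[l])) return -1;
  }
  for (unsigned l = 0; l < TOY_L; l++) toy_ntt(&z->vec[l]);
  return 0;
}

static int toy_zvec_get_poly(toy_poly *out, const toy_zvec *z, unsigned l)
{
  *out = z->vec[l];
  return 0;
}
#endif

/* Caller code is identical in both modes; returns 0 if all of z is valid. */
static int toy_load_z(toy_poly out[TOY_L], const uint8_t *packed)
{
  toy_zvec z;
  if (toy_zvec_init(&z, packed)) return -1;
  for (unsigned l = 0; l < TOY_L; l++)
    if (toy_zvec_get_poly(&out[l], &z, l)) return -1;
  return 0;
}
```

In this sketch the caller never invokes the norm check itself: a violation surfaces from the init call in eager mode and from the accessor in lazy mode, and the lazy struct holds a single pointer instead of TOY_L polynomials.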

Add a fused matrix-vector helper
mld_polyvec_matrix_pointwise_montgomery_zvec used by verify:

  • The eager variant is a thin wrapper around the existing
    mld_polyvec_matrix_pointwise_montgomery_eager (z is already NTT'd
    by mld_zvec_init).
  • The lazy variant streams z via mld_zvec_get_poly_lazy and generates
    the matrix on the fly, column by column, accumulating
    A[*,l] * z[l] into w.

In REDUCE_RAM mode this avoids holding the full unpacked polyvecl z
in memory at once, reducing verify allocation by 2-5 KiB, depending
on the parameter set.
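
The column-by-column accumulation can be sketched as follows, under simplifying assumptions: scalars modulo q stand in for polynomials, `toy_gen_a` is a hypothetical stand-in for sampling a matrix entry from the seed, and `toy_get_z` is a hypothetical stand-in for the per-polynomial z accessor. None of these names exist in the code base.

```c
#include <stdint.h>

#define TOY_K 3
#define TOY_L 2
#define TOY_Q 8380417 /* the ML-DSA modulus */

/* Hypothetical stand-in for sampling A[k][l] on demand. */
static int64_t toy_gen_a(unsigned k, unsigned l)
{
  return (int64_t)(7 * k + 3 * l + 1);
}

/* Hypothetical stand-in for the accessor that yields one z entry. */
static int64_t toy_get_z(const int32_t *z_src, unsigned l)
{
  return z_src[l];
}

/* Column-wise: for each l, fetch z[l] once and add A[*,l] * z[l] to w.
 * Only one entry of z and one entry of A are live at a time. */
static void toy_matvec_streamed(int64_t w[TOY_K], const int32_t *z_src)
{
  for (unsigned k = 0; k < TOY_K; k++) w[k] = 0;
  for (unsigned l = 0; l < TOY_L; l++) {
    int64_t zl = toy_get_z(z_src, l);
    for (unsigned k = 0; k < TOY_K; k++)
      w[k] = (w[k] + toy_gen_a(k, l) * zl) % TOY_Q;
  }
}

/* Row-wise reference: w[k] = sum over l of A[k][l] * z[l].
 * This order needs every z[l] available for every row. */
static void toy_matvec_rowwise(int64_t w[TOY_K], const int32_t z[TOY_L])
{
  for (unsigned k = 0; k < TOY_K; k++) {
    int64_t acc = 0;
    for (unsigned l = 0; l < TOY_L; l++)
      acc = (acc + toy_gen_a(k, l) * z[l]) % TOY_Q;
    w[k] = acc;
  }
}
```

Both loop orders compute the same product; the column-wise order is what lets each z[l] be unpacked once, used for all K rows, and then discarded.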

Instead of allocating a full polyveck for h in attempt_signature_generation,
compute cs2, ct0, and hints one polynomial at a time using scratch polys.

This eliminates the polyveck h from the yh_u union, replacing
mld_pack_sig_c_h with incremental packing via mld_pack_sig_c and
mld_pack_sig_h_poly.
This is a prerequisite for also eliminating y in a follow-up.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
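
Incremental hint packing, one polynomial at a time, might look like the sketch below. It follows the hint layout of the ML-DSA signature encoding (the positions of nonzero hints, followed by one running-count byte per polynomial), but the function name, its signature, and the toy parameters are assumptions and do not reflect the actual mld_pack_sig_h_poly interface.

```c
#include <stdint.h>

/* Toy parameters for illustration only, not the ML-DSA values. */
#define TOY_N 8     /* coefficients per polynomial */
#define TOY_K 2     /* polynomials in h */
#define TOY_OMEGA 4 /* maximum number of nonzero hints overall */

/* Packs the hints of polynomial i into hbytes.
 *
 * hbytes: TOY_OMEGA position bytes followed by TOY_K count bytes;
 *         the caller zero-initializes it before the first call.
 * h:      hint bits of polynomial i (0 or 1 per coefficient).
 * count:  running number of hints written so far, carried between calls.
 *
 * Returns 0 on success, -1 if the hint budget would be exceeded. */
static int toy_pack_h_poly(uint8_t hbytes[TOY_OMEGA + TOY_K],
                           const uint8_t h[TOY_N], unsigned i,
                           unsigned *count)
{
  for (unsigned j = 0; j < TOY_N; j++) {
    if (h[j] != 0) {
      if (*count == TOY_OMEGA) return -1;
      hbytes[(*count)++] = (uint8_t)j; /* record position of the hint */
    }
  }
  /* Running total after polynomial i delimits its positions. */
  hbytes[TOY_OMEGA + i] = (uint8_t)*count;
  return 0;
}
```

Because the only state carried between calls is the running count, the caller can compute the hint polynomial for row i in a scratch buffer, pack it, and reuse the buffer for row i + 1.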
Replace the row-level matrix buffer (mld_polyvecl) with a single-poly
buffer in REDUCE_RAM mode. In the lazy path, matrix elements A[k][l]
are sampled on demand one at a time, and the matrix-vector product
accumulates element-by-element instead of row-by-row.

Restructure polymat into eager/lazy variants following the same pattern
as s1hat/s2hat/t0hat:
- mld_polymat_eager: stores full K x L matrix
- mld_polymat_lazy: stores rho + single poly_buffer + tmp
- mld_polyvec_matrix_expand_eager/_lazy: separate implementations
- mld_polyvec_matrix_pointwise_montgomery_eager/_lazy: separate
  implementations with CBMC contracts only on the eager variants

Move all polymat-related code from polyvec.h/polyvec.c into
polyvec_lazy.h/polyvec_lazy.c.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
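
The difference between the eager and lazy matrix representations can be sketched with two structs. This is a rough illustration, not the actual mld_polymat_eager / mld_polymat_lazy definitions: the `toy_` names are hypothetical, `tmp` is assumed to be a single polynomial (the commit message does not say), and `toy_sample` is a deterministic stand-in for sampling A[k][l] from rho.

```c
#include <stdint.h>

#define TOY_N 256        /* coefficients per polynomial */
#define TOY_K 4          /* ML-DSA-44 matrix dimensions */
#define TOY_L 4
#define TOY_SEEDBYTES 32 /* length of rho */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Eager: the full K x L matrix is expanded and stored. */
typedef struct { toy_poly mat[TOY_K][TOY_L]; } toy_polymat_eager;

/* Lazy: only the seed and scratch space are stored. */
typedef struct {
  uint8_t rho[TOY_SEEDBYTES];
  toy_poly poly_buffer; /* holds the most recently sampled A[k][l] */
  toy_poly tmp;         /* assumption: one scratch polynomial */
} toy_polymat_lazy;

/* Hypothetical stand-in for sampling A[k][l] from rho. */
static void toy_sample(toy_poly *out, const uint8_t *rho, unsigned k,
                       unsigned l)
{
  for (unsigned i = 0; i < TOY_N; i++)
    out->coeffs[i] =
        (int32_t)(rho[i % TOY_SEEDBYTES] + 1000 * k + 10 * l + i);
}

/* Eager expansion: sample every entry once, up front. */
static void toy_polymat_expand_eager(toy_polymat_eager *m,
                                     const uint8_t *rho)
{
  for (unsigned k = 0; k < TOY_K; k++)
    for (unsigned l = 0; l < TOY_L; l++) toy_sample(&m->mat[k][l], rho, k, l);
}

/* Lazy access: re-sample the requested entry into the single buffer.
 * The returned pointer is invalidated by the next call. */
static const toy_poly *toy_polymat_get_lazy(toy_polymat_lazy *m, unsigned k,
                                            unsigned l)
{
  toy_sample(&m->poly_buffer, m->rho, k, l);
  return &m->poly_buffer;
}
```

With these toy definitions the eager struct holds 16 polynomials while the lazy struct holds two plus the seed, which is the trade being made: each entry is regenerated on demand in exchange for a much smaller buffer.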
@oqs-bot
Contributor

oqs-bot commented Apr 7, 2026

CBMC Results (ML-DSA-44)

⚠️ Attention Required

Proof Status Current Previous Change
polyvecl_pointwise_acc_montgomery_c ⚠️ 332s 214s +55%
Full Results (182 proofs)
Proof Status Current Previous Change
**TOTAL** 2216s 2059s +7.6%
polyvecl_pointwise_acc_montgomery_c ⚠️ 332s 214s +55%
sign_verify_internal 268s 184s +46%
mld_attempt_signature_generation 216s 255s -15%
poly_pointwise_montgomery_c 164s 163s +1%
rej_uniform_native 147s 148s -1%
mld_invntt_layer 84s 88s -5%
mld_ct_memcmp 73s 72s +1%
mld_ntt_layer 56s 56s +0%
fqmul 22s 19s +16%
polymat_permute_bitrev_to_custom 22s 17s +29%
rej_uniform 22s 23s -4%
poly_chknorm_c 21s 21s +0%
sign_signature_internal 19s 19s +0%
polyeta_unpack 17s 16s +6%
polyvec_matrix_expand_eager 17s - new
poly_uniform_4x 15s 17s -12%
poly_uniform_eta_4x 15s 18s -17%
mld_compute_t0_t1_tr_from_sk_components 14s 13s +8%
poly_add 14s 11s +27%
rej_uniform_c 14s 15s -7%
keccakf1600x4_permute_native 13s 12s +8%
polyt0_unpack 13s 14s -7%
polyvec_matrix_expand_eager_serial 13s - new
mld_ntt_butterfly_block 12s 16s -25%
polyz_unpack_c 11s 12s -8%
sign_pk_from_sk 11s 7s +57%
mld_check_pct 10s 7s +43%
mld_polyvecl_permute_bitrev_to_custom_native 10s 8s +25%
keccak_absorb_once_x4 9s 9s +0%
keccakf1600_permute 9s 9s +0%
keccakf1600_permute_native 9s 10s -10%
polyvec_matrix_pointwise_montgomery_eager 9s - new
polyveck_use_hint 9s 8s +12%
polyvecl_chknorm 9s 6s +50%
poly_caddq_c 8s 4s +100%
polyveck_decompose 8s 5s +60%
unpack_sk 8s 7s +14%
keccak_absorb 7s 6s +17%
keccak_squeezeblocks_x4 7s 7s +0%
polyveck_sub 7s 4s +75%
sign 7s 6s +17%
sign_verify 7s 4s +75%
unpack_hints 7s 5s +40%
mld_compute_pack_z 6s 4s +50%
mld_h 6s 4s +50%
mld_sample_s1_s2 6s 4s +50%
pack_sig_h_poly 6s - new
poly_invntt_tomont_c 6s 7s -14%
poly_ntt 6s 2s +200%
polyveck_chknorm 6s 5s +20%
polyveck_power2round 6s 12s -50%
decompose 5s 4s +25%
mld_prepare_domain_separation_prefix 5s 3s +67%
poly_challenge 5s 6s -17%
poly_chknorm 5s 2s +150%
poly_chknorm_native 5s 5s +0%
poly_decompose 5s 3s +67%
poly_ntt_native 5s 3s +67%
poly_power2round 5s 5s +0%
poly_uniform_eta 5s 7s -29%
poly_use_hint_c 5s 6s -17%
polyt0_pack 5s 5s +0%
polyveck_pointwise_poly_montgomery 5s 4s +25%
polyw1_pack 5s 4s +25%
sign_open 5s 4s +25%
sign_verify_extmu 5s 3s +67%
caddq 4s 2s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 1s +300%
keccak_squeeze 4s 3s +33%
mld_ct_get_optblocker_u8 4s 1s +300%
mld_sample_s1_s2_serial 4s 6s -33%
montgomery_reduce 4s 3s +33%
ntt_native_aarch64 4s 5s -20%
poly_invntt_tomont 4s 4s +0%
poly_make_hint 4s 3s +33%
poly_ntt_c 4s 4s +0%
poly_pointwise_montgomery_native 4s 3s +33%
poly_reduce 4s 3s +33%
poly_sub 4s 4s +0%
poly_uniform 4s 4s +0%
poly_uniform_gamma1 4s 3s +33%
polyveck_add 4s 7s -43%
polyveck_caddq 4s 5s -20%
polyveck_invntt_tomont 4s 6s -33%
polyveck_ntt 4s 4s +0%
polyveck_shiftl 4s 7s -43%
polyvecl_ntt 4s 4s +0%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyz_unpack 4s 2s +100%
power2round 4s 4s +0%
shake128_absorb 4s 2s +100%
shake128x4_squeezeblocks 4s 1s +300%
sign_keypair_internal 4s 6s -33%
sign_signature 4s 6s -33%
sign_verify_pre_hash_internal 4s 5s -20%
intt_native_x86_64 3s 1s +200%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_init 3s 2s +50%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
make_hint 3s 2s +50%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 3s +0%
ntt_native_x86_64 3s 2s +50%
pack_pk 3s 2s +50%
pack_sig_c 3s - new
pack_sig_z 3s 3s +0%
pack_sk 3s 2s +50%
poly_caddq_native 3s 2s +50%
poly_chknorm_native_aarch64 3s 3s +0%
poly_invntt_tomont_native 3s 3s +0%
poly_uniform_gamma1_4x 3s 4s -25%
poly_use_hint 3s 3s +0%
polyveck_reduce 3s 4s -25%
polyveck_unpack_eta 3s 4s -25%
polyvecl_permute_bitrev_to_custom 3s 1s +200%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_unpack_z 3s 2s +50%
polyz_pack 3s 3s +0%
reduce32 3s 2s +50%
rej_eta 3s 2s +50%
rej_eta_c 3s 4s -25%
rej_eta_native 3s 5s -40%
shake128_finalize 3s 3s +0%
shake128_init 3s 1s +200%
shake128_squeeze 3s 3s +0%
shake128x4_absorb_once 3s 1s +200%
shake256_finalize 3s 2s +50%
shake256_squeeze 3s 2s +50%
shake256x4_squeezeblocks 3s 3s +0%
sign_keypair 3s 4s -25%
sign_signature_extmu 3s 3s +0%
sign_signature_pre_hash_internal 3s 3s +0%
sign_signature_pre_hash_shake256 3s 6s -50%
sign_verify_pre_hash_shake256 3s 4s -25%
sys_check_capability 3s 3s +0%
fqscale 2s 3s -33%
keccak_finalize 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600_xor_bytes 2s 1s +100%
keccakf1600x4_extract_bytes 2s 2s +0%
keccakf1600x4_permute 2s 4s -50%
keccakf1600x4_xor_bytes 2s 3s -33%
mld_ct_abs_i32 2s 3s -33%
mld_ct_get_optblocker_i64 2s 4s -50%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_i64 2s 1s +100%
mld_value_barrier_u8 2s 4s -50%
poly_caddq 2s 3s -33%
poly_decompose_c 2s 5s -60%
poly_decompose_native 2s 2s +0%
poly_pointwise_montgomery 2s 2s +0%
poly_shiftl 2s 2s +0%
poly_use_hint_native 2s 5s -60%
polyeta_pack 2s 1s +100%
polyt1_pack 2s 3s -33%
polyt1_unpack 2s 4s -50%
polyvec_matrix_pointwise_montgomery_zvec_eager 2s - new
polyveck_pack_eta 2s 3s -33%
polyveck_unpack_t0 2s 5s -60%
polyvecl_pack_eta 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 2s +0%
polyvecl_unpack_eta 2s 3s -33%
polyz_unpack_native 2s 3s -33%
shake128_release 2s 3s -33%
shake256 2s 2s +0%
shake256_absorb 2s 3s -33%
shake256_init 2s 2s +0%
shake256_release 2s 2s +0%
shake256x4_absorb_once 2s 3s -33%
unpack_pk 2s 4s -50%
keccak_f1600_x1_native_aarch64 1s 7s -86%
mld_ct_sel_int32 1s 3s -67%
mld_value_barrier_u32 1s 3s -67%
poly_caddq_native_aarch64 1s 3s -67%
polyveck_pack_t0 1s 3s -67%
polyveck_pack_w1 1s 3s -67%
use_hint 1s 4s -75%

@oqs-bot
Contributor

oqs-bot commented Apr 7, 2026

CBMC Results (ML-DSA-65)

⚠️ Attention Required

Proof Status Current Previous Change
polyvecl_pointwise_acc_montgomery_c ⚠️ 300s 167s +80%
Full Results (182 proofs)
Proof Status Current Previous Change
**TOTAL** 2345s 2314s +1.3%
polyvecl_pointwise_acc_montgomery_c ⚠️ 300s 167s +80%
sign_verify_internal 236s 319s -26%
mld_attempt_signature_generation 223s 269s -17%
poly_pointwise_montgomery_c 160s 142s +13%
rej_uniform_native 143s 145s -1%
mld_invntt_layer 99s 92s +8%
polyvec_matrix_expand_eager 99s - new
mld_ct_memcmp 78s 71s +10%
mld_ntt_layer 55s 52s +6%
polyvec_matrix_expand_eager_serial 54s - new
sign_signature_internal 33s 25s +32%
mld_compute_t0_t1_tr_from_sk_components 22s 24s -8%
rej_uniform 22s 20s +10%
poly_chknorm_c 20s 19s +5%
fqmul 18s 18s +0%
poly_uniform_eta_4x 17s 16s +6%
polyveck_decompose 17s 15s +13%
poly_uniform_4x 16s 17s -6%
polymat_permute_bitrev_to_custom 15s 27s -44%
polyt0_unpack 15s 13s +15%
polyveck_add 14s 9s +56%
rej_uniform_c 14s 14s +0%
keccakf1600x4_permute_native 12s 14s -14%
polyvec_matrix_pointwise_montgomery_eager 12s - new
polyveck_power2round 12s 12s +0%
polyvecl_chknorm 12s 12s +0%
mld_ntt_butterfly_block 11s 12s -8%
poly_add 11s 10s +10%
keccak_absorb_once_x4 10s 10s +0%
keccakf1600_permute 10s 8s +25%
keccakf1600_permute_native 9s 8s +12%
mld_check_pct 9s 10s -10%
mld_polyvecl_permute_bitrev_to_custom_native 9s 8s +12%
mld_compute_pack_z 8s 8s +0%
poly_decompose_c 8s 8s +0%
polyveck_caddq 8s 6s +33%
polyveck_ntt 8s 6s +33%
polyveck_pointwise_poly_montgomery 8s 6s +33%
polyveck_use_hint 8s 7s +14%
sign_open 8s 4s +100%
sign_pk_from_sk 8s 7s +14%
unpack_sk 8s 7s +14%
keccak_absorb 7s 9s -22%
keccak_squeezeblocks_x4 7s 6s +17%
mld_h 7s 5s +40%
mld_prepare_domain_separation_prefix 7s 6s +17%
poly_caddq_c 7s 7s +0%
polyveck_chknorm 7s 6s +17%
polyveck_invntt_tomont 7s 8s -12%
sign 7s 7s +0%
keccak_squeeze 6s 4s +50%
poly_invntt_tomont_c 6s 4s +50%
poly_power2round 6s 3s +100%
poly_reduce 6s 1s +500%
poly_uniform_eta 6s 3s +100%
poly_use_hint_c 6s 4s +50%
polyeta_unpack 6s 4s +50%
polyveck_shiftl 6s 9s -33%
polyveck_sub 6s 12s -50%
keccak_f1600_x4_native_aarch64_v84a 5s 1s +400%
mld_sample_s1_s2 5s 7s -29%
mld_sample_s1_s2_serial 5s 6s -17%
poly_invntt_tomont_native 5s 3s +67%
poly_sub 5s 4s +25%
polyt0_pack 5s 3s +67%
polyveck_reduce 5s 7s -29%
polyvecl_ntt 5s 8s -38%
rej_eta_native 5s 3s +67%
shake128_release 5s 2s +150%
shake256_squeeze 5s 3s +67%
sign_keypair_internal 5s 5s +0%
sign_verify 5s 6s -17%
sign_verify_pre_hash_internal 5s 4s +25%
sign_verify_pre_hash_shake256 5s 6s -17%
fqscale 4s 4s +0%
intt_native_x86_64 4s 3s +33%
keccakf1600_extract_bytes (big endian) 4s 1s +300%
mld_ct_sel_int32 4s 1s +300%
pack_pk 4s 3s +33%
pack_sig_h_poly 4s - new
pack_sk 4s 3s +33%
poly_caddq_native 4s 5s -20%
poly_challenge 4s 2s +100%
poly_pointwise_montgomery 4s 2s +100%
poly_pointwise_montgomery_native 4s 3s +33%
poly_uniform 4s 4s +0%
polyt1_unpack 4s 2s +100%
polyveck_pack_eta 4s 3s +33%
polyveck_pack_t0 4s 3s +33%
polyveck_unpack_t0 4s 4s +0%
polyvecl_permute_bitrev_to_custom 4s 5s -20%
polyvecl_pointwise_acc_montgomery_native 4s 5s -20%
polyz_unpack_c 4s 5s -20%
power2round 4s 1s +300%
shake128x4_absorb_once 4s 2s +100%
shake256_release 4s 3s +33%
sign_signature_extmu 4s 5s -20%
sign_signature_pre_hash_internal 4s 4s +0%
sign_signature_pre_hash_shake256 4s 2s +100%
unpack_hints 4s 5s -20%
decompose 3s 3s +0%
keccak_finalize 3s 3s +0%
keccakf1600x4_xor_bytes 3s 1s +200%
mld_ct_cmask_neg_i32 3s 1s +200%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_get_optblocker_i64 3s 2s +50%
poly_caddq 3s 1s +200%
poly_caddq_native_aarch64 3s 3s +0%
poly_chknorm 3s 4s -25%
poly_chknorm_native 3s 2s +50%
poly_chknorm_native_aarch64 3s 2s +50%
poly_decompose 3s 5s -40%
poly_invntt_tomont 3s 4s -25%
poly_ntt 3s 4s -25%
poly_ntt_c 3s 3s +0%
poly_ntt_native 3s 2s +50%
poly_shiftl 3s 3s +0%
polyeta_pack 3s 3s +0%
polyt1_pack 3s 3s +0%
polyvec_matrix_pointwise_montgomery_zvec_eager 3s - new
polyveck_pack_w1 3s 2s +50%
polyveck_unpack_eta 3s 2s +50%
polyvecl_pack_eta 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyz_pack 3s 2s +50%
polyz_unpack 3s 4s -25%
reduce32 3s 3s +0%
shake128_absorb 3s 3s +0%
shake128_init 3s 2s +50%
shake128_squeeze 3s 3s +0%
shake256_absorb 3s 2s +50%
shake256_finalize 3s 1s +200%
shake256x4_absorb_once 3s 2s +50%
shake256x4_squeezeblocks 3s 5s -40%
sign_keypair 3s 6s -50%
sign_signature 3s 2s +50%
sign_verify_extmu 3s 5s -40%
sys_check_capability 3s 1s +200%
unpack_pk 3s 2s +50%
caddq 2s 3s -33%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 5s -60%
keccak_init 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 3s -33%
keccakf1600x4_extract_bytes 2s 4s -50%
keccakf1600x4_permute 2s 1s +100%
make_hint 2s 5s -60%
mld_ct_abs_i32 2s 4s -50%
mld_ct_cmask_nonzero_u8 2s 1s +100%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_value_barrier_i64 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 2s +0%
ntt_native_aarch64 2s 6s -67%
ntt_native_x86_64 2s 4s -50%
pack_sig_c 2s - new
pack_sig_z 2s 1s +100%
poly_decompose_native 2s 4s -50%
poly_make_hint 2s 2s +0%
poly_uniform_gamma1 2s 4s -50%
poly_use_hint 2s 1s +100%
poly_use_hint_native 2s 2s +0%
polyvecl_pointwise_acc_montgomery 2s 3s -33%
polyvecl_uniform_gamma1 2s 2s +0%
polyvecl_unpack_eta 2s 4s -50%
polyw1_pack 2s 3s -33%
polyz_unpack_native 2s 2s +0%
rej_eta 2s 3s -33%
rej_eta_c 2s 4s -50%
shake128_finalize 2s 1s +100%
shake128x4_squeezeblocks 2s 2s +0%
shake256 2s 1s +100%
shake256_init 2s 4s -50%
use_hint 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 3s -67%
keccakf1600_xor_bytes 1s 5s -80%
mld_keccakf1600_extract_bytes 1s 1s +0%
mld_value_barrier_u32 1s 1s +0%
poly_uniform_gamma1_4x 1s 3s -67%
polyvecl_unpack_z 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented Apr 7, 2026

CBMC Results (ML-DSA-87)

Full Results (182 proofs)
Proof Status Current Previous Change
**TOTAL** 2210s 2628s -15.9%
sign_verify_internal 243s 221s +10%
mld_attempt_signature_generation 186s 238s -22%
polyvecl_pointwise_acc_montgomery_c 182s 303s -40%
polyvec_matrix_expand_eager 162s - new
rej_uniform_native 142s 154s -8%
poly_pointwise_montgomery_c 139s 175s -21%
mld_invntt_layer 92s 98s -6%
mld_ct_memcmp 65s 80s -19%
polyvec_matrix_expand_eager_serial 61s - new
mld_ntt_layer 52s 54s -4%
sign_signature_internal 35s 39s -10%
mld_compute_t0_t1_tr_from_sk_components 26s 26s +0%
polymat_permute_bitrev_to_custom 23s 47s -51%
fqmul 19s 23s -17%
rej_uniform 19s 21s -10%
poly_chknorm_c 18s 22s -18%
poly_uniform_eta_4x 18s 18s +0%
polyeta_unpack 16s 16s +0%
poly_uniform_4x 15s 15s +0%
keccakf1600x4_permute_native 14s 13s +8%
rej_uniform_c 14s 15s -7%
polyveck_power2round 13s 12s +8%
mld_ntt_butterfly_block 12s 14s -14%
polyt0_unpack 12s 15s -20%
poly_add 11s 13s -15%
polyveck_add 11s 10s +10%
polyveck_ntt 11s 7s +57%
keccak_absorb_once_x4 10s 10s +0%
keccakf1600_permute_native 10s 8s +25%
mld_polyvecl_permute_bitrev_to_custom_native 10s 13s -23%
mld_check_pct 9s 9s +0%
poly_invntt_tomont_c 9s 6s +50%
polyveck_decompose 9s 59s -85%
polyveck_shiftl 9s 5s +80%
polyvecl_ntt 9s 11s -18%
sign_open 9s 3s +200%
sign_pk_from_sk 9s 7s +29%
poly_decompose_c 8s 7s +14%
polyveck_caddq 8s 7s +14%
polyz_unpack_c 8s 11s -27%
sign_keypair_internal 8s 7s +14%
unpack_sk 8s 7s +14%
keccak_absorb 7s 6s +17%
keccakf1600_permute 7s 8s -12%
mld_sample_s1_s2 7s 8s -12%
poly_power2round 7s 6s +17%
sign 7s 7s +0%
keccak_squeezeblocks_x4 6s 5s +20%
mld_sample_s1_s2_serial 6s 7s -14%
poly_challenge 6s 5s +20%
poly_sub 6s 3s +100%
polyvec_matrix_pointwise_montgomery_eager 6s - new
polyveck_invntt_tomont 6s 4s +50%
polyveck_use_hint 6s 8s -25%
polyvecl_unpack_z 6s 3s +100%
shake256_squeeze 6s 2s +200%
sign_signature 6s 5s +20%
sign_verify_pre_hash_internal 6s 4s +50%
unpack_hints 6s 5s +20%
mld_compute_pack_z 5s 6s -17%
polyt0_pack 5s 4s +25%
polyveck_pointwise_poly_montgomery 5s 8s -38%
polyveck_sub 5s 8s -38%
polyveck_unpack_eta 5s 4s +25%
sign_signature_extmu 5s 4s +25%
sign_verify_extmu 5s 7s -29%
sign_verify_pre_hash_shake256 5s 3s +67%
intt_native_x86_64 4s 2s +100%
keccakf1600_extract_bytes (big endian) 4s 2s +100%
keccakf1600_xor_bytes (big endian) 4s 4s +0%
keccakf1600x4_permute 4s 1s +300%
make_hint 4s 2s +100%
mld_ct_cmask_neg_i32 4s 2s +100%
mld_ct_get_optblocker_u32 4s 3s +33%
mld_h 4s 5s -20%
mld_prepare_domain_separation_prefix 4s 3s +33%
pack_pk 4s 2s +100%
pack_sk 4s 3s +33%
poly_caddq_c 4s 6s -33%
poly_caddq_native 4s 2s +100%
poly_caddq_native_aarch64 4s 4s +0%
poly_chknorm 4s 3s +33%
poly_ntt_c 4s 5s -20%
poly_ntt_native 4s 2s +100%
poly_pointwise_montgomery 4s 5s -20%
poly_uniform_eta 4s 5s -20%
polyveck_reduce 4s 8s -50%
polyvecl_pack_eta 4s 4s +0%
polyvecl_uniform_gamma1 4s 4s +0%
polyvecl_unpack_eta 4s 4s +0%
polyw1_pack 4s 2s +100%
rej_eta_c 4s 4s +0%
shake256x4_squeezeblocks 4s 2s +100%
sign_signature_pre_hash_internal 4s 4s +0%
sign_signature_pre_hash_shake256 4s 6s -33%
sign_verify 4s 2s +100%
sys_check_capability 4s 4s +0%
caddq 3s 4s -25%
fqscale 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 3s +0%
keccak_init 3s 1s +200%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_cmask_nonzero_u8 3s 4s -25%
mld_ct_get_optblocker_u8 3s 4s -25%
pack_sig_c 3s - new
pack_sig_z 3s 3s +0%
poly_decompose_native 3s 3s +0%
poly_make_hint 3s 2s +50%
poly_ntt 3s 3s +0%
poly_uniform 3s 1s +200%
poly_uniform_gamma1 3s 4s -25%
poly_uniform_gamma1_4x 3s 5s -40%
poly_use_hint 3s 3s +0%
poly_use_hint_native 3s 4s -25%
polyt1_pack 3s 3s +0%
polyveck_chknorm 3s 5s -40%
polyveck_pack_eta 3s 5s -40%
polyveck_pack_t0 3s 5s -40%
polyveck_pack_w1 3s 4s -25%
polyveck_unpack_t0 3s 5s -40%
polyvecl_chknorm 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 2s +50%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyz_unpack_native 3s 4s -25%
power2round 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_finalize 3s 2s +50%
shake128_init 3s 2s +50%
shake128x4_absorb_once 3s 3s +0%
shake256 3s 2s +50%
shake256_finalize 3s 1s +200%
sign_keypair 3s 3s +0%
decompose 2s 5s -60%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_finalize 2s 3s -33%
keccak_squeeze 2s 4s -50%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600x4_extract_bytes 2s 5s -60%
keccakf1600x4_xor_bytes 2s 4s -50%
mld_ct_sel_int32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_i64 2s 2s +0%
ntt_native_aarch64 2s 3s -33%
ntt_native_x86_64 2s 3s -33%
pack_sig_h_poly 2s - new
poly_caddq 2s 3s -33%
poly_chknorm_native 2s 2s +0%
poly_chknorm_native_aarch64 2s 4s -50%
poly_decompose 2s 3s -33%
poly_invntt_tomont 2s 5s -60%
poly_invntt_tomont_native 2s 4s -50%
poly_pointwise_montgomery_native 2s 1s +100%
poly_reduce 2s 2s +0%
poly_shiftl 2s 2s +0%
poly_use_hint_c 2s 6s -67%
polyt1_unpack 2s 4s -50%
polyvec_matrix_pointwise_montgomery_zvec_eager 2s - new
polyvecl_permute_bitrev_to_custom 2s 4s -50%
polyz_pack 2s 4s -50%
polyz_unpack 2s 2s +0%
reduce32 2s 3s -33%
rej_eta 2s 2s +0%
shake128_release 2s 2s +0%
shake128_squeeze 2s 3s -33%
shake128x4_squeezeblocks 2s 3s -33%
shake256_absorb 2s 3s -33%
shake256_init 2s 3s -33%
shake256x4_absorb_once 2s 3s -33%
unpack_pk 2s 4s -50%
use_hint 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 1s 3s -67%
mld_ct_abs_i32 1s 3s -67%
mld_ct_get_optblocker_i64 1s 2s -50%
mld_value_barrier_u32 1s 3s -67%
mld_value_barrier_u8 1s 2s -50%
montgomery_reduce 1s 1s +0%
polyeta_pack 1s 2s -50%
rej_eta_native 1s 4s -75%
shake256_release 1s 4s -75%

@mkannwischer mkannwischer marked this pull request as ready for review April 7, 2026 08:01
@mkannwischer mkannwischer requested a review from a team as a code owner April 7, 2026 08:01