lowram: Stream matrix A element-by-element to reduce memory#1019

Open
mkannwischer wants to merge 2 commits into main from lowram-stream-a

Conversation

@mkannwischer
Contributor

Replace the row-level matrix buffer (mld_polyvecl) with a single-poly
buffer in REDUCE_RAM mode. In the lazy path, matrix elements A[k][l]
are sampled on demand one at a time, and the matrix-vector product
accumulates element-by-element instead of row-by-row.
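
The element-by-element accumulation can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in this PR: `toy_sample` is a hypothetical deterministic stand-in for the SHAKE128-based sampling of A[k][l] from rho, the multiplication is plain modular arithmetic rather than Montgomery arithmetic, and all `toy_` names and the 4 x 4 shape are chosen for illustration. It checks that a single-poly buffer yields the same product as a fully expanded matrix.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_N 256     /* coefficients per polynomial */
#define TOY_K 4       /* matrix rows (ML-DSA-44 shape) */
#define TOY_L 4       /* matrix columns */
#define TOY_Q 8380417 /* ML-DSA modulus */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* ASSUMPTION: toy stand-in for deriving A[k][l] from rho. The real code uses
 * SHAKE128-based rejection sampling; here it only has to be deterministic. */
static void toy_sample(toy_poly *a, const uint8_t rho[32], unsigned k, unsigned l)
{
  uint32_t s = 2166136261u;
  unsigned i;
  for (i = 0; i < 32; i++) s = (s ^ rho[i]) * 16777619u;
  s ^= (k << 8) | l;
  for (i = 0; i < TOY_N; i++) {
    s = s * 1664525u + 1013904223u;
    a->coeffs[i] = (int32_t)(s % TOY_Q);
  }
}

/* acc += a * b coefficient-wise mod Q (plain modular arithmetic, not Montgomery). */
static void toy_pointwise_acc(toy_poly *acc, const toy_poly *a, const toy_poly *b)
{
  unsigned i;
  for (i = 0; i < TOY_N; i++) {
    int64_t t;
    /* Inputs must be non-negative representatives below Q. */
    assert(a->coeffs[i] >= 0 && b->coeffs[i] >= 0 && acc->coeffs[i] >= 0);
    t = (int64_t)a->coeffs[i] * b->coeffs[i] + acc->coeffs[i];
    acc->coeffs[i] = (int32_t)(t % TOY_Q);
  }
}

/* Eager: expand all K*L polynomials first, then multiply. */
static void toy_matvec_eager(toy_poly w[TOY_K], const uint8_t rho[32],
                             const toy_poly v[TOY_L])
{
  static toy_poly mat[TOY_K][TOY_L];
  unsigned k, l;
  for (k = 0; k < TOY_K; k++)
    for (l = 0; l < TOY_L; l++) toy_sample(&mat[k][l], rho, k, l);
  for (k = 0; k < TOY_K; k++) {
    memset(&w[k], 0, sizeof w[k]);
    for (l = 0; l < TOY_L; l++) toy_pointwise_acc(&w[k], &mat[k][l], &v[l]);
  }
}

/* Lazy: regenerate each A[k][l] into one buffer right before it is used. */
static void toy_matvec_lazy(toy_poly w[TOY_K], const uint8_t rho[32],
                            const toy_poly v[TOY_L])
{
  toy_poly buf; /* the only matrix storage alive at any time */
  unsigned k, l;
  for (k = 0; k < TOY_K; k++) {
    memset(&w[k], 0, sizeof w[k]);
    for (l = 0; l < TOY_L; l++) {
      toy_sample(&buf, rho, k, l);
      toy_pointwise_acc(&w[k], &buf, &v[l]);
    }
  }
}

/* Returns 1 if both strategies produce identical results. */
static int toy_matvec_selftest(void)
{
  static toy_poly v[TOY_L], we[TOY_K], wl[TOY_K];
  uint8_t rho[32];
  unsigned i, l;
  for (i = 0; i < 32; i++) rho[i] = (uint8_t)i;
  for (l = 0; l < TOY_L; l++)
    for (i = 0; i < TOY_N; i++) v[l].coeffs[i] = (int32_t)(7 * i + l);
  toy_matvec_eager(we, rho, v);
  toy_matvec_lazy(wl, rho, v);
  return memcmp(we, wl, sizeof we) == 0;
}
```

The trade-off in this model is that the lazy variant re-runs the sampler every time the product is computed, in exchange for holding one polynomial instead of K x L.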

Restructure polymat into eager/lazy variants following the same pattern
as s1hat/s2hat/t0hat:

  • mld_polymat_eager: stores full K x L matrix
  • mld_polymat_lazy: stores rho + single poly_buffer + tmp
  • mld_polyvec_matrix_expand_eager/_lazy: separate implementations
  • mld_polyvec_matrix_pointwise_montgomery_eager/_lazy: separate
    implementations with CBMC contracts only on the eager variants
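
As a rough illustration of the memory effect, the two layouts listed above might look like the following. This is a hypothetical sketch: the `toy_` types and field names are invented here, the type of `tmp` is assumed to be a single polynomial (the description does not say), and the ML-DSA-65 shape K = 6, L = 5 is used only as an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_N 256         /* coefficients per polynomial */
#define TOY_K 6           /* rows, ML-DSA-65 shape */
#define TOY_L 5           /* columns, ML-DSA-65 shape */
#define TOY_SEEDBYTES 32  /* length of rho */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Eager variant: the full K x L matrix is resident. */
typedef struct {
  toy_poly mat[TOY_K][TOY_L];
} toy_polymat_eager;

/* Lazy variant: only the seed plus scratch space is resident. */
typedef struct {
  uint8_t rho[TOY_SEEDBYTES]; /* seed from which A[k][l] is re-derived */
  toy_poly poly_buffer;       /* holds the one element currently in use */
  toy_poly tmp;               /* ASSUMPTION: tmp is one polynomial */
} toy_polymat_lazy;

/* Bytes saved by the lazy layout in this toy model. */
static size_t toy_polymat_bytes_saved(void)
{
  assert(sizeof(toy_polymat_lazy) < sizeof(toy_polymat_eager));
  return sizeof(toy_polymat_eager) - sizeof(toy_polymat_lazy);
}
```

On a typical ABI with 4-byte `int32_t` and no struct padding, the eager layout in this model occupies 30720 bytes and the lazy one 2080 bytes.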

Move all polymat-related code from polyvec.h/polyvec.c into
polyvec_lazy.h/polyvec_lazy.c.

Instead of allocating a full polyveck for h in attempt_signature_generation,
compute cs2, ct0, and hints one polynomial at a time using scratch polys.

This eliminates the polyveck h from the yh_u union, replacing
mld_pack_sig_c_h with incremental packing via mld_pack_sig_c and
mld_pack_sig_h_poly.
This is a prerequisite for also eliminating y in a follow-up.
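
A minimal sketch of what per-polynomial hint packing could look like, assuming the standard ML-DSA hint encoding (OMEGA index bytes followed by K running counts). The function name `toy_pack_h_poly`, its signature, and the demo hint pattern are hypothetical and are not the signature of mld_pack_sig_h_poly in this PR; the ML-DSA-44 values K = 4 and OMEGA = 80 are used only as an example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_N 256     /* coefficients per polynomial */
#define TOY_K 4       /* polynomials in h (ML-DSA-44 shape) */
#define TOY_OMEGA 80  /* maximum number of hints (ML-DSA-44) */

typedef struct { int32_t coeffs[TOY_N]; } toy_poly;

/* Append the hint positions of polynomial k to the encoded hint section.
 * `used` is the number of hints written by earlier calls. Returns the new
 * count, or -1 if more than OMEGA hints would be needed. The caller is
 * expected to zero hbytes before the first call. */
static int toy_pack_h_poly(uint8_t hbytes[TOY_OMEGA + TOY_K],
                           const toy_poly *h, unsigned k, unsigned used)
{
  unsigned j;
  assert(k < TOY_K && used <= TOY_OMEGA);
  for (j = 0; j < TOY_N; j++) {
    if (h->coeffs[j] != 0) {
      if (used == TOY_OMEGA) return -1; /* too many hints */
      hbytes[used++] = (uint8_t)j;      /* position of a set hint bit */
    }
  }
  hbytes[TOY_OMEGA + k] = (uint8_t)used; /* running total after poly k */
  return (int)used;
}

/* Demo: one scratch polynomial is reused for every k, so no full vector h
 * is ever held. The hint pattern is made up for illustration. */
static int toy_pack_h_demo(uint8_t out[TOY_OMEGA + TOY_K])
{
  toy_poly scratch;
  int used = 0;
  unsigned k;
  memset(out, 0, TOY_OMEGA + TOY_K);
  for (k = 0; k < TOY_K; k++) {
    memset(&scratch, 0, sizeof scratch);
    scratch.coeffs[k] = 1;       /* stand-in for the computed hint bits */
    scratch.coeffs[100 + k] = 1;
    used = toy_pack_h_poly(out, &scratch, k, (unsigned)used);
    if (used < 0) return -1;
  }
  return used;
}
```

Because each call only needs the running count and the current polynomial, the caller can compute one hint polynomial into scratch space, pack it, and reuse the scratch for the next index.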

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
@mkannwischer force-pushed the lowram-stream-a branch 2 times, most recently from 7013003 to 5ccd3c2 (April 5, 2026 07:10)
@oqs-bot
Contributor

oqs-bot commented Apr 5, 2026

CBMC Results (ML-DSA-65)

⚠️ Attention Required

| Proof | Status | Current | Previous | Change |
|---|---|---|---|---|
| polymat_permute_bitrev_to_custom | ⚠️ | 337s | 32s | +953% |
| sign_signature_internal | ⚠️ | 38s | 25s | +52% |
Full Results (177 proofs)
Proof Status Current Previous Change
**TOTAL** 2379s 2543s -6.4%
polymat_permute_bitrev_to_custom ⚠️ 337s 32s +953%
sign_verify_internal 227s 350s -35%
polyvec_matrix_expand_eager 148s - new
poly_pointwise_montgomery_c 147s 165s -11%
rej_uniform_native 140s 151s -7%
polyvecl_pointwise_acc_montgomery_c 113s 201s -44%
mld_invntt_layer 94s 99s -5%
mld_attempt_signature_generation 82s 295s -72%
polyvec_matrix_expand_eager_serial 77s - new
mld_ct_memcmp 66s 82s -20%
mld_ntt_layer 53s 58s -9%
sign_signature_internal ⚠️ 38s 25s +52%
rej_uniform 25s 23s +9%
mld_compute_t0_t1_tr_from_sk_components 22s 25s -12%
poly_chknorm_c 20s 20s +0%
fqmul 18s 20s -10%
poly_uniform_eta_4x 16s 16s +0%
poly_uniform_4x 15s 16s -6%
polyveck_decompose 14s 16s -12%
rej_uniform_c 14s 17s -18%
polyt0_unpack 13s 16s -19%
polyvec_matrix_pointwise_montgomery_eager 13s - new
polyveck_add 13s 9s +44%
keccakf1600x4_permute_native 12s 14s -14%
mld_ntt_butterfly_block 12s 14s -14%
polyveck_power2round 12s 10s +20%
keccak_absorb_once_x4 11s 9s +22%
mld_polyvecl_permute_bitrev_to_custom_native 11s 9s +22%
polyveck_use_hint 11s 12s -8%
polyvecl_chknorm 11s 13s -15%
poly_add 10s 12s -17%
poly_decompose_c 10s 7s +43%
polyveck_ntt 10s 10s +0%
sign 10s 7s +43%
sign_pk_from_sk 10s 8s +25%
polyveck_pointwise_poly_montgomery 9s 8s +12%
keccakf1600_permute 8s 7s +14%
mld_check_pct 8s 9s -11%
sign_open 8s 5s +60%
keccak_absorb 7s 6s +17%
keccakf1600_permute_native 7s 8s -12%
mld_h 7s 7s +0%
poly_invntt_tomont_c 7s 10s -30%
polyveck_caddq 7s 7s +0%
polyveck_invntt_tomont 7s 8s -12%
polyveck_reduce 7s 8s -12%
polyveck_shiftl 7s 7s +0%
polyveck_sub 7s 12s -42%
polyvecl_ntt 7s 9s -22%
unpack_sk 7s 9s -22%
mld_sample_s1_s2 6s 4s +50%
pack_sig_c 6s - new
pack_sk 6s 3s +100%
poly_use_hint_native 6s 3s +100%
polyveck_pack_w1 6s 2s +200%
polyvecl_uniform_gamma1_serial 6s 5s +20%
sign_keypair_internal 6s 5s +20%
sign_signature_pre_hash_internal 6s 5s +20%
sign_signature_pre_hash_shake256 6s 5s +20%
unpack_hints 6s 4s +50%
keccak_squeezeblocks_x4 5s 7s -29%
keccakf1600_extract_bytes (big endian) 5s 3s +67%
mld_compute_pack_z 5s 7s -29%
mld_ct_abs_i32 5s 2s +150%
mld_ct_cmask_nonzero_u8 5s 1s +400%
ntt_native_aarch64 5s 3s +67%
poly_invntt_tomont_native 5s 3s +67%
poly_ntt_c 5s 3s +67%
poly_power2round 5s 5s +0%
poly_reduce 5s 2s +150%
poly_sub 5s 2s +150%
poly_uniform 5s 7s -29%
poly_uniform_gamma1 5s 4s +25%
poly_use_hint_c 5s 4s +25%
polyveck_unpack_t0 5s 4s +25%
polyvecl_unpack_eta 5s 4s +25%
sign_keypair 5s 4s +25%
sign_signature 5s 5s +0%
sign_verify 5s 5s +0%
sign_verify_extmu 5s 3s +67%
sign_verify_pre_hash_internal 5s 6s -17%
use_hint 5s 3s +67%
caddq 4s 4s +0%
fqscale 4s 3s +33%
keccakf1600_xor_bytes 4s 4s +0%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
mld_sample_s1_s2_serial 4s 5s -20%
pack_sig_z 4s 4s +0%
poly_caddq_c 4s 6s -33%
poly_challenge 4s 5s -20%
poly_ntt 4s 3s +33%
poly_ntt_native 4s 4s +0%
poly_pointwise_montgomery 4s 2s +100%
poly_uniform_eta 4s 5s -20%
poly_uniform_gamma1_4x 4s 4s +0%
polyeta_pack 4s 3s +33%
polyeta_unpack 4s 4s +0%
polyt0_pack 4s 4s +0%
polyveck_chknorm 4s 4s +0%
polyveck_unpack_eta 4s 3s +33%
polyvecl_permute_bitrev_to_custom 4s 3s +33%
polyvecl_pointwise_acc_montgomery_native 4s 7s -43%
rej_eta_c 4s 3s +33%
rej_eta_native 4s 6s -33%
shake256 4s 3s +33%
shake256x4_squeezeblocks 4s 2s +100%
decompose 3s 2s +50%
keccak_init 3s 2s +50%
keccak_squeeze 3s 2s +50%
keccakf1600x4_permute 3s 2s +50%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_ct_sel_int32 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 3s +0%
pack_pk 3s 4s -25%
pack_sig_h_poly 3s - new
poly_caddq_native_aarch64 3s 4s -25%
poly_chknorm 3s 1s +200%
poly_invntt_tomont 3s 5s -40%
poly_make_hint 3s 3s +0%
poly_pointwise_montgomery_native 3s 3s +0%
poly_shiftl 3s 4s -25%
poly_use_hint 3s 3s +0%
polyt1_unpack 3s 5s -40%
polyveck_pack_eta 3s 2s +50%
polyvecl_pack_eta 3s 4s -25%
polyvecl_uniform_gamma1 3s 2s +50%
polyvecl_unpack_z 3s 3s +0%
polyw1_pack 3s 3s +0%
polyz_unpack_native 3s 3s +0%
power2round 3s 1s +200%
reduce32 3s 1s +200%
shake128_squeeze 3s 3s +0%
shake256_absorb 3s 3s +0%
shake256_release 3s 3s +0%
sign_signature_extmu 3s 5s -40%
sign_verify_pre_hash_shake256 3s 3s +0%
sys_check_capability 3s 4s -25%
unpack_sig 3s 4s -25%
intt_native_x86_64 2s 6s -67%
keccakf1600x4_extract_bytes 2s 2s +0%
keccakf1600x4_xor_bytes 2s 2s +0%
make_hint 2s 2s +0%
mld_ct_cmask_neg_i32 2s 1s +100%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_prepare_domain_separation_prefix 2s 4s -50%
mld_value_barrier_u8 2s 2s +0%
montgomery_reduce 2s 4s -50%
ntt_native_x86_64 2s 5s -60%
poly_caddq 2s 5s -60%
poly_caddq_native 2s 2s +0%
poly_chknorm_native 2s 5s -60%
poly_chknorm_native_aarch64 2s 3s -33%
poly_decompose_native 2s 4s -50%
polyt1_pack 2s 4s -50%
polyveck_pack_t0 2s 3s -33%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyz_pack 2s 3s -33%
polyz_unpack 2s 4s -50%
polyz_unpack_c 2s 3s -33%
rej_eta 2s 2s +0%
shake128_finalize 2s 2s +0%
shake128_init 2s 3s -33%
shake128_release 2s 2s +0%
shake128x4_absorb_once 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256_finalize 2s 2s +0%
shake256_init 2s 4s -50%
shake256x4_absorb_once 2s 5s -60%
unpack_pk 2s 5s -60%
keccak_finalize 1s 3s -67%
mld_value_barrier_i64 1s 1s +0%
mld_value_barrier_u32 1s 4s -75%
poly_decompose 1s 4s -75%
shake128_absorb 1s 2s -50%
shake256_squeeze 1s 3s -67%

@oqs-bot
Contributor

oqs-bot commented Apr 5, 2026

CBMC Results (ML-DSA-44)

Full Results (177 proofs)
Proof Status Current Previous Change
**TOTAL** 1953s 2020s -3.3%
sign_verify_internal 209s 179s +17%
polyvecl_pointwise_acc_montgomery_c 178s 205s -13%
poly_pointwise_montgomery_c 163s 150s +9%
rej_uniform_native 146s 144s +1%
mld_invntt_layer 92s 87s +6%
mld_attempt_signature_generation 89s 251s -65%
mld_ct_memcmp 82s 77s +6%
mld_ntt_layer 59s 54s +9%
rej_uniform 24s 21s +14%
sign_signature_internal 23s 19s +21%
poly_chknorm_c 22s 21s +5%
polyvec_matrix_expand_eager 21s - new
fqmul 20s 18s +11%
rej_uniform_c 20s 14s +43%
mld_compute_t0_t1_tr_from_sk_components 18s 12s +50%
polymat_permute_bitrev_to_custom 18s 14s +29%
poly_uniform_eta_4x 17s 20s -15%
polyeta_unpack 17s 16s +6%
poly_uniform_4x 15s 16s -6%
polyt0_unpack 14s 14s +0%
keccakf1600x4_permute_native 13s 14s -7%
mld_ntt_butterfly_block 13s 12s +8%
poly_add 13s 14s -7%
polyvec_matrix_expand_eager_serial 13s - new
polyz_unpack_c 13s 11s +18%
mld_check_pct 12s 8s +50%
polyveck_power2round 11s 10s +10%
keccak_absorb_once_x4 10s 11s -9%
polyveck_chknorm 10s 6s +67%
polyveck_decompose 10s 6s +67%
keccakf1600_permute 9s 8s +12%
unpack_hints 9s 5s +80%
keccak_absorb 8s 7s +14%
keccakf1600_permute_native 8s 9s -11%
mld_polyvecl_permute_bitrev_to_custom_native 8s 8s +0%
mld_sample_s1_s2_serial 8s 4s +100%
poly_uniform_gamma1_4x 8s 6s +33%
poly_use_hint_c 8s 6s +33%
polyvec_matrix_pointwise_montgomery_eager 8s - new
polyveck_shiftl 8s 7s +14%
sign 8s 5s +60%
sign_signature_pre_hash_internal 8s 3s +167%
poly_use_hint_native 7s 3s +133%
polyveck_reduce 7s 6s +17%
polyvecl_chknorm 7s 5s +40%
sign_pk_from_sk 7s 7s +0%
sign_signature_pre_hash_shake256 7s 4s +75%
unpack_sk 7s 6s +17%
keccak_squeezeblocks_x4 6s 6s +0%
mld_compute_pack_z 6s 4s +50%
mld_sample_s1_s2 6s 4s +50%
poly_invntt_tomont_c 6s 7s -14%
poly_sub 6s 4s +50%
polyveck_add 6s 8s -25%
polyveck_caddq 6s 4s +50%
polyvecl_ntt 6s 4s +50%
polyvecl_pack_eta 6s 6s +0%
rej_eta_native 6s 6s +0%
shake256_finalize 6s 2s +200%
sign_verify_extmu 6s 4s +50%
sign_verify_pre_hash_internal 6s 6s +0%
caddq 5s 4s +25%
mld_ct_cmask_nonzero_u32 5s 2s +150%
mld_h 5s 3s +67%
ntt_native_aarch64 5s 2s +150%
poly_caddq_c 5s 5s +0%
poly_pointwise_montgomery_native 5s 4s +25%
poly_uniform 5s 4s +25%
poly_uniform_eta 5s 6s -17%
polyveck_ntt 5s 5s +0%
polyveck_pack_eta 5s 3s +67%
polyveck_sub 5s 5s +0%
polyveck_use_hint 5s 6s -17%
sign_keypair_internal 5s 5s +0%
sign_verify_pre_hash_shake256 5s 4s +25%
unpack_sig 5s 4s +25%
fqscale 4s 4s +0%
keccak_squeeze 4s 1s +300%
keccakf1600x4_extract_bytes 4s 3s +33%
make_hint 4s 4s +0%
mld_ct_abs_i32 4s 3s +33%
mld_ct_sel_int32 4s 3s +33%
mld_prepare_domain_separation_prefix 4s 7s -43%
ntt_native_x86_64 4s 2s +100%
poly_caddq 4s 4s +0%
poly_caddq_native 4s 5s -20%
poly_challenge 4s 4s +0%
poly_decompose 4s 2s +100%
poly_ntt_c 4s 2s +100%
poly_pointwise_montgomery 4s 1s +300%
poly_reduce 4s 2s +100%
poly_shiftl 4s 5s -20%
poly_uniform_gamma1 4s 3s +33%
polyeta_pack 4s 3s +33%
polyveck_invntt_tomont 4s 3s +33%
polyveck_pack_t0 4s 4s +0%
polyveck_pack_w1 4s 2s +100%
polyvecl_uniform_gamma1 4s 4s +0%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyz_pack 4s 4s +0%
polyz_unpack 4s 2s +100%
polyz_unpack_native 4s 3s +33%
shake256x4_absorb_once 4s 2s +100%
sign_keypair 4s 3s +33%
sign_signature 4s 4s +0%
decompose 3s 4s -25%
intt_native_x86_64 3s 2s +50%
keccak_init 3s 2s +50%
keccakf1600_extract_bytes (big endian) 3s 1s +200%
keccakf1600_xor_bytes 3s 4s -25%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_value_barrier_u8 3s 3s +0%
montgomery_reduce 3s 2s +50%
pack_sig_c 3s - new
pack_sig_z 3s 4s -25%
pack_sk 3s 2s +50%
poly_chknorm_native 3s 4s -25%
poly_chknorm_native_aarch64 3s 3s +0%
poly_decompose_native 3s 5s -40%
poly_invntt_tomont 3s 4s -25%
poly_make_hint 3s 3s +0%
poly_ntt_native 3s 4s -25%
poly_power2round 3s 5s -40%
poly_use_hint 3s 3s +0%
polyt0_pack 3s 3s +0%
polyt1_pack 3s 1s +200%
polyt1_unpack 3s 1s +200%
polyveck_pointwise_poly_montgomery 3s 4s -25%
polyvecl_permute_bitrev_to_custom 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyw1_pack 3s 3s +0%
rej_eta_c 3s 6s -50%
shake128_absorb 3s 6s -50%
shake128x4_squeezeblocks 3s 1s +200%
shake256 3s 2s +50%
shake256_release 3s 2s +50%
shake256x4_squeezeblocks 3s 4s -25%
sign_open 3s 4s -25%
sign_signature_extmu 3s 2s +50%
sign_verify 3s 7s -57%
sys_check_capability 3s 4s -25%
keccak_finalize 2s 2s +0%
keccakf1600x4_permute 2s 2s +0%
keccakf1600x4_xor_bytes 2s 1s +100%
mld_ct_cmask_nonzero_u8 2s 5s -60%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_value_barrier_u32 2s 5s -60%
pack_pk 2s 6s -67%
pack_sig_h_poly 2s - new
poly_caddq_native_aarch64 2s 4s -50%
poly_chknorm 2s 4s -50%
poly_decompose_c 2s 4s -50%
poly_invntt_tomont_native 2s 2s +0%
poly_ntt 2s 4s -50%
polyveck_unpack_eta 2s 2s +0%
polyveck_unpack_t0 2s 4s -50%
polyvecl_unpack_eta 2s 4s -50%
polyvecl_unpack_z 2s 4s -50%
power2round 2s 3s -33%
reduce32 2s 2s +0%
rej_eta 2s 3s -33%
shake128_finalize 2s 2s +0%
shake128_init 2s 2s +0%
shake128_squeeze 2s 1s +100%
shake128x4_absorb_once 2s 2s +0%
shake256_absorb 2s 2s +0%
shake256_init 2s 2s +0%
shake256_squeeze 2s 1s +100%
unpack_pk 2s 5s -60%
use_hint 2s 1s +100%
keccakf1600_xor_bytes (big endian) 1s 2s -50%
mld_ct_get_optblocker_u8 1s 3s -67%
mld_value_barrier_i64 1s 1s +0%
shake128_release 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented Apr 5, 2026

CBMC Results (ML-DSA-87)

Full Results (177 proofs)
Proof Status Current Previous Change
**TOTAL** 2412s 2688s -10.3%
polyvecl_pointwise_acc_montgomery_c 419s 310s +35%
polyvec_matrix_expand_eager 160s - new
poly_pointwise_montgomery_c 155s 174s -11%
rej_uniform_native 141s 153s -8%
mld_attempt_signature_generation 125s 239s -48%
polyvec_matrix_expand_eager_serial 115s - new
sign_verify_internal 110s 215s -49%
mld_invntt_layer 92s 101s -9%
mld_ct_memcmp 72s 84s -14%
mld_ntt_layer 54s 61s -11%
sign_signature_internal 47s 39s +21%
mld_compute_t0_t1_tr_from_sk_components 29s 26s +12%
polymat_permute_bitrev_to_custom 25s 47s -47%
poly_chknorm_c 20s 22s -9%
rej_uniform 20s 24s -17%
polyeta_unpack 19s 18s +6%
fqmul 18s 22s -18%
mld_check_pct 17s 7s +143%
polyveck_decompose 16s 62s -74%
poly_uniform_4x 15s 17s -12%
poly_uniform_eta_4x 15s 17s -12%
rej_uniform_c 15s 15s +0%
keccakf1600x4_permute_native 13s 14s -7%
polyvec_matrix_pointwise_montgomery_eager 13s - new
polyt0_unpack 12s 15s -20%
polyveck_add 12s 12s +0%
mld_ntt_butterfly_block 11s 14s -21%
mld_polyvecl_permute_bitrev_to_custom_native 11s 12s -8%
polyveck_power2round 11s 10s +10%
polyvecl_ntt 11s 12s -8%
poly_add 10s 14s -29%
polyz_unpack_c 10s 9s +11%
sign_pk_from_sk 10s 9s +11%
keccak_absorb_once_x4 9s 11s -18%
keccakf1600_permute_native 9s 9s +0%
polyveck_caddq 9s 10s -10%
polyveck_ntt 9s 9s +0%
polyveck_use_hint 9s 10s -10%
keccak_absorb 8s 9s -11%
keccakf1600_permute 8s 7s +14%
polyveck_shiftl 8s 6s +33%
sign_open 8s 6s +33%
keccak_squeezeblocks_x4 7s 5s +40%
poly_caddq_c 7s 8s -12%
poly_challenge 7s 4s +75%
poly_decompose_c 7s 8s -12%
poly_uniform_eta 7s 4s +75%
polyt0_pack 7s 2s +250%
polyveck_sub 7s 7s +0%
sign 7s 7s +0%
sign_verify_extmu 7s 3s +133%
unpack_sk 7s 8s -12%
mld_h 6s 6s +0%
mld_sample_s1_s2 6s 6s +0%
mld_sample_s1_s2_serial 6s 7s -14%
poly_decompose_native 6s 4s +50%
poly_invntt_tomont_c 6s 8s -25%
poly_use_hint_native 6s 3s +100%
polyveck_invntt_tomont 6s 6s +0%
polyveck_pointwise_poly_montgomery 6s 7s -14%
sign_keypair_internal 6s 6s +0%
sign_verify 6s 5s +20%
sign_verify_pre_hash_internal 6s 4s +50%
make_hint 5s 3s +67%
mld_compute_pack_z 5s 5s +0%
montgomery_reduce 5s 4s +25%
poly_chknorm 5s 4s +25%
poly_chknorm_native_aarch64 5s 3s +67%
poly_decompose 5s 2s +150%
poly_power2round 5s 5s +0%
polyveck_chknorm 5s 3s +67%
polyveck_reduce 5s 10s -50%
polyveck_unpack_eta 5s 5s +0%
polyvecl_pack_eta 5s 2s +150%
polyvecl_uniform_gamma1_serial 5s 4s +25%
polyvecl_unpack_eta 5s 7s -29%
shake128_finalize 5s 3s +67%
shake128x4_squeezeblocks 5s 1s +400%
sign_keypair 5s 5s +0%
sign_signature_pre_hash_internal 5s 4s +25%
sign_signature_pre_hash_shake256 5s 7s -29%
unpack_hints 5s 6s -17%
decompose 4s 3s +33%
keccak_squeeze 4s 4s +0%
keccakf1600x4_permute 4s 3s +33%
mld_ct_get_optblocker_u32 4s 3s +33%
mld_prepare_domain_separation_prefix 4s 5s -20%
mld_value_barrier_i64 4s 5s -20%
ntt_native_aarch64 4s 4s +0%
pack_sig_z 4s 5s -20%
poly_caddq_native 4s 6s -33%
poly_caddq_native_aarch64 4s 3s +33%
poly_invntt_tomont_native 4s 1s +300%
poly_ntt_c 4s 4s +0%
poly_pointwise_montgomery_native 4s 5s -20%
poly_sub 4s 5s -20%
poly_uniform 4s 4s +0%
poly_uniform_gamma1 4s 2s +100%
polyeta_pack 4s 3s +33%
polyveck_pack_eta 4s 2s +100%
polyvecl_chknorm 4s 4s +0%
polyvecl_uniform_gamma1 4s 4s +0%
polyw1_pack 4s 3s +33%
shake256_absorb 4s 4s +0%
shake256_init 4s 3s +33%
shake256x4_absorb_once 4s 2s +100%
sys_check_capability 4s 3s +33%
use_hint 4s 2s +100%
caddq 3s 4s -25%
fqscale 3s 4s -25%
keccak_init 3s 2s +50%
keccakf1600_extract_bytes (big endian) 3s 3s +0%
keccakf1600_xor_bytes 3s 1s +200%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_xor_bytes 3s 5s -40%
mld_ct_cmask_nonzero_u32 3s 6s -50%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_ct_sel_int32 3s 1s +200%
ntt_native_x86_64 3s 4s -25%
pack_sig_c 3s - new
pack_sig_h_poly 3s - new
pack_sk 3s 4s -25%
poly_chknorm_native 3s 4s -25%
poly_make_hint 3s 5s -40%
poly_ntt_native 3s 2s +50%
poly_pointwise_montgomery 3s 2s +50%
poly_reduce 3s 4s -25%
poly_use_hint_c 3s 3s +0%
polyt1_unpack 3s 3s +0%
polyveck_pack_w1 3s 5s -40%
polyveck_unpack_t0 3s 5s -40%
polyvecl_permute_bitrev_to_custom 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_unpack_z 3s 3s +0%
polyz_unpack_native 3s 3s +0%
power2round 3s 3s +0%
reduce32 3s 2s +50%
rej_eta_native 3s 8s -62%
shake128_init 3s 2s +50%
shake128_release 3s 3s +0%
shake256_finalize 3s 3s +0%
shake256_release 3s 2s +50%
shake256_squeeze 3s 4s -25%
shake256x4_squeezeblocks 3s 4s -25%
sign_signature 3s 7s -57%
sign_signature_extmu 3s 4s -25%
unpack_pk 3s 3s +0%
unpack_sig 3s 4s -25%
intt_native_x86_64 2s 3s -33%
keccak_finalize 2s 4s -50%
keccakf1600x4_extract_bytes 2s 1s +100%
mld_ct_abs_i32 2s 3s -33%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 4s -50%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
pack_pk 2s 4s -50%
poly_caddq 2s 5s -60%
poly_invntt_tomont 2s 5s -60%
poly_ntt 2s 5s -60%
poly_shiftl 2s 5s -60%
poly_uniform_gamma1_4x 2s 5s -60%
poly_use_hint 2s 2s +0%
polyt1_pack 2s 3s -33%
polyveck_pack_t0 2s 2s +0%
polyvecl_pointwise_acc_montgomery_native 2s 3s -33%
polyz_pack 2s 4s -50%
rej_eta 2s 4s -50%
rej_eta_c 2s 3s -33%
shake128_absorb 2s 4s -50%
shake128_squeeze 2s 3s -33%
shake128x4_absorb_once 2s 2s +0%
shake256 2s 4s -50%
sign_verify_pre_hash_shake256 2s 4s -50%
mld_value_barrier_u32 1s 5s -80%
polyz_unpack 1s 5s -80%

@mkannwischer force-pushed the lowram-stream-a branch 3 times, most recently from 9fc4e08 to 3acbf60 (April 5, 2026 10:59)
@mkannwischer marked this pull request as ready for review on April 6, 2026 00:48
@mkannwischer requested a review from a team as a code owner on April 6, 2026 00:48

3 participants