
wolfCrypt on TI C2000 C28x (LAUNCHXL-F28P55X) #10724

Open
dgarske wants to merge 6 commits into wolfSSL:master from dgarske:ti_c25

@dgarske dgarske commented Jun 18, 2026

wolfCrypt: CHAR_BIT != 8 (16-bit byte) support for TI C2000 C28x

Companion example PR (wolfssl-examples): wolfSSL/wolfssl-examples#576

Summary

Adds WOLFSSL_WIDE_BYTE support so wolfCrypt builds and runs correctly on word-addressed targets where CHAR_BIT != 8 - specifically the TI C2000 C28x DSP family, where a C char/unsigned char (wolfSSL's byte) is 16 bits and is the smallest addressable unit. All changes are gated and are a no-op on normal 8-bit-byte targets.
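As a rough illustration of what "gated and a no-op on 8-bit targets" can look like, here is a minimal sketch. `EXAMPLE_WIDE_BYTE`, `example_octet`, and `example_check` are hypothetical names for this example only; the PR's actual gate is WOLFSSL_WIDE_BYTE, and its detection logic (including the TI toolchain macro checks) may differ.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned char byte;

/* Hypothetical stand-in for the PR's WOLFSSL_WIDE_BYTE gate. The real
 * detection is described as also checking known TI toolchain macros. */
#if CHAR_BIT != 8
    #define EXAMPLE_WIDE_BYTE
#endif

/* Reduce a value to a single octet held in one `byte` cell. */
static byte example_octet(unsigned long x)
{
#ifdef EXAMPLE_WIDE_BYTE
    /* A 16-bit cell would keep bits 8..15, so mask explicitly. */
    return (byte)(x & 0xFFUL);
#else
    /* On an 8-bit byte the conversion already truncates to an octet. */
    return (byte)x;
#endif
}

/* Small self-check: both paths must agree on the octet value. */
static void example_check(void)
{
    assert(example_octet(0x1FFUL) == 0xFF);
    assert(example_octet(0x41UL) == 0x41);
}
```

Calling `example_check()` from a test harness gives the same result on either kind of target; only the code path taken differs.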

The work was validated end-to-end on a TI LAUNCHXL-F28P55X (TMS320F28P550SJ, C28x, 150 MHz) using the bare-metal example added in the companion wolfssl-examples PR. Every algorithm below passes known-answer tests on hardware, and the standard host wolfcrypt_test continues to pass (no 8-bit regression).

Validated algorithms (on C28x hardware)

  • SHA-1; SHA-224/256, SHA-384/512, SHA-512/224, SHA-512/256
  • SHA3-224/256/384/512, SHAKE128/256 (with a 32-bit split Keccak permutation for WC_16BIT_CPU that emits native instructions instead of compiler 64-bit helper calls - ~53% faster SHAKE/SHA3 on this target)
  • ML-DSA-44/65/87 (Dilithium) verify and full keygen/sign/verify; ML-KEM-512/768/1024 (FIPS 203)
  • AES-128/192/256 CBC/CTR/CFB/OFB/GCM/XTS; AES-CMAC, AES-CCM, AES-GMAC, AES-SIV, AES-EAX
  • HMAC + HKDF; ChaCha20-Poly1305; Poly1305
  • X25519 + Ed25519; X448 + Ed448 (CURVE448_SMALL/ED448_SMALL byte backend); ECDSA + ECDH (SECP256R1, SP math)
  • RSA-2048 PKCS#1 v1.5 sign and verify; DH FFDHE-2048 (SP math)
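The split Keccak permutation mentioned in the SHA-3 item keeps each 64-bit lane as two 32-bit halves so the compiler does not need 64-bit helper routines. The sketch below shows only the rotation step under that representation; `lane64` and `example_rotl64_split` are hypothetical names, not the PR's code, and the real permutation may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation: one 64-bit Keccak lane as two 32-bit halves. */
typedef struct {
    uint32_t lo;   /* bits 0..31  */
    uint32_t hi;   /* bits 32..63 */
} lane64;

/* Rotate a split lane left by n bits (0 <= n < 64) using 32-bit ops only. */
static lane64 example_rotl64_split(lane64 a, unsigned n)
{
    lane64 r;

    assert(n < 64);

    /* A rotation by 32 or more starts by swapping the two halves. */
    if (n >= 32) {
        uint32_t t = a.lo;
        a.lo = a.hi;
        a.hi = t;
        n -= 32;
    }
    /* Avoid an undefined shift by 32 when no bits remain to rotate. */
    if (n == 0) {
        return a;
    }
    /* Each half takes its own shifted bits plus the other half's top bits. */
    r.lo = (a.lo << n) | (a.hi >> (32 - n));
    r.hi = (a.hi << n) | (a.lo >> (32 - n));
    return r;
}
```

For example, rotating the lane `{ lo = 1, hi = 0x80000000 }` left by one bit yields `{ lo = 3, hi = 0 }`, which matches rotating 0x8000000000000001 as a single 64-bit value.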

What the CHAR_BIT != 8 fixes address

All behind WOLFSSL_WIDE_BYTE (auto-enabled for CHAR_BIT != 8 and known 16-bit-char TI toolchain macros), each a no-op on 8-bit targets:

  • Byte/word aliasing. Serializing a word32/word64 by casting to byte* moves addressable cells, not octets. Replaced with explicit shift-based octet I/O via shared helpers in misc.c (WordsFromBytesBE32/BytesFromWordsBE32, BytesFromWordsLE32, the 64-bit variants, octet-correct readUnalignedWord32/readUnalignedWord64). sp_int.c sp_read_unsigned_bin uses an endian-/CHAR_BIT-agnostic shift loop for its leftover bytes (a 3-byte RSA exponent previously loaded as 1 instead of 65537).
  • (byte)x not truncating to an octet (it keeps 16 bits). Masked with WC_OCTET(x) = (byte)((x) & 0xFF). Used across the ML-KEM/ML-DSA encoders, the SP *_to_bin serializers, AES GETBYTE, base64, the DRBG, and the Curve448/Ed448 CURVE448_SMALL byte-array field backend (whose carry-store (word8) casts must mask before the next limb re-reads them).
  • Integer promotion. 1U << n is 16-bit on C28x (use 1UL); a bit width written sizeof(t) * 8 is wrong when CHAR_BIT != 8 (use CHAR_BIT * sizeof(t)); byte operands promote to a 16-bit int.
  • sizeof counting cells, not octets. e.g. CHACHA_CHUNK_BYTES must be 16 * 4, not 16 * sizeof(word32) (= 32 on C28x, which halves the ChaCha block and desyncs the counter).
  • xorbuf word stride. A mismatch between WOLFSSL_WORD_SIZE_LOG2 and sizeof(word) left half of each buffer un-XORed on a 16-bit-cell target; corrected for the WC_16BIT_CPU word16 path.
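The first two items above can be sketched as shift-based octet I/O with an explicit mask. This is an illustration under stated assumptions, not the misc.c helpers themselves: `cell_t` is a `uint16_t` standing in for a 16-bit C28x byte so the effect is visible on an 8-bit host, and `EXAMPLE_OCTET`, `example_store_be32`, and `example_load_be32` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 16-bit addressable cell (a C28x `byte`). */
typedef uint16_t cell_t;

/* Hypothetical analogue of the PR's WC_OCTET: keep only the low 8 bits. */
#define EXAMPLE_OCTET(x) ((cell_t)((x) & 0xFF))

/* Store a 32-bit word as four big-endian octets, one octet per cell. */
static void example_store_be32(cell_t out[4], uint32_t w)
{
    assert(out != NULL);
    out[0] = EXAMPLE_OCTET(w >> 24);
    out[1] = EXAMPLE_OCTET(w >> 16);
    out[2] = EXAMPLE_OCTET(w >> 8);
    out[3] = EXAMPLE_OCTET(w);
}

/* Load four big-endian octets into a 32-bit word, ignoring any stray
 * high bits a wide cell may be carrying. */
static uint32_t example_load_be32(const cell_t in[4])
{
    assert(in != NULL);
    return ((uint32_t)(in[0] & 0xFF) << 24) |
           ((uint32_t)(in[1] & 0xFF) << 16) |
           ((uint32_t)(in[2] & 0xFF) << 8)  |
            (uint32_t)(in[3] & 0xFF);
}
```

Because neither function casts a word pointer to a byte pointer, the result depends only on octet values, not on how many bits a cell holds or on the target's endianness.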

It also adds WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM (streams the signature z vector per-row), which combined with WOLFSSL_MLDSA_ASSIGN_KEY brings ML-DSA-87 verify to ~10.8 KB RAM with zero heap.

Commit layout

  1. wolfcrypt: add WOLFSSL_WIDE_BYTE support for CHAR_BIT != 8 targets (TI C2000 C28x) - core types, misc octet helpers, base64, DRBG
  2. sha: octet-correct SHA-1/SHA-2 byte I/O and 32-bit split Keccak permutation for CHAR_BIT != 8
  3. aes/chacha: octet-correct block, key, keystream and XTS-tweak I/O for CHAR_BIT != 8
  4. mldsa/mlkem: correct ML-DSA and ML-KEM on CHAR_BIT != 8; add WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM
  5. ecc/25519/448/sp: octet-correct X25519/Ed25519/X448/Ed448 and SP byte<->mp conversion for CHAR_BIT != 8
  6. test/benchmark/ci: CHAR_BIT != 8 test vectors, NO_MALLOC benchmark, TI C2000 compile CI and docs

Footprint (measured on F28P55X, cl2000 25.11.0; octets, KB = 1024 octets)

Code size is the per-object .text size from the linker map, converted from 16-bit words to octets (words × 2). Builds are single-parameter: ML-DSA-87 only, ML-KEM-1024 only.

| Item | Size |
| --- | --- |
| ML-DSA-87 signature / public key / private key | 4627 / 2592 / 7488 B |
| ML-KEM-1024 ciphertext / public key / private key | 1568 / 1568 / 3168 B |
| ML-DSA sign+verify code (wc_mldsa.obj) | ~22.4 KB |
| ML-KEM make+enc+dec code (wc_mlkem + wc_mlkem_poly) | ~2.9 + 12.7 KB |
| SHA-3/SHAKE code (WOLFSSL_SHA3_SMALL / split-64 fast path) | ~5.3 / 15.6 KB |

RAM per operation, measured on hardware (heap high-water via wolfSSL_SetAllocators, stack via paint/scan):

| Operation | RAM |
| --- | --- |
| ML-DSA-87 verify | ~10.8 KB with WOLFSSL_MLDSA_ASSIGN_KEY (zero heap); ~15.9 KB copying the public key into the key struct |
| ML-DSA-87 sign / keygen | ~31.6 / 28.2 KB peak heap (small-mem signer) |
| ML-KEM-1024 make / encapsulate / decapsulate | ~2 / 6 / 9 KB transient heap |

Testing

  • Host: ./configure --enable-dilithium --enable-experimental --enable-shake256 --enable-shake128 && make && ./wolfcrypt/test/testwolfcrypt - passes (RSA, ECC, ML-DSA, ML-KEM, SHA-2/3, all crypto). No behavior change on 8-bit-byte targets.
  • Hardware: on the LAUNCHXL-F28P55X, KATs for every algorithm listed above pass, and wolfcrypt_test crypto passes.
  • CI: IDE/C2000/compile.sh runs cl2000 --compile_only over the CHAR_BIT != 8 wolfCrypt subset (SHA-1/2/3, AES + modes, ChaCha/Poly1305, X25519/Ed25519, X448/Ed448, ML-DSA verify, SP-ECC); .github/workflows/ti-c2000-compile.yml runs it on PRs (fetches/caches the TI C2000 code generation tools, with optional SHA-256 pinning of the installer).

Benchmarks (F28P55X @ 150 MHz)

| Primitive | Throughput |
| --- | --- |
| SHA-256 | ~284 KiB/s |
| SHA-384 / SHA-512 | ~166 KiB/s |
| SHA3-224 / 256 / 384 / 512 | ~279 / 264 / 206 / 146 KiB/s |
| SHAKE128 / SHAKE256 | ~319 / 264 KiB/s |
| RNG (Hash-DRBG) | ~122 KiB/s |

ML-DSA-87: verify ~225 ms/op (~10.8 KB RAM, zero heap); keygen and signing also run (SIGN=1).

Notes

  • wolfcrypt/src/sp_c32.c is generated. The & 0xFF octet masks added to its sp_*_to_bin_* serializers are also applied in the SP generator templates (kept in sync so a regeneration preserves them).
  • Documentation: IDE/C2000/README.md describes the support, the build options, and the benchmark/footprint results; the full bare-metal example (with KATs, benchmark, linker scripts, and per-algorithm make toggles) is in wolfssl-examples at embedded/ti-c2000-f28p55x/.

@dgarske dgarske self-assigned this Jun 18, 2026
Copilot AI review requested due to automatic review settings June 18, 2026 00:15

Copilot AI left a comment

Pull request overview

This PR adds and CI-guards a bare-metal wolfCrypt port for TI C2000 C28x targets where CHAR_BIT == 16, introducing gated fixes so hashing, DRBG, ML-DSA verify, and SP-math ECC work correctly when a C “byte” is wider than 8 bits.

Changes:

  • Introduces WOLFSSL_NO_OCTET_BYTE detection and uses octet-wise load/store paths to avoid invalid byte/word aliasing on CHAR_BIT != 8 targets (SHA-256/512 family, SHA-3/SHAKE, Base64 CT decode, DRBG helpers, rotate helpers).
  • Adds “smallest memory” ML-DSA verify mode that streams z per polynomial to reduce pinned RAM in wc_MlDsaKey.
  • Adds TI C2000 compile-only guard scripts plus a GitHub Actions workflow that downloads the TI CGT and compiles a scoped subset.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| wolfssl/wolfcrypt/wc_port.h | Makes atomic arg type selection robust for 16-bit int by also checking UINT_MAX. |
| wolfssl/wolfcrypt/wc_mldsa.h | Adds WOLFSSL_MLDSA_VERIFY_SMALLEST_MEM struct layout variant for reduced verify RAM. |
| wolfssl/wolfcrypt/types.h | Adds WOLFSSL_NO_OCTET_BYTE auto-detection; adjusts WC_16BIT_CPU 64-bit availability behavior. |
| wolfssl/wolfcrypt/sp_int.h | Adds support for unsigned char being 16-bit (no native 8-bit type). |
| wolfssl/wolfcrypt/settings.h | Requires explicit opt-in for SP math on 16-bit-int CPUs via WOLFSSL_SP_ALLOW_16BIT_CPU. |
| wolfssl/wolfcrypt/dilithium.h | Adds smallest-mem verify gating and defaults slow Montgomery reduction macros on WC_16BIT_CPU. |
| wolfcrypt/test/test.c | Switches large-digest constants from C strings to byte[] to avoid CHAR_BIT!=8 pitfalls. |
| wolfcrypt/src/wc_port.c | Fixes init-state static assert to use CHAR_BIT instead of hardcoded 8. |
| wolfcrypt/src/wc_mldsa.c | Adds octet-masking for packed bytes and fixes integer-promotion/sign issues on 16-bit int; adds streaming z verify path. |
| wolfcrypt/src/sha512.c | Adds octet-wise word load/store and corrects length carry/length placement for CHAR_BIT!=8. |
| wolfcrypt/src/sha3.c | Forces bytewise Keccak absorb/squeeze for WOLFSSL_NO_OCTET_BYTE and adds squeeze helper. |
| wolfcrypt/src/sha256.c | Adds octet-wise word load/store and corrects length carry/length placement for CHAR_BIT!=8. |
| wolfcrypt/src/random.c | Fixes DRBG serialization/addition helpers for non-8-bit “byte” targets. |
| wolfcrypt/src/misc.c | Fixes rotate helpers to use CHAR_BIT-based bit width when needed. |
| wolfcrypt/src/coding.c | Ensures Base64 CT decode returns 0xFF for invalid chars even when byte is wider than 8 bits. |
| wolfcrypt/benchmark/benchmark.c | Adds static buffers for WOLFSSL_NO_MALLOC benchmarking and adjusts frees/allocations accordingly. |
| scripts/ti-c2000/user_settings.h | Adds minimal CI-only config for cl2000 compile-guard. |
| scripts/ti-c2000/compile.sh | Adds compile-only script to build a scoped source set with TI cl2000. |
| .github/workflows/ti-c2000-compile.yml | Adds CI workflow to download/cache TI CGT and run the compile-only guard. |
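The misc.c entry above refers to deriving a type's bit width from CHAR_BIT. A minimal sketch of that idea for a 32-bit rotate follows; `EXAMPLE_BITS` and `example_rotl32` are hypothetical names, and the actual wolfCrypt rotate helpers may be written differently.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bit width of a type. On C28x, sizeof(uint32_t) is 2 cells of 16 bits,
 * so sizeof(t) * 8 would report 16 while CHAR_BIT * sizeof(t) reports 32. */
#define EXAMPLE_BITS(t) (CHAR_BIT * sizeof(t))

/* Rotate a 32-bit word left by n bits, 0 <= n < 32. */
static uint32_t example_rotl32(uint32_t x, unsigned n)
{
    assert(n < EXAMPLE_BITS(uint32_t));
    if (n == 0) {
        return x;   /* avoid shifting by the full width, which is undefined */
    }
    return (x << n) | (x >> (EXAMPLE_BITS(uint32_t) - n));
}
```

With a hardcoded `sizeof(uint32_t) * 8`, the right-shift count on C28x would be computed from 16 instead of 32, so the rotated-in bits would come from the wrong position.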

Comment thread wolfssl/wolfcrypt/types.h Outdated
Comment thread wolfcrypt/benchmark/benchmark.c
@dgarske dgarske force-pushed the ti_c25 branch 4 times, most recently from 39c343a to afaf660 Compare June 24, 2026 22:28
@dgarske dgarske requested a review from Copilot June 24, 2026 22:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.

Comment thread wolfssl/wolfcrypt/types.h
Comment thread wolfcrypt/src/sha3.c
@dgarske dgarske force-pushed the ti_c25 branch 3 times, most recently from 853641c to 0f8a445 Compare June 26, 2026 04:40
dgarske added 6 commits June 30, 2026 08:55
…I C2000 C28x) - core types, misc octet helpers, base64, DRBG
…<->mp conversion for CHAR_BIT != 8

Curve448/Ed448 build with the CURVE448_SMALL / ED448_SMALL byte-array
field backend (the default fe_448 backend needs __uint128_t for the
sc448 mod-order arithmetic, which the C28x toolchain lacks).  The SMALL
fe448 carry-stores wrote each limb through a (word8) cast that does not
truncate to an octet when a C byte is wider than 8 bits, so the next
carry re-read saw a corrupted limb; mask each carry-store with WC_OCTET
(a no-op on the usual 8-bit-byte targets).
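The carry-store issue described in this commit message can be sketched with a byte-array addition. This is an illustration only, not the fe_448 code: `cell_t` is a `uint16_t` standing in for a 16-bit C28x byte, and `EXAMPLE_OCTET` and `example_add_limbs` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 16-bit addressable cell (a C28x `byte`). */
typedef uint16_t cell_t;

/* Hypothetical analogue of WC_OCTET: keep only the low 8 bits. */
#define EXAMPLE_OCTET(x) ((cell_t)((x) & 0xFF))

/* r = a + b over n octet limbs, least significant limb first.
 * Returns the final carry out of the top limb. */
static uint32_t example_add_limbs(cell_t *r, const cell_t *a,
                                  const cell_t *b, size_t n)
{
    uint32_t carry = 0;
    size_t i;

    assert(r != NULL && a != NULL && b != NULL);

    for (i = 0; i < n; i++) {
        carry += (uint32_t)a[i] + (uint32_t)b[i];
        /* A plain (cell_t)carry would keep bits 8..15 in a 16-bit cell,
         * so a later pass re-reading r[i] would see a value above 0xFF. */
        r[i] = EXAMPLE_OCTET(carry);
        carry >>= 8;
    }
    return carry;
}
```

Adding `{0xFF, 0x01}` and `{0x01, 0x00}` gives `{0x00, 0x02}` with no carry out; without the mask, the first limb would be stored as 0x100 on a 16-bit-cell target.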
@dgarske dgarske marked this pull request as ready for review June 30, 2026 16:06
@github-actions

retest this please

@dgarske dgarske assigned wolfSSL-Bot and unassigned dgarske Jun 30, 2026
@dgarske dgarske requested a review from SparkiDev June 30, 2026 16:07