LATX, opt: add optional vpaes translation by phorcys · Pull Request #291 · lat-opensource/lat

phorcys · 2026-05-12T08:10:56Z

Uses OpenSSL's vpaes to accelerate LATX's AES-NI emulation.
This is a POC: the tables are loaded for every instruction, and vr0-vr7 are spilled and reloaded as well.
Even so, it still gains a large improvement.
The tables should be cached across instructions (perhaps kept in the TB).

GCM needs faster pclmulqdq emulation to be accelerated further.

Results of running x64 openssl speed -evp aes-{128,192,256}-{cbc,ecb,ctr,gcm} under LATX:

Cipher	baseline kB/s	build64 vpaes kB/s	vpaes/base
AES-128-CBC	24041.01	146794.02	6.106
AES-192-CBC	18303.74	123247.80	6.733
AES-256-CBC	15705.47	104791.40	6.672
AES-128-ECB	24228.46	152503.60	6.294
AES-192-ECB	18767.13	128179.72	6.830
AES-256-ECB	16135.76	108498.49	6.724
AES-128-CTR	21613.64	151891.26	7.028
AES-192-CTR	19892.49	126388.49	6.354
AES-256-CTR	16003.36	109371.56	6.834
AES-128-GCM	16599.14	40809.53	2.459
AES-192-GCM	13885.03	38278.98	2.757
AES-256-GCM	12097.68	36680.10	3.032

luzeng87

PR #291 review: vpaes vector-table acceleration for AES-NI

Author: phorcys | POC | 1 commit, 10 files

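Overview

Replaces the AES-NI helper function calls with OpenSSL's vpaes algorithm. The AES round function is implemented with LA LSX/LASX SIMD instructions plus lookup tables, which avoids the helper-call overhead. CBC/ECB/CTR get a 6-7x speedup, GCM 2.5-3x.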
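The core trick behind vpaes-style table lookups is that any GF(2)-linear byte map f satisfies f(x) = f(x & 0x0F) ^ f(x & 0xF0), so it can be evaluated with two 16-entry tables indexed by the low and high nibble, i.e. two byte-shuffle lookups and one XOR per vector. The portable C sketch below models only that mechanism, using AES's xtime (multiply by 2 in GF(2^8)) as the linear map; it is not the PR's LSX code, and it omits the basis changes and GF(2^4) inversion that real vpaes uses for the S-box. The shuffle16() helper stands in for pshufb on x86 (presumably a byte shuffle such as vshuf.b on LSX); all function names here are hypothetical.

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
 * This map is linear over GF(2): xtime(a ^ b) == xtime(a) ^ xtime(b). */
static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}

/* Portable model of a 16-entry byte-shuffle lookup: out[i] = table[idx[i] & 0x0F].
 * (Ignores pshufb's "zero the lane if bit 7 of the index is set" rule.) */
static void shuffle16(const uint8_t table[16], const uint8_t idx[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = table[idx[i] & 0x0F];
}

/* Build the two nibble tables for a GF(2)-linear map f:
 * lo[n] = f(n) covers the low nibble, hi[n] = f(n << 4) covers the high nibble. */
static void build_nibble_tables(uint8_t (*f)(uint8_t), uint8_t lo[16], uint8_t hi[16])
{
    for (int n = 0; n < 16; n++) {
        lo[n] = f((uint8_t)n);
        hi[n] = f((uint8_t)(n << 4));
    }
}

/* Apply f to a 16-byte block: split into nibbles, do two lookups, XOR the results. */
static void apply_nibble_tables(const uint8_t lo[16], const uint8_t hi[16],
                                const uint8_t in[16], uint8_t out[16])
{
    uint8_t lo_idx[16], hi_idx[16], a[16], b[16];
    for (int i = 0; i < 16; i++) {
        lo_idx[i] = in[i] & 0x0F;
        hi_idx[i] = in[i] >> 4;
    }
    shuffle16(lo, lo_idx, a);
    shuffle16(hi, hi_idx, b);
    for (int i = 0; i < 16; i++)
        out[i] = a[i] ^ b[i];
}
```

Because the tables are tiny (16 bytes each), a handful of them fit in vector registers, which is why the PR loads 7-9 tables into vr registers before computing a round.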

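Core issues

1. Broken AOT relocation

[LOAD_VPAES_ENC_TABLES] = (void *)0,
// ... all 5 new table addresses are NULL

vpaes crashes in AOT mode. Either implement the relocation (store the actual table addresses) or mark vpaes as incompatible with AOT. Acceptable for a POC, but it must be fixed before merging.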
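A sketch of why a NULL relocation target crashes, and what "store the actual table addresses" could look like: AOT-cached code cannot embed host addresses (they change between runs), so it records a relocation kind that is resolved to this process's address at load time. Only LOAD_VPAES_ENC_TABLES comes from the PR; LOAD_VPAES_DEC_TABLES, the table arrays, reloc_targets, resolve_reloc() and reloc_targets_complete() are hypothetical names for illustration, and LATX's real relocation machinery may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation kinds; only LOAD_VPAES_ENC_TABLES appears in the PR. */
enum reloc_kind {
    LOAD_VPAES_ENC_TABLES,
    LOAD_VPAES_DEC_TABLES,   /* hypothetical */
    RELOC_KIND_COUNT
};

/* Hypothetical stand-ins for the vpaes constant tables (contents omitted). */
static const uint8_t vpaes_enc_tables[16 * 9];
static const uint8_t vpaes_dec_tables[16 * 9];

/* Runtime addresses patched into AOT-loaded code. Writing (void *)0 here is
 * the bug from the review: patched code would then load tables from NULL. */
static const void *const reloc_targets[RELOC_KIND_COUNT] = {
    [LOAD_VPAES_ENC_TABLES] = vpaes_enc_tables,
    [LOAD_VPAES_DEC_TABLES] = vpaes_dec_tables,
};

/* Resolve a relocation recorded in the AOT file; NULL for unknown kinds. */
static const void *resolve_reloc(int kind)
{
    if (kind < 0 || kind >= RELOC_KIND_COUNT)
        return NULL;
    return reloc_targets[kind];
}

/* Startup sanity check: refuse AOT if any relocation target is still NULL. */
static int reloc_targets_complete(void)
{
    for (int k = 0; k < RELOC_KIND_COUNT; k++)
        if (reloc_targets[k] == NULL)
            return 0;
    return 1;
}
```

A check like reloc_targets_complete() would turn a silent NULL entry into a clear "AOT unsupported" decision instead of a crash inside generated code.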

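2. Tables reloaded by every instruction

Every AES instruction calls vpaes_load_tables_lsx() to load 7-9 tables and vpaes_spill_low_fprs() to save FPRs 0-7, then computes, then calls vpaes_restore_low_fprs() to restore them. AES-128 with 10 rounds therefore costs 10 full table loads plus 10 FPR spills.

Optimization directions: cache the tables within a TB so they are loaded only once; keep the tables resident in high FPRs so the spill/restore disappears.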
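The "load once per TB" idea can be modelled as a liveness flag in the translator: the first AES instruction in a TB emits the table loads, later ones reuse them, and anything that may clobber the table registers clears the flag. This is a hypothetical sketch (struct tb_ctx, emit_load_tables() and the other names are not LATX's), and it counts emitted loads instead of generating code.

```c
#include <stdbool.h>

/* Hypothetical per-TB translation state. */
struct tb_ctx {
    bool vpaes_tables_live;   /* tables already loaded earlier in this TB */
    int  table_loads_emitted; /* how many times the 7-9 table loads were emitted */
};

/* Stand-in for emitting the table loads (vpaes_load_tables_lsx() in the PR). */
static void emit_load_tables(struct tb_ctx *ctx) { ctx->table_loads_emitted++; }

static void tb_begin(struct tb_ctx *ctx)
{
    ctx->vpaes_tables_live = false;
    ctx->table_loads_emitted = 0;
}

/* Helper calls, x87/MMX instructions, etc. may clobber the table registers. */
static void tb_clobber_table_regs(struct tb_ctx *ctx) { ctx->vpaes_tables_live = false; }

/* Translate one AES instruction. cache == false models the current POC. */
static void translate_aes_insn(struct tb_ctx *ctx, bool cache)
{
    if (!cache || !ctx->vpaes_tables_live) {
        emit_load_tables(ctx);
        ctx->vpaes_tables_live = true;
    }
    /* ... emit the vpaes round computation here ... */
}
```

For a straight run of 10 AES rounds inside one TB this goes from 10 table loads to 1; every clobbering point costs one extra reload, which is the bookkeeping the flag has to get right.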

3. FPR conflicts

The low FPRs 0-7 are shared with the x86 FPU/MMX, so every spill is expensive. Loading the tables into the high FPRs (16-31) would eliminate the spill entirely.
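A small sketch of the register-assignment constraint behind this suggestion: if every table lives in vr16-vr31 and no two tables share a register, the guest-mapped f0-f7 are never touched and need no spill. It assumes, as the review suggests, that 16-31 are available for this purpose; whether LATX's allocator actually leaves them free is not shown in the thread, and vpaes_assignment_ok() is a hypothetical name.

```c
#include <stdbool.h>

#define HIGH_FPR_FIRST 16  /* candidate range for resident vpaes tables */
#define HIGH_FPR_LAST  31  /* f0-f7 (mapped to x87/MMX) are below this range */

/* Validate a table-to-register assignment: each table must sit in vr16-vr31
 * (never in f0-f7, so no spill is needed) and registers must not be shared. */
static bool vpaes_assignment_ok(const int *regs, int ntables)
{
    bool used[HIGH_FPR_LAST + 1] = {false};
    for (int i = 0; i < ntables; i++) {
        int r = regs[i];
        if (r < HIGH_FPR_FIRST || r > HIGH_FPR_LAST)
            return false;  /* outside the high range, including f0-f7 */
        if (used[r])
            return false;  /* two tables in one register */
        used[r] = true;
    }
    return true;
}
```

With 7-9 tables per instruction, the 16 high registers leave room for all of them plus a few scratch registers.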

4. The GCM bottleneck is pclmulqdq

GCM gains 2.5x versus 7x for CBC: the pclmulqdq used by GHASH has not been optimized.
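For reference, pclmulqdq computes a carry-less (GF(2)[x]) product of two 64-bit lanes into a 128-bit result; GHASH then reduces it modulo x^128 + x^7 + x^2 + x + 1. The bit-serial loop below is a plain reference model of the multiply only, the kind of slow path a faster emulation would replace; it is not LATX's current implementation, and clmul64() and u128_pair are hypothetical names.

```c
#include <stdint.h>

/* 128-bit product split into two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Reference 64x64 -> 128 carry-less multiply: partial products are
 * combined with XOR instead of addition, so no carries propagate. */
static u128_pair clmul64(uint64_t a, uint64_t b)
{
    u128_pair r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0)
                r.hi ^= a >> (64 - i);  /* bits shifted out of the low half */
        }
    }
    return r;
}
```

For example, (x + 1) * (x + 1) = x^2 + 1 in GF(2)[x], so clmul64(3, 3) yields lo = 5, hi = 0.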

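Minor

tr-avx.c: pure formatting changes (no functional change)
Missing LSX feature check: if vpaes is enabled on a CPU without LSX, it should fall back or report an error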
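One possible shape for the missing check on Linux is to read AT_HWCAP via getauxval() and test the LSX bit, keeping the C-helper path when it is absent. The HWCAP_LOONGARCH_LSX fallback value (1 << 4) follows the Linux LoongArch uapi header; host_has_lsx() and vpaes_enabled() are hypothetical names, and LATX may already have its own CPU-feature probe that should be used instead.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(__loongarch__)
#include <sys/auxv.h>
#ifndef HWCAP_LOONGARCH_LSX
#define HWCAP_LOONGARCH_LSX (1UL << 4)  /* from Linux's LoongArch <asm/hwcap.h> */
#endif
#endif

/* True if the kernel reports LSX. Non-LoongArch builds always report false,
 * which forces the fallback path. */
static bool host_has_lsx(void)
{
#if defined(__linux__) && defined(__loongarch__)
    return (getauxval(AT_HWCAP) & HWCAP_LOONGARCH_LSX) != 0;
#else
    return false;
#endif
}

/* Honour a vpaes request only when LSX is present; otherwise keep the
 * existing C helper (a caller could also log or reject the option here). */
static bool vpaes_enabled(bool requested)
{
    return requested && host_has_lsx();
}
```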

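Summary

Item	Assessment
Performance	CBC/ECB/CTR 6-7x
AOT compatibility	Broken
Repeated table loads	Biggest optimization opportunity in the POC
FPR spill	Avoidable by keeping tables resident in high FPRs
Integration	Controlled by a switch, cleanly isolated

Prerequisites for merging: fix the AOT table addresses, and load the tables only once per TB. The rest can come later.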

phorcys · 2026-05-12T15:19:04Z

@luzeng87

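I tried keeping the tables cached in f0-f7 across instructions.
However, because LATX currently maps f0-f7 statically to x87/MMX without any dirty tracking,
the implementation is very messy:

Modified CPUX86State to add a runtime flag
Modified tr_save_registers_to_env() / tr_load_registers_from_env() so they do not write back f0-f7 while vpaes has them dirty
The SMC recovery path has to know about, and skip, f0-f7 dirtied by vpaes
TU/TB entries and exits, and helpers, must restore f0-f7 when they encounter x87/MMX instructions
Any exit, helper, trace, or signal-recovery path added in the future could miss the vpaes case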
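A heavily simplified model of the dirty-flag scheme described above shows why it is fragile: correctness depends on every path that reads f0-f7 as guest state calling the restore first. All names here (struct fake_env, vpaes_load_into_low_fprs() and so on) are hypothetical and only loosely mirror CPUX86State and tr_save_registers_to_env(); this is not the author's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified model; not LATX's real CPUX86State. */
struct fake_env {
    uint64_t fpr_guest[8];  /* saved x87/MMX contents of f0-f7 */
    bool vpaes_dirty;       /* runtime flag: host f0-f7 currently hold vpaes tables */
};

struct host_fprs { uint64_t f[8]; };  /* stand-in for host registers f0-f7 */

/* Load tables into f0-f7, saving the guest values first if they are still live. */
static void vpaes_load_into_low_fprs(struct fake_env *env, struct host_fprs *h,
                                     const uint64_t tables[8])
{
    if (!env->vpaes_dirty)
        memcpy(env->fpr_guest, h->f, sizeof h->f);
    memcpy(h->f, tables, sizeof h->f);
    env->vpaes_dirty = true;
}

/* Save path (cf. tr_save_registers_to_env()): must skip f0-f7 while dirty,
 * or the tables would overwrite the saved guest x87/MMX state. */
static void save_low_fprs_to_env(struct fake_env *env, const struct host_fprs *h)
{
    if (!env->vpaes_dirty)
        memcpy(env->fpr_guest, h->f, sizeof h->f);
}

/* Must run before ANY code that reads f0-f7 as guest state: x87/MMX
 * translation, helpers, SMC recovery, signal delivery, ... */
static void restore_guest_low_fprs(struct fake_env *env, struct host_fprs *h)
{
    if (env->vpaes_dirty) {
        memcpy(h->f, env->fpr_guest, sizeof h->f);
        env->vpaes_dirty = false;
    }
}
```

A single missed call to restore_guest_low_fprs() on some exit path would hand the vpaes tables to guest x87/MMX code, which is the maintenance risk the comment points out.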

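The performance gain, however, is small: only ECB improves substantially (about +60%), presumably because AES-NI instructions are the most densely packed in ECB.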

Cipher	reload per instruction (kB/s)	cache per TU (kB/s)	change
AES-128-CBC	146794.02	152469.50	+3.9%
AES-192-CBC	123247.80	125911.04	+2.2%
AES-256-CBC	104791.40	107380.74	+2.5%
AES-128-ECB	152503.60	247578.62	+62.3%
AES-192-ECB	128179.72	205864.96	+60.6%
AES-256-ECB	108498.49	175472.64	+61.7%
AES-128-CTR	151891.26	151764.99	-0.1%
AES-192-CTR	126388.49	135479.30	+7.2%
AES-256-CTR	109371.56	121913.34	+11.5%
AES-128-GCM	40809.53	41074.69	+0.6%
AES-192-GCM	38278.98	39636.04	+3.5%
AES-256-GCM	36680.10	37964.54	+3.5%
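The change column is the relative throughput difference of the cached variant, (cache / reload - 1) * 100. The sketch below recomputes it for a few rows copied from the table; struct tput_row and pct_change() are hypothetical names.

```c
/* One row of the comparison table, throughput in kB/s. */
struct tput_row {
    const char *cipher;
    double reload_per_inst;  /* tables reloaded by every AES instruction */
    double cache_per_tu;     /* tables kept in f0-f7 across a TU */
};

/* Relative change of the cached variant, in percent. */
static double pct_change(const struct tput_row *r)
{
    return (r->cache_per_tu / r->reload_per_inst - 1.0) * 100.0;
}

/* Rows copied from the table above. */
static const struct tput_row rows[] = {
    {"AES-128-ECB", 152503.60, 247578.62},
    {"AES-192-ECB", 128179.72, 205864.96},
    {"AES-256-ECB", 108498.49, 175472.64},
    {"AES-128-CTR", 151891.26, 151764.99},
};
```

The ECB rows land around +60% while AES-128-CTR is slightly negative, matching the table.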

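My feeling: "Shouldn't we just reload the tables every time? It is still at least 5-6x faster than the C helper, and it doesn't drag in a pile of messy changes."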

luzeng87 · 2026-05-13T07:04:06Z

OK, add these LOAD_VPAES_* relocation entries so that AOT works; that is enough.

phorcys · 2026-05-13T08:29:21Z

Updated.

luzeng87 reviewed May 12, 2026


latx: add optional vpaes translation (commit 239e902)

phorcys force-pushed the la64_vpaes branch from bbe89d0 to 239e902 on May 13, 2026 08:17

phorcys changed the title from "[POC] latx: add optional vpaes translation" to "LATX, opt: add optional vpaes translation" on May 13, 2026

phorcys requested a review from luzeng87 May 13, 2026 08:29


phorcys wants to merge 1 commit into lat-opensource:master from phorcys:la64_vpaes
