new SME kernels are breaking clang on Windows #5585

@vtjnash

Trying to build the new SME kernels fails with this error in llvm-21:

error: Incorrect size for ssyrk_direct_sme1_2VLx2VL prologue: 108 bytes of instructions in range, but .seh directives corresponding to 96 bytes

This seems to be fixed in llvm-22: the newer version moves the rdsvl after .seh_endprologue, which avoids the SEH metadata issue.

However, the build then fails with a different error in llvm-22:

  LLVM ERROR: SME hazard padding is not supported on Windows

Further analysis of the PR suggests the combination of alloca with SME is not implemented on Windows:

| LLVM Version | Behavior |
| --- | --- |
| 21.1.7 (old) | Generates invalid .seh_save_reg x9/x0 directives |
| 22.0.0git (new) | Explicitly errors: "SME hazard padding is not supported on Windows" |

The newer LLVM "fixed" the assembler issue by rejecting the combination entirely rather than fixing the SEH generation. The check is in AArch64PrologueEpilogue.cpp and fires when hasSVECalleeSavesAboveFrameRecord and hasStackHazardSlotIndex are both true.

The fatal usage error was added in https://github.com/llvm/llvm-project/pull/138609/files#diff-abc2806d934cd854992bdf139a4ab9405859556f01b2fc4ab17aa6b8a50e72c4R1902-R1918

https://github.com/llvm/llvm-project/pull/156467/files#diff-3da805da491ea568bb4b1b76dcbf5066b69923612f30e6cf97737f7eb30194cfR99

$ /home/jameson/llvm-mingw/build/bin/aarch64-w64-mingw32-clang -g -DSMALL_MATRIX_OPT -DGEMM_GEMV_FORWARD -DSBGEMM_GEMV_FORWARD -DBGEMM_GEMV_FORWARD -DMS_ABI -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -DDYNAMIC_ARCH -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=512 -DMAX_PARALLEL_NUMBER=1 -DBUILD_BFLOAT16 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.30.dev\" -march=armv9-a+sve2+sme -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=ssyrk_direct_alpha_betaLT_ARMV9SME -DASMFNAME=ssyrk_direct_alpha_betaLT_ARMV9SME_ -DNAME=ssyrk_direct_alpha_betaLT_ARMV9SME_ -DCNAME=ssyrk_direct_alpha_betaLT_ARMV9SME -DCHAR_NAME=\"ssyrk_direct_alpha_betaLT_ARMV9SME_\" -DCHAR_CNAME=\"ssyrk_direct_alpha_betaLT_ARMV9SME\" -DNO_AFFINITY -DTS=_ARMV9SME -I.. -DBUILD_KERNEL -DTABLE_NAME=gotoblas_ARMV9SME -DHAVE_SME -march=armv9-a+sve2+sme -DBUILD_KERNEL -DTABLE_NAME=gotoblas_ARMV9SME -UDOUBLE  -UCOMPLEX -UDOUBLE -UCOMPLEX -UUPPER -DTRANSA ../kernel/arm64/ssyrk_direct_alpha_beta_arm64_sme1.c -Wclang -o /dev/null -c
warning: unknown warning option '-Wclang'; did you mean '-Wvla'? [-Wunknown-warning-option]
In file included from ../kernel/arm64/ssyrk_direct_alpha_beta_arm64_sme1.c:6:
In file included from ../common.h:61:
../config_kernel.h:22:9: warning: 'HAVE_SME' macro redefined [-Wmacro-redefined]
   22 | #define HAVE_SME
      |         ^
<command line>:35:9: note: previous definition is here
   35 | #define HAVE_SME 1
      |         ^
In file included from ../kernel/arm64/ssyrk_direct_alpha_beta_arm64_sme1.c:6:
../common.h:378:9: warning: 'YIELDING' macro redefined [-Wmacro-redefined]
  378 | #define YIELDING        __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop; \n");
      |         ^
../common.h:373:9: note: previous definition is here
  373 | #define YIELDING        SwitchToThread()
      |         ^
../kernel/arm64/ssyrk_direct_alpha_beta_arm64_sme1.c:191:35: warning: passing 'const float *' to parameter of type 'float *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  191 |       kernel_2x2(&a_ptr[row_idx], &b_ptr[col_idx],
      |                                   ^~~~~~~~~~~~~~~
../kernel/arm64/ssyrk_direct_alpha_beta_arm64_sme1.c:35:35: note: passing argument to parameter 'B' here
   35 | kernel_2x2(const float *A, float *B, float *C, size_t shared_dim,
      |                                   ^
../kernel/arm64/ssyrk_direct_alpha_beta_arm64_sme1.c:21:17: warning: unused function 'sve_cntw' [-Wunused-function]
   21 | static uint64_t sve_cntw() {
      |                 ^~~~~~~~
error: Incorrect size for ssyrk_direct_sme1_2VLx2VL prologue: 108 bytes of instructions in range, but .seh directives corresponding to 96 bytes

5 warnings and 1 error generated.

It appears the code may be treated as using alloca because of its use of vscale functions.

This is clearly an upstream bug, but might it be necessary to disable the new AArch64 SME kernels on Windows when building with Clang?
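If disabling turns out to be the way to go, one possible shape for a compile-time guard is sketched below. It relies only on the predefined macros `_WIN32`, `__clang__` and `__aarch64__`. The names `SME_BLOCKED_BY_TOOLCHAIN` and `sme_blocked_by_toolchain` are hypothetical and are not existing OpenBLAS options. This is just an illustration of the condition, not a proposal for where it should live in the build system.

```c
/* Hypothetical guard: 1 when the toolchain combination from this report
 * (Clang targeting AArch64 Windows) is detected, 0 otherwise.
 * SME_BLOCKED_BY_TOOLCHAIN is an illustrative name, not an OpenBLAS macro. */
#if defined(_WIN32) && defined(__clang__) && defined(__aarch64__)
#  define SME_BLOCKED_BY_TOOLCHAIN 1
#else
#  define SME_BLOCKED_BY_TOOLCHAIN 0
#endif

/* Small helper (also hypothetical) so callers can query the decision
 * at run time, e.g. when filling in a dispatch table. */
static int sme_blocked_by_toolchain(void) {
    return SME_BLOCKED_BY_TOOLCHAIN;
}
```

A kernel source file could then wrap its SME code path in `#if defined(HAVE_SME) && !SME_BLOCKED_BY_TOOLCHAIN` and fall back to the non-SME implementation otherwise. Whether the check belongs in the sources or in the Makefile/CMake logic that sets HAVE_SME is a separate question.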
