optimize zgemm, ic/zamin and sdot lsx kernel for 2k3000 cpu#5822

Open
ErnstPeng wants to merge 3 commits into OpenMathLib:develop from ErnstPeng:la-dev

@ErnstPeng ErnstPeng commented May 29, 2026

Contributor

On the 2k3000 CPU, the zgemm LSX kernel performs poorly. This PR optimizes the kernel and sets the P, Q and R parameters according to the hardware characteristics.
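
For readers unfamiliar with these parameters, here is a minimal sketch of three-level cache blocking for a complex matrix product. It assumes that the PQR parameters mentioned above correspond to OpenBLAS's GEMM_P, GEMM_Q and GEMM_R block sizes (blocking M, K and N respectively). The function name `blocked_gemm` and the tiny default block sizes are hypothetical and chosen only for illustration; the actual kernel in this PR is hand-written LSX code, not this loop nest.

```python
# Illustrative sketch only: three-level blocking for C = A * B.
# P, Q, R are assumed to play the role of GEMM_P / GEMM_Q / GEMM_R;
# real values are tuned to the cache sizes of the target CPU.

def blocked_gemm(A, B, M, N, K, P=2, Q=2, R=2):
    """Return C = A * B for nested lists of complex numbers, block by block."""
    C = [[0j] * N for _ in range(M)]
    for js in range(0, N, R):            # R blocks the N dimension
        for ls in range(0, K, Q):        # Q blocks the K dimension
            for i0 in range(0, M, P):    # P blocks the M dimension
                # The three inner loops stand in for the tuned micro-kernel.
                for i in range(i0, min(i0 + P, M)):
                    for j in range(js, min(js + R, N)):
                        acc = 0j
                        for l in range(ls, min(ls + Q, K)):
                            acc += A[i][l] * B[l][j]
                        C[i][j] += acc
    return C
```

Choosing P, Q and R so that the active blocks of A and B fit in the cache levels is what "set according to the hardware characteristics" would mean under this reading.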
Performance of ./zgemm.goto 1000 2000 100 with LOOPS=20 THREADS=8,
before:

M=1000, N=1000, K=1000:   40705.41 MFlops    1.965341 sec
M=1100, N=1100, K=1100:   42272.97 MFlops    2.518867 sec
M=1200, N=1200, K=1200:   41692.56 MFlops    3.3157   sec
M=1300, N=1300, K=1300:   41171.31 MFlops    4.268993 sec
M=1400, N=1400, K=1400:   42131.12 MFlops    5.210401 sec
M=1500, N=1500, K=1500:   41990.57 MFlops    6.430015 sec
M=1600, N=1600, K=1600:   42440.13 MFlops    7.720994 sec
M=1700, N=1700, K=1700:   42314.03 MFlops    9.288645 sec
M=1800, N=1800, K=1800:   42347.5  MFlops   11.017415 sec
M=1900, N=1900, K=1900:   42294.43 MFlops   12.973811 sec
M=2000, N=2000, K=2000:   42755.69 MFlops   14.968767 sec

after:

M=1000, N=1000, K=1000:   48969.39 MFlops    3.267347 sec
M=1100, N=1100, K=1100:   49745.22 MFlops    4.281014 sec
M=1200, N=1200, K=1200:   50175.55 MFlops    5.510254 sec
M=1300, N=1300, K=1300:   50176.97 MFlops    7.005605 sec
M=1400, N=1400, K=1400:   50822.99 MFlops    8.63861  sec
M=1500, N=1500, K=1500:   50509.5  MFlops   10.691058 sec
M=1600, N=1600, K=1600:   51319.72 MFlops   12.77014  sec
M=1700, N=1700, K=1700:   51523.32 MFlops   15.256781 sec
M=1800, N=1800, K=1800:   51183.36 MFlops   18.230923 sec
M=1900, N=1900, K=1900:   51136.7  MFlops   21.460909 sec
M=2000, N=2000, K=2000:   51041.54 MFlops   25.077615 sec
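
The "after" times are larger than the "before" times even though the MFlops figures are higher, so the sec column is not directly comparable between the two runs. The sketch below relates the two columns under two assumptions that are not stated in this PR: that the benchmark counts 8·M·N·K real flops for a complex GEMM, and that the reported time is the total over all loops. The helper names `zgemm_flops` and `implied_loops` are hypothetical.

```python
def zgemm_flops(m, n, k):
    # Assumption: 8 real flops per complex multiply-add (4 mul + 4 add).
    return 8.0 * m * n * k

def implied_loops(m, n, k, mflops, seconds):
    """How many GEMM calls the reported time would cover at the reported rate."""
    seconds_per_call = zgemm_flops(m, n, k) / (mflops * 1e6)
    return seconds / seconds_per_call

# First row of each table above.
print(implied_loops(1000, 1000, 1000, 40705.41, 1.965341))  # "before" run
print(implied_loops(1000, 1000, 1000, 48969.39, 3.267347))  # "after" run
```

Under these assumptions the first "before" row corresponds to about 10 calls and the first "after" row to about 20, which would suggest the two runs used different loop counts. The MFlops column is therefore the safer basis for comparison.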

@ErnstPeng

Contributor Author

@XiWeiGu

@martin-frbg martin-frbg added this to the 0.3.34 milestone Jun 5, 2026
@ErnstPeng ErnstPeng changed the title optimize zgemm lsx kernel for 2k3000 cpu optimize zgemm, ic/zamin and sdot lsx kernel for 2k3000 cpu Jun 8, 2026
@ErnstPeng

Contributor Author

Performance of ./izamin.goto 1000 2000 100 with LOOPS=10 THREADS=8,
before:

   SIZE       Flops
   1000 :     5351.17 MFlops   0.000003 sec
   1100 :     5141.69 MFlops   0.000003 sec
   1200 :     5385.69 MFlops   0.000004 sec
   1300 :     5394.19 MFlops   0.000004 sec
   1400 :     5365.27 MFlops   0.000004 sec
   1500 :     5416.38 MFlops   0.000004 sec
   1600 :     5415.70 MFlops   0.000005 sec
   1700 :     5414.01 MFlops   0.000005 sec
   1800 :     5418.63 MFlops   0.000005 sec
   1900 :     5418.89 MFlops   0.000006 sec
   2000 :     5420.05 MFlops   0.000006 sec

after:

   SIZE       Flops
   1000 :     7483.63 MFlops   0.000002 sec
   1100 :     7482.99 MFlops   0.000002 sec
   1200 :     7523.51 MFlops   0.000003 sec
   1300 :     7511.74 MFlops   0.000003 sec
   1400 :     7544.63 MFlops   0.000003 sec
   1500 :     7528.23 MFlops   0.000003 sec
   1600 :     7531.63 MFlops   0.000003 sec
   1700 :     7561.86 MFlops   0.000004 sec
   1800 :     7563.03 MFlops   0.000004 sec
   1900 :     7573.49 MFlops   0.000004 sec
   2000 :     7317.63 MFlops   0.000004 sec
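
To compare such tables without reading them row by row, the output can be parsed and the per-size speedup computed. The following is a minimal sketch that assumes the `SIZE : value MFlops time sec` line format shown above; the regular expression and the helper name `parse_rows` are hypothetical and not part of the OpenBLAS benchmark tooling. Only the first and last izamin rows are included to keep it short.

```python
import re

# Matches lines such as "   1000 :     5351.17 MFlops   0.000003 sec".
ROW = re.compile(r"^\s*(\d+)\s*:\s*([\d.]+)\s+MFlops\s+([\d.]+)\s+sec\s*$")

def parse_rows(text):
    """Map problem size -> MFlops; header and blank lines are skipped."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[int(m.group(1))] = float(m.group(2))
    return rows

before = parse_rows("""
SIZE       Flops
   1000 :     5351.17 MFlops   0.000003 sec
   2000 :     5420.05 MFlops   0.000006 sec
""")
after = parse_rows("""
SIZE       Flops
   1000 :     7483.63 MFlops   0.000002 sec
   2000 :     7317.63 MFlops   0.000004 sec
""")

# Speedup is the ratio of throughputs for sizes present in both runs.
speedups = {n: after[n] / before[n] for n in before if n in after}
for n in sorted(speedups):
    print(f"{n}: {speedups[n]:.2f}x")
```

For these two rows the ratio is roughly 1.4x and 1.35x.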

Performance of ./sdot.goto 1000 2000 100 with LOOPS=100 THREADS=8,
before:

   SIZE       Flops
   1000 :     1865.67 MFlops   0.000001 sec
   1100 :     1869.32 MFlops   0.000001 sec
   1200 :     1871.93 MFlops   0.000001 sec
   1300 :     1867.68 MFlops   0.000001 sec
   1400 :     1804.94 MFlops   0.000002 sec
   1500 :     1787.95 MFlops   0.000002 sec
   1600 :     1607.15 MFlops   0.000002 sec
   1700 :     1359.4  MFlops   0.000003 sec
   1800 :     1483.25 MFlops   0.000002 sec
   1900 :     8467.02 MFlops   0.000003 sec
   2000 :     1434.72 MFlops   0.000003 sec

after:

   SIZE       Flops
   1000 :     7785.13 MFlops   0.000000 sec
   1100 :     7885.30 MFlops   0.000000 sec
   1200 :     8486.56 MFlops   0.000000 sec
   1300 :     8389.80 MFlops   0.000000 sec
   1400 :     7788.60 MFlops   0.000000 sec
   1500 :     7896.81 MFlops   0.000000 sec
   1600 :     8458.90 MFlops   0.000000 sec
   1700 :     8572.87 MFlops   0.000000 sec
   1800 :     8452.69 MFlops   0.000000 sec
   1900 :     8467.02 MFlops   0.000000 sec
   2000 :     8886.91 MFlops   0.000000 sec

2 participants