I think OMP usage should be optional. Apple really does not like OMP, and compiling with OMP on Apple platforms is somewhat involved.
Did you benchmark your actual OMP code? I briefly looked at it and it looks suspicious to me. In fact, I think it most likely hurts more than it helps.
E.g. with the default OMP settings on macOS (Apple M4 Pro, 14 threads):
octave:9> A = mpfr_t (randn (30), 113);
octave:10> tic; B = A + 1e-20; toc
warning: mpfr_t.m (plus at line 381): Inexact operation.
Suppress MPFR_T inexactness warning messages with:
warning ('off', 'mpfr_t:inexactOperation')
Elapsed time is 0.287056 seconds.
With a single thread:
% OMP_NUM_THREADS=1 octave
octave:2> A = mpfr_t (randn (30), 113);
octave:3> tic; B = A + 1e-20; toc
warning: mpfr_t.m (plus at line 381): Inexact operation.
Suppress MPFR_T inexactness warning messages with:
warning ('off', 'mpfr_t:inexactOperation')
Elapsed time is 0.00146723 seconds.
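To make the suggestion concrete, here is a minimal sketch in C of what I mean by "optional". It is not the package's actual code: plain doubles stand in for MPFR values, and the names add_scalar, built_with_openmp and the PAR_MIN_ELEMS cutoff are all hypothetical. The OpenMP parts are guarded by the standard _OPENMP macro, so the file builds unchanged without OpenMP flags, and the if clause keeps small inputs (like the 30x30 matrix above, which ran nearly 200x slower with 14 threads) on the serial path.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical cutoff: inputs with fewer elements stay serial.
   The right value would have to come from benchmarking. */
#define PAR_MIN_ELEMS 100000

/* Returns 1 if this file was compiled with OpenMP enabled, 0 otherwise. */
int built_with_openmp(void)
{
#ifdef _OPENMP
  return 1;
#else
  return 0;
#endif
}

/* Computes dst[i] = src[i] + c for i in [0, n).
   Plain doubles are an assumption standing in for MPFR values. */
void add_scalar(double *dst, const double *src, size_t n, double c)
{
  long i;
  const long len = (long) n;

  assert(n == 0 || (dst != NULL && src != NULL));

  /* Without OpenMP the pragma is compiled out and this is an ordinary loop.
     With OpenMP, the if clause spawns threads only for large inputs. */
#ifdef _OPENMP
#pragma omp parallel for if (len >= PAR_MIN_ELEMS) schedule(static)
#endif
  for (i = 0; i < len; i++)
    dst[i] = src[i] + c;
}
```

The build system would then add the OpenMP compiler flag only when the user asks for it and the compiler supports it; everywhere else the same source compiles to the serial loop.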