Refactor arithmetic ops to use std::variant + std::visit by jysh1214 · Pull Request #723 · Cytnx-dev/Cytnx

jysh1214 · 2025-12-01T14:34:43Z

Remove Arithmetic_internal.hpp/cpp and Ari_ii from linalg_internal_interface

jysh1214 · 2025-12-01T14:42:15Z

Not sure what happened in github action:

Collecting package metadata (repodata.json): | ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��/ ��- ��\ ��| ��failed
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/conda-forge/noarch/repodata.json>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
'https//conda.anaconda.org/conda-forge/noarch'

https://github.com/Cytnx-dev/Cytnx/actions/runs/19826283869/job/56800387981?pr=723

codecov · 2025-12-01T14:56:11Z

Codecov Report

❌ Patch coverage is 9.35094% with 824 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.85%. Comparing base (e87c3a2) to head (9548e52).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
src/linalg/Mod.cpp	3.58%	188 Missing ⚠️
src/linalg/Div.cpp	4.10%	185 Missing and 2 partials ⚠️
src/linalg/Sub.cpp	7.69%	175 Missing and 5 partials ⚠️
src/linalg/Cpr.cpp	0.00%	108 Missing ⚠️
src/linalg/Add.cpp	24.07%	77 Missing and 5 partials ⚠️
src/linalg/Mul.cpp	26.85%	74 Missing and 5 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   32.35%   35.85%   +3.50%     
==========================================
  Files         215      214       -1     
  Lines       36363    32020    -4343     
  Branches    14597    13303    -1294     
==========================================
- Hits        11764    11481     -283     
+ Misses      22659    18636    -4023     
+ Partials     1940     1903      -37

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

ianmccul · 2025-12-01T19:22:19Z

This is looking good - you've removed about 20,000 lines of code in the last couple of weeks? 😁

There is still a lot of repitition with the Tensor Add<T>(const T &lc, const Tensor &Rt) templates in src/linalg/Add.cpp, is it possible to replace that with something like:

    template <typename T>
    Tensor Add(const T &lc, const Tensor &Rt) {
      auto type = cy_typeid_v<T>;
      Storage Cnst(1, type);
      Cnst.at<T>(0) = lc;
      Tensor out;
      out._impl = Rt._impl->_clone_meta_only();
      out._impl->storage() =
        Storage(Rt._impl->storage().size(), std::min(type, Rt.dtype()), Rt.device());

      if (Rt.device() == Device.cpu) {
        std::visit(
          [&](auto *rptr) {
            using TR = std::remove_pointer_t<decltype(rptr)>;
            cytnx::linalg_internal::AddInternalImpl<type, TR>(
              out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
              Rt._impl->storage()._impl->size(), {}, {}, {});
          },
          Rt.ptr());
      } else {
  #ifdef UNI_GPU
        checkCudaErrors(cudaSetDevice(Rt.device()));
        linalg_internal::lii.cuAri_iitType][Rt.dtype()](
          out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
          Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
  #else
        cytnx_error_msg(true, "[Add] fatal error, the tensor is on GPU without CUDA support.%s",
                        "\n");
  #endif
      }

Now the type of out looks very suspicious. The old code (equivalent to the std::min() I used above) is not identical to the result of the type_promote() function, which is most likely a bug¹. Although they are both broken (eg complex<float> + double would end up as complex<float> under either mechanism), it is absolutely essential that the type selected here matches the output type used in the linalg_internal function, since this is just assumed and there is no checking.

Since AddInternalImpl defines using TOut = cytnx::Scalar_init_interface::type_promote_t<TLin, TRin> the same function must be used here as well, otherwise there will be a mismatch. This would be detectable with mixed types involving unsigned, eg type_promote(uint32, int16) is int32, but min(uint32, int16) is uint32. So addition of Tensor of uint32 and int16 will incorrectly give a result tensor that is an unsigned type. (Note that int32 is also the wrong type here, since converting uint32 to int32 might overflow. It ought to be int64, but that is not fixable until all of the implicitly assumed return types are fixed...)

But cytnx::linalg_internal::AddInternalImpl is now a template anyway, so why convert to a type-erased interface, only to convert back again? Better would be to construct the output tensor and pass that in to a typed interface:

cytnx::linalg_internal::AddInternalImpl<lhs_type, rhs_type, out_type>(lhs, rhs, out);

This involves either a triple dispatch of std::variant (OK, but lots of combinations!), or do a double dispatch on lhs and rhs, and a runtime assert that out.dtype() matches type_promote(lhs, rhs). Or alternatively, move the implementation of AddInternalImpl into Add, since both functions are now fairly simple. (That is a better option I think, because then the output type is localized to the Add function and does not need to be kept in syncronization with anything else.)

Now, for the GPU version, I just noticed:

Cytnx/src/backend/linalg_internal_gpu/cuAdd_internal.cu

Lines 5291 to 5300 in c5dd9c4

    
           void cuAdd_internal_u32ti16(boost::intrusive_ptr<Storage_base> &out, 
        
                                       boost::intrusive_ptr<Storage_base> &Lin, 
        
                                       boost::intrusive_ptr<Storage_base> &Rin, 
        
                                       const unsigned long long &len, 
        
                                       const std::vector<cytnx_uint64> &shape, 
        
                                       const std::vector<cytnx_uint64> &invmapper_L, 
        
                                       const std::vector<cytnx_uint64> &invmapper_R) { 
        
             cytnx_uint32 *_out = (cytnx_uint32 *)out->data(); 
        
             cytnx_uint32 *_Lin = (cytnx_uint32 *)Lin->data(); 
        
             cytnx_int16 *_Rin = (cytnx_int16 *)Rin->data();

ARGH: the output type is hard-coded to be uint32! This is different to what is given by type_promote(uint32, int16) and is just broken². So for the time being, I'd try

template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
  // CPU path: use the templated AddInternalImpl directly, with the output type
  // determined by type_promote().
  if (Rt.device() == Device.cpu) {
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = cytnx::Scalar_init_interface::type_promote(lhs_type, rhs_type);

    // Construct constant tensor and output tensor with correct promoted dtype
    Tensor Cnst({1}, lhs_type);  // one element, rank 1 tensor
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());

    // Dispatch to fully-typed templated kernel
    cytnx::linalg_internal::AddInternalImpl(Cnst, Rt, out);

    return out;
  } else {
#ifdef UNI_GPU
    // GPU path: existing cuAdd kernels assume the output dtype is std::min(lhs_type, rhs_type)) 
    // so we must match this until the kernels
    // are regenerated with correct type promotion semantics.
    const auto lhs_type = cy_typeid_v<T>;
    const auto rhs_type = Rt.dtype();
    const auto out_type = std::min(lhs_type, rhs_type);

    Storage Cnst(1, lhs_type);
    Cnst.at<T>(0) = lc;
    Tensor out;
    out._impl = Rt._impl->_clone_meta_only();
    out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());

    checkCudaErrors(cudaSetDevice(Rt.device()));
    linalg_internal::lii.cuAri_ii[lhs_type][rhs_type](
    out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
    Rt._impl->storage()._impl->size(), {}, {}, {}, 0);

    return out;
#else
    cytnx_error_msg(true, "[Add] fatal error: GPU tensor without CUDA support.%s", "\n");
#endif
  }
}

There is nothing we can do about the GPU version; in order to call the existing linalg_internal::lii.cuAri_ii we must use the incorrect type given by min(lhs, rhs). Replacing the GPU linalg_internal::lii.cuAri_ii functions with type-safe templates should be done soon! And in the process, relying on Unified Memory to pass a single number into a CUDA kernel isn't great (paging a 4K block from the CPU to GPU) -- it would be much better to write a separate kernel to add a constant to a tensor, and pass the constant as a parameter to the kernel!

It is also obvious that there are no unit tests covering tricky cases involving type promotions and mixed signed/unsigned etc. Someone should add some, with some annotations that some corner cases are currently expected to fail. For example, addition of a Tensor<uint32> and Tensor<int16> will give Tensor<int32> if running on a CPU, but Tensor<uint32> if running on a GPU. But the actual result should be Tensor<int64>. Also Tensor<complex<float>> + double should give Tensor<complex<double>> but currently gives Tensor<complex<float>>.

For some archaology, my guess is that originally the code used pytorch min(lhs, rhs) for type promotions, but pytorch historically never had unsigned types (it does now I think, incompletely). When unsigned int was added, or it was realized that min(lhs, rhs) doesn't work properly with unsigned, the type_promote mechanism was added, but some of the kernels involving unsigned were never updated... ↩
Well, C/C++ is broken in the same way... ↩

IvanaGyro · 2025-12-05T05:47:01Z

Not sure what happened in github action:

For HTTP error, it is fine to just rerun the failed test.

jysh1214 · 2025-12-06T18:13:23Z

It is also obvious that there are no unit tests covering tricky cases involving type promotions and mixed signed/unsigned etc.

Yeah, you’re right, it’s a critical problem.
I’ve sent PR#730 for the mixed-type unit tests. Please help review it. Thx.

I will solve others issues later.

Refactor arithmetic ops to use std::variant + std::visit

9548e52

Remove Arithmetic_internal.hpp/cpp and Ari_ii from linalg_internal_interface

jysh1214 requested a review from IvanaGyro December 1, 2025 14:35

IvanaGyro mentioned this pull request Dec 10, 2025

Add mixed-type GPU arith ops tests and fix Cpr/Mod dispatcher #730

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor arithmetic ops to use std::variant + std::visit#723

Refactor arithmetic ops to use std::variant + std::visit#723
jysh1214 wants to merge 1 commit intomasterfrom
dev/alexchiang/refactor-arithop

jysh1214 commented Dec 1, 2025

Uh oh!

jysh1214 commented Dec 1, 2025

Uh oh!

codecov bot commented Dec 1, 2025

Uh oh!

ianmccul commented Dec 1, 2025 •

edited

Loading

Uh oh!

IvanaGyro commented Dec 5, 2025

Uh oh!

jysh1214 commented Dec 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jysh1214 commented Dec 1, 2025

Uh oh!

jysh1214 commented Dec 1, 2025

Uh oh!

codecov bot commented Dec 1, 2025

Codecov Report

Uh oh!

ianmccul commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

IvanaGyro commented Dec 5, 2025

Uh oh!

jysh1214 commented Dec 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ianmccul commented Dec 1, 2025 •

edited

Loading