Refactor arithmetic ops to use std::variant + std::visit #723
Conversation
Remove Arithmetic_internal.hpp/cpp and Ari_ii from linalg_internal_interface
Not sure what happened in the GitHub Action: https://github.com/Cytnx-dev/Cytnx/actions/runs/19826283869/job/56800387981?pr=723
Codecov Report

❌ Patch coverage is …

```diff
@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   32.35%   35.85%   +3.50%
==========================================
  Files         215      214       -1
  Lines       36363    32020    -4343
  Branches    14597    13303    -1294
==========================================
- Hits        11764    11481     -283
+ Misses      22659    18636    -4023
+ Partials     1940     1903      -37
```
This is looking good - you've removed about 20,000 lines of code in the last couple of weeks? 😁 There is still a lot of repetition with the per-type scalar–tensor wrappers, for example:

```cpp
template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
  // Wrap the scalar lc in a one-element Storage of its own dtype.
  auto type = cy_typeid_v<T>;
  Storage Cnst(1, type);
  Cnst.at<T>(0) = lc;
Tensor out;
out._impl = Rt._impl->_clone_meta_only();
  // NB: the output dtype is picked with std::min over the dtype enum,
  // which is not a genuine type-promotion rule.
  out._impl->storage() =
    Storage(Rt._impl->storage().size(), std::min(type, Rt.dtype()), Rt.device());
if (Rt.device() == Device.cpu) {
std::visit(
[&](auto *rptr) {
using TR = std::remove_pointer_t<decltype(rptr)>;
          // Fully typed call: T is the scalar's element type, TR the tensor's.
          cytnx::linalg_internal::AddInternalImpl<T, TR>(
out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
Rt._impl->storage()._impl->size(), {}, {}, {});
},
Rt.ptr());
} else {
#ifdef UNI_GPU
checkCudaErrors(cudaSetDevice(Rt.device()));
      linalg_internal::lii.cuAri_ii[type][Rt.dtype()](
out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
#else
cytnx_error_msg(true, "[Add] fatal error, the tensor is on GPU without CUDA support.%s",
"\n");
#endif
  }
  return out;
}
```

Now the type of the output is `std::min(type, Rt.dtype())`. Since `std::min` over the dtype enum is not a genuine type-promotion rule, the output dtype can come out wrong, with mixed signed/unsigned operands being the obvious case. But ideally the kernel call would be fully typed in all three slots:

```cpp
cytnx::linalg_internal::AddInternalImpl<lhs_type, rhs_type, out_type>(lhs, rhs, out);
```

This involves either a triple dispatch table or nested `std::visit` calls (see the sketch further down). Now, for the GPU version, I just noticed:

Cytnx/src/backend/linalg_internal_gpu/cuAdd_internal.cu, lines 5291 to 5300 in c5dd9c4

ARGH: the output type is hard-coded to be `std::min(lhs_type, rhs_type)`. So the best we can do for now is something like:

```cpp
template <typename T>
Tensor Add(const T &lc, const Tensor &Rt) {
// CPU path: use the templated AddInternalImpl directly, with the output type
// determined by type_promote().
if (Rt.device() == Device.cpu) {
const auto lhs_type = cy_typeid_v<T>;
const auto rhs_type = Rt.dtype();
const auto out_type = cytnx::Scalar_init_interface::type_promote(lhs_type, rhs_type);
// Construct constant tensor and output tensor with correct promoted dtype
Tensor Cnst({1}, lhs_type); // one element, rank 1 tensor
Cnst.at<T>(0) = lc;
Tensor out;
out._impl = Rt._impl->_clone_meta_only();
out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());
// Dispatch to fully-typed templated kernel
cytnx::linalg_internal::AddInternalImpl(Cnst, Rt, out);
return out;
} else {
#ifdef UNI_GPU
    // GPU path: the existing cuAdd kernels assume the output dtype is
    // std::min(lhs_type, rhs_type), so we must match this until the kernels
    // are regenerated with correct type promotion semantics.
const auto lhs_type = cy_typeid_v<T>;
const auto rhs_type = Rt.dtype();
const auto out_type = std::min(lhs_type, rhs_type);
Storage Cnst(1, lhs_type);
Cnst.at<T>(0) = lc;
Tensor out;
out._impl = Rt._impl->_clone_meta_only();
out._impl->storage() = Storage(Rt._impl->storage().size(), out_type, Rt.device());
checkCudaErrors(cudaSetDevice(Rt.device()));
linalg_internal::lii.cuAri_ii[lhs_type][rhs_type](
out._impl->storage()._impl, Cnst._impl, Rt._impl->storage()._impl,
Rt._impl->storage()._impl->size(), {}, {}, {}, 0);
return out;
#else
cytnx_error_msg(true, "[Add] fatal error: GPU tensor without CUDA support.%s", "\n");
#endif
}
}
```

There is nothing we can do about the GPU version; in order to call the existing `cuAri_ii` kernels, the output dtype has to stay `std::min(lhs_type, rhs_type)` until they are regenerated. It is also obvious that there are no unit tests covering tricky cases involving type promotion, mixed signed/unsigned operands, etc. Someone should add some, with annotations marking the corner cases that are currently expected to fail. For example, addition of a signed integer to an unsigned integer of the same width.
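For reference, here is a minimal self-contained sketch of the nested `std::visit` double dispatch mentioned above. Everything in it is hypothetical: `ElemPtr` stands in for the pointer variant returned by `Tensor::ptr()`, `add_kernel` for `AddInternalImpl`, and `std::common_type_t` is only a placeholder for a real `type_promote` rule (it does not follow Cytnx's dtype ordering and does not settle the signed/unsigned question):

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the pointer variant returned by Tensor::ptr().
using ElemPtr = std::variant<float *, double *, std::int32_t *, std::uint32_t *>;

// Fully typed kernel: both operand types and the output type are template
// parameters, so no runtime dtype switch is needed inside the loop.
template <typename TL, typename TR, typename TO>
void add_kernel(TO *out, const TL *lhs, const TR *rhs, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<TO>(lhs[i]) + static_cast<TO>(rhs[i]);
}

// Double dispatch: one std::visit per operand selects (TL, TR) at runtime
// while keeping the kernel itself fully typed.
void dispatch_add(void *out, ElemPtr lhs, ElemPtr rhs, std::size_t n) {
  std::visit(
    [&](auto *lptr) {
      std::visit(
        [&](auto *rptr) {
          using TL = std::remove_pointer_t<decltype(lptr)>;
          using TR = std::remove_pointer_t<decltype(rptr)>;
          // Placeholder promotion rule; a real one must handle
          // signed/unsigned mixes explicitly.
          using TO = std::common_type_t<TL, TR>;
          add_kernel<TL, TR, TO>(static_cast<TO *>(out), lptr, rptr, n);
        },
        rhs);
    },
    lhs);
}

int main() {
  std::int32_t a[3] = {1, 2, 3};
  double b[3] = {0.5, 0.5, 0.5};
  double c[3];  // caller allocates storage of the promoted type
  dispatch_add(c, ElemPtr{a}, ElemPtr{b}, 3);
  std::cout << c[0] << ' ' << c[1] << ' ' << c[2] << '\n';  // 1.5 2.5 3.5
}
```

The compiler instantiates one kernel per (TL, TR) pair, the same coverage the old `Ari_ii` pointer table provided, but generated automatically instead of hand-registered.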
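On the testing side, a sketch of the kind of GoogleTest cases that could pin down the promotion semantics, assuming the usual `cytnx::zeros` / `Tensor::dtype()` / `Tensor::at` API. The expected dtypes are illustrative assumptions (NumPy, for instance, promotes `uint64 + int64` to a float), not established Cytnx behaviour; the mixed-sign case is one of the corner cases currently expected to fail:

```cpp
#include <gtest/gtest.h>

#include "cytnx.hpp"

using namespace cytnx;

// A scalar of a "stronger" dtype added to an integer tensor must not be
// downcast to the tensor's dtype: Double + Int64 should give a Double tensor.
TEST(TypePromotion, ScalarNotDowncast) {
  Tensor t = zeros(4, Type.Int64);
  Tensor out = 0.5 + t;
  EXPECT_EQ(out.dtype(), Type.Double);
  EXPECT_DOUBLE_EQ(out.at<cytnx_double>({0}), 0.5);
}

// Mixed signed/unsigned: neither Int64 nor Uint64 can represent the other's
// full range. The expected dtype below (Double, as in NumPy) is an assumed
// design choice; with the current std::min-based rule this test should be
// annotated as expected to fail.
TEST(TypePromotion, MixedSignedUnsigned) {
  Tensor t = zeros(4, Type.Uint64);
  Tensor out = cytnx_int64(-1) + t;
  EXPECT_EQ(out.dtype(), Type.Double);
}
```

Marking the failing cases with `GTEST_SKIP` plus a message (or a known-failure annotation) would document the intended semantics without breaking CI in the meantime.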
For the HTTP error, it is fine to just rerun the failed test.
Yeah, you’re right, it’s a critical problem. I will solve the other issues later.