Migrate qBraid Target in CUDAQ to qBraid v2#5
Open
TheGupta2012 wants to merge 92 commits into main
@ryanhill1 I discarded the commit in current
Force-pushed from a72236c to 46593c0
* working implementation using OpenQASM
* modified and added test files (incomplete)
* fix emulate command alignment
* update polling + format
* update polling interval and make code more readable
* remove ionq fields from target-arguments
* fix formatting
* Add qBraid mock Python server for testing
* Update __init__.py
* QbraidTester running correctly
* added documentation for qbraid
---------
Signed-off-by: Ryan Hill <ryanjh88@gmail.com>
Co-authored-by: feelerx <superfeelerxx@gmail.com>
Force-pushed from 46593c0 to 3b0a1e4
The deployments cleanup job only removes `default` environment deployments but not `ghcr-ci` ones. Every CI run creates multiple ghcr-ci deployments via dev_environment.yml, leaving "copy-pr-bot temporarily deployed to ghcr-ci — Inactive" entries cluttering PR timelines. Extend the existing cleanup loop to also delete ghcr-ci deployments. The production `ghcr-deployment` environment used by deployments.yml is not affected. Signed-off-by: mitchdz <mitch_dz@hotmail.com>
…DIA#4320) Fixes NVIDIA#4319. The basis-driven pattern selection in `decomposition{basis=...}` failed to select decomposition chains involving `SToR1` and `TToR1` because these patterns were registered with `s(1)`/`t(1)` metadata (controlled-only) despite their implementations handling any control count. The graph lookup in `DecompositionPatternSelection.cpp` used exact hash matching on `OperatorInfo`, so an unbounded `(n)` entry could not match a concrete control count. This left `CCX` gates undecomposed when `t` was not directly in the target basis. The fix updates `SToR1`/`TToR1`/`R1ToU3`/`U3ToRotations` registration to `(n)` and adds `OperatorInfo::matches()` for wildcard control count matching in `incomingPatterns()` and `findGateDist()`. Signed-off-by: Thomas Alexander <talexander@nvidia.com>
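The wildcard-matching fix above can be sketched with a toy registry. This is a hypothetical Python mirror of the C++ lookup, not the actual `DecompositionPatternSelection.cpp` code: a pattern registered with an unbounded control count (`None`, standing in for `(n)`) should match any concrete count, which an exact hash lookup cannot do.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class OperatorInfo:
    name: str
    controls: Optional[int]  # None stands in for "(n)": any control count

    def matches(self, concrete: "OperatorInfo") -> bool:
        # Wildcard control-count matching, analogous to the new
        # OperatorInfo::matches() described above.
        return self.name == concrete.name and (
            self.controls is None or self.controls == concrete.controls)

# Registered decomposition patterns (op metadata -> pattern name).
patterns = {
    OperatorInfo("t", None): "TToR1",  # was registered as t(1) before the fix
    OperatorInfo("s", None): "SToR1",
}

def incoming_patterns(op: OperatorInfo):
    # Exact dict lookup (the old behavior) would miss t with 2 controls;
    # scanning with matches() finds the unbounded entry.
    return [name for key, name in patterns.items() if key.matches(op)]

ccx_t = OperatorInfo("t", 2)  # doubly-controlled t arising from CCX decomposition
print(incoming_patterns(ccx_t))
```

With exact hashing, `OperatorInfo("t", 2)` hashes differently from the registered `t` entry and the chain is never found; the linear scan with `matches()` recovers it.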
…4332) Signed-off-by: Adam Geller <adgeller@nvidia.com>
…IA#4330) This updates the unittest so that cudaq::state objects are used to capture and pass state information (amplitude vectors) into kernels. The new API contract is that this sort of state information shall be passed into CUDA-Q kernels as state objects and not raw vectors. --------- Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
Migrating Python bindings from pybind11 to nanobind
- Adding nanobind as a submodule
- Creating NanobindAdaptors for MLIR C-API type casters
- Keeping pybind11 only for upstream MLIR Python extensions
- Converting all `*_py.cpp` binding files, headers, CUDAQuantumExtension.cpp, pyDynamics, interop library, and PYSCF plugin to nanobind
---------
Signed-off-by: Sachin Pisal <spisal@nvidia.com>
I, Harshit <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 9cd62cf I, Harshit <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 3b0a1e4 I, Harshit <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 1a24c66 Signed-off-by: Harshit <harshit.11235@gmail.com>
I, TheGupta2012 <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 925ae39 I, TheGupta2012 <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: 41fe248 I, TheGupta2012 <harshit.11235@gmail.com>, hereby add my Signed-off-by to this commit: d74243d Signed-off-by: TheGupta2012 <harshit.11235@gmail.com>
This is a rewrite of NVIDIA#4329, using a stateless class with static functions rather than a builder pattern. Signed-off-by: Luca Mondada <luca@mondada.net>
Fixes NVIDIA#4343. Signed-off-by: Sachin Pisal <spisal@nvidia.com>
…VIDIA#4335) When a kernel returns a vector (for `cudaq::run`), we insert `__nvqpp_vectorCopyCtor` which performs a `malloc` + `memcpy` to copy stack data to the heap. After `AggressiveInlining` and `ReturnToOutputLog`, the heap copy becomes dead but remains in the IR. This is normally cleaned up by LLVM's optimization passes, but on code paths that emit MLIR directly (e.g., `nop` for backends that consume `quake`), these dead allocations persist and get sent to the server. This PR adds a new MLIR pass, `eliminate-dead-heap-copy`, that redirects reads from the `malloc`'d buffer to the original `memcpy` source (the stack `alloca`), then erases the dead `malloc`, `memcpy`, and `cc.stdvec_init` ops. This can be added on-demand via target yml file. Update the mock server test to demonstrate that. --------- Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
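The rewrite the pass performs can be sketched with a toy instruction list. This is an illustrative model, not the real MLIR pass: reads from the `malloc`'d buffer are redirected to the `memcpy` source (the stack `alloca`), after which the `malloc`/`memcpy` pair is dead and erased.

```python
def eliminate_dead_heap_copy(instrs):
    """Toy dead-heap-copy elimination over dict-encoded instructions."""
    # Map each memcpy destination to its (stack) source.
    copy_src = {i["dst"]: i["src"] for i in instrs if i["op"] == "memcpy"}
    out = []
    for i in instrs:
        if i["op"] == "load" and i["addr"] in copy_src:
            # Redirect the read to the original stack alloca.
            i = {**i, "addr": copy_src[i["addr"]]}
        out.append(i)
    live = {i["addr"] for i in out if i["op"] == "load"}
    # Erase malloc/memcpy whose buffer is no longer read.
    return [i for i in out
            if not (i["op"] in ("malloc", "memcpy")
                    and i.get("dst", i.get("addr")) not in live)]

instrs = [
    {"op": "alloca", "addr": "%stack"},
    {"op": "malloc", "addr": "%heap"},
    {"op": "memcpy", "dst": "%heap", "src": "%stack"},
    {"op": "load", "addr": "%heap"},
]
print(eliminate_dead_heap_copy(instrs))
```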
Updating cuquantum version to 26.03.1 --------- Signed-off-by: Sachin Pisal <spisal@nvidia.com>
## Background
`cudaq.sample` with `set_target("braket")` fails on v0.14.0+ with:
RuntimeError: [line 10] cannot declare bit register. Only 1 bit
register(s) is/are supported
Amazon Braket's OpenQASM 2.0 parser enforces exactly one classical
register per circuit. The payload CUDA-Q emits for the Bell-state
reproducer in NVIDIA#4341 contains two.
## Root cause
`addPipelineTranslateToOpenQASM` (`lib/Optimizer/CodeGen/Pipelines.cpp`)
was refactored in NVIDIA#3693 to run `ExpandMeasurements` unconditionally. For
`qasm2` backends that run `combine-measurements` in the mid pipeline
(Braket, Scaleway, Quantum Machines), the sequence becomes:
1. Mid pipeline: `combine-measurements` merges per-qubit measurements
into a single `quake.mz` on the whole `!quake.veq` - the intent being
"emit one `creg` spanning all qubits".
2. Translate pipeline: `ExpandMeasurements` re-expands the combined `mz`
into one `mz` per qubit, then loop-unrolls.
3. OpenQASM2.0 emitter: writes one `creg` declaration per `mz`.
Target-specific YAML intent is silently overridden in the translate
pipeline.
## Fix
1. `lib/Optimizer/CodeGen/Pipelines.cpp`: revert
`addPipelineTranslateToOpenQASM` to the thin cleanup it was before
NVIDIA#3693. Each backend's YAML now drives measurement expansion.
2. `infleqtion.yml` and `tii.yml`: add `jit-high-level-pipeline:
"expand-measurements"`. These targets previously depended on the
unconditional expansion to get one `creg` per measured qubit; the
explicit entry preserves that behavior.
3. `test/Translate/OpenQASM/basic.qke` and
`test/Translate/openqasm2_*.cpp`: update CHECK lines to match the
single-`creg` output for a vector `mz` (which is what the emitter
produces after the fix).
## Impact
| Backend | creg count for `mz(qvector(n))` |
|---|---|
| Braket, Scaleway, Quantum Machines | 1 (single `creg` of size n) |
| Infleqtion, TII | n (preserved via new YAML entry) |
| Quantinuum, IQM, OQC, Anyon, QCI | n (unchanged; already had `expand-measurements` in YAML) |
The change is scoped to `addPipelineTranslateToOpenQASM`, which only
runs for `codegen-emission: qasm2`. Simulators and non-OpenQASM2.0
backends are unaffected.
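The pass-ordering effect on creg counts can be illustrated with a toy model. This is not the real MLIR pipeline; it only mimics the counting argument above for a 2-qubit register, where the emitter writes one `creg` per `mz` op.

```python
def combine_measurements(mz_ops, n):
    # Mid pipeline: merge per-qubit mz ops into one mz over the whole register.
    return [("veq", n)]

def expand_measurements(mz_ops, n):
    # Translate pipeline (pre-fix): re-expand into one mz per qubit.
    return [("qubit", 1)] * sum(size for _, size in mz_ops)

def emit_cregs(mz_ops):
    # OpenQASM 2.0 emitter: one creg declaration per mz op.
    return len(mz_ops)

n = 2
per_qubit = [("qubit", 1)] * n

# Before the fix: combine, then unconditional re-expansion -> n cregs.
broken = emit_cregs(expand_measurements(combine_measurements(per_qubit, n), n))
# After the fix: expansion no longer runs unconditionally -> 1 creg.
fixed = emit_cregs(combine_measurements(per_qubit, n))
print(broken, fixed)  # 2 1
```

Braket rejects the first outcome (two classical registers) and accepts the second.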
## Testing
- `ninja check-cudaq-mlir` passes with the updated CHECK lines.
- `cudaq.translate(kernel, format="openqasm2")` under `set_target(...)`
for Braket, Scaleway, Infleqtion, TII — creg counts match the matrix
above.
- Reproducer from NVIDIA#4341 now emits exactly the "expected" OpenQASM2.0
shown in the issue: `creg var3[2]; measure var0 -> var3;`.
- Manually tested against real servers: `test_braket.py`,
`test_Infleqtion.py`, `test_tii.py`, `test_scaleway.py`.
## Follow-up
An automated local test setup for the OpenQASM payload validator will be
added in a separate PR.
Fixes NVIDIA#4341.
---------
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
…frastructure (NVIDIA#4349)
## Summary
Reverts PRs:
- NVIDIA#3800
- NVIDIA#4204
- NVIDIA#4208
- NVIDIA#4266
- NVIDIA#4267

Following an architecture alignment meeting (Apr 17), we are changing direction on how measurement results are represented in CUDA-Q. The `measure_result` standalone class and `!quake.measurements<N>` Quake type introduced by these PRs are being replaced by a new `measure_handle` approach with fundamentally different semantics. This revert restores:
* `measure_result` as a typedef to bool (compiler mode)
* Multi-qubit mz returning `!cc.stdvec<!quake.measure>`
* Removes `!quake.measurements<N>` type, `quake.get_measure`, `quake.measurements_size` ops
* Removes `quake.relax_size` extension for measurements
* Removes `QIRResultArrayCreate` / `QIRResultArrayGetElementPtr1d` QIR intrinsics
* Removes 8 test files added by the reverted PRs
### Forward direction (follow-up PRs): New `measure_handle`
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Skipping identity terms when building the Pauli word and coefficient lists passed to the Krylov kernel. Controlled exp_pauli does not handle the identity terms. We add their contribution back when assembling the Hamiltonian matrix. Fixes https://github.com/NVIDIA/cuda-quantum/actions/runs/24584888146/job/71904057326#step:5:1955 Signed-off-by: Sachin Pisal <spisal@nvidia.com>
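The split described above can be sketched as follows. This is a minimal sketch under assumed data shapes (a Hamiltonian as `(pauli_word, coefficient)` pairs), not the actual Krylov solver code: identity terms are withheld from the lists handed to the controlled `exp_pauli` kernel, and their summed coefficient is added back as a multiple of the identity when assembling the Hamiltonian matrix.

```python
import numpy as np

hamiltonian = [("II", 0.5), ("ZZ", 0.25), ("XI", 0.1), ("II", 0.2)]

def split_identity(terms):
    # Keep non-identity terms for the kernel; collect identity coefficients.
    kernel_terms = [(w, c) for w, c in terms if set(w) != {"I"}]
    identity_coeff = sum(c for w, c in terms if set(w) == {"I"})
    return kernel_terms, identity_coeff

kernel_terms, id_coeff = split_identity(hamiltonian)

# Contribution added back when assembling the Hamiltonian matrix:
n_qubits = 2
h_identity = id_coeff * np.eye(2 ** n_qubits)
print(kernel_terms, id_coeff)
```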
…rs (NVIDIA#4351) Fixed the `test_state_mps.py - AttributeError: 'list' object has no attribute 'dtype'` errors in https://github.com/NVIDIA/cuda-quantum/actions/runs/24624569814/job/72005503960#step:7:43857 The fix for the rest of the failure (`RuntimeError: invalid value`) will come in a separate PR. Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
This PR removes argument synthesis by default for Python kernels run on the local simulator, instead directly invoking them with the arguments (currently, by constructing a message buffer through `.argsCreator` which is passed to the kernel's `thunk`). This only affects entry point kernels. Benefits:
1. This makes it unnecessary to recompile kernels for different arguments in this setting, simplifying the `reuse_compiler_artifacts` logic.
2. It aligns the Python local simulation path more closely with C++, where arguments are similarly not synthesized.
3. As a result of 1 and 2, it is a useful and important first step towards an inter-launch caching strategy for Python.
---------
Signed-off-by: Adam Geller <adgeller@nvidia.com>
Signed-off-by: Luca Mondada <luca@mondada.net>
Co-authored-by: Luca Mondada <luca@mondada.net>
Signed-off-by: TheGupta2012 <harshit.11235@gmail.com>
…place qasm normalization and fix sudoku tests Signed-off-by: TheGupta2012 <harshit.11235@gmail.com>
…IA#4393) When calling `cudaq.ptsbe.sample(..., return_execution_data=True)`, the TraceInstruction objects returned for Noise-type instructions now carry the channel's numeric parameters and the full kraus_channel object. Previously a Noise TraceInstruction had an empty `params` list, and its underlying `cudaq::kraus_channel` (populated in C++) was not bound to Python. Users could see which channel fired via `inst.name` and which Kraus operator was selected via `kraus_selections[i].kraus_operator_index`, but could not recover the channel probability or the Kraus matrices from the trace without re-inspecting the NoiseModel. Signed-off-by: Thomas Alexander <talexander@nvidia.com>
Signed-off-by: efratshabtai <efratshabtai@users.noreply.github.com> Co-authored-by: Sachin Pisal <spisal@nvidia.com>
While exploring and prototyping options for compile-time checks of QPUs, I keep running into the issue that llvm headers bleed into user code. I would like to put an end to this issue once and for all by replacing it with our own registry. This registry is widely borrowed from LLVM by Claude and reviewed by codex. I have asked it to keep the current model, whereby it is instantiated in one place so that a registration from a shared lib is visible to all shared libs. I believe, from discussions with Bruno, that this should also help the LLVM update work. --------- Signed-off-by: Renaud Kauffmann <rkauffmann@nvidia.com>
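The single-instantiation-point registry model described above can be sketched in miniature. All names here are hypothetical illustrations, not the actual CUDA-Q API: one module-level dict plays the role of the shared instantiation point, so a registration performed by any "plugin" module is visible to every consumer.

```python
# Process-wide registry: instantiated exactly once in this module, so
# entries registered elsewhere are visible to all importers.
_registry: dict = {}

def register(name):
    """Decorator registering a class under a string key."""
    def wrap(cls):
        _registry[name] = cls
        return cls
    return wrap

def create(name, *args, **kwargs):
    """Instantiate a registered entry, with a clear error on misses."""
    try:
        return _registry[name](*args, **kwargs)
    except KeyError:
        raise KeyError(f"no plugin registered under {name!r}") from None

@register("mock-qpu")   # a plugin module would do this at import time
class MockQPU:
    def launch(self):
        return "ok"

print(create("mock-qpu").launch())
```

The real C++ registry must additionally guarantee a single instance across shared libraries, which the LLVM-style design handles via one anchored definition; the lookup-and-construct shape is the same.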
…#4396) This is needed due to the recent release of `build` 1.4.4:
* https://pypi.org/project/build/1.4.4/
* https://github.com/pypa/build/releases/tag/1.4.4

More specifically, the `build` package now adds `--ignore-installed` during part of the build process, and since our cmake version pinnings weren't consistent across `[build-system]` and `[tool.scikit-build]`, cmake got updated to be >4 halfway through the build. CUDA-Q does not support cmake>4.
Signed-off-by: Ben Howe <bhowe@nvidia.com>
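The failure mode is an inconsistency between two pin locations. The sketch below uses hypothetical pin strings (the real values in the repo may differ) to show the invariant the fix enforces: the cmake constraint in `[build-system].requires` must equal the one `[tool.scikit-build]` uses, otherwise `--ignore-installed` can resolve a different cmake mid-build.

```python
# Hypothetical pins for illustration only.
build_system_requires = ["scikit-build-core", "cmake>=3.26,<4"]
scikit_build_cmake_version = ">=3.26,<4"

def cmake_pin(requires):
    # Extract the version constraint from the requirement string.
    return next(r[len("cmake"):] for r in requires if r.startswith("cmake"))

# The invariant the fix enforces: one consistent constraint everywhere.
assert cmake_pin(build_system_requires) == scikit_build_cmake_version
print("cmake pins consistent:", scikit_build_cmake_version)
```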
This unifies the two overloads of `Compiler::lowerQuakeCode` into one by moving the responsibility of loading the kernel into an MLIR ModuleOp to the caller. This also means that `extractQuakeCodeAndContext` is now a public method of `Compiler`. I've taken this opportunity to rename it to `loadQuakeCodeByName` which I found clearer. Happy to revert that. Signed-off-by: Luca Mondada <luca@mondada.net>
…4387) On top of NVIDIA#4378 This was the last place in the code base (for Python) that wasn't producing `CompiledModule`s before launching the kernels. --------- Signed-off-by: Luca Mondada <luca@mondada.net>
…IA#4397) Change signatures in `RemoteRuntimeClient`, `ArgumentConverter::gen*` and `mergeAllCallableClosures` to take `std::span`s instead of `const std::vector<void *> *` or `const std::vector<void *> &`. This is more general and casting from a reference to span is implicit, so most call sites remain unchanged. Note that technically, a pointer to a vector can distinguish the empty vector from no vector (`nullptr`), but this distinction is never made in the code, so using `span`s removes this unnecessary distinction as well. Signed-off-by: Luca Mondada <luca@mondada.net> Co-authored-by: Claude <claude@mondada.net>
…4388) This removes `specializeModule` within `QPU` and `quantum_platform`. Instead, both when specializing and launching a kernel from Python, `QPU::compileModule` is called. Signed-off-by: Luca Mondada <luca@mondada.net>
Signed-off-by: mdzurick <mitch_dz@hotmail.com>
…dling (Ctrl+C) (NVIDIA#4284) This branch is based on NVIDIA#4241 and should only be merged after it is merged on upstream/main and rebased. When Python calls into CUDA-Q for kernel compilation, the GIL is held for the entire duration, so `Ctrl+C` does nothing: Python signal handlers (including KeyboardInterrupt) can only run when the GIL is held by a Python thread, and since compilation holds the GIL in C++ the whole time, pressing Ctrl+C is silently queued and never delivered until compilation finishes. For large circuits or complex pass pipelines, this can mean minutes of uninterruptible execution. Another side effect is that there is no Python thread concurrency: other Python threads (progress bars, async I/O, timeouts) are blocked from running during compilation. This PR updates the execution to release the GIL at Python-to-C++ entry points and to check for pending Python signals between MLIR passes through instrumentation of the pass pipelines. The interruption granularity is per pass, so a single long-running pass will still block until it completes (which can be a big issue with SABRE right now; ultimately I had to work around this in benchmarking by forking for my use case). The reason is that MLIR/LLVM is built with `-fno-exceptions`, so signals cannot safely unwind through pass execution. Instead, pending signals are detected between passes via `PyErr_CheckSignals` and converted to an MLIR error diagnostic that stops the pipeline through normal control flow.
---------
Signed-off-by: Thomas Alexander <talexander@nvidia.com>
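The between-pass interruption scheme can be sketched as follows. Names are hypothetical and this is pure Python, not the C++ instrumentation: since exceptions cannot unwind through `-fno-exceptions` MLIR code, a pending interrupt (stand-in for `PyErr_CheckSignals`) is polled only at pass boundaries and converted into a normal failure result.

```python
interrupt_pending = False   # stand-in for a pending Ctrl+C / PyErr_CheckSignals()

def run_pipeline(passes, module):
    """Run passes sequentially; stop via control flow, not exceptions."""
    for p in passes:
        if interrupt_pending:        # checked between passes, never mid-pass
            return None, "interrupted"
        module = p(module)
    return module, "ok"

passes = [lambda m: m + ["canonicalize"], lambda m: m + ["cse"]]
print(run_pipeline(passes, []))      # completes: (['canonicalize', 'cse'], 'ok')

interrupt_pending = True             # simulate Ctrl+C arriving
print(run_pipeline(passes, []))      # stops at the next pass boundary
```

This also makes the granularity limitation visible: a signal arriving inside a single long pass is only observed once that pass returns.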
Implements the feature request in NVIDIA#2220: - Add a regression test for the case reported in that bug report. - Add a test for legitimate empty kernels. Resolved: NVIDIA#2220 --------- Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
## Summary
- Introduces a process-wide `Tracer` that dispatches span begin/end events to pluggable backends.
- `SpdlogTraceBackend` preserves output identical to the existing `ScopedTrace` implementation.
- `ChromeTraceBackend` captures events in memory and serializes them as Chrome Trace Event Format JSON (viewable in Perfetto / speedscope).
- `TracePassInstrumentation` is attached at every in-repo MLIR `PassManager` construction site.
- Env-var flow: `CUDAQ_TRACE_FORMAT=chrome|spdlog` / `CUDAQ_TRACE_PATH=<path>` enables tracing at `initializeLogger` time with no code changes.
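As a sketch of what the in-memory capture serializes, here is a minimal hand-rolled emitter of the Chrome Trace Event Format (this mirrors the format, not the actual `ChromeTraceBackend` API): paired `"B"`/`"E"` phase events with microsecond timestamps, wrapped in a `traceEvents` object that Perfetto and speedscope load directly.

```python
import json
import time

events = []

def begin_span(name, pid=1, tid=1):
    # "B" = begin event; ts is in microseconds per the trace format.
    events.append({"name": name, "ph": "B", "pid": pid, "tid": tid,
                   "ts": time.perf_counter_ns() // 1000})

def end_span(name, pid=1, tid=1):
    # "E" = matching end event.
    events.append({"name": name, "ph": "E", "pid": pid, "tid": tid,
                   "ts": time.perf_counter_ns() // 1000})

begin_span("PassManager::run")
begin_span("CanonicalizerPass")
end_span("CanonicalizerPass")
end_span("PassManager::run")

trace_json = json.dumps({"traceEvents": events})
print(trace_json[:60])
```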
## Design notes
- `TraceBackend` inherits `std::enable_shared_from_this` so a follow-up Python-bindings PR can hold backends as first-class objects with independent C++ / Python shared ownership.
- Fork safety: `ChromeTraceBackend` records `ownerPid` at construction and skips the destructor file write in forked children, avoiding parent-output clobber.
## Dependencies
Based on `feature/gil-release-compilation`. Precedes PR NVIDIA#4284 (`feature/tracer-python`), which stacks on this.
---------
Signed-off-by: Thomas Alexander <talexander@nvidia.com>
The comment makes no sense in the context in which it appears. Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
* Follow-up to PR NVIDIA#3824
* Fix the `assert_close` function.
* Manually tested
```
python3 -m pytest -v python/tests/backends/test_Infleqtion.py
=================================================================== test session starts ===================================================================
platform linux -- Python 3.12.3, pytest-8.3.0, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /workspaces/cuda-quantum
configfile: pyproject.toml
plugins: anyio-4.13.0, xdist-3.8.0
collected 9 items

python/tests/backends/test_Infleqtion.py::test_simple_kernel PASSED               [ 11%]
python/tests/backends/test_Infleqtion.py::test_all_gates PASSED                   [ 22%]
python/tests/backends/test_Infleqtion.py::test_multiple_qvector PASSED            [ 33%]
python/tests/backends/test_Infleqtion.py::test_multiple_measure PASSED            [ 44%]
python/tests/backends/test_Infleqtion.py::test_observe PASSED                     [ 55%]
python/tests/backends/test_Infleqtion.py::test_state_synthesis PASSED             [ 66%]
python/tests/backends/test_Infleqtion.py::test_state_preparation PASSED           [ 77%]
python/tests/backends/test_Infleqtion.py::test_state_preparation_builder PASSED   [ 88%]
python/tests/backends/test_Infleqtion.py::test_exp_pauli PASSED                   [100%]
=================================================================== 9 passed in 18.45s ====================================================================
```
---------
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Update mgpu code to use new `measureSpinOp` signature and some downstream test updates. Signed-off-by: Thien Nguyen <thiennguyen@nvidia.com>
## Summary
This change fixes `cudaq.observe()` broadcasting on REST-based QPU targets when `ExecutionContext.getExpectationValue()` returns `None`. REST backends such as OQC and Quantinuum do not always populate `executionContext->expectationValue`. The non-broadcast `observe()` path already handled that by reconstructing the expectation value from sampled term results, but `__broadcastObserve()` passed the `None` value directly into `ObserveResult`, which caused a crash. This patch makes the broadcast path use the same fallback behavior as the non-broadcast path.
## What changed
- Added a shared helper in `python/cudaq/runtime/observe.py` to resolve the expectation value:
  - return `ctx.getExpectationValue()` when available
  - otherwise reconstruct it from the sampled term expectations
- Updated `__broadcastObserve()` to use that helper
- Updated the existing non-broadcast path to reuse the same helper instead of duplicating the fallback logic
- Added backend regression tests for:
  - `python/tests/backends/test_OQC.py`
  - `python/tests/backends/test_Quantinuum_kernel.py`
## Why this fixes the issue
Previously, the broadcast path assumed the expectation value was always present in the execution context. On REST targets that assumption is false, so `ObserveResult(...)` received `None` and raised a `TypeError`. With this change, broadcasted `observe()` calls now fall back to computing the expectation value from the returned sample counts, matching the behavior already used in the non-broadcast path.
## Testing
I added regression tests covering broadcasted `observe()` calls for OQC and Quantinuum.
What I was able to verify locally:
- the runtime fix is present in `python/cudaq/runtime/observe.py`
- the new backend regression tests are present and selected by pytest

What I could not fully verify locally:
- end-to-end execution of the new tests in my WSL environment

Reason:
- local runs abort during kernel compilation / MLIR lowering before `observe()` execution begins
- the crash occurs in `cudaq/kernel/ast_bridge.py` / `compile_to_mlir`
- because of that, the local environment does not reach the broadcast observe path, so it does not validate the new fallback behavior end-to-end

This appears to be unrelated to the `observe` broadcast fix itself, since the abort happens before the `observe()` runtime path is exercised.
## Local environment notes
During local setup I had to:
- build CUDA-Q from source in WSL
- build a custom LLVM/MLIR toolchain
- disable the Braket backend locally to avoid unrelated AWS SDK dependency issues

Even after that, the backend tests still abort earlier during kernel compilation in this environment.
Signed-off-by: Zeel <desaizeel2128@gmail.com>
Co-authored-by: Thien Nguyen <58006629+1tnguyen@users.noreply.github.com>
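The fallback behavior described above can be sketched as follows. The helper name and data shapes are hypothetical (the actual helper lives in `python/cudaq/runtime/observe.py`): when the context supplies no expectation value, it is reconstructed from the per-term sampled expectations as the coefficient-weighted sum.

```python
def resolve_expectation(ctx_expectation, term_coeffs, term_expectations):
    """Return the context's value if present, else sum_i c_i * <term_i>."""
    if ctx_expectation is not None:   # backend populated it directly
        return ctx_expectation
    # Reconstruct from sampled term results (the REST fallback path).
    return sum(c * e for c, e in zip(term_coeffs, term_expectations))

# E.g. H = 0.5*Z0 + 0.25*Z0Z1 with sampled term expectations 0.9 and 0.8:
print(resolve_expectation(None, [0.5, 0.25], [0.9, 0.8]))  # ≈ 0.65
```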
…dynamics issues (NVIDIA#4327)
## Summary
This PR fixes `cudaq.State.from_data(...)` for strided CuPy device arrays. Before this change, CuPy arrays with explicit stride metadata could be misinterpreted during state construction. In particular, transposed views, Fortran-ordered arrays, and other non-contiguous CuPy arrays could be read as if they were flat contiguous buffers, which silently changed their logical layout. This affected both state-vector-like inputs and 2D density matrix inputs, with the 2D case being especially dangerous because the matrix contents could be silently reordered.
## How to reproduce
A minimal reproducer is to pass a transposed or Fortran-ordered CuPy array into `cudaq.State.from_data(...)`. Example:
```python
import cupy as cp
import cudaq
import numpy as np
from cudaq.dynamics import Schedule
from cudaq.operators import spin

cudaq.set_target("dynamics")
base = cp.array([[1.0 + 0.0j, 2.0 + 0.0j], [3.0 + 0.0j, 4.0 + 0.0j]],
                dtype=cp.complex128)
rho = base.T  # or cp.asfortranarray(base)
state = cudaq.State.from_data(rho)
result = cudaq.evolve(
    0.0 * spin.z(0),
    {0: 2},
    Schedule([0.0], ["t"]),
    state,
    observables=[],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.NONE,
)
print(np.array(result.final_state()))
print(cp.asnumpy(rho))
```
Before this change, these two values could differ for strided CuPy inputs even though they should represent the same logical matrix.
## Root Cause
The CuPy import path did not consistently preserve logical layout information. CuPy arrays expose device memory through `__cuda_array_interface__`, including shape and stride metadata. However, the previous implementation could reduce CuPy inputs to a raw device pointer plus element count, which is only safe for contiguous layouts.
For strided arrays such as `a.T` or `cp.asfortranarray(a)`, this loses the logical indexing semantics and can cause the array to be interpreted using its underlying flat memory layout instead of its intended logical values.
## What this PR changes
* Read CuPy stride metadata from `__cuda_array_interface__`
* Preserve safe fast paths for contiguous inputs
* Canonicalize CuPy arrays when needed before constructing the state
* Add regression coverage for:
  - strided 1D CuPy views
  - C-order 2D CuPy arrays
  - Fortran-order 2D CuPy arrays
  - transposed 2D CuPy views
## Why this matters
Users expect `cudaq.State.from_data(cupy_array)` to preserve the logical values of the CuPy array, regardless of whether the array is contiguous, transposed, or stored with non-default strides. This PR fixes cases where that expectation was not met and prevents silent layout corruption for GPU-backed inputs.
------
Update on 4/24:
## Adjacent fixes discovered during review
Verifying the fix under the post-nanobind merge surfaced two independent correctness gaps on the dynamics target. They are committed separately so each can be reviewed or reverted on its own.
### 1. Reject non-square 2D CuPy arrays at `from_data` time (commit `0f762a09`)
A non-square 2D CuPy array on the dynamics target previously slipped through `createStateFromPyBuffer` and was flattened into a 1D buffer inside `TensorStateData`. The failure was deferred until `initialize_cudm()` raised a cryptic `Invalid hilbertSpaceDims for the state data` with no pointer to the real cause. Now mirrors the host 2D validation path and rejects at `from_data` time with `state.from_data 2D array (density matrix) input must be square matrix data.`.
### 2. Propagate `isDensityMatrix` in `createFromSizeAndPtr` (commit `21e4bdde`)
PR NVIDIA#2853 ("Migrate Python dynamics solver implementation to pybind11", May 2025) trimmed the `CuDensityMatState` constructor's `isDm` parameter but did not add an equivalent assignment in `createFromSizeAndPtr`, leaving the locally computed `isDm` dead for ~11 months. As a result, `cudaq.State.from_data(cupy_2d)` on the dynamics target produced a state whose `getTensor().extents` returned `[N*N]` instead of `[N, N]` and whose `np.array(state).shape` was `(N*N,)` instead of `(N, N)`. The inconsistency was masked because `evolve()` re-infers the shape during `initialize_cudm()`, so no existing test inspected the state before `evolve`. Restored the flag after construction. `dimension` already stores the flat element count and `getTensor()`/`operator()` already `sqrt` it when `isDensityMatrix` is true, so no other field needs to change.
---------
Signed-off-by: huaweil <huaweil@nvidia.com>
Co-authored-by: Sachin Pisal <spisal@nvidia.com>
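The layout corruption can be demonstrated without a GPU, using NumPy as a stand-in for CuPy (CuPy mirrors NumPy's stride semantics, exposed on the device side via `__cuda_array_interface__`): reading a transposed view through its flat underlying buffer yields the original matrix, not the view, so non-contiguous inputs must be canonicalized before a raw pointer is consumed.

```python
import numpy as np

base = np.array([[1, 2], [3, 4]], dtype=np.complex128)
rho = base.T                      # non-contiguous view over base's memory

# The bug in miniature: consume only (pointer, element count), i.e.
# reinterpret the underlying buffer in memory order with default strides.
misread = np.ravel(rho, order="K").reshape(rho.shape)
# misread is [[1, 2], [3, 4]] (base), not the intended [[1, 3], [2, 4]].

# The fix in miniature: canonicalize to C-contiguous before taking a pointer.
canonical = np.ascontiguousarray(rho)
print(misread)
print(canonical)
```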
## Summary
Follow-up to NVIDIA#4395 (which fixed `cudaq.observe` broadcast on REST QPU targets). That PR added regression tests for OQC and Quantinuum but not IonQ, even though NVIDIA#4363 explicitly listed IonQ as affected. IonQ uses the same `BaseRemoteRESTQPU` path, so an analogous test in `test_IonQ.py` closes the coverage gap. The new test mirrors `test_OQC_observe_broadcast` / `test_quantinuum_observe_broadcast` exactly: a 4-sample parameter sweep through `cudaq.observe(...)` on `spin.z(0)`, with results compared to the analytical `cos(theta)` answer.
## Verification
I reproduced the original `TypeError` against `target='ionq'` (both `emulate=True` and the real cloud `qpu='simulator'`), applied the fix from NVIDIA#4395, and confirmed the broadcast call now returns correct expectation values within shot noise. The new test passes on the post-NVIDIA#4395 main.
## Test plan
- [x] `yapf --style google` clean
- [x] Manually verified against the real IonQ cloud simulator (broadcast `observe`, 3 parameter sets, 200 shots, results within 0.02 of analytical `cos(theta)`)
- [ ] CI: `python/tests/backends/test_IonQ.py::test_ionq_observe_broadcast`
Signed-off-by: Spencer Churchill <25377399+splch@users.noreply.github.com>
## Summary
Introduce `!cc.measure_handle` - the IR alias for the source-language `cudaq::measure_handle` - and widen `quake.mz`/`mx`/`my` and `quake.discriminate` ODS / verifiers to admit it alongside the existing `!quake.measure` form. Pure IR vocabulary: no path yet produces or consumes the new type. This is the prologue of a small stack; lowering through `convert-to-qir-api` lands in a follow-up PR, frontend bindings after that.
## Motivation
`cudaq::measure_handle` is a distinct source-language type from both raw integers and the existing measurement token: integer-shaped at the bit level, but identity-preserving for analyses that need to distinguish a measurement event from arbitrary i64 traffic. Landing the IR vocabulary first gives the QIR conversion PR and the frontend PRs a stable target without forcing the ODS contracts to churn step by step.
## What Changed
- **New type** `!cc.measure_handle` in the CC dialect; i64 payload, opaque to the IR. Registered with the CC dialect, lowered to `i64` in the CC->LLVM type converter, and `cc.cast` admits no-op `i64 <-> !cc.measure_handle`.
- **ODS widening** on `quake.mz`/`mx`/`my` results and `quake.discriminate` operand: now `!cc.measure_handle` or `!cc.stdvec<!cc.measure_handle>` are admitted in addition to the prior `!quake.measure` forms.
- **Verifier widening**: `verifyMeasurements` and `DiscriminateOp::verify` accept the new shape; arity diagnostics mention both spellings so users see why a scalar-typed result is rejected when measuring a register.
- **Tests**: `test/Transforms/roundtrip-ops.qke` (passthrough + `i64 <-> !cc.measure_handle` `cc.cast` round-trip), `test/Transforms/invalid.qke` (verifier negatives).
## Risks
No behavioral change in this PR: no path produces or consumes `!cc.measure_handle` until the follow-up PR lands. Risk is bounded to ODS coverage gaps that would surface in the consumer; the follow-up wires up `--convert-to-qir-api` and tests it.
## Downstream Impact
- CUDA-QX: none.
- Public API: none.
- Stack: lowering through `--convert-to-qir-api` lands in a follow-up PR built on this branch; C++/Python frontend bindings land after that.
---------
Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Signed-off-by: mdzurick <mitch_dz@hotmail.com>
Fixes a miscompile of `cc.loop` while-loops with closed-interval comparisons (`>=`, `<=`).
- Adding uge/sge to `isClosedIntervalForm`.
- After the loop, replacing external uses of the induction-position result.

Fixes NVIDIA#4401
---------
Signed-off-by: Sachin Pisal <spisal@nvidia.com>
Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
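The half-open versus closed-interval distinction behind the fix can be illustrated directly: a while-loop guarded by `<=` executes one more iteration than the `<` form, so loop normalization must recognize the unsigned/signed closed comparisons (`uge`/`sge`), and code after the loop must observe the correct final induction value.

```python
def trip_count(start, bound, closed):
    """Count iterations of `while i <= bound` (closed) vs `while i < bound`."""
    i, count = start, 0
    while (i <= bound) if closed else (i < bound):
        count += 1
        i += 1
    # `i` is the induction value visible to external uses after the loop.
    return count, i

print(trip_count(0, 4, closed=False))  # (4, 4): half-open [0, 4)
print(trip_count(0, 4, closed=True))   # (5, 5): closed [0, 4]
```

Treating the closed form as half-open would miscount trips by one and hand external users a stale induction value, which is the miscompile this PR fixes.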
Move the anonymous namespace to wrap only BufferInfo and mark hostDataFromDevice plus the CuPy helper functions static, matching the convention used elsewhere in the file. Signed-off-by: Sachin Pisal <spisal@nvidia.com>
## Summary
- Binds the runtime `Tracer` as `cudaq.util.trace` with:
  - `span(name, **kwargs)` context manager
  - `traced(name=None)` decorator (name defaults to `fn.__module__ + "." + fn.__qualname__`)
  - `TraceBackend` / `ChromeBackend` / `SpdlogBackend` first-class classes
  - `set_backend` / `get_backend` / `reset_backend`
- Backends constructed via `std::make_shared` factories (`nb::new_`). Python wrapper and C++ Tracer slot each hold an independent `shared_ptr`, so Python finalization tears down wrappers without the C++ slot losing its reference, and C++ static destruction runs the ChromeBackend file write cleanly without touching Python state.
- `ChromeBackend` exposes `to_json` / `to_dict` / `write_file` / `clear` for in-memory inspection with no filesystem round trip.
- Applies `@trace.traced` to every public kernel-action entry point (`sample`, `observe`, `run`, `get_state`, `get_unitary`, `estimate_resources`, `draw`, `translate`, `evolve`, `ptsbe.sample`, and async variants) and to `PyKernelDecorator.compile` / `prepare_call`, with a nested `kernel.clone_module` span around the `cudaq_runtime.cloneModule` call.
## Dependencies
Stacks on PR NVIDIA#4389. Rebase onto main after PR 1 merges and retarget the PR base.
---------
Signed-off-by: Thomas Alexander <talexander@nvidia.com>
NOTE: This is a re-post of NVIDIA#4392, which I merged into the wrong branch! It's already been reviewed, discussed and approved. --- This PR splits out the container that is used in CompiledModule into its own type. This is so that it can be re-used by other upcoming types that look very similar, e.g. KernelArgs. I took the opportunity to change to using a vector of pairs instead of a std::map to store the artifacts. This should be faster (most of the time, there will be <5 artifacts) and means that several artifacts of different types can share the same name. This removes the need to adopt some naming convention to differentiate multiple artifact types for the same kernel, as they can share the same name. Signed-off-by: Luca Mondada <luca@mondada.net>
…IA#4404)

## Summary

* Lower `!cc.measure_handle` to its `i64` payload through `--convert-to-qir-api`'s existing `TypeConverter`, completing the IR side of the `cudaq::measure_handle` feature.
* Builds on NVIDIA#4403.

## Motivation

NVIDIA#4403 introduced `!cc.measure_handle` as IR vocabulary; nothing yet routes it to QIR. This PR adds the converter rule plus boundary bridging on `quake.mz` (which still calls a QIR function returning `Result*`) and `quake.discriminate` (which still consumes `Result*`), so handle-form kernels reach the QIR pipeline as `i64` payloads through the same `TypeConverter` machinery the rest of QIR conversion already uses.

## What Changed

- **`QIRAPITypeConverter`** gains three `addConversion` rules: `!cc.measure_handle -> i64`, plus recursive descent through `!cc.array<...>` and `!cc.stdvec<...>` so container-shaped function signatures, allocations, and pointers see consistent post-conversion element types. The `!cc.ptr<...>` recursion was already in place.
- **`MeasurementOpPattern`** detects when the original `quake.mz` produced a handle (its `measOut` is `!cc.measure_handle`) and emits a `cc.cast Result* -> i64` so downstream uses see the converted payload. The cast is materialized in the mz call's block, ahead of the optional terminator-relative insertion point used for record-output, so it dominates downstream `quake.discriminate` uses.
- **`DiscriminateOpToCallRewrite`** mirrors this on the read side: when the post-conversion operand is integer-typed, it emits `cc.cast i64 -> Result*` before delegating to the existing read-result lowering. In the full-QIR (`!discriminateToClassical`) branch, the bridge cast and the inner double-cast fold against each other, leaving a single `cc.cast i64 -> !cc.ptr<i1>` + `cc.load`.
- **`ExpandMeasurements`** accepts `!cc.measure_handle` alongside `!quake.measure` in `usesIndividualQubit`, so single-qubit handle measurements aren't rewritten as registers.
- **Predicate rename**: the misnamed `hasQuakeType` is now `needsTypeConversion`; the leaf check is extended to include `!cc.measure_handle`, and the recursion is extended to descend through `!cc.array`/`!cc.stdvec`. The old name was misleading: the predicate has always reported "this op carries a type the converter rewrites," not "this op carries a Quake type."
- **Test**: `test/Transforms/qir_api_measure_handle.qke` covering scalar handle measurement + discriminate, a function signature with a handle parameter and return, `cc.alloca` of a scalar handle, static- and dynamic-size arrays of handles, `cc.stdvec<!cc.measure_handle>` in a function signature, `cc.indirect_callable<() -> !cc.measure_handle>`, and a no-handle negative case.

## Risks

- `cc.loop` iter-args carrying `!cc.measure_handle` are not exercised by the conversion's region-aware patterns. Low immediate risk because no current frontend or test produces such IR; flagged as a follow-up.
- Container types beyond `cc.array`/`cc.stdvec` (e.g., a `cc.struct` with a handle field) are not in the converter's recursion. None of the current frontends produce these; not a regression vs. the prototype.

## Downstream Impact

- CUDA-QX: none.
- Public API: none.
- Stack: the next PR adds C++/Python frontend bindings that produce handle-form IR, which this PR now correctly routes.

---------

Signed-off-by: Pradnya Khalate <pkhalate@nvidia.com>
Signed-off-by: Pradnya Khalate <148914294+khalatepradnya@users.noreply.github.com>
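The recursive-descent rule described above (rewrite the leaf handle type to `i64`, and recurse through container types so element types stay consistent) can be illustrated with a toy type tree. This is a plain-Python sketch of the idea only, not the MLIR `TypeConverter` API:

```python
# Toy model of the converter's recursion: a type is a nested tuple such as
# ("measure_handle",), ("array", elem), ("stdvec", elem), or ("ptr", elem).
# The leaf rule maps measure_handle -> i64; the container rules recurse so
# that, e.g., stdvec<measure_handle> becomes stdvec<i64>.

def convert_type(ty):
    kind = ty[0]
    if kind == "measure_handle":
        return ("i64",)          # leaf rule: handle lowers to its payload
    if kind in ("array", "stdvec", "ptr"):
        return (kind, convert_type(ty[1]))  # recurse into the element type
    return ty                    # all other types pass through unchanged

# array<stdvec<measure_handle>> -> array<stdvec<i64>>
converted = convert_type(("array", ("stdvec", ("measure_handle",))))
```

Without the container cases, a `stdvec<measure_handle>` in a function signature would survive conversion with a stale element type, which is exactly the inconsistency the recursion prevents.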
Bumps [notebook](https://github.com/jupyter/notebook) from 7.5.2 to 7.5.6. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/jupyter/notebook/releases">notebook's releases</a>.</em></p> <blockquote> <h2>v7.5.6</h2> <h2>7.5.6</h2> <p>(<a href="https://github.com/jupyter/notebook/compare/@jupyter-notebook/application-extension@7.5.5...2e642f0cb10be314ba5d97d709cffe41bf992d9e">Full Changelog</a>)</p> <h3>Security patches</h3> <ul> <li>CVE-2026-42557 <a href="https://github.com/jupyterlab/jupyterlab/security/advisories/GHSA-mqcg-5x36-vfcg">https://github.com/jupyterlab/jupyterlab/security/advisories/GHSA-mqcg-5x36-vfcg</a></li> <li>CVE-2026-40171 <a href="https://github.com/jupyter/notebook/security/advisories/GHSA-rch3-82jr-f9w9">https://github.com/jupyter/notebook/security/advisories/GHSA-rch3-82jr-f9w9</a></li> </ul> <h3>Maintenance and upkeep improvements</h3> <ul> <li>Update to JupyterLab v4.5.7 <a href="https://redirect.github.com/jupyter/notebook/pull/7902">#7902</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> </ul> <h3>Documentation improvements</h3> <ul> <li>docs: Fix broken links in troubleshooting and migration docs <a href="https://redirect.github.com/jupyter/notebook/pull/7824">#7824</a> (<a href="https://github.com/RamiNoodle733"><code>@RamiNoodle733</code></a>)</li> </ul> <h3>Contributors to this release</h3> <p>The following people contributed discussions, new ideas, code and documentation contributions, and review. 
See <a href="https://github-activity.readthedocs.io/en/latest/use/#how-does-this-tool-define-contributions-in-the-reports">our definition of contributors</a>.</p> <p>(<a href="https://github.com/jupyter/notebook/graphs/contributors?from=2026-03-11&to=2026-04-30&type=c">GitHub contributors page for this release</a>)</p> <p><a href="https://github.com/jtpio"><code>@jtpio</code></a> (<a href="https://github.com/search?q=repo%3Ajupyter%2Fnotebook+involves%3Ajtpio+updated%3A2026-03-11..2026-04-30&type=Issues">activity</a>) | <a href="https://github.com/RamiNoodle733"><code>@RamiNoodle733</code></a> (<a href="https://github.com/search?q=repo%3Ajupyter%2Fnotebook+involves%3ARamiNoodle733+updated%3A2026-03-11..2026-04-30&type=Issues">activity</a>)</p> <h2>v7.5.5</h2> <h2>7.5.5</h2> <p>(<a href="https://github.com/jupyter/notebook/compare/@jupyter-notebook/application-extension@7.5.4...4f8438b0c67dc4f010bf8cd052da4f16e2ed3828">Full Changelog</a>)</p> <h3>Maintenance and upkeep improvements</h3> <ul> <li>Update to JupyterLab v4.5.6 <a href="https://redirect.github.com/jupyter/notebook/pull/7861">#7861</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> <li>[7.5.x] Drop Python 3.9 on CI <a href="https://redirect.github.com/jupyter/notebook/pull/7860">#7860</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> <li>Fix check links <a href="https://redirect.github.com/jupyter/notebook/pull/7857">#7857</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> </ul> <h3>Contributors to this release</h3> <p>The following people contributed discussions, new ideas, code and documentation contributions, and review. 
See <a href="https://github-activity.readthedocs.io/en/latest/use/#how-does-this-tool-define-contributions-in-the-reports">our definition of contributors</a>.</p> <p>(<a href="https://github.com/jupyter/notebook/graphs/contributors?from=2026-02-24&to=2026-03-11&type=c">GitHub contributors page for this release</a>)</p> <p><a href="https://github.com/jtpio"><code>@jtpio</code></a> (<a href="https://github.com/search?q=repo%3Ajupyter%2Fnotebook+involves%3Ajtpio+updated%3A2026-02-24..2026-03-11&type=Issues">activity</a>)</p> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/jupyter/notebook/blob/@jupyter-notebook/tree@7.5.6/CHANGELOG.md">notebook's changelog</a>.</em></p> <blockquote> <h2>7.5.6</h2> <p>(<a href="https://github.com/jupyter/notebook/compare/@jupyter-notebook/application-extension@7.5.5...2e642f0cb10be314ba5d97d709cffe41bf992d9e">Full Changelog</a>)</p> <h3>Maintenance and upkeep improvements</h3> <ul> <li>Update to JupyterLab v4.5.7 <a href="https://redirect.github.com/jupyter/notebook/pull/7902">#7902</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> </ul> <h3>Documentation improvements</h3> <ul> <li>docs: Fix broken links in troubleshooting and migration docs <a href="https://redirect.github.com/jupyter/notebook/pull/7824">#7824</a> (<a href="https://github.com/RamiNoodle733"><code>@RamiNoodle733</code></a>)</li> </ul> <h3>Contributors to this release</h3> <p>The following people contributed discussions, new ideas, code and documentation contributions, and review. 
See <a href="https://github-activity.readthedocs.io/en/latest/use/#how-does-this-tool-define-contributions-in-the-reports">our definition of contributors</a>.</p> <p>(<a href="https://github.com/jupyter/notebook/graphs/contributors?from=2026-03-11&to=2026-04-30&type=c">GitHub contributors page for this release</a>)</p> <p><a href="https://github.com/jtpio"><code>@jtpio</code></a> (<a href="https://github.com/search?q=repo%3Ajupyter%2Fnotebook+involves%3Ajtpio+updated%3A2026-03-11..2026-04-30&type=Issues">activity</a>) | <a href="https://github.com/RamiNoodle733"><code>@RamiNoodle733</code></a> (<a href="https://github.com/search?q=repo%3Ajupyter%2Fnotebook+involves%3ARamiNoodle733+updated%3A2026-03-11..2026-04-30&type=Issues">activity</a>)</p> <!-- raw HTML omitted --> <h2>7.5.5</h2> <p>(<a href="https://github.com/jupyter/notebook/compare/@jupyter-notebook/application-extension@7.5.4...4f8438b0c67dc4f010bf8cd052da4f16e2ed3828">Full Changelog</a>)</p> <h3>Maintenance and upkeep improvements</h3> <ul> <li>Update to JupyterLab v4.5.6 <a href="https://redirect.github.com/jupyter/notebook/pull/7861">#7861</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> <li>[7.5.x] Drop Python 3.9 on CI <a href="https://redirect.github.com/jupyter/notebook/pull/7860">#7860</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> <li>Fix check links <a href="https://redirect.github.com/jupyter/notebook/pull/7857">#7857</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> </ul> <h3>Contributors to this release</h3> <p>The following people contributed discussions, new ideas, code and documentation contributions, and review. 
See <a href="https://github-activity.readthedocs.io/en/latest/use/#how-does-this-tool-define-contributions-in-the-reports">our definition of contributors</a>.</p> <p>(<a href="https://github.com/jupyter/notebook/graphs/contributors?from=2026-02-24&to=2026-03-11&type=c">GitHub contributors page for this release</a>)</p> <p><a href="https://github.com/jtpio"><code>@jtpio</code></a> (<a href="https://github.com/search?q=repo%3Ajupyter%2Fnotebook+involves%3Ajtpio+updated%3A2026-02-24..2026-03-11&type=Issues">activity</a>)</p> <h2>7.5.4</h2> <p>(<a href="https://github.com/jupyter/notebook/compare/@jupyter-notebook/application-extension@7.5.3...e5d8418b706fcefd4208bb61c22399dd3123555b">Full Changelog</a>)</p> <h3>Maintenance and upkeep improvements</h3> <ul> <li>Update to JupyterLab v4.5.5 <a href="https://redirect.github.com/jupyter/notebook/pull/7842">#7842</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> <li>Fix PyO3 CI failure with Python 3.15 <a href="https://redirect.github.com/jupyter/notebook/pull/7836">#7836</a> (<a href="https://github.com/jtpio"><code>@jtpio</code></a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/jupyter/notebook/commit/1ab2d2b99261996e94069ca53dd3d74b8b2ee1ba"><code>1ab2d2b</code></a> Publish 7.5.6</li> <li><a href="https://github.com/jupyter/notebook/commit/50e5222c9670121c3369900c7dce01aae53823fc"><code>50e5222</code></a> Merge commit from fork</li> <li><a href="https://github.com/jupyter/notebook/commit/2e642f0cb10be314ba5d97d709cffe41bf992d9e"><code>2e642f0</code></a> Update to JupyterLab v4.5.7 (<a href="https://redirect.github.com/jupyter/notebook/issues/7902">#7902</a>)</li> <li><a href="https://github.com/jupyter/notebook/commit/4b93f98b5a6e57027a2e1d58694b56e2ebd793a3"><code>4b93f98</code></a> Backport PR <a href="https://redirect.github.com/jupyter/notebook/issues/7824">#7824</a>: docs: Fix broken links in troubleshooting and migration do...</li> <li><a href="https://github.com/jupyter/notebook/commit/9a2c88fe646bac05b39dbe53e3e0ce95cafee016"><code>9a2c88f</code></a> Publish 7.5.5</li> <li><a href="https://github.com/jupyter/notebook/commit/4f8438b0c67dc4f010bf8cd052da4f16e2ed3828"><code>4f8438b</code></a> Update to JupyterLab v4.5.6 (<a href="https://redirect.github.com/jupyter/notebook/issues/7861">#7861</a>)</li> <li><a href="https://github.com/jupyter/notebook/commit/f78fcfada85f9e4b46003bc1b831c83e6f4c30b3"><code>f78fcfa</code></a> Backport PR <a href="https://redirect.github.com/jupyter/notebook/issues/7857">#7857</a>: Fix check links (<a href="https://redirect.github.com/jupyter/notebook/issues/7858">#7858</a>)</li> <li><a href="https://github.com/jupyter/notebook/commit/9e4cf2a44594e650e1ae3da49f81ae420135f32f"><code>9e4cf2a</code></a> [7.5.x] Drop Python 3.9 on CI (<a href="https://redirect.github.com/jupyter/notebook/issues/7860">#7860</a>)</li> <li><a href="https://github.com/jupyter/notebook/commit/ecc3aaf1bbf8f9cbec9c5d85df79db0f62b6d1e6"><code>ecc3aaf</code></a> Publish 7.5.4</li> <li><a 
href="https://github.com/jupyter/notebook/commit/e5d8418b706fcefd4208bb61c22399dd3123555b"><code>e5d8418</code></a> Update to JupyterLab v4.5.5 (<a href="https://redirect.github.com/jupyter/notebook/issues/7842">#7842</a>)</li> <li>Additional commits viewable in <a href="https://github.com/jupyter/notebook/compare/@jupyter-notebook/tree@7.5.2...@jupyter-notebook/tree@7.5.6">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Eric Schweitz <eschweitz@nvidia.com>
Add the updates for migrating the CUDA-Q `qbraid` target to use the qBraid platform v2. This includes updates for jobs, the main API, and switching authentication to API-key auth.
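The job-status polling that such a migration needs (per the commit log: "update polling interval and make code more readable") can be sketched as a small loop. All endpoint paths, the `api-key` header name, and the status strings below are illustrative assumptions, not the actual qBraid v2 API:

```python
import time

# Hypothetical job-polling helper: repeatedly fetch job status with an
# API-key header until a terminal state is reached or attempts run out.
# The `fetch` callable is injected so the loop is testable offline.

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}  # assumed status names

def poll_job(job_id, api_key, fetch, interval=1.0, max_attempts=10,
             sleep=time.sleep):
    headers = {"api-key": api_key}  # illustrative auth header
    for _ in range(max_attempts):
        status = fetch(f"/v2/jobs/{job_id}", headers)["status"]
        if status in TERMINAL:
            return status
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish in {max_attempts} polls")

# Fake server: the first two polls report RUNNING, then COMPLETED.
responses = iter(["RUNNING", "RUNNING", "COMPLETED"])
result = poll_job("abc123", "secret",
                  lambda url, headers: {"status": next(responses)},
                  sleep=lambda s: None)
```

Injecting `fetch` and `sleep` keeps the polling cadence and the HTTP transport separately replaceable, which mirrors how a mock server (like the qBraid mock Python server added for testing) can stand in for the real backend.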