
flasharray/kvm/adaptive: NVMe-TCP transport for FlashArray primary storage#13061

Open
genegr wants to merge 7 commits into apache:main from genegr:feat/flasharray-nvme-tcp-support

Conversation


@genegr genegr commented Apr 22, 2026

Description

Adds an end-to-end NVMe-over-TCP data path for CloudStack on KVM, using the FlashArray adaptive plugin as the first (and currently only) consumer. The change is opt-in — existing Fibre Channel FlashArray / Primera deployments continue to work unchanged.

A FlashArray pool is switched to NVMe-TCP by adding a single transport=nvme-tcp query parameter to the pool URL on createStoragePool:

url=https://<user>:<pass>@<fa-ip>:443/api?pod=<pod>&transport=nvme-tcp&hostgroup=<hg>

When that parameter is present, the adaptive lifecycle stamps the pool with the new StoragePoolType.NVMeTCP and the KVM agent dispatches to a new MultipathNVMeOFAdapterBase / NVMeTCPAdapter pair. The FlashArray adapter then attaches volumes as host-group-scoped NVMe connections, builds EUI-128 NGUIDs in the /dev/disk/by-id/nvme-eui.<32-hex> layout that udev emits for a Pure namespace, and reverses that layout when CloudStack looks up a volume by address.

The seven commits are split along natural seams (address type, FA REST-side support, storage pool type, KVM adapter, adaptive lifecycle routing, docs, copyPhysicalDisk) so each can be reviewed independently.

Why a separate NVMeTCP pool type (and a separate MultipathNVMeOFAdapterBase) rather than reusing FiberChannel / MultipathSCSIAdapterBase?

  • NVMe-oF is a different command set (NVMe, not SCSI), identifies namespaces by EUI-128 NGUIDs (not SCSI WWNs), and on Linux is multipathed natively by the nvme driver rather than by device-mapper multipath. Keeping it out of the SCSI code path avoids special-casing inside every method that handles paths, connect, disconnect, or size lookup.
  • The new base class is fabric-agnostic: a future NVMe-RoCE or NVMe-FC adapter would only need a concrete subclass and a new pool-type value, without touching the SCSI code.

Types of changes

  • [x] Enhancement (non-breaking change which adds functionality)
  • [ ] Bugfix
  • [ ] Breaking change

Feature/Enhancement Scale or Bug Severity

Feature. Opt-in via transport=nvme-tcp URL parameter on pool registration. Defaults are unchanged.

How Has This Been Tested?

Validated end-to-end on a 4.23-SNAPSHOT lab against a Pure Storage FlashArray running Purity 6.7.7:

  • Pre-requisites on each KVM host: an OVS bridge cloudbr-nvme with an IP on the NVMe subnet, nvme-cli + nvme_tcp kernel module, a persistent /etc/nvme/hostnqn, a populated /etc/nvme/discovery.conf and nvme connect-all enabled at boot.
  • Pre-requisites on the array: a pod (cloudstack), a hostgroup matching the CloudStack cluster name (cluster1), one host per KVM host inside the hostgroup bound to the host's NQN.
  • Registered a FlashArray primary pool with provider="Flash Array", transport=nvme-tcp, hostgroup=cluster1 → pool enters Up state, type: NVMeTCP.
  • Created and attached a 20 GiB tags=nvme disk offering volume to a Rocky 9 VM: the volume's path carried type=NVMETCP; address=<EUI-128>; connid.kvm01=1; connid.kvm02=1;; both hosts saw /dev/disk/by-id/nvme-eui.<that EUI> via the host-group NVMe connection; libvirt presented the namespace to the guest as /dev/vdb.
  • Inside the guest: mkfs.ext4 /dev/vdb, wrote 16 MiB of /dev/urandom with conv=fsync, recorded SHA-256, unmounted/remounted, re-checksummed → hash matched.
  • Live-migrated the VM between the two KVM hosts while a sha256sum probe loop was running against /mnt/nvme/pattern.bin every 2 s. Migration completed in 6 s, the loop output showed the same hash across the migration window with no gap (multi-path/hostgroup-scope proof).
  • Full-NVMe VM: deployed a second VM with both root and data disks on the NVMe-TCP pool. copyPhysicalDisk converted the Rocky 9 cloud template qcow2 into a raw NVMe namespace (10 GB root), the VM booted from it, cloud-init injected an SSH key, and a 20 GB tags=nvme data disk was attached. lsblk inside the guest showed both vda and vdb as NVMe-backed virtio block devices.
  • Snapshot / revert cycle on the full-NVMe VM: created sentinel files on both vda and vdb with a known SHA-256, took a createVMSnapshot with quiescevm=true, snapshotmemory=false, deleted both sentinel files, issued revertToVMSnapshot, restarted, and confirmed both files reappeared with the identical SHA-256 content. Array-side snapshots cloudstack::vol-4-1-2-<id>.1 for both volumes visible on Purity during the window. The StorageVMSnapshotStrategy path is what CloudStack dispatches here, so any adaptive-plugin consumer gets the same behaviour.
  • Default-path Fibre Channel registrations (no transport= parameter) continue to work — type: FiberChannel, FC WWN addressing, same MultipathSCSIAdapterBase code path as before.

Notes

Eugenio Grosso added 6 commits April 20, 2026 22:06
Preparatory data-model changes for NVMe-TCP support on the adaptive
storage framework. No behaviour change for existing Fibre Channel
users - the extra enum value, field, and getter/setter are only
exercised by callers that explicitly use them.

ProviderVolume.AddressType gains a NVMETCP value alongside FIBERWWN,
so adapters can declare that a volume is addressed by an NVMe EUI-128
(NGUID) rather than a SCSI WWN.

FlashArrayVolume.getAddress() produces the NGUID layout expected by
the Linux kernel for a FlashArray NVMe namespace:

    00 + serial[0:14] + 24a937 (Pure 6-hex OUI) + serial[14:24]

which matches the /dev/disk/by-id/nvme-eui.<id> symlink emitted by
udev. Fibre Channel callers (addressType != NVMETCP) still get the
existing 6 + 24a9370 + serial form.

FlashArrayConnection gains a nsid field to carry the namespace id the
FlashArray REST API attaches to host-group-scoped NVMe connections,
when it is present.
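The NGUID layout above can be sketched as a small round-trip helper — build the 32-hex EUI from a 24-hex array serial, and reverse it the way getVolumeByAddress does (strip the optional eui. prefix, drop the leading 00 and the embedded Pure OUI). This is an illustrative sketch, not the actual FlashArrayVolume code; class and method names here are hypothetical.

```java
// Hypothetical sketch of the EUI-128/NGUID layout described above.
public class NguidLayout {
    private static final String PURE_OUI = "24a937"; // Pure Storage OUI, 6 hex chars

    // "00" + serial[0:14] + OUI + serial[14:24] -> 32-hex NGUID as udev shows it
    static String toEui(String serial24Hex) {
        String s = serial24Hex.toLowerCase();
        return "00" + s.substring(0, 14) + PURE_OUI + s.substring(14, 24);
    }

    // Reverse: strip optional "eui." prefix, drop leading "00" and the embedded OUI
    static String toSerial(String address) {
        String eui = address.startsWith("eui.") ? address.substring(4) : address;
        return (eui.substring(2, 16) + eui.substring(22)).toUpperCase();
    }
}
```

For a serial AABBCCDDEEFF001122334455, toEui yields 00aabbccddeeff0024a9371122334455, which is exactly the suffix of the /dev/disk/by-id/nvme-eui.* symlink; toSerial inverts it.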

Teach FlashArrayAdapter to talk to a pool over NVMe over TCP instead of
Fibre Channel.

The transport is selected from a new transport= option on the storage
pool URL (or the equivalent storage_pool_details entry), e.g.

    https://user:pass@fa:443/api?pod=cs&transport=nvme-tcp&hostgroup=cluster1

Defaults remain Fibre Channel / WWN addressing when transport is absent
or anything other than nvme-tcp, so existing FC pools are unaffected.

Beyond the transport parsing itself the adapter now:

  * Tracks a per-pool volumeAddressType (AddressType.NVMETCP or
    FIBERWWN) and stamps every volume it hands back to the framework
    with it (withAddressType), so the adaptive driver path stores the
    correct type=... field in the CloudStack volume path (used later
    by the KVM driver to locate the device).

  * Attaches pod-backed NVMe-TCP volumes at the host-group level
    (POST /connections?host_group_names=...) instead of per-host, so
    the array assigns a consistent NSID to every member host; falls
    back to per-host attach for FC or when no hostgroup is configured.

  * Tolerates a missing nsid in the FlashArray connections response
    for NVMe-TCP - Purity does not return one for host-group NVMe
    connections; the namespace is identified on the host by EUI-128
    from FlashArrayVolume.getAddress(), so a placeholder value is
    returned to the caller purely for informational tracking.

  * Resolves NVMETCP addresses back to volumes in getVolumeByAddress
    by reversing the EUI-128 layout (strip optional eui. prefix, drop
    leading 00 and the embedded Pure OUI).

  * Indexes NVMe connections in getConnectionIdMap by host name (the
    array returns one entry per host inside a host-group connection),
    so connid.<hostname> tokens in the path still match in
    parseAndValidatePath on the KVM side.

Followed by a matching adaptive/KVM driver change (separate commit).

NVMe-oF over TCP (NVMe-TCP) is conceptually a separate storage fabric
from Fibre Channel / iSCSI: it speaks the NVMe command set rather than
SCSI, identifies namespaces by EUI-128 NGUIDs rather than WWNs, and on
Linux is multipathed natively by the nvme driver rather than by
device-mapper multipath. Giving it its own StoragePoolType lets the
KVM agent dispatch the adaptive driver to a dedicated NVMe-oF adapter
(added in the next commit) without polluting the existing Fibre Channel
code path.

The new value is wired into the same format-routing and derivePath
fall-through paths that already special-case FiberChannel in
KVMStorageProcessor: NVMe-TCP volumes are also RAW and carry their
device path in DataObjectTO.path rather than in a managedStoreTarget
detail.

Introduce an NVMe-over-Fabrics counterpart to the existing
MultipathSCSIAdapterBase / FiberChannelAdapter pair.

NVMe-oF is conceptually distinct from SCSI - it speaks the NVMe command
set, identifies namespaces by EUI-128 NGUIDs, and is multipathed by the
kernel natively rather than by device-mapper - so keeping it out of the
SCSI code path avoids special-casing inside every method that handles
volume paths, connect, disconnect, or size lookup.

MultipathNVMeOFAdapterBase (abstract)

  * Parses volume paths of the form
        type=NVMETCP; address=<eui>; connid.<host>=<nsid>; ...
    into an AddressInfo whose path is
        /dev/disk/by-id/nvme-eui.<eui>
    which is the udev symlink the kernel emits for every NVMe namespace.

  * connectPhysicalDisk polls the udev path and, on every iteration,
    triggers nvme ns-rescan on all local NVMe controllers, to cover
    target/firmware combinations that do not send an asynchronous event
    notification when a new namespace is mapped.

  * disconnectPhysicalDisk is a no-op; the kernel drops the namespace
    when the target removes the host-group connection. The
    ByPath variant only claims paths starting with
    /dev/disk/by-id/nvme-eui. so foreign paths still fall through to
    other adapters.

  * Delegates getPhysicalDisk, isConnected, and getPhysicalDiskSize to
    plain test -b / blockdev --getsize64 calls - no SCSI rescan, no dm
    multipath, no multipath-map cleanup timer.

  * createPhysicalDisk / createTemplateFromDisk / listPhysicalDisks /
    copyPhysicalDisk all throw UnsupportedOperationException - these
    are the responsibility of the storage provider, not the KVM
    adapter, same as the SCSI base.
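The path parsing described above can be illustrated with a small sketch: split the `type=...; address=...; connid.<host>=<nsid>;` token string and derive the udev by-id symlink. This is a hedged, self-contained illustration — class and method names are hypothetical, not the actual parseAndValidatePath implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the NVMe volume-path parsing described above.
public class NvmePathParser {
    static Map<String, String> parse(String volumePath) {
        Map<String, String> fields = new HashMap<>();
        for (String token : volumePath.split(";")) {
            String t = token.trim();
            int eq = t.indexOf('=');
            if (eq > 0) {
                fields.put(t.substring(0, eq), t.substring(eq + 1));
            }
        }
        return fields;
    }

    // Derive the udev symlink the kernel emits for the namespace; return null
    // for foreign paths so other adapters can claim them.
    static String devicePath(String volumePath) {
        Map<String, String> f = parse(volumePath);
        if (!"NVMETCP".equals(f.get("type")) || f.get("address") == null) {
            return null;
        }
        return "/dev/disk/by-id/nvme-eui." + f.get("address").toLowerCase();
    }
}
```

A path like `type=NVMETCP; address=<eui>; connid.kvm01=1;` resolves to `/dev/disk/by-id/nvme-eui.<eui>`, while a `type=FIBERWWN` path falls through untouched.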

MultipathNVMeOFPool

  * KVMStoragePool mirror of MultipathSCSIPool. Defaults to
    Storage.StoragePoolType.NVMeTCP in the parameterless-fallback
    constructor.

NVMeTCPAdapter

  * Concrete adapter that registers itself for
    Storage.StoragePoolType.NVMeTCP via the reflection-based scan in
    KVMStoragePoolManager. Carries no logic of its own beyond binding
    the base to the pool type.

A similar MultipathNVMeOFAdapterBase-derived NVMeRoCEAdapter (or
NVMeFCAdapter) can later be added by adding one concrete subclass and a
new pool-type value; the base does not assume any particular
fabric-level transport.

The adaptive storage framework hard-coded FiberChannel as the KVM-side
pool type for every provider it fronts. With a separate NVMeTCP pool
type now available (and a dedicated NVMe-oF adapter on the KVM side),
teach the lifecycle to route a pool to the right adapter based on a
transport= URL parameter:

  https://user:pass@host/api?...&transport=nvme-tcp

  -> StoragePoolType.NVMeTCP -> NVMeTCPAdapter on the KVM host

When the query parameter is absent the default stays FiberChannel, so
existing FC deployments on Primera or FlashArray continue to work
unchanged.

The choice is made in the shared AdaptiveDataStoreLifeCycleImpl rather
than inside each vendor plugin so every adaptive provider (FlashArray,
Primera, any future one) speaks the same configuration vocabulary.
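The routing decision above can be sketched as a tiny helper: look for transport=nvme-tcp in the pool URL's query string and fall back to FiberChannel otherwise. This is a minimal illustration, assuming nothing about the actual AdaptiveDataStoreLifeCycleImpl code beyond the behaviour described here; the returned strings mirror the Storage.StoragePoolType value names.

```java
// Hedged sketch of transport-based pool-type routing, defaulting to FiberChannel.
public class TransportRouting {
    static String poolTypeFor(String poolUrl) {
        int q = poolUrl.indexOf('?');
        if (q >= 0) {
            for (String param : poolUrl.substring(q + 1).split("&")) {
                int eq = param.indexOf('=');
                if (eq > 0 && "transport".equals(param.substring(0, eq))
                        && "nvme-tcp".equalsIgnoreCase(param.substring(eq + 1))) {
                    return "NVMeTCP";
                }
            }
        }
        return "FiberChannel"; // absent or unrecognized: existing FC pools unchanged
    }
}
```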
@winterhazel (Member)

@blueorangutan package

@blueorangutan

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

The NVMe-oF KVM adapter refused every template copy request from the
adaptive storage orchestrator with UnsupportedOperationException, which
made it impossible to use an NVMe-TCP pool as primary storage for a VM
root disk: every deploy that landed a root volume on the pool failed
as soon as CloudStack tried to lay down the template.

Implement it the same way FiberChannel (SCSI) does: the storage provider
creates and connects a raw namespace ahead of time, then the adapter
resolves the host-side /dev/disk/by-id/nvme-eui.<NGUID> path via the
existing getPhysicalDisk plumbing (which will nvme ns-rescan and wait
for the symlink if the kernel has not yet picked it up) and qemu-img
converts the source image into the raw block device.

User-space encrypted source or destination volumes are rejected: the
FlashArray already encrypts at rest and layering qemu-img LUKS on top
of a hostgroup-scoped namespace shared between hosts is not a sensible
layering. Source encryption would also break on migration because the
passphrase does not travel.

With this change a CloudStack KVM VM can have its ROOT volume on an
NVMe-TCP pool (tested end-to-end on 4.23-SNAPSHOT against Purity 6.7.7:
template copy, first boot, live migrate with data disk, VM snapshot
with quiesce, and revert all work).

Signed-off-by: Eugenio Grosso <eugenio.grosso@gmail.com>
@genegr (Author)

genegr commented Apr 22, 2026

Heads-up: pushed an additional commit c0cdfa41da — kvm: implement copyPhysicalDisk on MultipathNVMeOFAdapterBase to this PR. The original description noted this as "future work, not in this PR", but after validating the rest of the NVMe-TCP path end-to-end I wanted VMs to be fully deployable on an NVMe-TCP pool (root + data), which requires copyPhysicalDisk to land the template as a raw image on the provisioned namespace.

Implementation mirrors MultipathSCSIAdapterBase.copyPhysicalDisk: resolve the destination device path via the existing getPhysicalDisk plumbing (which triggers nvme ns-rescan and waits for the by-id/nvme-eui.<NGUID> symlink), then qemu-img convert the source image into the raw block device. User-space encrypted source or destination volumes are rejected by design — the FlashArray already encrypts at rest and layering qemu-img LUKS on top of a hostgroup-scoped namespace is not a sensible layering (and would break across live-migration).

With this commit I was able to:

  • Deploy a Rocky 9 VM with pooltype: NVMeTCP on the root volume (previously the deploy failed as soon as the root disk tried to land on the NVMe-TCP pool).
  • Attach an additional tags=nvme data disk, so both vda and vdb are NVMe-backed.
  • createVMSnapshot with quiescevm=true, snapshotmemory=false → array-side snapshots on both volumes, CloudStack state: Ready, type: Disk.
  • revertToVMSnapshot → both volumes came back with identical SHA-256 content to pre-snapshot.

I've also updated the PR description to reflect the 7-commit set and add the full-NVMe test evidence. Happy to split this commit into a separate follow-up PR if reviewers prefer — let me know.
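The qemu-img convert step described above can be illustrated by sketching the command line the adapter would hand to the process runner: convert the source template into the raw block device behind the nvme-eui symlink. The method name and argument handling here are assumptions for illustration, not the actual MultipathNVMeOFAdapterBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of landing a template on a provisioned NVMe namespace.
public class TemplateCopySketch {
    static List<String> qemuImgConvertCommand(String srcImage, String srcFormat, String destDevice) {
        List<String> cmd = new ArrayList<>();
        cmd.add("qemu-img");
        cmd.add("convert");
        cmd.add("-f"); cmd.add(srcFormat);   // e.g. "qcow2" for a cloud template
        cmd.add("-O"); cmd.add("raw");       // the namespace is a raw block device
        cmd.add(srcImage);
        cmd.add(destDevice);                 // /dev/disk/by-id/nvme-eui.<NGUID>
        return cmd;
    }
}
```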

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17578

* {@link KVMStoragePoolManager} can find it via reflection.
*/
public class NVMeTCPAdapter extends MultipathNVMeOFAdapterBase {
private static final Logger LOGGER_NVMETCP = LogManager.getLogger(NVMeTCPAdapter.class);
Contributor
Suggested change
private static final Logger LOGGER_NVMETCP = LogManager.getLogger(NVMeTCPAdapter.class);
private static final Logger LOGGER = LogManager.getLogger(NVMeTCPAdapter.class);

Copilot AI (Contributor) left a comment

Pull request overview

Adds opt-in NVMe-over-TCP (NVMe-oF/TCP) support for KVM managed primary storage via the adaptive storage framework, with the FlashArray adaptive plugin as the first consumer. This introduces a new StoragePoolType.NVMeTCP, NVMe EUI-128 addressing, and a KVM-side NVMe-oF adapter base to surface namespaces via /dev/disk/by-id/nvme-eui.<eui>.

Changes:

  • Introduces NVMe-TCP transport selection (transport=nvme-tcp) and maps it to a new StoragePoolType.NVMeTCP.
  • Extends FlashArray adapter to generate/parse NVMe EUI-128 addresses and use host-group scoped connections for consistent namespace identity.
  • Adds KVM NVMe-oF adapter/pool implementations and updates KVM storage processor handling (RAW format + path derivation) for the new pool type.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

Summary per file (File — Description):
plugins/storage/volume/flasharray/src/main/java/org/apache/cloudstack/storage/datastore/adapter/flasharray/FlashArrayVolume.java Adds NVMe EUI-128 address construction for NVMe-TCP volumes.
plugins/storage/volume/flasharray/src/main/java/org/apache/cloudstack/storage/datastore/adapter/flasharray/FlashArrayConnection.java Adds nsid field to model NVMe namespace IDs in connection payloads.
plugins/storage/volume/flasharray/src/main/java/org/apache/cloudstack/storage/datastore/adapter/flasharray/FlashArrayAdapter.java Adds transport selection, NVMe attach/lookup behavior, and address-type stamping for returned volumes.
plugins/storage/volume/adaptive/src/main/java/org/apache/cloudstack/storage/datastore/lifecycle/AdaptiveDataStoreLifeCycleImpl.java Chooses pool type from provider URL transport= query parameter (defaults to FiberChannel).
plugins/storage/volume/adaptive/src/main/java/org/apache/cloudstack/storage/datastore/adapter/ProviderVolume.java Adds AddressType.NVMETCP for provider volume addressing.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/NVMeTCPAdapter.java Registers a KVM storage adapter for StoragePoolType.NVMeTCP.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/MultipathNVMeOFPool.java Adds a pool implementation delegating operations back to the NVMe-oF adapter.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/MultipathNVMeOFAdapterBase.java Implements NVMe-oF attach/wait-for-namespace and qemu-img convert copy into namespaces.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java Treats NVMeTCP pools like other managed/shared block pools for RAW format and path derivation.
api/src/main/java/com/cloud/storage/Storage.java Adds new enum value StoragePoolType.NVMeTCP.
PendingReleaseNotes Documents the new NVMe-oF/TCP support and required components.


Comment on lines +237 to +238
p.waitFor(NS_RESCAN_TIMEOUT_SECS, TimeUnit.SECONDS);
}
Comment on lines +188 to 195
if (AddressType.NVMETCP.equals(volumeAddressType)) {
if (conn.getHostGroup() != null && conn.getHostGroup().getName() != null
&& conn.getHostGroup().getName().equals(hostgroup)) {
return conn.getNsid() != null ? "" + conn.getNsid() : "1";
}
} else if (conn.getHost() != null && conn.getHost().getName() != null &&
(conn.getHost().getName().equals(hostname) || conn.getHost().getName().equals(hostname.substring(0, hostname.indexOf('.')))) &&
conn.getLun() != null) {
Comment on lines 162 to 164
if (list == null || list.getItems() == null || list.getItems().size() == 0) {
throw new RuntimeException("Volume attach did not return lun information");
}
Comment on lines +291 to +295
// Reverse the EUI-128 layout: serial = eui[2:16] + eui[22:32], after
// stripping the optional "eui." prefix that appears in udev paths.
String eui = address.startsWith("eui.") ? address.substring(4) : address;
serial = (eui.substring(2, 16) + eui.substring(22)).toUpperCase();
} else {
@@ -781,6 +816,13 @@ private FlashArrayVolume getSnapshot(String snapshotName) {
return (FlashArrayVolume) getFlashArrayItem(list);
Comment on lines +114 to +118
if (AddressType.NVMETCP.equals(addressType)) {
// EUI-128 layout for FlashArray NVMe namespaces:
// 00 + serial[0:14] + <Pure OUI (24a937)> + serial[14:24]
// This is the value the Linux kernel exposes as
// /dev/disk/by-id/nvme-eui.<result>
if (details != null && details.containsKey(com.cloud.storage.StorageManager.STORAGE_POOL_DISK_WAIT.toString())) {
String waitTime = details.get(com.cloud.storage.StorageManager.STORAGE_POOL_DISK_WAIT.toString());
if (StringUtils.isNotEmpty(waitTime)) {
waitSecs = Integer.parseInt(waitTime);
@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 0.50251% with 396 lines in your changes missing coverage. Please review.
✅ Project coverage is 19.15%. Comparing base (3166e64) to head (c0cdfa4).

Files with missing lines Patch % Lines
...rvisor/kvm/storage/MultipathNVMeOFAdapterBase.java 0.00% 244 Missing ⚠️
...ud/hypervisor/kvm/storage/MultipathNVMeOFPool.java 0.00% 79 Missing ⚠️
...atastore/adapter/flasharray/FlashArrayAdapter.java 0.00% 38 Missing ⚠️
...m/cloud/hypervisor/kvm/storage/NVMeTCPAdapter.java 0.00% 14 Missing ⚠️
...tore/lifecycle/AdaptiveDataStoreLifeCycleImpl.java 0.00% 10 Missing ⚠️
...ud/hypervisor/kvm/storage/KVMStorageProcessor.java 0.00% 4 Missing ⚠️
...store/adapter/flasharray/FlashArrayConnection.java 0.00% 3 Missing ⚠️
...tack/storage/datastore/adapter/ProviderVolume.java 0.00% 2 Missing ⚠️
...datastore/adapter/flasharray/FlashArrayVolume.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13061      +/-   ##
============================================
+ Coverage     18.01%   19.15%   +1.13%     
+ Complexity    16607    16603       -4     
============================================
  Files          6029     5568     -461     
  Lines        542160   502404   -39756     
  Branches      66451    58940    -7511     
============================================
- Hits          97682    96245    -1437     
+ Misses       433461   395337   -38124     
+ Partials      11017    10822     -195     
Flag Coverage Δ
uitests ?
unittests 19.15% <0.50%> (-0.02%) ⬇️

