
Reduce strict-order dependency resolution overhead during large DataPack deploys #436

Draft
Copilot wants to merge 3 commits into main from copilot/fix-high-cpu-load-deployment

Conversation


Copilot AI commented Mar 9, 2026

Large DataPack deployments were spending excessive CPU time in deployable-record selection, especially with strictOrder enabled, because datapack-level dependency checks were repeated across many records. This PR narrows that hotspot by reusing computed dependency state within each selection pass.

  • What changed

    • Added per-pass caches in DatapackDeployment.getDeployableRecords() for:
      • datapack status lookups (getDatapackStatus)
      • circular datapack dependency checks (isCircularDatapackDependency)
    • Threaded these caches through hasPendingDependencies(...) so repeated checks for the same datapack/pair are resolved once per pass instead of per record.
    • Kept existing behavior intact when hasPendingDependencies(...) is called without caches (non-cached fallback path).
  • Why this addresses CPU spikes

    • Strict-order resolution previously recomputed the same graph/status information many times in large batches.
    • Reuse of per-pass results reduces repeated traversal and dependency evaluation work, which scales better for DataPacks with high record counts.
  • Test updates

    • Extended datapackDeployment.test.ts with a focused case asserting lookup reuse in strict-order selection:
      • repeated dependencies to the same datapack trigger a single datapack-status lookup
      • repeated checks for the same datapack pair trigger a single circular-dependency evaluation
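The "single lookup" assertion style used in those tests can be illustrated generically with a call counter wrapped around a memoized function. All names below are illustrative; the real test in datapackDeployment.test.ts instruments DatapackDeployment internals instead:

```typescript
// Generic illustration of asserting lookup reuse: wrap an expensive
// function with a call counter, memoize it, and verify that repeated
// keys only hit the underlying function once.
function countCalls<R>(fn: (key: string) => R): { wrapped: (key: string) => R; count: () => number } {
    let calls = 0;
    return {
        wrapped: (key: string) => { calls++; return fn(key); },
        count: () => calls,
    };
}

function memoize<R>(fn: (key: string) => R): (key: string) => R {
    const cache = new Map<string, R>();
    return key => {
        if (!cache.has(key)) {
            cache.set(key, fn(key));
        }
        return cache.get(key)!;
    };
}

// Two dependencies on the same datapack trigger a single underlying lookup
const lookup = countCalls((key: string) => `status-of-${key}`);
const cachedLookup = memoize(lookup.wrapped);
cachedLookup('MyDataPack');
cachedLookup('MyDataPack');
```

After both calls, `lookup.count()` is 1; a lookup for a different datapack key would raise it to 2.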
Illustrative excerpt of the cached selection path in getDeployableRecords():

// Per-pass caches, created once at the start of each selection pass
const datapackStatusCache = new Map<string, DeploymentStatus | undefined>();
const circularDependencyCache = new Map<string, boolean>();

// Select a record only when it is pending, has not been retried, and none
// of its dependencies block it according to the cache-assisted check
if (record.isPending && record.retryCount == 0 &&
    !this.hasPendingDependencies(record, datapackStatusCache, circularDependencyCache)) {
  records.set(record.sourceKey, record);
}
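The cache-threading inside hasPendingDependencies(...) can be sketched as follows. Types, method names, and the pending/circular logic are simplified stand-ins for the real implementation; the point is that each datapack status is resolved once per pass, each datapack pair is checked once per pass, and the non-cached fallback path still recomputes everything:

```typescript
// Simplified sketch of per-pass cache threading; not the actual
// DatapackDeployment code. The counters exist only to make reuse observable.
type DeploymentStatus = 'pending' | 'inProgress' | 'deployed' | 'failed';

interface Dependency { from: string; to: string }

class DependencyResolverSketch {
    public statusLookups = 0;
    public circularChecks = 0;

    constructor(private readonly statuses: Map<string, DeploymentStatus>) {}

    // Stand-in for the expensive datapack status lookup
    private getDatapackStatus(key: string): DeploymentStatus | undefined {
        this.statusLookups++;
        return this.statuses.get(key);
    }

    // Stand-in for the circular datapack dependency check
    private isCircularDatapackDependency(from: string, to: string): boolean {
        this.circularChecks++;
        return from === to; // placeholder for a real graph traversal
    }

    public hasPendingDependencies(
        dependencies: Dependency[],
        statusCache?: Map<string, DeploymentStatus | undefined>,
        circularCache?: Map<string, boolean>
    ): boolean {
        for (const dep of dependencies) {
            // Status lookup: resolved once per datapack per pass when a cache
            // is supplied; recomputed every time on the non-cached fallback path
            let status: DeploymentStatus | undefined;
            if (statusCache?.has(dep.to)) {
                status = statusCache.get(dep.to);
            } else {
                status = this.getDatapackStatus(dep.to);
                statusCache?.set(dep.to, status);
            }
            if (status !== 'pending') {
                continue;
            }
            // Circular check: resolved once per datapack pair per pass
            const pairKey = `${dep.from}->${dep.to}`;
            let circular = circularCache?.get(pairKey);
            if (circular === undefined) {
                circular = this.isCircularDatapackDependency(dep.from, dep.to);
                circularCache?.set(pairKey, circular);
            }
            if (!circular) {
                return true; // a non-circular pending dependency blocks the record
            }
        }
        return false;
    }
}
```

With caches supplied, repeated dependencies on the same datapack or pair cost one lookup each; passing no caches preserves the original per-call behavior.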
Original prompt

This section details the original issue to resolve

<issue_title>High CPU load during deployment when watching large DataPacks</issue_title>
<issue_description>## Problem

When deploying DataPacks containing many files (>1000 files), the Vlocode extension causes sustained high CPU load throughout the deployment process, sometimes reaching 100% on a single core. This makes the system unresponsive and degrades the development experience.

Related to #424 but more specifically scoped: the issue appears during active deployment operations, not just when the extension is idle.

Expected behavior

Deployment should complete without causing system-wide CPU spikes. CPU usage should scale proportionally with DataPack size and deployment speed, not spike to maximum available resources.

Actual behavior

  • CPU usage jumps to 100% when vlocode deploy starts processing large DataPacks
  • The spike persists for the entire deployment duration
  • No correlated increase in memory usage, suggesting the issue is algorithmic rather than memory-related
  • Smaller DataPacks (<100 files) do not trigger the spike

Reproduction steps

  1. Create or load a Vlocity DataPack with >1000 files
  2. Run deployment via the VSCode extension or CLI
  3. Monitor CPU usage with top or Activity Monitor
  4. Observe CPU pinned at or near 100% during deployment

Investigation notes

The likely culprits are:

  • vlocity-deploy package iterating over files synchronously without yielding (blocking the event loop)
  • File system watcher callbacks firing for every file change during deployment
  • Polling loops in core IoC framework without backoff
  • Missing pagination or batching in Salesforce API calls within salesforce package

Since deployment is the high-load scenario, focus should be on packages/vlocity-deploy and its usage of packages/salesforce APIs.
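One of the suspected causes, iterating over files synchronously without yielding, can generally be mitigated by handing control back to the event loop between chunks of work. A generic sketch (not the actual vlocity-deploy code; the chunk size of 250 is an arbitrary illustrative value):

```typescript
// Generic sketch: process a large list in chunks, yielding to the event loop
// between chunks so timers, I/O callbacks, and progress reporting stay
// responsive during long-running deployments.
async function processInChunks<T>(
    items: T[],
    handler: (item: T) => void,
    chunkSize = 250
): Promise<void> {
    for (let i = 0; i < items.length; i += chunkSize) {
        for (const item of items.slice(i, i + chunkSize)) {
            handler(item);
        }
        // setImmediate lets pending I/O callbacks run before the next chunk
        await new Promise<void>(resolve => setImmediate(resolve));
    }
}
```

This caps how long the event loop is blocked at one chunk's worth of work, trading a small amount of total throughput for responsiveness.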

Why this matters

Developers working with large DataPacks (common in enterprise Vlocity implementations) cannot use the extension productively. The system becomes unresponsive, blocking other work and degrading CI/CD pipeline performance.


Contributed by Klement Gunndu</issue_description>

Comments on the Issue (you are @copilot in this section)

@Codeneos Thanks for the detailed investigation — I really appreciate the level of detail and the suggestions. This gives some good leads to investigate further.

One thing to note is that during a deployment we typically don’t create or modify files locally, so the VSCode file watchers would probably not be the main culprit here.

You mentioned potential polling loops in the IoC core. Could you be a bit more specific about where you are seeing those? I don't recall there being any polling loops there, but I may be mistaken and it would be useful to double-check.

Regarding the Salesforce API calls: those are probably not directly related to the CPU spike either. I have considered moving some of the calls to the generic batch API, but that would require quite a bit of refactoring and implementing it transparently is not trivial. At the moment that mechanism is only used during initialization when loading entity definitions.



Copilot AI and others added 2 commits March 9, 2026 22:31
Co-authored-by: Codeneos <787686+Codeneos@users.noreply.github.com>
Co-authored-by: Codeneos <787686+Codeneos@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix high CPU load during deployment of large DataPacks Reduce strict-order dependency resolution overhead during large DataPack deploys Mar 9, 2026


Development

Successfully merging this pull request may close these issues.

High CPU load during deployment when watching large DataPacks
