ESCU 6 YAML Porting and Updates#4082

Draft
pyth0n1c wants to merge 8 commits into develop from escu_6

@pyth0n1c pyth0n1c commented May 13, 2026

ESCU 6.0.0 — Migration to contentctl-ng

Note: Because there are presently several files that require MANUAL_REVIEW and there are tooling changes in flight, this content is not yet validated by CI/CD workflows.

This PR contains changes relevant to STRT's migration from the legacy contentctl tool to the new ESCU workflow and tooling (contentctl-ng). As part of this migration, a number of high-level changes have been made to the structure of YML files across all content types.


High-Level Changes

Removal of deployments/ and creation of schedules/

Deployment objects, which contained information about both scheduling and the configuration of Notable and RBA generation, have been removed. As a replacement:

  1. Generation of finding and intermediate_findings is now explicit in Detection YMLs themselves.
  2. The schedules/ directory has been created to support the definition of reusable schedule objects. These objects contain information about scheduling and only scheduling.

Migration of removed/deprecation_mapping.YML information into relevant content files

The removed/deprecation_mapping.YML file has been removed from the repository. However, all information contained in this file has been persisted into the relevant content files (including those already present in removed/detections/, removed/stories/, and removed/baselines/). This information is now contained in a section called deprecation_info.

New and Updated Fields (all content types)

The following fields are now common to all content types: baselines/, dashboards/, data_sources/, detections/, lookups/, macros/, and stories/.

version

  1. Version was added to all macros, with all macros having version: 1.
  2. For all other objects, version was bumped by 1 (version = previous_version + 1), reflecting the changes made to all YMLs.

author
Macros did not previously contain an author: field. These files have all been updated with author: Splunk Threat Research Team.

id
Macros did not previously contain an id: field. These files have all been updated with a generated version 4 UUID (Python's uuid.uuid4()).
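
The version, author, and id rules above can be sketched as follows. This is a minimal illustration, not the porting tool's actual code: the function names are hypothetical, and content is assumed to be available as a plain dict after loading the YML.

```python
import uuid


def add_macro_metadata(macro: dict) -> dict:
    """Return a copy of a macro with the fields that macros previously lacked."""
    updated = dict(macro)
    updated.setdefault("version", 1)
    updated.setdefault("author", "Splunk Threat Research Team")
    # str() so the value serializes as a plain string in YML
    updated.setdefault("id", str(uuid.uuid4()))
    return updated


def bump_version(content: dict) -> dict:
    """Return a copy of a non-macro object with version = previous_version + 1."""
    updated = dict(content)
    updated["version"] = content["version"] + 1
    return updated
```

Existing values are left untouched by setdefault, so running the macro helper twice does not generate a second id.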

date: replaced by creation_date: and modification_date:

In previous versions of the repo, only the date: field existed, making it difficult to determine the true "age" of content. This field has been removed and replaced with two fields:

  • creation_date: — Populated automatically by digging through git history to determine when a file was first created, using git log --follow. While this accounts for renames and moves, it may not be perfect. This is a best-effort population of the creation date. In the future, when content is updated, this value should not change.
  • modification_date: — Set to today, as all content was updated when it was ported. In the future, when content is updated, this field should change to reflect that the content has been modified.
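
A rough sketch of how creation_date could be derived from git history. The PR only states that git log --follow was used; the exact format flags, the function names, and the choice of the author date are assumptions made here for illustration.

```python
import subprocess
from typing import Optional


def earliest_date(log_output: str) -> Optional[str]:
    """git log prints newest commits first, so the last non-empty line is the oldest."""
    lines = [line.strip() for line in log_output.splitlines() if line.strip()]
    return lines[-1] if lines else None


def creation_date(path: str, repo: str = ".") -> Optional[str]:
    """Best-effort creation date (YYYY-MM-DD) of a file, following renames and moves."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=%ad", "--date=short", "--", path],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return earliest_date(result.stdout)
```

Splitting the parsing from the subprocess call keeps the date logic testable without a repository. As the description notes, rename detection is heuristic, so the result remains a best-effort value.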

YAML Formatting Preservation

The porting tool uses ruamel.yaml in round-trip mode to preserve inline comments and YAML formatting during transformation. This works correctly for block scalars (| literal and > folded style), which represent the majority of multi-line string values in this repository (e.g., description, search, how_to_implement).

Known limitation — flow scalar line wrapping: The YAML 1.2 specification defines that line-break positions within flow scalars (plain, single-quoted, or double-quoted strings that are not block scalars) are purely cosmetic — they are folded to spaces at parse time and their exact positions are not stored. As a result, ruamel cannot preserve original line-break positions for flow scalars in round-trip mode: this information is irrecoverably lost when the document is loaded, before any dump occurs. The porting tool mitigates this by setting yaml.width = 32768, which prevents the emitter from introducing new line breaks into flow scalars that were already on a single line. However, flow scalars that were originally written across multiple lines in source will be emitted as a single line after porting. The string values are semantically identical — only the cosmetic line-wrap formatting changes. No workaround exists for this limitation within the standard ruamel load/dump API; it is an acknowledged open issue in the ruamel.yaml tracker (tickets #568, #561, #562).

Known limitation - spacing inconsistencies rectified in non-detection YML files: Files whose spacing is inconsistent with the spacing used in detection files, for instance:

  • lists must have four leading spaces
  • mapping types (nested objects) must have four leading spaces

do not have their original spacing preserved. This is a limitation of the ruamel.yaml parser/dumper: the number of leading whitespace characters is not retained after loading. For an example of these differences, look at any data_source file.

Known gap — inline comment loss: One file was identified where inline comments were not preserved during serialization: detections/network/cisco_secure_firewall___high_volume_of_intrusion_events_per_host.yml had inline comments on its mitre_attack_id entries (e.g., # Command and Scripting Interpreter). These have been manually restored.


Changes by Content Type

Baselines

REMOVED – date
ADDED – creation_date, modification_date

REMOVED – type: "Baseline"
This field was a required literal in the legacy schema but is now redundant given the directory location. It has been removed.

REMOVED – tags.analytic_story
Removed per schema discussions with STRT. These values are not surfaced in product or on research.splunk.com. If they must be surfaced in the future, they will be computed at runtime by walking the detections that reference each baseline and combining those detections' analytic_story fields.

MIGRATED – tags.detections
Detections that use baselines now directly reference the name of the baseline in their detection under the baselines: key. The relationship has been inverted: previously baselines listed their detections; now detections list their baselines.
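
The inversion described above can be illustrated with a short sketch. The function name and return shape are hypothetical; the input is assumed to be the legacy baseline dicts (with tags.detections) and the set of detection names that exist in the repository.

```python
from collections import defaultdict


def invert_baseline_links(baselines: list, detection_names: set):
    """Turn baseline -> detections (legacy) into detection -> baselines (new).

    Returns the inverted mapping plus, per baseline, any referenced
    detections that do not exist and therefore need manual review.
    """
    detection_to_baselines = defaultdict(list)
    missing = {}
    for baseline in baselines:
        for detection in baseline.get("tags", {}).get("detections", []):
            if detection in detection_names:
                detection_to_baselines[detection].append(baseline["name"])
            else:
                missing.setdefault(baseline["name"], []).append(detection)
    return dict(detection_to_baselines), missing
```

Each key of the first return value is a detection whose baselines: list would contain the associated baseline names.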

MIGRATED – deployment
This field previously contained custom scheduling information. It has been migrated to the custom_schedule key.

ADDED (where appropriate) – schedule
Because baselines frequently define custom schedules, every baseline must now explicitly declare either a custom_schedule or schedule. Baselines without custom schedules include schedule: Default Baseline.

ADDED (where appropriate) – custom_schedule
This field replaces the deployment field that could be declared in a baseline. It expresses scheduling information specific to this baseline.
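
A minimal sketch of the scheduling requirement for baselines. It assumes schedule and custom_schedule are mutually exclusive, which the description implies ("either ... or") but does not state outright; the function name is hypothetical.

```python
def check_baseline_schedule(baseline: dict) -> str:
    """Return which scheduling key a baseline uses, or raise if the rule is violated."""
    has_named = "schedule" in baseline
    has_custom = "custom_schedule" in baseline
    if has_named and has_custom:
        # Assumption: declaring both is treated as an error
        raise ValueError("declare either schedule or custom_schedule, not both")
    if not has_named and not has_custom:
        raise ValueError("baseline must declare schedule or custom_schedule")
    return "schedule" if has_named else "custom_schedule"
```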


Dashboards

REMOVED – date
ADDED – creation_date, modification_date


Data Sources

REMOVED – date
ADDED – creation_date, modification_date

REMOVED – status
Always "production" for data sources; removed as redundant.


Deployments

All deployments have been removed. Their scheduling logic has been migrated into schedules/default_baseline.yml and schedules/default_eventbaseddetection.yml. The behavior expressed in deployments now lives in the finding: and intermediate_findings: sections of detection files themselves.


Detections

REMOVED – date
ADDED – creation_date, modification_date

ADDED – baselines
Any baselines that this detection uses are now listed directly in the detection under the baselines: key.

ADDED – deprecation_info
Only for detections with status: deprecated. Please see the note on deprecation_info above. Five detections in detections/deprecated/ have had this modification made.

ADDED (where appropriate) – finding
ADDED (where appropriate) – intermediate_findings

MIGRATED (where appropriate) – rba
RBA information has been migrated into finding and intermediate_findings objects where appropriate. See the RBA Migration section below for full details.

ADDED – category
Expresses the category of the detection and the directory it must live in. Supported categories: application, cloud, endpoint, network, web, and deprecated. Note: while category: deprecated is currently a valid value (assigned automatically from the filesystem path), this allowance will be removed in a future release. At that point, deprecated detections will be required to retain their original category value (e.g., endpoint, cloud) when moved into detections/deprecated/. The "deprecated" category value exists only for backward compatibility during this transition.
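
One way the category could be derived from the filesystem path, shown as a sketch. The supported values come from the list above; the function name and the assumption that the category is the directory immediately under detections/ are illustrative only.

```python
from pathlib import PurePosixPath

SUPPORTED_CATEGORIES = {"application", "cloud", "endpoint", "network", "web", "deprecated"}


def category_from_path(path: str) -> str:
    """Derive a detection's category from the directory directly under detections/."""
    parts = PurePosixPath(path).parts
    if "detections" not in parts:
        raise ValueError(f"not a detection path: {path}")
    idx = parts.index("detections")
    # Need at least detections/<category>/<file>
    if idx + 2 >= len(parts):
        raise ValueError(f"detection is not inside a category directory: {path}")
    category = parts[idx + 1]
    if category not in SUPPORTED_CATEGORIES:
        raise ValueError(f"unsupported category: {category}")
    return category
```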

REQUIRED – data_source
Every detection must now include a data_source key. An empty list (data_source: []) is acceptable for detections that do not depend on a specific data source. Detections missing this key entirely have been flagged with MANUAL_REVIEW.

ADDED – test_type (within tests)
Each test entry now requires a test_type field. The mapping from legacy to new is:

| Legacy detection status | Resulting test_type |
| --- | --- |
| production | unit |
| deprecated | unit |
| experimental | experimental (+ auto-generated description) |
| tags.manual_test present | experimental (ported rationale used as description) |

Additionally, attack_data[N].custom_index has been renamed to index.
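
The test_type mapping and the custom_index rename can be sketched as below. Several details are assumptions rather than facts from this PR: that tags.manual_test takes precedence over status, that the description lives on the test entry itself, and the placeholder wording of the auto-generated description.

```python
from typing import Optional

# Hypothetical placeholder; the real auto-generated text is not given in the PR
AUTO_DESCRIPTION = "Test type set to experimental because the detection is experimental."


def port_test(test: dict, status: str, manual_test: Optional[str] = None) -> dict:
    """Return a copy of a legacy test entry with test_type set and custom_index renamed."""
    ported = dict(test)
    if manual_test:
        ported["test_type"] = "experimental"
        ported["description"] = manual_test  # ported rationale
    elif status == "experimental":
        ported["test_type"] = "experimental"
        ported["description"] = AUTO_DESCRIPTION
    elif status in ("production", "deprecated"):
        ported["test_type"] = "unit"
    else:
        raise ValueError(f"unexpected detection status: {status}")

    if "attack_data" in test:
        ported["attack_data"] = [_rename_index(entry) for entry in test["attack_data"]]
    return ported


def _rename_index(entry: dict) -> dict:
    """Rename attack_data[N].custom_index to index, leaving other keys untouched."""
    renamed = dict(entry)
    if "custom_index" in renamed:
        renamed["index"] = renamed.pop("custom_index")
    return renamed
```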


Investigations

contentctl-ng does not process or validate investigation files. Investigations were removed from ESCU in v5.0.0 and are no longer an ES feature. The removed/investigations/ directory is carried forward unchanged, except that its files now contain creation_date, modification_date, and deprecation_info, in line with all other removed/* content.


Lookups

REMOVED – date
ADDED – creation_date, modification_date

ADDED – subdirectory structure
CSV lookups are now located under lookups/csv/ and KVStore lookups under lookups/kvstore/. Previously all lookup YMLs lived flat in lookups/.

REMOVED – mlmodel lookups
5 mlmodel lookup files exist in the repository. These lookups supported MLTK-based detections that were deprecated in v5.25.0 (#3922) and whose corresponding detections were removed in v5.26.0 (#3989). The mlmodel files should have been removed at that time but were not. Removing them now is housekeeping. No active detections reference these files. They are not supported by contentctl-ng and have been excluded from the ported output.

FIXED – default_match type coercion
Several lookup YMLs had default_match: false, which YAML parsers interpret as a boolean. These have been corrected to default_match: "false" (string) to match the intended field type.
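
The coercion fix can be illustrated with a small sketch. The PR only mentions default_match: false; handling a boolean true the same way is an assumption, and the function name is hypothetical.

```python
def fix_default_match(lookup: dict) -> dict:
    """Return a copy of a lookup with a boolean default_match converted to a string."""
    updated = dict(lookup)
    value = updated.get("default_match")
    # An unquoted `false` in YML loads as a Python bool; the field is meant to be a string
    if isinstance(value, bool):
        updated["default_match"] = "true" if value else "false"
    return updated
```

When the result is dumped back to YML, the value must be emitted quoted ("false") so that it does not load as a boolean again.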


Macros

ADDED – id
ADDED – version (set to 1)
ADDED – creation_date
ADDED – modification_date
ADDED – author (set to Splunk Threat Research Team)
REMOVED – date


Playbooks

contentctl-ng does not validate or build playbook content. The playbooks/ directory and its associated .json/.py companion files are not processed by -ng and are preserved unchanged in this PR. Note that legacy contentctl did validate playbook YMLs and their companion files — this validation is no longer being performed.

Action requested from STRT: Please confirm whether ongoing validation of playbook content is desired. If yes, a short-term workaround is a dedicated GitHub Actions workflow that runs legacy contentctl validation against playbooks only, providing a bridge until native contentctl-ng support is scoped and implemented.


Schedules

Schedule objects have been created, maintaining the scheduling information originally contained in Deployment objects.


Stories

REMOVED – date
ADDED – creation_date, modification_date

REMOVED – "Splunk Security Analytics for AWS" from product
This product value is no longer valid in the -ng schema. It has been removed from all story product lists where present.


Removed Content

Removed content in removed/detections/, removed/stories/, removed/baselines/, and removed/investigations/ has intentionally not been modified to conform to new schema requirements. However, these files have been updated in two ways:

  • deprecation_info is now contained in these files, as noted in the section on deprecation_info above.
  • The date: field has been replaced with creation_date and modification_date.

RBA Migration

The rba: block has been removed from all detections and its content migrated into three new top-level fields: finding, intermediate_findings, and threat_objects. The conversion logic differs by detection type.

TTP

Legacy rba.risk_objects are converted to finding and (optionally) intermediate_findings according to the following rules:

| Scenario | finding entity | intermediate_findings |
| --- | --- | --- |
| Exactly one user entity | That user | All system/other entities |
| Multiple user entities | First user | Remaining users + all system/other (flagged MANUAL_REVIEW) |
| No user entities, one entity total | That entity | None |
| No user entities, multiple entities | First entity | Remaining entities (flagged MANUAL_REVIEW) |

  • finding.title ← rba.message
  • intermediate_findings.entities[N].message ← rba.message (replicated per entity)
  • threat_objects ← rba.threat_objects (if non-empty)
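
The TTP rules above can be sketched as a single function. This is an illustration under stated assumptions, not the porting tool's code: the legacy risk objects are assumed to carry a type key with the value user for user entities, the finding's entity is stored under a hypothetical entity key, and an rba block with no risk objects is treated as an error because the rules do not cover that case.

```python
def convert_ttp_rba(rba: dict) -> dict:
    """Convert a legacy TTP rba block into finding / intermediate_findings / threat_objects."""
    risk_objects = rba.get("risk_objects", [])
    if not risk_objects:
        raise ValueError("TTP rba block has no risk_objects to convert")
    message = rba["message"]

    users = [o for o in risk_objects if o.get("type") == "user"]
    others = [o for o in risk_objects if o.get("type") != "user"]

    if users:
        # First user becomes the finding; remaining users come before system/other
        primary, remaining = users[0], users[1:] + others
        review = "Multiple user-type entities; first user selected automatically" if len(users) > 1 else None
    else:
        primary, remaining = others[0], others[1:]
        review = "No user-type entity, multiple entities; first entity selected automatically" if len(others) > 1 else None

    result = {"finding": {"title": message, "entity": primary}}
    if remaining:
        # rba.message is replicated onto every intermediate entity
        result["intermediate_findings"] = {"entities": [{**o, "message": message} for o in remaining]}
    if rba.get("threat_objects"):
        result["threat_objects"] = rba["threat_objects"]
    if review:
        result["MANUAL_REVIEW"] = review
    return result
```

Treating the four table rows as two branches (with or without a user entity) keeps the automatic selection and the MANUAL_REVIEW flag in one place.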

Anomaly

All rba.risk_objects become intermediate_findings.entities. No finding is created.

  • intermediate_findings.entities[N].message ← rba.message
  • threat_objects ← rba.threat_objects (if non-empty)
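
For comparison, the Anomaly conversion is simpler, since no finding is created. A minimal sketch, with a hypothetical function name and the legacy rba block assumed to be a plain dict:

```python
def convert_anomaly_rba(rba: dict) -> dict:
    """Convert a legacy Anomaly rba block: every risk object becomes an intermediate entity."""
    message = rba["message"]
    result = {
        "intermediate_findings": {
            # rba.message is copied onto each entity
            "entities": [{**o, "message": message} for o in rba.get("risk_objects", [])]
        }
    }
    if rba.get("threat_objects"):
        result["threat_objects"] = rba["threat_objects"]
    return result
```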

Correlation

Legacy Correlation detections had no rba section. The new schema requires a finding with one entity. Because there is no source data to migrate, all Correlation detections have been flagged with MANUAL_REVIEW — a content author must supply the finding entity for each one.

Hunting

Legacy Hunting detections had no risk_objects and no threat_objects. No RBA-derived fields are added.


MANUAL_REVIEW

Approximately 73 YML files require manual review. Each of these files contains a MANUAL_REVIEW key explaining why the file has been flagged. These issues must be resolved and the MANUAL_REVIEW key removed before the files can pass validation.

A summary of the most common reasons for MANUAL_REVIEW:

| Reason | Count |
| --- | --- |
| Multiple user-type entities for a TTP; first user selected automatically | 32 detections |
| Correlation detections require a finding with one entity; legacy had no rba section, so it must be authored manually | 15 detections |
| finding.title or intermediate_findings[N].message is missing at least one $fieldName$ token (at least one required) | 9 detections |
| A baseline references a detection that does not exist | 6 baselines |
| No user-type entity, but multiple entities found for a TTP; first entity selected automatically | 5 detections |
| Detection references a baseline that is itself flagged for manual review | 2 detections |
| finding.title or intermediate_findings[N].message has mismatched $ delimiters | 1 detection |

For verbose information on these errors, search the branch for the string MANUAL_REVIEW or review the rba_upgrade_tracking.json file included in this PR.
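
The two token-related reasons in the summary above could be checked with something like the sketch below. The exact token grammar accepted by contentctl-ng is not stated in this PR; the regular expression (a field name of letters, digits, underscores, and dots between two $ characters) and the function name are assumptions.

```python
import re

# Assumed token shape: $fieldName$
TOKEN = re.compile(r"\$[A-Za-z_][A-Za-z0-9_.]*\$")


def message_problems(message: str) -> list:
    """Return the token-related problems found in a finding title or entity message."""
    problems = []
    # An odd number of $ characters means at least one token is not closed
    if message.count("$") % 2 != 0:
        problems.append("mismatched $ delimiters")
    if not TOKEN.search(message):
        problems.append("missing $fieldName$ token")
    return problems
```

For example, message_problems("Suspicious process on $dest$ by $user$") returns an empty list, while a message with no tokens at all reports the missing-token problem.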

pyth0n1c added 5 commits May 13, 2026 14:02
…could not be git moved AND updated in the same operation because git instead interpreted this as deleting the old file and creating a new one. To preserve git history, the files have been moved in this commit and will be updated in the next commit.
…entionally moved, but not updated, have now been updated with the new format.
…t missing from here - notably content which has a MANUAL_REVIEW flag. This content does not parse until it has been updated, which means it could not be compiled into the schemas. These files will be updated when all content in the repo successfully parses.
@github-actions github-actions Bot added Macros and removed Stories labels May 14, 2026