…could not be git moved AND updated in the same operation because git instead interpreted this as deleting the old file and creating a new one. To preserve git history, the files have been moved in this commit and will be updated in the next commit.
…entionally moved, but not updated, have now been updated with the new format.
…t missing from here - notably content which has a MANUAL_REVIEW flag. This content does not parse until it has been updated, which means it could not be compiled into the schemas. These files will be updated when all content in the repo successfully parses.
…ged since the pr was first opened
ESCU 6.0.0: Migration to contentctl-ng

This PR contains changes relevant to STRT's migration from the legacy `contentctl` tool to the new ESCU workflow and tooling (contentctl-ng). As part of this migration, a number of high-level changes have been made to the structure of YML files across all content types.

High-Level Changes
Removal of `deployments/` and creation of `schedules/`

Deployment objects, which contained information about both scheduling and the configuration of Notable and RBA generation, have been removed. As a replacement:

- The configuration of `finding` and `intermediate_findings` is now explicit in Detection YMLs themselves.
- A `schedules/` directory has been created to support the definition of reusable schedule objects. These objects contain information about scheduling and only scheduling.

Migration of `removed/deprecation_mapping.YML` information into relevant content files

The `removed/deprecation_mapping.YML` file has been removed from the repository. However, all information contained in this file has been persisted into the relevant content files (including those already present in `removed/detections/`, `removed/stories/`, and `removed/baselines/`). This information is now contained in a section called `deprecation_info`.

New and Updated Fields (all content types)
The following fields are now common to all content types: `baselines/`, `dashboards/`, `data_sources/`, `detections/`, `lookups/`, `macros/`, and `stories/`.

`version`

- Content that previously had no version is set to `version: 1`.
- All other content has had its version incremented (`version = previous_version + 1`), reflecting the changes made to all YMLs.

`author`

Macros did not previously contain an `author:` field. These files have all been updated with `author: Splunk Threat Research Team`.

`id`

Macros did not previously contain an `id:` field. These files have all been updated with a generated `id: uuid.uuid4()`.

`date:` replaced by `creation_date:` and `modification_date:`

In previous versions of the repo, only the `date:` field existed, making it difficult to determine the true "age" of content. This field has been removed and replaced with two fields:

- `creation_date:` Populated automatically by searching git history with `git log --follow` to determine when a file was first created. While this accounts for renames and moves, it may not be perfect; it is a best-effort population of the creation date. In the future, when content is updated, this value should not change.
- `modification_date:` Set to today, as all content was updated when it was ported. In the future, when content is updated, this field should change to reflect that the content has been modified.

YAML Formatting Preservation
The porting tool uses `ruamel.yaml` in round-trip mode to preserve inline comments and YAML formatting during transformation. This works correctly for block scalars (`|` literal and `>` folded style), which represent the majority of multi-line string values in this repository (e.g., `description`, `search`, `how_to_implement`).

Known limitation: flow scalar line wrapping. The YAML 1.2 specification defines that line-break positions within flow scalars (plain, single-quoted, or double-quoted strings that are not block scalars) are purely cosmetic: they are folded to spaces at parse time and their exact positions are not stored. As a result, ruamel cannot preserve original line-break positions for flow scalars in round-trip mode; this information is irrecoverably lost when the document is loaded, before any dump occurs. The porting tool mitigates this by setting `yaml.width = 32768`, which prevents the emitter from introducing new line breaks into flow scalars that were already on a single line. However, flow scalars that were originally written across multiple lines in source will be emitted as a single line after porting. The string values are semantically identical; only the cosmetic line-wrap formatting changes. No workaround exists for this limitation within the standard ruamel load/dump API; it is an acknowledged open issue in the ruamel.yaml tracker (tickets #568, #561, #562).

Known limitation: spacing inconsistencies rectified in non-detection YML files. Spacing that is inconsistent with the spacing used in detection files is not preserved. This is a limitation of the ruamel.yaml parser/dumper: the number of leading whitespace characters is not preserved after loading. For an example of these differences, see any data_source file.
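To illustrate why line-wrap positions in flow scalars cannot survive a load, here is a minimal sketch of the YAML line-folding rule in plain Python. It is a simplified illustration, not ruamel.yaml's implementation: the function name `fold_flow_scalar` and the sample strings are hypothetical, and quoting and escape sequences are ignored.

```python
def fold_flow_scalar(raw: str) -> str:
    """Simplified YAML flow-scalar folding (assumption: no quotes or escapes).

    A single line break between two non-empty lines becomes one space;
    a run of k blank lines becomes k newline characters.
    """
    out = []
    blank_run = 0
    for line in (part.strip() for part in raw.split("\n")):
        if not line:
            blank_run += 1  # remember blank lines until the next text line
            continue
        if out:
            out.append("\n" * blank_run if blank_run else " ")
        blank_run = 0
        out.append(line)
    return "".join(out)


# Two different wrappings of the same (hypothetical) description value.
wrapped_a = "Detects suspicious\n  process creation events"
wrapped_b = "Detects\n  suspicious process\n  creation events"

# Both fold to the same string, so the original wrap points are gone
# as soon as the value has been parsed.
print(fold_flow_scalar(wrapped_a) == fold_flow_scalar(wrapped_b))
```

Because both inputs produce an identical parsed value, a dumper has nothing to reconstruct the original wrapping from, which is consistent with the single-line output described above.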
Known gap: inline comment loss. One file was identified where inline comments were not preserved during serialization: `detections/network/cisco_secure_firewall___high_volume_of_intrusion_events_per_host.yml` had inline comments on its `mitre_attack_id` entries (e.g., `# Command and Scripting Interpreter`). These have been manually restored.

Changes by Content Type
Baselines
REMOVED – `date`

ADDED – `creation_date`, `modification_date`

REMOVED – `type: "Baseline"`

This field was a required literal in the legacy schema but is now redundant given the directory location. It has been removed.

REMOVED – `tags.analytic_story`

Removed per schema discussions with STRT. These values are not surfaced in product or on research.splunk.com. If they must be surfaced in the future, they will be computed at runtime by walking the detections that reference each baseline and combining those detections' `analytic_story` fields.

MIGRATED – `tags.detections`

Detections that use baselines now directly reference the name of the baseline in their detection under the `baselines:` key. The relationship has been inverted: previously baselines listed their detections; now detections list their baselines.

MIGRATED – `deployment`

This field previously contained custom scheduling information. It has been migrated to the `custom_schedule` key.

ADDED (where appropriate) – `schedule`

Because baselines frequently define custom schedules, every baseline must now explicitly declare either a `custom_schedule` or `schedule`. Baselines without custom schedules include `schedule: Default Baseline`.

ADDED (where appropriate) – `custom_schedule`

This field replaces the `deployment` field that could be declared in a baseline. It expresses scheduling information specific to this baseline.

Dashboards
REMOVED – `date`

ADDED – `creation_date`, `modification_date`

Data Sources
REMOVED – `date`

ADDED – `creation_date`, `modification_date`

REMOVED – `status`

Always `"production"` for data sources; removed as redundant.

Deployments
All deployments have been removed. Their scheduling logic has been migrated into `schedules/default_baseline.yml` and `schedules/default_eventbaseddetection.yml`. The behavior expressed in deployments now lives in the `finding:` and `intermediate_findings:` sections of detection files themselves.

Detections
REMOVED – `date`

ADDED – `creation_date`, `modification_date`

ADDED – `baselines`

Any baselines that this detection uses are now listed directly in the detection under the `baselines:` key.

ADDED – `deprecation_info`

Only for detections with `status: deprecated`. Please see the note on `deprecation_info` above. Five detections in `detections/deprecated/` have had this modification made.

ADDED (where appropriate) – `finding`

ADDED (where appropriate) – `intermediate_findings`

MIGRATED (where appropriate) – `rba`

RBA information has been migrated into `finding` and `intermediate_findings` objects where appropriate. See the RBA Migration section below for full details.

ADDED – `category`

Expresses the category of the detection and the directory it must live in. Supported categories: `application`, `cloud`, `endpoint`, `network`, `web`, and `deprecated`. Note: while `category: deprecated` is currently a valid value (assigned automatically from the filesystem path), this allowance will be removed in a future release. At that point, deprecated detections will be required to retain their original category value (e.g., `endpoint`, `cloud`) when moved into `detections/deprecated/`. The `"deprecated"` category value exists only for backward compatibility during this transition.

REQUIRED – `data_source`

Every detection must now include a `data_source` key. An empty list (`data_source: []`) is acceptable for detections that do not depend on a specific data source. Detections missing this key entirely have been flagged with `MANUAL_REVIEW`.

ADDED – `test_type` (within tests)

Each test entry now requires a `test_type` field. The mapping from legacy to new is:

| Legacy | New `test_type` |
| --- | --- |
| `production` | `unit` |
| `deprecated` | `unit` |
| `experimental` | `experimental` (+ auto-generated `description`) |
| `tags.manual_test` present | `experimental` (ported rationale used as `description`) |

Additionally, `attack_data[N].custom_index` has been renamed to `index`.

Investigations
`contentctl-ng` does not process or validate investigation files. Investigations were removed from ESCU in v5.0.0 and are no longer an ES feature. The `removed/investigations/` directory is carried forward unchanged apart from the addition of `creation_date`, `modification_date`, and `deprecation_info`, in line with all other `removed/*` content.

Lookups
REMOVED – `date`

ADDED – `creation_date`, `modification_date`

ADDED – subdirectory structure

CSV lookups are now located under `lookups/csv/` and KVStore lookups under `lookups/kvstore/`. Previously all lookup YMLs lived flat in `lookups/`.

REMOVED – `mlmodel` lookups

Five mlmodel lookup files exist in the repository. These lookups supported MLTK-based detections that were deprecated in v5.25.0 (#3922) and whose corresponding detections were removed in v5.26.0 (#3989). The mlmodel files should have been removed at that time but were not. Removing them now is housekeeping. No active detections reference these files. They are not supported by `contentctl-ng` and have been excluded from the ported output.

FIXED – `default_match` type coercion

Several lookup YMLs had `default_match: false`, which YAML parsers interpret as a boolean. These have been corrected to `default_match: "false"` (string) to match the intended field type.

Macros
ADDED – `id`

ADDED – `version` (set to `1`)

ADDED – `creation_date`

ADDED – `modification_date`

ADDED – `author` (set to `Splunk Threat Research Team`)

REMOVED – `date`

Playbooks
`contentctl-ng` does not validate or build playbook content. The `playbooks/` directory and its associated `.json`/`.py` companion files are not processed by -ng and are preserved unchanged in this PR. Note that legacy `contentctl` did validate playbook YMLs and their companion files; this validation is no longer being performed.

Action requested from STRT: Please confirm whether ongoing validation of playbook content is desired. If yes, a short-term workaround is a dedicated GitHub Actions workflow that runs legacy `contentctl` validation against playbooks only, providing a bridge until native `contentctl-ng` support is scoped and implemented.

Schedules
Schedule objects have been created, maintaining the scheduling information originally contained in Deployment objects.
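As a hedged illustration of the rule stated in the Baselines section (every baseline declares either `custom_schedule` or `schedule`), the check could look roughly like the sketch below. The function name `declared_schedule` is hypothetical, the baseline is assumed to be an already-parsed mapping, and rejecting a baseline that declares both keys is an assumption rather than something this PR states.

```python
def declared_schedule(baseline: dict) -> str:
    """Return which scheduling key a parsed baseline declares.

    Hypothetical helper: only key presence is checked, because the
    internal layout of schedule objects is not described here.
    """
    has_named = "schedule" in baseline
    has_custom = "custom_schedule" in baseline
    if has_named and has_custom:
        # Assumption: the two keys are treated as mutually exclusive.
        raise ValueError("baseline declares both schedule and custom_schedule")
    if not (has_named or has_custom):
        raise ValueError("baseline declares neither schedule nor custom_schedule")
    return "custom_schedule" if has_custom else "schedule"


# A baseline without a custom schedule points at the default schedule by name.
print(declared_schedule({"name": "Example Baseline", "schedule": "Default Baseline"}))
```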
Stories
REMOVED – `date`

ADDED – `creation_date`, `modification_date`

REMOVED – `"Splunk Security Analytics for AWS"` from `product`

This product value is no longer valid in the -ng schema. It has been removed from all story `product` lists where present.

Removed Content
Removed content in `removed/detections/`, `removed/stories/`, `removed/baselines/`, and `removed/investigations/` has intentionally not been modified to conform to new schema requirements. However, these files have been updated in two ways:

- `deprecation_info` is now contained in these files, as noted in the section on `deprecation_info` above.
- The `date:` field has been replaced with `creation_date` and `modification_date`.

RBA Migration
The `rba:` block has been removed from all detections and its content migrated into three new top-level fields: `finding`, `intermediate_findings`, and `threat_objects`. The conversion logic differs by detection type.

TTP
Legacy `rba.risk_objects` are converted to `finding` and (optionally) `intermediate_findings` according to the following rules:

(Table with columns `finding` entity and `intermediate_findings`.)

- `finding.title` ← `rba.message`
- `intermediate_findings.entities[N].message` ← `rba.message` (replicated per entity)
- `threat_objects` ← `rba.threat_objects` (if non-empty)

Anomaly
All `rba.risk_objects` become `intermediate_findings.entities`. No `finding` is created.

- `intermediate_findings.entities[N].message` ← `rba.message`
- `threat_objects` ← `rba.threat_objects` (if non-empty)

Correlation
Legacy Correlation detections had no `rba` section. The new schema requires a `finding` with one entity. Because there is no source data to migrate, all Correlation detections have been flagged with `MANUAL_REVIEW`: a content author must supply the `finding` entity for each one.

Hunting
Legacy Hunting detections had no `risk_objects` and no `threat_objects`. No RBA-derived fields are added.

MANUAL_REVIEW
Approximately 73 YML files require manual review. Each of these files contains a `MANUAL_REVIEW` key in its YML explaining why the file has been flagged. These issues must be resolved and the `MANUAL_REVIEW` key removed for these files to pass validation.

A summary of the most common reasons for `MANUAL_REVIEW`:

- Requires a `finding` with one entity; legacy had no `rba` section, so it must be authored manually
- `finding.title` or `intermediate_findings[N].message` is missing at least one `$fieldName$` token (at least one required)
- `finding.title` or `intermediate_findings[N].message` has mismatched `$` delimiters

For verbose information on these errors, search the branch for the string `MANUAL_REVIEW` or review the `rba_upgrade_tracking.json` file included in this PR.
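The two token-related reasons above can be approximated with a small check like the following. This is a sketch under stated assumptions, not the validator used by contentctl-ng: the function name `check_message`, the regular expression for what counts as a field name, and the sample messages are all hypothetical.

```python
import re

# Assumption: a token is $name$, where name is made of letters, digits,
# underscores, or dots. The real token grammar may differ.
TOKEN = re.compile(r"\$[A-Za-z_][A-Za-z0-9_.]*\$")


def check_message(message: str) -> list[str]:
    """Return the problems found in a finding title or entity message."""
    problems = []
    if message.count("$") % 2 != 0:
        # An odd number of '$' means some delimiter has no partner.
        problems.append("mismatched $ delimiters")
    if not TOKEN.search(message):
        problems.append("missing $fieldName$ token")
    return problems


for sample in (
    "Suspicious process on $dest$ by $user$",
    "Suspicious process on $dest by $user$",
    "Suspicious process detected",
):
    print(sample, "->", check_message(sample) or "ok")
```

Counting delimiters is only a coarse heuristic; it catches an unpaired `$` but cannot tell which token is malformed, so a flagged message still needs to be corrected by hand.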