Skip to content

tests: migrate to data-driven fixtures for feature presence#2986

Draft
williballenthin wants to merge 2 commits intomasterfrom
data-fixtures-2
Draft

tests: migrate to data-driven fixtures for feature presence#2986
williballenthin wants to merge 2 commits intomasterfrom
data-fixtures-2

Conversation

@williballenthin
Copy link
Copy Markdown
Collaborator

@williballenthin williballenthin commented Apr 2, 2026

closes #2743

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed
  • This submission includes AI-generated code and I have provided details in the description.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

Comment on lines +67 to +88
if key == "basic block":
return capa.features.basicblock.BasicBlock()

if key.startswith("operand[") and key.endswith("].number"):
index = int(key[len("operand[") : -len("].number")])
return capa.features.insn.OperandNumber(index, int(raw_value, 0))

if key.startswith("operand[") and key.endswith("].offset"):
index = int(key[len("operand[") : -len("].offset")])
return capa.features.insn.OperandOffset(index, int(raw_value, 0))

if key == "regex":
return capa.features.common.Regex(raw_value)

if key in ("number", "offset"):
value = int(raw_value, 0)
elif key == "bytes":
value = capa.rules.parse_bytes(raw_value)
else:
value = raw_value

FeatureClass = capa.rules.parse_feature(key)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we can/should merge this functionality into capa.rules.parse_feature

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates test feature definitions from hardcoded Python lists to a centralized fixtures.json file, enabling a more data-driven testing framework. New utility functions, parse_feature_string and _load_feature_tests, were added to handle the dynamic loading and parsing of these tests. Review feedback highlights a bug in regex pattern initialization, the omission of test marks (skips/xfails) from the loading logic, and opportunities to improve performance and maintainability by sorting all test lists and consolidating file path definitions to avoid duplication.


for entry in data["features"]:
feature = parse_feature_string(entry["feature"])
test_tuple = (entry["file"], entry["location"], feature, entry["expected"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The marks field defined in fixtures.json (e.g., lines 95-106) is currently ignored by the loading logic. These marks are intended to skip or xfail tests for specific backends (like idalib or freeze). Ignoring them may lead to unexpected test failures when running against those backends.

Comment on lines +107 to +108
presence_tests.sort(key=lambda t: (t[0], t[1]))
return presence_tests, symtab_tests
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Sorting the test cases by file and location helps improve performance by making the LRU cache for extractors more effective. This should be applied to symtab_tests as well for consistency.

Suggested change
presence_tests.sort(key=lambda t: (t[0], t[1]))
return presence_tests, symtab_tests
presence_tests.sort(key=lambda t: (t[0], t[1]))
symtab_tests.sort(key=lambda t: (t[0], t[1]))
return presence_tests, symtab_tests

Comment on lines +550 to +551
elif name.startswith("2d3edc"):
return CD / "data" / "2d3edc218a90f03089cc01715a9f047f.exe_"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The file path for 2d3edc is now defined in both fixtures.json (lines 64-67) and hardcoded here. To maintain a single source of truth and fully leverage the data-driven approach introduced in this PR, consider refactoring get_data_path_by_name to load paths from the files section of fixtures.json instead of continuing to add hardcoded entries.

@williballenthin williballenthin force-pushed the data-fixtures-2 branch 2 times, most recently from 3c61284 to 463ff59 Compare April 13, 2026 14:48
@williballenthin williballenthin force-pushed the data-fixtures-2 branch 2 times, most recently from 283ff40 to 6e4bb6b Compare April 14, 2026 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

migrate feature tests fixtures to a JSON representation

1 participant