tests: migrate to data-driven fixtures for feature presence#2986
tests: migrate to data-driven fixtures for feature presence#2986williballenthin wants to merge 2 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
tests/fixtures.py
Outdated
| if key == "basic block": | ||
| return capa.features.basicblock.BasicBlock() | ||
|
|
||
| if key.startswith("operand[") and key.endswith("].number"): | ||
| index = int(key[len("operand[") : -len("].number")]) | ||
| return capa.features.insn.OperandNumber(index, int(raw_value, 0)) | ||
|
|
||
| if key.startswith("operand[") and key.endswith("].offset"): | ||
| index = int(key[len("operand[") : -len("].offset")]) | ||
| return capa.features.insn.OperandOffset(index, int(raw_value, 0)) | ||
|
|
||
| if key == "regex": | ||
| return capa.features.common.Regex(raw_value) | ||
|
|
||
| if key in ("number", "offset"): | ||
| value = int(raw_value, 0) | ||
| elif key == "bytes": | ||
| value = capa.rules.parse_bytes(raw_value) | ||
| else: | ||
| value = raw_value | ||
|
|
||
| FeatureClass = capa.rules.parse_feature(key) |
There was a problem hiding this comment.
i wonder if we can/should merge this functionality into capa.rules.parse_feature
There was a problem hiding this comment.
Code Review
This pull request migrates test feature definitions from hardcoded Python lists to a centralized fixtures.json file, enabling a more data-driven testing framework. New utility functions, parse_feature_string and _load_feature_tests, were added to handle the dynamic loading and parsing of these tests. Review feedback highlights a bug in regex pattern initialization, the omission of test marks (skips/xfails) from the loading logic, and opportunities to improve performance and maintainability by sorting all test lists and consolidating file path definitions to avoid duplication.
|
|
||
| for entry in data["features"]: | ||
| feature = parse_feature_string(entry["feature"]) | ||
| test_tuple = (entry["file"], entry["location"], feature, entry["expected"]) |
There was a problem hiding this comment.
| presence_tests.sort(key=lambda t: (t[0], t[1])) | ||
| return presence_tests, symtab_tests |
There was a problem hiding this comment.
Sorting the test cases by file and location helps improve performance by making the LRU cache for extractors more effective. This should be applied to symtab_tests as well for consistency.
| presence_tests.sort(key=lambda t: (t[0], t[1])) | |
| return presence_tests, symtab_tests | |
| presence_tests.sort(key=lambda t: (t[0], t[1])) | |
| symtab_tests.sort(key=lambda t: (t[0], t[1])) | |
| return presence_tests, symtab_tests |
| elif name.startswith("2d3edc"): | ||
| return CD / "data" / "2d3edc218a90f03089cc01715a9f047f.exe_" |
There was a problem hiding this comment.
The file path for 2d3edc is now defined in both fixtures.json (lines 64-67) and hardcoded here. To maintain a single source of truth and fully leverage the data-driven approach introduced in this PR, consider refactoring get_data_path_by_name to load paths from the files section of fixtures.json instead of continuing to add hardcoded entries.
71b4687 to
dc455d1
Compare
3c61284 to
463ff59
Compare
283ff40 to
6e4bb6b
Compare
6e4bb6b to
f8d4c8b
Compare
closes #2743
Checklist