Enhance ghidra backend with existing project feature#3087
Enhance ghidra backend with existing project feature#3087saniyafatima07 wants to merge 16 commits into
Conversation
There was a problem hiding this comment.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
There was a problem hiding this comment.
Code Review
This pull request introduces support for Ghidra projects (.gpr files) by adding utilities to navigate Ghidra project structures and updating the loader to handle project-based analysis. The changes include automatic backend detection for Ghidra projects and updated CLI logic to skip generic file extraction when a project is provided. Review feedback highlights the need for better terminal formatting in error messages, improved resource management to prevent program leaks during exceptions, and the enablement of function filters for the Ghidra backend.
| if backend == BACKEND_GHIDRA: | ||
| return {} | ||
|
|
||
| if input_format in STATIC_FORMATS: |
There was a problem hiding this comment.
Disabling extractor filters for the Ghidra backend prevents users from using --restrict-to-functions. Since Ghidra is a static analysis backend, it should support these filters. Merging this with the static format check allows filters to be applied correctly for Ghidra regardless of the input format detection.
| if backend == BACKEND_GHIDRA: | |
| return {} | |
| if input_format in STATIC_FORMATS: | |
| if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA: |
There was a problem hiding this comment.
@saniyafatima07 thoughts here? I'd expect the Ghidra backend to handle function restrictions set by the user. Is there a reason we're skipping them?
There was a problem hiding this comment.
I agree, Mike. Since Ghidra is a static-analysis backend, it should support function restrictions too. I had a slight misinterpretation while implementing this logic. I’ll correct it.
There was a problem hiding this comment.
@saniyafatima07 this still needs to be addressed.
CHANGELOG updated or no update needed, thanks! 😄
|
@mike-hunhoff |
mike-hunhoff
left a comment
There was a problem hiding this comment.
Great work @saniyafatima07 ! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.
There was a problem hiding this comment.
Great work splitting up the code into helper functions to keep things concise.
| if backend == BACKEND_GHIDRA: | ||
| return {} | ||
|
|
||
| if input_format in STATIC_FORMATS: |
There was a problem hiding this comment.
@saniyafatima07 thoughts here? I'd expect the Ghidra backend to handle function restrictions set by the user. Is there a reason we're skipping them?
Thank you for the review Mike. |
|
@mike-hunhoff @larchchen @Maijin I have made all the requested changes. |
mike-hunhoff
left a comment
There was a problem hiding this comment.
Thank you @saniyafatima07 ! I've left comments for your review.
| exctype_str = str(exctype) | ||
| # Give a targeted message when the Ghidra project DB is locked. | ||
| if "LockException" in exctype_str or "ghidra.framework.store.LockException" in exctype_str: | ||
| print( | ||
| f"Unexpected exception raised: {exctype}.\n It looks like the Ghidra project database is locked. " | ||
| "Please close the project in the Ghidra GUI (or other process) and try again. For details, run in debug mode (-d/--debug).", | ||
| file=sys.stderr, | ||
| ) |
There was a problem hiding this comment.
This is very specific to Ghidra, so please move it closer to the Ghidra-specific code related to opening an existing project. Let's add a new exception type under capa.exceptions called LockedProjectDatabaseError that gracefully handles this case, and propagates the message accordingly. We should also consider trimming down the message, e.g. "Ghidra project database is locked. Ensure all programs accessing <database_name>.gpr are closed before proceeding."
There was a problem hiding this comment.
Also, please add a new return value for this case:
Lines 116 to 132 in 33701d6
| if backend == BACKEND_GHIDRA: | ||
| return {} | ||
|
|
||
| if input_format in STATIC_FORMATS: |
There was a problem hiding this comment.
@saniyafatima07 this still needs to be addressed.
|
Thank you for the review @mike-hunhoff . I have addressed all the requested changes. |
mike-hunhoff
left a comment
There was a problem hiding this comment.
Thanks @saniyafatima07 , I've left comments for your review!
| except Exception: | ||
| if program is not None: | ||
| program.release(consumer) | ||
| project_cm.__exit__(None, None, None) | ||
| tmpdir.cleanup() | ||
| if tmpdir: | ||
| tmpdir.cleanup() | ||
| raise |
There was a problem hiding this comment.
This is critical to preventing an error or exception from holding the database lock, so let's bump the robustness just a bit by isolating each step in a nested try/except:
except Exception:
if program is not None:
try:
program.release(consumer)
except Exception:
logger.warning("failed to release program handle", exc_info=True)
try:
project_cm.__exit__(None, None, None)
except Exception:
logger.warning("failed to close Ghidra project", exc_info=True)
if tmpdir:
try:
tmpdir.cleanup()
except Exception:
pass
raise
There was a problem hiding this comment.
One small thing to mention, I replaced the try/except/pass(tmpdir) with contextlib.suppress(Exception) to satisfy Ruff's SIM105 rule.
| exctype_str = str(exctype) | ||
| # Give a targeted message when the Ghidra project DB is locked. | ||
| if "LockException" in exctype_str or "ghidra.framework.store.LockException" in exctype_str: | ||
| print( | ||
| f"Unexpected exception raised: {exctype}.\n It looks like the Ghidra project database is locked. " | ||
| "Please close the project in the Ghidra GUI (or other process) and try again. For details, run in debug mode (-d/--debug).", | ||
| file=sys.stderr, | ||
| ) |
| def get_extractor_filters_from_cli(args, input_format, backend: Optional[str] = None) -> FilterConfig: | ||
| if not hasattr(args, "restrict_to_processes") and not hasattr(args, "restrict_to_functions"): | ||
| # no processes or function filters were installed in the args | ||
| return {} | ||
|
|
||
| if input_format in STATIC_FORMATS: | ||
| if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA: |
There was a problem hiding this comment.
To avoid adding Ghidra specific checks in this function, can we simply add the given FORMAT_ to the list of STATIC_FORMATS, similar to how FORMAT_BINJA_DB is handled for Binary Ninja? If so, let's remove the Ghidra specific check and revert the backend method argument.
|
Done @mike-hunhoff . |
This PR adds support for analyzing existing Ghidra projects directly using
.gprproject input.Users can now provide input in the format:
For multi-program projects:
Motivation & Context
Currently, the Ghidra backend always creates a temporary project and re-imports the binary. This:
This change enables reuse of existing analyzed Ghidra projects while keeping the implementation localized to the Ghidra backend with minimal architecture changes.
Implementation Details
.gprdetection to select the Ghidra backend when a Ghidra project file is provided as input.domain_file.getPathname()to discover programs within the project.CAPA_GHIDRA_PROGRAM_PATHsupport for selecting the target program in multi-program projects.create=Falseconsume_program.gprinput.gprinputs.Tests
Added tests for:
.gprinputCloses #3004
Checklist
Parts of this implementation were assisted using AI tools (Github Copilot, ChatGPT).
AI was used for:
All code was reviewed, modified and tested manually before submission.