
Enhance ghidra backend with existing project feature #3087

Open
saniyafatima07 wants to merge 16 commits into mandiant:master from saniyafatima07:ghidra-feature-new


saniyafatima07 (Collaborator) commented May 25, 2026

This PR adds support for analyzing existing Ghidra projects directly using .gpr project input.

Users can now provide input in the format:

capa /path/to/project.gpr

For multi-program projects:

CAPA_GHIDRA_PROGRAM_PATH=/folder/program capa /path/to/project.gpr

Motivation & Context

Currently, the Ghidra backend always creates a temporary project and re-imports the binary. This:

  • increases analysis time
  • ignores previously analyzed projects and annotations
  • duplicates existing analysis work

This change enables reuse of existing analyzed Ghidra projects while keeping the implementation localized to the Ghidra backend with minimal architecture changes.

Implementation Details

  • Added automatic .gpr detection to select the Ghidra backend when a Ghidra project file is provided as input.
  • Added recursive Ghidra project file enumeration using domain_file.getPathname() to discover programs within the project.
  • Added automatic program selection for single-program projects.
  • Added CAPA_GHIDRA_PROGRAM_PATH support for selecting the target program in multi-program projects.
  • Added informative error handling that lists available project program paths when disambiguation is required.
  • Updated Ghidra loader flow to:
    • open existing projects using create=False
    • reuse already analyzed programs via consume_program
    • skip temporary project creation/import flow for .gpr input
  • Default behavior remains unchanged for non-.gpr inputs.
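The detection, enumeration, and selection steps above can be sketched in plain Python. This is an illustration under assumptions, not the PR's code: the helper names `is_ghidra_project`, `iter_program_paths`, and `select_program_path` are hypothetical, and `iter_program_paths` duck-types Ghidra's `DomainFolder` (`getFiles()`, `getFolders()`) and `DomainFile.getPathname()` so the sketch runs without a JVM. Only the `CAPA_GHIDRA_PROGRAM_PATH` variable name comes from the description.

```python
import os
from pathlib import Path
from typing import Iterator, List, Mapping

# Environment variable named in the PR description.
PROGRAM_PATH_ENV = "CAPA_GHIDRA_PROGRAM_PATH"


def is_ghidra_project(path: Path) -> bool:
    """Hypothetical check: treat any input ending in .gpr as a Ghidra project."""
    return path.suffix.lower() == ".gpr"


def iter_program_paths(folder) -> Iterator[str]:
    """Recursively yield project paths such as '/folder/program'.

    `folder` is duck-typed after Ghidra's DomainFolder (getFiles(), getFolders());
    each file exposes getPathname() like DomainFile. Real code may also want to
    skip project files that are not programs.
    """
    for domain_file in folder.getFiles():
        yield domain_file.getPathname()
    for subfolder in folder.getFolders():
        yield from iter_program_paths(subfolder)


def select_program_path(paths: List[str], env: Mapping[str, str] = os.environ) -> str:
    """Pick the program to analyze, listing candidates when the choice is ambiguous."""
    available = ", ".join(sorted(paths))
    requested = env.get(PROGRAM_PATH_ENV)
    if requested:
        if requested not in paths:
            raise ValueError(f"{requested!r} not found in project; available programs: {available}")
        return requested
    if not paths:
        raise ValueError("no programs found in Ghidra project")
    if len(paths) == 1:
        return paths[0]  # single-program project: no disambiguation needed
    raise ValueError(f"project has {len(paths)} programs; set {PROGRAM_PATH_ENV} to one of: {available}")
```

For a project holding `/folder/program` and `/other`, calling `select_program_path` without the variable set raises an error that lists both paths, which is the disambiguation behavior the description asks for.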

Tests

Added tests for:

  • automatic Ghidra backend selection for .gpr input
  • skipping generic file extractor probing for Ghidra project input

Closes #3004

Checklist

  • CHANGELOG updated
  • Added a few tests
  • Documentation updated
  • This submission includes AI-generated code and I have provided details in the description.

Parts of this implementation were developed with the help of AI tools (GitHub Copilot, ChatGPT).
AI was used for:

  • refining implementation approach
  • improving edge case handling

All code was reviewed, modified, and tested manually before submission.

github-actions (bot) left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

gemini-code-assist (bot, Contributor) left a comment
Code Review

This pull request introduces support for Ghidra projects (.gpr files) by adding utilities to navigate Ghidra project structures and updating the loader to handle project-based analysis. The changes include automatic backend detection for Ghidra projects and updated CLI logic to skip generic file extraction when a project is provided. Review feedback highlights the need for better terminal formatting in error messages, improved resource management to prevent program leaks during exceptions, and the enablement of function filters for the Ghidra backend.

Comment thread capa/ghidra/helpers.py
Comment thread capa/ghidra/helpers.py
Comment thread capa/loader.py
Comment thread capa/loader.py
Comment thread capa/main.py Outdated
Comment on lines 928 to 931
    if backend == BACKEND_GHIDRA:
        return {}

    if input_format in STATIC_FORMATS:

Contributor

medium

Disabling extractor filters for the Ghidra backend prevents users from using --restrict-to-functions. Since Ghidra is a static analysis backend, it should support these filters. Merging this with the static format check allows filters to be applied correctly for Ghidra regardless of the input format detection.

Suggested change
-    if backend == BACKEND_GHIDRA:
-        return {}
-    if input_format in STATIC_FORMATS:
+    if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA:

Collaborator

@saniyafatima07 thoughts here? I'd expect the Ghidra backend to handle function restrictions set by the user. Is there a reason we're skipping them?

Collaborator Author

I agree, Mike. Since Ghidra is a static-analysis backend, it should support function restrictions too. I had a slight misinterpretation while implementing this logic. I’ll correct it.

Collaborator

@saniyafatima07 this still needs to be addressed.

github-actions (bot) dismissed their stale review May 25, 2026 13:10

CHANGELOG updated or no update needed, thanks! 😄

saniyafatima07 (Collaborator Author)

@mike-hunhoff
I have implemented this feature following the new approach discussed in #3066.
Could you please review it?
Thank you for your time!

saniyafatima07 marked this pull request as ready for review May 25, 2026 13:42

mike-hunhoff (Collaborator) left a comment

Great work @saniyafatima07! I left comments for your review. In the meantime, I'll think more about how best to handle the .gpr tests.

Comment thread capa/ghidra/helpers.py

Collaborator

Great work splitting up the code into helper functions to keep things concise.

Comment thread capa/loader.py
Comment thread capa/loader.py
Comment thread capa/main.py Outdated

Comment thread doc/usage.md Outdated
Comment thread CHANGELOG.md Outdated
Comment thread capa/ghidra/helpers.py Outdated
mike-hunhoff requested a review from a team May 28, 2026 16:27
saniyafatima07 (Collaborator Author)

> Great work @saniyafatima07! I left comments for your review. I'll do some more thinking on how to best handle the .gpr tests in the meantime.

Thank you for the review, Mike.
I will address all the comments.
Sure.

saniyafatima07 (Collaborator Author)

@mike-hunhoff @larchchen @Maijin I have made all the requested changes.
Could you please review it?

mike-hunhoff (Collaborator) left a comment

Thank you @saniyafatima07 ! I've left comments for your review.

Comment thread capa/main.py Outdated
Comment on lines +204 to +211
    exctype_str = str(exctype)
    # Give a targeted message when the Ghidra project DB is locked.
    if "LockException" in exctype_str or "ghidra.framework.store.LockException" in exctype_str:
        print(
            f"Unexpected exception raised: {exctype}.\n It looks like the Ghidra project database is locked. "
            "Please close the project in the Ghidra GUI (or other process) and try again. For details, run in debug mode (-d/--debug).",
            file=sys.stderr,
        )

mike-hunhoff (Collaborator) commented Jun 4, 2026

This is very specific to Ghidra, so please move it closer to the Ghidra-specific code that opens an existing project. Let's add a new exception type under capa.exceptions called LockedProjectDatabaseError that handles this case gracefully and propagates the message accordingly. We should also consider trimming down the message, e.g. "Ghidra project database is locked. Ensure all programs accessing <database_name>.gpr are closed before proceeding."

Collaborator

Also, please add a new return value for this case:

capa/capa/main.py

Lines 116 to 132 in 33701d6

E_MISSING_RULES = 10
E_MISSING_FILE = 11
E_INVALID_RULE = 12
E_CORRUPT_FILE = 13
E_FILE_LIMITATION = 14
E_INVALID_SIG = 15
E_INVALID_FILE_TYPE = 16
E_INVALID_FILE_ARCH = 17
E_INVALID_FILE_OS = 18
E_UNSUPPORTED_IDA_VERSION = 19
E_UNSUPPORTED_GHIDRA_VERSION = 20
E_MISSING_CAPE_STATIC_ANALYSIS = 21
E_MISSING_CAPE_DYNAMIC_ANALYSIS = 22
E_EMPTY_REPORT = 23
E_UNSUPPORTED_GHIDRA_EXECUTION_MODE = 24
E_INVALID_INPUT_FORMAT = 25
E_INVALID_FEATURE_EXTRACTOR = 26
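One way the reviewer's two requests could fit together: translate Ghidra's lock failure into the new exception right where the project is opened, then map that exception to a dedicated exit code in main. This is a sketch under assumptions: the exit-code name and value (`E_LOCKED_PROJECT_DATABASE = 27`, simply the next number after the list above) and the helpers `open_project` and `main` are hypothetical, and the lock check reuses the string test on the exception type from the snippet under review rather than importing Ghidra's Java class.

```python
import sys

# Hypothetical exit code: the next value after E_INVALID_FEATURE_EXTRACTOR = 26.
E_LOCKED_PROJECT_DATABASE = 27


class LockedProjectDatabaseError(Exception):
    """Sketch of the exception the reviewer proposes adding to capa.exceptions."""

    def __init__(self, project_name: str):
        super().__init__(
            "Ghidra project database is locked. "
            f"Ensure all programs accessing {project_name}.gpr are closed before proceeding."
        )


def is_lock_exception(exc: BaseException) -> bool:
    # Same idea as the string test under review: match on the exception type's
    # text, which covers ghidra.framework.store.LockException without importing it.
    return "LockException" in str(type(exc))


def open_project(project_name: str, opener):
    """Call `opener` (a stand-in for the real open-existing-project code)."""
    try:
        return opener()
    except Exception as e:
        if is_lock_exception(e):
            raise LockedProjectDatabaseError(project_name) from e
        raise


def main(project_name: str, opener) -> int:
    try:
        open_project(project_name, opener)
    except LockedProjectDatabaseError as e:
        print(e, file=sys.stderr)  # short, user-facing message; no traceback
        return E_LOCKED_PROJECT_DATABASE
    return 0
```

Chaining with `from e` keeps the original Java exception available as `__cause__` for debug output while the default path prints only the trimmed message.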

Collaborator

@saniyafatima07 this code can be removed now?

Comment thread capa/main.py Outdated

saniyafatima07 (Collaborator Author)

Thank you for the review, @mike-hunhoff. I have addressed all the requested changes.

mike-hunhoff (Collaborator) left a comment

Thanks @saniyafatima07, I've left comments for your review!

Comment thread capa/loader.py
Comment on lines 510 to 516
     except Exception:
         if program is not None:
             program.release(consumer)
         project_cm.__exit__(None, None, None)
-        tmpdir.cleanup()
+        if tmpdir:
+            tmpdir.cleanup()
         raise

Collaborator

This cleanup is critical to keeping an error or exception from leaving the database locked, so let's make it a bit more robust by isolating each step in its own nested try/except:

            except Exception:
                if program is not None:
                    try:
                        program.release(consumer)
                    except Exception:
                        logger.warning("failed to release program handle", exc_info=True)
                try:
                    project_cm.__exit__(None, None, None)
                except Exception:
                    logger.warning("failed to close Ghidra project", exc_info=True)
                if tmpdir:
                    try:
                        tmpdir.cleanup()
                    except Exception:
                        pass
                raise

Collaborator Author

One small thing to mention: I replaced the try/except/pass around tmpdir.cleanup() with contextlib.suppress(Exception) to satisfy Ruff's SIM105 rule.
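Pulled out into a standalone function for illustration, the suggested cleanup with the `contextlib.suppress` change might look like this. `cleanup_after_failure` is a hypothetical name; in the PR the logic stays inline in the `except` block of capa/loader.py, and `program`, `consumer`, `project_cm`, and `tmpdir` are whatever objects that block already holds.

```python
import contextlib
import logging

logger = logging.getLogger(__name__)


def cleanup_after_failure(program, consumer, project_cm, tmpdir) -> None:
    """Release resources in order, isolating each step so one failure cannot
    skip the rest (and leave the Ghidra project database locked)."""
    if program is not None:
        try:
            program.release(consumer)
        except Exception:
            logger.warning("failed to release program handle", exc_info=True)
    try:
        project_cm.__exit__(None, None, None)
    except Exception:
        logger.warning("failed to close Ghidra project", exc_info=True)
    if tmpdir:  # None for .gpr input, where no temporary project is created
        # equivalent to try/except/pass, written this way to satisfy Ruff SIM105
        with contextlib.suppress(Exception):
            tmpdir.cleanup()
```

A caller would invoke it from `except Exception:` and then use a bare `raise`, so the original exception still propagates after every cleanup step has been attempted.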

Comment thread capa/main.py Outdated

Comment thread capa/main.py Outdated
Comment on lines +939 to +944
 def get_extractor_filters_from_cli(args, input_format, backend: Optional[str] = None) -> FilterConfig:
     if not hasattr(args, "restrict_to_processes") and not hasattr(args, "restrict_to_functions"):
         # no processes or function filters were installed in the args
         return {}
 
-    if input_format in STATIC_FORMATS:
+    if input_format in STATIC_FORMATS or backend == BACKEND_GHIDRA:

Collaborator

To avoid adding Ghidra-specific checks in this function, can we simply add the given FORMAT_ to the list of STATIC_FORMATS, similar to how FORMAT_BINJA_DB is handled for Binary Ninja? If so, let's remove the Ghidra-specific check and revert the added backend argument.
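A simplified sketch of the reviewer's suggestion: once the Ghidra project input format is a member of STATIC_FORMATS, the existing static branch applies `--restrict-to-functions` and no backend argument is needed. Everything below except the function name and its first guard is a stand-in: the format identifiers and their values, `FORMAT_GHIDRA_DB` (the review leaves the real constant name unspecified), the dict-based `FilterConfig`, and the assumption that `restrict_to_functions` arrives as a list of integer addresses.

```python
from argparse import Namespace
from typing import Dict, Set

# Stand-in format identifiers for illustration; capa's real FORMAT_* constants differ.
FORMAT_PE = "pe"
FORMAT_BINJA_DB = "binja_db"
FORMAT_GHIDRA_DB = "ghidra_db"  # hypothetical name for the .gpr input format
FORMAT_CAPE = "cape"

STATIC_FORMATS = {FORMAT_PE, FORMAT_BINJA_DB, FORMAT_GHIDRA_DB}
DYNAMIC_FORMATS = {FORMAT_CAPE}

FilterConfig = Dict[str, Set[int]]  # simplified: filter kind -> addresses or PIDs


def get_extractor_filters_from_cli(args: Namespace, input_format: str) -> FilterConfig:
    if not hasattr(args, "restrict_to_processes") and not hasattr(args, "restrict_to_functions"):
        # no process or function filters were installed in the args
        return {}
    if input_format in STATIC_FORMATS:
        # no backend-specific branch: a Ghidra project is just another static format
        functions = getattr(args, "restrict_to_functions", None)
        return {"functions": set(functions)} if functions else {}
    if input_format in DYNAMIC_FORMATS:
        processes = getattr(args, "restrict_to_processes", None)
        return {"processes": set(processes)} if processes else {}
    raise ValueError(f"unexpected input format: {input_format}")
```

The design benefit is that the function keeps depending only on the input format, so supporting another disassembler database later means extending a set rather than adding another backend check.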

saniyafatima07 (Collaborator Author)

Done, @mike-hunhoff.
Could you please review?


Development

Successfully merging this pull request may close these issues.

ghidra: enable feature extraction from existing Ghidra project binary

2 participants