Fix inverted path traversal logic and enhance torch.load security#1479
Fix inverted path traversal logic and enhance torch.load security#1479RinZ27 wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
Greptile SummaryThis PR fixes two security issues in Key changes:
Important Files Changed
Last reviewed commit: 4f75549 |
| or not os.path.realpath(os.path.join(local_path, member.name)).startswith( | ||
| resolved_local_path | ||
| ) |
There was a problem hiding this comment.
startswith check missing trailing separator
The path containment check uses startswith(resolved_local_path) without a trailing separator. If the temp directory is /tmp/tmpABCDEF, a path like /tmp/tmpABCDEFmalicious/payload would incorrectly pass this check since it starts with the same prefix.
While the earlier ".." in member.name and os.path.isabs(member.name) guards prevent exploitation in the common attack vectors, this third check on its own is subtly incomplete. The canonical fix is:
| or not os.path.realpath(os.path.join(local_path, member.name)).startswith( | |
| resolved_local_path | |
| ) | |
| or not os.path.realpath(os.path.join(local_path, member.name)).startswith( | |
| resolved_local_path + os.sep | |
| ) | |
| and os.path.realpath(os.path.join(local_path, member.name)) != resolved_local_path |
A cleaner approach would be:
resolved = os.path.realpath(os.path.join(local_path, member.name))
outside_dir = not (resolved == resolved_local_path or resolved.startswith(resolved_local_path + os.sep))so the condition covers both the exact-directory case and properly-bounded subdirectory paths.
physicsnemo/core/module.py
Outdated
| ): | ||
| yield member | ||
| else: | ||
| print(f"Skipping potentially malicious file: {member.name}") |
There was a problem hiding this comment.
Use logging.warning instead of print
Security-relevant events like skipping a potentially malicious file should go through the standard logging system rather than print(). This ensures warnings are surfaced consistently (respecting log levels, handlers, and formatters) and are not silenced or missed in non-interactive environments.
Since _safe_members is a @staticmethod, consider passing a logger or using the module-level logger:
| print(f"Skipping potentially malicious file: {member.name}") | |
| logging.warning(f"Skipping potentially malicious file: {member.name}") |
(Make sure logging is already imported at the top of the file — it is used elsewhere in module.py.)
| model_dict = torch.load( | ||
| io.BytesIO(model_bytes), map_location=device, weights_only=True | ||
| ) |
There was a problem hiding this comment.
weights_only=True may break backward-compatible checkpoint loading
weights_only=True is a great security improvement for new checkpoints, but be aware it restricts deserialization to basic types (tensors, dicts, lists, scalars, etc.). If any existing checkpoint's model.pt was saved with custom Python objects embedded in the state dict (e.g., from a third-party integration or an older PhysicsNeMo version), loading will raise a pickle.UnpicklingError or RuntimeError.
Looking at the save code (torch.save(self.state_dict(), ...)), standard state dicts only contain tensors and should be fine. However, if subclasses override state_dict() to include custom objects, this will silently break for those users. Consider documenting this behavioral change in the changelog and/or providing a fallback mechanism for users with legacy checkpoints that contain non-standard types.
e311a68 to
4f75549
Compare
PhysicsNeMo Pull Request
Description
A critical logic error in
_safe_memberswas identified where the path traversal protection was inverted, allowing unsafe members while blocking legitimate ones. This fix correctly yields only safe members during tarball extraction.Security of model loading is also improved by enabling
weights_only=Truein all coretorch.loadcalls. This mitigates the risk of arbitrary code execution from malicious pickles in model checkpoints.Checklist
Dependencies
None.