Conversation
Agent-Logs-Url: https://github.com/matmair/InvenTree/sessions/913a447a-5efa-4fa3-b8b1-6af5feaa24f0 Co-authored-by: matmair <66015116+matmair@users.noreply.github.com>
📝 WalkthroughWalkthroughMigrates HTML/SVG sanitization from bleach to nh3 across core helpers and sanitizer modules, updates related docstrings and tests, and adds/adjusts pinned dependencies to include Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/InvenTree/InvenTree/sanitizer.py (1)
247-278:⚠️ Potential issue | 🟠 MajorThe
stripparameter is misleading and non-functional for its documented purpose.The
stripparameter only controlsstrip_comments, not whether disallowed SVG tags are stripped.nh3.clean()always strips disallowed tags regardless of this parameter (unlike bleach, which offers an escape option withstrip=False). Callers expectingstrip=Falseto preserve disallowed tags will silently get them removed. Consider adding a guard to raise an error whenstrip=Falseis passed, or remove the parameter entirely if no code relies on it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/InvenTree/InvenTree/sanitizer.py` around lines 247 - 278, The sanitize_svg function's strip parameter is misleading because it only maps to nh3.clean's strip_comments and does not control stripping of disallowed tags; update sanitize_svg to either (A) remove the strip parameter from the signature and callers (and always call nh3.clean without pretending to support non-stripping), or (B) keep the parameter but validate it and raise a clear ValueError if strip is False, explaining that nh3 always removes disallowed tags; in either case adjust the function signature and any callers of sanitize_svg and ensure the call to nh3.clean (tags=set(elements), attributes=attrs_dict, filter_style_properties=_SVG_ALLOWED_CSS_PROPERTIES, strip_comments=strip, ...) is consistent with the chosen behavior.
🧹 Nitpick comments (1)
src/backend/InvenTree/InvenTree/tests.py (1)
1606-1612: Avoid asserting the exact serialized SVG.
nh3normalizesstyledeclarations whenfilter_style_propertiesis enabled, so this test is now coupled to serializer formatting rather than sanitizer behavior. Parse the SVG and assert the retained nodes/attributes plus absence of<script>instead. (nh3.readthedocs.io)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/InvenTree/InvenTree/tests.py` around lines 1606 - 1612, The test currently asserts exact serialized output of valid_string after calling sanitize_svg which couples the test to nh3's serializer; update the test that uses sanitize_svg, valid_string and dangerous_string to instead parse the returned SVG (e.g., with ElementTree or lxml) and assert that the expected elements and attributes (such as the outer svg element, the path element with id="path1" and its d attribute and fill style preserved) remain present and that no <script> node exists; keep the dangerous_string input and assert sanitize_svg(dangerous_string) does not contain any script elements or event-handler attributes rather than asserting exact string equality.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/InvenTree/InvenTree/helpers.py`:
- Around line 977-1030: Remove the dead CSS allowlist: delete the _default_css
frozenset definition and the assignment to whitelist_styles (which reads
markdownify_settings.get('WHITELIST_STYLES', _default_css)); instead, stop
referencing _default_css anywhere in this module and ensure markdownify_settings
is only used for supported keys (e.g., whitelist_tags and whitelist_attrs remain
unchanged). If you need to enforce style stripping, implement it separately
(e.g., strip style attributes where HTML is sanitized) rather than keeping the
unused _default_css/whitelist_styles symbols.
In `@src/backend/InvenTree/InvenTree/mixins.py`:
- Around line 104-105: Update the clean_data() docstring to correct nh3
behavior: state that nh3.clean() strips (removes) disallowed tags rather than
encoding them, and clarify that the clean_content_tags option controls whether
the contents of disallowed tags are also removed; reference nh3.clean() and the
clean_content_tags option in the docstring so future readers understand the
precise sanitizer behavior.
---
Outside diff comments:
In `@src/backend/InvenTree/InvenTree/sanitizer.py`:
- Around line 247-278: The sanitize_svg function's strip parameter is misleading
because it only maps to nh3.clean's strip_comments and does not control
stripping of disallowed tags; update sanitize_svg to either (A) remove the strip
parameter from the signature and callers (and always call nh3.clean without
pretending to support non-stripping), or (B) keep the parameter but validate it
and raise a clear ValueError if strip is False, explaining that nh3 always
removes disallowed tags; in either case adjust the function signature and any
callers of sanitize_svg and ensure the call to nh3.clean (tags=set(elements),
attributes=attrs_dict, filter_style_properties=_SVG_ALLOWED_CSS_PROPERTIES,
strip_comments=strip, ...) is consistent with the chosen behavior.
---
Nitpick comments:
In `@src/backend/InvenTree/InvenTree/tests.py`:
- Around line 1606-1612: The test currently asserts exact serialized output of
valid_string after calling sanitize_svg which couples the test to nh3's
serializer; update the test that uses sanitize_svg, valid_string and
dangerous_string to instead parse the returned SVG (e.g., with ElementTree or
lxml) and assert that the expected elements and attributes (such as the outer
svg element, the path element with id="path1" and its d attribute and fill style
preserved) remain present and that no <script> node exists; keep the
dangerous_string input and assert sanitize_svg(dangerous_string) does not
contain any script elements or event-handler attributes rather than asserting
exact string equality.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 56b9fc07-f350-409b-adea-b9255d4ed593
📒 Files selected for processing (8)
src/backend/InvenTree/InvenTree/helpers.pysrc/backend/InvenTree/InvenTree/mixins.pysrc/backend/InvenTree/InvenTree/sanitizer.pysrc/backend/InvenTree/InvenTree/tests.pysrc/backend/InvenTree/report/templatetags/barcode.pysrc/backend/requirements-3.14.txtsrc/backend/requirements.insrc/backend/requirements.txt
| _default_css = frozenset([ | ||
| 'azimuth', | ||
| 'background-color', | ||
| 'border-bottom-color', | ||
| 'border-collapse', | ||
| 'border-color', | ||
| 'border-left-color', | ||
| 'border-right-color', | ||
| 'border-top-color', | ||
| 'clear', | ||
| 'color', | ||
| 'cursor', | ||
| 'direction', | ||
| 'display', | ||
| 'elevation', | ||
| 'float', | ||
| 'font', | ||
| 'font-family', | ||
| 'font-size', | ||
| 'font-style', | ||
| 'font-variant', | ||
| 'font-weight', | ||
| 'height', | ||
| 'letter-spacing', | ||
| 'line-height', | ||
| 'overflow', | ||
| 'pause', | ||
| 'pause-after', | ||
| 'pause-before', | ||
| 'pitch', | ||
| 'pitch-range', | ||
| 'richness', | ||
| 'speak', | ||
| 'speak-header', | ||
| 'speak-numeral', | ||
| 'speak-punctuation', | ||
| 'speech-rate', | ||
| 'stress', | ||
| 'text-align', | ||
| 'text-decoration', | ||
| 'text-indent', | ||
| 'unicode-bidi', | ||
| 'vertical-align', | ||
| 'voice-family', | ||
| 'volume', | ||
| 'white-space', | ||
| 'width', | ||
| ]) | ||
| _default_protocols = frozenset(['http', 'https', 'mailto']) | ||
|
|
||
| # nh3 sanitizer settings | ||
| whitelist_tags = markdownify_settings.get('WHITELIST_TAGS', _default_tags) | ||
| whitelist_attrs = markdownify_settings.get('WHITELIST_ATTRS', _default_attrs) | ||
| whitelist_styles = markdownify_settings.get('WHITELIST_STYLES', _default_css) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Remove unused CSS allowlist code.
The _default_css frozenset (lines 977-1024) and whitelist_styles (line 1030) are defined but cannot be used because nh3 does not support CSS property filtering. This is dead code that adds confusion.
If CSS property filtering is not required, remove these definitions. If it is required, an alternative approach will be needed (such as stripping all style attributes).
🔧 Proposed fix to remove unused code
- _default_css = frozenset([
- 'azimuth',
- 'background-color',
- ... # all CSS properties
- 'width',
- ])
_default_protocols = frozenset(['http', 'https', 'mailto'])
# nh3 sanitizer settings
whitelist_tags = markdownify_settings.get('WHITELIST_TAGS', _default_tags)
whitelist_attrs = markdownify_settings.get('WHITELIST_ATTRS', _default_attrs)
- whitelist_styles = markdownify_settings.get('WHITELIST_STYLES', _default_css)
whitelist_protocols = markdownify_settings.get(
'WHITELIST_PROTOCOLS', _default_protocols
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/InvenTree/InvenTree/helpers.py` around lines 977 - 1030, Remove
the dead CSS allowlist: delete the _default_css frozenset definition and the
assignment to whitelist_styles (which reads
markdownify_settings.get('WHITELIST_STYLES', _default_css)); instead, stop
referencing _default_css anywhere in this module and ensure markdownify_settings
is only used for supported keys (e.g., whitelist_tags and whitelist_attrs remain
unchanged). If you need to enforce style stripping, implement it separately
(e.g., strip style attributes where HTML is sanitized) rather than keeping the
unused _default_css/whitelist_styles symbols.
| This uses nh3 under the hood to disable certain html tags by | ||
| encoding them - this leads to script tags etc. to not work. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n src/backend/InvenTree/InvenTree/mixins.py | sed -n '100,110p'Repository: matmair/InvenTree
Length of output: 566
🌐 Web query:
nh3.clean() behavior tags stripping encoding
💡 Result:
nh3.clean is a HTML sanitization function from the Python nh3 library (binding to Rust's ammonia crate). It parses input as HTML using html5ever (browser-like parsing), removes disallowed tags and attributes based on whitelists, strips HTML comments by default (strip_comments=True), and serializes the resulting DOM to a clean HTML string. Key behaviors related to "tags stripping encoding": - Strips (removes) disallowed HTML tags entirely, keeping their text content. Allowed tags are kept with allowed attributes. - Strips HTML comments () by default. - Does not perform character entity encoding on text content; it preserves existing HTML entities and text as-is after parsing (unlike nh3.clean_text, which escapes < > & etc. for plain text). - For disallowed tags, text content is preserved (not escaped as entities); the tag delimiters are removed. - Handles malformed HTML robustly due to HTML5 parsing. Examples from docs: nh3.clean("<script>alert('xss')</script>safe", clean_content_tags={"script"}) -> "safe" (strips tag + content) nh3.clean("hello", tags={"b"}) -> "hello" No evidence of stripping or altering character encoding/HTML entities in text; focuses on structural sanitization. For plain text escaping, use clean_text.
Citations:
- 1: https://nh3.readthedocs.io/
- 2: https://nh3.readthedocs.io/en/stable/
- 3: https://docs.rs/ammonia/latest/src/ammonia/lib.rs.html
- 4: https://www.pydanny.com/posts/2023-06-converting-from-bleach-to-nh3
- 5: Escaping bad tags instead of deleting them rust-ammonia/ammonia#145
The clean_data() docstring describes incorrect nh3 behavior.
nh3.clean() removes (strips) disallowed tags; it does not "encode" them. The clean_content_tags option is separate and controls whether tag contents are also removed. The current wording will mislead developers debugging sanitizer output.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/InvenTree/InvenTree/mixins.py` around lines 104 - 105, Update the
clean_data() docstring to correct nh3 behavior: state that nh3.clean() strips
(removes) disallowed tags rather than encoding them, and clarify that the
clean_content_tags option controls whether the contents of disallowed tags are
also removed; reference nh3.clean() and the clean_content_tags option in the
docstring so future readers understand the precise sanitizer behavior.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/InvenTree/InvenTree/helpers.py`:
- Around line 31-36: The import currently pulls the misspelled symbol
DEAFAULT_ATTRS from InvenTree.sanitizer; after correcting the symbol name in
sanitizer.py, update this import to DEFAULT_ATTRS and adjust any references that
use DEAFAULT_ATTRS (e.g., usage near where DEFAULT_ATTRS is consumed) to
reference DEFAULT_ATTRS instead so the module imports and usages match the
corrected symbol name.
- Around line 984-992: The nh3.clean call in helpers.py (the invocation that
produces clean_html) is passing an unsupported parameter filter_style_properties
(and using whitelist_styles/DEFAULT_CSS) which raises a TypeError; remove the
filter_style_properties argument and any use of whitelist_styles/DEFAULT_CSS in
the clean_markdown/clean_html path, and instead ensure the 'style' attribute is
not allowed in attrs_dict (or explicitly strip the style attribute) so CSS
filtering is not attempted (update the nh3.clean call and the attrs_dict
construction used by clean_markdown/clean_html accordingly).
In `@src/backend/InvenTree/InvenTree/sanitizer.py`:
- Line 262: Rename the misspelled constant DEAFAULT_ATTRS to DEFAULT_ATTRS in
sanitizer.py and update any imports/usages (e.g., the import in helpers.py) to
reference DEFAULT_ATTRS; ensure the constant name is updated where it's defined
and any modules that import or use DEAFAULT_ATTRS (search for DEAFAULT_ATTRS,
DEFAULT_ATTRS) are changed to the corrected name to maintain consistency.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 59187010-7e7b-49bc-bff3-38de56d0c4d4
📒 Files selected for processing (4)
src/backend/InvenTree/InvenTree/helpers.pysrc/backend/InvenTree/InvenTree/mixins.pysrc/backend/InvenTree/InvenTree/sanitizer.pysrc/backend/InvenTree/part/test_api.py
✅ Files skipped from review due to trivial changes (2)
- src/backend/InvenTree/part/test_api.py
- src/backend/InvenTree/InvenTree/mixins.py
| from InvenTree.sanitizer import ( | ||
| DEAFAULT_ATTRS, | ||
| DEFAULT_CSS, | ||
| DEFAULT_PROTOCOLS, | ||
| DEFAULT_TAGS, | ||
| ) |
There was a problem hiding this comment.
Imports the typo DEAFAULT_ATTRS from sanitizer.
Once the typo is fixed in sanitizer.py, update this import accordingly.
🐛 Proposed fix (after sanitizer.py is fixed)
from InvenTree.sanitizer import (
- DEAFAULT_ATTRS,
+ DEFAULT_ATTRS,
DEFAULT_CSS,
DEFAULT_PROTOCOLS,
DEFAULT_TAGS,
)And update the usage at line 969:
- whitelist_attrs = markdownify_settings.get('WHITELIST_ATTRS', DEAFAULT_ATTRS)
+ whitelist_attrs = markdownify_settings.get('WHITELIST_ATTRS', DEFAULT_ATTRS)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/InvenTree/InvenTree/helpers.py` around lines 31 - 36, The import
currently pulls the misspelled symbol DEAFAULT_ATTRS from InvenTree.sanitizer;
after correcting the symbol name in sanitizer.py, update this import to
DEFAULT_ATTRS and adjust any references that use DEAFAULT_ATTRS (e.g., usage
near where DEFAULT_ATTRS is consumed) to reference DEFAULT_ATTRS instead so the
module imports and usages match the corrected symbol name.
| clean_html = nh3.clean( | ||
| html, | ||
| tags=set(whitelist_tags), | ||
| attributes=attrs_dict, | ||
| url_schemes=set(whitelist_protocols), | ||
| filter_style_properties=set(whitelist_styles), | ||
| link_rel=None, | ||
| strip_comments=True, | ||
| ) |
There was a problem hiding this comment.
filter_style_properties is not a valid nh3.clean parameter - this will cause a runtime TypeError.
Same issue as in sanitizer.py. The nh3 library does not support CSS property filtering, so this call will fail when clean_markdown() is invoked during field validation (via the CleaningMixin shown in context snippet 3).
Since DEFAULT_CSS and whitelist_styles cannot be used with nh3, consider:
- Removing lines 970 and 989 entirely (CSS filtering is not possible with nh3)
- If CSS filtering is required, strip the
styleattribute from allowed attributes instead
🐛 Proposed fix to remove unsupported CSS filtering
# nh3 sanitizer settings
whitelist_tags = markdownify_settings.get('WHITELIST_TAGS', DEFAULT_TAGS)
whitelist_attrs = markdownify_settings.get('WHITELIST_ATTRS', DEAFAULT_ATTRS)
- whitelist_styles = markdownify_settings.get('WHITELIST_STYLES', DEFAULT_CSS)
whitelist_protocols = markdownify_settings.get(
'WHITELIST_PROTOCOLS', DEFAULT_PROTOCOLS
)
# Convert bleach-style attributes (list or dict) to nh3-compatible dict format
if isinstance(whitelist_attrs, (list, tuple, set, frozenset)):
attrs_dict = {'*': set(whitelist_attrs)}
elif isinstance(whitelist_attrs, dict):
attrs_dict = {tag: set(allowed) for tag, allowed in whitelist_attrs.items()}
else:
attrs_dict = None
# Clean the HTML content (for comparison). This must be the same as the original content
clean_html = nh3.clean(
html,
tags=set(whitelist_tags),
attributes=attrs_dict,
url_schemes=set(whitelist_protocols),
- filter_style_properties=set(whitelist_styles),
link_rel=None,
strip_comments=True,
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/InvenTree/InvenTree/helpers.py` around lines 984 - 992, The
nh3.clean call in helpers.py (the invocation that produces clean_html) is
passing an unsupported parameter filter_style_properties (and using
whitelist_styles/DEFAULT_CSS) which raises a TypeError; remove the
filter_style_properties argument and any use of whitelist_styles/DEFAULT_CSS in
the clean_markdown/clean_html path, and instead ensure the 'style' attribute is
not allowed in attrs_dict (or explicitly strip the style attribute) so CSS
filtering is not attempted (update the nh3.clean call and the attrs_dict
construction used by clean_markdown/clean_html accordingly).
| 'strong', | ||
| 'ul', | ||
| ]) | ||
| DEAFAULT_ATTRS = {'a': {'href', 'title'}, 'abbr': {'title'}, 'acronym': {'title'}} |
There was a problem hiding this comment.
Typo in constant name: DEAFAULT_ATTRS should be DEFAULT_ATTRS.
This typo propagates to helpers.py where it's imported. Fix the spelling to prevent confusion and maintain consistency.
🐛 Proposed fix
-DEAFAULT_ATTRS = {'a': {'href', 'title'}, 'abbr': {'title'}, 'acronym': {'title'}}
+DEFAULT_ATTRS = {'a': {'href', 'title'}, 'abbr': {'title'}, 'acronym': {'title'}}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/InvenTree/InvenTree/sanitizer.py` at line 262, Rename the
misspelled constant DEAFAULT_ATTRS to DEFAULT_ATTRS in sanitizer.py and update
any imports/usages (e.g., the import in helpers.py) to reference DEFAULT_ATTRS;
ensure the constant name is updated where it's defined and any modules that
import or use DEAFAULT_ATTRS (search for DEAFAULT_ATTRS, DEFAULT_ATTRS) are
changed to the corrected name to maintain consistency.
Most work done by copilot (claude sonnet 4.6)
weasyprint has some issues and we need to bump, but we can not upgrade due to complex dependencies between weasyprint and bleach (via some transitive deps); so our usage of bleach needs to be replaced by something - in this case nh3 which seems to be what most users of bleach switched to
Summary by CodeRabbit
Chores
Bug Fixes
Tests
Documentation