Do not include symbols in general project outputs #2076

Open
uttam282005 wants to merge 5 commits into aboutcode-org:main from uttam282005:remove-symbols-from-extra-data

@uttam282005

Issues

Changes

This PR removes symbol-related data (source_symbols, source_strings, source_comments) from general project outputs (JSON and XLSX) to reduce file sizes, while making symbols data available through a dedicated download option for debugging.

Summary of Changes:
Backend (Core Implementation):

  • Added strip_symbols() function to remove symbol keys from dictionaries without mutating input
  • Modified JSON output generation to strip symbols from:
    • CodebaseResource extra_data fields
    • ProjectMessage details fields
  • Modified XLSX output generation to strip symbols from ProjectMessage details fields
  • Implemented new to_symbols_json() function that generates symbol-only JSON output
  • Created SymbolsJSONResultsGenerator class that filters and includes only resources with symbol data
  • Added "symbols" format to FORMAT_TO_FUNCTION_MAPPING
  • Updated API and web views to expose symbols format option
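
The backend items above can be illustrated with a minimal sketch. It is not the code from this PR: the `SYMBOL_KEYS` constant, the `iter_symbol_entries()` helper, and the use of plain dictionaries in place of `CodebaseResource` objects are assumptions made for illustration. Only the `strip_symbols()` name and the three key names come from the description.

```python
# Keys named in the PR description as symbol-related data.
SYMBOL_KEYS = ("source_symbols", "source_strings", "source_comments")


def strip_symbols(data):
    """Return a copy of `data` without the symbol keys; the input is left untouched."""
    if not isinstance(data, dict):
        # Non-dict values (None, lists, strings) are passed through unchanged.
        return data
    return {key: value for key, value in data.items() if key not in SYMBOL_KEYS}


def iter_symbol_entries(resources):
    """
    Hypothetical helper: yield one entry per resource that carries symbol data.
    Each resource is assumed to be a dict with `path` and `extra_data` keys.
    """
    for resource in resources:
        extra_data = resource.get("extra_data") or {}
        symbols = {key: extra_data[key] for key in SYMBOL_KEYS if key in extra_data}
        if symbols:
            yield {"path": resource.get("path"), **symbols}
```

The dictionary comprehension builds a new top-level dictionary, so callers can keep using the original `extra_data` or `details` value after the general outputs have been generated. Nested values are shared with the input rather than deep-copied.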

Testing:

  • Added unit tests for strip_symbols() function verifying non-mutation behavior
  • Added integration tests for JSON output symbol stripping
  • Added integration tests for XLSX output symbol stripping
  • Added integration tests for symbols-only JSON output
  • Added API endpoint test for symbols format download
  • Updated/regenerated integration test fixtures to reflect new output format (symbols excluded)
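
A non-mutation test of the kind listed above could look roughly like the following. This is a sketch only: the `strip_symbols()` defined here is a stand-in written for this example, and the test class and method names are hypothetical, not the ones added in this PR.

```python
import copy
import unittest

SYMBOL_KEYS = {"source_symbols", "source_strings", "source_comments"}


def strip_symbols(data):
    # Stand-in for the function under test; the real implementation may differ.
    return {key: value for key, value in data.items() if key not in SYMBOL_KEYS}


class StripSymbolsTest(unittest.TestCase):
    def test_strip_symbols_does_not_mutate_input(self):
        original = {
            "source_symbols": ["main"],
            "source_strings": ["hello"],
            "md5": "abc",
        }
        snapshot = copy.deepcopy(original)

        result = strip_symbols(original)

        # Symbol keys are gone from the result...
        self.assertEqual({"md5": "abc"}, result)
        # ...while the input still matches the snapshot taken before the call.
        self.assertEqual(snapshot, original)
        self.assertIsNot(result, original)
```

Comparing against a deep copy taken before the call catches in-place deletion of keys, which a check on the return value alone would miss.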

UI Integration:

  • TO BE DECIDED: Add symbols download option to project detail page
  • TO BE DECIDED: Add symbols download option to project list dropdown
  • TO BE DECIDED: Add symbols format to bulk download form

Checklist


Questions for Review Discussion:

  1. UI Placement: Should the symbols download option be:

    • A) Added as a simple button in the main download bar (like JSON/XLSX)
    • B) Placed in a new "Debug" or "Advanced" dropdown (recommended)
    • C) Added to the existing "Tools formats" dropdown
  2. Bulk Downloads: Should the symbols format be included in the multi-project bulk download options?

  3. Documentation: Should user-facing documentation be included in this PR or as a follow-up?

…on for downloading symbols

Signed-off-by: uttam282005 <uttam282005@gmail.com>
@uttam282005 uttam282005 force-pushed the remove-symbols-from-extra-data branch from 103f70a to 2d965e4 Compare March 3, 2026 16:51
…d too many conditionals

Signed-off-by: uttam282005 <uttam282005@gmail.com>
@uttam282005
Author

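The failing tests seem to be unrelated to the changes made in this PR. In each failure, the truncated expected and actual data report identical character counts and visibly differ only in the package URL shown, which suggests a package-ordering difference rather than a symbols-related change: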

FAIL: test_scanpipe_docker_pipeline_alpine_integration (scanpipe.tests.test_pipelines.PipelinesIntegrationTest.test_scanpipe_docker_pipeline_alpine_integration)

Traceback (most recent call last):
  File "/home/runner/work/scancode.io/scancode.io/scanpipe/tests/test_pipelines.py", line 1153, in test_scanpipe_docker_pipeline_alpine_integration
    self.assertPipelineResultEqual(expected_file, result_file)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/scancode.io/scancode.io/scanpipe/tests/test_pipelines.py", line 698, in assertPipelineResultEqual
    self.assertEqual(expected_data, result_data)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: {'hea[18298 chars]/libc-utils@0.7.2-r3?arch=x86_64', 'type': 'al[177772 chars]: []} != {'hea[18298 chars]/libcrypto1.1@1.1.1n-r0?arch=x86_64', 'type': [177772 chars]: []}
Diff is 330130 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_scanpipe_docker_pipeline_rpm_integration (scanpipe.tests.test_pipelines.PipelinesIntegrationTest.test_scanpipe_docker_pipeline_rpm_integration)

Traceback (most recent call last):
  File "/home/runner/work/scancode.io/scancode.io/scanpipe/tests/test_pipelines.py", line 1203, in test_scanpipe_docker_pipeline_rpm_integration
    self.assertPipelineResultEqual(expected_file, result_file)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/scancode.io/scancode.io/scanpipe/tests/test_pipelines.py", line 698, in assertPipelineResultEqual
    self.assertEqual(expected_data, result_data)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: {'hea[2365665 chars]m/gpg-pubkey@d4082792', 'type': 'rpm', 'namesp[847359 chars]: []} != {'hea[2365665 chars]m/gpgme@1.10.0?arch=x86_64', 'type': 'rpm', 'n[847359 chars]: []}
Diff is 11811929 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_scanpipe_resolved_dependencies_cocoapods (scanpipe.tests.test_pipelines.PipelinesIntegrationTest.test_scanpipe_resolved_dependencies_cocoapods)

Traceback (most recent call last):
  File "/home/runner/work/scancode.io/scancode.io/scanpipe/tests/test_pipelines.py", line 1049, in test_scanpipe_resolved_dependencies_cocoapods
    self.assertPipelineResultEqual(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        expected_file, result_file, sort_dependencies=True
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/runner/work/scancode.io/scancode.io/scanpipe/tests/test_pipelines.py", line 698, in assertPipelineResultEqual
    self.assertEqual(expected_data, result_data)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: {'hea[1036 chars]ods/AFNetworkActivityLogger@2.0.4', 'type': 'c[80784 chars]: []} != {'hea[1036 chars]ods/Aerodramus@2.0.0', 'type': 'cocoapods', 'n[80784 chars]: []}
Diff is 214188 characters long. Set self.maxDiff to None to see it.


Ran 852 tests in 178.768s

FAILED (failures=3, expected failures=1)
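
The log above hides the actual differences behind "Set self.maxDiff to None to see it". As a rough sketch of how that hint works in the standard `unittest` module, setting the `maxDiff` class attribute to `None` makes `assertEqual` include the full diff in the failure message. The `FullDiffTestCase` class and the `failure_message()` helper below are hypothetical names used only for this illustration; they are not part of the scancode.io test suite.

```python
import unittest


class FullDiffTestCase(unittest.TestCase):
    # None disables truncation, so the complete diff is included on failure.
    maxDiff = None


def failure_message(test_case_class, expected, actual):
    """Return the assertEqual failure message, or an empty string if equal."""
    case = test_case_class()
    try:
        case.assertEqual(expected, actual)
    except AssertionError as error:
        return str(error)
    return ""
```

Calling `failure_message(FullDiffTestCase, expected, actual)` on two large, differing dictionaries returns the whole diff, whereas the same call with `unittest.TestCase` ends with the truncation hint seen in the log. Temporarily setting `maxDiff = None` on the failing test class would be one way to check whether the three failures really are ordering differences.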

@uttam282005 uttam282005 marked this pull request as ready for review March 3, 2026 17:53
Development

Successfully merging this pull request may close these issues.

Do not include symbols in general project outputs
