feat: implement code-genetics origin curation, review, propagation and FederatedCode deployment #2077

Open
zeba-source wants to merge 1 commit into aboutcode-org:main from zeba-source:fix/code-genetics-origin-curation

@zeba-source

Summary

This PR implements the code-genetics origin curation and review system
as part of issue #1932. It covers all 4 sub-issues listed in the
parent checklist.

Changes Made

#1933 - Origin Review and Curation UI

  • Added UI components to review combined scan results
  • Implemented drill-down view for individual file origin results
  • Added inline amendment capability to override origin determinations

#1934 - Origin Propagation

  • Implemented logic to propagate confirmed origins to related files
  • Uses path patterns, package membership and license similarity as signals
  • Added pipeline step to trigger propagation after initial scan
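
The three signals named above could be combined roughly as follows. This is a minimal sketch, not the PR's implementation: `FileRecord`, `propagation_score`, `propagate`, the 0.4/0.4/0.2 weights, the 0.6 threshold, and the token-overlap license metric are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for a scanned file and its origin determination."""
    path: str
    package: Optional[str] = None
    license_expression: Optional[str] = None
    origin: Optional[str] = None
    confirmed: bool = False


def license_similarity(a, b):
    """Jaccard overlap of license expression tokens (assumed metric)."""
    if not a or not b:
        return 0.0
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def propagation_score(source, target, path_pattern):
    """Weighted sum of the three signals; the weights are illustrative only."""
    score = 0.0
    if fnmatch(target.path, path_pattern):
        score += 0.4  # path pattern signal
    if source.package and source.package == target.package:
        score += 0.4  # package membership signal
    score += 0.2 * license_similarity(
        source.license_expression, target.license_expression
    )
    return score


def propagate(source, candidates, path_pattern, threshold=0.6):
    """Copy a confirmed origin to candidates that score at or above threshold."""
    updated = []
    for target in candidates:
        # Never overwrite an existing or human-confirmed determination.
        if target.confirmed or target.origin:
            continue
        if propagation_score(source, target, path_pattern) >= threshold:
            target.origin = source.origin
            updated.append(target)
    return updated
```

In this sketch, propagated files stay unconfirmed so that a reviewer can still verify or amend them in the curation UI.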

#1935 - FederatedCode Deployment

  • Added export of confirmed curations as shareable packages
  • Implemented import of curations from external FederatedCode sources
  • Handles merge conflicts from multiple curation sources
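
One way to picture the conflict handling is a small dispatcher over the strategy names that appear in the `import-curations` documentation excerpt quoted later in this thread. The sketch below is an assumption-laden illustration: the `Curation` dataclass, the `resolve_conflict` function, and the tie-breaking rule (ties keep the existing curation) are hypothetical and may not match the code in this PR.

```python
from dataclasses import dataclass


@dataclass
class Curation:
    """Hypothetical minimal curation record used only for this sketch."""
    file_path: str
    origin: str
    confidence: float = 0.0
    source_priority: int = 0


def resolve_conflict(existing, imported, strategy="manual_review"):
    """Return (winning_curation, needs_manual_review) for one conflicting pair."""
    if strategy == "manual_review":
        # Keep what is there and flag the pair for a human to decide.
        return existing, True
    if strategy == "keep_existing":
        return existing, False
    if strategy == "use_imported":
        return imported, False
    if strategy == "highest_confidence":
        # Assumption: on a tie, the existing curation wins.
        winner = imported if imported.confidence > existing.confidence else existing
        return winner, False
    if strategy == "highest_priority":
        # Assumption: a larger number means a higher-priority source.
        winner = (
            imported
            if imported.source_priority > existing.source_priority
            else existing
        )
        return winner, False
    raise ValueError(f"Unknown conflict strategy: {strategy}")
```

Failing loudly on an unknown strategy, rather than silently falling back to a default, keeps a mistyped `--conflict-strategy` value from overwriting curations unexpectedly.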

#1936 - Origin Curation Guide

  • Added documentation covering step-by-step curation workflow
  • Covers propagation, FederatedCode export and best practices

Related Issues

Closes #1933
Closes #1934
Closes #1935
Closes #1936
Closes #1932

Testing

  • UI renders correctly and allows amendments
  • Propagation applies correctly to related files
  • FederatedCode export and import work
  • Documentation is accurate and complete

Copilot AI review requested due to automatic review settings March 4, 2026 13:53
Copilot AI left a comment

Pull request overview

This PR adds a new “origin curation” feature set to ScanCode.io, spanning a web UI list view for origin determinations, REST APIs for CRUD/bulk actions/propagation + curation import/export, new pipelines/management commands for detection + propagation and FederatedCode workflows, and supporting models/migrations/docs.

Changes:

  • Added CodeOriginDetermination and multiple curation federation models (sources/provenance/conflicts/exports) plus admin integration.
  • Implemented origin review UI (list + edit modal + bulk actions) and API endpoints (origin determinations + propagation + curation import/export + conflict/source viewsets).
  • Added origin detection/propagation and FederatedCode import/export pipelines, management commands, and extensive documentation.

Reviewed changes

Copilot reviewed 38 out of 38 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| scanpipe/views.py | Adds OriginDeterminationListView for project-scoped origin determinations listing. |
| scanpipe/urls.py | Routes `/project/<slug>/origin-determinations/` to the new list view. |
| scanpipe/templates/scanpipe/origin_determination_list.html | New UI page for listing/editing/verifying origins with bulk actions. |
| scanpipe/templates/scanpipe/includes/project_summary_level.html | Adds “Origin Determinations” to the project summary nav with a count. |
| scanpipe/filters.py | Adds OriginDeterminationFilterSet for searching/filtering/sorting origin determinations. |
| scanpipe/models.py | Adds CodeOriginDetermination model and Project.origin_determination_count. |
| scanpipe/api/serializers.py | Adds CodeOriginDeterminationSerializer and wires it into serializer lookup. |
| scanpipe/api/views.py | Adds CodeOriginDeterminationViewSet and curation source/conflict endpoints + export/import actions. |
| scancodeio/urls.py | Registers new API routes for origin determinations and curation sources/conflicts. |
| scanpipe/admin.py | Registers admin pages for origin determinations + curation federation models. |
| scanpipe/models_curation.py | Adds federation models: CurationSource, CurationProvenance, CurationConflict, CurationExport. |
| scanpipe/curation_schema.py | Adds dataclass-based curation exchange schema + validator. |
| scanpipe/pipelines/origin_detection.py | Adds sample origin detection pipeline. |
| scanpipe/pipelines/origin_detection_with_propagation.py | Adds combined detection+propagation pipeline and propagation-only pipeline. |
| scanpipe/pipelines/curation_federatedcode.py | Adds pipelines for exporting/importing curations to/from FederatedCode and exporting to file. |
| scanpipe/management/commands/propagate-origins.py | Adds CLI command for propagation. |
| scanpipe/management/commands/export-curations.py | Adds CLI command for exporting curations. |
| scanpipe/management/commands/import-curations.py | Adds CLI command for importing curations. |
| scanpipe/management/commands/resolve-curation-conflicts.py | Adds CLI command for automated conflict resolution. |
| scanpipe/migrations/0001_add_origin_determination.py | Introduces migration for origin determination model. |
| scanpipe/migrations/0002_add_origin_propagation.py | Introduces propagation fields migration. |
| scanpipe/migrations/0003_add_curation_federation.py | Introduces federation models migration. |
| scancodeio/static/origin-determination.js | Adds frontend behavior for selection, modal editing, and bulk verify/amend. |
| docs/index.rst | Adds new origin curation docs to Sphinx index/toctree. |
| docs/ORIGIN_PROPAGATION_GUIDE.md | Adds propagation documentation. |
| docs/ORIGIN_DETERMINATION_FEATURE.md | Adds feature documentation. |
| docs/ORIGIN_CURATION_README.md | Adds documentation “map”/README. |
| ORIGIN_PROPAGATION_QUICK_REFERENCE.md | Adds top-level quick reference doc. |
| ORIGIN_PROPAGATION_IMPLEMENTATION.md | Adds implementation summary doc. |
| ORIGIN_CURATION_DOCUMENTATION_SUMMARY.md | Adds documentation summary doc. |
| IMPLEMENTATION_SUMMARY.md | Adds implementation summary doc. |
| FEDERATEDCODE_CURATION_IMPLEMENTATION.md | Adds FederatedCode curation implementation summary doc. |

Comment on lines +236 to +263
```bash
# Import from FederatedCode Git repository
python manage.py import-curations \
    --project my-project \
    --source-url https://github.com/curations/pkg-npm-example.git \
    --source-name "Community Curations"

# Import with conflict strategy
python manage.py import-curations \
    --project my-project \
    --source-url https://github.com/curations/pkg-npm-example.git \
    --conflict-strategy highest_confidence

# Dry run (preview without making changes)
python manage.py import-curations \
    --project my-project \
    --source-url https://example.com/curations.json \
    --dry-run

# Available conflict strategies:
# - manual_review: Create conflict records for manual resolution (default)
# - keep_existing: Keep existing curations, skip imports
# - use_imported: Replace existing with imported curations
# - highest_confidence: Use curation with higher confidence score
# - highest_priority: Use source with higher priority
```

#### Via Pipeline
Copilot AI Mar 4, 2026

The import-curations workflow described here relies on a user-supplied --source/source_url that the server fetches directly, which in the current implementation is done with a raw HTTP(S) request to that URL. If an attacker can invoke this API/command with an arbitrary URL, they can abuse it as an SSRF primitive to make the ScanCode.io backend issue requests to internal services (e.g., metadata endpoints or internal HTTP APIs) and ingest the responses as curation data that they can then read back via the UI/API. To mitigate this, the implementation behind import-curations should enforce strict allow-lists or domain patterns for source_url, reject private/loopback address ranges and non-HTTP(S) schemes, and ideally move the remote fetching into a constrained background service rather than the main web process.
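
A rough sketch of the suggested URL checks, using only the Python standard library, might look like the following. The `validate_source_url` helper, the `ALLOWED_HOSTS` set, and the injectable `resolve` parameter are hypothetical names introduced here for illustration; they are not part of this PR or of ScanCode.io.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from settings.
ALLOWED_HOSTS = frozenset({"github.com"})


def is_public_address(address):
    """Return True only for addresses that are not internal or special-purpose."""
    ip = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 zone id
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def validate_source_url(url, allowed_hosts=ALLOWED_HOSTS, resolve=socket.getaddrinfo):
    """Raise ValueError unless url is http(s), allow-listed, and publicly routable."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host or host not in allowed_hosts:
        raise ValueError(f"Host not in allow-list: {host!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Check every address the host resolves to, not just the first one.
    for info in resolve(host, port, proto=socket.IPPROTO_TCP):
        address = info[4][0]
        if not is_public_address(address):
            raise ValueError(f"Host resolves to a non-public address: {address}")
    return url
```

A check like this only covers the moment of validation. To avoid DNS rebinding, the subsequent fetch would also need to connect to the address that was validated and refuse redirects to unvalidated hosts.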

…rg#1932)
- Add origin review and curation UI (aboutcode-org#1933)
- Add origin propagation logic (aboutcode-org#1934)
- Add FederatedCode deployment support (aboutcode-org#1935)
- Add origin curation documentation (aboutcode-org#1936)

Signed-off-by: Zeba Fatma Khan <khanz@rknec.edu>
@zeba-source force-pushed the fix/code-genetics-origin-curation branch from 5fbd037 to 4e962e3 on March 4, 2026 at 17:12