feat: implement code-genetics origin curation, review, propagation and FederatedCode deployment #2077
Pull request overview
This PR adds a new “origin curation” feature set to ScanCode.io. It spans a web UI list view for origin determinations; REST APIs for CRUD, bulk actions, propagation, and curation import/export; new pipelines and management commands for detection, propagation, and FederatedCode workflows; and supporting models, migrations, and docs.
Changes:
- Added CodeOriginDetermination and multiple curation federation models (sources/provenance/conflicts/exports) plus admin integration.
- Implemented origin review UI (list + edit modal + bulk actions) and API endpoints (origin determinations + propagation + curation import/export + conflict/source viewsets).
- Added origin detection/propagation and FederatedCode import/export pipelines, management commands, and extensive documentation.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| scanpipe/views.py | Adds OriginDeterminationListView for project-scoped origin determinations listing. |
| scanpipe/urls.py | Routes `/project/<slug>/origin-determinations/` to the new list view. |
| scanpipe/templates/scanpipe/origin_determination_list.html | New UI page for listing/editing/verifying origins with bulk actions. |
| scanpipe/templates/scanpipe/includes/project_summary_level.html | Adds “Origin Determinations” to the project summary nav with a count. |
| scanpipe/filters.py | Adds OriginDeterminationFilterSet for searching/filtering/sorting origin determinations. |
| scanpipe/models.py | Adds CodeOriginDetermination model and Project.origin_determination_count. |
| scanpipe/api/serializers.py | Adds CodeOriginDeterminationSerializer and wires it into serializer lookup. |
| scanpipe/api/views.py | Adds CodeOriginDeterminationViewSet and curation source/conflict endpoints + export/import actions. |
| scancodeio/urls.py | Registers new API routes for origin determinations and curation sources/conflicts. |
| scanpipe/admin.py | Registers admin pages for origin determinations + curation federation models. |
| scanpipe/models_curation.py | Adds federation models: CurationSource, CurationProvenance, CurationConflict, CurationExport. |
| scanpipe/curation_schema.py | Adds dataclass-based curation exchange schema + validator. |
| scanpipe/pipelines/origin_detection.py | Adds sample origin detection pipeline. |
| scanpipe/pipelines/origin_detection_with_propagation.py | Adds combined detection+propagation pipeline and propagation-only pipeline. |
| scanpipe/pipelines/curation_federatedcode.py | Adds pipelines for exporting/importing curations to/from FederatedCode and exporting to file. |
| scanpipe/management/commands/propagate-origins.py | Adds CLI command for propagation. |
| scanpipe/management/commands/export-curations.py | Adds CLI command for exporting curations. |
| scanpipe/management/commands/import-curations.py | Adds CLI command for importing curations. |
| scanpipe/management/commands/resolve-curation-conflicts.py | Adds CLI command for automated conflict resolution. |
| scanpipe/migrations/0001_add_origin_determination.py | Introduces migration for origin determination model. |
| scanpipe/migrations/0002_add_origin_propagation.py | Introduces propagation fields migration. |
| scanpipe/migrations/0003_add_curation_federation.py | Introduces federation models migration. |
| scancodeio/static/origin-determination.js | Adds frontend behavior for selection, modal editing, and bulk verify/amend. |
| docs/index.rst | Adds new origin curation docs to Sphinx index/toctree. |
| docs/ORIGIN_PROPAGATION_GUIDE.md | Adds propagation documentation. |
| docs/ORIGIN_DETERMINATION_FEATURE.md | Adds feature documentation. |
| docs/ORIGIN_CURATION_README.md | Adds documentation “map”/README. |
| ORIGIN_PROPAGATION_QUICK_REFERENCE.md | Adds top-level quick reference doc. |
| ORIGIN_PROPAGATION_IMPLEMENTATION.md | Adds implementation summary doc. |
| ORIGIN_CURATION_DOCUMENTATION_SUMMARY.md | Adds documentation summary doc. |
| IMPLEMENTATION_SUMMARY.md | Adds implementation summary doc. |
| FEDERATEDCODE_CURATION_IMPLEMENTATION.md | Adds FederatedCode curation implementation summary doc. |
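The table describes scanpipe/curation_schema.py only as a "dataclass-based curation exchange schema + validator". As a rough illustration of that pattern, and not the actual schema in this PR, an exchanged record could be modeled as a dataclass and checked before import. The field names (`path`, `origin_identifier`, `confidence`, `source_name`, `notes`), the 0.0 to 1.0 confidence range, and the helpers `validate_record()` and `parse_record()` are all hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CurationRecord:
    """One curated origin as exchanged between instances (hypothetical fields)."""

    path: str               # resource path inside the scanned codebase
    origin_identifier: str  # e.g. a Package URL such as "pkg:npm/lodash@4.17.21"
    confidence: float       # assumed range: 0.0 to 1.0
    source_name: str = ""
    notes: list = field(default_factory=list)


REQUIRED_FIELDS = ("path", "origin_identifier", "confidence")


def validate_record(data):
    """Return a list of error messages; an empty list means the record is valid."""
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        return [f"missing required field: {name}" for name in missing]

    errors = []
    for name in ("path", "origin_identifier"):
        if not isinstance(data[name], str) or not data[name]:
            errors.append(f"{name} must be a non-empty string")
    confidence = data["confidence"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        errors.append("confidence must be a number")
    elif not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be between 0.0 and 1.0")
    return errors


def parse_record(data):
    """Validate a raw dict and build a CurationRecord; unknown keys are ignored."""
    errors = validate_record(data)
    if errors:
        raise ValueError("; ".join(errors))
    known = {f.name for f in fields(CurationRecord)}
    kwargs = {key: value for key, value in data.items() if key in known}
    kwargs["confidence"] = float(kwargs["confidence"])
    return CurationRecord(**kwargs)
```

Returning a list of errors from the validator, rather than raising on the first problem, lets an importer report every issue in a record at once.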
```bash
# Import from FederatedCode Git repository
python manage.py import-curations \
  --project my-project \
  --source-url https://github.com/curations/pkg-npm-example.git \
  --source-name "Community Curations"

# Import with conflict strategy
python manage.py import-curations \
  --project my-project \
  --source-url https://github.com/curations/pkg-npm-example.git \
  --conflict-strategy highest_confidence

# Dry run (preview without making changes)
python manage.py import-curations \
  --project my-project \
  --source-url https://example.com/curations.json \
  --dry-run

# Available conflict strategies:
# - manual_review: Create conflict records for manual resolution (default)
# - keep_existing: Keep existing curations, skip imports
# - use_imported: Replace existing with imported curations
# - highest_confidence: Use curation with higher confidence score
# - highest_priority: Use source with higher priority
```
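The five conflict strategies listed in the comments above can be read as a small dispatch over an existing and an imported curation for the same resource. The following is a minimal sketch, not the code in this PR: the `Curation` and `Resolution` types, the `source_priority` field, the `resolve()` helper, the rule that identical origins are not a conflict, and the rule that ties fall back to manual review are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from operator import attrgetter
from typing import Optional


class ConflictStrategy(str, Enum):
    MANUAL_REVIEW = "manual_review"            # record a conflict for a human (default)
    KEEP_EXISTING = "keep_existing"            # skip the import
    USE_IMPORTED = "use_imported"              # replace with the imported curation
    HIGHEST_CONFIDENCE = "highest_confidence"  # compare confidence scores
    HIGHEST_PRIORITY = "highest_priority"      # compare source priorities


@dataclass
class Curation:
    origin: str
    confidence: float
    source_priority: int  # hypothetical: a larger number means a more trusted source


@dataclass
class Resolution:
    winner: Optional[Curation]  # None when the conflict is left for manual review
    needs_review: bool


def resolve(existing, imported, strategy="manual_review"):
    """Pick between an existing and an imported curation for the same resource."""
    strategy = ConflictStrategy(strategy)  # raises ValueError for an unknown name
    if existing.origin == imported.origin:
        return Resolution(existing, needs_review=False)  # same answer: no conflict
    if strategy is ConflictStrategy.KEEP_EXISTING:
        return Resolution(existing, needs_review=False)
    if strategy is ConflictStrategy.USE_IMPORTED:
        return Resolution(imported, needs_review=False)
    if strategy is ConflictStrategy.MANUAL_REVIEW:
        return Resolution(None, needs_review=True)

    key = attrgetter(
        "confidence" if strategy is ConflictStrategy.HIGHEST_CONFIDENCE else "source_priority"
    )
    if key(existing) == key(imported):
        return Resolution(None, needs_review=True)  # tie: defer to a human
    return Resolution(max(existing, imported, key=key), needs_review=False)
```

Treating a tie under `highest_confidence` or `highest_priority` as a conflict, rather than silently keeping one side, keeps the automatic strategies from making an arbitrary choice.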
#### Via Pipeline
The import-curations workflow described here relies on a user-supplied `--source-url` (`source_url`) that the server fetches directly; in the current implementation this is a raw HTTP(S) request to that URL. If an attacker can invoke this API or command with an arbitrary URL, they can abuse it as a server-side request forgery (SSRF) primitive: the ScanCode.io backend would issue requests to internal services (e.g., metadata endpoints or internal HTTP APIs) and ingest the responses as curation data, which the attacker could then read back via the UI or API. To mitigate this, the implementation behind import-curations should enforce strict allow-lists or domain patterns for `source_url`, reject private and loopback address ranges as well as non-HTTP(S) schemes, and ideally move remote fetching into a constrained background service rather than the main web process.
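A minimal sketch of the scheme, allow-list, and address checks suggested above, using only the Python standard library. `ALLOWED_HOSTS`, `ALLOWED_SCHEMES`, `validate_source_url()`, and its injectable `resolver` parameter are hypothetical names, not part of ScanCode.io; the host list is a placeholder for whatever sources a deployment actually trusts.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allow-list; replace with the curation sources your deployment trusts.
ALLOWED_HOSTS = {"github.com", "raw.githubusercontent.com"}
ALLOWED_SCHEMES = {"http": 80, "https": 443}  # scheme -> default port


def validate_source_url(url, resolver=socket.getaddrinfo):
    """Return url unchanged if it passes the checks, otherwise raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")

    host = parts.hostname  # lower-cased, with any "user@" prefix stripped
    if host is None or host not in ALLOWED_HOSTS:
        raise ValueError(f"host not in allow-list: {host!r}")

    # .port raises ValueError itself if the port is out of range.
    port = parts.port or ALLOWED_SCHEMES[parts.scheme]
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host!r}") from exc

    # Every resolved address must be public; one private answer is enough to reject.
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])  # drop IPv6 scope id
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped  # treat ::ffff:127.0.0.1 as 127.0.0.1
        if not ip.is_global:
            raise ValueError(f"{host!r} resolves to non-public address {ip}")
    return url
```

Validating before fetching is not sufficient on its own: the name can resolve differently when the HTTP client later connects (DNS rebinding), and redirects can lead elsewhere. A fetcher built on this check would also need to pin the connection to the validated address and either disable redirects or re-validate each hop, which is one more argument for isolating fetching in a constrained worker.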
…rg#1932)
- Add origin review and curation UI (aboutcode-org#1933)
- Add origin propagation logic (aboutcode-org#1934)
- Add FederatedCode deployment support (aboutcode-org#1935)
- Add origin curation documentation (aboutcode-org#1936)

Signed-off-by: Zeba Fatma Khan <khanz@rknec.edu>
5fbd037 to 4e962e3
Summary
This PR implements the code-genetics origin curation and review system as part of issue #1932. It covers all 4 sub-issues listed in the parent checklist.
Changes Made
- #1933 - Origin Review and Curation UI
- #1934 - Origin Propagation
- #1935 - FederatedCode Deployment
- #1936 - Origin Curation Guide
Related Issues
Closes #1933
Closes #1934
Closes #1935
Closes #1936
Closes #1932
Testing