This module introduces/lifts restrictions (authorization) for the management of data and meta data on DCOR. The corresponding UI elements are modified accordingly:
Authorization (auth.py)
- datasets: do not allow deleting datasets unless they are drafts
- datasets: allow purging of deleted datasets
- datasets: do not allow switching to a more restrictive license
- datasets: do not allow changing the name (slug)
- datasets: do not allow adding resources to non-draft datasets
- datasets: do not allow to set the visibility of a public dataset to private
- datasets: do not allow uploading datasets via the web interface
- organization: do not allow bulk_update_delete (e.g. datasets by organization admins)
- resources: do not allow deleting resources unless they are drafts
- resources: only allow changing the "description"
- resources: do not allow setting a resource id when uploading
- user: allow all logged-in users to create datasets, circles, and collections
Validation (validate.py)
- datasets: force user to select authors
- datasets: author list "authors" is CSV
- datasets: parse DOI field (remove URL part)
- datasets: force user to select a license
- datasets: restrict to basic CC licenses
- datasets: automatically generate dataset name (slug) using random characters if necessary (does not apply to admins)
- datasets: a dataset without resources is considered to be a draft; it's state cannot be set to "active"
- datasets: the state of a dataset that does not contain at least one valid .rtdc resource cannot be set to "active"
- resources: do not allow uploading resources with the same name for a dataset (important for archiving and reproducibility)
- resources: make sure the resource name matches the file name of the upload; this is actually implemented in plugin.before_create (IResourceController) and not in validate.py
- resources: custom resource name is overridden during upload
- resources: do not allow weird characters in resource names
- resources: restrict upload data extensions to .'.rtdc', '.ini', '.csv', '.tsv', '.pdf', '.txt', '.jpg', '.png', '.tif', '.py', '.ipynb', '.poly', '.sof', '.so2'
- resources: configuration metadata (using dclab.dfn.config_funcs)
- resources: 's3_available' and 's3_url' for identifying objects that are available on S3 and for downloading public datasets.
IPermissionLabels (plugin.py)
- Allow a user A to see user B's private dataset if the private dataset is in a group that user A is a member of.
UI Dataset:
- hide "add new resource" button in
templates/package/resources.html - add field
authors(csv list) - add field
doi(validator parses URLs) - add field
references(parses arxiv, bioRxiv, DOI, links) - add CC license file
licenses.json(only show less restrictive licenses when editing the dataset)
- hide "add new resource" button in
UI Organization:
- remove "Delete" button in bulk view
UI Resource:
- Do not show these variables (because they are redundant):
['last modified', 'revision id', 'url type', 'state', 'on same domain']
(
templates/package/resource_read.html) - Show DC config data via "toggle-more"
- Add supplementary resource schema via json files located in dcor_schemas/resource_schema_supplements
- Do not show these variables (because they are redundant):
['last modified', 'revision id', 'url type', 'state', 'on same domain']
(
Background jobs:
- set the mimetype for each dataset
- populate "dc:sec:key" metadata for each DC dataset
- generates sha256 hash upon resource creation
- populate etag resource property from S3 storage upon resource creation
Configuration keywords:
- the
ckanext.dcor_schemas.allow_content_listing_for_anonboolean parameter can be set to False to prevent anonymous users to see circles, colletions, and other content. - the
ckanext.dcor_schemas.allow_public_datasetsboolean parameter can be used to disable the creation of public datasets (e.g. for DCOR-med). - the
ckanext.dcor_schemas.json_resource_schema_dirparameter can be used to specify a directory containing .json files that define the supplementary resource schema. The default ispackagewhich means that the supplementary resource schema of this extension is used. - the
ckanext.dcor_schemas.notify_user_createboolean parameter defines whether the site maintainer receives an email for ever user that is created. - These DCOR-wide configuration options for accessing S3 object storage
endpoint_urlbucket_nameaccess_key_idsecret_access_keyssl_verify
- the
API extensions:
resource_upload_s3_urlsreturns a dictionary containing the upload URLs (single file or multipart) required for uploading a new resource directly to S3resource_schema_supplementsreturns a dictionary of the current supplementary resource schemasupported_resource_suffixesreturns a list of supported resource suffixes
Signals:
- Send an email to the maintainer when a user is created.
CLI:
CKAN command
list-circlesreturns the list of DCOR circlesCKAN command
list-collectionsreturns the list of DCOR collectionsCKAN command
list-group-resources <NAME>returns the list of resources in a DCOR circle or collectionCKAN command
list-zombie-usersfor users with no datasets and no activity for a certain amount of timeCKAN command
run-jobs-dcor-schemasthat runs all background jobs for all resources (if not already done)CKAN command
dcor-move-dataset-to-circlefor moving a dataset to a different circleCKAN command
dcor-prune-draft-datasetsfor removing old draft datasets from the CKAN database:ckan dcor-prune-draft-datasets --older-than-days 21 --dry-run
CKAN command
dcor-prune-orphaned-s3-artifactsfor removing objects from S3 that are not in the CKAN database:ckan dcor-prune-orphaned-s3-artifacts --older-than-days 21 --dry-run
CKAN command
dcor-purge-unused-collections-and-circlesfor removing collections and circles that are old and don't contain any datasets:ckan dcor-purge-unused-collections-and-circles --modified-before-months 12 --dry-run
CKAN command
send_mailfor sending emails using the CKAN email credentials
Simply run
pip install ckanext-dcor_schemas
In the configuration file ckan.ini:
ckan.plugins = [...] dcor_schemas ckan.extra_resource_fields = etag sha256
If CKAN/DCOR is installed and setup for testing, this extension can be tested with pytest:
pytest ckanext
Testing is implemented via GitHub Actions. You may also set up a local docker container with CKAN and MinIO. Take a look at the GitHub Actions workflow for more information.