Skip to content

feat: redact PII fields before delete in user retirement flows#38426

Open
ktyagiapphelix2u wants to merge 32 commits into
openedx:masterfrom
ktyagiapphelix2u:ktyagi/primaryemail
Open

feat: redact PII fields before delete in user retirement flows#38426
ktyagiapphelix2u wants to merge 32 commits into
openedx:masterfrom
ktyagiapphelix2u:ktyagi/primaryemail

Conversation

@ktyagiapphelix2u
Copy link
Copy Markdown
Contributor

@ktyagiapphelix2u ktyagiapphelix2u commented Apr 23, 2026

Summary

Adds support for redacting sensitive PII fields before deletion in DeletableByUserValue models.

Changes include:

  • Added redact_before_delete_fields hook to model mixin
  • Redact emails before delete for:
    • CourseEnrollmentAllowed
    • PendingEmailChange
    • UnregisteredLearnerCohortAssignments
  • Added SQL assertion helpers to verify UPDATE occurs before DELETE
  • Added regression/unit tests for safe ID-filtered redaction and deletion flows
  • Updated email change flow to use delete_by_user_value

Ticket & Reference

https://2u-internal.atlassian.net/browse/BOMS-498

Related PR

https://2u-internal.atlassian.net/browse/BOMS-565
https://2u-internal.atlassian.net/browse/BOMS-564

@ktyagiapphelix2u ktyagiapphelix2u marked this pull request as ready for review April 23, 2026 11:29
@ktyagiapphelix2u ktyagiapphelix2u requested a review from a team as a code owner April 23, 2026 11:29
Comment thread common/djangoapps/student/tests/test_models.py Outdated
Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread common/djangoapps/student/tests/test_models.py Outdated
Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread common/djangoapps/student/tests/test_models.py Outdated
Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/views.py Outdated
@ktyagiapphelix2u ktyagiapphelix2u force-pushed the ktyagi/primaryemail branch 2 times, most recently from 0d5d5e6 to 2593ac2 Compare May 4, 2026 08:49
@vgulati-apphelix
Copy link
Copy Markdown
Contributor

This change addresses a privacy issue in the retirement flow for users who have a pending primary email change.

Please elaborate the problem statement and solution provided.

Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/views.py Outdated
Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread common/djangoapps/student/tests/test_email.py Outdated
Comment thread common/djangoapps/student/tests/test_email.py Outdated
Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Comment thread common/djangoapps/student/models/user.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/tests/test_retirement_views.py Outdated
Comment thread openedx/core/djangoapps/user_api/accounts/views.py Outdated
Comment thread common/djangoapps/student/tests/test_email.py Outdated
Comment thread common/djangoapps/student/tests/test_email.py Outdated
@ktyagiapphelix2u ktyagiapphelix2u requested a review from a team as a code owner May 21, 2026 07:37
Comment thread openedx/core/djangolib/model_mixins.py Outdated
Comment thread common/djangoapps/student/tests/test_email.py Outdated
@ktyagiapphelix2u ktyagiapphelix2u changed the title fix: redact pending primary email before retirement deletion feat: redact PII fields before delete in user retirement flows May 22, 2026
Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get through the review fully. Sorry. So just leaving you with what I have.

Comment thread common/djangoapps/student/models/course_enrollment.py Outdated
Comment thread common/djangoapps/student/tests/test_models.py Outdated
Comment thread common/djangoapps/student/views/management.py
Comment thread common/djangoapps/student/views/management.py
Comment thread openedx/core/djangoapps/course_groups/tests/test_cohorts.py Outdated
Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Is looking good and getting close. We're almost there.

Comment thread common/djangoapps/student/views/management.py Outdated
Comment thread common/djangoapps/student/views/management.py
table,
num_redact_delete_pairs=1,
require_id_filter=False,
expected_redacted_value=None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this is optional? At least one of the tests doesn't send this. Is there a reason?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair. I kept expected_redacted_value optional for helper reuse, but I updated the current PendingEmailChange tests to pass the explicit redacted value so coverage is stronger and no test relies on the default here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I still don't understand why this is optional. This is a test for redaction. The update shouldn't be any update, but one that includes the redacted value. Is there a call where you couldn't send the redacted value?
  2. I'm not clear on the usefulness of require_id_filter. Do we care how it is filtered? I would consider dropping this and the accompanying assertions, but maybe you could explain its importance?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. agreed on both. I've made expected_redacted_value a required positional parameter there's no valid call site that couldn't provide it, so making it optional was just unnecessary flexibility.

  2. Dropped require_id_filter entirely as well, we don't need to assert how the rows are filtered, only that the redaction UPDATE happens before the DELETE, which is what the helper already verifies.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both issues are fixed.

expected_redacted_value is now a required positional parameter since every valid call site already provides it, so the optional default was unnecessary.

require_id_filter and its related assertions have been removed. We only need to verify that the redaction UPDATE happens before the DELETE

Comment thread openedx/core/djangolib/model_mixins.py Outdated
Comment on lines +53 to +54
record_ids = list(records_matching_user_value.values_list('id', flat=True))
records_matching_ids = cls.objects.filter(id__in=record_ids)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If needed at all, shouldn't this live in if redact_fields: conditional, so it is only needed when we actually need to do an update?
  2. What is the difference between records_matching_user_value and records_matching_ids, and why do we need to do another lookup? The original code already had records_matching_user_value.delete(), which is presumably deleting the correct records. Isn't this what we want to update?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve updated the mixin so the second lookup only happens when redaction is actually needed.

  • records_matching_user_value is the original queryset filtered by the user field (for example email=value).

  • We only build the ID-based queryset when redact_fields is non-empty, because redacting may change that same filter field, and we still need to delete the exact same rows.

  • For non-redacting models, we now directly call delete on the original queryset (no extra lookup).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #38426 (comment) for continuation.


@classmethod
def redact_before_delete_fields(cls):
return {'email': 'redacted-before-delete@safe.com'}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we adjust this to include two fields to ensure that would work appropriately?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes updated

Comment on lines +76 to +77
with mock.patch.object(model_cls, 'objects', create=True) as mock_objects:
mock_objects.filter.return_value = queryset
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this mocking makes me nervous. Can the test models be updated to be real Django models, like?:

class NonRedactingModel(DeletableByUserValue, models.Model):

Not sure how simple this would be, but the less mocking the better, especially in this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. but

I kept this test mocked because it is a small unit test for mixin logic only.
Making real Django models here would add a lot of setup and make the test harder to read.

We already test this behavior with real database models in:

common/djangoapps/student/tests/test_models.py
openedx/core/djangoapps/course_groups/tests/test_cohorts.py
openedx/core/djangoapps/user_api/accounts/tests/test_retirement_views.py

So this test stays simple, and real DB behavior is already covered elsewhere.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktyagiapphelix2u: I think mock or non-mock would be fine, so feel free to choose what you prefer. However, I'm confused, because I thought when we met you said you liked the full Django model version better. Did you have trouble with it when you removed from INSTALLED_APPS, or something else?

Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get through review, but here is something...

table,
num_redact_delete_pairs=1,
require_id_filter=False,
expected_redacted_value=None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I still don't understand why this is optional. This is a test for redaction. The update shouldn't be any update, but one that includes the redacted value. Is there a call where you couldn't send the redacted value?
  2. I'm not clear on the usefulness of require_id_filter. Do we care how it is filtered? I would consider dropping this and the accompanying assertions, but maybe you could explain its importance?

Comment thread openedx/core/djangolib/model_mixins.py Outdated
Comment on lines +55 to +59
# When redacting the field used for filtering, switch to ID-based
# operations so the subsequent delete still targets the same rows.
record_ids = list(records_matching_user_value.values_list('id', flat=True))
records_matching_ids = cls.objects.filter(id__in=record_ids)
records_matching_ids.update(**redact_fields)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[New comment for addressing https://github.com//pull/38426#discussion_r3309021075]

I still don't understand why this can't be records_matching_user_value.update(**redact_fields). The existing code uses records_matching_user_value.delete(), and includes all records that need to be deleted. We simply want to redact all records we are about to delete.

Does this code not work for some reason, or are you just trying to delete values outside of the scope of how this function was called? Doesn't that seem dangerous? We only want to redact what is being deleted, and we don't want to delete anything different from what this was already deleting.

Copy link
Copy Markdown
Contributor Author

@ktyagiapphelix2u ktyagiapphelix2u May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robrap I got it know that records_matching_user_value.update(**redact_fields) is the correct and simpler approach in most cases and the code now uses it by default. The ID-based path is only taken in the one specific case where it's unavoidable when the filter field is the same field being redacted (e.g. CourseEnrollmentAllowed and UnregisteredLearnerCohortAssignments both filter by email and redact email).

In that case, records_matching_user_value.update(email='redacted-before-delete@safe.com') succeeds, but the subsequent records_matching_user_value.delete() generates DELETE WHERE email = 'original@example.com' — which now matches zero rows because the email was just changed. The rows get redacted but never deleted.

Capturing IDs before the update is the minimal fix for that specific situation, and it only affects those two models. All other callers (filtering by user, not email) now use the simple path you suggested.

Updated the CODE

Copy link
Copy Markdown
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktyagiapphelix2u: It's looking great. Very minor proposals. Let's try to look at your final changes together on Monday so we can merge. Thank you.

"""


def assert_update_before_delete(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this to assert_redact_before_delete.

Comment on lines +13 to +14
Assert that UPDATE and DELETE queries for ``table`` occur in consecutive pairs,
and that each UPDATE contains the expected redacted value.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed update:

Suggested change
Assert that UPDATE and DELETE queries for ``table`` occur in consecutive pairs,
and that each UPDATE contains the expected redacted value.
Assert that PII is redacted before deleted.
Redacting before deleting protects downstream sync of data from holding PII
in soft-deleted records. This helper ensures UPDATE and DELETE queries for
``table`` occur in consecutive pairs, and that each UPDATE contains the
expected redacted value.

@@ -0,0 +1,34 @@
"""
SQL assertion helpers for tests.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just move this to https://github.com/openedx/openedx-platform/blob/master/openedx/core/djangolib/testing/utils.py, rather than creating a new utility file.

Comment on lines +76 to +77
with mock.patch.object(model_cls, 'objects', create=True) as mock_objects:
mock_objects.filter.return_value = queryset
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktyagiapphelix2u: I think mock or non-mock would be fine, so feel free to choose what you prefer. However, I'm confused, because I thought when we met you said you liked the full Django model version better. Did you have trouble with it when you removed from INSTALLED_APPS, or something else?

def delete_by_user_value(cls, value, field):
"""
Deletes instances of this model where ``field`` equals ``value``.
Redacts and deletes instances of this model where ``field`` equals ``value``.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor update:

Suggested change
Redacts and deletes instances of this model where ``field`` equals ``value``.
Optionally redacts and always deletes instances of this model where ``field`` equals ``value``.

Comment on lines +41 to +42
If ``redact_before_delete_fields`` returns a non-empty dict, matching
records are bulk-updated with those redacted values before delete.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed update:

Suggested change
If ``redact_before_delete_fields`` returns a non-empty dict, matching
records are bulk-updated with those redacted values before delete.
If ``redact_before_delete_fields()`` returns a non-empty dict, the
returned PII fields are redacted before any records are deleted.

def assert_update_before_delete(
sql_list,
table,
expected_redacted_value,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want this to be expected_redacted_value_list, and we can ensure it is non-empty, and then loop over it and check each value, in case we redact multiple values. You don't need to check that the right field is updated for each field, although we could. That would require passing in the field as well, so maybe we just keep it simple.

Note: This will allow this to be re-used for the social auth PR, I hope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants