Fully compliant user purge operation, part 1#22166
Conversation
e57dcf1 to
8f6d3e5
Compare
There was a problem hiding this comment.
'anonymizes user.username'
d7f5880 to
bf581bf
Compare
There was a problem hiding this comment.
This introduces knowledge of UserGeo into the User model. Are we okay with that?
There was a problem hiding this comment.
The cross-model awareness seems fine to me, especially since they're related, but why don't we set this up as an association?
Something like:
has_many :user_geos, -> {order(updated_at: :desc)}
def latest_user_geo
user_geos.first
endThere was a problem hiding this comment.
That seems great. I made an assumption that @asherkach had a reason for not doing this back when he introduced the UserGeo model, something like isolating PII. I wonder if he has any thoughts on this.
There was a problem hiding this comment.
My guess is that there wasn't an obvious need for the association, so I didn't add it. Doubtful that it was related to isolating PII, as the association wouldn't have carried to Redshift (where most of the isolation concerns were centered).
There was a problem hiding this comment.
Thanks for taking a look!
There was a problem hiding this comment.
@poorvasingal @bencodeorg Please take a look at this test file and let me know if it makes any sense to you. I'd like to make these tests as readable to you as possible so we can all be on the same page about what our system actually does at any time.
There was a problem hiding this comment.
Other reviewers, I'd particularly like feedback on anything that might make these tests more readable (though I might not make changes until a future PR)
There was a problem hiding this comment.
This test against the User model is now redundant, and the behavior it was testing is strictly a subset of my larger test suite.
There was a problem hiding this comment.
Converted to a class for testability.
|
@ewjordan I think this is a good stopping point for this PR; PTAL. @aoby for a quick look - a near-future followup to this PR will start to touch PLC data as well, so might as well keep you posted. @bencodeorg @poorvasingal To check readability and keep you informed of my progress. |
There was a problem hiding this comment.
Shouldn't this be DeleteAccountsHelper.new ?
There was a problem hiding this comment.
I wrote this static helper for constructing a configured DeleteAccountsHelper while converting it to a class so I could test with stub dependencies.
Would it be more idiomatic to put default arguments on the constructor?
There was a problem hiding this comment.
Ah, for some reason I didn't see this the first time around and thought it was a typo here.
The static helper is fine, but I do think it would be a bit more idiomatic and discoverable if the constructor had default arguments. Your call.
There was a problem hiding this comment.
The cross-model awareness seems fine to me, especially since they're related, but why don't we set this up as an association?
Something like:
has_many :user_geos, -> {order(updated_at: :desc)}
def latest_user_geo
user_geos.first
endThere was a problem hiding this comment.
🎉 factories.
Why not make Seattle the default, and add a user association, so a simple create :user_geo will work? All the other combos will still be supported.
There was a problem hiding this comment.
One of the kinds of UserGeo records I want to test against is mostly empty - after sign in, we often create a UserGeo record with just a user id and IP address, nothing else. For that reason I'm leaving the :seattle and :sydney traits separate, but I've done a pass cleaning things up.
There was a problem hiding this comment.
Note these don't need to be set explicitly in a block. Since they're exposed as attributes, they will work as params to FactoryGirl.create:
user = create :teacher,
ops_first_name: 'test_value',
# ...There was a problem hiding this comment.
TIL! Wasn't sure how serialized_properties would be handled.
7f05a3e to
a420390
Compare
| # | ||
|
|
||
| class UserGeo < ActiveRecord::Base | ||
| belongs_to :user |
There was a problem hiding this comment.
@aoby Is it best practice to add this matching association even though I'm not using any of the helpers it generates, or should I omit it because it's "unused?"
There was a problem hiding this comment.
I don't know of guidance either way on that one, and I don't have a strong preference. It shouldn't be any extra overhead to add it since they're lazily loaded.
|
Feedback addressed and I've finally got a green test run after the Saucelabs excitement yesterday. PTAL? |
| user.reload | ||
| assert_empty user.email | ||
| assert_empty user.hashed_email | ||
| assert_nil user.parent_email |
There was a problem hiding this comment.
Out of curiosity, why is parent_email nil, and the others empty? Should we be consistent?
Also note that empty string will pass refute_nil so the initial 2 assertions don't necessarily show that email and hashed_email changed.
There was a problem hiding this comment.
Kind of a legacy thing - email has not null in our schema even though in practice it's often empty, but parent_email, added later, uses null as its empty value.
code-dot-org/dashboard/app/models/user.rb
Lines 7 to 8 in 132e640
I'm in favor of consistency but think it's a concern that should be addressed in its own change, which would include updating these tests.
Thanks for catching the insufficient assertion here!
| user = create :student, | ||
| provider: 'clever', | ||
| uid: 'fake-clever-uid' | ||
| assert_equal user.provider, 'clever' |
There was a problem hiding this comment.
nit: switch the args: expected first. See https://apidock.com/ruby/Test/Unit/Assertions/assert_equal
| purge_user user | ||
|
|
||
| user.reload | ||
| assert_equal user.provider, 'clever' |
There was a problem hiding this comment.
same as above.
assert_equal 'clever', user.provider| user = create :teacher | ||
| studio_person = user.studio_person | ||
| refute_nil user.studio_person_id | ||
| other_user = create :teacher, studio_person_id: studio_person.id |
There was a problem hiding this comment.
note: since studio_person is an association on this id, you can also set it directly:
create :teacher, studio_person: studio_person| def purge_user(user) | ||
| SolrHelper.stubs(:delete_document).once | ||
| DeleteAccountsHelper.new(solr: {}).purge_user(user) | ||
| end |
There was a problem hiding this comment.
Since it appears that we call user.reload after every call to purge_user they could be combined.
There was a problem hiding this comment.
I'm in favor of something like this, but wasn't sure whether it would read well. Basically, I feel like it's obvious that purge_user affects the db record, but less obvious that the user instance passed in gets mutated.
student = create :student
purge_user student
# Is the 'student' instance mutated here?
This might read better if the mutated object was returned?
student = create :student
student = purge_user student
# Not surprising that 'student' has changed
Or is this just my JavaScript idiom bleeding into Ruby?
There was a problem hiding this comment.
Discussed in person, since this helper is private to this file and purge_user student pretty obviously mutates the student when read as English, deduplicating should take priority over concerns about mutating arguments.
First in a series of PRs attempting to finish the work Asher started in #15210 as we move toward May deadlines for privacy.
Feature goal
Working toward compliance with this spec describing all the user data we need to remove from our system if a user requests to be permanently removed.
What's in this PR
I'm introducing a new test file for the currently-unused
DeleteAccountsHelper, and tests for the correct deletion behavior with respect to theUser,UserGeoandStudioPersonmodels.I've introduced just a handful of behavior changes to match spec - in most cases we were already doing the right thing.
StudioPersonon save for a purged teacher account.secret_picture_idandsecret_words.schoolandschool_info_idfrom the account.urmandracesdata if the account is from outside the US.studio_peoplerow if this was the only account using it.What's next
There's a lot more tests to write to catch up with the spec. Next I'll be diving into the correct behavior for anonymizing level progress and project data.