Skip to content

Fully compliant user purge operation, part 1#22166

Merged
islemaster merged 27 commits into
stagingfrom
purge-user-tests
May 10, 2018
Merged

Fully compliant user purge operation, part 1#22166
islemaster merged 27 commits into
stagingfrom
purge-user-tests

Conversation

@islemaster

@islemaster islemaster commented May 2, 2018

Copy link
Copy Markdown
Contributor

First in a series of PRs attempting to finish the work Asher started in #15210 as we move toward May deadlines for privacy.

Feature goal

Working toward compliance with this spec describing all the user data we need to remove from our system if a user requests to be permanently removed.

What's in this PR

I'm introducing a new test file for the currently-unused DeleteAccountsHelper, and tests for the correct deletion behavior with respect to the User, UserGeo and StudioPerson models.

I've introduced just a handful of behavior changes to match spec - in most cases we were already doing the right thing.

  • Don't create a new StudioPerson on save for a purged teacher account.
  • Remove the account's secret_picture_id and secret_words.
  • Remove any school and school_info_id from the account.
  • Remove urm and races data if the account is from outside the US.
  • Remove the associated studio_people row if this was the only account using it.

What's next

There's a lot more tests to write to catch up with the spec. Next I'll be diving into the correct behavior for anonymizing level progress and project data.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'anonymizes user.username'

@islemaster islemaster force-pushed the purge-user-tests branch 2 times, most recently from d7f5880 to bf581bf Compare May 8, 2018 00:57
@islemaster islemaster changed the title [WIP] Fully compliant user purge operation Fully compliant user purge operation, part 1 May 8, 2018
Comment thread dashboard/app/models/user.rb Outdated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces knowledge of UserGeo into the User model. Are we okay with that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cross-model awareness seems fine to me, especially since they're related, but why don't we set this up as an association?

Something like:

has_many :user_geos, -> {order(updated_at: :desc)}
def latest_user_geo
  user_geos.first
end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems great. I made an assumption that @asherkach had a reason for not doing this back when he introduced the UserGeo model, something like isolating PII. I wonder if he has any thoughts on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that there wasn't an obvious need for the association, so I didn't add it. Doubtful that it was related to isolating PII, as the association wouldn't have carried to Redshift (where most of the isolation concerns were centered).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poorvasingal @bencodeorg Please take a look at this test file and let me know if it makes any sense to you. I'd like to make these tests as readable to you as possible so we can all be on the same page about what our system actually does at any time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other reviewers, I'd particularly like feedback on anything that might make these tests more readable (though I might not make changes until a future PR)

Comment thread dashboard/test/models/user_test.rb Outdated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test against the User model is now redundant, and the behavior it was testing is strictly a subset of my larger test suite.

Comment thread lib/cdo/delete_accounts_helper.rb Outdated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted to a class for testability.

@islemaster islemaster requested a review from aoby May 8, 2018 01:22
@islemaster

islemaster commented May 8, 2018

Copy link
Copy Markdown
Contributor Author

@ewjordan I think this is a good stopping point for this PR; PTAL.

@aoby for a quick look - a near-future followup to this PR will start to touch PLC data as well, so might as well keep you posted.

@bencodeorg @poorvasingal To check readability and keep you informed of my progress.

Comment thread bin/cron/delete_accounts Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be DeleteAccountsHelper.new ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this static helper for constructing a configured DeleteAccountsHelper while converting it to a class so I could test with stub dependencies.

https://github.com/code-dot-org/code-dot-org/blob/7f05a3eca19d9f5a9620c0f7028153b221ef3ab2/lib/cdo/delete_accounts_helper.rb#L21-L34

Would it be more idiomatic to put default arguments on the constructor?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, for some reason I didn't see this the first time around and thought it was a typo here.

The static helper is fine, but I do think it would be a bit more idiomatic and discoverable if the constructor had default arguments. Your call.

Comment thread dashboard/app/models/user.rb Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cross-model awareness seems fine to me, especially since they're related, but why don't we set this up as an association?

Something like:

has_many :user_geos, -> {order(updated_at: :desc)}
def latest_user_geo
  user_geos.first
end

Comment thread dashboard/test/factories/factories.rb Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 factories.

Why not make Seattle the default, and add a user association, so a simple create :user_geo will work? All the other combos will still be supported.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the kinds of UserGeo records I want to test against is mostly empty - after sign in, we often create a UserGeo record with just a user id and IP address, nothing else. For that reason I'm leaving the :seattle and :sydney traits separate, but I've done a pass cleaning things up.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note these don't need to be set explicitly in a block. Since they're exposed as attributes, they will work as params to FactoryGirl.create:

user = create :teacher, 
  ops_first_name: 'test_value',
  # ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! Wasn't sure how serialized_properties would be handled.

#

class UserGeo < ActiveRecord::Base
belongs_to :user

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aoby Is it best practice to add this matching association even though I'm not using any of the helpers it generates, or should I omit it because it's "unused?"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of guidance either way on that one, and I don't have a strong preference. It shouldn't be any extra overhead to add it since they're lazily loaded.

@islemaster

Copy link
Copy Markdown
Contributor Author

Feedback addressed and I've finally got a green test run after the Saucelabs excitement yesterday. PTAL?

user.reload
assert_empty user.email
assert_empty user.hashed_email
assert_nil user.parent_email

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is parent_email nil, and the others empty? Should we be consistent?

Also note that empty string will pass refute_nil so the initial 2 assertions don't necessarily show that email and hashed_email changed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a legacy thing - email has not null in our schema even though in practice it's often empty, but parent_email, added later, uses null as its empty value.

# email :string(255) default(""), not null
# parent_email :string(255)

I'm in favor of consistency but think it's a concern that should be addressed in its own change, which would include updating these tests.

Thanks for catching the insufficient assertion here!

user = create :student,
provider: 'clever',
uid: 'fake-clever-uid'
assert_equal user.provider, 'clever'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: switch the args: expected first. See https://apidock.com/ruby/Test/Unit/Assertions/assert_equal

purge_user user

user.reload
assert_equal user.provider, 'clever'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

assert_equal 'clever', user.provider

user = create :teacher
studio_person = user.studio_person
refute_nil user.studio_person_id
other_user = create :teacher, studio_person_id: studio_person.id

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: since studio_person is an association on this id, you can also set it directly:

create :teacher, studio_person: studio_person

def purge_user(user)
SolrHelper.stubs(:delete_document).once
DeleteAccountsHelper.new(solr: {}).purge_user(user)
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it appears that we call user.reload after every call to purge_user they could be combined.

@islemaster islemaster May 10, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of something like this, but wasn't sure whether it would read well. Basically, I feel like it's obvious that purge_user affects the db record, but less obvious that the user instance passed in gets mutated.

student = create :student
purge_user student
# Is the 'student' instance mutated here?

This might read better if the mutated object was returned?

student = create :student
student = purge_user student
# Not surprising that 'student' has changed

Or is this just my JavaScript idiom bleeding into Ruby?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in person, since this helper is private to this file and purge_user student pretty obviously mutates the student when read as English, deduplicating should take priority over concerns about mutating arguments.

@islemaster islemaster merged commit 7ddc491 into staging May 10, 2018
@islemaster islemaster deleted the purge-user-tests branch May 10, 2018 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants