Skip to content

Populates hashed_email of contacts and requires it to be non-null.#9369

Merged
ashercodeorg merged 1 commit into
stagingfrom
contactsHashedEmail
Jul 7, 2016
Merged

Populates hashed_email of contacts and requires it to be non-null.#9369
ashercodeorg merged 1 commit into
stagingfrom
contactsHashedEmail

Conversation

@ashercodeorg

@ashercodeorg ashercodeorg commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

Depends on #9368 (for migration ordering) (note that commit 828d584 is part of that PR).

Note that hashed_email has been populated in production via bin/oneoff/back_populate_contacts_hashed_email. Thus this migration serves to (a) populate the hashed_email in non-production environments and (b) enforce that hashed_email is non-null hereout.

SELECT COUNT(0) FROM contacts;
+----------+
| COUNT(0) |
+----------+
|  2868453 |
+----------+
SELECT COUNT(0) FROM contacts WHERE hashed_email IS NULL;
+----------+
| COUNT(0) |
+----------+
|        0 |
+----------+

@ashercodeorg ashercodeorg force-pushed the contactsHashedEmail branch from fdfb6ef to db8e296 Compare July 7, 2016 17:55
@aoby

aoby commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

LGTM

@Hamms

Hamms commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

Any reason in particular we're not using mysql's built-in MD5? It would likely be orders of magnitude faster.

@ashercodeorg

Copy link
Copy Markdown
Contributor Author

You mean run the update as a native MySQL command rather than through rails?

Regardless, since this migration only computes hashes in non-prod environments, I don't think efficiency is a major concern.

@Hamms

Hamms commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

Ah, very fair point. In that case, readability is definitely king.

This LGTM!

@aoby

aoby commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

TIL MySql has built-in MD5 functionality. Thanks @Hamms. That would have probably saved a bit of time in the earlier migration that did run on production #9339. At this point it's moot.

@ashercodeorg

Copy link
Copy Markdown
Contributor Author

Is there reason to believe that MySQL's MD5 computation would be faster than Rails MD5 computation? Beyond the general speedup offered by MySQL over Rails? My impression was that we as a team want to keep as much of our database interactions as possible in Rails rather than MySQL.

@aoby

aoby commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

I certainly appreciate the readability of doing it on the rails side, and I think that's where the ongoing updates belong.

However for a large-scale one-time bulk update, it's possible that leveraging Sql to update the whole table would be faster than reading and updating every row. Maybe if we find ourselves in this situation again, it would be worth a little experimentation?

@Hamms

Hamms commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

Yeah, as a general rule MySQL has a decent fixed cost per update, so doing one large update tends to be shockingly faster than doing many small ones. We've run into this a couple times in the past, where two migrations that affected similar numbers of rows and did similar things ended up taking significantly different amounts of time based on how we went about doing those things.

Which really only means that it makes the most sense to do the one-off production migration in pure SQL when we can, and do this kind of migration exactly as you did.

@ashercodeorg

Copy link
Copy Markdown
Contributor Author

Not so sure. The advantage of running the data fix through Rails as in PR#9339 is that the DB load stayed low. Had we run this as one bulk update, it would have locked the entire table for some length of time (rather than locking a [changing] small number of rows briefly).

Definitely something to keep in mind.

@Hamms

Hamms commented Jul 7, 2016

Copy link
Copy Markdown
Contributor

Well, we could still run it as a handful of smaller bulk updates.

I am definitely in favor of doing some experimenting next time we have the opportunity!

@ashercodeorg

Copy link
Copy Markdown
Contributor Author

Aside: My plan is to merge this after PR#9368 successfully runs in production.

@ashercodeorg ashercodeorg merged commit 0ee0815 into staging Jul 7, 2016
@ashercodeorg ashercodeorg deleted the contactsHashedEmail branch July 7, 2016 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants