Populates hashed_email of contacts and requires it to be non-null.#9369
Conversation
fdfb6ef to
db8e296
Compare
|
LGTM |
|
Any reason in particular we're not using mysql's built-in MD5? It would likely be orders of magnitude faster. |
|
You mean run the update as a native MySQL command rather than through rails? Regardless, since this migration only computes hashes in non-prod environments, I don't think efficiency is a major concern. |
|
Ah, very fair point. In that case, readability is definitely king. This LGTM! |
|
Is there reason to believe that MySQL's MD5 computation would be faster than Rails MD5 computation? Beyond the general speedup offered by MySQL over Rails? My impression was that we as a team want to keep as much of our database interactions as possible in Rails rather than MySQL. |
|
I certainly appreciate the readability of doing it on the rails side, and I think that's where the ongoing updates belong. However for a large-scale one-time bulk update, it's possible that leveraging Sql to update the whole table would be faster than reading and updating every row. Maybe if we find ourselves in this situation again, it would be worth a little experimentation? |
|
Yeah, as a general rule MySQL has a decent fixed cost per update, so doing one large update tends to be shockingly faster than doing many small ones. We've run into this a couple times in the past, where two migrations that affected similar numbers of rows and did similar things ended up taking significantly different amounts of time based on how we went about doing those things. Which really only means that it makes the most sense to do the one-off production migration in pure SQL when we can, and do this kind of migration exactly as you did. |
|
Not so sure. The advantage of running the data fix through Rails as in PR#9339 is that the DB load stayed low. Had we run this as one bulk update, it would have locked the entire table for some length of time (rather than locking a [changing] small number of rows briefly). Definitely something to keep in mind. |
|
Well, we could still run it as a handful of smaller bulk updates. I am definitely in favor of doing some experimenting next time we have the opportunity! |
|
Aside: My plan is to merge this after PR#9368 successfully runs in production. |
Depends on #9368 (for migration ordering) (note that commit 828d584 is part of that PR).
Note that
hashed_emailhas been populated in production viabin/oneoff/back_populate_contacts_hashed_email. Thus this migration serves to (a) populate thehashed_emailin non-production environments and (b) enforce thathashed_emailis non-null hereout.