Skip to content

remove user data from cached bonus level summaries#40035

Merged
cforkish merged 3 commits into
stagingfrom
cforkish/remove-user-data-from-summarize
Apr 21, 2021
Merged

remove user data from cached bonus level summaries#40035
cforkish merged 3 commits into
stagingfrom
cforkish/remove-user-data-from-summarize

Conversation

@cforkish

@cforkish cforkish commented Apr 12, 2021

Copy link
Copy Markdown
Contributor

this is a followup to #39733, which was reverted due to a failing eyes test during DTT. this PR separates out issue (2) mentioned in that PR because when rebasing that PR i encountered merge conflicts that highlight the need to raise awareness of this issue separately from the specific bugs #39733 was addressing.

currently, we cache bonus level summaries on a script instance, and when summarizing the bonus levels, we pass in the current user to get a user-specific summary. however, our script instances are cached per EC2 instance, which means that when we retrieve bonus level summaries, they include user-specific data for whatever user first viewed those bonus levels on that EC2 instance! consequently, every time the page is reloaded it will show the data cached on the EC2 instance that our load balancer routed to.

this results in very unexpected behavior, because each page reload will potentially show different progress, and occasionally even a different language!

this PR updates it so we don't include user data in the cached summaries, and instead add it per request.

Links

scope: [:data, :bubble_choice_description],
default: ul.level.bubble_choice_description
)
localized_level_display_name = I18n.t(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to lurk here but I'm wondering why you didn't add the locale to the cache key and keep the localization logic where it was? It seems to make more sense in the model (to me) if possible

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for lurking! i have no idea what you're talking about!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that said, i would like to push back on this making more sense in the model. ScriptLevel and UserLevel are separate models, so imo a summary of one shouldn't include anything from the other.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i was mixing up the user data issue with the localization issue. i don't think we should be including user data in the script level summary, but including localization in the model makes sense if we have locale-based caching... which i don't believe we have yet.

@davidsbailey davidsbailey Apr 13, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this got cleared up yesterday, but we do have locale-based caching, configured here: https://github.com/code-dot-org/code-dot-org/blob/staging/cookbooks/cdo-varnish/libraries/http_cache.rb if the language cookie (and header) is allowed through, then the page can be cached differently for each locale.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the language cookie (and header) the only ways to set locale in our system?

my understanding is yes, but the other reviewers may have more accurate info.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing why a lesson extras path in the express course would be cached at all, though, because I do not see express listed in the CACHED_SCRIPTS_MAP.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http_cache.rb controls which cookies and headers are allowed through which affect the granularity with which we cache things. the other half of the puzzle is what cache headers get set / whether the page is allowed to be cached at all. if express is not supposed to be cached, then it seems like we actually shouldn't be allowing caching. so it could be worth checking the cache headers, and/or seeing if this logic is doing the right thing for lesson extras:

def disable_session_for_cached_pages
if ScriptLevelsController.cachable_request?(request)
request.session_options[:skip] = true
end
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, the cache that Charlie is talking about here is the script cache in dashboard (not http caching).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, thanks James! Sorry, I misunderstood that. most of what I said here is off topic, then.

@cforkish

cforkish commented Apr 13, 2021

Copy link
Copy Markdown
Contributor Author

I've created FND-1555 to track potential followup work.


def get_bonus_script_levels(current_stage, current_user)
def get_bonus_script_levels(current_stage)
unless @all_bonus_script_levels

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, I thought a script is only cached once here

script_cache.fetch(cache_key) do
# Populate cache on miss.
script_cache[cache_key] = get_without_cache(id_or_name)
end

Bonus levels are not cached there, but in the @all_bonus_script_levels class instance variable. If you have to re-fetch/re-process bonus level data every time, isn't it easier to not use @all_bonus_script_levels anymore?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better solution, keep the @all_bonus_script_levels variable but only do level summarization on-demand.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hacodeorg thanks for the suggestion. i had been thinking that our script cache stored the script summaries, but now i see that it's only the instances, and summaries are always done on-demand. so this problem is unique to the bonus levels since we were caching the summaries locally on the instance.

can you take a look at the update i pushed to Script.get_bonus_script_levels to see if that's what you had in mind? i am still caching @all_bonus_levels, but only summarizing them on demand as you suggested.

i will close [FND-1555] since it no longer seems relevant.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Charlie. This part looks good to me.


def get_bonus_script_levels(current_stage, current_user)
def get_bonus_script_levels(current_stage)
unless @all_bonus_script_levels

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Charlie. This part looks good to me.

# so we need to merge in the user's progress.
script_bonus_levels_by_stage.each do |stage|
stage[:levels].each do |level_summary|
ul = UserLevel.find_by(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We sometime return more than 10+ bonus levels at once, an example: https://studio.code.org/s/coursef-2020/stage/19/extras (there are arrows to navigate among bonus levels). Could it cause a performance hit by making one DB query per level here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this query was already happening once per level in script_level.rb line 497, and is necessary in order to show users' progress on the levels. this will cause a slight performance hit since previously the results of those queries were cached, but that's specifically the bug that this PR is fixing -- since the results were cached, we were showing the wrong progress data (whatever data happened to be cached, rather than the current user's own data).

so while this change does result in a performance hit, it's a necessary one, and in my testing i have not seen a noticeable degradation in page load time.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reducing DB queries is also important for maintaining database uptime during peak traffic, though I'm not sure if this is still a big concern now that we've moved to aurora. in general our strategy is to cache HOC scripts (which do not use lesson extras anyway) so that these types of requests do not hit the server or DB on each request. all this to say it seems ok to me to add a few DB queries here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for the explanation; I'm okay with that.
In case it causes performance hit later, combining all those individual queries into 1 may help UserLevel.find_by(user_id: user.id, script: @script, level_id: <an_array_of_level_ids>).

@hacodeorg hacodeorg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Charlie. FND will pay attention to situations where translations could be inadvertently cached.

# so we need to merge in the user's progress.
script_bonus_levels_by_stage.each do |stage|
stage[:levels].each do |level_summary|
ul = UserLevel.find_by(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for the explanation; I'm okay with that.
In case it causes performance hit later, combining all those individual queries into 1 may help UserLevel.find_by(user_id: user.id, script: @script, level_id: <an_array_of_level_ids>).

@cforkish cforkish force-pushed the cforkish/remove-user-data-from-summarize branch from 9d2dd6c to 25a6f5d Compare April 21, 2021 00:28
@cforkish cforkish merged commit d869516 into staging Apr 21, 2021
@cforkish cforkish deleted the cforkish/remove-user-data-from-summarize branch April 21, 2021 19:15
cforkish pushed a commit that referenced this pull request May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants