JotForm Survey Summary Pipeline Design by hacodeorg · Pull Request #28071 · code-dot-org/code-dot-org

hacodeorg · 2019-04-17T18:07:20Z

https://codedotorg.atlassian.net/browse/PLC-237

To see a friendly format of the design doc: Files changed -> survey_summary_design.md -> View file

You can comment on specific lines here. I'm not sure if there is a better way to do this yet.

This format is an experiment in opening our design publicly, just like our code, and in comparing GitHub to Google Docs for gathering design feedback.

agealy · 2019-04-19T17:41:11Z

@hacodeorg do you have a rough cost estimate for approach #3? both
a. starting with step 3 and building only enough to support 2019 surveys
b. building the complete pipeline to support 2019 surveys and be readily extensible for future survey additions

hacodeorg · 2019-04-19T18:12:01Z

I will add a ballpark estimate. However, I'm not very confident in my estimate right now, so I'm looking for the team's help with it.


islemaster · 2019-04-20T00:31:39Z

+  :Modifier;
+  note right
+    Modify question and answer data to make them aggregatable.
+    E.g. update question unique name or convert value range to make values consistent.


I'd like more detail on this step.

How does the system know what modifications to perform?

Who writes these modification rules?

Are we encoding these rules as checked-in code, or is this configuration in the same way that survey ids are configuration?

When are the rules written, or how do we find out that new rules need to be written? Is there any user impact in a possible delay here?

How do these rules propagate across environments?

Can we present some specific example rules?

Added to Appendix.
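
As a rough illustration of the kind of rule being asked about (a hypothetical sketch, not the content of the Appendix): a rule could be keyed on a question's original unique name and either rename it, rescale its value range, or both. Every name below (`RULES`, `modify`, the question names, the 1-5 and 1-7 ranges) is an assumption made up for the example.

```python
from typing import Any, Dict

# Hypothetical rules, keyed by the question's original unique name.
# "rename_to" gives the canonical name; "rescale" maps (old range) -> (new range).
RULES: Dict[str, Dict[str, Any]] = {
    "overall_success_2018": {
        "rename_to": "overall_success",
        "rescale": ((1, 5), (1, 7)),
    },
}


def modify(answer: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the answer with any matching rule applied."""
    modified = dict(answer)
    rule = RULES.get(answer["question_name"])
    if rule is None:
        return modified  # no rule for this question: leave it untouched
    if "rename_to" in rule:
        modified["question_name"] = rule["rename_to"]
    if "rescale" in rule:
        (old_min, old_max), (new_min, new_max) = rule["rescale"]
        # Linear mapping from the old value range onto the new one.
        span = (new_max - new_min) / (old_max - old_min)
        modified["value"] = new_min + (answer["value"] - old_min) * span
    return modified
```

Whether such rules live in checked-in code or in configuration is exactly the open question above; a plain data structure like this could be stored either way.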

islemaster · 2019-04-20T00:33:00Z

+
+
+# Recommendation
+Option 3. Build a new generic survey pipeline, starting from step 3(Summarize module) first and create adapters to plug it in the current pipeline. This approach allows us to take advantage of existing work and support 2019 CSF survey first. Then, gradually build out the rest of the pipeline later.


I agree with this recommendation - this is necessary work, but we probably can't do it all at once, so building a small part at a time seems good. Can we get any more specific on what we'd build first? Step 3 is fairly large.

islemaster · 2019-04-20T00:34:20Z

I'd like to see a rollout plan as well: Do we move existing surveys over to the new system, or continue supporting the old system indefinitely? What does that process look like? How are we planning to ensure a smooth transition?

islemaster · 2019-04-20T00:40:11Z

+
+**Cons:** Expensive. (This could be mitigated by building the pipeline from the end first as discussed below.)
+
+**Cost Estimate:** 3-4 weeks for the complete pipeline. 7 ± 2 days to implement step 3 (Summarize) and the first component of step 4 (Present) first, enough to support CSF 2019 surveys.


Can you provide a more detailed breakdown of this estimate, or at least the step 3 part? I'd like to see it broken into tasks of 3 days or less. I'd expect time for the Retriever/Modifier/Transformer/Mapper/Reducer implementations, but also time for the adapters you mentioned, buffer for tests and pipeline delays, potentially time for migrating existing survey data and validation afterward, and maybe some communication or training with team members authoring surveys so that they understand our new workflow.

Added rollout plan and specific things we will build in Recommendation section

clareconstantine · 2019-04-22T23:01:03Z

+
+**Pros:** The cheapest option in term of engineering cost to enable. Survey owners can create report themselves. Report is automatically updated (don't need to download data locally and upload to Tableau).
+
+**Cons:** Can only do basic calculation and visualization (not the same level as Tableau or PowerBI). User can not interact with the report. Cannot pass parameter to personalize the report.


I would state this even more strongly: this does not support some of our existing views, for example showing facilitators their average scores across all workshops they've facilitated. We would have to build an individual report per facilitator, which is not feasible.

clareconstantine · 2019-04-22T23:02:37Z

Could you go into a little more detail about what engineering work will be required for each new survey or survey type we want to show results for once option 3 is complete?

islemaster · 2019-04-23T00:24:12Z

+4. Build another implementation of Retriever (in step 3) which will read from the new db created in step 2 & 3.
+5. Build Presenter (in step 4) that can display result from a single query (e.g. summary results of 1 JotForm survey of a specific workshop)
+6. Build Decorator (in step 4) to provide enough information for the Presenter built in step 5.
+7. Switch 2019 surveys to use the new pipeline. It should then be completely independent from the older pipeline.


I'm confused by step 1 "...just enough to support 2019 surveys," and step 7 "Switch 2019 surveys to use the new pipeline." What's the difference?

In step 1, 2019 surveys will use a hybrid of the older pipeline and the newer pipeline (which will have just 5 components at that point). In step 7, 2019 surveys switch completely to the newer pipeline.

islemaster · 2019-04-23T00:27:45Z

+  - Compile minimum information needed for the current UI view (`local_summer_workshop_daily_survey/results.jsx`) to display summary results.
+  - Estimate: 1d
+
+Note, we skip Modifier component because it's an optional feature.


For some reason I had the idea that the Modifier was the whole point, because we currently have surveys we can't roll up together with the existing system (e.g. the Organizer Survey Results problem @clareconstantine is working through). But you don't think we need it right away? Can you help me understand what you find valuable about this work without the Modifier?

The main motivation for the new pipeline is to reduce the cost of summarizing new surveys and presenting their summaries. It will also support rolling up multiple surveys from multiple workshops, because it processes data at the question_unique_name level, not at the survey or workshop level.

I wrote a few more in the Motivation section. We could discuss this in person if it isn't clear yet.
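
To make the question_unique_name point concrete, here is a minimal sketch of a Mapper and Reducer that ignore survey and workshop boundaries. The data shapes and function names (`map_answers`, `reduce_answers`, the `answers` dict) are assumptions for illustration only, not the design doc's actual interfaces.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, List


def map_answers(submissions: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Mapper: group answers by question_unique_name, ignoring survey and workshop."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for submission in submissions:
        for question_unique_name, answer in submission["answers"].items():
            grouped[question_unique_name].append(answer)
    return grouped


def reduce_answers(grouped: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Reducer: count how often each answer was given, per question."""
    return {name: dict(Counter(answers)) for name, answers in grouped.items()}


# Submissions from two different surveys and workshops that share one question.
submissions = [
    {"survey_id": "A", "workshop_id": 1, "answers": {"overall_success": "Agree"}},
    {"survey_id": "B", "workshop_id": 2, "answers": {"overall_success": "Agree"}},
    {"survey_id": "B", "workshop_id": 2, "answers": {"overall_success": "Disagree"}},
]
summary = reduce_answers(map_answers(submissions))
```

Because the grouping key is the question name alone, answers from survey A and survey B land in the same bucket without any survey-specific code.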

hacodeorg · 2019-04-23T00:29:02Z

Could you go into a little more detail about what engineering work will be required for each new survey or survey type we want to show results for once option 3 is complete?

I will add that. (I assume this means when the new pipeline is complete, not just the part to support 2019 surveys.)

islemaster · 2019-04-23T00:29:30Z

+
+## Rollout plan
+1. Build 5 components just enough to support 2019 surveys: Retriever (for current db), Transformer, Mapper, Reducer and Decorator (for current UI view). They are components in step 3 and 4 of the above diagram.
+2. Build Fetcher (step 1) to download survey submissions from JotForm to our database.


I think our existing system more-or-less does exactly this - all JotForm submissions get dropped into one table. Can we just reuse what we've got, for the Fetcher, or are there specific changes you expect we'll need?

Yes, we can potentially reuse most of the code for this part.

One thing I know we will have to do differently: the current code purposefully strips question info from the submissions downloaded by GET /form/{id}/submissions, and later links answers back to question info downloaded by a separate request, GET /form/{id}/questions.
The problem is that this breaks the question-answer relationship, so the submission content is no longer self-sufficient. An answer to an older version of a question could be linked to a newer version of the same question.
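
A small sketch of the versioning problem, under assumed data shapes (the `Answer` class, both helper functions, and the question texts are hypothetical and do not reflect JotForm's actual response format):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Answer:
    question_id: str
    question_text: str  # the question wording attached to this answer
    value: str


def link_to_latest(question_id: str, value: str, latest_questions: Dict[str, str]) -> Answer:
    """Re-linking approach: attach question text fetched separately, after the fact."""
    return Answer(question_id, latest_questions[question_id], value)


def keep_snapshot(raw_answer: Dict[str, str]) -> Answer:
    """Self-sufficient approach: keep the question text that arrived with the answer."""
    return Answer(raw_answer["question_id"], raw_answer["question_text"], raw_answer["value"])


# A respondent answered question "3" when it asked about the workshop...
raw = {"question_id": "3", "question_text": "Rate the workshop (1-5)", "value": "5"}
# ...and the form owner later reworded the same question.
latest = {"3": "Rate the facilitator (1-5)"}

relinked = link_to_latest(raw["question_id"], raw["value"], latest)
snapshot = keep_snapshot(raw)
```

Here `relinked` pairs the old answer with the new wording, while `snapshot` preserves what the respondent actually saw.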

islemaster · 2019-04-23T00:30:44Z

+
+start
+
+partition 1.Ingest {


Note, I don't think it's a problem with this plan but what you've described doesn't capture the existing placeholder / hydration behavior with JotForm surveys that allows us to note that a submission occurred immediately, before actually receiving the contents of that submission from JotForm on a cronjob later. (I think)

Correct, it is not in the scope of this design. For the purpose of downloading only new submissions, we can use this API https://api.jotform.com/docs/#form-id-submissions with submission range filter like {"id:gt":"31974353596870"}. (We will have to save mapping from jotform_id to last_downloaded_submission_id somewhere.)

Do you know what other purposes the placeholder serves?
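
For the incremental download described above, a minimal sketch of building the request URL and tracking the last downloaded submission per form. It performs no network calls; the helper names (`submissions_url`, `record_download`) and the in-memory `last_ids` mapping are assumptions, and passing the key as an `apiKey` query parameter should be checked against the JotForm API docs before use.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode

API_BASE = "https://api.jotform.com"


def submissions_url(form_id: str, api_key: str, last_id: Optional[str]) -> str:
    """Build the submissions URL, filtering to ids greater than last_id if known."""
    params = {"apiKey": api_key}
    if last_id is not None:
        # Same filter shape as in the comment above: {"id:gt": "<submission id>"}
        params["filter"] = json.dumps({"id:gt": last_id})
    return f"{API_BASE}/form/{form_id}/submissions?{urlencode(params)}"


def record_download(last_ids: Dict[str, str], form_id: str, submission_ids: List[str]) -> None:
    """Remember the highest submission id seen for this form (hypothetical storage)."""
    if submission_ids:
        # Compare ids numerically, since they are stored as strings.
        last_ids[form_id] = max(submission_ids, key=int)
```

In a real pipeline the `last_ids` mapping would be persisted (for example in a database table) rather than kept in memory.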

islemaster · 2019-04-24T00:28:15Z

Great work on this design and facilitation of discussion Ha! I'm sold, especially to your recommendation of the initial components to build for stories PLC-47 and PLC-48.

For the final format of this design document, I'm fine with this living in the repo, but am interested in seeing a final pass changing it from a proposal with different options into a living design document that captures the design we have agreed upon, with notes about which parts we've implemented so far, and maybe records rejected options in the appendices. Maybe that doesn't happen until we do the initial implementation of this design. Alternatively, if we don't think it's important for this doc to stay in lockstep with our code, it might make sense to move to the GitHub wiki (too bad that's not a better forum for discussion).

hacodeorg · 2019-04-24T00:31:59Z

Oh, I like the idea of a living doc. Can we tag a certain commit as a released version?

islemaster · 2019-04-24T00:34:43Z

I suppose if you leave it in this repo and update it along with the code, our existing release tags would apply.

hacodeorg added 6 commits April 17, 2019 10:47

Survey summary pipeline first draft

ba5ae6a

add diagram png file

49278c9

Add skeleton for desgin doc [ci skip]

a02cdc4

Update design doc

1b6e38b

Update diagram [ci skip]

96234c7

[ci skip] Update design and diagram

e96be9d

hacodeorg changed the title ~~Design doc for Jotform Survey Summary Pipeline~~ JotForm Survey Summary Pipeline Design Apr 19, 2019

hacodeorg requested review from a user, agealy, breville, clareconstantine and islemaster April 19, 2019 03:05

hacodeorg marked this pull request as ready for review April 19, 2019 03:06

hacodeorg assigned ajosuarez Apr 19, 2019

add design pattern and cost estimation

4263919

islemaster reviewed Apr 20, 2019

Ha Nguyen and others added 4 commits April 22, 2019 11:06

Merge branch 'staging' into ha/survey-design

eadf7fb

add timeline

88aff55

add estimation

eafc28a

detail desgin for Modifier

e1fec85

hacodeorg requested a review from islemaster April 22, 2019 22:14

clareconstantine reviewed Apr 22, 2019

islemaster reviewed Apr 23, 2019

hacodeorg added 2 commits April 23, 2019 12:11

fix pr comments

58b741d

update text

e15adc2

hacodeorg added 2 commits April 24, 2019 15:29

change file names

c97894a

Move files to docs folder

87a4318

agealy approved these changes Apr 26, 2019

clareconstantine approved these changes Apr 26, 2019

hacodeorg merged commit b915770 into staging Apr 26, 2019

hacodeorg deleted the ha/survey-design branch May 23, 2019 00:17


