JotForm Survey Summary Pipeline Design #28071

Merged: hacodeorg merged 15 commits into staging from ha/survey-design on Apr 26, 2019

@hacodeorg hacodeorg commented Apr 17, 2019

Contributor

https://codedotorg.atlassian.net/browse/PLC-237

To see a reader-friendly rendering of the design doc: Files changed -> survey_summary_design.md -> View file

You can comment on specific lines here. I'm not sure if there is a better way to do this yet.

This format is an experiment in opening up our design publicly, just like our code, and in comparing GitHub to Google Docs for gathering design feedback.

@hacodeorg hacodeorg changed the title from "Design doc for Jotform Survey Summary Pipeline" to "JotForm Survey Summary Pipeline Design" Apr 19, 2019
@hacodeorg hacodeorg marked this pull request as ready for review April 19, 2019 03:06
agealy commented Apr 19, 2019

@hacodeorg do you have a rough cost estimate for approach #3? For both:
a. starting with step 3 and building only enough to support 2019 surveys
b. building the complete pipeline to support 2019 surveys and be readily extensible for future survey additions

@hacodeorg

Contributor Author

I will add a ballpark estimate. However, I'm not very confident in it right now, so I'm looking for the team's help to refine it.

> @hacodeorg do you have a rough cost estimate for approach #3? For both:
> a. starting with step 3 and building only enough to support 2019 surveys
> b. building the complete pipeline to support 2019 surveys and be readily extensible for future survey additions

:Modifier;
note right
Modify question and answer data to make them aggregatable.
E.g. update question unique name or convert value range to make values consistent.

Contributor

I'd like more detail on this step.

  • How does the system know what modifications to perform?
  • Who writes these modification rules?
  • Are we encoding these rules as checked-in code, or is this configuration in the same way that survey ids are configuration?
  • When are the rules written, or how do we find out that new rules need to be written? Is there any user impact in a possible delay here?
  • How do these rules propagate across environments?
  • Can we present some specific example rules?

Contributor Author

Added to Appendix.
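
For readers without the Appendix at hand, here is a minimal sketch of what such rules could look like, covering the two cases named in the diagram note (renaming a question's unique name and converting a value range). The `ModifierRule` fields, the question names, and the linear rescaling are all assumptions made for illustration; they are not the rules recorded in the design doc.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Answer:
    question_unique_name: str
    value: float


@dataclass(frozen=True)
class ModifierRule:
    # Hypothetical rule shape: every field here is an illustrative assumption.
    source_name: str                                    # question the rule applies to
    target_name: Optional[str] = None                   # new unique name, if renaming
    source_range: Optional[Tuple[float, float]] = None  # original (min, max) scale
    target_range: Optional[Tuple[float, float]] = None  # scale to convert to


def apply_rule(answer: Answer, rule: ModifierRule) -> Answer:
    """Return a modified copy of the answer, or the answer itself if the rule does not match."""
    if answer.question_unique_name != rule.source_name:
        return answer
    value = answer.value
    if rule.source_range and rule.target_range:
        s_lo, s_hi = rule.source_range
        t_lo, t_hi = rule.target_range
        # Linear rescale from the source scale onto the target scale.
        value = t_lo + (value - s_lo) * (t_hi - t_lo) / (s_hi - s_lo)
    return Answer(rule.target_name or answer.question_unique_name, value)


def modify(answers: List[Answer], rules: List[ModifierRule]) -> List[Answer]:
    """Apply every rule, in order, to every answer."""
    modified = []
    for answer in answers:
        for rule in rules:
            answer = apply_rule(answer, rule)
        modified.append(answer)
    return modified


# A 1-5 rating from an older survey is renamed and rescaled to a 1-7 scale.
rules = [ModifierRule("overall_rating_2018", "overall_rating", (1, 5), (1, 7))]
answers = [Answer("overall_rating_2018", 3), Answer("overall_rating", 6)]
result = modify(answers, rules)
```

Keeping rules as plain data like this would let them live either in checked-in code or in configuration, which is one of the open questions raised above.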



# Recommendation
Option 3. Build a new generic survey pipeline, starting with step 3 (the Summarize module), and create adapters to plug it into the current pipeline. This approach lets us take advantage of existing work and support the 2019 CSF survey first, then gradually build out the rest of the pipeline.

Contributor

I agree with this recommendation - this is necessary work, but we probably can't do it all at once, so building a small part at a time seems good. Can we get any more specific on what we'd build first? Step 3 is fairly large.

@islemaster

Contributor

I'd like to see a rollout plan as well: Do we move existing surveys over to the new system, or continue supporting the old system indefinitely? What does that process look like? How are we planning to ensure a smooth transition?


**Cons:** Expensive. (This could be mitigated by building the pipeline from the end first as discussed below.)

**Cost Estimate:** 3-4 weeks for the complete pipeline. 7 ± 2 days to implement step 3 (Summarize) and the first component of step 4 (Present), which is enough to support CSF 2019 surveys.

Contributor

Can you provide a more detailed breakdown of this estimate, or at least the step 3 part? I'd like to see it broken into tasks of 3 days or less. I'd expect time for the Retriever/Modifier/Transformer/Mapper/Reducer implementations, but also time for the adapters you mentioned, buffer for tests and pipeline delays, potentially time for migrating existing survey data and validation afterward, and maybe some communication or training with team members authoring surveys so that they understand our new workflow.

Contributor Author

Added a rollout plan and the specific things we will build to the Recommendation section.

@hacodeorg hacodeorg requested a review from islemaster April 22, 2019 22:14

**Pros:** The cheapest option in terms of engineering cost to enable. Survey owners can create reports themselves. Reports update automatically (no need to download data locally and upload it to Tableau).

**Cons:** Supports only basic calculations and visualizations (not at the level of Tableau or Power BI). Users cannot interact with the report. Parameters cannot be passed in to personalize the report.

I would state this even more strongly: this does not support some of our existing views, for example showing facilitators their average scores across all the workshops they've facilitated. We would have to build an individual report per facilitator, which is not feasible.

Contributor Author

Added.

@clareconstantine

Could you go into a little more detail about what engineering work will be required for each new survey or survey type we want to show results for once option 3 is complete?

4. Build another implementation of Retriever (in step 3) which will read from the new db created in step 2 & 3.
5. Build Presenter (in step 4) that can display the result of a single query (e.g. summary results of one JotForm survey for a specific workshop).
6. Build Decorator (in step 4) to provide enough information for the Presenter built in step 5.
7. Switch 2019 surveys to use the new pipeline. It should then be completely independent from the older pipeline.

Contributor

I'm confused by step 1 "...just enough to support 2019 surveys," and step 7 "Switch 2019 surveys to use the new pipeline." What's the difference?

Contributor Author

In step 1, 2019 surveys will use a hybrid of the older pipeline and the newer pipeline (which will have just 5 components at that point). In step 7, 2019 surveys switch completely to the newer pipeline.

- Compile minimum information needed for the current UI view (`local_summer_workshop_daily_survey/results.jsx`) to display summary results.
- Estimate: 1d

Note, we skip Modifier component because it's an optional feature.

Contributor

For some reason I had the idea that the Modifier was the whole point, because we currently have surveys we can't roll up together with the existing system (e.g. the Organizer Survey Results problem @clareconstantine is working through). But you don't think we need it right away? Can you help me understand what you find valuable about this work without the Modifier?

Contributor Author

The main motivation for the new pipeline is to reduce the cost of summarizing new surveys and presenting those summaries. It will also support rolling up multiple surveys from multiple workshops, because it processes data at the question_unique_name level, not at the survey or workshop level.

I added a few more points to the Motivation section. We could discuss this in person if it isn't clear yet.
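
To illustrate the roll-up idea in the reply above, here is a minimal sketch that groups answers by question unique name regardless of which survey or workshop they came from. The submission shape (`form_id`, `workshop_id`, `answers`) and the choice of count and mean as the summary are assumptions for illustration, not the pipeline's actual data model.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def summarize(submissions: List[dict]) -> Dict[str, dict]:
    """Group answer values by question unique name, then reduce each group."""
    grouped = defaultdict(list)
    # Map step: the survey and workshop ids are ignored; only the question name keys the data.
    for submission in submissions:
        for name, value in submission["answers"].items():
            grouped[name].append(value)
    # Reduce step: one summary per question unique name.
    return {
        name: {"count": len(values), "mean": mean(values)}
        for name, values in grouped.items()
    }


# Hypothetical submissions from two different forms and three workshops.
submissions = [
    {"form_id": 1, "workshop_id": 10, "answers": {"overall_rating": 4.0, "pace": 3.0}},
    {"form_id": 1, "workshop_id": 11, "answers": {"overall_rating": 5.0}},
    {"form_id": 2, "workshop_id": 12, "answers": {"overall_rating": 3.0}},
]
summary = summarize(submissions)
```

Because the grouping key is the question name, a new survey that reuses `overall_rating` would roll up with the existing ones without any extra code in this sketch.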

@hacodeorg

Contributor Author

> Could you go into a little more detail about what engineering work will be required for each new survey or survey type we want to show results for once option 3 is complete?

I will add that. (I assume this means once the new pipeline is complete, not just the part that supports 2019 surveys.)


## Rollout plan
1. Build 5 components, just enough to support 2019 surveys: Retriever (for the current db), Transformer, Mapper, Reducer and Decorator (for the current UI view). These are components of steps 3 and 4 in the diagram above.
2. Build Fetcher (step 1) to download survey submissions from JotForm to our database.

Contributor

I think our existing system more-or-less does exactly this - all JotForm submissions get dropped into one table. Can we just reuse what we've got, for the Fetcher, or are there specific changes you expect we'll need?

Contributor Author

Yes, we can potentially reuse most of the code for this part.

One thing that I know we will have to do differently: the current code purposefully strips question info from the submissions downloaded by `GET /form/{id}/submissions`. It later links answers back to question info downloaded by a separate request, `GET /form/{id}/questions`.
The problem is that this breaks the question-answer relationship, so the submission content is no longer self-sufficient. An answer to an older version of a question could be linked to a newer version of the same question.
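
The mismatch described above can be shown with a small sketch. The dictionaries below are simplified, hypothetical stand-ins for the data involved, not the exact JotForm payloads, and both function names are invented for illustration.

```python
from typing import Dict, List


def link_answers_to_current_questions(
    answers: Dict[str, str], current_questions: Dict[str, str]
) -> List[dict]:
    """Pair stored answers with whatever the question text is today."""
    return [
        {"question": current_questions[qid], "answer": answer}
        for qid, answer in answers.items()
    ]


def read_self_sufficient_submission(content: Dict[str, dict]) -> List[dict]:
    """Read question text from the submission itself, as it was when answered."""
    return [
        {"question": item["text"], "answer": item["answer"]}
        for item in content.values()
    ]


# A submission recorded while question "3" asked about the facilitator.
old_submission = {"3": {"text": "How would you rate your facilitator?", "answer": "5"}}
# The form was edited later, so question "3" now asks something else.
current_questions = {"3": "How would you rate the venue?"}

# Stripping question info keeps only the answers, keyed by question id.
stripped = {qid: item["answer"] for qid, item in old_submission.items()}

mislinked = link_answers_to_current_questions(stripped, current_questions)
preserved = read_self_sufficient_submission(old_submission)
```

In this sketch `mislinked` attaches the old answer to the new question text, while `preserved` keeps the pairing the respondent actually saw.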


start

partition 1.Ingest {

Contributor

Note, I don't think it's a problem with this plan but what you've described doesn't capture the existing placeholder / hydration behavior with JotForm surveys that allows us to note that a submission occurred immediately, before actually receiving the contents of that submission from JotForm on a cronjob later. (I think)

Contributor Author

Correct, it is not in the scope of this design. To download only new submissions, we can use this API, https://api.jotform.com/docs/#form-id-submissions, with a submission range filter like {"id:gt":"31974353596870"}. (We will have to save a mapping from jotform_id to last_downloaded_submission_id somewhere.)

Do you know what other purposes the placeholder serves?
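
A rough sketch of that incremental download, limited to building the request URL and tracking the last downloaded id per form. The `filter` value comes from the comment above; passing the key as an `apiKey` query parameter, the in-memory `cursors` dict, the form id, and the assumption that submission ids compare numerically are illustrative choices that would need checking against the JotForm docs and our storage.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode

API_BASE = "https://api.jotform.com"


def submissions_url(form_id: str, last_id: Optional[str], api_key: str) -> str:
    """Build the submissions URL, filtering to ids greater than last_id when one is known."""
    params = {"apiKey": api_key}  # assumed auth style; verify against the API docs
    if last_id is not None:
        params["filter"] = json.dumps({"id:gt": last_id})
    return f"{API_BASE}/form/{form_id}/submissions?{urlencode(params)}"


def advance_cursor(cursors: Dict[str, str], form_id: str, downloaded_ids: List[str]) -> None:
    """Record the largest submission id seen so far for this form."""
    candidates = downloaded_ids + ([cursors[form_id]] if form_id in cursors else [])
    if candidates:
        # Assumes ids are numeric strings that increase over time.
        cursors[form_id] = max(candidates, key=int)


# Hypothetical stand-in for the jotform_id -> last_downloaded_submission_id mapping.
cursors: Dict[str, str] = {}

first_url = submissions_url("90001", cursors.get("90001"), "API_KEY")
advance_cursor(cursors, "90001", ["31974353596870", "31974353596100"])
next_url = submissions_url("90001", cursors.get("90001"), "API_KEY")
```

The first request has no filter and fetches everything; later requests carry the stored id so only newer submissions come back.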

@islemaster

Contributor

Great work on this design and on facilitating the discussion, Ha! I'm sold, especially on your recommendation of the initial components to build for stories PLC-47 and PLC-48.

For the final format of this design document, I'm fine with this living in the repo, but am interested in seeing a final pass changing it from a proposal with different options into a living design document that captures the design we have agreed upon, with notes about which parts we've implemented so far, and maybe records rejected options in the appendices. Maybe that doesn't happen until we do the initial implementation of this design. Alternatively, if we don't think it's important for this doc to stay in lockstep with our code, it might make sense to move to the GitHub wiki (too bad that's not a better forum for discussion).

@hacodeorg

Contributor Author

Oh, I like the idea of a living doc. Can we tag a certain commit as a released version?

@islemaster

Contributor

I suppose if you leave it in this repo and update it along with the code, our existing release tags would apply.

@hacodeorg hacodeorg merged commit b915770 into staging Apr 26, 2019
@hacodeorg hacodeorg deleted the ha/survey-design branch May 23, 2019 00:17

5 participants