JotForm Survey Summary Pipeline Design #28071
@hacodeorg do you have a rough cost estimate for approach #3?
I will add a ballpark estimate. However, I'm not very confident in my estimate right now, so I'm looking for the team's help to estimate it.
:Modifier;
note right
Modify question and answer data to make them aggregatable.
E.g. update question unique name or convert value range to make values consistent.
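To make the Modifier step concrete, here is a minimal sketch that assumes rules are checked-in code rather than configuration, each written as a small function over one answer. `Answer`, `rename_question`, `rescale`, `modify`, and the question names are hypothetical illustrations, not existing code; the rescale rule assumes a simple linear mapping between 1-based scales.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Answer:
    question_unique_name: str
    value: object


# A modification rule takes one answer and returns it, possibly modified.
Rule = Callable[[Answer], Answer]


def rename_question(old_name: str, new_name: str) -> Rule:
    """Hypothetical rule: map a renamed question onto its canonical unique name."""
    def rule(answer: Answer) -> Answer:
        if answer.question_unique_name == old_name:
            return replace(answer, question_unique_name=new_name)
        return answer
    return rule


def rescale(name: str, old_max: int, new_max: int) -> Rule:
    """Hypothetical rule: linearly map answers on a 1..old_max scale onto 1..new_max."""
    def rule(answer: Answer) -> Answer:
        value = answer.value
        if answer.question_unique_name == name and isinstance(value, (int, float)):
            return replace(answer, value=1 + (value - 1) * (new_max - 1) / (old_max - 1))
        return answer
    return rule


def modify(answers: List[Answer], rules: List[Rule]) -> List[Answer]:
    """Apply every rule, in order, to every answer."""
    modified = []
    for answer in answers:
        for rule in rules:
            answer = rule(answer)
        modified.append(answer)
    return modified


# Rescale the old 5-point question before renaming it, so answers that were
# already on the 7-point scale are left untouched.
rules = [
    rescale("teacher_engagement_2018", old_max=5, new_max=7),
    rename_question("teacher_engagement_2018", "teacher_engagement"),
]
modified = modify([Answer("teacher_engagement_2018", 5), Answer("teacher_engagement", 7)], rules)
print([(a.question_unique_name, a.value) for a in modified])
# [('teacher_engagement', 7.0), ('teacher_engagement', 7)]
```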
I'd like more detail on this step.
- How does the system know what modifications to perform?
- Who writes these modification rules?
- Are we encoding these rules as checked-in code, or is this configuration in the same way that survey ids are configuration?
- When are the rules written, or how do we find out that new rules need to be written? Is there any user impact in a possible delay here?
- How do these rules propagate across environments?
- Can we present some specific example rules?
Added to Appendix.
# Recommendation
Option 3: Build a new generic survey pipeline, starting with step 3 (the Summarize module), and create adapters to plug it into the current pipeline. This approach lets us take advantage of existing work and support the 2019 CSF surveys first, then gradually build out the rest of the pipeline.
I agree with this recommendation: this is necessary work, but we probably can't do it all at once, so building a small part at a time seems good. Can we get more specific about what we'd build first? Step 3 is fairly large.

I'd like to see a rollout plan as well: Do we move existing surveys over to the new system, or continue supporting the old system indefinitely? What does that process look like? How are we planning to ensure a smooth transition?
**Cons:** Expensive. (This could be mitigated by building the pipeline from the end first, as discussed below.)
**Cost Estimate:** 3-4 weeks for the complete pipeline; 7 ± 2 days to implement step 3 (Summarize) and the first component of step 4 (Present), which is enough to support the CSF 2019 surveys.
Can you provide a more detailed breakdown of this estimate, or at least the step 3 part? I'd like to see it broken into tasks of 3 days or less. I'd expect time for the Retriever/Modifier/Transformer/Mapper/Reducer implementations, but also time for the adapters you mentioned, buffer for tests and pipeline delays, potentially time for migrating existing survey data and validation afterward, and maybe some communication or training with team members authoring surveys so that they understand our new workflow.
Added a rollout plan and the specific things we will build to the Recommendation section.
**Pros:** The cheapest option in terms of engineering cost. Survey owners can create reports themselves. Reports are updated automatically (no need to download data locally and upload it to Tableau).
**Cons:** Can only do basic calculations and visualizations (not at the same level as Tableau or PowerBI). Users cannot interact with the report. Cannot pass parameters to personalize the report.
I would state this even more strongly: this does not support some of our existing views, for example showing facilitators their average scores across all workshops they've facilitated. We would have to build an individual report per facilitator, which is not feasible.

Could you go into a little more detail about what engineering work will be required for each new survey or survey type we want to show results for once option 3 is complete?
4. Build another implementation of the Retriever (in step 3) that reads from the new db created in steps 2 and 3.
5. Build a Presenter (in step 4) that can display the result of a single query (e.g. summary results of one JotForm survey for a specific workshop).
6. Build a Decorator (in step 4) to provide enough information for the Presenter built in step 5.
7. Switch the 2019 surveys to use the new pipeline. They should then be completely independent of the old pipeline.
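Step 4 only changes where the data comes from. If the Summarize stages depend on a small Retriever interface rather than on a specific table, that swap does not touch the Transformer, Mapper, or Reducer. A minimal sketch, assuming a Python `typing.Protocol` for the interface; the class names and record fields are hypothetical, and both retrievers are stubbed with in-memory data instead of real database queries.

```python
from typing import Dict, Iterable, List, Protocol

Record = Dict[str, object]


class Retriever(Protocol):
    """Hypothetical interface: yields answer records for one JotForm form."""

    def retrieve(self, form_id: int) -> Iterable[Record]:
        ...


class CurrentDbRetriever:
    """Stand-in for a Retriever over the existing submissions table."""

    def __init__(self, rows_by_form: Dict[int, List[Record]]):
        self.rows_by_form = rows_by_form

    def retrieve(self, form_id: int) -> Iterable[Record]:
        return self.rows_by_form.get(form_id, [])


class NewDbRetriever:
    """Stand-in for a Retriever over the new db built in rollout steps 2 and 3."""

    def __init__(self, rows: List[Record]):
        self.rows = rows

    def retrieve(self, form_id: int) -> Iterable[Record]:
        return [row for row in self.rows if row["form_id"] == form_id]


def count_responses(retriever: Retriever, form_id: int) -> Dict[str, int]:
    """Downstream code depends only on the Retriever interface, not on a table."""
    counts: Dict[str, int] = {}
    for record in retriever.retrieve(form_id):
        name = str(record["question_unique_name"])
        counts[name] = counts.get(name, 0) + 1
    return counts


rows = [{"form_id": 1, "question_unique_name": "overall_success", "value": 6}]
old = CurrentDbRetriever({1: rows})
new = NewDbRetriever(rows)
print(count_responses(old, 1) == count_responses(new, 1))  # True
```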
I'm confused by step 1 "...just enough to support 2019 surveys," and step 7 "Switch 2019 surveys to use the new pipeline." What's the difference?
In step 1, the 2019 surveys will use a hybrid of the old pipeline and the new pipeline (which has only 5 components at that point). In step 7, the 2019 surveys switch completely to the new pipeline.
- Compile the minimum information needed for the current UI view (`local_summer_workshop_daily_survey/results.jsx`) to display summary results.
- Estimate: 1d
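The Decorator's job here is to turn reduced results into just the fields the existing view needs. The exact props `results.jsx` expects are not described in this thread, so the output shape below (`name`, `text`, `average`, `responses`) is a hypothetical placeholder, as are the function name and sample question text.

```python
from typing import Dict, List


def decorate(summary: Dict[str, Dict[str, float]],
             question_text: Dict[str, str]) -> Dict[str, List[dict]]:
    """Hypothetical Decorator: attach display text to each reduced result and keep
    only the fields a results view needs, in a stable (sorted) order."""
    questions = []
    for name in sorted(summary):
        stats = summary[name]
        questions.append({
            "name": name,
            # Fall back to the unique name if no display text is known.
            "text": question_text.get(name, name),
            "average": round(stats["average"], 2),
            "responses": int(stats["count"]),
        })
    return {"questions": questions}


props = decorate(
    {"teacher_engagement": {"average": 5.666, "count": 3}},
    {"teacher_engagement": "How engaged were you in today's workshop?"},
)
print(props["questions"][0]["average"])  # 5.67
```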
Note: we skip the Modifier component because it's an optional feature.
For some reason I had the idea that the Modifier was the whole point, because we currently have surveys we can't roll up together with the existing system (e.g. the Organizer Survey Results problem @clareconstantine is working through). But you don't think we need it right away? Can you help me understand what you find valuable about this work without the Modifier?
The main motivation for the new pipeline is to reduce the cost of summarizing and presenting summaries for new surveys. It will also support rolling up multiple surveys from multiple workshops, because it processes data at the question_unique_name level, not at the survey or workshop level.
I wrote a few more reasons in the Motivation section. We could discuss this in person if it isn't clear yet.
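Because the Mapper keys answers by `question_unique_name` rather than by survey or workshop, rolling up several workshops (or several surveys that share a question) needs no special handling; the same idea would cover a facilitator's average across all of their workshops. A minimal sketch, assuming each record carries `workshop_id`, `question_unique_name`, and `value` (hypothetical field names) and that only numeric answers are averaged.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(records: Iterable[dict]) -> Iterator[Tuple[str, float]]:
    """Key each numeric answer by question_unique_name only; workshop and survey
    ids are deliberately dropped so answers from different sources combine."""
    for record in records:
        value = record["value"]
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            yield record["question_unique_name"], value


def reducer(pairs: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Average all values that share a question_unique_name."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for name, value in pairs:
        totals[name][0] += value
        totals[name][1] += 1
    return {name: {"average": total / count, "count": count}
            for name, (total, count) in totals.items()}


records = [
    {"workshop_id": 1, "question_unique_name": "teacher_engagement", "value": 6},
    {"workshop_id": 1, "question_unique_name": "teacher_engagement", "value": 7},
    {"workshop_id": 2, "question_unique_name": "teacher_engagement", "value": 5},
    {"workshop_id": 2, "question_unique_name": "comments", "value": "Great!"},
]
print(reducer(mapper(records)))
# {'teacher_engagement': {'average': 6.0, 'count': 3}}
```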
I will add that. (I assume this means when the new pipeline is complete, not just the part that supports the 2019 surveys.)
## Rollout plan
1. Build 5 components, just enough to support the 2019 surveys: Retriever (for the current db), Transformer, Mapper, Reducer, and Decorator (for the current UI view). These are components in steps 3 and 4 of the diagram above.
2. Build the Fetcher (step 1) to download survey submissions from JotForm into our database.
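The five components in step 1 can be thought of as plain stages where each output feeds the next, which is what makes it possible to replace one stage (for example the Retriever) without touching the others. A minimal sketch of that chaining, assuming every stage is a one-argument function; the stage bodies, the in-memory submissions, and form id `12345` are hypothetical stand-ins.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]


def run_pipeline(stages: Sequence[Stage], data: Any) -> Any:
    """Feed each stage's output into the next stage, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical in-memory submissions standing in for the current db.
SUBMISSIONS = [
    {"workshop_id": 1, "answers": {"teacher_engagement": 6}},
    {"workshop_id": 2, "answers": {"teacher_engagement": 4, "overall_success": 7}},
]


def retriever(form_id):
    return SUBMISSIONS  # stub: a real Retriever would query by form_id


def transformer(submissions):
    # Flatten each submission into one record per answered question.
    return [{"question_unique_name": q, "value": v}
            for s in submissions for q, v in s["answers"].items()]


def mapper(records):
    return [(r["question_unique_name"], r["value"]) for r in records]


def reducer(pairs):
    # Count responses per question.
    counts = {}
    for name, _ in pairs:
        counts[name] = counts.get(name, 0) + 1
    return counts


def decorator(counts):
    return [{"name": n, "responses": c} for n, c in sorted(counts.items())]


result = run_pipeline([retriever, transformer, mapper, reducer, decorator], 12345)
print(result)
# [{'name': 'overall_success', 'responses': 1}, {'name': 'teacher_engagement', 'responses': 2}]
```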
I think our existing system more-or-less does exactly this - all JotForm submissions get dropped into one table. Can we just reuse what we've got, for the Fetcher, or are there specific changes you expect we'll need?
Yes, we can potentially reuse most of the code for this part.
One thing I know we will have to do differently: the current code purposefully strips question info from submissions downloaded by GET /form/{id}/submissions, and later links answers back to question info downloaded by a separate request, GET /form/{id}/questions.
The problem with this is that it breaks the question-answer relationship, making the submission content not self-sufficient. An answer to an older version of a question could be linked to a newer version of the same question.
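Keeping submissions self-sufficient could mean storing each answer together with the question metadata it was submitted against, instead of re-joining against the form's current questions. A minimal sketch, assuming each entry in a submission's `answers` object carries the question's `name`, `text`, and `type` next to the `answer` value (an assumption about the response shape to verify against the JotForm API docs); the function name, output fields, and sample payload are hypothetical.

```python
from typing import Dict, List


def to_self_contained_records(submission: Dict) -> List[Dict]:
    """Keep every answer together with the question metadata it was submitted
    against, instead of re-joining it later with the form's current questions."""
    records = []
    for qid, entry in submission["answers"].items():
        if "answer" not in entry:  # skip entries with no answer (e.g. headings)
            continue
        records.append({
            "submission_id": submission["id"],
            "question_id": qid,
            "question_name": entry.get("name"),
            "question_text": entry.get("text"),
            "question_type": entry.get("type"),
            "value": entry["answer"],
        })
    return records


# Hypothetical payload in the assumed shape.
submission = {
    "id": "31974353596870",
    "answers": {
        "1": {"name": "header", "text": "Day 1 Survey", "type": "control_head"},
        "3": {"name": "overallSuccess", "text": "How successful was today?",
              "type": "control_scale", "answer": "6"},
    },
}
print(to_self_contained_records(submission))
```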
start
partition 1.Ingest {
Note: I don't think it's a problem with this plan, but what you've described doesn't capture the existing placeholder/hydration behavior for JotForm surveys, which lets us record that a submission occurred immediately, before we actually receive its contents from JotForm in a later cronjob (I think).
Correct, it is not in the scope of this design. To download only new submissions, we can use this API https://api.jotform.com/docs/#form-id-submissions with a submission range filter like {"id:gt":"31974353596870"}. (We will have to save a mapping from jotform_id to last_downloaded_submission_id somewhere.)
Do you know what other purposes the placeholder serves?
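The incremental download described above comes down to two pieces: building the submissions request with an `id:gt` filter, and advancing a per-form high-water mark after each download. A minimal sketch that only builds URLs and makes no network calls; passing the key as an `apiKey` query parameter is an assumption to verify against the JotForm API docs, and the in-memory `watermarks` dict is a hypothetical stand-in for wherever the jotform_id to last_downloaded_submission_id mapping is saved.

```python
import json
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode

API_BASE = "https://api.jotform.com"


def submissions_url(form_id: str, api_key: str,
                    last_downloaded_id: Optional[str] = None) -> str:
    """Build a GET /form/{id}/submissions URL that asks only for submissions
    newer than the last one we stored."""
    params = {"apiKey": api_key}
    if last_downloaded_id is not None:
        # Filter shape from the design discussion: {"id:gt":"<submission id>"}
        params["filter"] = json.dumps({"id:gt": last_downloaded_id}, separators=(",", ":"))
    return f"{API_BASE}/form/{form_id}/submissions?{urlencode(params)}"


def advance_watermark(watermarks: Dict[str, str], form_id: str,
                      submissions: Iterable[Dict]) -> None:
    """Record the highest submission id downloaded so far for this form."""
    ids = [int(s["id"]) for s in submissions]
    if ids:
        previous = int(watermarks.get(form_id, "0"))
        watermarks[form_id] = str(max(previous, *ids))


watermarks: Dict[str, str] = {}
advance_watermark(watermarks, "12345", [{"id": "31974353596870"}, {"id": "31974353596871"}])
url = submissions_url("12345", "YOUR_API_KEY", watermarks.get("12345"))
print(url)
```

Comparing ids as integers rather than strings keeps the high-water mark correct even if two ids ever differ in length.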
Great work on this design and on facilitating the discussion, Ha! I'm sold, especially on your recommendation of the initial components to build for stories PLC-47 and PLC-48. For the final format of this design document, I'm fine with it living in the repo, but I'm interested in seeing a final pass that changes it from a proposal with different options into a living design document that captures the design we have agreed upon, with notes about which parts we've implemented so far, and maybe records rejected options in the appendices. Maybe that doesn't happen until we do the initial implementation of this design. Alternatively, if we don't think it's important for this doc to stay in lockstep with our code, it might make sense to move it to the GitHub wiki (too bad that's not a better forum for discussion).
Oh, I like the idea of a living doc. Can we tag a certain commit as a released version?
I suppose if you leave it in this repo and update it along with the code, our existing release tags would apply.
https://codedotorg.atlassian.net/browse/PLC-237
To see the friendly format of the design doc: Files changed -> survey_summary_design.md -> View file
You can comment on specific lines here. I'm not sure if there is a better way to do this yet.
This format is an experiment to open our design publicly just like our code, and also to compare Github to Google docs for gathering design feedback.