forked from radovankavicky/dash-docs
# -*- coding: utf-8 -*-
from textwrap import dedent as s
import dash_core_components as dcc
import dash_html_components as html
from components import Example, Syntax
import tools
examples = {
'filesystem-session-cache': tools.load_example(
'tutorial/examples/sharing_state_filesystem_sessions.py'
)
}
layout = html.Div([
html.H1('Sharing State Between Callbacks'),
dcc.Markdown(s('''
One of the core Dash principles explained in the
[Getting Started Guide on Callbacks](/getting-started-part-2)
is that **Dash Callbacks must never modify variables outside of their
scope**. It is not safe to modify any `global` variables.
This chapter explains why and provides some alternative patterns for
sharing state between callbacks.
## Why Share State?
In some apps, you may have multiple callbacks that depend on expensive data
processing tasks like making SQL queries,
running simulations, or downloading data.
Rather than have each callback run the same expensive task,
you can have one callback run the task and then share the
results with the rest of the callbacks.
''')),
dcc.Markdown(s('''
## Why `global` variables will break your app
Dash is designed to work in multi-user environments
where multiple people may view the application at the
same time and will have **independent sessions**.
If your app uses modified `global` variables,
then one user's session could set the variable to one value
which would affect the next user's session.
Dash is also designed to be able to run with **multiple Python
workers** so that callbacks can be executed in parallel.
This is commonly done with `gunicorn` using syntax like
```
$ gunicorn --workers 4 app:server
```
(`app` refers to a file named `app.py` and `server` refers to a variable
in that file named `server`: `server = app.server`).
When Dash apps run across multiple workers, their memory
_is not shared_. This means that if you modify a global
variable in one callback, that modification will not be
applied to the rest of the workers.
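
As a rough illustration of the first problem, here is a minimal sketch in plain Python (no Dash or pandas), with hypothetical names `ROWS`, `unsafe_callback`, and `safe_callback`. It assumes two users trigger the same callback one after the other in a single process: the callback that reassigns the module-level variable leaves the second user with an empty result, while the one that filters into a local variable does not.

```python
# Plain-Python stand-in for a callback that filters a module-level dataset.
# All names here are hypothetical and only illustrate the failure mode.
ROWS = [
    {'c': 'x', 'a': 1},
    {'c': 'y', 'a': 2},
    {'c': 'z', 'a': 3},
]

rows = list(ROWS)  # module-level state shared by every session in this process


def unsafe_callback(value):
    # Reassigns the module-level variable: later calls see the filtered data.
    global rows
    rows = [row for row in rows if row['c'] == value]
    return len(rows)


def safe_callback(value):
    # Filters into a local variable and leaves the shared data untouched.
    filtered = [row for row in ROWS if row['c'] == value]
    return len(filtered)


# User 1 selects 'x', then user 2 selects 'y' in a different session.
unsafe_results = [unsafe_callback('x'), unsafe_callback('y')]
safe_results = [safe_callback('x'), safe_callback('y')]

print(unsafe_results)  # [1, 0]: user 2 gets an empty result
print(safe_results)    # [1, 1]
```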
***
''')),
Syntax('''df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 1, 4],
    'c': ['x', 'y', 'z'],
})

app.layout = html.Div([
    dcc.Dropdown(
        id='dropdown',
        options=[{'label': i, 'value': i} for i in df['c'].unique()],
        value='x'
    ),
    html.Div(id='output'),
])

@app.callback(Output('output', 'children'),
              [Input('dropdown', 'value')])
def update_output_1(value):
    # Here, `df` is an example of a variable that is
    # "outside the scope of this function".
    # *It is not safe to modify or reassign this variable
    #  inside this callback.*
    global df
    df = df[df['c'] == value]  # do not do this, this is not safe!
    return len(df)
''', summary='''
Here is a sketch of an app with a callback that modifies data
outside of its scope. This type of pattern *will not work reliably*
for the reasons outlined above.'''),
Syntax('''df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 1, 4],
    'c': ['x', 'y', 'z'],
})

app.layout = html.Div([
    dcc.Dropdown(
        id='dropdown',
        options=[{'label': i, 'value': i} for i in df['c'].unique()],
        value='x'
    ),
    html.Div(id='output'),
])

@app.callback(Output('output', 'children'),
              [Input('dropdown', 'value')])
def update_output_1(value):
    # Safely reassign the filter to a new variable
    filtered_df = df[df['c'] == value]
    return len(filtered_df)
''', summary='''
To fix this example, simply re-assign the filter to a new
variable inside the callback or follow one of the strategies
outlined in the next part of this guide.'''),
dcc.Markdown(s('''
## Sharing Data Between Callbacks
In order to share data safely across multiple Python
processes, we need to store the data somewhere that is accessible to
each of the processes.
There are three main places to store this data:

1 - In the user's browser session

2 - On the disk (e.g. in a file or in a new database)

3 - In a shared memory space like with Redis

The following examples illustrate these approaches.
## Example 1 - Storing Data in the Browser with a Hidden Div
To save data in the user's browser session:
- Implemented by saving the data as part of Dash's front-end store
through methods explained in
[https://community.plot.ly/t/sharing-a-dataframe-between-plots/6173](https://community.plot.ly/t/sharing-a-dataframe-between-plots/6173)
- Data has to be converted to a string like JSON for storage and transport
- Data that is cached in this way will _only be available in the
user's current session_.
- If you open up a new browser, the app's callbacks will always
compute the data. The data is only cached and transported between
callbacks within the session.
- As such, unlike with caching, this method doesn't increase the
memory footprint of the app.
- There could be a cost in network transport. If you're sharing 10MB
of data between callbacks, then that data will be transported over
the network between each callback.
- If the network cost is too high, then compute the aggregations
upfront and transport those.
Your app likely won't be displaying 10MB of data;
it will just be displaying a subset or an aggregation of it.
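
To get a feel for the network cost, the sketch below compares the size of a full JSON payload with the size of an aggregation computed upfront. The dataset, its `fruit` and `value` fields, and the per-category sum are assumptions made up for illustration; only the standard `json` module is used.

```python
import json

# Hypothetical dataset: 10,000 records with a category and a value.
records = [
    {'fruit': ['apples', 'oranges', 'figs'][i % 3], 'value': i}
    for i in range(10000)
]

# Option 1: transport every record between callbacks.
full_payload = json.dumps(records)

# Option 2: aggregate first and transport only the per-category totals.
totals = {}
for record in records:
    totals[record['fruit']] = totals.get(record['fruit'], 0) + record['value']
aggregated_payload = json.dumps(totals)

# The receiving callback restores the aggregation with json.loads.
restored = json.loads(aggregated_payload)

# Compare the number of characters that would travel over the network.
print(len(full_payload), len(aggregated_payload))
```

The aggregated payload holds three numbers instead of 10,000 records, so it stays small no matter how large the underlying data grows.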
''')),
Syntax(
summary=('''
This example outlines how you can perform an expensive data processing
step in one callback, serialize the output as JSON, and provide it
as an input to the other callbacks. This example uses standard Dash
callbacks and stores the JSON-ified data inside a hidden div in
the app.
'''), children=s('''
global_df = pd.read_csv('...')
app.layout = html.Div([
    dcc.Graph(id='graph'),
    html.Table(id='table'),
    dcc.Dropdown(id='dropdown'),

    # Hidden div inside the app that stores the intermediate value
    html.Div(id='intermediate-value', style={'display': 'none'})
])

@app.callback(Output('intermediate-value', 'children'), [Input('dropdown', 'value')])
def clean_data(value):
    # some expensive clean data step
    cleaned_df = your_expensive_clean_or_compute_step(value)

    # more generally, this line would be
    # json.dumps(cleaned_df)
    return cleaned_df.to_json(date_format='iso', orient='split')

@app.callback(Output('graph', 'figure'), [Input('intermediate-value', 'children')])
def update_graph(jsonified_cleaned_data):
    # more generally, this line would be
    # json.loads(jsonified_cleaned_data)
    dff = pd.read_json(jsonified_cleaned_data, orient='split')

    figure = create_figure(dff)
    return figure

@app.callback(Output('table', 'children'), [Input('intermediate-value', 'children')])
def update_table(jsonified_cleaned_data):
    dff = pd.read_json(jsonified_cleaned_data, orient='split')
    table = create_table(dff)
    return table
''')),
dcc.Markdown(s('''
***
## Example 2 - Computing Aggregations Upfront
Sending the computed data over the network can be expensive if
the data is large. In some cases, serializing this data to JSON
can also be expensive.
In many cases, your app will only display a subset or an aggregation
of the computed or filtered data. In these cases, you could precompute
your aggregations in your data processing callback and transport these
aggregations to the remaining callbacks.
''')),
Syntax(children=s('''
@app.callback(
    Output('intermediate-value', 'children'),
    [Input('dropdown', 'value')])
def clean_data(value):
    # an expensive query step
    cleaned_df = your_expensive_clean_or_compute_step(value)

    # a few filter steps that compute the data
    # as it's needed in the future callbacks
    df_1 = cleaned_df[cleaned_df['fruit'] == 'apples']
    df_2 = cleaned_df[cleaned_df['fruit'] == 'oranges']
    df_3 = cleaned_df[cleaned_df['fruit'] == 'figs']

    datasets = {
        'df_1': df_1.to_json(orient='split', date_format='iso'),
        'df_2': df_2.to_json(orient='split', date_format='iso'),
        'df_3': df_3.to_json(orient='split', date_format='iso'),
    }

    return json.dumps(datasets)

# each callback targets its own graph: an output can only be
# assigned to one callback
@app.callback(
    Output('graph-1', 'figure'),
    [Input('intermediate-value', 'children')])
def update_graph_1(jsonified_cleaned_data):
    datasets = json.loads(jsonified_cleaned_data)
    dff = pd.read_json(datasets['df_1'], orient='split')
    figure = create_figure_1(dff)
    return figure

@app.callback(
    Output('graph-2', 'figure'),
    [Input('intermediate-value', 'children')])
def update_graph_2(jsonified_cleaned_data):
    datasets = json.loads(jsonified_cleaned_data)
    dff = pd.read_json(datasets['df_2'], orient='split')
    figure = create_figure_2(dff)
    return figure

@app.callback(
    Output('graph-3', 'figure'),
    [Input('intermediate-value', 'children')])
def update_graph_3(jsonified_cleaned_data):
    datasets = json.loads(jsonified_cleaned_data)
    dff = pd.read_json(datasets['df_3'], orient='split')
    figure = create_figure_3(dff)
    return figure
'''), summary='''Here's a simple example of how you might transport
filtered or aggregated data to multiple callbacks.
'''),
dcc.Markdown(s('''
***
## Example 3 - Caching and Signaling
This example:
- Uses Redis via Flask-Caching for storing “global variables”.
This data is accessed through a function whose output is
cached and keyed by its input arguments.
- Uses the hidden div solution to send a signal to the other
callbacks when the expensive computation is complete
- Note that instead of Redis, you could also save this to the file
system. See https://flask-caching.readthedocs.io/en/latest/
for more details.
- This “signaling” is cool because it allows the expensive
computation to only take up one process.
Without this type of signaling, each callback could end up
running the expensive computation in parallel,
locking 4 processes instead of 1.
This approach also has the advantage that future sessions
use the pre-computed value.
This will work well for apps that have a small number of inputs.
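
The idea of a function whose output is cached and keyed by its input arguments can be sketched with the standard library alone. This is only an analogy: `functools.lru_cache` keeps results in the memory of a single process, so unlike a Redis or filesystem cache it is not shared across workers. The names `global_store_sketch` and `call_log` are hypothetical.

```python
import functools

call_log = []  # records each time the expensive body actually runs


@functools.lru_cache(maxsize=None)
def global_store_sketch(value):
    # Stand-in for an expensive query; the body runs once per distinct argument.
    call_log.append(value)
    return 'result for {}'.format(value)


# Four callbacks asking for the same value trigger a single computation.
results = [global_store_sketch('apples') for _ in range(4)]

# A different argument is a different cache key, so the body runs again.
global_store_sketch('figs')

print(call_log)  # ['apples', 'figs']
```

Because every distinct argument gets its own cache entry, this pattern suits apps with a small number of possible inputs.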
Here’s what this example looks like. Some things to note:
- I’ve simulated an expensive process by calling `time.sleep(5)`.
- When the app loads, it takes 5 seconds to render all 4 graphs
- The initial computation only blocks 1 process
- Once the computation is complete, the signal is sent and 4 callbacks
are executed in parallel to render the graphs.
Each of these callbacks retrieves the data from the
“global store”: the redis or filesystem cache.
- I’ve set `processes=6` in `app.run_server` so that multiple callbacks
can be executed in parallel. In production, this is done with
something like `$ gunicorn --workers 6 --threads 2 app:server`.
- Selecting a value in the dropdown will take less than 5 seconds
if it has already been selected in the past.
This is because the value is being pulled from the cache.
- Similarly, reloading the page or opening the app in a new window
is also fast because the initial state and the initial expensive
computation have already been computed.
''')),
html.Div(
children=html.Img(
src='https://user-images.githubusercontent.com/1280389/31468665-bf1b6026-aeac-11e7-9388-d9a5e71d964e.gif',
alt='Example of a Dash App that uses Caching'
),
className="gallery"
),
Syntax(summary="Here's what this example looks like in code",
children=s('''
import os
import copy
import time
import datetime

import dash
import dash_core_components as dcc
import dash_html_components as html
import numpy as np
import pandas as pd
from dash.dependencies import Input, Output
from flask_caching import Cache


app = dash.Dash(__name__)
CACHE_CONFIG = {
    # try 'filesystem' if you don't want to setup redis
    'CACHE_TYPE': 'redis',
    # the URL needs a scheme such as redis://
    'CACHE_REDIS_URL': os.environ.get('REDIS_URL', 'redis://localhost:6379')
}
cache = Cache()
cache.init_app(app.server, config=CACHE_CONFIG)

N = 100

df = pd.DataFrame({
    'category': (
        (['apples'] * 5 * N) +
        (['oranges'] * 10 * N) +
        (['figs'] * 20 * N) +
        (['pineapples'] * 15 * N)
    )
})
df['x'] = np.random.randn(len(df['category']))
df['y'] = np.random.randn(len(df['category']))

app.layout = html.Div([
    dcc.Dropdown(
        id='dropdown',
        options=[{'label': i, 'value': i} for i in df['category'].unique()],
        value='apples'
    ),
    html.Div([
        html.Div(dcc.Graph(id='graph-1'), className="six columns"),
        html.Div(dcc.Graph(id='graph-2'), className="six columns"),
    ], className="row"),
    html.Div([
        html.Div(dcc.Graph(id='graph-3'), className="six columns"),
        html.Div(dcc.Graph(id='graph-4'), className="six columns"),
    ], className="row"),

    # hidden signal value
    html.Div(id='signal', style={'display': 'none'})
])


# perform expensive computations in this "global store"
# these computations are cached in a globally available
# redis memory store which is available across processes
# and for all time.
@cache.memoize()
def global_store(value):
    # simulate expensive query
    print('Computing value with {}'.format(value))
    time.sleep(5)
    return df[df['category'] == value]


def generate_figure(value, figure):
    fig = copy.deepcopy(figure)
    filtered_dataframe = global_store(value)
    fig['data'][0]['x'] = filtered_dataframe['x']
    fig['data'][0]['y'] = filtered_dataframe['y']
    fig['layout'] = {'margin': {'l': 20, 'r': 10, 'b': 20, 't': 10}}
    return fig


@app.callback(Output('signal', 'children'), [Input('dropdown', 'value')])
def compute_value(value):
    # compute value and send a signal when done
    global_store(value)
    return value


@app.callback(Output('graph-1', 'figure'), [Input('signal', 'children')])
def update_graph_1(value):
    # generate_figure gets data from `global_store`.
    # the data in `global_store` has already been computed
    # by the `compute_value` callback and the result is stored
    # in the global redis cache
    return generate_figure(value, {
        'data': [{
            'type': 'scatter',
            'mode': 'markers',
            'marker': {
                'opacity': 0.5,
                'size': 14,
                'line': {'border': 'thin darkgrey solid'}
            }
        }]
    })


@app.callback(Output('graph-2', 'figure'), [Input('signal', 'children')])
def update_graph_2(value):
    return generate_figure(value, {
        'data': [{
            'type': 'scatter',
            'mode': 'lines',
            'line': {'shape': 'spline', 'width': 0.5},
        }]
    })


@app.callback(Output('graph-3', 'figure'), [Input('signal', 'children')])
def update_graph_3(value):
    return generate_figure(value, {
        'data': [{
            'type': 'histogram2d',
        }]
    })


@app.callback(Output('graph-4', 'figure'), [Input('signal', 'children')])
def update_graph_4(value):
    return generate_figure(value, {
        'data': [{
            'type': 'histogram2dcontour',
        }]
    })


# Dash CSS
app.css.append_css({
    "external_url": "https://codepen.io/chriddyp/pen/bWLwgP.css"})

# Loading screen CSS
app.css.append_css({
    "external_url": "https://codepen.io/chriddyp/pen/brPBPO.css"})

if __name__ == '__main__':
    app.run_server(debug=True, processes=6)
'''
)),
dcc.Markdown(s('''
***
## Example 4 - User-Based Session Data on the Server
The previous example cached computations in Redis (or on the filesystem) and
those computations were accessible to all users.
In some cases, you want to keep the data isolated to user sessions:
one user's derived data shouldn't update the next user's derived data.
One way to do this is to save the data in a hidden `Div`,
as demonstrated in the first example.
Another way to do this is to save the data on the
filesystem cache with a session ID and reference the data
using that session ID. In this method, since data is saved on the server,
instead of transported over the network, it is generally faster than the
"hidden div" method.
This example was originally discussed in a
[Dash Community Forum thread](https://community.plot.ly/t/capture-window-tab-closing-event/7375/2?u=chriddyp).
This example:
- Caches data using the `flask_caching` filesystem cache. You can also save to an in-memory database like Redis.
- Serializes the data as JSON.
- If you are using Pandas, consider serializing
with Apache Arrow. [Community thread](https://community.plot.ly/t/fast-way-to-share-data-between-callbacks/8024/2)
- Saves session data up to the number of expected concurrent users.
This prevents the cache from being overfilled with data.
- Creates unique session IDs by embedding a hidden random string into
the app's layout and serving a unique layout on every page load.
> Note: As with all examples that send data to the client, be aware
> that these sessions aren't necessarily secure or encrypted.
> These session IDs may be vulnerable to
> [Session Fixation](https://en.wikipedia.org/wiki/Session_fixation)
> style attacks.
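
The session ID idea can be sketched without Dash or `flask_caching`. The version below is a stand-in built on assumptions, not the example's actual code: it keeps JSON strings in an in-process `OrderedDict` instead of a filesystem cache, caps it at a hypothetical `MAX_SESSIONS`, and evicts the oldest session when the cap is reached.

```python
import json
import uuid
from collections import OrderedDict

MAX_SESSIONS = 2  # hypothetical cap on the number of cached sessions
session_cache = OrderedDict()  # in-process stand-in for the filesystem cache


def new_session_id():
    # A fresh random ID would be embedded in the layout on every page load.
    return str(uuid.uuid4())


def get_session_data(session_id, compute):
    # Return this session's cached data, computing and storing it on a miss.
    if session_id not in session_cache:
        if len(session_cache) >= MAX_SESSIONS:
            session_cache.popitem(last=False)  # evict the oldest session
        session_cache[session_id] = json.dumps(compute())
    return json.loads(session_cache[session_id])


counter = {'calls': 0}


def expensive_compute():
    # Stand-in for the expensive step; counts how often it really runs.
    counter['calls'] += 1
    return {'computed_on_call': counter['calls']}


first, second = new_session_id(), new_session_id()
a1 = get_session_data(first, expensive_compute)
a2 = get_session_data(first, expensive_compute)   # served from the cache
b1 = get_session_data(second, expensive_compute)  # separate session, recomputed
```

Here `a1` and `a2` are identical because the second lookup reuses the first session's cached value, while `b1` comes from a fresh computation for the second session.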
''')),
Syntax(
examples['filesystem-session-cache'][0],
summary="Here's what this example looks like in code"
),
html.Div(
children=html.Img(
src='https://user-images.githubusercontent.com/1280389/37941518-8f47b71a-313c-11e8-8b00-80ffbb012c4a.gif',
alt='Example of a Dash App that uses User Session Caching'
)
),
dcc.Markdown(s('''
There are three things to notice in this example:
- The timestamps of the dataframe don't update when we retrieve
the data. This data is cached as part of the user's session.
- Retrieving the data initially takes 5 seconds but successive queries
are instant, as the data has been cached.
- The second session displays different data than the first session:
the data that is shared between callbacks is isolated to individual
user sessions.
Questions? Discuss these examples on the
[Dash Community Forum](https://community.plot.ly/c/dash)
'''))
])