@@ -9,51 +9,65 @@ Spline regression
99 :suppress:
1010
1111 import numpy as np
12- from patsy import build_design_matrices, incr_dbuilder
12+ from patsy import dmatrix, build_design_matrices
1313
1414 Patsy offers a set of specific stateful transforms (for more details about
1515stateful transforms see :ref: `stateful-transforms `) that you can use in
16- formulas to generate splines basis and express non-linear fits.
16+ formulas to generate splines bases and express non-linear fits.
1717
1818B-splines
1919---------
2020
21- B-spline basis can be generated with the :func: `bs ` stateful transform.
22- The following figure illustrates a typical basis and the resulting spline:
21+ B-spline bases can be generated with the :func: `bs ` stateful transform.
22+ The following code illustrates a typical basis and the resulting spline:
23+
24+
25+ .. ipython :: python
26+
27+ import matplotlib.pyplot as plt
28+ plt.title(" B-spline basis example (degree=3)" );
29+ x = np.linspace(0 ., 1 ., 100 )
30+ y = dmatrix(" bs(x, df=6, degree=3, include_intercept=True) - 1" , {" x" : x})
31+ # Define some coefficients
32+ b = np.array([1.3 , 0.6 , 0.9 , 0.4 , 1.6 , 0.7 ])
33+ # Plot B-spline basis functions (colored curves) each multiplied by its coeff
34+ plt.plot(x, y* b);
35+ @savefig basis -bspline.png align=center
36+ # Plot the spline itself (sum of the basis functions, thick black curve)
37+ plt.plot(x, np.dot(y, b), color = ' k' , linewidth = 3 );
2338
24- .. figure :: figures/bspline-basis.png
25- :align: center
2639
27- Illustration of (cubic) B-spline basis functions (colored curves) each multiplied
28- by an associated coefficient, and the spline itself (sum of the basis
29- functions, thick black curve). The basis functions were obtained
30- using ``bs(x, df=6, degree=3, include_intercept=True) ``.
3140
3241 In the following example we first set up our B-spline basis using some data and
3342then make predictions on a new set of data:
3443
3544.. ipython :: python
3645
37- x = np.linspace(0 ., 1 ., 100 )
38- data_chunked = [{ " x " : x[: 50 ]}, { " x " : x[ 50 :]}]
46+ data = { " x " : np.linspace(0 ., 1 ., 100 )}
47+ design_matrix = dmatrix( " bs(x, df=4) " , data)
3948
40- builder = incr_dbuilder( " bs(x, df=4) " ,
41- lambda : iter (data_chunked))
49+ new_data = { " x " : [ 0.1 , 0.25 , 0.9 ]}
50+ build_design_matrices([design_matrix.design_info.builder], new_data)[ 0 ]
4251
43- new_x = np.array([0.1 , 0.25 , 0.9 ])
44- new_data = {" x" : new_x}
45- design_matrix = build_design_matrices([builder], new_data)[0 ]
46- design_matrix
4752
4853 Cubic regression splines
4954------------------------
5055
5156Natural and cyclic cubic regression splines are provided through the stateful
5257transforms :func: `cr ` and :func: `cc ` respectively. Here the spline is
53- parameterized directly using its values at the knots. These splines are fully
54- compatible with those found in the R package
58+ parameterized directly using its values at the knots. These splines were designed
59+ to be compatible with those found in the R package
5560`mgcv <http://cran.r-project.org/web/packages/mgcv/index.html >`_
5661(these are called *cr *, *cs * and *cc * in the context of *mgcv *).
62+
63+ .. warning ::
64+ Note that the compatibility with *mgcv * applies only to the **generation of
65+ spline bases **: we do not implement any kind of *mgcv *-compatible penalized
66+ fitting process.
67+ Thus these spline bases can be used to precisely reproduce
68+ predictions from a model previously fitted with *mgcv *, or to serve as
69+ building blocks for other regression models (like OLS).
70+
5771Note that the API is different from *mgcv *:
5872
5973* In patsy one can specify the number of degrees of freedom directly (actual number of
@@ -65,44 +79,54 @@ Note that the API is different from *mgcv*:
6579 which can be useful for cyclic spline for instance.
6680* In *mgcv * a centering/identifiability constraint is automatically computed and
6781 absorbed in the resulting design matrix.
68- The purpose of this is to ensure that if ``b `` is the array of fitted coefficients, our
69- model is centered, ie ``np.mean(np.dot(design_matrix, b)) `` is zero.
82+ The purpose of this is to ensure that if ``b `` is the array of *initial * parameters
83+ (corresponding to the *initial * unconstrained design matrix ``dm ``), our
84+ model is centered, ie ``np.mean(np.dot(dm, b)) `` is zero.
7085 We can rewrite this as ``np.dot(c, b) `` being zero with ``c `` a 1-row
71- constraint matrix containing the mean of each column of ``design_matrix ``.
72- Absorbing this constraint in the final design matrix means that we rewrite the model
73- in terms of unconstrained coefficients (this is done through a QR-decomposition
74- of the constraint matrix).
86+ constraint matrix containing the mean of each column of ``dm ``.
87+ Absorbing this constraint in the *final * design matrix means that we rewrite the model
88+ in terms of *unconstrained * parameters (this is done through a QR-decomposition
89+ of the constraint matrix). Those unconstrained parameters have the property
90+ that when projected back into the initial parameters space (let's call ``b_back ``
91+ the result of this projection), the constraint
92+ ``np.dot(c, b_back) `` being zero is automatically verified.
7593 In patsy one can choose between no
7694 constraint, a centering constraint like *mgcv * (``'center' ``) or a user provided
7795 constraint matrix.
7896
79- .. figure :: figures/crspline-basis.png
80- :align: center
97+ Here are some illustrations of typical natural and cyclic spline bases:
98+
99+ .. ipython :: python
81100
82- Illustration of natural cubic regression spline basis functions
83- (colored curves) each multiplied by an associated coefficient, and the
84- spline itself (sum of the basis functions, thick black curve).
85- The basis functions were obtained using ``cr(x, df=6) ``.
101+ plt.title(" Natural cubic regression spline basis example" );
102+ y = dmatrix(" cr(x, df=6) - 1" , {" x" : x})
103+ # Plot natural cubic regression spline basis functions (colored curves) each multiplied by its coeff
104+ plt.plot(x, y* b);
105+ @savefig basis -crspline.png align=center
106+ # Plot the spline itself (sum of the basis functions, thick black curve)
107+ plt.plot(x, np.dot(y, b), color = ' k' , linewidth = 3 );
86108
87- .. figure :: figures/ccspline-basis.png
88- :align: center
89109
90- Illustration of cyclic cubic regression spline basis functions
91- (colored curves) each multiplied by an associated coefficient, and the
92- spline itself (sum of the basis functions, thick black curve).
93- The basis functions were obtained using ``cc(x, df=6) ``.
110+ .. ipython :: python
111+
112+ plt.title(" Cyclic cubic regression spline basis example" );
113+ y = dmatrix(" cc(x, df=6) - 1" , {" x" : x})
114+ # Plot cyclic cubic regression spline basis functions (colored curves) each multiplied by its coeff
115+ plt.plot(x, y* b);
116+ @savefig basis -ccspline.png align=center
117+ # Plot the spline itself (sum of the basis functions, thick black curve)
118+ plt.plot(x, np.dot(y, b), color = ' k' , linewidth = 3 );
119+
94120
95121 In the following example we first set up our spline basis using same data as for
96122the B-spline example above and then make predictions on a new set of data:
97123
98124.. ipython :: python
99125
100- builder = incr_dbuilder(" cr(x, df=4, constraints='center')" ,
101- lambda : iter (data_chunked))
102-
103- design_matrix = build_design_matrices([builder], new_data)[0 ]
104- design_matrix
105- np.asarray(design_matrix)
126+ design_matrix = dmatrix(" cr(x, df=4, constraints='center')" , data)
127+ new_design_matrix = build_design_matrices([design_matrix.design_info.builder], new_data)[0 ]
128+ new_design_matrix
129+ np.asarray(new_design_matrix)
106130
107131 Note that in the above example 5 knots are actually used to achieve 4 degrees
108132of freedom since a centering constraint is requested.
@@ -112,35 +136,78 @@ Tensor product smooth
112136
113137Smooth of several covariates can be generated through a tensor product of
114138the bases of marginal univariate smooths. For these marginal smooths one can
115- use any of the above defined splines. The tensor product stateful transform
116- is called :func: `te `.
139+ use the above defined splines as well as any user defined smooth provided it
140+ is a genuine stateful transform.
141+ The tensor product stateful transform is called :func: `te `.
142+
143+ .. note ::
144+ The implementation of this tensor product is compatible with *mgcv * when
145+ considering only cubic regression spline marginal smooths, which means that
146+ generated bases will match those produced by *mgcv *.
147+ Recall that we do not implement any kind of *mgcv *-compatible penalized
148+ fitting process.
149+
150+ In the following code we show an example of tensor product basis functions
151+ used to represent a smooth of two variables ``x1 `` and ``x2 ``. Note how
152+ marginal spline bases patterns can be observed on the x and y contour projections:
153+
154+ .. ipython ::
155+
156+ In [10]: from matplotlib import cm
157+
158+ In [20]: from mpl_toolkits.mplot3d.axes3d import Axes3D
159+
160+ In [30]: x1 = np.linspace(0., 1., 100)
161+
162+ In [40]: x2 = np.linspace(0., 1., 100)
163+
164+ In [50]: x1, x2 = np.meshgrid(x1, x2)
165+
166+ In [60]: df = 3
167+
168+ In [70]: y = dmatrix("te(x1, x2, s=(ms(cr, df=df), ms(cc, df=df))) - 1",
169+ ....: {"x1": x1.ravel(), "x2": x2.ravel(), "df": df})
170+ ....:
171+
172+ In [80]: print y.shape
173+
174+ In [90]: fig = plt.figure()
175+
176+ In [100]: fig.suptitle("Tensor product basis example (2 covariates)");
117177
118- .. figure :: figures/tesmooth-basis.png
119- :align: center
178+ In [110]: for i in xrange(df * df):
179+ .....: ax = fig.add_subplot(df, df, i + 1, projection='3d')
180+ .....: yi = y[:, i].reshape(x1.shape)
181+ .....: ax.plot_surface(x1, x2, yi, rstride=4, cstride=4, alpha=0.15)
182+ .....: ax.contour(x1, x2, yi, zdir='z', cmap=cm.coolwarm, offset=-0.5)
183+ .....: ax.contour(x1, x2, yi, zdir='y', cmap=cm.coolwarm, offset=1.2)
184+ .....: ax.contour(x1, x2, yi, zdir='x', cmap=cm.coolwarm, offset=-0.2)
185+ .....: ax.set_xlim3d(-0.2, 1.0)
186+ .....: ax.set_ylim3d(0, 1.2)
187+ .....: ax.set_zlim3d(-0.5, 1)
188+ .....: ax.set_xticks([0, 1])
189+ .....: ax.set_yticks([0, 1])
190+ .....: ax.set_zticks([-0.5, 0, 1])
191+ .....:
120192
121- Illustration of tensor product basis functions used to represent a smooth
122- of two variables ``x1 `` and ``x2 ``. The basis functions were obtained using
123- ``te(x1, x2, s=(ms(cr, df=3), ms(cc, df=3))) ``. Marginal spline bases patterns
124- can be seen on the x and y contour projections.
193+ @savefig basis-tesmooth.png align=center
194+ In [120]: fig.tight_layout()
125195
126196Following what we did for univariate splines in the preceding sections, we will
127197now set up a 3-d smooth basis using some data and then make predictions on a
128198new set of data:
129199
130200.. ipython :: python
131201
132- x1 = np.linspace(0 ., 1 ., 100 )
133- x2 = np.linspace(0 ., 1 ., 100 )
134- x3 = np.linspace(0 ., 1 ., 100 )
135- data_chunked = [{" x1" : x1[:50 ], " x2" : x2[:50 ], " x3" : x3[:50 ]},
136- {" x1" : x1[50 :], " x2" : x2[50 :], " x3" : x3[50 :]}]
137-
138- builder = incr_dbuilder(" te(x1, x2, x3, s=(ms(cr, df=3), ms(cr, df=3), "
139- " ms(cc, df=3)), constraints='center')" ,
140- lambda : iter (data_chunked))
141- new_data = {" x1" : 0.1 ,
142- " x2" : 0.2 ,
143- " x3" : 0.3 }
144- design_matrix = build_design_matrices([builder], new_data)[0 ]
145- design_matrix
146- np.asarray(design_matrix)
202+ data = {" x1" : np.linspace(0 ., 1 ., 100 ),
203+ " x2" : np.linspace(0 ., 1 ., 100 ),
204+ " x3" : np.linspace(0 ., 1 ., 100 )}
205+ design_matrix = dmatrix(" te(x1, x2, x3, s=(ms(cr, df=3), ms(cr, df=3), "
206+ " ms(cc, df=3)), constraints='center')" ,
207+ data)
208+ new_data = {" x1" : [0.1 , 0.2 ],
209+ " x2" : [0.2 , 0.3 ],
210+ " x3" : [0.3 , 0.4 ]}
211+ new_design_matrix = build_design_matrices([design_matrix.design_info.builder], new_data)[0 ]
212+ new_design_matrix
213+ np.asarray(new_design_matrix)
0 commit comments