Skip to content

Commit fe2bbc6

Browse files
committed
Merge remote-tracking branch 'origin/master'
2 parents e3703da + 1710a5f commit fe2bbc6

8 files changed

Lines changed: 1687 additions & 1 deletion

File tree

doc/API-reference.rst

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,6 +164,9 @@ Spline regression
164164
-----------------
165165

166166
.. autofunction:: bs
167+
.. autofunction:: cr
168+
.. autofunction:: cc
169+
.. autofunction:: te
167170

168171
Working with formulas programmatically
169172
--------------------------------------

doc/index.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,8 @@ Contents:
2121

2222
stateful-transforms.rst
2323

24+
spline-regression.rst
25+
2426
expert-model-specification.rst
2527

2628
library-developers.rst

doc/spline-regression.rst

Lines changed: 214 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,214 @@
1+
.. _spline-regression:
2+
3+
Spline regression
4+
=================
5+
6+
.. currentmodule:: patsy
7+
8+
.. ipython:: python
9+
:suppress:
10+
11+
import numpy as np
12+
from patsy import dmatrix, build_design_matrices
13+
14+
Patsy offers a set of specific stateful transforms (for more details about
15+
stateful transforms see :ref:`stateful-transforms`) that you can use in
16+
formulas to generate splines bases and express non-linear fits.
17+
18+
B-splines
19+
---------
20+
21+
B-spline bases can be generated with the :func:`bs` stateful transform.
22+
The following code illustrates a typical basis and the resulting spline:
23+
24+
25+
.. ipython:: python
26+
27+
import matplotlib.pyplot as plt
28+
plt.title("B-spline basis example (degree=3)");
29+
x = np.linspace(0., 1., 100)
30+
y = dmatrix("bs(x, df=6, degree=3, include_intercept=True) - 1", {"x": x})
31+
# Define some coefficients
32+
b = np.array([1.3, 0.6, 0.9, 0.4, 1.6, 0.7])
33+
# Plot B-spline basis functions (colored curves) each multiplied by its coeff
34+
plt.plot(x, y*b);
35+
@savefig basis-bspline.png align=center
36+
# Plot the spline itself (sum of the basis functions, thick black curve)
37+
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
38+
39+
40+
41+
In the following example we first set up our B-spline basis using some data and
42+
then make predictions on a new set of data:
43+
44+
.. ipython:: python
45+
46+
data = {"x": np.linspace(0., 1., 100)}
47+
design_matrix = dmatrix("bs(x, df=4)", data)
48+
49+
new_data = {"x": [0.1, 0.25, 0.9]}
50+
build_design_matrices([design_matrix.design_info.builder], new_data)[0]
51+
52+
53+
Cubic regression splines
54+
------------------------
55+
56+
Natural and cyclic cubic regression splines are provided through the stateful
57+
transforms :func:`cr` and :func:`cc` respectively. Here the spline is
58+
parameterized directly using its values at the knots. These splines were designed
59+
to be compatible with those found in the R package
60+
`mgcv <http://cran.r-project.org/web/packages/mgcv/index.html>`_
61+
(these are called *cr*, *cs* and *cc* in the context of *mgcv*).
62+
63+
.. warning::
64+
Note that the compatibility with *mgcv* applies only to the **generation of
65+
spline bases**: we do not implement any kind of *mgcv*-compatible penalized
66+
fitting process.
67+
Thus these spline bases can be used to precisely reproduce
68+
predictions from a model previously fitted with *mgcv*, or to serve as
69+
building blocks for other regression models (like OLS).
70+
71+
Note that the API is different from *mgcv*:
72+
73+
* In patsy one can specify the number of degrees of freedom directly (actual number of
74+
columns of the resulting design matrix) whereas in *mgcv* one has to specify
75+
the number of knots to use. For instance, in the case of cyclic regression splines (with no
76+
additional constraints) the actual degrees of freedom is the number of knots
77+
minus one.
78+
* In patsy one can specify inner knots as well as lower and upper exterior knots
79+
which can be useful for cyclic spline for instance.
80+
* In *mgcv* a centering/identifiability constraint is automatically computed and
81+
absorbed in the resulting design matrix.
82+
The purpose of this is to ensure that if ``b`` is the array of *initial* parameters
83+
(corresponding to the *initial* unconstrained design matrix ``dm``), our
84+
model is centered, ie ``np.mean(np.dot(dm, b))`` is zero.
85+
We can rewrite this as ``np.dot(c, b)`` being zero with ``c`` a 1-row
86+
constraint matrix containing the mean of each column of ``dm``.
87+
Absorbing this constraint in the *final* design matrix means that we rewrite the model
88+
in terms of *unconstrained* parameters (this is done through a QR-decomposition
89+
of the constraint matrix). Those unconstrained parameters have the property
90+
that when projected back into the initial parameters space (let's call ``b_back``
91+
the result of this projection), the constraint
92+
``np.dot(c, b_back)`` being zero is automatically verified.
93+
In patsy one can choose between no
94+
constraint, a centering constraint like *mgcv* (``'center'``) or a user provided
95+
constraint matrix.
96+
97+
Here are some illustrations of typical natural and cyclic spline bases:
98+
99+
.. ipython:: python
100+
101+
plt.title("Natural cubic regression spline basis example");
102+
y = dmatrix("cr(x, df=6) - 1", {"x": x})
103+
# Plot natural cubic regression spline basis functions (colored curves) each multiplied by its coeff
104+
plt.plot(x, y*b);
105+
@savefig basis-crspline.png align=center
106+
# Plot the spline itself (sum of the basis functions, thick black curve)
107+
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
108+
109+
110+
.. ipython:: python
111+
112+
plt.title("Cyclic cubic regression spline basis example");
113+
y = dmatrix("cc(x, df=6) - 1", {"x": x})
114+
# Plot cyclic cubic regression spline basis functions (colored curves) each multiplied by its coeff
115+
plt.plot(x, y*b);
116+
@savefig basis-ccspline.png align=center
117+
# Plot the spline itself (sum of the basis functions, thick black curve)
118+
plt.plot(x, np.dot(y, b), color='k', linewidth=3);
119+
120+
121+
In the following example we first set up our spline basis using same data as for
122+
the B-spline example above and then make predictions on a new set of data:
123+
124+
.. ipython:: python
125+
126+
design_matrix = dmatrix("cr(x, df=4, constraints='center')", data)
127+
new_design_matrix = build_design_matrices([design_matrix.design_info.builder], new_data)[0]
128+
new_design_matrix
129+
np.asarray(new_design_matrix)
130+
131+
Note that in the above example 5 knots are actually used to achieve 4 degrees
132+
of freedom since a centering constraint is requested.
133+
134+
Tensor product smooth
135+
---------------------
136+
137+
Smooth of several covariates can be generated through a tensor product of
138+
the bases of marginal univariate smooths. For these marginal smooths one can
139+
use the above defined splines as well as user defined smooths provided they
140+
actually transform input univariate data into some kind of smooth functions
141+
basis producing a 2-d array output with the ``(i, j)`` element corresponding
142+
to the value of the ``j`` th basis function at the ``i`` th data point.
143+
The tensor product stateful transform is called :func:`te`.
144+
145+
.. note::
146+
The implementation of this tensor product is compatible with *mgcv* when
147+
considering only cubic regression spline marginal smooths, which means that
148+
generated bases will match those produced by *mgcv*.
149+
Recall that we do not implement any kind of *mgcv*-compatible penalized
150+
fitting process.
151+
152+
In the following code we show an example of tensor product basis functions
153+
used to represent a smooth of two variables ``x1`` and ``x2``. Note how
154+
marginal spline bases patterns can be observed on the x and y contour projections:
155+
156+
.. ipython::
157+
158+
In [10]: from matplotlib import cm
159+
160+
In [20]: from mpl_toolkits.mplot3d.axes3d import Axes3D
161+
162+
In [30]: x1 = np.linspace(0., 1., 100)
163+
164+
In [40]: x2 = np.linspace(0., 1., 100)
165+
166+
In [50]: x1, x2 = np.meshgrid(x1, x2)
167+
168+
In [60]: df = 3
169+
170+
In [70]: y = dmatrix("te(cr(x1, df), cc(x2, df)) - 1",
171+
....: {"x1": x1.ravel(), "x2": x2.ravel(), "df": df})
172+
....:
173+
174+
In [80]: print y.shape
175+
176+
In [90]: fig = plt.figure()
177+
178+
In [100]: fig.suptitle("Tensor product basis example (2 covariates)");
179+
180+
In [110]: for i in xrange(df * df):
181+
.....: ax = fig.add_subplot(df, df, i + 1, projection='3d')
182+
.....: yi = y[:, i].reshape(x1.shape)
183+
.....: ax.plot_surface(x1, x2, yi, rstride=4, cstride=4, alpha=0.15)
184+
.....: ax.contour(x1, x2, yi, zdir='z', cmap=cm.coolwarm, offset=-0.5)
185+
.....: ax.contour(x1, x2, yi, zdir='y', cmap=cm.coolwarm, offset=1.2)
186+
.....: ax.contour(x1, x2, yi, zdir='x', cmap=cm.coolwarm, offset=-0.2)
187+
.....: ax.set_xlim3d(-0.2, 1.0)
188+
.....: ax.set_ylim3d(0, 1.2)
189+
.....: ax.set_zlim3d(-0.5, 1)
190+
.....: ax.set_xticks([0, 1])
191+
.....: ax.set_yticks([0, 1])
192+
.....: ax.set_zticks([-0.5, 0, 1])
193+
.....:
194+
195+
@savefig basis-tesmooth.png align=center
196+
In [120]: fig.tight_layout()
197+
198+
Following what we did for univariate splines in the preceding sections, we will
199+
now set up a 3-d smooth basis using some data and then make predictions on a
200+
new set of data:
201+
202+
.. ipython:: python
203+
204+
data = {"x1": np.linspace(0., 1., 100),
205+
"x2": np.linspace(0., 1., 100),
206+
"x3": np.linspace(0., 1., 100)}
207+
design_matrix = dmatrix("te(cr(x1, df=3), cr(x2, df=3), cc(x3, df=3), constraints='center')",
208+
data)
209+
new_data = {"x1": [0.1, 0.2],
210+
"x2": [0.2, 0.3],
211+
"x3": [0.3, 0.4]}
212+
new_design_matrix = build_design_matrices([design_matrix.design_info.builder], new_data)[0]
213+
new_design_matrix
214+
np.asarray(new_design_matrix)

patsy/__init__.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ def set_origin(self, origin):
6464
__all__ = ["PatsyError"]
6565

6666
# We make a rich API available for explicit use. To see what exactly is
67-
# exported, check each module's __all__, or import this moduel and look at its
67+
# exported, check each module's __all__, or import this module and look at its
6868
# __all__.
6969

7070
def _reexport(mod):
@@ -110,5 +110,8 @@ def _reexport(mod):
110110
import patsy.splines
111111
_reexport(patsy.splines)
112112

113+
import patsy.mgcv_cubic_splines
114+
_reexport(patsy.mgcv_cubic_splines)
115+
113116
# XX FIXME: we aren't exporting any of the explicit parsing interface
114117
# yet. Need to figure out how to do that.

patsy/builtins.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,9 @@
2222
from patsy.splines import bs
2323
__all__ += ["bs"]
2424

25+
from patsy.mgcv_cubic_splines import cr, cc, te
26+
__all__ += ["cr", "cc", "te"]
27+
2528
def I(x):
2629
"""The identity function. Simply returns its input unchanged.
2730

0 commit comments

Comments
 (0)