@@ -15,13 +15,14 @@ Patsy offers a set of specific stateful transforms (for more details about
 stateful transforms see :ref:`stateful-transforms`) that you can use in
 formulas to generate splines bases and express non-linear fits.
 
-B-splines
----------
+General B-splines
++-----------------
 
-B-spline bases can be generated with the :func:`bs` stateful transform.
+B-spline bases can be generated with the :func:`bs` stateful
+transform. The spline bases returned by :func:`bs` are designed to be
+compatible with those produced by the R ``bs`` function.
 The following code illustrates a typical basis and the resulting spline:
 
-
 .. ipython:: python
 
    import matplotlib.pyplot as plt
@@ -36,8 +37,6 @@ The following code illustrates a typical basis and the resulting spline:
    # Plot the spline itself (sum of the basis functions, thick black curve)
    plt.plot(x, np.dot(y, b), color='k', linewidth=3);
 
-
-
 In the following example we first set up our B-spline basis using some data and
 then make predictions on a new set of data:
 
@@ -50,15 +49,23 @@ then make predictions on a new set of data:
    build_design_matrices([design_matrix.design_info.builder], new_data)[0]
 
 
-Cubic regression splines
--------------------------
+:func:`bs` can produce B-spline bases of arbitrary degrees -- e.g.,
+``degree=0`` will produce piecewise-constant functions,
+``degree=1`` will produce piecewise-linear functions, and the default
+``degree=3`` produces cubic splines. The next section describes more
+specialized functions for producing different types of cubic splines.
+
+
+Natural and cyclic cubic regression splines
++-------------------------------------------
 
 Natural and cyclic cubic regression splines are provided through the stateful
 transforms :func:`cr` and :func:`cc` respectively. Here the spline is
 parameterized directly using its values at the knots. These splines were designed
 to be compatible with those found in the R package
 `mgcv <http://cran.r-project.org/web/packages/mgcv/index.html>`_
-(these are called *cr*, *cs* and *cc* in the context of *mgcv*).
+(these are called *cr*, *cs* and *cc* in the context of *mgcv*), but
+can be used with any model.
 
 .. warning::
    Note that the compatibility with *mgcv* applies only to the **generation of
@@ -68,32 +75,6 @@ to be compatible with those found in the R package
    predictions from a model previously fitted with *mgcv*, or to serve as
    building blocks for other regression models (like OLS).
 
-Note that the API is different from *mgcv*:
-
-* In patsy one can specify the number of degrees of freedom directly (actual number of
-  columns of the resulting design matrix) whereas in *mgcv* one has to specify
-  the number of knots to use. For instance, in the case of cyclic regression splines (with no
-  additional constraints) the actual degrees of freedom is the number of knots
-  minus one.
-* In patsy one can specify inner knots as well as lower and upper exterior knots
-  which can be useful for cyclic spline for instance.
-* In *mgcv* a centering/identifiability constraint is automatically computed and
-  absorbed in the resulting design matrix.
-  The purpose of this is to ensure that if ``b`` is the array of *initial* parameters
-  (corresponding to the *initial* unconstrained design matrix ``dm``), our
-  model is centered, ie ``np.mean(np.dot(dm, b))`` is zero.
-  We can rewrite this as ``np.dot(c, b)`` being zero with ``c`` a 1-row
-  constraint matrix containing the mean of each column of ``dm``.
-  Absorbing this constraint in the *final* design matrix means that we rewrite the model
-  in terms of *unconstrained* parameters (this is done through a QR-decomposition
-  of the constraint matrix). Those unconstrained parameters have the property
-  that when projected back into the initial parameters space (let's call ``b_back``
-  the result of this projection), the constraint
-  ``np.dot(c, b_back)`` being zero is automatically verified.
-  In patsy one can choose between no
-  constraint, a centering constraint like *mgcv* (``'center'``) or a user provided
-  constraint matrix.
-
 Here are some illustrations of typical natural and cyclic spline bases:
 
 .. ipython:: python
@@ -131,10 +112,37 @@ the B-spline example above and then make predictions on a new set of data:
 Note that in the above example 5 knots are actually used to achieve 4 degrees
 of freedom since a centering constraint is requested.
 
-Tensor product smooth
-----------------------
+Note that the API is different from *mgcv*:
 
-Smooth of several covariates can be generated through a tensor product of
+* In patsy one can specify the number of degrees of freedom directly (actual number of
+  columns of the resulting design matrix) whereas in *mgcv* one has to specify
+  the number of knots to use. For instance, in the case of cyclic regression splines (with no
+  additional constraints) the actual degrees of freedom is the number of knots
+  minus one.
+* In patsy one can specify inner knots as well as lower and upper exterior knots
+  which can be useful for cyclic spline for instance.
+* In *mgcv* a centering/identifiability constraint is automatically computed and
+  absorbed in the resulting design matrix.
+  The purpose of this is to ensure that if ``b`` is the array of *initial* parameters
+  (corresponding to the *initial* unconstrained design matrix ``dm``), our
+  model is centered, ie ``np.mean(np.dot(dm, b))`` is zero.
+  We can rewrite this as ``np.dot(c, b)`` being zero with ``c`` a 1-row
+  constraint matrix containing the mean of each column of ``dm``.
+  Absorbing this constraint in the *final* design matrix means that we rewrite the model
+  in terms of *unconstrained* parameters (this is done through a QR-decomposition
+  of the constraint matrix). Those unconstrained parameters have the property
+  that when projected back into the initial parameters space (let's call ``b_back``
+  the result of this projection), the constraint
+  ``np.dot(c, b_back)`` being zero is automatically verified.
+  In patsy one can choose between no
+  constraint, a centering constraint like *mgcv* (``'center'``) or a user provided
+  constraint matrix.
+
+
+Tensor product smooths
++----------------------
+
+Smooths of several covariates can be generated through a tensor product of
 the bases of marginal univariate smooths. For these marginal smooths one can
 use the above defined splines as well as user defined smooths provided they
 actually transform input univariate data into some kind of smooth functions
@@ -211,4 +219,4 @@ new set of data:
                "x3": [0.3, 0.4]}
    new_design_matrix = build_design_matrices([design_matrix.design_info.builder], new_data)[0]
    new_design_matrix
-   np.asarray(new_design_matrix)
+   np.asarray(new_design_matrix)
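
The relocated bullet on the centering constraint describes absorbing a 1-row constraint into the design matrix through a QR decomposition. The snippet below is a minimal NumPy-only sketch of that idea, not patsy's or *mgcv*'s actual implementation. The random matrix `dm` stands in for an unconstrained spline basis, and the names `z` and `b_free` are hypothetical helpers introduced for illustration; only `dm`, `c` and `b_back` come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an unconstrained design matrix (assumption: 50 rows, 5 columns).
dm = rng.normal(size=(50, 5))

# 1-row constraint matrix containing the mean of each column of dm.
c = dm.mean(axis=0, keepdims=True)            # shape (1, 5)

# QR-decompose the transposed constraint. With mode="complete", Q is 5x5;
# its first column is parallel to c, so the remaining columns span the
# null space of c.
q, _ = np.linalg.qr(c.T, mode="complete")
z = q[:, c.shape[0]:]                         # shape (5, 4)

# "Final" design matrix with the constraint absorbed: one column fewer.
dm_constrained = dm @ z                       # shape (50, 4)

# Any unconstrained parameter vector, projected back to the initial space,
# satisfies the constraint automatically.
b_free = rng.normal(size=z.shape[1])
b_back = z @ b_free

print(np.allclose(c @ b_back, 0.0))
print(np.isclose(np.mean(dm @ b_back), 0.0))
```

Under these assumptions the constrained matrix loses exactly one column, which is consistent with the note in the diff that 5 knots yield 4 degrees of freedom when a centering constraint is requested.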