```
tft.pca(
    x, output_dim, dtype, name=None
)
```

Computes PCA on the dataset using biased covariance.

The PCA analyzer computes `output_dim` orthonormal vectors that capture the
directions/axes corresponding to the highest variances in the input vectors of
`x`. The output vectors are returned as a rank-2 tensor with shape
`(input_dim, output_dim)`, where the 0th dimension holds the components of each
output vector and the 1st dimension indexes the output vectors, which represent
orthogonal directions in the input space, sorted in order of decreasing
variance.

The output rank-2 tensor (matrix) serves a useful transform purpose. Formally,
the matrix can be used downstream in the transform step by multiplying it with
the input tensor `x`. This transform reduces the dimension of input vectors to
`output_dim` in a way that retains the maximal variance.

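The transform step can be illustrated with a small NumPy sketch. Here `pca_matrix` is a hypothetical stand-in for the `(input_dim, output_dim)` matrix this analyzer would return, not actual output from `tft.pca`:

```python
import numpy as np

# Hypothetical stand-in for the (input_dim, output_dim) matrix returned by
# the analyzer; this one simply keeps the first two input axes.
input_dim, output_dim = 3, 2
pca_matrix = np.eye(input_dim)[:, :output_dim]

# Five example input vectors, each of dimension input_dim.
x = np.arange(15, dtype=float).reshape(5, input_dim)

# The transform step: multiply the input by the matrix to reduce each
# vector from input_dim to output_dim components.
reduced = x @ pca_matrix  # shape (5, 2)
```

With the real learned matrix, the retained `output_dim` axes are the directions of maximal variance rather than the first two coordinates.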
NOTE: To properly use PCA, input vector components should be converted to
similar units of measurement such that the vectors represent a Euclidean
space. If no such conversion is available (e.g. one element represents time,
another element distance), the canonical approach is to first apply a
transformation to the input data to normalize numerical variances, i.e.
`tft.scale_to_z_score()`. Normalization allows PCA to choose output axes that
help decorrelate input axes.

Below are a couple of intuitive examples of PCA.

Consider a simple 2-dimensional example:

Input `x` is a series of vectors `[e, e]` where `e` is Gaussian with mean 0,
variance 1. The two components are perfectly correlated, and the resulting
covariance matrix is

```
[[1 1],
 [1 1]].
```

Applying PCA with `output_dim = 1` would discover the first principal
component `[1 / sqrt(2), 1 / sqrt(2)]`. When multiplied with the original
example, each vector `[e, e]` would be mapped to a scalar `sqrt(2) * e`. The
second principal component would be `[-1 / sqrt(2), 1 / sqrt(2)]` and would
map `[e, e]` to 0, which indicates that the second component captures no
variance at all. This agrees with our intuition, since we know that the two
axes in the input are perfectly correlated and can be fully explained by a
single scalar `e`.

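This 2-dimensional example can be checked numerically. A minimal NumPy sketch (not part of the tft API) that draws vectors `[e, e]` and recovers the principal components from the biased covariance matrix:

```python
import numpy as np

# Draw samples of [e, e] where e is standard Gaussian.
rng = np.random.default_rng(0)
e = rng.standard_normal(10000)
x = np.stack([e, e], axis=1)            # shape (10000, 2)

# Biased covariance (the mean is ~0, so centering is omitted here).
cov = (x.T @ x) / len(x)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The largest-variance direction is the last column: ~[1/sqrt(2), 1/sqrt(2)]
# (up to sign). The other eigenvalue is ~0: no variance in that direction.
pc1 = eigvecs[:, -1]
projected = x @ pc1                     # maps each [e, e] to ~sqrt(2) * e
```

The near-zero second eigenvalue is exactly the "second component captures no variance" observation above.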
Consider a 3-dimensional example:

Input `x` is a series of vectors `[a, a, b]`, where `a` is a zero-mean,
unit-variance Gaussian and `b` is a zero-mean, variance-4 Gaussian independent
of `a`. The first principal component of the unnormalized vector would be
`[0, 0, 1]`, since `b` has a much larger variance than any linear combination
of the first two components. This would map `[a, a, b]` onto `b`, asserting
that the axis with the highest energy is the third component. While this may
be the desired output if `a` and `b` correspond to the same units, it is not
statistically desirable when the units are irreconcilable. In such a case, one
should first normalize each component to unit variance, i.e. `b := b / 2`. The
first principal component of the normalized vector would yield
`[1 / sqrt(2), 1 / sqrt(2), 0]`, and would map `[a, a, b]` to `sqrt(2) * a`.
The second component would be `[0, 0, 1]` and map `[a, a, b]` to `b`. As can
be seen, the benefit of normalization is that PCA captures the highly
correlated components first and collapses them into a lower dimension.

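The effect of normalization in the 3-dimensional example can also be checked with a small NumPy sketch (again outside the tft API): before scaling, the top principal component is dominated by `b`; after scaling each component to unit variance, it aligns with the correlated pair.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(100000)
b = 2.0 * rng.standard_normal(100000)   # variance 4, independent of a
x = np.stack([a, a, b], axis=1)

def top_component(data):
    cov = (data.T @ data) / len(data)   # biased covariance (mean is ~0)
    _, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    return vecs[:, -1]                  # direction of largest variance

pc_raw = top_component(x)               # ~[0, 0, 1] up to sign

x_norm = x / x.std(axis=0)              # per-component unit variance, b := b / 2
pc_norm = top_component(x_norm)         # ~[1/sqrt(2), 1/sqrt(2), 0] up to sign
```
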
#### Args:
