Commit bc0c3b6

tf-transform-team authored and zoyahav committed
Project import generated by Copybara.
PiperOrigin-RevId: 215135118
1 parent 98e2f71 commit bc0c3b6

68 files changed

Lines changed: 5474 additions & 313 deletions


RELEASE.md

Lines changed: 14 additions & 5 deletions
@@ -3,15 +3,24 @@
 ## Major Features and Improvements

 ## Bug Fixes and Other Changes
-* 'tft.vocabulary' and 'tft.compute_and_apply_vocabulary' now support filtering
-  based on mutual information when `labels` is provided.
-* Export all package level exports of `tensorflow_transform`, from the
-  `tensorflow_transform.beam` subpackage. This allows users to just import the
-  `tensorflow_transform.beam` subpackage for all functionality.
+
+* 'tft.vocabulary' and 'tft.compute_and_apply_vocabulary' now support
+  filtering based on mutual information when `labels` is provided.
+* Export all package level exports of `tensorflow_transform`, from the
+  `tensorflow_transform.beam` subpackage. This allows users to just import the
+  `tensorflow_transform.beam` subpackage for all functionality.
+* Adding API docs

 ## Breaking changes

 ## Deprecations
+* All functions in `tensorflow_transform.saved.input_fn_maker` are deprecated.
+  See the examples for how to construct the `input_fn` for training and serving.
+  Note that the examples demonstrate the use of the `tf.estimator` API. The
+  functions named \*\_serving\_input\_fn were for use with the
+  `tf.contrib.estimator` API which is now deprecated. We do not provide
+  examples of usage of the `tf.contrib.estimator` API, instead users should
+  upgrade to the `tf.estimator` API.

 # Release 0.9.0

docs/_toc.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
toc:
2+
- title: Get Started
3+
path: /tfx/transform/get_started
4+
5+
- heading: Examples
6+
- title: Simple example
7+
path: https://github.com/tensorflow/transform/blob/master/examples/simple_example.py
8+
status: external
9+
- title: Census income
10+
path: https://github.com/tensorflow/transform/blob/master/examples/census_example.py
11+
status: external
12+
- title: Sentiment analysis
13+
path: https://github.com/tensorflow/transform/blob/master/examples/sentiment.md
14+
status: external
15+
- title: Chicago Taxi (end-to-end)
16+
path: https://github.com/tensorflow/model-analysis/tree/master/examples/chicago_taxi
17+
status: external

docs/api_docs/python/_toc.yaml

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
# Automatically generated file; please do not edit
2+
toc:
3+
- title: tft
4+
section:
5+
- title: Overview
6+
path: /tfx/transform/api_docs/python/tft
7+
- title: Analyzer
8+
path: /tfx/transform/api_docs/python/tft/Analyzer
9+
- title: apply_buckets
10+
path: /tfx/transform/api_docs/python/tft/apply_buckets
11+
- title: apply_combiner
12+
path: /tfx/transform/api_docs/python/tft/apply_combiner
13+
- title: apply_function
14+
path: /tfx/transform/api_docs/python/tft/apply_function
15+
- title: apply_function_with_checkpoint
16+
path: /tfx/transform/api_docs/python/tft/apply_function_with_checkpoint
17+
- title: apply_saved_model
18+
path: /tfx/transform/api_docs/python/tft/apply_saved_model
19+
- title: apply_vocab
20+
path: /tfx/transform/api_docs/python/tft/apply_vocab
21+
- title: apply_vocabulary
22+
path: /tfx/transform/api_docs/python/tft/apply_vocabulary
23+
- title: bucketize
24+
path: /tfx/transform/api_docs/python/tft/bucketize
25+
- title: bucketize_per_key
26+
path: /tfx/transform/api_docs/python/tft/bucketize_per_key
27+
- title: compute_and_apply_vocabulary
28+
path: /tfx/transform/api_docs/python/tft/compute_and_apply_vocabulary
29+
- title: covariance
30+
path: /tfx/transform/api_docs/python/tft/covariance
31+
- title: CovarianceCombiner
32+
path: /tfx/transform/api_docs/python/tft/CovarianceCombiner
33+
- title: hash_strings
34+
path: /tfx/transform/api_docs/python/tft/hash_strings
35+
- title: max
36+
path: /tfx/transform/api_docs/python/tft/max
37+
- title: mean
38+
path: /tfx/transform/api_docs/python/tft/mean
39+
- title: MeanAndVarCombiner
40+
path: /tfx/transform/api_docs/python/tft/MeanAndVarCombiner
41+
- title: min
42+
path: /tfx/transform/api_docs/python/tft/min
43+
- title: ngrams
44+
path: /tfx/transform/api_docs/python/tft/ngrams
45+
- title: NumPyCombiner
46+
path: /tfx/transform/api_docs/python/tft/NumPyCombiner
47+
- title: pca
48+
path: /tfx/transform/api_docs/python/tft/pca
49+
- title: PCACombiner
50+
path: /tfx/transform/api_docs/python/tft/PCACombiner
51+
- title: quantiles
52+
path: /tfx/transform/api_docs/python/tft/quantiles
53+
- title: QuantilesCombiner
54+
path: /tfx/transform/api_docs/python/tft/QuantilesCombiner
55+
- title: sanitized_vocab_filename
56+
path: /tfx/transform/api_docs/python/tft/sanitized_vocab_filename
57+
- title: scale_by_min_max
58+
path: /tfx/transform/api_docs/python/tft/scale_by_min_max
59+
- title: scale_to_0_1
60+
path: /tfx/transform/api_docs/python/tft/scale_to_0_1
61+
- title: scale_to_z_score
62+
path: /tfx/transform/api_docs/python/tft/scale_to_z_score
63+
- title: segment_indices
64+
path: /tfx/transform/api_docs/python/tft/segment_indices
65+
- title: size
66+
path: /tfx/transform/api_docs/python/tft/size
67+
- title: sparse_tensor_to_dense_with_shape
68+
path: /tfx/transform/api_docs/python/tft/sparse_tensor_to_dense_with_shape
69+
- title: string_to_int
70+
path: /tfx/transform/api_docs/python/tft/string_to_int
71+
- title: sum
72+
path: /tfx/transform/api_docs/python/tft/sum
73+
- title: tfidf
74+
path: /tfx/transform/api_docs/python/tft/tfidf
75+
- title: TFTransformOutput
76+
path: /tfx/transform/api_docs/python/tft/TFTransformOutput
77+
- title: uniques
78+
path: /tfx/transform/api_docs/python/tft/uniques
79+
- title: var
80+
path: /tfx/transform/api_docs/python/tft/var
81+
- title: vocabulary
82+
path: /tfx/transform/api_docs/python/tft/vocabulary
83+
- title: tft.coders
84+
section:
85+
- title: Overview
86+
path: /tfx/transform/api_docs/python/tft/coders
87+
- title: CsvCoder
88+
path: /tfx/transform/api_docs/python/tft/coders/CsvCoder
89+
- title: ExampleProtoCoder
90+
path: /tfx/transform/api_docs/python/tft/coders/ExampleProtoCoder
91+
- title: tft_beam
92+
section:
93+
- title: Overview
94+
path: /tfx/transform/api_docs/python/tft_beam
95+
- title: AnalyzeAndTransformDataset
96+
path: /tfx/transform/api_docs/python/tft_beam/AnalyzeAndTransformDataset
97+
- title: AnalyzeDataset
98+
path: /tfx/transform/api_docs/python/tft_beam/AnalyzeDataset
99+
- title: Context
100+
path: /tfx/transform/api_docs/python/tft_beam/Context
101+
- title: ReadTransformFn
102+
path: /tfx/transform/api_docs/python/tft_beam/ReadTransformFn
103+
- title: TransformDataset
104+
path: /tfx/transform/api_docs/python/tft_beam/TransformDataset
105+
- title: WriteMetadata
106+
path: /tfx/transform/api_docs/python/tft_beam/WriteMetadata
107+
- title: WriteTransformFn
108+
path: /tfx/transform/api_docs/python/tft_beam/WriteTransformFn

docs/api_docs/python/index.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
# All symbols in TensorFlow Transform
2+
3+
* <a href="./tft.md"><code>tft</code></a>
4+
* <a href="./tft/Analyzer.md"><code>tft.Analyzer</code></a>
5+
* <a href="./tft/CovarianceCombiner.md"><code>tft.CovarianceCombiner</code></a>
6+
* <a href="./tft/MeanAndVarCombiner.md"><code>tft.MeanAndVarCombiner</code></a>
7+
* <a href="./tft/NumPyCombiner.md"><code>tft.NumPyCombiner</code></a>
8+
* <a href="./tft/PCACombiner.md"><code>tft.PCACombiner</code></a>
9+
* <a href="./tft/QuantilesCombiner.md"><code>tft.QuantilesCombiner</code></a>
10+
* <a href="./tft/TFTransformOutput.md"><code>tft.TFTransformOutput</code></a>
11+
* <a href="./tft/apply_buckets.md"><code>tft.apply_buckets</code></a>
12+
* <a href="./tft/apply_combiner.md"><code>tft.apply_combiner</code></a>
13+
* <a href="./tft/apply_function.md"><code>tft.apply_function</code></a>
14+
* <a href="./tft/apply_function_with_checkpoint.md"><code>tft.apply_function_with_checkpoint</code></a>
15+
* <a href="./tft/apply_saved_model.md"><code>tft.apply_saved_model</code></a>
16+
* <a href="./tft/apply_vocab.md"><code>tft.apply_vocab</code></a>
17+
* <a href="./tft/apply_vocabulary.md"><code>tft.apply_vocabulary</code></a>
18+
* <a href="./tft/bucketize.md"><code>tft.bucketize</code></a>
19+
* <a href="./tft/bucketize_per_key.md"><code>tft.bucketize_per_key</code></a>
20+
* <a href="./tft/coders.md"><code>tft.coders</code></a>
21+
* <a href="./tft/coders/CsvCoder.md"><code>tft.coders.CsvCoder</code></a>
22+
* <a href="./tft/coders/ExampleProtoCoder.md"><code>tft.coders.ExampleProtoCoder</code></a>
23+
* <a href="./tft/compute_and_apply_vocabulary.md"><code>tft.compute_and_apply_vocabulary</code></a>
24+
* <a href="./tft/covariance.md"><code>tft.covariance</code></a>
25+
* <a href="./tft/hash_strings.md"><code>tft.hash_strings</code></a>
26+
* <a href="./tft/max.md"><code>tft.max</code></a>
27+
* <a href="./tft/mean.md"><code>tft.mean</code></a>
28+
* <a href="./tft/min.md"><code>tft.min</code></a>
29+
* <a href="./tft/ngrams.md"><code>tft.ngrams</code></a>
30+
* <a href="./tft/pca.md"><code>tft.pca</code></a>
31+
* <a href="./tft/quantiles.md"><code>tft.quantiles</code></a>
32+
* <a href="./tft/sanitized_vocab_filename.md"><code>tft.sanitized_vocab_filename</code></a>
33+
* <a href="./tft/scale_by_min_max.md"><code>tft.scale_by_min_max</code></a>
34+
* <a href="./tft/scale_to_0_1.md"><code>tft.scale_to_0_1</code></a>
35+
* <a href="./tft/scale_to_z_score.md"><code>tft.scale_to_z_score</code></a>
36+
* <a href="./tft/segment_indices.md"><code>tft.segment_indices</code></a>
37+
* <a href="./tft/size.md"><code>tft.size</code></a>
38+
* <a href="./tft/sparse_tensor_to_dense_with_shape.md"><code>tft.sparse_tensor_to_dense_with_shape</code></a>
39+
* <a href="./tft/string_to_int.md"><code>tft.string_to_int</code></a>
40+
* <a href="./tft/sum.md"><code>tft.sum</code></a>
41+
* <a href="./tft/tfidf.md"><code>tft.tfidf</code></a>
42+
* <a href="./tft/uniques.md"><code>tft.uniques</code></a>
43+
* <a href="./tft/var.md"><code>tft.var</code></a>
44+
* <a href="./tft/vocabulary.md"><code>tft.vocabulary</code></a>
45+
* <a href="./tft_beam.md"><code>tft_beam</code></a>
46+
* <a href="./tft_beam/AnalyzeAndTransformDataset.md"><code>tft_beam.AnalyzeAndTransformDataset</code></a>
47+
* <a href="./tft_beam/AnalyzeDataset.md"><code>tft_beam.AnalyzeDataset</code></a>
48+
* <a href="./tft_beam/Context.md"><code>tft_beam.Context</code></a>
49+
* <a href="./tft_beam/ReadTransformFn.md"><code>tft_beam.ReadTransformFn</code></a>
50+
* <a href="./tft_beam/TransformDataset.md"><code>tft_beam.TransformDataset</code></a>
51+
* <a href="./tft_beam/WriteMetadata.md"><code>tft_beam.WriteMetadata</code></a>
52+
* <a href="./tft_beam/WriteTransformFn.md"><code>tft_beam.WriteTransformFn</code></a>

docs/api_docs/python/tft.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
1+
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
2+
<meta itemprop="name" content="tft" />
3+
<meta itemprop="path" content="Stable" />
4+
<meta itemprop="property" content="ANALYZER_COLLECTION"/>
5+
<meta itemprop="property" content="VOCAB_FILENAME_PREFIX"/>
6+
<meta itemprop="property" content="VOCAB_FREQUENCY_FILENAME_PREFIX"/>
7+
</div>
8+
9+
# Module: tft
10+
11+
Init module for TF.Transform.
12+
13+
## Modules
14+
15+
[`coders`](./tft/coders.md) module: Module level imports for tensorflow_transform.coders.
16+
17+
## Classes
18+
19+
[`class Analyzer`](./tft/Analyzer.md): A class representing computation that will be done by Beam.
20+
21+
[`class CovarianceCombiner`](./tft/CovarianceCombiner.md): Combines the PCollection to compute the biased covariance matrix.
22+
23+
[`class MeanAndVarCombiner`](./tft/MeanAndVarCombiner.md): Combines a PCollection of accumulators to compute mean and variance.
24+
25+
[`class NumPyCombiner`](./tft/NumPyCombiner.md): Combines the PCollection only on the 0th dimension using nparray.
26+
27+
[`class PCACombiner`](./tft/PCACombiner.md): Compute PCA of accumulated data using the biased covariance matrix.
28+
29+
[`class QuantilesCombiner`](./tft/QuantilesCombiner.md): Computes quantiles on the PCollection.
30+
31+
[`class TFTransformOutput`](./tft/TFTransformOutput.md): A wrapper around the output of the tf.Transform.
32+
33+
## Functions
34+
35+
[`apply_buckets(...)`](./tft/apply_buckets.md): Returns a bucketized column, with a bucket index assigned to each input.
36+
37+
[`apply_combiner(...)`](./tft/apply_combiner.md): Applies the combiner over the whole dataset.
38+
39+
[`apply_function(...)`](./tft/apply_function.md): Deprecated function, equivalent to fn(*args). (deprecated)
40+
41+
[`apply_function_with_checkpoint(...)`](./tft/apply_function_with_checkpoint.md): Applies a tensor-in-tensor-out function with variables to some `Tensor`s.
42+
43+
[`apply_saved_model(...)`](./tft/apply_saved_model.md): Applies a SavedModel to some `Tensor`s.
44+
45+
[`apply_vocab(...)`](./tft/apply_vocab.md): See <a href="./tft/apply_vocabulary.md"><code>tft.apply_vocabulary</code></a>. (deprecated)
46+
47+
[`apply_vocabulary(...)`](./tft/apply_vocabulary.md): Maps `x` to a vocabulary specified by the deferred tensor.
48+
49+
[`bucketize(...)`](./tft/bucketize.md): Returns a bucketized column, with a bucket index assigned to each input.
50+
51+
[`bucketize_per_key(...)`](./tft/bucketize_per_key.md): Returns a bucketized column, with a bucket index assigned to each input.
52+
53+
[`compute_and_apply_vocabulary(...)`](./tft/compute_and_apply_vocabulary.md): Generates a vocabulary for `x` and maps it to an integer with this vocab.
54+
55+
[`covariance(...)`](./tft/covariance.md): Computes the covariance matrix over the whole dataset.
56+
57+
[`hash_strings(...)`](./tft/hash_strings.md): Hash strings into buckets.
58+
59+
[`max(...)`](./tft/max.md): Computes the maximum of the values of a `Tensor` over the whole dataset.
60+
61+
[`mean(...)`](./tft/mean.md): Computes the mean of the values of a `Tensor` over the whole dataset.
62+
63+
[`min(...)`](./tft/min.md): Computes the minimum of the values of a `Tensor` over the whole dataset.
64+
65+
[`ngrams(...)`](./tft/ngrams.md): Create a `SparseTensor` of n-grams.
66+
67+
[`pca(...)`](./tft/pca.md): Computes pca on the dataset using biased covariance.
68+
69+
[`quantiles(...)`](./tft/quantiles.md): Computes the quantile boundaries of a `Tensor` over the whole dataset.
70+
71+
[`sanitized_vocab_filename(...)`](./tft/sanitized_vocab_filename.md): Generates a sanitized filename either from the given filename or the scope.
72+
73+
[`scale_by_min_max(...)`](./tft/scale_by_min_max.md): Scale a numerical column into the range [output_min, output_max].
74+
75+
[`scale_to_0_1(...)`](./tft/scale_to_0_1.md): Returns a column which is the input column scaled to have range [0,1].
76+
77+
[`scale_to_z_score(...)`](./tft/scale_to_z_score.md): Returns a standardized column with mean 0 and variance 1.
78+
79+
[`segment_indices(...)`](./tft/segment_indices.md): Returns a `Tensor` of indices within each segment.
80+
81+
[`size(...)`](./tft/size.md): Computes the total size of instances in a `Tensor` over the whole dataset.
82+
83+
[`sparse_tensor_to_dense_with_shape(...)`](./tft/sparse_tensor_to_dense_with_shape.md): Converts a `SparseTensor` into a dense tensor and sets its shape.
84+
85+
[`string_to_int(...)`](./tft/string_to_int.md): See <a href="./tft/compute_and_apply_vocabulary.md"><code>tft.compute_and_apply_vocabulary</code></a>. (deprecated)
86+
87+
[`sum(...)`](./tft/sum.md): Computes the sum of the values of a `Tensor` over the whole dataset.
88+
89+
[`tfidf(...)`](./tft/tfidf.md): Maps the terms in x to their term frequency * inverse document frequency.
90+
91+
[`uniques(...)`](./tft/uniques.md): See <a href="./tft/vocabulary.md"><code>tft.vocabulary</code></a>. (deprecated)
92+
93+
[`var(...)`](./tft/var.md): Computes the variance of the values of a `Tensor` over the whole dataset.
94+
95+
[`vocabulary(...)`](./tft/vocabulary.md): Computes the unique values of a `Tensor` over the whole dataset.
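
As a rough illustration of what the two scaling summaries above describe, here is a plain-Python sketch. It is not the `tft` implementation: the `sketch_*` helper names are hypothetical, the input is assumed to be a non-constant list of floats, and the z-score is assumed to use the population (biased) variance so that the result has variance exactly 1.

```python
import math
from typing import List


def sketch_scale_to_0_1(values: List[float]) -> List[float]:
    """Linearly map values so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    span = hi - lo  # assumed non-zero: the column must not be constant
    return [(v - lo) / span for v in values]


def sketch_scale_to_z_score(values: List[float]) -> List[float]:
    """Standardize values to mean 0 and (population) variance 1."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: divide by n, not n - 1 (an assumption of this sketch).
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


column = [2.0, 4.0, 6.0]
print(sketch_scale_to_0_1(column))      # [0.0, 0.5, 1.0]
print(sketch_scale_to_z_score(column))  # middle element is 0.0; the outer two are equal and opposite
```

Both helpers need the minimum, maximum, mean or variance of the whole column before any single value can be scaled, which is why the summaries above describe these statistics as computed "over the whole dataset".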
96+
97+
## Other Members
98+
99+
<h3 id="ANALYZER_COLLECTION"><code>ANALYZER_COLLECTION</code></h3>
100+
101+
<h3 id="VOCAB_FILENAME_PREFIX"><code>VOCAB_FILENAME_PREFIX</code></h3>
102+
103+
<h3 id="VOCAB_FREQUENCY_FILENAME_PREFIX"><code>VOCAB_FREQUENCY_FILENAME_PREFIX</code></h3>
104+
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
2+
<meta itemprop="name" content="tft.Analyzer" />
3+
<meta itemprop="path" content="Stable" />
4+
<meta itemprop="property" content="attributes"/>
5+
<meta itemprop="property" content="control_inputs"/>
6+
<meta itemprop="property" content="inputs"/>
7+
<meta itemprop="property" content="outputs"/>
8+
<meta itemprop="property" content="__init__"/>
9+
</div>
10+
11+
# tft.Analyzer
12+
13+
## Class `Analyzer`
14+
15+
16+
17+
A class representing computation that will be done by Beam.
18+
19+
An Analyzer is like a tf.Operation except that it requires computation over
20+
the full dataset. E.g. sum(my_tensor) will compute the sum of the value of
21+
my_tensor over all instances in the dataset. The Analyzer class contains the
22+
inputs to this computation, and placeholders which will later be converted to
23+
constants during a call to AnalyzeDataset.
24+
25+
Analyzer implementations write some files to disk in a temporary location and
26+
return tensors that contain the filename. These outputs must be added to the
27+
tf.GraphKeys.ASSET_FILEPATHS collection. Doing so will ensure a few things
28+
happen:
29+
* the tensor will be removed from the collection prior to writing the
30+
SavedModel (since the tensor will be replaced)
31+
* when the tensor is replaced, the replacement will be added to the
32+
tf.GraphKeys.ASSET_FILEPATHS collection
33+
* This in turn causes the underlying file to be added to the SavedModel's
34+
`assets` directory when the model is saved
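
The two-phase idea described above (a full pass over the dataset, whose result is later treated as a constant) can be sketched in plain Python. This is a conceptual illustration only, not the `Analyzer` class or any tf.Transform API; `analyze_sum` and `make_transform` are hypothetical names, and the in-memory list stands in for a dataset that Beam would process.

```python
from typing import Callable, Iterable, List


def analyze_sum(dataset: Iterable[float]) -> float:
    """Analysis phase: requires one full pass over the dataset."""
    total = 0.0
    for value in dataset:
        total += value
    return total


def make_transform(dataset_sum: float) -> Callable[[float], float]:
    """Transform phase: the analysis result is baked in as a fixed constant."""
    def transform(value: float) -> float:
        # Works on one instance at a time; no further pass over the data is needed.
        return value / dataset_sum
    return transform


dataset: List[float] = [1.0, 2.0, 5.0]
fraction_of_total = make_transform(analyze_sum(dataset))
print([fraction_of_total(v) for v in dataset])  # [0.125, 0.25, 0.625]
```

In this sketch the closure plays the role the description assigns to placeholders that are converted to constants: once the analysis has run, the per-instance function no longer depends on the rest of the dataset.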
35+
36+
#### Args:
37+
38+
* <b>`inputs`</b>: The `Tensor`s that are used to create inputs to this analyzer,
39+
* <b>`outputs`</b>: The `Tensor`s whose values will be replaced by the result of the
40+
analyzer.
41+
* <b>`attributes`</b>: An object that will be used to determine how the analyzer is
42+
implemented by Beam.
43+
44+
<h2 id="__init__"><code>__init__</code></h2>
45+
46+
``` python
47+
__init__(
48+
inputs,
49+
outputs,
50+
attributes
51+
)
52+
```
53+
54+
55+
56+
57+
58+
## Properties
59+
60+
<h3 id="attributes"><code>attributes</code></h3>
61+
62+
63+
64+
<h3 id="control_inputs"><code>control_inputs</code></h3>
65+
66+
67+
68+
<h3 id="inputs"><code>inputs</code></h3>
69+
70+
71+
72+
<h3 id="outputs"><code>outputs</code></h3>
73+
74+
75+
76+
77+
