Commit 7f16809

zoyahav authored and tf-transform-team committed
Updating documentation
PiperOrigin-RevId: 240353137
1 parent b148036 commit 7f16809

21 files changed

Lines changed: 1010 additions & 8 deletions

docs/api_docs/python/_toc.yaml

Lines changed: 24 additions & 0 deletions
@@ -8,6 +8,8 @@ toc:
 path: /tfx/transform/api_docs/python/tft/apply_analyzer
 - title: apply_buckets
 path: /tfx/transform/api_docs/python/tft/apply_buckets
+- title: apply_buckets_with_interpolation
+path: /tfx/transform/api_docs/python/tft/apply_buckets_with_interpolation
 - title: apply_function
 path: /tfx/transform/api_docs/python/tft/apply_function
 - title: apply_function_with_checkpoint
@@ -112,3 +114,25 @@ toc:
 path: /tfx/transform/api_docs/python/tft_beam/WriteMetadata
 - title: WriteTransformFn
 path: /tfx/transform/api_docs/python/tft_beam/WriteTransformFn
+- title: tft_beam.analyzer_cache
+section:
+- title: Overview
+path: /tfx/transform/api_docs/python/tft_beam/analyzer_cache
+- title: make_cache_entry_key
+path: /tfx/transform/api_docs/python/tft_beam/analyzer_cache/make_cache_entry_key
+- title: make_dataset_key
+path: /tfx/transform/api_docs/python/tft_beam/analyzer_cache/make_dataset_key
+- title: ReadAnalysisCacheFromFS
+path: /tfx/transform/api_docs/python/tft_beam/analyzer_cache/ReadAnalysisCacheFromFS
+- title: validate_dataset_keys
+path: /tfx/transform/api_docs/python/tft_beam/analyzer_cache/validate_dataset_keys
+- title: WriteAnalysisCacheToFS
+path: /tfx/transform/api_docs/python/tft_beam/analyzer_cache/WriteAnalysisCacheToFS
+- title: tft_beam.info_theory
+section:
+- title: Overview
+path: /tfx/transform/api_docs/python/tft_beam/info_theory
+- title: calculate_partial_expected_mutual_information
+path: /tfx/transform/api_docs/python/tft_beam/info_theory/calculate_partial_expected_mutual_information
+- title: calculate_partial_mutual_information
+path: /tfx/transform/api_docs/python/tft_beam/info_theory/calculate_partial_mutual_information

docs/api_docs/python/index.md

Lines changed: 12 additions & 1 deletion
@@ -9,6 +9,7 @@
 * <a href="./tft/TFTransformOutput.md"><code>tft.TFTransformOutput</code></a>
 * <a href="./tft/apply_analyzer.md"><code>tft.apply_analyzer</code></a>
 * <a href="./tft/apply_buckets.md"><code>tft.apply_buckets</code></a>
+* <a href="./tft/apply_buckets_with_interpolation.md"><code>tft.apply_buckets_with_interpolation</code></a>
 * <a href="./tft/apply_function.md"><code>tft.apply_function</code></a>
 * <a href="./tft/apply_function_with_checkpoint.md"><code>tft.apply_function_with_checkpoint</code></a>
 * <a href="./tft/apply_pyfunc.md"><code>tft.apply_pyfunc</code></a>
@@ -52,4 +53,14 @@
 * <a href="./tft_beam/ReadTransformFn.md"><code>tft_beam.ReadTransformFn</code></a>
 * <a href="./tft_beam/TransformDataset.md"><code>tft_beam.TransformDataset</code></a>
 * <a href="./tft_beam/WriteMetadata.md"><code>tft_beam.WriteMetadata</code></a>
-* <a href="./tft_beam/WriteTransformFn.md"><code>tft_beam.WriteTransformFn</code></a>
+* <a href="./tft_beam/WriteTransformFn.md"><code>tft_beam.WriteTransformFn</code></a>
+* <a href="./tft_beam/analyzer_cache.md"><code>tft_beam.analyzer_cache</code></a>
+* <a href="./tft_beam/analyzer_cache/ReadAnalysisCacheFromFS.md"><code>tft_beam.analyzer_cache.ReadAnalysisCacheFromFS</code></a>
+* <a href="./tft_beam/analyzer_cache/WriteAnalysisCacheToFS.md"><code>tft_beam.analyzer_cache.WriteAnalysisCacheToFS</code></a>
+* <a href="./tft_beam/analyzer_cache/make_cache_entry_key.md"><code>tft_beam.analyzer_cache.make_cache_entry_key</code></a>
+* <a href="./tft_beam/analyzer_cache/make_dataset_key.md"><code>tft_beam.analyzer_cache.make_dataset_key</code></a>
+* <a href="./tft_beam/analyzer_cache/validate_dataset_keys.md"><code>tft_beam.analyzer_cache.validate_dataset_keys</code></a>
+* <a href="./tft_beam/info_theory.md"><code>tft_beam.info_theory</code></a>
+* <a href="./tft_beam/info_theory/calculate_partial_expected_mutual_information.md"><code>tft_beam.info_theory.calculate_partial_expected_mutual_information</code></a>
+* <a href="./tft_beam/info_theory/calculate_partial_mutual_information.md"><code>tft_beam.info_theory.calculate_partial_mutual_information</code></a>
+* <a href="./tft_beam/info_theory/math.md"><code>tft_beam.info_theory.math</code></a>

docs/api_docs/python/tft.md

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ Init module for TF.Transform.

 [`apply_buckets(...)`](./tft/apply_buckets.md): Returns a bucketized column, with a bucket index assigned to each input.

+[`apply_buckets_with_interpolation(...)`](./tft/apply_buckets_with_interpolation.md): Interpolates within the provided buckets and then normalizes to 0 to 1.
+
 [`apply_function(...)`](./tft/apply_function.md): Deprecated function, equivalent to fn(*args). (deprecated)

 [`apply_function_with_checkpoint(...)`](./tft/apply_function_with_checkpoint.md): Applies a tensor-in-tensor-out function with variables to some `Tensor`s.
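
As a rough illustration of the `apply_buckets_with_interpolation` summary added in this file, the sketch below maps a single number to [0, 1] by locating its bucket and interpolating linearly inside it. It is plain Python, not the TensorFlow Transform implementation; the helper name `interpolate_within_buckets` and the choice to give every bucket an equal share of the output range are assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate_within_buckets(value, boundaries):
    """Map `value` to [0, 1] using sorted `boundaries` (hypothetical helper)."""
    if len(boundaries) < 2:
        raise ValueError("need at least two boundaries to form a bucket")
    # Values at or beyond the outer boundaries clamp to the ends of the range.
    if value <= boundaries[0]:
        return 0.0
    if value >= boundaries[-1]:
        return 1.0
    num_buckets = len(boundaries) - 1
    # Index i such that boundaries[i] <= value < boundaries[i + 1].
    i = bisect_right(boundaries, value) - 1
    lower, upper = boundaries[i], boundaries[i + 1]
    # Position inside the bucket, as a fraction between 0 and 1.
    fraction = (value - lower) / (upper - lower)
    # Assumption: each bucket covers an equal 1 / num_buckets slice of [0, 1].
    return (i + fraction) / num_buckets
```

With boundaries `[0, 10, 20, 40]`, the input 5 lands halfway through the first of three buckets and maps to roughly 0.167, while inputs at or beyond the outer boundaries clamp to 0 and 1.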

docs/api_docs/python/tft/MeanAndVarCombiner.md

Lines changed: 13 additions & 0 deletions
@@ -4,6 +4,7 @@
 <meta itemprop="property" content="accumulator_coder"/>
 <meta itemprop="property" content="__init__"/>
 <meta itemprop="property" content="add_input"/>
+<meta itemprop="property" content="compute_running_update"/>
 <meta itemprop="property" content="create_accumulator"/>
 <meta itemprop="property" content="extract_output"/>
 <meta itemprop="property" content="merge_accumulators"/>
@@ -62,6 +63,18 @@ Composes an accumulator from batch_values and calls merge_accumulators.

 A `_MeanAndVarAccumulator` which is accumulator and batch_values combined.

+<h3 id="compute_running_update"><code>compute_running_update</code></h3>
+
+``` python
+compute_running_update(
+total_count,
+current_count,
+update
+)
+```
+
+Numerically stable way of computing a streaming batched update.
+
 <h3 id="create_accumulator"><code>create_accumulator</code></h3>

 ``` python

docs/api_docs/python/tft/apply_buckets.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Returns a bucketized column, with a bucket index assigned to each input.
 * <b>`x`</b>: A numeric input `Tensor` or `SparseTensor` whose values should be mapped
 to buckets. For `SparseTensor`s, the non-missing values will be mapped
 to buckets and missing value left missing.
-* <b>`bucket_boundaries`</b>: The bucket boundaries represented as a rank 1 `Tensor`.
+* <b>`bucket_boundaries`</b>: The bucket boundaries represented as a rank 2 `Tensor`.
 * <b>`name`</b>: (Optional) A name for this operation.


docs/api_docs/python/tft/apply_buckets_with_interpolation.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+<div itemscope itemtype="http://developers.google.com/ReferenceObject">
+<meta itemprop="name" content="tft.apply_buckets_with_interpolation" />
+<meta itemprop="path" content="Stable" />
+</div>
+
+# tft.apply_buckets_with_interpolation
+
+``` python
+tft.apply_buckets_with_interpolation(
+x,
+bucket_boundaries,
+name=None
+)
+```
+
+Interpolates within the provided buckets and then normalizes to 0 to 1.
+
+A method for normalizing continuous numeric data to the range [0, 1].
+Numeric values are first bucketized according to the provided boundaries, then
+linearly interpolated within their respective bucket ranges. Finally, the
+interpolated values are normalized to the range [0, 1]. Values that are
+less than or equal to the lowest boundary, or greater than or equal to the
+highest boundary, will be mapped to 0 and 1 respectively.
+
+#### Args:
+
+* <b>`x`</b>: A numeric input `Tensor` (tf.float32, tf.float64, tf.int32, tf.int64).
+* <b>`bucket_boundaries`</b>: Sorted bucket boundaries as a rank-2 `Tensor`.
+* <b>`name`</b>: (Optional) A name for this operation.
+
+
+#### Returns:
+
+A `Tensor` of the same shape as `x`, normalized to the range [0, 1]. If the
+input x is tf.float64, the returned values will be tf.float64.
+Otherwise, returned values are tf.float32.

docs/api_docs/python/tft/apply_vocabulary.md

Lines changed: 3 additions & 3 deletions
@@ -28,9 +28,9 @@ files. This behavior will likely be fixed/improved in the future.

 #### Args:

-* <b>`x`</b>: A `Tensor` or `SparseTensor` of type tf.string to which the vocabulary
-transformation should be applied.
-The column names are those intended for the transformed tensors.
+* <b>`x`</b>: A categorical `Tensor` or `SparseTensor` of type tf.string or
+tf.int[8|16|32|64] to which the vocabulary transformation should be
+applied. The column names are those intended for the transformed tensors.
 * <b>`deferred_vocab_filename_tensor`</b>: The deferred vocab filename tensor as
 returned by <a href="../tft/vocabulary.md"><code>tft.vocabulary</code></a>.
 * <b>`default_value`</b>: The value to use for out-of-vocabulary values, unless

docs/api_docs/python/tft/compute_and_apply_vocabulary.md

Lines changed: 6 additions & 1 deletion
@@ -20,6 +20,7 @@ tft.compute_and_apply_vocabulary(
 coverage_top_k=None,
 coverage_frequency_threshold=None,
 key_fn=None,
+fingerprint_shuffle=False,
 name=None
 )
 ```
@@ -37,7 +38,7 @@ operation.

 #### Args:

-* <b>`x`</b>: A `Tensor` or `SparseTensor` of type tf.string.
+* <b>`x`</b>: A `Tensor` or `SparseTensor` of type tf.string or tf.int[8|16|32|64].
 * <b>`default_value`</b>: The value to use for out-of-vocabulary values, unless
 'num_oov_buckets' is greater than zero.
 * <b>`top_k`</b>: Limit the generated vocabulary to the first `top_k` elements. If set
@@ -73,6 +74,10 @@ operation.
 * <b>`key_fn`</b>: (Optional), (Experimental) A fn that takes in a single entry of `x`
 and returns the corresponding key for coverage calculation. If this is
 `None`, no coverage arm is added to the vocabulary.
+* <b>`fingerprint_shuffle`</b>: (Optional), (Experimental) Whether to sort the
+vocabularies by fingerprint instead of counts. This is useful for load
+balancing on the training parameter servers. Shuffle only happens while
+writing the files, so all the filters above will still take effect.
 * <b>`name`</b>: (Optional) A name for this operation.


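The `fingerprint_shuffle` description above says filtering still happens before the reordering. A minimal sketch of that ordering of steps follows, in plain Python rather than TensorFlow Transform. The function names, the use of MD5 as the fingerprint, and the `>=` comparison for `frequency_threshold` are all assumptions for illustration; the source does not say which fingerprint function the library uses.

```python
import hashlib
from collections import Counter


def fingerprint(token):
    """Stand-in fingerprint (assumption): hex MD5 digest of the token."""
    return hashlib.md5(token.encode("utf-8")).hexdigest()


def build_vocabulary(tokens, top_k=None, frequency_threshold=None,
                     fingerprint_shuffle=False):
    """Hypothetical helper: filter by count first, reorder only at the end."""
    counts = Counter(tokens)
    entries = [
        (token, count) for token, count in counts.items()
        if frequency_threshold is None or count >= frequency_threshold
    ]
    # Decreasing frequency, ties broken by reverse lexicographical order.
    entries.sort(key=lambda entry: (entry[1], entry[0]), reverse=True)
    if top_k is not None:
        entries = entries[:top_k]
    if fingerprint_shuffle:
        # Reorder the already-filtered entries; membership is unchanged.
        entries.sort(key=lambda entry: fingerprint(entry[0]))
    return [token for token, _ in entries]
```

Because the reordering is the last step, turning `fingerprint_shuffle` on changes only the order of the surviving entries, never which entries survive `top_k` or `frequency_threshold`.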
docs/api_docs/python/tft/sparse_tensor_to_dense_with_shape.md

Lines changed: 4 additions & 1 deletion
@@ -8,7 +8,8 @@
 ``` python
 tft.sparse_tensor_to_dense_with_shape(
 x,
-shape
+shape,
+default_value=0
 )
 ```

@@ -18,6 +19,8 @@ Converts a `SparseTensor` into a dense tensor and sets its shape.

 * <b>`x`</b>: A `SparseTensor`.
 * <b>`shape`</b>: The desired shape of the densified `Tensor`.
+* <b>`default_value`</b>: (Optional) Value to set for indices not specified. Defaults
+to zero.


 #### Returns:

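To make the new `default_value` argument concrete, here is a small sketch of densifying a sparse representation in plain Python. It does not use TensorFlow; the helper name `sparse_to_dense_2d`, the restriction to two dimensions, and the `(row, column)` index format are assumptions chosen to keep the example short.

```python
def sparse_to_dense_2d(indices, values, shape, default_value=0):
    """Hypothetical helper: build a dense 2-D list from sparse entries."""
    rows, cols = shape
    # Every cell starts at default_value; only listed indices are overwritten.
    dense = [[default_value] * cols for _ in range(rows)]
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense
```

For example, `sparse_to_dense_2d([(0, 1), (2, 0)], [7, 9], (3, 2), default_value=-1)` fills the four unspecified cells with -1 instead of zero.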
docs/api_docs/python/tft/vocabulary.md

Lines changed: 12 additions & 1 deletion
@@ -19,6 +19,7 @@ tft.vocabulary(
 coverage_top_k=None,
 coverage_frequency_threshold=None,
 key_fn=None,
+fingerprint_shuffle=False,
 name=None
 )
 ```
@@ -33,6 +34,10 @@ In case one of the tokens contains the '\n' or '\r' characters or is empty it
 will be discarded since we are currently writing the vocabularies as text
 files. This behavior will likely be fixed/improved in the future.

+If an integer `Tensor` is provided, its semantic type should be categorical
+not a continuous/numeric, since computing a vocabulary over a continuous
+feature is not appropriate.
+
 The unique values are sorted by decreasing frequency and then reverse
 lexicographical order (e.g. [('a', 5), ('c', 3), ('b', 3)]).

@@ -64,7 +69,8 @@ within each vocabulary entry (b/117796748).

 #### Args:

-* <b>`x`</b>: An input `Tensor` or `SparseTensor` with dtype tf.string.
+* <b>`x`</b>: A categorical/discrete input `Tensor` or `SparseTensor` with dtype
+tf.string or tf.int[8|16|32|64].
 * <b>`top_k`</b>: Limit the generated vocabulary to the first `top_k` elements. If set
 to None, the full vocabulary is generated.
 * <b>`frequency_threshold`</b>: Limit the generated vocabulary only to elements whose
@@ -98,6 +104,11 @@ within each vocabulary entry (b/117796748).
 * <b>`key_fn`</b>: (Optional), (Experimental) A fn that takes in a single entry of `x`
 and returns the corresponding key for coverage calculation. If this is
 `None`, no coverage arm is added to the vocabulary.
+* <b>`fingerprint_shuffle`</b>: (Optional), (Experimental) Whether to sort the
+vocabularies by fingerprint instead of counts. This is useful for load
+balancing on the training parameter servers. Shuffle only happens while
+writing the files, so all the filters above (top_k, frequency_threshold,
+etc) will still take effect.
 * <b>`name`</b>: (Optional) A name for this operation.

