@@ -76,7 +76,7 @@ within each vocabulary entry (b/117796748).
  * <b>`frequency_threshold`</b>: Limit the generated vocabulary only to elements whose
  absolute frequency is >= to the supplied threshold. If set to None, the
  full vocabulary is generated. Absolute frequency means the number of
- occurences of the element in the dataset, as opposed to the proportion of
+ occurrences of the element in the dataset, as opposed to the proportion of
  instances that contain that element.
  * <b>`vocab_filename`</b>: The file name for the vocabulary file. If none, the
  "uniques" scope name in the context of this graph will be used as the file
@@ -90,12 +90,16 @@ within each vocabulary entry (b/117796748).
  will be of the form 'frequency word'.
  * <b>`weights`</b>: (Optional) Weights `Tensor` for the vocabulary. It must have the
  same shape as x.
- * <b>`labels`</b>: (Optional) Labels `Tensor` for the vocabulary. It must have dtype
- int64, have values 0 or 1, and have the same shape as x.
- * <b>`use_adjusted_mutual_info`</b>: If true, use adjusted mutual information.
- * <b>`min_diff_from_avg`</b>: Mutual information of a feature will be adjusted to zero
- whenever the difference between count of the feature with any label and
- its expected count is lower than min_diff_from_average.
+ * <b>`labels`</b>: (Optional) Labels `Tensor` for the vocabulary. It must have the same
+ shape as x and be a discrete integerized tensor (If the label is numeric,
+ it should first be bucketized; If the label is a string, an integer
+ vocabulary should first be applied).
+ * <b>`use_adjusted_mutual_info`</b>: If true, and labels are provided, calculate
+ vocabulary using adjusted rather than raw mutual information.
+ * <b>`min_diff_from_avg`</b>: MI (or AMI) of a feature x label will be adjusted to zero
+ whenever the difference between count and the expected (average) count is
+ lower than min_diff_from_average. This can be thought of as a regularizing
+ parameter that pushes small MI/AMI values to zero.
  * <b>`coverage_top_k`</b>: (Optional), (Experimental) The minimum number of elements
  per key to be included in the vocabulary.
  * <b>`coverage_frequency_threshold`</b>: (Optional), (Experimental) Limit the coverage
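For readers of this change, the following is a minimal pure-Python sketch of two ideas in the docstring: thresholding on absolute frequency (as opposed to the proportion of instances containing an element), and zeroing a score when the observed count is close to its expected count. It does not call the library. All function names here are hypothetical, the tie-breaking order is arbitrary, and the use of an independence-based expected count with an absolute difference is an assumption, not something the docstring confirms.

```python
from collections import Counter


def absolute_frequencies(instances):
    """Count every occurrence of each token across all instances."""
    counts = Counter()
    for tokens in instances:
        counts.update(tokens)
    return counts


def instance_proportions(instances):
    """Fraction of instances that contain each token at least once."""
    containing = Counter()
    for tokens in instances:
        containing.update(set(tokens))
    return {tok: n / len(instances) for tok, n in containing.items()}


def filter_vocabulary(instances, frequency_threshold=None):
    """Keep tokens whose absolute frequency is >= frequency_threshold.

    With frequency_threshold=None the full vocabulary is returned.
    Ordering (descending count, then token) is chosen only for determinism.
    """
    counts = absolute_frequencies(instances)
    if frequency_threshold is None:
        kept = dict(counts)
    else:
        kept = {t: c for t, c in counts.items() if c >= frequency_threshold}
    return sorted(kept, key=lambda t: (-kept[t], t))


def zero_small_deviations(score, joint_count, token_count, label_count,
                          total, min_diff_from_avg):
    """Return 0.0 when the observed count is close to the expected count.

    ASSUMPTION: the expected count is taken under independence
    (token_count * label_count / total) and compared by absolute difference.
    """
    expected = token_count * label_count / total
    if abs(joint_count - expected) < min_diff_from_avg:
        return 0.0
    return score


instances = [["a", "a", "b"], ["a", "c"], ["b"], ["d"]]
vocab = filter_vocabulary(instances, frequency_threshold=2)
proportions = instance_proportions(instances)
```

In this toy data, "a" occurs three times but appears in only two of the four instances, which is the distinction the `frequency_threshold` entry draws between absolute frequency and proportion.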