{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mt9dL5dIir8X"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "ufPx7EiCiqgR"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ucMoYase6URl"
      },
      "source": [
        "# Load and preprocess images"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_Wwu5SXZmEkB"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/load_data/images\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/load_data/images.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/load_data/images.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/load_data/images.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Oxw4WahM7DU9"
      },
      "source": [
        "This tutorial shows how to load and preprocess an image dataset in three ways:\n",
        "\n",
        "- First, you will use high-level Keras preprocessing utilities (such as `tf.keras.utils.image_dataset_from_directory`) and layers (such as `tf.keras.layers.Rescaling`) to read a directory of images on disk.\n",
        "- Next, you will write your own input pipeline from scratch [using tf.data](../../guide/data.ipynb).\n",
        "- Finally, you will download a dataset from the large [catalog](https://www.tensorflow.org/datasets/catalog/overview) available in [TensorFlow Datasets](https://www.tensorflow.org/datasets)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hoQQiZDB6URn"
      },
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3vhAMaIOBIee"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import os\n",
        "import PIL\n",
        "import PIL.Image\n",
        "import tensorflow as tf\n",
        "import tensorflow_datasets as tfds"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Qnp9Z2sT5dWj"
      },
      "outputs": [],
      "source": [
        "print(tf.__version__)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wO0InzL66URu"
      },
      "source": [
        "### Download the flowers dataset\n",
        "\n",
        "This tutorial uses a dataset of several thousand photos of flowers. The flowers dataset contains five sub-directories, one per class:\n",
        "\n",
        "```\n",
        "flowers_photos/\n",
        "  daisy/\n",
        "  dandelion/\n",
        "  roses/\n",
        "  sunflowers/\n",
        "  tulips/\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ju2yXtdV5YaT"
      },
      "source": [
        "Note: all images are licensed CC-BY, creators are listed in the LICENSE.txt file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rN-Pc6Zd6awg"
      },
      "outputs": [],
      "source": [
        "import pathlib\n",
        "dataset_url = \"https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz\"\n",
        "archive = tf.keras.utils.get_file(origin=dataset_url, extract=True)\n",
        "data_dir = pathlib.Path(archive) / 'flower_photos'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rFkFK74oO--g"
      },
      "source": [
        "After downloading (218MB), you should now have a copy of the flower photos available. There are 3,670 total images:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QhewYCxhXQBX"
      },
      "outputs": [],
      "source": [
        "image_count = len(list(data_dir.glob('*/*.jpg')))\n",
        "print(image_count)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZUFusk44d9GW"
      },
      "source": [
        "Each directory contains images of that type of flower. Here are some roses:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "crs7ZjEp60Ot"
      },
      "outputs": [],
      "source": [
        "roses = list(data_dir.glob('roses/*'))\n",
        "PIL.Image.open(str(roses[0]))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oV9PtjdKKWyI"
      },
      "outputs": [],
      "source": [
        "roses = list(data_dir.glob('roses/*'))\n",
        "PIL.Image.open(str(roses[1]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9_kge08gSCan"
      },
      "source": [
        "## Load data using a Keras utility\n",
        "\n",
        "Let's load these images off disk using the helpful `tf.keras.utils.image_dataset_from_directory` utility."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6jobDTUs8Wxu"
      },
      "source": [
        "### Create a dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lAmtzsnjDNhB"
      },
      "source": [
        "Define some parameters for the loader:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qJdpyqK541ty"
      },
      "outputs": [],
      "source": [
        "batch_size = 32\n",
        "img_height = 180\n",
        "img_width = 180"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ehhW308g8soJ"
      },
      "source": [
        "It's good practice to use a validation split when developing your model. You will use 80% of the images for training and 20% for validation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "chqakIP14PDm"
      },
      "outputs": [],
      "source": [
        "train_ds = tf.keras.utils.image_dataset_from_directory(\n",
        "  data_dir,\n",
        "  validation_split=0.2,\n",
        "  subset=\"training\",\n",
        "  seed=123,\n",
        "  image_size=(img_height, img_width),\n",
        "  batch_size=batch_size)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pb2Af2lsUShk"
      },
      "outputs": [],
      "source": [
        "val_ds = tf.keras.utils.image_dataset_from_directory(\n",
        "  data_dir,\n",
        "  validation_split=0.2,\n",
        "  subset=\"validation\",\n",
        "  seed=123,\n",
        "  image_size=(img_height, img_width),\n",
        "  batch_size=batch_size)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ug3ITsz0b_cF"
      },
      "source": [
        "You can find the class names in the `class_names` attribute on these datasets."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "R7z2yKt7VDPJ"
      },
      "outputs": [],
      "source": [
        "class_names = train_ds.class_names\n",
        "print(class_names)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bK6CQCqIctCd"
      },
      "source": [
        "### Visualize the data\n",
        "\n",
        "Here are the first nine images from the training dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "AAY3LJN28Kuy"
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "\n",
        "plt.figure(figsize=(10, 10))\n",
        "for images, labels in train_ds.take(1):\n",
        "  for i in range(9):\n",
        "    ax = plt.subplot(3, 3, i + 1)\n",
        "    plt.imshow(images[i].numpy().astype(\"uint8\"))\n",
        "    plt.title(class_names[labels[i]])\n",
        "    plt.axis(\"off\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jUI0fr7igPtA"
      },
      "source": [
        "You can train a model using these datasets by passing them to `model.fit` (shown later in this tutorial). If you like, you can also manually iterate over the dataset and retrieve batches of images:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BdPHeHXt9sjA"
      },
      "outputs": [],
      "source": [
        "for image_batch, labels_batch in train_ds:\n",
        "  print(image_batch.shape)\n",
        "  print(labels_batch.shape)\n",
        "  break"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2ZgIZeXaDUsF"
      },
      "source": [
        "The `image_batch` is a tensor of the shape `(32, 180, 180, 3)`. This is a batch of 32 images of shape `180x180x3` (the last dimension refers to color channels RGB). The `label_batch` is a tensor of the shape `(32,)`, these are corresponding labels to the 32 images.\n",
        "\n",
        "You can call `.numpy()` on either of these tensors to convert them to a `numpy.ndarray`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ybl6a2YCg1rV"
      },
      "source": [
        "### Standardize the data\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IdogGjM2K6OU"
      },
      "source": [
        "The RGB channel values are in the `[0, 255]` range. This is not ideal for a neural network; in general you should seek to make your input values small.\n",
        "\n",
        "Here, you will standardize values to be in the `[0, 1]` range by using `tf.keras.layers.Rescaling`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "16yNdZXdExyM"
      },
      "outputs": [],
      "source": [
        "normalization_layer = tf.keras.layers.Rescaling(1./255)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nd0_enkb8uxZ"
      },
      "source": [
        "There are two ways to use this layer. You can apply it to the dataset by calling `Dataset.map`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QgOnza-U_z5Y"
      },
      "outputs": [],
      "source": [
        "normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))\n",
        "image_batch, labels_batch = next(iter(normalized_ds))\n",
        "first_image = image_batch[0]\n",
        "# Notice the pixel values are now in `[0,1]`.\n",
        "print(np.min(first_image), np.max(first_image))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z39nXayj9ioS"
      },
      "source": [
        "Or, you can include the layer inside your model definition to simplify deployment. You will use the second approach here."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hXLd3wMpDIkp"
      },
      "source": [
        "Note: If you would like to scale pixel values to `[-1,1]` you can instead write `tf.keras.layers.Rescaling(1./127.5, offset=-1)`"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LeNWVa8qRBGm"
      },
      "source": [
        "Note: You previously resized images using the `image_size` argument of `tf.keras.utils.image_dataset_from_directory`. If you want to include the resizing logic in your model as well, you can use the `tf.keras.layers.Resizing` layer."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ti8avTlLofoJ"
      },
      "source": [
        "### Configure the dataset for performance\n",
        "\n",
        "Let's make sure to use buffered prefetching so you can yield data from disk without having I/O become blocking. These are two important methods you should use when loading data:\n",
        "\n",
        "- `Dataset.cache` keeps the images in memory after they're loaded off disk during the first epoch. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache.\n",
        "- `Dataset.prefetch` overlaps data preprocessing and model execution while training.\n",
        "\n",
        "Interested readers can learn more about both methods, as well as how to cache data to disk in the *Prefetching* section of the [Better performance with the tf.data API](../../guide/data_performance.ipynb) guide."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ea3kbMe-pGDw"
      },
      "outputs": [],
      "source": [
        "AUTOTUNE = tf.data.AUTOTUNE\n",
        "\n",
        "train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)\n",
        "val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XqHjIr6cplwY"
      },
      "source": [
        "### Train a model\n",
        "\n",
        "For completeness, you will show how to train a simple model using the datasets you have just prepared.\n",
        "\n",
        "The [Sequential](https://www.tensorflow.org/guide/keras/sequential_model) model consists of three convolution blocks (`tf.keras.layers.Conv2D`) with a max pooling layer (`tf.keras.layers.MaxPooling2D`) in each of them. There's a fully-connected layer (`tf.keras.layers.Dense`) with 128 units on top of it that is activated by a ReLU activation function (`'relu'`). This model has not been tuned in any way—the goal is to show you the mechanics using the datasets you just created. To learn more about image classification, visit the [Image classification](../images/classification.ipynb) tutorial."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LdR0BzCcqxw0"
      },
      "outputs": [],
      "source": [
        "num_classes = 5\n",
        "\n",
        "model = tf.keras.Sequential([\n",
        "  tf.keras.layers.Rescaling(1./255),\n",
        "  tf.keras.layers.Conv2D(32, 3, activation='relu'),\n",
        "  tf.keras.layers.MaxPooling2D(),\n",
        "  tf.keras.layers.Conv2D(32, 3, activation='relu'),\n",
        "  tf.keras.layers.MaxPooling2D(),\n",
        "  tf.keras.layers.Conv2D(32, 3, activation='relu'),\n",
        "  tf.keras.layers.MaxPooling2D(),\n",
        "  tf.keras.layers.Flatten(),\n",
        "  tf.keras.layers.Dense(128, activation='relu'),\n",
        "  tf.keras.layers.Dense(num_classes)\n",
        "])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d83f5aa7f3fb"
      },
      "source": [
        "Choose the `tf.keras.optimizers.Adam` optimizer and `tf.keras.losses.SparseCategoricalCrossentropy` loss function. To view training and validation accuracy for each training epoch, pass the `metrics` argument to `Model.compile`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "t_BlmsnmsEr4"
      },
      "outputs": [],
      "source": [
        "model.compile(\n",
        "  optimizer='adam',\n",
        "  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "  metrics=['accuracy'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ffwd44ldNMOE"
      },
      "source": [
        "Note: You will only train for a few epochs so this tutorial runs quickly. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "S08ZKKODsnGW"
      },
      "outputs": [],
      "source": [
        "model.fit(\n",
        "  train_ds,\n",
        "  validation_data=val_ds,\n",
        "  epochs=3\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MEtT9YGjSAOK"
      },
      "source": [
        "Note: You can also write a custom training loop instead of using `Model.fit`. To learn more, visit the [Writing a training loop from scratch](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch) tutorial."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BaW4wx5L7hrZ"
      },
      "source": [
        "You may notice the validation accuracy is low compared to the training accuracy, indicating your model is overfitting. You can learn more about overfitting and how to reduce it in this [tutorial](https://www.tensorflow.org/tutorials/keras/overfit_and_underfit)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AxS1cLzM8mEp"
      },
      "source": [
        "## Using tf.data for finer control"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ylj9fgkamgWZ"
      },
      "source": [
        "The above Keras preprocessing utility—`tf.keras.utils.image_dataset_from_directory`—is a convenient way to create a `tf.data.Dataset` from a directory of images.\n",
        "\n",
        "For finer grain control, you can write your own input pipeline using `tf.data`. This section shows how to do just that, beginning with the file paths from the TGZ file you downloaded earlier."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lAkQp5uxoINu"
      },
      "outputs": [],
      "source": [
        "list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'), shuffle=False)\n",
        "list_ds = list_ds.shuffle(image_count, reshuffle_each_iteration=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "coORvEH-NGwc"
      },
      "outputs": [],
      "source": [
        "for f in list_ds.take(5):\n",
        "  print(f.numpy())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6NLQ_VJhWO4z"
      },
      "source": [
        "The tree structure of the files can be used to compile a `class_names` list."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uRPHzDGhKACK"
      },
      "outputs": [],
      "source": [
        "class_names = np.array(sorted([item.name for item in data_dir.glob('*') if item.name != \"LICENSE.txt\"]))\n",
        "print(class_names)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CiptrWmAlmAa"
      },
      "source": [
        "Split the dataset into training and validation sets:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "GWHNPzXclpVr"
      },
      "outputs": [],
      "source": [
        "val_size = int(image_count * 0.2)\n",
        "train_ds = list_ds.skip(val_size)\n",
        "val_ds = list_ds.take(val_size)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rkB-IR4-pS3U"
      },
      "source": [
        "You can print the length of each dataset as follows:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SiKQrb9ppS-7"
      },
      "outputs": [],
      "source": [
        "print(tf.data.experimental.cardinality(train_ds).numpy())\n",
        "print(tf.data.experimental.cardinality(val_ds).numpy())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "91CPfUUJ_8SZ"
      },
      "source": [
        "Write a short function that converts a file path to an `(img, label)` pair:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "arSQzIey-4D4"
      },
      "outputs": [],
      "source": [
        "def get_label(file_path):\n",
        "  # Convert the path to a list of path components\n",
        "  parts = tf.strings.split(file_path, os.path.sep)\n",
        "  # The second to last is the class-directory\n",
        "  one_hot = parts[-2] == class_names\n",
        "  # Integer encode the label\n",
        "  return tf.argmax(one_hot)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MGlq4IP4Aktb"
      },
      "outputs": [],
      "source": [
        "def decode_img(img):\n",
        "  # Convert the compressed string to a 3D uint8 tensor\n",
        "  img = tf.io.decode_jpeg(img, channels=3)\n",
        "  # Resize the image to the desired size\n",
        "  return tf.image.resize(img, [img_height, img_width])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-xhBRgvNqRRe"
      },
      "outputs": [],
      "source": [
        "def process_path(file_path):\n",
        "  label = get_label(file_path)\n",
        "  # Load the raw data from the file as a string\n",
        "  img = tf.io.read_file(file_path)\n",
        "  img = decode_img(img)\n",
        "  return img, label"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S9a5GpsUOBx8"
      },
      "source": [
        "Use `Dataset.map` to create a dataset of `image, label` pairs:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3SDhbo8lOBQv"
      },
      "outputs": [],
      "source": [
        "# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.\n",
        "train_ds = train_ds.map(process_path, num_parallel_calls=AUTOTUNE)\n",
        "val_ds = val_ds.map(process_path, num_parallel_calls=AUTOTUNE)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kxrl0lGdnpRz"
      },
      "outputs": [],
      "source": [
        "for image, label in train_ds.take(1):\n",
        "  print(\"Image shape: \", image.numpy().shape)\n",
        "  print(\"Label: \", label.numpy())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vYGCgJuR_9Qp"
      },
      "source": [
        "### Configure dataset for performance"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wwZavzgsIytz"
      },
      "source": [
        "To train a model with this dataset you will want the data:\n",
        "\n",
        "* To be well shuffled.\n",
        "* To be batched.\n",
        "* Batches to be available as soon as possible.\n",
        "\n",
        "These features can be added using the `tf.data` API. For more details, visit the [Input Pipeline Performance](../../guide/performance/datasets.ipynb) guide."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uZmZJx8ePw_5"
      },
      "outputs": [],
      "source": [
        "def configure_for_performance(ds):\n",
        "  ds = ds.cache()\n",
        "  ds = ds.shuffle(buffer_size=1000)\n",
        "  ds = ds.batch(batch_size)\n",
        "  ds = ds.prefetch(buffer_size=AUTOTUNE)\n",
        "  return ds\n",
        "\n",
        "train_ds = configure_for_performance(train_ds)\n",
        "val_ds = configure_for_performance(val_ds)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "45P7OvzRWzOB"
      },
      "source": [
        "### Visualize the data\n",
        "\n",
        "You can visualize this dataset similarly to the one you created previously:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UN_Dnl72YNIj"
      },
      "outputs": [],
      "source": [
        "image_batch, label_batch = next(iter(train_ds))\n",
        "\n",
        "plt.figure(figsize=(10, 10))\n",
        "for i in range(9):\n",
        "  ax = plt.subplot(3, 3, i + 1)\n",
        "  plt.imshow(image_batch[i].numpy().astype(\"uint8\"))\n",
        "  label = label_batch[i]\n",
        "  plt.title(class_names[label])\n",
        "  plt.axis(\"off\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fMT8kh_uXPRU"
      },
      "source": [
        "### Continue training the model\n",
        "\n",
        "You have now manually built a similar `tf.data.Dataset` to the one created by `tf.keras.utils.image_dataset_from_directory` above. You can continue training the model with it. As before, you will train for just a few epochs to keep the running time short."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Vm_bi7NKXOzW"
      },
      "outputs": [],
      "source": [
        "model.fit(\n",
        "  train_ds,\n",
        "  validation_data=val_ds,\n",
        "  epochs=3\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EDJXAexrwsx8"
      },
      "source": [
        "## Using TensorFlow Datasets\n",
        "\n",
        "So far, this tutorial has focused on loading data off disk. You can also find a dataset to use by exploring the large [catalog](https://www.tensorflow.org/datasets/catalog/overview) of easy-to-download datasets at [TensorFlow Datasets](https://www.tensorflow.org/datasets).\n",
        "\n",
        "As you have previously loaded the Flowers dataset off disk, let's now import it with TensorFlow Datasets."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Qyu9wWDf1gfH"
      },
      "source": [
        "Download the Flowers [dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers) using TensorFlow Datasets:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NTQ-53DNwv8o"
      },
      "outputs": [],
      "source": [
        "(train_ds, val_ds, test_ds), metadata = tfds.load(\n",
        "    'tf_flowers',\n",
        "    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],\n",
        "    with_info=True,\n",
        "    as_supervised=True,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3hxXSgtj1iLV"
      },
      "source": [
        "The flowers dataset has five classes:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kJvt6qzF1i4L"
      },
      "outputs": [],
      "source": [
        "num_classes = metadata.features['label'].num_classes\n",
        "print(num_classes)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6dbvEz_F1lgE"
      },
      "source": [
        " Retrieve an image from the dataset:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1lF3IUAO1ogi"
      },
      "outputs": [],
      "source": [
        "get_label_name = metadata.features['label'].int2str\n",
        "\n",
        "image, label = next(iter(train_ds))\n",
        "_ = plt.imshow(image)\n",
        "_ = plt.title(get_label_name(label))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lHOOH_4TwaUb"
      },
      "source": [
        "As before, remember to batch, shuffle, and configure the training, validation, and test sets for performance:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "AMV6GtZiwfGP"
      },
      "outputs": [],
      "source": [
        "train_ds = configure_for_performance(train_ds)\n",
        "val_ds = configure_for_performance(val_ds)\n",
        "test_ds = configure_for_performance(test_ds)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gmR7kT8l1w20"
      },
      "source": [
        "You can find a complete example of working with the Flowers dataset and TensorFlow Datasets by visiting the [Data augmentation](../images/data_augmentation.ipynb) tutorial."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6cqkPenZIaHl"
      },
      "source": [
        "## Next steps\n",
        "\n",
        "This tutorial showed two ways of loading images off disk. First, you learned how to load and preprocess an image dataset using Keras preprocessing layers and utilities. Next, you learned how to write an input pipeline from scratch using `tf.data`. Finally, you learned how to download a dataset from TensorFlow Datasets.\n",
        "\n",
        "For your next steps:\n",
        "\n",
        "- You can learn [how to add data augmentation](https://www.tensorflow.org/tutorials/images/data_augmentation).\n",
        "- To learn more about `tf.data`, you can visit the [tf.data: Build TensorFlow input pipelines](https://www.tensorflow.org/guide/data) guide."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "images.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}