From d97d88085743bf2ee7af131ec9bddf157ac4bb6c Mon Sep 17 00:00:00 2001 From: Divyansh Jha Date: Fri, 11 Dec 2020 18:32:48 +0530 Subject: [PATCH 1/4] add change detection guide --- .../how_change_detection_works.ipynb | 182 ++++++++++++++++++ 1 file changed, 182 insertions(+) create mode 100644 guide/14-deep-learning/how_change_detection_works.ipynb diff --git a/guide/14-deep-learning/how_change_detection_works.ipynb b/guide/14-deep-learning/how_change_detection_works.ipynb new file mode 100644 index 0000000000..eb4e735485 --- /dev/null +++ b/guide/14-deep-learning/how_change_detection_works.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## How ChangeDetection Works?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Introduction" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We often get to see images that are of the same location but of different points in times. We humans can visually identify change quite quickly. Let's consider the task of identifying and segmenting buildings that have been newly developed in the last decade. We can quickly look at the imagery from different timelines and digitize the same. This task is relatively tricky for machines where the machine has to ground its decision in both spatial and temporal information that it receives. Deep Learning has made significant progress in computer vision, and we have added several of these models to ArcGIS API for Python. The computer vision models in `arcgis.learn` can perform tasks like Object Detection, Semantic Segmentation, Instance Segmentation, Image Translation, etc. Starting `v1.8.3` we have added another computer vision model for Binary Change Detection. Change detection is of primary importance in GIS, where we get lots of images of the same location but from different times. 
We can solve various problems, from identifying new illegal construction to finding changes in land cover." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " \n", + "
\n", + "
\n", + "
Figure 1. Example of change detection in imagery. [1]
\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Binary Change Detection means that the model's output will be in two values, i.e., either change or no change. We can detect changes in specific features of interest and extract out the semantic map of that feature. For example, if we want to find out which new roads have come up in the past five years, we need to pass two images from respective points in time. Traditionally, intricate workflows and a lot of human involvment were required to extract out these change maps. However, we can do that with just some labeled data and with little to no human involvement by using deep learning." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Architecture" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The change detection architecture, which is implemented in arcgis.learn is based on the STANet Paper [2]. It can be trained on coupled images with a semantic map of change as its target. For example, In the case of change detection in buildings, the label for the pair of images will be the change map of footprints that have either come up or disappeared. This architecture uses a self-attention mechanism at activations from fina layer of a convolutional neural network. The base architecture is a UNet like architecture with an encoder and a decoder. The encoder is usually an Imagenet pre-trained ResNet-based architecture, and the decoder is a combination of upsampling, 1x1 convolution, and self-attention layers. The forward pass through the network is done on images from both timelines. Once it receives the features, it passes through the attention module, and we received attended feature maps. Upon receiving these features from the model, a loss or error function is computed, indicating that the models update their parameters. The output of the architecture is the semantic map of only the change in our feature of interest. 
In Figure 2, I* represents an image, X* are the features from the encoder, and Z* are the features after applying attention. The metric module block in Figure 2 is the loss function. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " \n", + "
\n", + "
\n", + "
Figure 2. STANet Network architecture [2]
\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### PAM vs BAM" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are two types of attention modules proposed in the paper namely PAM and BAM. PAM stands for Pyramid spatial–temporal Attention Module, while BAM stands for Basic spatial–temporal Attention Module. As the name suggests that PAM is an extension of the basic attention module. Figure 3 explains what each of these modules consists. Figure corresponding to BAM is an attention module that enables the model to learn location in the feature map that the model should pay attention to. The PAM is a bigger and better version of BAM as it uses the BAM module on the different resolutions of the final feature map. This pyramid technique is very similar to the one we discussed in the \"How PSPNet Works\" guide [3]." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " \n", + "
\n", + "
\n", + "
Figure XX. Internal architecture for BAM and PAM. [citeXX]
\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are two types of the Spatial-Temporal Attention Module proposed in the paper. In `arcgis.learn`, you can switch to any one using the `attention_type` parameter while initializing the model. You can either set it to \"PAM\" or \"BAM\". The paper suggests using BAM to detect change coarser features and PAM while detecting changes in finer ones." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can train a change detection model in a straightforward workflow explained in the change detection sample notebook. We need to have data in a specific format, i.e., a folder having three folders, a) a folder named \"images_before\" containing images from the previous timeline, b) \"images_after\" including images of the later timeline and c) \"labels\" containing change semantic map. We can pass the root path to `prepare_data` function and specify the `dataset_type` to be \"ChangeDetection\". " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```python\n", + "data = prepare_data(root_path,\n", + " chip_size=224,\n", + " dataset_type='ChangeDetection', \n", + " batch_size=2\n", + " )\n", + "```\n", + "\n", + "We can then use this data object to see a batch using `data.show_batch()` function and to initialize the `ChangeDetector` class.\n", + "\n", + "```python\n", + "cd = ChangeDetector(data,\n", + " attention_type='BAM' # 'PAM' is default.\n", + " )\n", + "```\n", + "\n", + "We can use the traditional `arcgis.learn` workflow to train our model and see the results." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### References\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* [1] Daudt, R.C., Le Saux, B., Boulch, A. and Gousseau, Y., 2018, July. Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks. 
In IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2018 (pp. 2115-2118). IEEE.\n", + "\n", + "* [2] Chen, H., & Shi, Z. (2020). A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection. Remote Sensing, 12(10), 1662.\n", + "\n", + "* [3] Esri - ArcGIS API for Python Guides, How PSPNet works?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From ced0e8c1aae7b0f3f0a8d924a026bd65964c4014 Mon Sep 17 00:00:00 2001 From: Divyansh Jha Date: Mon, 14 Dec 2020 12:24:57 +0530 Subject: [PATCH 2/4] fix caption --- guide/14-deep-learning/how_change_detection_works.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/guide/14-deep-learning/how_change_detection_works.ipynb b/guide/14-deep-learning/how_change_detection_works.ipynb index eb4e735485..ad18208d3e 100644 --- a/guide/14-deep-learning/how_change_detection_works.ipynb +++ b/guide/14-deep-learning/how_change_detection_works.ipynb @@ -90,7 +90,7 @@ " \n", "
\n", "
\n", - "
Figure XX. Internal architecture for BAM and PAM. [citeXX]
\n", + "
Figure 3. Internal architecture for BAM and PAM. [2]
\n", "
\n", "" ] From 7cb90cc9b6c12f0f658b4354697c1ca57e099a7e Mon Sep 17 00:00:00 2001 From: Divyansh Jha Date: Wed, 6 Jan 2021 17:24:36 +0530 Subject: [PATCH 3/4] add review comments --- guide/14-deep-learning/how_change_detection_works.ipynb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/guide/14-deep-learning/how_change_detection_works.ipynb b/guide/14-deep-learning/how_change_detection_works.ipynb index ad18208d3e..d8db8ab9c0 100644 --- a/guide/14-deep-learning/how_change_detection_works.ipynb +++ b/guide/14-deep-learning/how_change_detection_works.ipynb @@ -18,7 +18,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We often get to see images that are of the same location but of different points in times. We humans can visually identify change quite quickly. Let's consider the task of identifying and segmenting buildings that have been newly developed in the last decade. We can quickly look at the imagery from different timelines and digitize the same. This task is relatively tricky for machines where the machine has to ground its decision in both spatial and temporal information that it receives. Deep Learning has made significant progress in computer vision, and we have added several of these models to ArcGIS API for Python. The computer vision models in `arcgis.learn` can perform tasks like Object Detection, Semantic Segmentation, Instance Segmentation, Image Translation, etc. Starting `v1.8.3` we have added another computer vision model for Binary Change Detection. Change detection is of primary importance in GIS, where we get lots of images of the same location but from different times. We can solve various problems, from identifying new illegal construction to finding changes in land cover." + "We often get to see images that are of the same location but at different points in time. As humans can visually identify change quite effortlessly. 
Let's consider the task of identifying and segmenting buildings that are newly constructed in the last decade. We can quickly look at the imagery from different timelines and digitize the same. This task is relatively tricky for machines where the machine has to ground its decision in both spatial and temporal information that it receives. Deep Learning has made significant progress in computer vision, and we have added several of these models to ArcGIS API for Python. The computer vision models in `arcgis.learn` can perform tasks like Object Detection, Semantic Segmentation, Instance Segmentation, Image Translation, etc. Starting `v1.8.3` we have added another computer vision model for Binary Change Detection. Change detection is of primary importance in GIS, where we get lots of images of the same location but from different times. We can solve various problems, from identifying new illegal construction to finding changes in land cover." ] }, { @@ -26,7 +26,7 @@ "metadata": {}, "source": [ "
\n", - " \n", + " \n", "
\n", "
\n", "
Figure 1. Example of change detection in imagery. [1]
\n", @@ -38,7 +38,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Binary Change Detection means that the model's output will be in two values, i.e., either change or no change. We can detect changes in specific features of interest and extract out the semantic map of that feature. For example, if we want to find out which new roads have come up in the past five years, we need to pass two images from respective points in time. Traditionally, intricate workflows and a lot of human involvment were required to extract out these change maps. However, we can do that with just some labeled data and with little to no human involvement by using deep learning." + "Binary Change Detection means that the model's output will be in two values, i.e., either change or no change. We can detect changes in specific features of interest and extract out the semantic map of that feature. For example, if we want to find out which new roads have come up in the past five years, we need to pass two images from respective points in time. Traditionally, intricate workflows and a lot of human involvement were required to extract out these change maps. However, we can do that with just some labeled data and with little to no human involvement by using deep learning." ] }, { @@ -106,7 +106,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can train a change detection model in a straightforward workflow explained in the change detection sample notebook. We need to have data in a specific format, i.e., a folder having three folders, a) a folder named \"images_before\" containing images from the previous timeline, b) \"images_after\" including images of the later timeline and c) \"labels\" containing change semantic map. We can pass the root path to `prepare_data` function and specify the `dataset_type` to be \"ChangeDetection\". " + "We can train a change detection model in a straightforward workflow explained in the change detection sample notebook. 
The exported data needs to be in the specific folder format, i.e., a folder having three folders, a) a folder named \"*images_before*\" containing images from the previous timeline, b) \"*images_after*\" including images of the later timeline and c) \"*labels*\" containing change semantic map. We can pass the root path to `prepare_data` function and specify the `dataset_type` to be \"ChangeDetection\". " ] }, { From f5c5147b1accca8d566c5f69866dc8aeb97161fa Mon Sep 17 00:00:00 2001 From: Divyansh Jha Date: Wed, 13 Jan 2021 12:23:54 +0530 Subject: [PATCH 4/4] incorporate review comments. --- .../how_change_detection_works.ipynb | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/guide/14-deep-learning/how_change_detection_works.ipynb b/guide/14-deep-learning/how_change_detection_works.ipynb index d8db8ab9c0..70d417106c 100644 --- a/guide/14-deep-learning/how_change_detection_works.ipynb +++ b/guide/14-deep-learning/how_change_detection_works.ipynb @@ -18,7 +18,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We often get to see images that are of the same location but at different points in time. As humans can visually identify change quite effortlessly. Let's consider the task of identifying and segmenting buildings that are newly constructed in the last decade. We can quickly look at the imagery from different timelines and digitize the same. This task is relatively tricky for machines where the machine has to ground its decision in both spatial and temporal information that it receives. Deep Learning has made significant progress in computer vision, and we have added several of these models to ArcGIS API for Python. The computer vision models in `arcgis.learn` can perform tasks like Object Detection, Semantic Segmentation, Instance Segmentation, Image Translation, etc. Starting `v1.8.3` we have added another computer vision model for Binary Change Detection. 
Change detection is of primary importance in GIS, where we get lots of images of the same location but from different times. We can solve various problems, from identifying new illegal construction to finding changes in land cover." + "We often get to see images of the same location at different points in time, and as humans, we can visually identify temporal changes in these images quite effortlessly. For instance, we can quickly observe, analyze, and digitize imagery to identify and segment buildings that have been newly constructed over the last decade. While this task is simple for us, it is relatively tricky for machines that need to ground their decisions in both the spatial and temporal information they are provided. Deep learning has made significant progress in computer vision, and Esri has added several of these deep learning models to ArcGIS API for Python. The computer vision models in `arcgis.learn` can perform tasks like Object Detection, Semantic Segmentation, Instance Segmentation, Image Translation, etc. Starting with `v1.8.3`, we have added another computer vision model for Binary Change Detection. Change detection is of primary importance in GIS, where we get many images of the same location but from different times. With this new model, we can solve various problems, from identifying new illegal construction to finding changes in land cover." ] }, { @@ -52,7 +52,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The change detection architecture, which is implemented in arcgis.learn is based on the STANet Paper [2]. It can be trained on coupled images with a semantic map of change as its target. For example, In the case of change detection in buildings, the label for the pair of images will be the change map of footprints that have either come up or disappeared. This architecture uses a self-attention mechanism at activations from fina layer of a convolutional neural network. 
The base architecture is a UNet like architecture with an encoder and a decoder. The encoder is usually an Imagenet pre-trained ResNet-based architecture, and the decoder is a combination of upsampling, 1x1 convolution, and self-attention layers. The forward pass through the network is done on images from both timelines. Once it receives the features, it passes through the attention module, and we received attended feature maps. Upon receiving these features from the model, a loss or error function is computed, indicating that the models update their parameters. The output of the architecture is the semantic map of only the change in our feature of interest. In Figure 2, I* represents an image, X* are the features from the encoder, and Z* are the features after applying attention. The metric module block in Figure 2 is the loss function. " + "The change detection architecture implemented in `arcgis.learn` is based on the STANet paper [2]. It is trained on pairs of images of the same location, with a semantic map of change as the target. For example, in the case of change detection in buildings, the label for each pair of images is the change map of footprints that have either appeared or disappeared. This architecture applies a self-attention mechanism to the activations from the final layer of a convolutional neural network. The base architecture is a UNet-like architecture with an encoder and a decoder. The encoder is usually an ImageNet-pretrained ResNet-based architecture, and the decoder is a combination of upsampling, 1x1 convolution, and self-attention layers. The forward pass through the network is done on images from both timelines. The resulting features are passed through the attention module, which produces attended feature maps. A loss function is then computed on these attended features and is used to update the model's parameters. The output of the architecture is the semantic map of only the change in our feature of interest. 
In Figure 2, I* represents an image, X* are the features from the encoder, and Z* are the features after applying attention. The metric module block in Figure 2 is the loss function. " ] }, { @@ -79,7 +79,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "There are two types of attention modules proposed in the paper namely PAM and BAM. PAM stands for Pyramid spatial–temporal Attention Module, while BAM stands for Basic spatial–temporal Attention Module. As the name suggests that PAM is an extension of the basic attention module. Figure 3 explains what each of these modules consists. Figure corresponding to BAM is an attention module that enables the model to learn location in the feature map that the model should pay attention to. The PAM is a bigger and better version of BAM as it uses the BAM module on the different resolutions of the final feature map. This pyramid technique is very similar to the one we discussed in the \"How PSPNet Works\" guide [3]." + "There are two types of attention modules proposed in the STANet paper, PAM and BAM. PAM stands for Pyramid spatial–temporal Attention Module, while BAM stands for Basic spatial–temporal Attention Module. As the name suggests, PAM is an extension of the basic attention module. Figure 3 shows what each of these modules consists of. BAM is an attention module that enables the model to learn which locations in the feature map it should pay attention to. PAM extends BAM by applying the BAM module at different resolutions of the final feature map. This pyramid technique is very similar to the one we discussed in the \"How PSPNet Works\" guide [3]." ] }, { @@ -99,14 +99,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "There are two types of the Spatial-Temporal Attention Module proposed in the paper. In `arcgis.learn`, you can switch to any one using the `attention_type` parameter while initializing the model. 
You can either set it to \"PAM\" or \"BAM\". The paper suggests using BAM to detect change coarser features and PAM while detecting changes in finer ones." + "In `arcgis.learn`, you can choose between the two attention modules using the `attention_type` parameter when initializing the model, setting it to either \"PAM\" or \"BAM\". The paper suggests using BAM to detect changes in coarser features and PAM to detect changes in finer features." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We can train a change detection model in a straightforward workflow explained in the change detection sample notebook. The exported data needs to be in the specific folder format, i.e., a folder having three folders, a) a folder named \"*images_before*\" containing images from the previous timeline, b) \"*images_after*\" including images of the later timeline and c) \"*labels*\" containing change semantic map. We can pass the root path to `prepare_data` function and specify the `dataset_type` to be \"ChangeDetection\". " + "We can train a change detection model using the straightforward workflow explained in the change detection sample notebook. The exported data needs to be in a specific folder format, i.e., a root folder containing three subfolders: a) \"images_before\", containing images from the earlier timeline, b) \"images_after\", containing images from the later timeline, and c) \"labels\", containing the change semantic map. We can pass the root path to the `prepare_data` function and specify the `dataset_type` to be \"ChangeDetection\". 
" ] }, { @@ -121,7 +121,7 @@ " )\n", "```\n", "\n", - "We can then use this data object to see a batch using `data.show_batch()` function and to initialize the `ChangeDetector` class.\n", + "We can then use this data object to see a batch using the `data.show_batch()` function and to initialize the `ChangeDetector` class.\n", "\n", "```python\n", "cd = ChangeDetector(data,\n", @@ -149,13 +149,6 @@ "\n", "* [3] Esri - ArcGIS API for Python Guides, How PSPNet works?" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": {