This guide builds upon the general Getting Started with C++ guide. It automatically maintains the GCS (Google Cloud Storage) index described in said guide using an application deployed to Cloud Run.
The steps in this guide are self-contained: you do not need to complete the Getting Started with C++ guide before following them, though doing so makes the motivation and the main components easier to understand. Note that some commands below may create resources (such as the Cloud Spanner instance and database) that were already created in the previous guide.
In the Getting Started with C++ guide we showed how to build an index for GCS buckets. We built this index using a work queue to scan the contents of these buckets. But what if the contents of a bucket change dynamically? What if other applications insert new objects? Or delete them? Or update the metadata of an existing object? We would like to extend the example to update the index as such changes take place.
The basic structure of this application is shown below. We will configure one or more GCS buckets to send Pub/Sub notifications as objects change. A new application deployed to Cloud Run will receive these notifications, parse them and update the index accordingly.
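The application's job, then, is to receive these notifications and translate them into index updates. The service in this guide is written in C++; purely as an illustration of the message flow, here is a Python sketch that unpacks a Pub/Sub push request carrying a GCS notification (the sample payload and field values are hypothetical, but the envelope shape follows the Pub/Sub push format):

```python
import base64
import json

def handle_gcs_notification(push_body: dict) -> tuple[str, dict]:
    """Unpack a Pub/Sub push request carrying a GCS notification.

    Returns the event type (from the message attributes) and the object
    metadata decoded from the base64-encoded JSON payload.
    """
    message = push_body["message"]
    event_type = message["attributes"]["eventType"]  # e.g. OBJECT_FINALIZE
    metadata = json.loads(base64.b64decode(message["data"]))
    return event_type, metadata

# A hypothetical push request, as Cloud Run would receive it when a new
# object is created in the bucket:
payload = {"bucket": "my-bucket", "name": "fox.txt", "generation": "1"}
push = {
    "message": {
        "attributes": {"eventType": "OBJECT_FINALIZE", "objectId": "fox.txt"},
        "data": base64.b64encode(json.dumps(payload).encode()).decode(),
    }
}
event, obj = handle_gcs_notification(push)
print(event, obj["name"])  # OBJECT_FINALIZE fox.txt
```

The rest of this guide wires up the real versions of each piece: the bucket notifications, the Pub/Sub topic, and the Cloud Run deployment that receives the push requests.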
This example assumes that you have an existing GCP (Google Cloud Platform) project. The project must have billing enabled, as some of the services used in this example require it. If needed, consult:
- the GCP quickstarts to set up a GCP project
- the Cloud Run quickstarts to set up Cloud Run in your project

Use your workstation, a GCE instance, or the Cloud Shell to get a command-line prompt. If needed, log in to GCP using:
```sh
gcloud auth login
```

Throughout the example we will use `GOOGLE_CLOUD_PROJECT` as an environment variable containing the ID of the project:

```sh
export GOOGLE_CLOUD_PROJECT=[PROJECT ID]
```
⚠️ This guide uses Cloud Spanner, a service billed by the hour even if you stop using it. The charges can reach hundreds or thousands of dollars per month if you configure a large Cloud Spanner instance. Consult the Pricing Calculator for details, and remember to delete any Cloud Spanner resources once you no longer need them.
We will issue a number of commands using the [Google Cloud SDK], a command-line tool to interact with Google Cloud services. Adding the `--project=$GOOGLE_CLOUD_PROJECT` flag to each invocation of this tool quickly becomes tedious, so we start by configuring the default project:
```sh
gcloud config set project $GOOGLE_CLOUD_PROJECT
# Output: Updated property [core/project].
```

Some services are not enabled by default when you create a Google Cloud project. We enable all the services we will need in this guide using:

```sh
gcloud services enable cloudbuild.googleapis.com
gcloud services enable containerregistry.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable eventarc.googleapis.com
gcloud services enable pubsub.googleapis.com
gcloud services enable run.googleapis.com
gcloud services enable spanner.googleapis.com
# Output: nothing if the services are already enabled.
#   For services that were not enabled, something like
#   Operation "operations/...." finished successfully.
```

So far, we have not created any C++ code. It is time to compile and deploy our application, as we will need the name and URL of the deployment to wire up the remaining resources. First obtain the code:
```sh
git clone https://github.com/GoogleCloudPlatform/cpp-samples
# Output: Cloning into 'cpp-samples'...
#   additional informational messages
```

Change your working directory to the code location:

```sh
cd cpp-samples/getting-started/update
# Output: none
```

Compile the code into a Docker image. Since we are only planning to build this example once, we will use Cloud Build. Using Cloud Build is simpler, but it does not create a cache of the intermediate build artifacts. Read about buildpacks and the pack tool install guide to run your builds locally and cache intermediate artifacts. You can also use Container Registry as a shared cache for buildpacks, both between workstations and for your CI systems. To learn more about this, consult the buildpack documentation for cache images.
You can continue with other steps while this build runs in the background. Optionally, use the links in the output to follow the build process in your web browser.
```sh
gcloud builds submit \
    --async \
    --machine-type=e2-highcpu-32 \
    --pack image="gcr.io/$GOOGLE_CLOUD_PROJECT/getting-started-cpp/update-gcs-index"
# Output:
#   Creating temporary tarball archive of 10 file(s) totalling 58.1 KiB before compression.
#   Uploading tarball of [.] to [gs://....tgz]
#   Created [https://cloudbuild.googleapis.com/v1/projects/....].
#   Logs are available at [...].
```

As mentioned above, this guide uses Cloud Spanner to store the data. We create the smallest possible instance. If needed we can scale up the instance later, but this size is economical and sufficient for running small jobs.
⚠️ Creating the Cloud Spanner instance incurs immediate billing costs, even if the instance is not used.
```sh
gcloud beta spanner instances create getting-started-cpp \
    --config=regional-us-central1 \
    --processing-units=100 \
    --description="Getting Started with C++"
# Output: Creating instance...done.
```

A Cloud Spanner instance is just an allocation of compute resources for your databases. Think of it as a virtual set of database servers dedicated to your databases. Initially these servers have no databases or tables associated with them. We need to create a database and a table to host the data for this demo:
```sh
gcloud spanner databases create gcs-index \
    --ddl-file=../gcs_objects.sql \
    --instance=getting-started-cpp
# Output: Creating database...done.
```

To use the application we need an existing bucket in your project:
```sh
BUCKET_NAME=... # The name of an existing bucket in your project
```

If you have no buckets in your project, use the GCS guide to select a name and then create the bucket:
```sh
gcloud storage buckets create gs://$BUCKET_NAME
```

The `gcloud storage` tool provides a single command to configure a bucket to send notifications to Cloud Pub/Sub:

```sh
gcloud storage buckets notifications create gs://$BUCKET_NAME/ \
    --topic=projects/$GOOGLE_CLOUD_PROJECT/topics/gcs-updates --payload-format=json
# Output: Created Cloud Pub/Sub topic projects/.../topics/gcs-updates
#   Created notification config projects/_/buckets/$BUCKET_NAME/notificationConfigs/...
```

Note that this command creates the topic (if needed) and sets the IAM permissions that allow GCS to publish to the topic.
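Each notification arrives as a Pub/Sub message whose attributes describe the event, and whose payload (because we chose `--payload-format=json`) is the object's metadata in JSON format. As a hedged illustration, this Python sketch shows the kind of attributes GCS attaches (the values here are hypothetical):

```python
# Hypothetical attributes of a notification for a newly created object;
# the message data would carry the object metadata as JSON because we
# configured --payload-format=json.
attributes = {
    "eventType": "OBJECT_FINALIZE",  # also: OBJECT_METADATA_UPDATE,
                                     # OBJECT_DELETE, OBJECT_ARCHIVE
    "bucketId": "my-bucket",
    "objectId": "fox.txt",
    "payloadFormat": "JSON_API_V1",
}

# A service can route on the event type alone, without decoding the payload:
is_upsert = attributes["eventType"] in ("OBJECT_FINALIZE", "OBJECT_METADATA_UPDATE")
print(is_upsert)  # True
```

The four event types above cover object creation, metadata changes, deletion, and archival (for versioned buckets).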
Look at the status of your build using:

```sh
gcloud builds list --ongoing
# Output: the list of running builds
```

If your build has completed, the list will be empty. If you need to wait for the build to complete (it should take about 15 minutes), use:

```sh
gcloud builds log --stream $(gcloud builds list --ongoing --format="value(id)")
# Output: the output from the build, streamed
```

⚠️ To continue, you must wait until the Cloud Build build has completed.
Once the image is uploaded, we can create a Cloud Run deployment to run it. This starts an instance of the service; Cloud Run will scale the number of instances up or down as needed:

```sh
gcloud run deploy update-gcs-index \
    --image="gcr.io/$GOOGLE_CLOUD_PROJECT/getting-started-cpp/update-gcs-index:latest" \
    --set-env-vars="SPANNER_INSTANCE=getting-started-cpp,SPANNER_DATABASE=gcs-index,TOPIC_ID=gcs-indexing-requests,GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_PROJECT" \
    --region="us-central1" \
    --platform="managed" \
    --no-allow-unauthenticated
# Output: Deploying container to Cloud Run service [update-gcs-index] in project [....] region [us-central1]
#   Service [update-gcs-index] revision [update-gcs-index-...] has been deployed and is serving 100 percent of traffic.
#   Service URL: https://update-gcs-index-...run.app
```

Capture the project number, as we will use it below to name the default compute service account:

```sh
PROJECT_NUMBER=$(gcloud projects list \
    --filter="project_id=$GOOGLE_CLOUD_PROJECT" \
    --format="value(project_number)" \
    --limit=1)
# Output: none
```

We need the URL of this deployment to finish the Cloud Pub/Sub configuration:
```sh
URL="$(gcloud run services describe update-gcs-index \
    --region="us-central1" --format="value(status.url)")"
# Output: none
```

Create a trigger for the Pub/Sub topic. This creates a push subscription that sends Cloud Pub/Sub messages as HTTP requests to the Cloud Run deployment. We use the default compute service account to make the HTTP requests, and allow up to 10 minutes for each request to complete before Cloud Pub/Sub retries on a different instance.

```sh
gcloud beta eventarc triggers create gcs-updates-trigger \
    --location="us-central1" \
    --destination-run-service="update-gcs-index" \
    --destination-run-region="us-central1" \
    --transport-topic="gcs-updates" \
    --matching-criteria="type=google.cloud.pubsub.topic.v1.messagePublished" \
    --service-account="$PROJECT_NUMBER-compute@developer.gserviceaccount.com"
# Output: Creating trigger [gcs-updates-trigger] in project [$GOOGLE_CLOUD_PROJECT], location [us-central1]...done.
#   Publish to Pub/Sub topic [projects/$GOOGLE_CLOUD_PROJECT/topics/gcs-updates] to receive events in Cloud Run service [update-gcs-index].
```

To verify that the index is updated, create a new object in the bucket:

```sh
echo "The quick brown fox jumps over the lazy dog" | gcloud storage cp - gs://$BUCKET_NAME/fox.txt
# Output: none
```

The data should start appearing in the Cloud Spanner database. We can use the gcloud tool to query this data:
```sh
gcloud spanner databases execute-sql gcs-index --instance=getting-started-cpp \
    --sql="select * from gcs_objects where name = 'fox.txt' order by updated desc limit 10"
# Output: metadata for the 10 most recent objects named 'fox.txt'
```

Use `gcloud storage` to create, update, and delete additional objects, and run additional queries.
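The kind of update the service applies for each notification can be sketched as follows. This is a hedged illustration only: the real service is written in C++ and mutates the `gcs_objects` table in Spanner, while here an in-memory dict stands in for the table, and the treatment of OBJECT_ARCHIVE is an assumption of the sketch:

```python
# In-memory stand-in for the gcs_objects table, keyed by (bucket, name).
index: dict[tuple[str, str], dict] = {}

def apply_event(event_type: str, metadata: dict) -> None:
    """Apply one GCS notification to the index.

    New objects and metadata changes upsert the row; deletions (and, in
    this sketch, archival) remove it.
    """
    key = (metadata["bucket"], metadata["name"])
    if event_type in ("OBJECT_FINALIZE", "OBJECT_METADATA_UPDATE"):
        index[key] = metadata      # insert-or-update the row
    elif event_type in ("OBJECT_DELETE", "OBJECT_ARCHIVE"):
        index.pop(key, None)       # remove the row if present

apply_event("OBJECT_FINALIZE", {"bucket": "b", "name": "fox.txt", "size": "44"})
print(len(index))  # 1
apply_event("OBJECT_DELETE", {"bucket": "b", "name": "fox.txt"})
print(len(index))  # 0
```

Because each event carries the full object metadata, the update is idempotent: replaying a notification leaves the index in the same state, which matters since Cloud Pub/Sub delivers messages at least once.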
⚠️ Do not forget to clean up your billable resources after going through this "Getting Started" guide.
```sh
gcloud spanner databases delete gcs-index --instance=getting-started-cpp --quiet
# Output: none
```

```sh
gcloud spanner instances delete getting-started-cpp --quiet
# Output: none
```

```sh
gcloud run services delete update-gcs-index \
    --region="us-central1" \
    --platform="managed" \
    --quiet
# Output:
#   Deleting [update-gcs-index]...done.
#   Deleted [update-gcs-index].
```

```sh
gcloud beta eventarc triggers delete gcs-updates-trigger \
    --location="us-central1"
# Output: Deleting trigger [gcs-updates-trigger] in project [$GOOGLE_CLOUD_PROJECT], location [us-central1]...done.
```

```sh
gcloud pubsub topics delete gcs-updates --quiet
# Output: Deleted topic [projects/$GOOGLE_CLOUD_PROJECT/topics/gcs-updates].
```

```sh
gcloud container images delete gcr.io/$GOOGLE_CLOUD_PROJECT/getting-started-cpp/update-gcs-index:latest --quiet
# Output: Deleted [gcr.io/$GOOGLE_CLOUD_PROJECT/getting-started-cpp/update-gcs-index:latest]
#   Deleted [gcr.io/$GOOGLE_CLOUD_PROJECT/getting-started-cpp/update-gcs-index@sha256:....]
```

```sh
gsutil notifications delete gs://$BUCKET_NAME
# Output: none
```