This code lets you create various works of art in Cloud Bigtable's Key Visualizer.
- Set your variables
BIGTABLE_PROJECT=YOUR-PROJECT-ID # or use $GOOGLE_CLOUD_PROJECT
INSTANCE_ID=YOUR-INSTANCE-ID
TABLE_ID=YOUR-TABLE-ID
Your Bigtable instance can live in the same project as your Dataflow job or in a different one.
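If your Bigtable instance is in your current gcloud project, one shortcut (assuming the gcloud CLI is installed and configured) is to derive the value instead of typing it:
BIGTABLE_PROJECT=$(gcloud config get-value project)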
- Create a table
echo project = $BIGTABLE_PROJECT > ~/.cbtrc
echo instance = $INSTANCE_ID >> ~/.cbtrc
cbt createtable $TABLE_ID
cbt createfamily $TABLE_ID cf
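To double-check the setup, you can list the instance's tables and the new table's column families with cbt (this relies on the ~/.cbtrc written above):
cbt ls
cbt ls $TABLE_ID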
- Make sure your Dataflow API is enabled
gcloud services enable dataflow.googleapis.com
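If you want to confirm the API is now on, one simple check (assuming gcloud points at the Dataflow project) is to grep the enabled-services list:
gcloud services list --enabled | grep dataflow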
Load 40GB of data with 5MB rows:
mvn compile exec:java -Dexec.mainClass=keyviz.LoadData \
"-Dexec.args=--bigtableProjectId=$BIGTABLE_PROJECT \
--bigtableInstanceId=$INSTANCE_ID --runner=dataflow \
--bigtableTableId=$TABLE_ID --project=$GOOGLE_CLOUD_PROJECT"
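While a load runs, you can watch the job from the command line; YOUR-REGION is whichever region the job launched in (the Dataflow runner typically defaults to us-central1 unless you set a region):
gcloud dataflow jobs list --status=active --region=YOUR-REGION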
Load 50GB of data with 1MB rows:
mvn compile exec:java -Dexec.mainClass=keyviz.LoadData \
"-Dexec.args=--bigtableProjectId=$BIGTABLE_PROJECT \
--bigtableInstanceId=$INSTANCE_ID --runner=dataflow \
--bigtableTableId=$TABLE_ID --project=$GOOGLE_CLOUD_PROJECT \
--gigabytesWritten=50 \
--megabytesPerRow=1"
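Once a load job finishes, a quick sanity check is to count the rows written. Note that cbt count performs a full table scan, so it can take a while on tens of gigabytes:
cbt count $TABLE_ID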
Generate Mona Lisa with 40GB total and 5MB rows:
mvn compile exec:java -Dexec.mainClass=keyviz.ReadData \
"-Dexec.args=--bigtableProjectId=$BIGTABLE_PROJECT \
--bigtableInstanceId=$INSTANCE_ID --runner=dataflow \
--bigtableTableId=$TABLE_ID --project=$GOOGLE_CLOUD_PROJECT"
Generate American Gothic with 50GB total and 1MB rows:
mvn compile exec:java -Dexec.mainClass=keyviz.ReadData \
"-Dexec.args=--bigtableProjectId=$BIGTABLE_PROJECT \
--bigtableInstanceId=$INSTANCE_ID --runner=dataflow \
--bigtableTableId=$TABLE_ID --project=$GOOGLE_CLOUD_PROJECT \
--gigabytesWritten=50 \
--megabytesPerRow=1 \
--filePath=gs://keyviz-art/american_gothic_4h.txt"
There is a public bucket with existing images you can use, or you can create your own with this tool and upload them to your own GCS bucket.
Filenames follow the pattern gs://keyviz-art/[painting]_[hours]h.txt; a full worked example follows the lists below.
example: gs://keyviz-art/american_gothic_4h.txt
painting options:
- american_gothic
- mona_lisa
- pearl_earring
- persistence_of_memory
- starry_night
- sunday_afternoon
- the_scream
hour options:
- 1
- 4
- 8
- 12
- 24
- 48
- 72
- 96
- 120
- 144
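Putting the pattern together: to render, say, The Scream over one hour against the 40GB/5MB table, the command would look like this (the filename below is constructed from the options above, so treat it as illustrative):
mvn compile exec:java -Dexec.mainClass=keyviz.ReadData \
"-Dexec.args=--bigtableProjectId=$BIGTABLE_PROJECT \
--bigtableInstanceId=$INSTANCE_ID --runner=dataflow \
--bigtableTableId=$TABLE_ID --project=$GOOGLE_CLOUD_PROJECT \
--filePath=gs://keyviz-art/the_scream_1h.txt"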
If you started several Dataflow jobs, this quick command cancels all of your active jobs at once.
gcloud dataflow jobs list --status=active --region=YOUR-REGION | tail -n +2 | sed 's/ .*//' | xargs gcloud dataflow jobs cancel --region=YOUR-REGION
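An alternative sketch, which should be a bit more robust than parsing the table output (assuming all your jobs run in one region), is to have gcloud emit just the job IDs:
gcloud dataflow jobs list --status=active --region=YOUR-REGION --format="value(id)" | xargs gcloud dataflow jobs cancel --region=YOUR-REGION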
