This is a simple example of using the Google Cloud Batch Python API.
The goal of this setup is to simplify some of the JSON syntax by assuming a few more default settings. We replace the JSON with YAML, which is marginally easier to create and maintain.
This allows a simplified deployment command:
python3 batch.py --config_file hello_world.yaml --create_job
Use git to clone this repo to your home directory, then from the directory:
cd scientific-computing-examples/python-batch
Run pip3 (note: you may want to use a virtual environment, see [venv](https://docs.python.org/3/library/venv.html)):
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
If you get errors around pip, make sure pip is [correctly installed](https://pip.pypa.io/en/stable/installation/).
You should now be able to test batch.py.
The simplest command is for --help:
python3 batch.py --help
Which outputs:
Tools to run batch API
flags:
batch.py:
--config_file: Config file in YAML
--[no]create_job: Creates job, otherwise just prints config.
(default: 'false')
--[no]debug: If true, print debug info.
(default: 'false')
--delete_job: Job name to delete.
(default: '')
--[no]list_jobs: If true, list jobs for config.
(default: 'false')
--previous_job_id: For Pubsub restart, specifies topic to read from
--project_id: Google Cloud Project ID, not name
--[no]pubsub: Run Pubsub
(default: 'false')
--volumes: List of GCS paths to mount. Example, "bucket_name1:mountpath1 bucket_name2:mountpath2"
(a whitespace separated list)
Try --helpfull to get a list of all flags.
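The --volumes flag takes a whitespace-separated list of bucket:mountpath pairs. As a rough illustration of that format, the sketch below splits such a value into entries shaped like the volumes lines used in the YAML configs. The function name parse_volumes is hypothetical, and this is not necessarily how batch.py handles the flag.

```python
def parse_volumes(flag_value):
    """Split "bucket1:/path1 bucket2:/path2" into a list of volume dicts."""
    volumes = []
    for item in flag_value.split():
        # Split on the first colon only, so the mount path may contain colons.
        bucket_name, sep, mount_path = item.partition(":")
        if not sep or not bucket_name or not mount_path:
            raise ValueError(f"expected bucket_name:mountpath, got {item!r}")
        volumes.append({"bucket_name": bucket_name, "gcs_path": mount_path})
    return volumes


if __name__ == "__main__":
    # Placeholder bucket names and mount paths, for illustration only.
    print(parse_volumes("bucket_name1:/mnt/disks/a bucket_name2:/mnt/disks/b"))
```

Rejecting malformed items early gives a clearer error than letting an incomplete volume entry reach the job submission.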
The following need to be run before you can submit a job.
Enable the Cloud Batch API:
gcloud services enable batch.googleapis.com compute.googleapis.com logging.googleapis.com
If you are running as the owner of the project, you will have sufficient scope to run the Batch API. You can review IAM permissions in the [Cloud Console](https://console.cloud.google.com/iam-admin/iam).
At a minimum, the user (or service account) running a Batch job will need the [Batch Jobs Editor role](https://cloud.google.com/iam/docs/understanding-roles#batch.jobsEditor).
You can add this role using the gcloud command:
$ gcloud projects add-iam-policy-binding example-project-id-1 \
--member='user:test-user@gmail.com' --role='roles/batch.jobsEditor'
where test-user@gmail.com and example-project-id-1 should be replaced with the correct values for your configuration.
Independent of where you are running this code, you will need to authenticate. The easiest way is to run the gcloud auth command below and follow the prompts. This requires the gcloud CLI, so install it first if you don't have it.
gcloud auth application-default login
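If you want a quick check that the login step left credentials behind, a minimal sketch along these lines may help. It assumes the default location that gcloud uses for the Application Default Credentials file on Linux and macOS, and the helper name find_adc_file is hypothetical. A missing file is not conclusive, since some environments (for example a Compute Engine VM) supply credentials without one.

```python
import os
from pathlib import Path


def find_adc_file(environ=None, home=None):
    """Return the path of an Application Default Credentials file, or None."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # An explicit credentials file takes precedence when the variable is set.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # Default location written by `gcloud auth application-default login`
    # on Linux and macOS (Windows uses the gcloud folder under %APPDATA%).
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    if found:
        print(f"Credentials file: {found}")
    else:
        print("No credentials file found.")
```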
The simplest job is running a shell script to say "Hello World".
python3 batch.py --config_file hello_world.yaml --create_job
NOTE: Please update the YAML file, hello_world.yaml, to point to your project_id.
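To give a sense of the JSON that the YAML config stands in for, the sketch below expands a few settings into a job body using the field names of the Batch REST job resource (taskGroups, taskSpec, runnables, and so on). The function name, its parameters, and the default values are assumptions made for illustration. They are not the contents of hello_world.yaml or the logic of batch.py.

```python
import json


def build_script_job(script_text, task_count=1, machine_type="e2-standard-4",
                     cpu_milli=1000, memory_mib=512):
    """Expand a few settings into a Batch job body (REST/JSON field names)."""
    return {
        "taskGroups": [{
            "taskCount": task_count,
            "taskSpec": {
                # A single script runnable executed by every task.
                "runnables": [{"script": {"text": script_text}}],
                "computeResource": {"cpuMilli": cpu_milli, "memoryMib": memory_mib},
            },
        }],
        "allocationPolicy": {
            "instances": [{"policy": {"machineType": machine_type}}],
        },
        # Send task output to Cloud Logging so it shows up in the Console.
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


if __name__ == "__main__":
    print(json.dumps(build_script_job("echo Hello World"), indent=2))
```

Even for a one-line script the JSON body nests several levels deep, which is the repetition a shorter config with defaults avoids.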
You can then check the status of the jobs:
python3 batch.py --config_file hello_world.yaml --list_jobs
You can also delete a job that is running:
python3 batch.py --config_file hello_world.yaml --delete_job job_id
where job_id would be the value you saw in the --list_jobs command.
To see what is really happening, you can go to the Google Cloud Console for Batch.
If you click on the link of any job listed there, you can view the details of the job and the logs associated with its output.
There are a few more sample jobs to try:
To connect a Google Cloud Storage (GCS) bucket to your batch job environment, Batch will enable a GCS Fuse connection for you.
To run this test, you need a GCS bucket. This is explained in detail here.
In the config file called shell_with_gcsfuse.yaml, you will see some lines:
volumes:
- {bucket_name: "batch-jrt-2", gcs_path: "/mnt/disks/local"}
Update with your bucket name.
Also update project_id with your value.
python3 batch.py --config_file shell_with_gcsfuse.yaml --create_job
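Each entry in the volumes list pairs a bucket with a mount path. In the Batch job JSON, a Cloud Storage volume is expressed with gcs.remotePath and mountPath fields under the task spec. The sketch below shows one way such a conversion could look. The helper name to_batch_volumes and the bucket name in the usage lines are placeholders, and whether batch.py performs exactly this mapping is an assumption.

```python
def to_batch_volumes(volumes):
    """Convert [{bucket_name, gcs_path}, ...] into Batch JSON volume entries."""
    return [
        {
            # The bucket to mount through Cloud Storage FUSE.
            "gcs": {"remotePath": volume["bucket_name"]},
            # Where the bucket appears inside the task's environment.
            "mountPath": volume["gcs_path"],
        }
        for volume in volumes
    ]


if __name__ == "__main__":
    config_volumes = [{"bucket_name": "my-bucket", "gcs_path": "/mnt/disks/local"}]
    # The resulting list would go under taskSpec.volumes in the job body.
    print(to_batch_volumes(config_volumes))
```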
This sample runs a simple container job.
For convenience, you can pass the project ID with the --project_id command-line option instead of editing the YAML file:
python3 batch.py --config_file simple-container.yaml --create_job --project_id my_project_id
The job will have the name container-xxxxx. In the Console, clicking through on that job, you can see log output like:
Hello world! This is task 0. This job has a total of 4 tasks.
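Batch exposes each task's position to the running task through the BATCH_TASK_INDEX and BATCH_TASK_COUNT environment variables, which is how a message like the one above can be produced. The sketch below builds the same line in Python. It is only an illustration, since the actual command used by simple-container.yaml is not shown here, and the function name task_greeting is hypothetical.

```python
import os


def task_greeting(environ=None):
    """Build the greeting from the task variables that Batch sets."""
    environ = os.environ if environ is None else environ
    # Outside of Batch these are unset, so fall back to a single task.
    index = int(environ.get("BATCH_TASK_INDEX", "0"))
    count = int(environ.get("BATCH_TASK_COUNT", "1"))
    return f"Hello world! This is task {index}. This job has a total of {count} tasks."


if __name__ == "__main__":
    print(task_greeting())
```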
This is a simple example of running the command nvidia-smi on batch using an NVIDIA CUDA container.
In the config file called nvidia-smi-container.yaml, update project_id with your value.
The command to run is:
python3 batch.py --config_file nvidia-smi-container.yaml --create_job
Go to the jobs page in the Cloud Console for Batch, then select the job you just ran. It will have the name nvidia-smi-xxxxxx.
When the job is completed, it will have a status of Succeeded. Looking in the logs for that job, you can see some output for the nvidia-smi command.
2023-10-11 10:10:53.683 EDT
+---------------------------------------------------------------------------------------+
2023-10-11 10:10:53.683 EDT
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
2023-10-11 10:10:53.683 EDT
|-----------------------------------------+----------------------+----------------------+
2023-10-11 10:10:53.683 EDT
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2023-10-11 10:10:53.683 EDT
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2023-10-11 10:10:53.683 EDT
| | | MIG M. |
2023-10-11 10:10:53.683 EDT
|=========================================+======================+======================|
2023-10-11 10:10:53.693 EDT
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
2023-10-11 10:10:53.693 EDT
| N/A 73C P0 30W / 70W | 2MiB / 15360MiB | 0% Default |
2023-10-11 10:10:53.693 EDT
| | | N/A |
2023-10-11 10:10:53.693 EDT
+-----------------------------------------+----------------------+----------------------+
2023-10-11 10:10:53.693 EDT
2023-10-11 10:10:53.693 EDT
+---------------------------------------------------------------------------------------+
2023-10-11 10:10:53.693 EDT
| Processes: |
2023-10-11 10:10:53.693 EDT
| GPU GI CI PID Type Process name GPU Memory |
2023-10-11 10:10:53.693 EDT
| ID ID Usage |
2023-10-11 10:10:53.693 EDT
|=======================================================================================|
2023-10-11 10:10:53.693 EDT
| No running processes found |
2023-10-11 10:10:53.693 EDT
+---------------------------------------------------------------------------------------+
There are subdirectories of this repository that provide more detailed types of batch jobs: