.. This file is automatically generated. Do not edit this file directly.

Google BigQuery Python Samples
===============================================================================

This directory contains samples for Google BigQuery. `Google BigQuery`_ is
Google's fully managed, petabyte-scale, low-cost analytics data warehouse.
BigQuery is NoOps: there is no infrastructure to manage and you don't need a
database administrator, so you can focus on analyzing data to find meaningful
insights, use familiar SQL, and take advantage of our pay-as-you-go model.

.. _Google BigQuery: https://cloud.google.com/bigquery/docs

Setup
-------------------------------------------------------------------------------

Authentication
++++++++++++++

Authentication is typically done through `Application Default Credentials`_,
which means you do not have to change the code to authenticate as long as your
environment has credentials. You have a few options for setting up
authentication:

#. When running locally, use the `Google Cloud SDK`_:

   .. code-block:: bash

      gcloud auth application-default login

#. When running on App Engine or Compute Engine, credentials are already set
   up. However, you may need to configure your Compute Engine instance with
   `additional scopes`_.

#. You can create a `Service Account key file`_. This file can be used to
   authenticate to Google Cloud Platform services from any environment. To use
   the file, set the ``GOOGLE_APPLICATION_CREDENTIALS`` environment variable to
   the path to the key file, for example:

   .. code-block:: bash

      export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json

.. _Application Default Credentials: https://cloud.google.com/docs/authentication#getting_credentials_for_server-centric_flow
.. _additional scopes: https://cloud.google.com/compute/docs/authentication#using
.. _Service Account key file: https://developers.google.com/identity/protocols/OAuth2ServiceAccount#creatinganaccount
.. _Google Cloud SDK: https://cloud.google.com/sdk/

Install Dependencies
++++++++++++++++++++

#. Install `pip`_ and `virtualenv`_ if you do not already have them.

#. Create a virtualenv. Samples are compatible with Python 2.7 and 3.4+.

   .. code-block:: bash

      $ virtualenv env
      $ source env/bin/activate

#. Install the dependencies needed to run the samples.

   .. code-block:: bash

      $ pip install -r requirements.txt

.. _pip: https://pip.pypa.io/
.. _virtualenv: https://virtualenv.pypa.io/

Samples
-------------------------------------------------------------------------------

Getting started
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python getting_started.py

    usage: getting_started.py [-h] project_id

    Command-line application that demonstrates basic BigQuery API usage.

    This sample queries a public Shakespeare dataset and displays the 10
    Shakespeare works with the greatest number of distinct words.

    This sample is used on this page:
        https://cloud.google.com/bigquery/bigquery-api-quickstart

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id  Your Google Cloud Project ID.

    optional arguments:
      -h, --help  show this help message and exit
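The same Shakespeare query can also be issued with the ``google-cloud-bigquery``
client library. The following is a minimal illustrative sketch, not necessarily
the code in ``getting_started.py`` (which may use a different client);
``your-project-id`` is a placeholder:

.. code-block:: python

    # A minimal sketch of the Shakespeare query, assuming the
    # google-cloud-bigquery library (pip install google-cloud-bigquery).
    from google.cloud import bigquery

    client = bigquery.Client(project='your-project-id')  # placeholder ID

    query = """
        SELECT corpus, COUNT(DISTINCT word) AS distinct_words
        FROM `bigquery-public-data.samples.shakespeare`
        GROUP BY corpus
        ORDER BY distinct_words DESC
        LIMIT 10
    """

    # Iterating over result() blocks until the query job completes.
    for row in client.query(query).result():
        print('{}: {}'.format(row.corpus, row.distinct_words))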
Sync query
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python sync_query.py

    usage: sync_query.py [-h] [-t TIMEOUT] [-r NUM_RETRIES] [-l USE_LEGACY_SQL]
                         project_id query

    Command-line application to perform a synchronous query in BigQuery.

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id            Your Google Cloud project ID.
      query                 BigQuery SQL Query.

    optional arguments:
      -h, --help            show this help message and exit
      -t TIMEOUT, --timeout TIMEOUT
                            Number of seconds to wait for a result.
      -r NUM_RETRIES, --num_retries NUM_RETRIES
                            Number of times to retry in case of 500 error.
      -l USE_LEGACY_SQL, --use_legacy_sql USE_LEGACY_SQL
                            Use legacy BigQuery SQL syntax instead of standard
                            SQL syntax.

Async query
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python async_query.py

    usage: async_query.py [-h] [-b] [-r NUM_RETRIES] [-p POLL_INTERVAL]
                          [-l USE_LEGACY_SQL]
                          project_id query

    Command-line application to perform an asynchronous query in BigQuery.

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id            Your Google Cloud project ID.
      query                 BigQuery SQL Query.

    optional arguments:
      -h, --help            show this help message and exit
      -b, --batch           Run query in batch mode.
      -r NUM_RETRIES, --num_retries NUM_RETRIES
                            Number of times to retry in case of 500 error.
      -p POLL_INTERVAL, --poll_interval POLL_INTERVAL
                            How often to poll the query for completion
                            (seconds).
      -l USE_LEGACY_SQL, --use_legacy_sql USE_LEGACY_SQL
                            Use legacy BigQuery SQL syntax instead of standard
                            SQL syntax.

Listing datasets and projects
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python list_datasets_projects.py

    usage: list_datasets_projects.py [-h] project_id

    Command-line application to list all projects and datasets in BigQuery.

    This sample is used on this page:
        https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id  The project ID to list.

    optional arguments:
      -h, --help  show this help message and exit

Load data by POST
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python load_data_by_post.py

    usage: load_data_by_post.py [-h]
                                project_id dataset_id table_name schema_file
                                data_file

    Command-line application that loads data into BigQuery via HTTP POST.

    This sample is used on this page:
        https://cloud.google.com/bigquery/loading-data-into-bigquery

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id   Your Google Cloud project ID.
      dataset_id   A BigQuery dataset ID.
      table_name   Name of the table to load data into.
      schema_file  Path to a schema file describing the table schema.
      data_file    Path to the data file.

    optional arguments:
      -h, --help   show this help message and exit

Load data from CSV
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python load_data_from_csv.py

    usage: load_data_from_csv.py [-h] [-p POLL_INTERVAL] [-r NUM_RETRIES]
                                 project_id dataset_id table_name schema_file
                                 data_path

    Command-line application that loads data into BigQuery from a CSV file in
    Google Cloud Storage.

    This sample is used on this page:
        https://cloud.google.com/bigquery/loading-data-into-bigquery#loaddatagcs

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id            Your Google Cloud project ID.
      dataset_id            A BigQuery dataset ID.
      table_name            Name of the table to load data into.
      schema_file           Path to a schema file describing the table schema.
      data_path             Google Cloud Storage path to the CSV data, for
                            example: gs://mybucket/in.csv

    optional arguments:
      -h, --help            show this help message and exit
      -p POLL_INTERVAL, --poll_interval POLL_INTERVAL
                            How often to poll the query for completion
                            (seconds).
      -r NUM_RETRIES, --num_retries NUM_RETRIES
                            Number of times to retry in case of 500 error.
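For reference, a load job of this kind can also be started with the
``google-cloud-bigquery`` client library. This is a minimal sketch rather than
the sample's exact code; the bucket, dataset, and table names are placeholders:

.. code-block:: python

    # A minimal sketch of a CSV load job from Cloud Storage, assuming the
    # google-cloud-bigquery library; all names below are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project='your-project-id')

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.skip_leading_rows = 1
    job_config.autodetect = True  # or supply an explicit schema instead

    load_job = client.load_table_from_uri(
        'gs://mybucket/in.csv',
        client.dataset('mydataset').table('mytable'),
        job_config=job_config)
    load_job.result()  # blocks until the load job completes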
Load streaming data
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python streaming.py

    usage: streaming.py [-h] [-p POLL_INTERVAL] [-r NUM_RETRIES]
                        project_id dataset_id table_name

    Command-line application that streams data into BigQuery.

    This sample is used on this page:
        https://cloud.google.com/bigquery/streaming-data-into-bigquery

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id            Your Google Cloud project ID.
      dataset_id            A BigQuery dataset ID.
      table_name            Name of the table to load data into.

    optional arguments:
      -h, --help            show this help message and exit
      -p POLL_INTERVAL, --poll_interval POLL_INTERVAL
                            How often to poll the query for completion
                            (seconds).
      -r NUM_RETRIES, --num_retries NUM_RETRIES
                            Number of times to retry in case of 500 error.

Export data to Cloud Storage
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python export_data_to_cloud_storage.py

    usage: export_data_to_cloud_storage.py [-h] [-p POLL_INTERVAL]
                                           [-r NUM_RETRIES] [-z]
                                           [-f {CSV,NEWLINE_DELIMITED_JSON,AVRO}]
                                           project_id dataset_id table_id
                                           gcs_path

    Command-line application to export a table from BigQuery to Google Cloud
    Storage.

    This sample is used on this page:
        https://cloud.google.com/bigquery/exporting-data-from-bigquery

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id            Your Google Cloud project ID.
      dataset_id            BigQuery dataset to export.
      table_id              BigQuery table to export.
      gcs_path              Google Cloud Storage path to store the exported
                            data. For example, gs://mybucket/mydata.csv

    optional arguments:
      -h, --help            show this help message and exit
      -p POLL_INTERVAL, --poll_interval POLL_INTERVAL
                            How often to poll the query for completion
                            (seconds).
      -r NUM_RETRIES, --num_retries NUM_RETRIES
                            Number of times to retry in case of 500 error.
      -z, --gzip            compress resultset with gzip
      -f {CSV,NEWLINE_DELIMITED_JSON,AVRO}, --format {CSV,NEWLINE_DELIMITED_JSON,AVRO}
                            output file format

User auth with an installed app
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To run this sample:

.. code-block:: bash

    $ python installed_app.py

    usage: installed_app.py [-h] [--auth_host_name AUTH_HOST_NAME]
                            [--noauth_local_webserver]
                            [--auth_host_port [AUTH_HOST_PORT [AUTH_HOST_PORT ...]]]
                            [--logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                            project_id

    Command-line application that demonstrates using BigQuery with credentials
    obtained from an installed app.

    This sample is used on this page:
        https://cloud.google.com/bigquery/authentication

    For more information, see the README.md under /bigquery.

    positional arguments:
      project_id            Your Google Cloud Project ID.

    optional arguments:
      -h, --help            show this help message and exit
      --auth_host_name AUTH_HOST_NAME
                            Hostname when running a local web server.
      --noauth_local_webserver
                            Do not run a local web server.
      --auth_host_port [AUTH_HOST_PORT [AUTH_HOST_PORT ...]]
                            Port web server should listen on.
      --logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                            Set the logging level of detail.
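The ``--auth_host_name`` and ``--noauth_local_webserver`` flags above appear to
come from ``oauth2client``'s shared argument parser. As a rough sketch of how an
installed app of this era could obtain user credentials (using the
now-deprecated ``oauth2client`` library; the file names and scope below are
placeholders, and this is not necessarily the sample's exact flow):

.. code-block:: python

    # A rough sketch of installed-app user auth with the (now-deprecated)
    # oauth2client library; file names below are placeholders.
    import argparse

    import httplib2
    from googleapiclient import discovery
    from oauth2client import client, file, tools

    # tools.argparser supplies the --auth_host_name,
    # --noauth_local_webserver, etc. flags shown in the usage text above.
    parser = argparse.ArgumentParser(parents=[tools.argparser])
    flags = parser.parse_args()

    flow = client.flow_from_clientsecrets(
        'client_secrets.json',  # downloaded from the Cloud Console
        scope='https://www.googleapis.com/auth/bigquery')
    storage = file.Storage('bigquery_credentials.dat')
    credentials = tools.run_flow(flow, storage, flags)

    # The credentials can then be used with the discovery-based client.
    bigquery = discovery.build(
        'bigquery', 'v2', http=credentials.authorize(httplib2.Http()))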