|
1 | | -Google Cloud Storage Python Samples |
| 1 | +# Cloud Storage Python Samples |
| 2 | + |
2 | 3 | =============================================================================== |
3 | 4 |
|
4 | 5 | [](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=storage/s3-sdk/README.rst) |
5 | 6 |
|
6 | | -**Google Cloud Storage:** https://cloud.google.com/storage/docs |
| 7 | +**Cloud Storage:** https://cloud.google.com/storage/docs |
7 | 8 |
|
8 | | -Samples |
9 | | -------------------------------------------------------------------------------- |
10 | | -NOTE: Due to the specific functionality related to Google Cloud APIs, this guide assumes a base level of familiarity with Google Cloud Storage features, terminology, and pricing. |
| 9 | +## Samples |
11 | 10 |
|
12 | | -### Google Cloud Storage Soft Delete Cost Analyzer |
13 | | -------------------------------------------------------------------------------- |
14 | | -**Understanding Soft Delete and Cost Considerations** |
15 | | - 1. Soft Delete: A feature for protecting against accidental data loss. Deleted objects are retained for a defined period before permanent deletion. This adds safety but carries potential additional storage costs. |
16 | | - 2. Cost Analysis: This script evaluates the relative cost increase within each bucket if soft delete is enabled. Considerations include: |
17 | | - * Your soft delete retention window |
18 | | - * Amount of data likely to be soft-deleted |
19 | | - * Proportions of data in different storage classes (e.g., Standard, Nearline) |
| 11 | +NOTE: Due to the specific functionality related to Google Cloud APIs, this guide |
| 12 | +assumes a base level of familiarity with Cloud Storage features, |
| 13 | +terminology, and pricing. |
20 | 14 |
|
21 | | -**How to Use the Script** |
| 15 | +## Cloud Storage Soft Delete Cost Analyzer |
22 | 16 |
|
23 | | -**Prerequisites** |
| 17 | +### Purpose |
24 | 18 |
|
25 | | - 1. A Google Cloud Platform (GCP) Project with existing buckets. |
26 | | - 2. Permissions on your GCP project to interact with Google Cloud Storage and Monitoring APIs. |
27 | | - 3. A Python environment (https://cloud.google.com/python/setup) |
| 19 | +* Helps you understand the potential cost implications of enabling |
| 20 | + [Soft Delete](https://cloud.google.com/storage/docs/soft-delete) on your |
| 21 | + Cloud Storage [buckets](https://cloud.google.com/storage/docs/buckets). |
| 22 | +* Identifies buckets where the cost of enabling soft delete exceeds some |
| 23 | + threshold based on past usage. |
28 | 24 |
|
29 | | -**Command-Line Arguments** |
30 | | -* `project_name` - (**Required**): Specifies your GCP project name. |
31 | | -* `--cost_threshold` - (Optional, default=0): Sets a relative cost threshold. |
32 | | -* `--soft_delete_window` - (Optional, default= 604800.0 (i.e. 7 days)): Time window (in seconds) for considering soft-deleted objects.. |
33 | | -* `--agg_days` - (Optional, default=30): The period over which to combine and aggregate results. |
34 | | -* `--lookback_days` - (Optional, default=360): Time window (in days) for considering the how old the bucket to be. |
35 | | -* `--list` - (Optional, default=False): Produces a simple list of bucket names. |
| 25 | +### Prerequisites |
36 | 26 |
|
37 | | -Note: In this sample, if setting cost_threshold 0.15 would spotlight buckets where enabling soft delete might increase costs by over 15%. |
| 27 | +* A |
| 28 | + [Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects) |
| 29 | + with existing buckets. |
| 30 | +* Permissions on your Google Cloud project to interact with Cloud Storage and |
| 31 | + Monitoring APIs. |
| 32 | +* A [Python development environment](https://cloud.google.com/python/setup) |
| 33 | + with Python version `3.11` or later |
38 | 34 |
|
39 | | -``` code-block:: bash |
40 | | - $ python storage_soft_delete_relative_cost_analyzer.py [your-project-name] |
41 | | -``` |
| 35 | +### Running the script using the command line |
42 | 36 |
|
43 | 37 | To disable soft-delete for buckets flagged by the script, follow these steps: |
44 | 38 |
|
45 | | -```code-block::bash |
46 | | -# 1. Authenticate (if needed): If you're not already authenticated or prefer a specific account, run: |
47 | | -gcloud auth application-default login |
48 | | -
|
49 | | -# 2. Run the analyzer to generate a list of buckets exceeding your cost threshold: |
50 | | -python storage_soft_delete_relative_cost_analyzer.py [your-project-name] --[OTHER_OPTIONS] --list=True > list_of_buckets.txt |
51 | | -
|
52 | | -# 3. Update the buckets using the generated list: |
53 | | -cat list_of_buckets.txt | gcloud storage buckets update -I --clear-soft-delete |
| 39 | +1. Authenticate (if needed): If you're not already authenticated or prefer a |
| 40 | + specific account, run: |
| 41 | + |
| 42 | + ```bash |
| 43 | + gcloud auth application-default login |
| 44 | + ``` |
| 45 | +
|
| 46 | +2. Run the analyzer to generate a list of buckets exceeding your cost |
| 47 | + threshold: |
| 48 | +
|
| 49 | + ```bash |
| 50 | + python storage_soft_delete_relative_cost_analyzer.py [project_name] --[OTHER_OPTIONS] --list=True > list_of_buckets.txt |
| 51 | + ``` |
| 52 | +
|
| 53 | + Arguments: |
| 54 | +
|
| 55 | + * `project_name` - (**Required**): Specifies your Google Cloud project |
| 56 | + name. |
| 57 | + * `--cost_threshold` - (Optional, default=0): Sets a relative cost |
| 58 | + threshold. For example, if `cost_threshold` is set to 0.15, the script |
| 59 | + will return buckets where the estimated relative cost increase exceeds |
| 60 | + 15%. |
| 61 | + * `--soft_delete_window` - (Optional, default=604800.0, i.e. 7 days): |
| 62 | + Time window (in seconds) for considering soft-deleted objects. |
| 63 | + * `--agg_days` - (Optional, default=30): The period over which to combine |
| 64 | + and aggregate results. |
| 65 | + * `--lookback_days` - (Optional, default=360): Time window (in days) which |
| 66 | + describes how far back in time the analysis should consider data. |
| 67 | + * `--list` - (Optional, default=False): Produces a simple list of bucket |
| 68 | + names if set to True. |
| 69 | +
|
| 70 | + Example with all the optional parameters set to their default values: |
| 71 | +
|
| 72 | + ```bash |
| 73 | + python storage_soft_delete_relative_cost_analyzer.py [project_name] \ |
| 74 | + --cost_threshold=0 \ |
| 75 | + --soft_delete_window=604800.0 \ |
| 76 | + --agg_days=30 \ |
| 77 | + --lookback_days=360 \ |
| 78 | + --list=True |
| 79 | + ``` |
| 80 | +
|
| 81 | +3. Update the buckets using the generated list: |
| 82 | +
|
| 83 | + ```bash |
| 84 | + cat list_of_buckets.txt | gcloud storage buckets update -I --clear-soft-delete |
| 85 | + ``` |
| 86 | +
|
| 87 | +**Important Note:** <span style="color: red;">If a bucket has soft delete |
| 88 | +disabled, delete requests that include an object's generation number will |
| 89 | +permanently delete the object. Additionally, in buckets with object |
| 90 | +versioning disabled, any request that causes an object to be deleted or |
| 91 | +overwritten will permanently delete the object.</span> |
| 92 | +
|
| 93 | +-------------------------------------------------------------------------------- |
| 94 | +
|
| 95 | +### Script Explanation |
| 96 | +
|
| 97 | +The `storage_soft_delete_relative_cost_analyzer.py` script assesses the |
| 98 | +potential cost impact of enabling soft delete on Cloud Storage buckets. It uses |
| 99 | +the [Cloud Monitoring API](https://cloud.google.com/monitoring/api/v3) to |
| 100 | +retrieve relevant metrics and perform calculations. |
| 101 | +
|
| 102 | +#### Functionality |
| 103 | +
|
| 104 | +1. Calculates the relative cost of soft delete: |
| 105 | +
|
| 106 | + * Fetches data on soft-deleted bytes and total byte-seconds for each |
| 107 | + bucket and |
| 108 | + [storage class](https://cloud.google.com/storage/docs/storage-classes) |
| 109 | + using the Monitoring API. |
| 110 | + * Calculates the ratio of soft-deleted bytes to total byte-seconds, |
| 111 | + representing the relative amount of inactive data. |
| 112 | + * Applies |
| 113 | + [storage class pricing](https://cloud.google.com/storage/pricing) |
| 114 | + relative to the Standard storage class to determine the cost impact |
| 115 | + of storing this inactive data. |
| 116 | +
|
| 117 | +2. Identifies buckets that exceed a cost threshold: |
| 118 | +
|
| 119 | + * Compares the calculated relative cost to a user-defined threshold. |
| 120 | + * Flags buckets where soft delete might lead to significant cost |
| 121 | + increases. |
| 122 | +
|
| 123 | +3. Provides two different output options: JSON or list of buckets. |
| 124 | +
|
| 125 | + * Can output a detailed JSON with relative cost for each bucket, suitable |
| 126 | + for further analysis or plotting. |
| 127 | + * Alternatively, generates a simple list of bucket names exceeding the |
| 128 | + cost threshold. This output can be directly piped into the |
| 129 | + `gcloud storage` CLI as described above. |
| 130 | +
|
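The two output options can be sketched as follows. This is a minimal illustration, not the script's actual internals: the `relative_costs` dictionary and the `format_output` helper are hypothetical stand-ins for the analysis result and the output step.

```python
import json

# Hypothetical per-bucket relative costs, standing in for the analysis result.
relative_costs = {"bucket-a": 0.18, "bucket-b": 0.04, "bucket-c": 0.25}

def format_output(costs, cost_threshold=0.0, as_list=False):
    """Return flagged buckets as a plain name list (suitable for piping into
    the gcloud storage CLI) or as a JSON document for further analysis."""
    flagged = {name: cost for name, cost in costs.items() if cost > cost_threshold}
    if as_list:
        return "\n".join(flagged)
    return json.dumps(flagged, indent=2)

# With a 15% threshold, only bucket-a and bucket-c are flagged.
print(format_output(relative_costs, cost_threshold=0.15, as_list=True))
```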
| 131 | +#### Key Functions |
| 132 | +
|
| 133 | +* `soft_delete_relative_cost_analyzer`: Handles command-line input and output, |
| 134 | + calling `get_soft_delete_cost` for the Google Cloud project. |
| 135 | + * `get_soft_delete_cost`: Orchestrates the cost analysis, using: |
| 136 | + * `get_relative_cost`: Retrieves the relative cost multiplier for a |
| 137 | + given storage class (e.g., "STANDARD", "NEARLINE") compared to the |
| 138 | + Standard class. The costs for each class are pre-defined within the |
| 139 | + function and can be adjusted for regional pricing variations. |
| 140 | + * `calculate_soft_delete_costs`: Executes Monitoring API queries and |
| 141 | + calculates costs. |
| 142 | + * `get_storage_class_ratio`: Fetches data on storage class |
| 143 | + distribution within buckets. |
| 144 | +
|
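A minimal sketch of a `get_relative_cost`-style lookup. The multiplier values below are illustrative placeholders, not the script's pre-defined pricing; as noted above, the real values should be adjusted for regional pricing variations.

```python
# Illustrative relative cost multipliers, with Standard normalized to 1.
# These are placeholder values, not actual Cloud Storage pricing.
RELATIVE_COST_BY_CLASS = {
    "STANDARD": 1.0,
    "NEARLINE": 0.5,
    "COLDLINE": 0.2,
    "ARCHIVE": 0.06,
}

def get_relative_cost(storage_class: str) -> float:
    # Fall back to the Standard multiplier for unrecognized classes.
    return RELATIVE_COST_BY_CLASS.get(storage_class.upper(), 1.0)
```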
| 145 | +#### Monitoring API Queries |
| 146 | +
|
| 147 | +The script relies on the Cloud Monitoring API to fetch essential data for |
| 148 | +calculating soft delete costs. It employs the `query_client.query_time_series` |
| 149 | +method to execute specifically crafted queries that retrieve metrics from Cloud |
| 150 | +Storage. |
| 151 | +
|
| 152 | +1. `calculate_soft_delete_costs` |
| 153 | +
|
| 154 | + * This function calculates the proportion of soft-deleted data relative to |
| 155 | + the total data volume within each bucket. The calculation is based on |
| 156 | + the following metrics: |
| 157 | + * `storage.googleapis.com/storage/v2/deleted_bytes`: This metric |
| 158 | + quantifies the volume of data, in bytes, that has undergone soft |
| 159 | + deletion. |
| 160 | + * `storage.googleapis.com/storage/v2/total_byte_seconds`: This metric |
| 161 | + records the cumulative byte-seconds of data stored within the |
| 162 | + bucket, excluding objects marked for soft deletion. |
| 163 | +
|
| 164 | +2. `get_storage_class_ratio` |
| 165 | +
|
| 166 | + * This function uses a query to re-acquire the |
| 167 | + `storage.googleapis.com/storage/v2/total_byte_seconds` metric. However, |
| 168 | + in this instance, it focuses on segregating and aggregating the data |
| 169 | + based on the storage class associated with each object within the |
| 170 | + bucket. |
| 171 | + * The resultant output is a distribution of data across various storage |
| 172 | + classes, facilitating a more granular cost analysis. For example, a |
| 173 | + result like `{ "bucket_name-STANDARD": 0.90, "bucket_name-NEARLINE": |
| 174 | + 0.10 }` indicates that the bucket's data is stored across two storage |
| 175 | + classes with a ratio of 9:1. |
| 176 | +
|
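The queries above are MQL strings handed to `query_client.query_time_series`. A hedged sketch of how such a query string might be assembled; the exact MQL the script issues may differ, and the `deleted_bytes_query` helper is illustrative:

```python
def deleted_bytes_query(agg_days: int) -> str:
    # Illustrative MQL: sum soft-deleted bytes per bucket over the
    # aggregation window. The metric name matches the one described above.
    return (
        "fetch gcs_bucket"
        " | metric 'storage.googleapis.com/storage/v2/deleted_bytes'"
        f" | within {agg_days}d"
        " | group_by [resource.bucket_name], sum(value.deleted_bytes)"
    )

print(deleted_bytes_query(30))
```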
| 177 | +-------------------------------------------------------------------------------- |
| 178 | +
|
| 179 | +### Key Formula |
| 180 | +
|
| 181 | +The relative increase in cost of enabling soft delete is calculated for each |
| 182 | +bucket by combining the outputs of the above-mentioned queries: |
54 | 183 |
|
55 | 184 | ``` |
| 185 | +Relative cost of each bucket = deleted_bytes / total_byte_seconds |
| 186 | + x Soft delete retention duration-seconds |
| 187 | + x Relative Storage Cost |
| 188 | + x Storage Class Ratio |
| 189 | +``` |
56 | 190 |
|
57 | | -**Important Note:** <span style="color: red;">Disabling soft-delete for flagged buckets means when deleting it will permanently delete files. These files cannot be restored, even if a soft-delete policy is later re-enabled.</span> |
| 191 | +where, |
| 192 | +
|
| 193 | +* `Deleted Bytes`: The same as `storage/v2/deleted_bytes`: the delta count |
| 194 | + of deleted bytes per bucket. |
| 195 | +* `Total Byte Seconds`: The same as `storage/v2/total_byte_seconds`: total |
| 196 | + daily storage in byte-seconds used by the bucket, grouped by storage |
| 197 | + class and type, where type can be live-object, noncurrent-object, |
| 198 | + soft-deleted-object, or multipart-upload. |
| 199 | +* `Soft delete retention duration-seconds`: The soft delete window defined |
| 200 | + for the bucket; this is the `--soft_delete_window` value provided to the |
| 201 | + script. |
| 202 | +* `Relative Storage Cost`: The cost of storing data in a specific storage |
| 203 | + class (e.g., Standard, Nearline, Coldline) relative to the Standard class |
| 204 | + (where Standard class cost is 1). |
| 205 | +* `Storage Class Ratio`: The proportion of the bucket's data that belongs to |
| 206 | + the specific storage class being considered. |
| 207 | +
|
| 208 | +Note: the Cloud Monitoring metrics `storage/v2/deleted_bytes` and |
| 209 | +`storage/v2/total_byte_seconds` are documented at |
| 210 | +[https://cloud.google.com/monitoring/api/metrics_gcp#gcp-storage](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-storage) |
| 211 | +
|
| 212 | +#### Stepwise Explanation |
| 213 | +
|
| 214 | +1. Soft Delete Rate: Dividing `Deleted Bytes` by `Total Byte Seconds` gives |
| 215 | + you the rate at which data is being soft-deleted (per second). This shows |
| 216 | + how quickly data marked for deletion accumulates in the bucket. |
| 217 | +2. Cost Impact: |
| 218 | + * Multiply the `Soft Delete Rate` by the `Soft delete retention |
| 219 | + duration-seconds` to get the total ratio of data that is soft-deleted |
| 220 | + and retained within the specified period. |
| 221 | + * Multiply this result by the `Relative Storage Cost` to factor in the |
| 222 | + pricing of the specific storage class. |
| 223 | + * Finally, multiply by the `Storage Class Ratio` to consider only the |
| 224 | + portion of the cost attributable to that particular class. |
| 225 | +
|
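Putting the steps together with hypothetical numbers (all figures below are invented purely to illustrate the formula, assuming a single Standard-class bucket):

```python
SECONDS_PER_DAY = 86400.0

# Hypothetical bucket figures, chosen only to illustrate the formula above.
deleted_bytes = 50 * 1024**3                          # 50 GiB soft-deleted
total_byte_seconds = 1024**4 * 30 * SECONDS_PER_DAY   # 1 TiB held for 30 days
soft_delete_window = 7 * SECONDS_PER_DAY              # 7-day retention (604800 s)
relative_storage_cost = 1.0                           # Standard class
storage_class_ratio = 1.0                             # all data in one class

# Step 1: soft delete rate (bytes deleted per stored byte-second).
soft_delete_rate = deleted_bytes / total_byte_seconds

# Step 2: cost impact per the Key Formula.
relative_cost = (
    soft_delete_rate
    * soft_delete_window
    * relative_storage_cost
    * storage_class_ratio
)
print(round(relative_cost, 4))  # roughly a 1% relative cost increase
```

Under these assumptions the result is about 0.011, i.e. enabling soft delete would add roughly 1% to this bucket's storage cost.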
| 226 | +The script analyzes your bucket usage history to estimate the relative cost |
| 227 | +increase of enabling soft delete. It outputs a cost multiplier for each |
| 228 | +bucket, representing the projected cost relative to current pricing if usage |
| 229 | +patterns continue. For instance, `{"Bucket_A": 1.15, "Bucket_B": 1.05}` |
| 230 | +indicates a `15%` price increase for `Bucket_A` and a `5%` increase for |
| 231 | +`Bucket_B` with soft delete enabled for the defined `Soft delete retention |
| 232 | +duration-seconds`. This output lets you weigh the benefits of data protection |
| 233 | +against the added storage expense for each bucket and storage class, helping |
| 234 | +you make informed decisions about enabling soft delete. |