Commit 7052da1 (1 parent: 5316775)

docs: Enhance README clarity with in-depth script details & key formula breakdown (GoogleCloudPlatform#11595)

* docs(samples): Updating readme for soft delete cost analyzer script
* docs: Enhance README clarity with in-depth script details & key formula breakdown

File tree: 2 files changed (+220 −43 lines)

storage/cost-analysis/README.md

Lines changed: 216 additions & 39 deletions
# Cloud Storage Python Samples

[![Open in Cloud Shell button](https://gstatic.com/cloudssh/images/open-btn.png)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=storage/s3-sdk/README.rst)

**Cloud Storage:** https://cloud.google.com/storage/docs

## Samples

12-
### Google Cloud Storage Soft Delete Cost Analyzer
13-
-------------------------------------------------------------------------------
14-
**Understanding Soft Delete and Cost Considerations**
15-
1. Soft Delete: A feature for protecting against accidental data loss. Deleted objects are retained for a defined period before permanent deletion. This adds safety but carries potential additional storage costs.
16-
2. Cost Analysis: This script evaluates the relative cost increase within each bucket if soft delete is enabled. Considerations include:
17-
* Your soft delete retention window
18-
* Amount of data likely to be soft-deleted
19-
* Proportions of data in different storage classes (e.g., Standard, Nearline)
11+
NOTE: Due to the specific functionality related to Google Cloud APIs, this guide
12+
assumes a base level of familiarity with the Cloud Storage features,
13+
terminology, and pricing.
2014

21-
**How to Use the Script**
15+
## Cloud Storage Soft Delete Cost Analyzer
2216

23-
**Prerequisites**
17+
### Purpose
2418

25-
1. A Google Cloud Platform (GCP) Project with existing buckets.
26-
2. Permissions on your GCP project to interact with Google Cloud Storage and Monitoring APIs.
27-
3. A Python environment (https://cloud.google.com/python/setup)
19+
* Helps you understand the potential cost implications of enabling
20+
[Soft Delete](https://cloud.google.com/storage/docs/soft-delete) on your
21+
Cloud Storage [buckets](https://cloud.google.com/storage/docs/buckets).
22+
* Identifies buckets where the cost of enabling soft delete exceeds some
23+
threshold based on past usage.
2824

29-
**Command-Line Arguments**
30-
* `project_name` - (**Required**): Specifies your GCP project name.
31-
* `--cost_threshold` - (Optional, default=0): Sets a relative cost threshold.
32-
* `--soft_delete_window` - (Optional, default= 604800.0 (i.e. 7 days)): Time window (in seconds) for considering soft-deleted objects..
33-
* `--agg_days` - (Optional, default=30): The period over which to combine and aggregate results.
34-
* `--lookback_days` - (Optional, default=360): Time window (in days) for considering the how old the bucket to be.
35-
* `--list` - (Optional, default=False): Produces a simple list of bucket names.
25+
### Prerequisites
3626

37-
Note: In this sample, if setting cost_threshold 0.15 would spotlight buckets where enabling soft delete might increase costs by over 15%.
27+
* A
28+
[Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects)
29+
with existing buckets.
30+
* Permissions on your Google Cloud project to interact with Cloud Storage and
31+
Monitoring APIs.
32+
* A [Python development environment](https://cloud.google.com/python/setup)
33+
with version >= `3.11`
3834

39-
``` code-block:: bash
40-
$ python storage_soft_delete_relative_cost_analyzer.py [your-project-name]
41-
```
35+
### Running the script using the command line
4236

4337
To disable soft-delete for buckets flagged by the script, follow these steps:
4438

45-
```code-block::bash
46-
# 1. Authenticate (if needed): If you're not already authenticated or prefer a specific account, run:
47-
gcloud auth application-default login
48-
49-
# 2. Run the analyzer to generate a list of buckets exceeding your cost threshold:
50-
python storage_soft_delete_relative_cost_analyzer.py [your-project-name] --[OTHER_OPTIONS] --list=True > list_of_buckets.txt
51-
52-
# 3. Update the buckets using the generated list:
53-
cat list_of_buckets.txt | gcloud storage buckets update -I --clear-soft-delete
39+
1. Authenticate (if needed): If you're not already authenticated or prefer a
40+
specific account, run:
41+
42+
```code-block::bash
43+
gcloud auth application-default login
44+
```
45+
46+
2. Run the analyzer to generate a list of buckets exceeding your cost threshold:

   ```bash
   python storage_soft_delete_relative_cost_analyzer.py [project_name] --[OTHER_OPTIONS] --list=True > list_of_buckets.txt
   ```

   ARGUMENTS:

   * `project_name` (**Required**): Specifies your Google Cloud project name.
   * `--cost_threshold` (Optional, default=0): Sets a relative cost threshold. For example, if `cost_threshold` is set to 0.15, the script returns buckets where the estimated relative cost increase exceeds 15%.
   * `--soft_delete_window` (Optional, default=604800.0, i.e. 7 days): Time window (in seconds) for considering soft-deleted objects.
   * `--agg_days` (Optional, default=30): The period (in days) over which to combine and aggregate results.
   * `--lookback_days` (Optional, default=360): Time window (in days) describing how far back in time the analysis should consider data.
   * `--list` (Optional, default=False): If set to True, produces a simple list of bucket names.
   Example with all the optional parameters set to their default values:

   ```bash
   python storage_soft_delete_relative_cost_analyzer.py [project_name] \
       --cost_threshold=0 \
       --soft_delete_window=604800.0 \
       --agg_days=30 \
       --lookback_days=360 \
       --list=True
   ```
3. Update the buckets using the generated list:

   ```bash
   cat list_of_buckets.txt | gcloud storage buckets update -I --clear-soft-delete
   ```
**Important Note:** <span style="color: red;">If a bucket has soft delete disabled, delete requests that include an object's generation number will permanently delete the object. Additionally, in buckets with object versioning disabled, any request that causes an object to be deleted or overwritten will permanently delete the object.</span>

--------------------------------------------------------------------------------
### Script Explanation

The `storage_soft_delete_relative_cost_analyzer.py` script assesses the potential cost impact of enabling soft delete on Cloud Storage buckets. It uses the [Cloud Monitoring API](https://cloud.google.com/monitoring/api/v3) to retrieve relevant metrics and perform calculations.
#### Functionality

1. Calculates the relative cost of soft delete:

   * Fetches data on soft-deleted bytes and total byte-seconds for each bucket and [storage class](https://cloud.google.com/storage/docs/storage-classes) using the Monitoring API.
   * Calculates the ratio of soft-deleted bytes to total byte-seconds, representing the relative amount of inactive data.
   * Considers [storage class pricing](https://cloud.google.com/storage/pricing) relative to the Standard storage class to determine the cost impact of storing this inactive data.

2. Identifies buckets that exceed a cost threshold:

   * Compares the calculated relative cost to a user-defined threshold.
   * Flags buckets where soft delete might lead to significant cost increases.

3. Provides two output options, JSON or a list of buckets:

   * Can output detailed JSON with the relative cost for each bucket, suitable for further analysis or plotting.
   * Alternatively, generates a simple list of bucket names exceeding the cost threshold. This output can be piped directly into the gcloud storage CLI as described above.
#### Key Functions

* `soft_delete_relative_cost_analyzer`: Handles command-line input and output, calling `get_soft_delete_cost` for the Google Cloud project.
* `get_soft_delete_cost`: Orchestrates the cost analysis, using:
  * `get_relative_cost`: Retrieves the relative cost multiplier for a given storage class (e.g., "STANDARD", "NEARLINE") compared to the Standard class. The costs for each class are pre-defined within the function and could be adjusted based on regional pricing variations.
  * `calculate_soft_delete_costs`: Executes Monitoring API queries and calculates costs.
  * `get_storage_class_ratio`: Fetches data on the storage class distribution within buckets.
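The per-class multiplier lookup described for `get_relative_cost` can be pictured with a small sketch. The dictionary values below are illustrative placeholders, not official Cloud Storage pricing, and the function name is hypothetical; the real script defines its own values, which may differ by region.

```python
# Illustrative sketch of a get_relative_cost-style lookup. The multipliers
# here are placeholders for demonstration only, not official pricing.
RELATIVE_COST_BY_CLASS = {
    "STANDARD": 1.0,   # baseline: Standard class cost is 1
    "NEARLINE": 0.5,   # placeholder ratio vs. Standard
    "COLDLINE": 0.25,  # placeholder
    "ARCHIVE": 0.06,   # placeholder
}

def relative_cost_sketch(storage_class: str) -> float:
    """Return the cost multiplier of a storage class relative to Standard."""
    # Unrecognized classes fall back to the Standard baseline of 1.0.
    return RELATIVE_COST_BY_CLASS.get(storage_class.upper(), 1.0)
```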
#### Monitoring API Queries

The script relies on the Cloud Monitoring API to fetch the data needed to calculate soft delete costs. It uses the `query_client.query_time_series` method to execute specifically crafted queries that retrieve metrics from Cloud Storage.
1. `calculate_soft_delete_costs`

   * This function calculates the proportion of soft-deleted data relative to the total data volume within each bucket. The calculation is based on the following metrics:
     * `storage.googleapis.com/storage/v2/deleted_bytes`: Quantifies the volume of data, in bytes, that has undergone soft deletion.
     * `storage.googleapis.com/storage/v2/total_byte_seconds`: Records the cumulative byte-seconds of data stored within the bucket, excluding objects marked for soft deletion.

2. `get_storage_class_ratio`

   * This function queries the `storage.googleapis.com/storage/v2/total_byte_seconds` metric again, but in this instance aggregates the data by the storage class of each object within the bucket.
   * The result is a distribution of data across storage classes, enabling a more granular cost analysis. For example, a result like `{ "bucket_name-STANDARD": 0.90, "bucket_name-NEARLINE": 0.10 }` indicates that the bucket's data is split across two storage classes in a 9:1 ratio.
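The ratio computation described above amounts to a simple normalization. The helper below is a hypothetical sketch, not the sample's actual code:

```python
# Hypothetical sketch: normalize per-class total_byte_seconds values into
# the storage-class distribution described above.
def storage_class_ratio_sketch(byte_seconds: dict) -> dict:
    """Return each storage class's fraction of the bucket's total byte-seconds."""
    total = sum(byte_seconds.values())
    if total == 0:
        # No recorded storage: every class contributes a zero ratio.
        return {key: 0.0 for key in byte_seconds}
    return {key: value / total for key, value in byte_seconds.items()}

ratios = storage_class_ratio_sketch(
    {"bucket_name-STANDARD": 9.0e12, "bucket_name-NEARLINE": 1.0e12}
)
# ratios == {"bucket_name-STANDARD": 0.9, "bucket_name-NEARLINE": 0.1}
```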
--------------------------------------------------------------------------------

### Key Formula
The relative increase in cost of using soft delete is calculated, for each bucket, by combining the outputs of the queries above:

```
Relative cost of each bucket = deleted_bytes / total_byte_seconds
                               x Soft delete retention duration-seconds
                               x Relative Storage Cost
                               x Storage Class Ratio
```
where:

* `Deleted Bytes`: The same as `storage/v2/deleted_bytes`, the delta count of deleted bytes per bucket.
* `Total Byte Seconds`: The same as `storage/v2/total_byte_seconds`, the total daily storage in byte*seconds used by the bucket, grouped by storage class and type, where type can be live-object, noncurrent-object, soft-deleted-object, or multipart-upload.
* `Soft delete retention duration-seconds`: The soft delete window defined for the bucket; this is the window to provide when testing out this relative cost script.
* `Relative Storage Cost`: The cost of storing data in a specific storage class (e.g., Standard, Nearline, Coldline) relative to the Standard class (where the Standard class cost is 1).
* `Storage Class Ratio`: The proportion of the bucket's data that belongs to the specific storage class being considered.

Note that the Cloud Monitoring metrics `storage/v2/deleted_bytes` and `storage/v2/total_byte_seconds` are defined at [https://cloud.google.com/monitoring/api/metrics_gcp#gcp-storage](https://cloud.google.com/monitoring/api/metrics_gcp#gcp-storage).
#### Stepwise Explanation

1. Soft Delete Rate: Dividing `Deleted Bytes` by `Total Byte Seconds` gives the rate at which data is being soft-deleted (per second). This shows how quickly data marked for deletion accumulates in the bucket.
2. Cost Impact:
   * Multiply the Soft Delete Rate by the `Soft delete retention duration-seconds` to get the total ratio of data that is soft-deleted and retained within the specified period.
   * Multiply this result by the `Relative Storage Cost` to factor in the pricing of the specific storage class.
   * Finally, multiply by the `Storage Class Ratio` to consider only the portion of the cost attributable to that particular class.
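As a worked instance of the steps above, take some made-up numbers: 1e8 bytes soft-deleted against one day's worth of 1 GB stored, a 7-day retention window, Standard-class pricing, and a 0.9 class ratio. All values are illustrative only:

```python
# Worked example of the key formula, using illustrative numbers.
deleted_bytes = 1.0e8          # bytes soft-deleted over the window
total_byte_seconds = 8.64e13   # 1e9 bytes stored for one day (1e9 * 86400 s)
retention_seconds = 604800.0   # 7-day soft delete retention window
relative_storage_cost = 1.0    # Standard class baseline multiplier
storage_class_ratio = 0.9      # fraction of the bucket's data in this class

relative_cost = (
    deleted_bytes / total_byte_seconds   # soft delete rate (per second)
    * retention_seconds                  # retained soft-deleted fraction
    * relative_storage_cost              # storage class pricing factor
    * storage_class_ratio                # portion attributable to this class
)
# relative_cost is approximately 0.63, i.e. a ~63% relative cost increase
```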
The script analyzes your bucket usage history to estimate the relative cost increase of enabling soft delete. It outputs a value for each bucket representing the increase in cost, compared to current pricing, if usage patterns continue. For instance, `{"Bucket_A": 1.15, "Bucket_B": 1.05}` indicates a `15%` price increase for `Bucket_A` and a `5%` increase for `Bucket_B` with soft delete enabled for the defined `Soft delete retention duration-seconds`. This output lets you weigh the benefits of data protection against the added storage expense for each bucket and storage class, helping you make informed decisions about enabling soft delete.
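Continuing the example output above, flagging a bucket against `--cost_threshold` can be sketched as below. Interpreting a value of 1.15 as a 15% increase (multiplier minus 1.0) follows the example in the text but is an assumption of this sketch, not the script's documented behavior:

```python
# Sketch of comparing example analyzer output against a cost threshold.
# Treating 1.15 as "15% increase" (multiplier - 1.0) is an assumption
# of this illustration.
relative_costs = {"Bucket_A": 1.15, "Bucket_B": 1.05}
cost_threshold = 0.10  # flag increases larger than 10%

flagged = [
    bucket
    for bucket, multiplier in relative_costs.items()
    if multiplier - 1.0 > cost_threshold
]
# flagged == ["Bucket_A"]
```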

storage/cost-analysis/storage_soft_delete_relative_cost_analyzer.py

Lines changed: 4 additions & 4 deletions
The module docstring's formula was updated to match the README:

```
Relative cost of each bucket = deleted_bytes / total_byte_seconds
                               x Soft delete retention duration-seconds
                               x Relative Storage Cost
                               x Storage Class Ratio
```

0 commit comments