retry errors hide responses of row mutations

When bigtable row mutations run into retryable errors and exhausts all retry budgets, the responses of batched/flushed row mutations are all masked/hidden by the client who later suppresses all retryable errors.

Additionally, the returned response of the client is no longer Status types but the default None value.

To force a retryable error to be raised in normal conditions, manually set a very tiny mutation_timeout (0.1 ms), run below code, you get None values in the response.

```
import datetime
from google.cloud import bigtable

project_id = 'ningk-test-project'
instance_id = 'ningk-test'
table_id = 'test'
client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
table = instance.table(table_id, mutation_timeout=0.0001)
timestamp = datetime.datetime.utcnow()
column_family_id = 'stats'

rows = [
    table.direct_row('stats#a1#20220106001'),
    table.direct_row('stats#a1#20220106002')
]

rows[0].set_cell(column_family_id,
               'name',
               'dango',
               timestamp)
rows[0].set_cell(column_family_id,
               'type',
               'alpha',
               timestamp)
rows[1].set_cell(column_family_id,
               'name',
               'pump',
               timestamp)
rows[1].set_cell(column_family_id,
               'type',
               'beta',
               timestamp)

from google.cloud.bigtable.batcher import MutationsBatcher

FLUSH_COUNT = 1000
MAX_ROW_BYTES = 5242880  # 5MB

class _MutationsBatcher(MutationsBatcher):
    def __init__(
        self, table, flush_count=FLUSH_COUNT, max_row_bytes=MAX_ROW_BYTES):
      super().__init__(table, flush_count, max_row_bytes)
      self.rows = []

    def set_flush_callback(self, callback_fn):
      self.callback_fn = callback_fn

    def flush(self):
      if len(self.rows) != 0:
        response = self.table.mutate_rows(self.rows)
        self.callback_fn(response)
        self.total_mutation_count = 0
        self.total_size = 0
        self.rows = []

batcher = _MutationsBatcher(table)

def callback(response):
    for i, status in enumerate(response):
        print(type(status))

batcher.set_flush_callback(callback)

for row in rows:
    batcher.mutate(row)

batcher.flush()
```

You will get
```
<class 'NoneType'>
<class 'NoneType'>
```

More context can be found here: https://issues.apache.org/jira/browse/BEAM-13606.

The expected solution:
1. The returned response should always contain Status types instead of None. A DEADLINE_EXCEEDED might be a good fit here as retry budget is exhausted;
2. All retryable errors shouldn't be suppressed by the client because then the invoker of the client wouldn't know what rows need to be retried.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

retry errors hide responses of row mutations #485

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

retry errors hide responses of row mutations #485

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions