Skip to content

Commit e3307d9

Browse files
Alexandre Dutraolim7t
authored andcommitted
JAVA-764: Retry with the normal consistency level (not the serial one) when a write times out on the Paxos phase.
1 parent b5173ce commit e3307d9

9 files changed

Lines changed: 201 additions & 132 deletions

File tree

changelog/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@
3333
- [new feature] JAVA-1019: SchemaBuilder support for CREATE/ALTER/DROP KEYSPACE.
3434
- [bug] JAVA-1070: The Mapper should not prepare queries synchronously.
3535
- [new feature] JAVA-982: Introduce new method ConsistencyLevel.isSerial().
36+
- [bug] JAVA-764: Retry with the normal consistency level (not the serial one) when a write times out on the Paxos phase.
3637

3738
Merged from 2.0 branch:
3839

driver-core/src/main/java/com/datastax/driver/core/RequestHandler.java

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -382,7 +382,8 @@ private void processRetryDecision(RetryPolicy.RetryDecision retryDecision, Conne
382382

383383
private void retry(final boolean retryCurrent, ConsistencyLevel newConsistencyLevel) {
384384
final Host h = current;
385-
this.retryConsistencyLevel = newConsistencyLevel;
385+
if (newConsistencyLevel != null)
386+
this.retryConsistencyLevel = newConsistencyLevel;
386387

387388
// We should not retry on the current thread as this will be an IO thread.
388389
manager.executor().execute(new Runnable() {

driver-core/src/main/java/com/datastax/driver/core/policies/DefaultRetryPolicy.java

Lines changed: 16 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -42,9 +42,9 @@ private DefaultRetryPolicy() {
4242
}
4343

4444
/**
45-
* Defines whether to retry and at which consistency level on a read timeout.
45+
* {@inheritDoc}
4646
* <p/>
47-
* This method triggers a maximum of one retry, and only if enough
47+
* This implementation triggers a maximum of one retry, and only if enough
4848
* replicas had responded to the read request but data was not retrieved
4949
* amongst those. Indeed, that case usually means that enough replica
5050
* are alive to satisfy the consistency but the coordinator picked a
@@ -53,15 +53,6 @@ private DefaultRetryPolicy() {
5353
* timeout the dead replica will likely have been detected as dead and
5454
* the retry has a high chance of success.
5555
*
56-
* @param statement the original query that timed out.
57-
* @param cl the original consistency level of the read that timed out.
58-
* @param requiredResponses the number of responses that were required to
59-
* achieve the requested consistency level.
60-
* @param receivedResponses the number of responses that had been received
61-
* by the time the timeout exception was raised.
62-
* @param dataRetrieved whether actual data (by opposition to data checksum)
63-
* was present in the received responses.
64-
* @param nbRetry the number of retries already performed for this operation.
6556
* @return {@code RetryDecision.retry(cl)} if no retry attempt has yet been tried and
6657
* {@code receivedResponses >= requiredResponses && !dataRetrieved}, {@code RetryDecision.rethrow()} otherwise.
6758
*/
@@ -74,9 +65,9 @@ public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl, int
7465
}
7566

7667
/**
77-
* Defines whether to retry and at which consistency level on a write timeout.
68+
* {@inheritDoc}
7869
* <p/>
79-
* This method triggers a maximum of one retry, and only in the case of
70+
* This implementation triggers a maximum of one retry, and only in the case of
8071
* a {@code WriteType.BATCH_LOG} write. The reasoning for the retry in
8172
* that case is that write to the distributed batch log is tried by the
8273
* coordinator of the write against a small subset of all the nodes alive
@@ -86,14 +77,6 @@ public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl, int
8677
* nodes will likely have been detected as dead and the retry has thus a
8778
* high chance of success.
8879
*
89-
* @param statement the original query that timed out.
90-
* @param cl the original consistency level of the write that timed out.
91-
* @param writeType the type of the write that timed out.
92-
* @param requiredAcks the number of acknowledgments that were required to
93-
* achieve the requested consistency level.
94-
* @param receivedAcks the number of acknowledgments that had been received
95-
* by the time the timeout exception was raised.
96-
* @param nbRetry the number of retry already performed for this operation.
9780
* @return {@code RetryDecision.retry(cl)} if no retry attempt has yet been tried and
9881
* {@code writeType == WriteType.BATCH_LOG}, {@code RetryDecision.rethrow()} otherwise.
9982
*/
@@ -103,37 +86,27 @@ public RetryDecision onWriteTimeout(Statement statement, ConsistencyLevel cl, Wr
10386
return RetryDecision.rethrow();
10487

10588
// If the batch log write failed, retry the operation as this might just be we were unlucky at picking candidates
89+
// JAVA-764: testing the write type automatically filters out serial consistency levels as these have always WriteType.CAS.
10690
return writeType == WriteType.BATCH_LOG ? RetryDecision.retry(cl) : RetryDecision.rethrow();
10791
}
10892

10993
/**
110-
* Defines whether to retry and at which consistency level on an
111-
* unavailable exception.
94+
* {@inheritDoc}
11295
* <p/>
113-
* This method triggers a retry iff no retry has been executed before
114-
* (nbRetry == 0), with
115-
* {@link RetryPolicy.RetryDecision#tryNextHost(ConsistencyLevel) RetryDecision.tryNextHost(cl)},
116-
* otherwise it throws an exception. The retry will be processed on the next host
117-
* in the query plan according to the current Load Balancing Policy.
118-
* Where retrying on the same host in the event of an Unavailable exception
119-
* has almost no chance of success, if the first replica tried happens to
120-
* be "network" isolated from all the other nodes but can still answer to
121-
* the client, it makes sense to retry the query on another node.
122-
*
123-
* @param statement the original query for which the consistency level cannot
124-
* be achieved.
125-
* @param cl the original consistency level for the operation.
126-
* @param requiredReplica the number of replica that should have been
127-
* (known) alive for the operation to be attempted.
128-
* @param aliveReplica the number of replica that were know to be alive by
129-
* the coordinator of the operation.
130-
* @param nbRetry the number of retry already performed for this operation.
131-
* @return {@code RetryDecision.rethrow()}.
96+
* This implementation does the following:
97+
* <ul>
98+
* <li>if this is the first retry ({@code nbRetry == 0}), it triggers a retry on the next host in the query plan
99+
* with the same consistency level ({@link RetryPolicy.RetryDecision#tryNextHost(ConsistencyLevel) RetryDecision#tryNextHost(null)}.
100+
* The rationale is that the first coordinator might have been network-isolated from all other nodes (thinking
101+
* they're down), but still able to communicate with the client; in that case, retrying on the same host has almost
102+
* no chance of success, but moving to the next host might solve the issue.</li>
103+
* <li>otherwise, the exception is rethrow.</li>
104+
* </ul>
132105
*/
133106
@Override
134107
public RetryDecision onUnavailable(Statement statement, ConsistencyLevel cl, int requiredReplica, int aliveReplica, int nbRetry) {
135108
return (nbRetry == 0)
136-
? RetryDecision.tryNextHost(cl)
109+
? RetryDecision.tryNextHost(null)
137110
: RetryDecision.rethrow();
138111
}
139112

driver-core/src/main/java/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.java

Lines changed: 11 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -85,25 +85,14 @@ else if (knownOk == 1)
8585
}
8686

8787
/**
88-
* Defines whether to retry and at which consistency level on a read timeout.
88+
* {@inheritDoc}
8989
* <p/>
90-
* This method triggers a maximum of one retry. If less replica
90+
* This implementation triggers a maximum of one retry. If less replica
9191
* responded than required by the consistency level (but at least one
9292
* replica did respond), the operation is retried at a lower
9393
* consistency level. If enough replica responded but data was not
9494
* retrieve, the operation is retried with the initial consistency
9595
* level. Otherwise, an exception is thrown.
96-
*
97-
* @param statement the original query that timed out.
98-
* @param cl the original consistency level of the read that timed out.
99-
* @param requiredResponses the number of responses that were required to
100-
* achieve the requested consistency level.
101-
* @param receivedResponses the number of responses that had been received
102-
* by the time the timeout exception was raised.
103-
* @param dataRetrieved whether actual data (by opposition to data checksum)
104-
* was present in the received responses.
105-
* @param nbRetry the number of retry already performed for this operation.
106-
* @return a RetryDecision as defined above.
10796
*/
10897
@Override
10998
public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl, int requiredResponses, int receivedResponses, boolean dataRetrieved, int nbRetry) {
@@ -126,9 +115,9 @@ public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl, int
126115
}
127116

128117
/**
129-
* Defines whether to retry and at which consistency level on a write timeout.
118+
* {@inheritDoc}
130119
* <p/>
131-
* This method triggers a maximum of one retry. If {@code writeType ==
120+
* This implementation triggers a maximum of one retry. If {@code writeType ==
132121
* WriteType.BATCH_LOG}, the write is retried with the initial
133122
* consistency level. If {@code writeType == WriteType.UNLOGGED_BATCH}
134123
* and at least one replica acknowledged, the write is retried with a
@@ -137,16 +126,6 @@ public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl, int
137126
* all, even if {@code receivedAcks > 0}). For other write types ({@code WriteType.SIMPLE}
138127
* and {@code WriteType.BATCH}), if we know the write has been persisted on at
139128
* least one replica, we ignore the exception. Otherwise, an exception is thrown.
140-
*
141-
* @param statement the original query that timed out.
142-
* @param cl the original consistency level of the write that timed out.
143-
* @param writeType the type of the write that timed out.
144-
* @param requiredAcks the number of acknowledgments that were required to
145-
* achieve the requested consistency level.
146-
* @param receivedAcks the number of acknowledgments that had been received
147-
* by the time the timeout exception was raised.
148-
* @param nbRetry the number of retry already performed for this operation.
149-
* @return a RetryDecision as defined above.
150129
*/
151130
@Override
152131
public RetryDecision onWriteTimeout(Statement statement, ConsistencyLevel cl, WriteType writeType, int requiredAcks, int receivedAcks, int nbRetry) {
@@ -170,28 +149,22 @@ public RetryDecision onWriteTimeout(Statement statement, ConsistencyLevel cl, Wr
170149
}
171150

172151
/**
173-
* Defines whether to retry and at which consistency level on an
174-
* unavailable exception.
152+
* {@inheritDoc}
175153
* <p/>
176-
* This method triggers a maximum of one retry. If at least one replica
154+
* This implementation triggers a maximum of one retry. If at least one replica
177155
* is know to be alive, the operation is retried at a lower consistency
178156
* level.
179-
*
180-
* @param statement the original query for which the consistency level cannot
181-
* be achieved.
182-
* @param cl the original consistency level for the operation.
183-
* @param requiredReplica the number of replica that should have been
184-
* (known) alive for the operation to be attempted.
185-
* @param aliveReplica the number of replica that were know to be alive by
186-
* the coordinator of the operation.
187-
* @param nbRetry the number of retry already performed for this operation.
188-
* @return a RetryDecision as defined above.
189157
*/
190158
@Override
191159
public RetryDecision onUnavailable(Statement statement, ConsistencyLevel cl, int requiredReplica, int aliveReplica, int nbRetry) {
192160
if (nbRetry != 0)
193161
return RetryDecision.rethrow();
194162

163+
// JAVA-764: if the requested consistency level is serial, it means that the operation failed at the paxos phase of a LWT.
164+
// Retry on the next host, on the assumption that the initial coordinator could be network-isolated.
165+
if (cl.isSerial())
166+
return RetryDecision.tryNextHost(null);
167+
195168
// Tries the biggest CL that is expected to work
196169
return maxLikelyToWorkCL(aliveReplica);
197170
}

driver-core/src/main/java/com/datastax/driver/core/policies/ExtendedRetryPolicy.java

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -49,7 +49,9 @@ public interface ExtendedRetryPolicy extends RetryPolicy {
4949
* is known to be idempotent.
5050
*
5151
* @param statement the original query that failed.
52-
* @param cl the original consistency level for the operation.
52+
* @param cl the requested consistency level for the operation.
53+
* Note that this is not necessarily the achieved consistency level (if any),
54+
* and it is never a {@link ConsistencyLevel#isSerial() serial} one.
5355
* @param e the exception that caused this request to fail.
5456
* @param nbRetry the number of retries already performed for this operation.
5557
* @return the retry decision. If {@code RetryDecision.RETHROW} is returned,

driver-core/src/main/java/com/datastax/driver/core/policies/FallthroughRetryPolicy.java

Lines changed: 12 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -20,10 +20,10 @@
2020
import com.datastax.driver.core.WriteType;
2121

2222
/**
23-
* A retry policy that never retry (nor ignore).
23+
* A retry policy that never retries (nor ignores).
2424
* <p/>
25-
* All of the methods of this retry policy unconditionally return {@link RetryPolicy.RetryDecision#rethrow}.
26-
* If this policy is used, retry will have to be implemented in business code.
25+
* All of the methods of this retry policy unconditionally return {@link RetryDecision#rethrow()}.
26+
* If this policy is used, retry logic will have to be implemented in business code.
2727
*/
2828
public class FallthroughRetryPolicy implements ExtendedRetryPolicy {
2929

@@ -33,55 +33,29 @@ private FallthroughRetryPolicy() {
3333
}
3434

3535
/**
36-
* Defines whether to retry and at which consistency level on a read timeout.
37-
*
38-
* @param statement the original query that timed out.
39-
* @param cl the original consistency level of the read that timed out.
40-
* @param requiredResponses the number of responses that were required to
41-
* achieve the requested consistency level.
42-
* @param receivedResponses the number of responses that had been received
43-
* by the time the timeout exception was raised.
44-
* @param dataRetrieved whether actual data (by opposition to data checksum)
45-
* was present in the received responses.
46-
* @param nbRetry the number of retry already performed for this operation.
47-
* @return {@code RetryDecision.rethrow()}.
36+
* {@inheritDoc}
37+
* <p/>
38+
* This implementation always returns {@code RetryDecision.rethrow()}.
4839
*/
4940
@Override
5041
public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl, int requiredResponses, int receivedResponses, boolean dataRetrieved, int nbRetry) {
5142
return RetryDecision.rethrow();
5243
}
5344

5445
/**
55-
* Defines whether to retry and at which consistency level on a write timeout.
56-
*
57-
* @param statement the original query that timed out.
58-
* @param cl the original consistency level of the write that timed out.
59-
* @param writeType the type of the write that timed out.
60-
* @param requiredAcks the number of acknowledgments that were required to
61-
* achieve the requested consistency level.
62-
* @param receivedAcks the number of acknowledgments that had been received
63-
* by the time the timeout exception was raised.
64-
* @param nbRetry the number of retry already performed for this operation.
65-
* @return {@code RetryDecision.rethrow()}.
46+
* {@inheritDoc}
47+
* <p/>
48+
* This implementation always returns {@code RetryDecision.rethrow()}.
6649
*/
6750
@Override
6851
public RetryDecision onWriteTimeout(Statement statement, ConsistencyLevel cl, WriteType writeType, int requiredAcks, int receivedAcks, int nbRetry) {
6952
return RetryDecision.rethrow();
7053
}
7154

7255
/**
73-
* Defines whether to retry and at which consistency level on an
74-
* unavailable exception.
75-
*
76-
* @param statement the original query for which the consistency level cannot
77-
* be achieved.
78-
* @param cl the original consistency level for the operation.
79-
* @param requiredReplica the number of replica that should have been
80-
* (known) alive for the operation to be attempted.
81-
* @param aliveReplica the number of replica that were know to be alive by
82-
* the coordinator of the operation.
83-
* @param nbRetry the number of retry already performed for this operation.
84-
* @return {@code RetryDecision.rethrow()}.
56+
* {@inheritDoc}
57+
* <p/>
58+
* This implementation always returns {@code RetryDecision.rethrow()}.
8559
*/
8660
@Override
8761
public RetryDecision onUnavailable(Statement statement, ConsistencyLevel cl, int requiredReplica, int aliveReplica, int nbRetry) {

driver-core/src/main/java/com/datastax/driver/core/policies/LoggingRetryPolicy.java

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -25,9 +25,14 @@
2525
/**
2626
* A retry policy that wraps another policy, logging the decision made by its sub-policy.
2727
* <p/>
28-
* Note that this policy only log the IGNORE and RETRY decisions (since
29-
* RETHROW decisions are just meant to propagate the cassandra exception). The
30-
* logging is done at the INFO level.
28+
* Note that this policy only logs
29+
* {@link com.datastax.driver.core.policies.RetryPolicy.RetryDecision.Type#RETRY RETRY} and
30+
* {@link com.datastax.driver.core.policies.RetryPolicy.RetryDecision.Type#IGNORE IGNORE} decisions (since
31+
* {@link com.datastax.driver.core.policies.RetryPolicy.RetryDecision.Type#RETHROW RETHROW} decisions
32+
* are just meant to propagate the Cassandra exception).
33+
* <p/>
34+
* The logging is done at the INFO level and the logger name is
35+
* {@code com.datastax.driver.core.policies.LoggingRetryPolicy}.
3136
*/
3237
public class LoggingRetryPolicy implements ExtendedRetryPolicy {
3338

0 commit comments

Comments
 (0)