|
| 1 | +## Prepared statements |
| 2 | + |
| 3 | +When Cassandra executes a query, the first thing it does is parse the |
| 4 | +query string to an internal representation. If the same query is used |
| 5 | +often, you can use a prepared statement, which allows Cassandra to cache |
| 6 | +that representation, and save time and resources for subsequent |
| 7 | +executions. Prepared statements are typically parameterized, using |
| 8 | +different values for each execution: |
| 9 | + |
| 10 | +```java |
| 11 | +PreparedStatement prepared = session.prepare( |
| 12 | + "insert into product (sku, description) values (?, ?)"); |
| 13 | + |
| 14 | +BoundStatement bound; |
| 15 | + |
| 16 | +bound = prepared.bind("234827", "Mouse"); |
| 17 | +session.execute(bound); |
| 18 | + |
| 19 | +bound = prepared.bind("987274", "Keyboard"); |
| 20 | +session.execute(bound); |
| 21 | +``` |
| 22 | + |
| 23 | +Statements should be be prepared only once. If you call `prepare` |
| 24 | +multiple times with the same query string, the driver will log a |
| 25 | +warning. So your application should cache the `PreparedStatement` object |
| 26 | +once it's been created (this can be as simple as storing it as a field |
| 27 | +in a DAO). |
| 28 | + |
| 29 | +### Parameters and binding |
| 30 | + |
| 31 | +Parameters can be either anonymous or named (named parameters are only |
| 32 | +available with [native protocol](../../native_protocol) v2 or above): |
| 33 | + |
| 34 | +```java |
| 35 | +ps1 = session.prepare("insert into product (sku, description) values (?, ?)"); |
| 36 | +ps2 = session.prepare("insert into product (sku, description) values (:s, :d)"); |
| 37 | +``` |
| 38 | + |
| 39 | +To turn the statement into its executable form, you need to *bind* it |
| 40 | +and provide values for the parameters. As shown previously, there is a |
| 41 | +shorthand form to do it all in one call: |
| 42 | + |
| 43 | +```java |
| 44 | +BoundStatement bound = ps1.bind("324378", "LCD screen"); |
| 45 | +``` |
| 46 | + |
| 47 | +You can also bind first, then use setters, which is slightly more |
| 48 | +explicit: |
| 49 | + |
| 50 | +```java |
| 51 | +// Positional setters: |
| 52 | +BoundStatement bound = ps1.bind() |
| 53 | + .setString(0, "324378") |
| 54 | + .setString(1, "LCD screen"); |
| 55 | + |
| 56 | +// Named setters: |
| 57 | +BoundStatement bound = ps2.bind() |
| 58 | + .setString("s", "324378") |
| 59 | + .setString("d", "LCD screen"); |
| 60 | +``` |
| 61 | + |
| 62 | +If you don't set a parameter, it is sent as `null` (note that this |
| 63 | +behavior changes in the 2.1 branch of the driver). |
| 64 | + |
| 65 | +You can use named setters even if the query uses anonymous parameters; |
| 66 | +Cassandra will name the parameters after the column they apply to: |
| 67 | + |
| 68 | +```java |
| 69 | +BoundStatement bound = ps1.bind() |
| 70 | + .setString("sku", "324378") |
| 71 | + .setString("description", "LCD screen"); |
| 72 | +``` |
| 73 | + |
| 74 | +This can be ambiguous if the query uses the same column multiple times, |
| 75 | +for example: `select * from sales where sku = ? and date > ? and date < |
| 76 | +?`. In these situations, use positional setters or named parameters. |
| 77 | + |
| 78 | +A bound statement also has getters to retrieve the values. Note that |
| 79 | +this has a small performance overhead since values are stored in their |
| 80 | +serialized form. |
| 81 | + |
| 82 | +### How the driver handles prepared statements |
| 83 | + |
| 84 | +When the driver prepares a statement, it sends the query string to |
| 85 | +Cassandra, which caches the statement and returns an identifier. Later, |
| 86 | +when the driver needs to execute the statement, it just sends the |
| 87 | +identifier and parameter values. Note that the identifier is |
| 88 | +deterministic, so it will always be the same for all nodes (it's a |
| 89 | +actually a hash of the query string). |
| 90 | + |
| 91 | +Prepared statements are not replicated across the cluster. It is the |
| 92 | +driver's responsibility to ensure that each node's cache is up to |
| 93 | +date. It uses a number of strategies to achieve this: |
| 94 | + |
| 95 | +1. When a statement is initially prepared, it is first sent to a single |
| 96 | + node in the cluster (this prevents from hitting all nodes in case |
| 97 | + the query string is wrong). Once that node replies successfully, the |
| 98 | + driver re-prepares on all remaining nodes: |
| 99 | + |
| 100 | + ```ditaa |
| 101 | + client driver node1 node2 node3 |
| 102 | + --+------------------------+----------------+--------------+------+--- |
| 103 | + | | | | | |
| 104 | + | session.prepare(query) | | | | |
| 105 | + |----------------------->| | | | |
| 106 | + | | PREPARE(query) | | | |
| 107 | + | |--------------->| | | |
| 108 | + | | | | | |
| 109 | + | | PREPARED(id) | | | |
| 110 | + | |<---------------| | | |
| 111 | + | | | | | |
| 112 | + | | | | | |
| 113 | + | | PREPARE(query) | | |
| 114 | + | |------------------------------>| | |
| 115 | + | | | | | |
| 116 | + | | PREPARE(query) | | |
| 117 | + | |------------------------------------->| |
| 118 | + | | | | | |
| 119 | + |<-----------------------| | | | |
| 120 | + ``` |
| 121 | +
|
| 122 | +2. if a node crashes, it loses all of its prepared statements. So the |
| 123 | + driver keeps a client-side cache; anytime a node is marked back up, |
| 124 | + the driver re-prepares all statements on it; |
| 125 | +
|
| 126 | +3. finally, if the driver tries to execute a statement and finds out |
| 127 | + that the coordinator doesn't know about it, it will re-prepare the |
| 128 | + statement on the fly (this is transparent for the client, but will cost |
| 129 | + two extra roundtrips): |
| 130 | +
|
| 131 | + ```ditaa |
| 132 | + client driver node1 |
| 133 | + --+-------------------------------+------------------------------+-- |
| 134 | + | | | |
| 135 | + |session.execute(boundStatement)| | |
| 136 | + +------------------------------>| | |
| 137 | + | | EXECUTE(id, values) | |
| 138 | + | |----------------------------->| |
| 139 | + | | | |
| 140 | + | | UNPREPARED | |
| 141 | + | |<-----------------------------| |
| 142 | + | | | |
| 143 | + | | | |
| 144 | + | | PREPARE(query) | |
| 145 | + | |----------------------------->| |
| 146 | + | | | |
| 147 | + | | PREPARED(id) | |
| 148 | + | |<-----------------------------| |
| 149 | + | | | |
| 150 | + | | | |
| 151 | + | | EXECUTE(id, values) | |
| 152 | + | |----------------------------->| |
| 153 | + | | | |
| 154 | + | | ROWS | |
| 155 | + | |<-----------------------------| |
| 156 | + | | | |
| 157 | + |<------------------------------| | |
| 158 | + ``` |
| 159 | +
|
| 160 | +You can customize these strategies through `QueryOptions`: |
| 161 | +
|
| 162 | +* [setPrepareOnAllHosts] controls whether statements are initially |
| 163 | + re-prepared on other hosts (step 1 above); |
| 164 | +* [setReprepareOnUp] controls whether statements are re-prepared on a |
| 165 | + node that comes back up (step 2 above). |
| 166 | +
|
| 167 | +Changing the driver's defaults should be done with care and only in |
| 168 | +specific situations; read each method's Javadoc for detailed |
| 169 | +explanations. |
| 170 | +
|
| 171 | +[setPrepareOnAllHosts]: http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/QueryOptions.html#setPrepareOnAllHosts(boolean) |
| 172 | +[setReprepareOnUp]: http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/QueryOptions.html#setReprepareOnUp(boolean) |
0 commit comments