@@ -93,7 +93,7 @@ for (Row row : rs) {
 }
 ```
 
-### Manual paging
+### Saving and reusing the paging state
 
 Sometimes it is convenient to save the paging state in order to restore
 it later. For example, consider a stateless web service that displays a
@@ -214,5 +214,50 @@ There are two situations where you might want to use the unsafe API:
   implementing your own validation logic (for example, signing the raw
   state with a private key).
 
-[gpsu]: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/ExecutionInfo.html#getPagingStateUnsafe()
-[spsu]: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setPagingStateUnsafe(byte[])
+[gpsu]: http://www.datastax.com/drivers/java/2.2/com/datastax/driver/core/ExecutionInfo.html#getPagingStateUnsafe()
+[spsu]: http://www.datastax.com/drivers/java/2.2/com/datastax/driver/core/Statement.html#setPagingStateUnsafe(byte[])
+
+### Offset queries
+
+Saving the paging state works well when you only let the user move from
+one page to the next. But it doesn't allow random jumps (like "go
+directly to page 10"), because you can't fetch a page unless you have
+the paging state of the previous one. Such a feature would require
+*offset queries*, but they are not natively supported by Cassandra (see
+[CASSANDRA-6511](https://issues.apache.org/jira/browse/CASSANDRA-6511)).
+The rationale is that offset queries are inherently inefficient (the
+performance will always be linear in the number of rows skipped), so the
+Cassandra team doesn't want to encourage their use.
+
+If you really want offset queries, you can emulate them client-side.
+You'll still get linear performance, but maybe that's acceptable for
+your use case. For instance, if each page holds 10 rows and you show at
+most 20 pages, you'll fetch at most 190 extra rows (the 19 pages before
+the last one), which doesn't sound like a big deal.
+
+For example, if the page size is 10, the fetch size is 50, and the user
+asks for page 12 (rows 110 to 119, counting from 0):
+
+* execute the statement a first time (the result set contains rows 0 to
+  49, but you're not going to use them, only the paging state);
+* execute the statement a second time with the paging state from the
+  first query;
+* execute the statement a third time with the paging state from the
+  second query. The result set now contains rows 100 to 149;
+* skip the first 10 rows of the iterator. Read the next 10 rows and
+  discard the remaining ones.
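
The steps above can be sketched as follows. This is a minimal, driver-free simulation, not the driver's API: `execute`, `Fetch`, and the integer `nextState` are hypothetical stand-ins for running the statement and reading its paging state, and each row is represented by its own 0-based index. With the real driver, you would instead set the fetch size and paging state on the statement and read the next paging state from the result set's execution info.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetPagingSketch {

    /** Hypothetical stand-in for one round trip: the fetched rows plus the state to resume from. */
    static final class Fetch {
        final List<Integer> rows;
        final Integer nextState; // null when the result set is exhausted

        Fetch(List<Integer> rows, Integer nextState) {
            this.rows = rows;
            this.nextState = nextState;
        }
    }

    /** Simulates executing the statement; rows are represented by their 0-based index. */
    static Fetch execute(int totalRows, int fetchSize, Integer pagingState) {
        int start = (pagingState == null) ? 0 : pagingState;
        int end = Math.min(start + fetchSize, totalRows);
        List<Integer> rows = new ArrayList<>();
        for (int i = start; i < end; i++) {
            rows.add(i);
        }
        return new Fetch(rows, end < totalRows ? end : null);
    }

    /** Returns the rows of a 1-based page, or fewer rows if the data runs out. */
    static List<Integer> fetchPage(int totalRows, int fetchSize, int pageSize, int pageNumber) {
        if (fetchSize < 1 || pageSize < 1 || pageNumber < 1) {
            throw new IllegalArgumentException("sizes and page number must be positive");
        }
        int firstRow = (pageNumber - 1) * pageSize;
        int fetchesToSkip = firstRow / fetchSize; // result sets used only for their paging state
        int rowsToSkip = firstRow % fetchSize;    // rows to drop at the start of the useful result set

        Fetch fetch = execute(totalRows, fetchSize, null);
        for (int i = 0; i < fetchesToSkip && fetch != null; i++) {
            fetch = (fetch.nextState == null) ? null : execute(totalRows, fetchSize, fetch.nextState);
        }

        List<Integer> page = new ArrayList<>();
        while (fetch != null) {
            for (int i = rowsToSkip; i < fetch.rows.size() && page.size() < pageSize; i++) {
                page.add(fetch.rows.get(i));
            }
            if (page.size() == pageSize || fetch.nextState == null) {
                break;
            }
            rowsToSkip = 0; // a page that straddles two fetches continues at the next one's first row
            fetch = execute(totalRows, fetchSize, fetch.nextState);
        }
        return page;
    }

    public static void main(String[] args) {
        // Page size 10, fetch size 50, page 12, as in the example above.
        System.out.println(fetchPage(1000, 50, 10, 12));
    }
}
```

With the numbers from the example, `fetchPage(1000, 50, 10, 12)` runs three simulated queries and returns rows 110 through 119. The extra loop at the end only matters when the fetch size is not a multiple of the page size, in which case a page can span two result sets.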
+
+You'll want to experiment with the fetch size to find the best balance:
+too small means many background queries; too big means bigger messages
+and too many unneeded rows returned (we picked 50 above for the sake of
+example, but it's probably too small -- the default is 5000).
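
To get a rough feel for that trade-off before experimenting, the arithmetic can be sketched as below. The class and method names are hypothetical, and the row count is an upper bound that assumes every fetch comes back full. It ignores message size and latency, so treat it as a starting point for measurements, not a substitute for them.

```java
final class FetchSizeCost {

    /** Statement executions needed to reach the last row of a 1-based page. */
    static int queries(int fetchSize, int pageSize, int pageNumber) {
        int lastRow = pageNumber * pageSize - 1; // 0-based index of the page's last row
        return lastRow / fetchSize + 1;
    }

    /** Upper bound on rows transferred but never displayed (assumes every fetch is full). */
    static int unneededRows(int fetchSize, int pageSize, int pageNumber) {
        return queries(fetchSize, pageSize, pageNumber) * fetchSize - pageSize;
    }

    public static void main(String[] args) {
        // Compare a few fetch sizes for page 12 with 10 rows per page.
        for (int fetchSize : new int[] {10, 50, 5000}) {
            System.out.println("fetch size " + fetchSize + ": "
                    + queries(fetchSize, 10, 12) + " queries, up to "
                    + unneededRows(fetchSize, 10, 12) + " unneeded rows");
        }
    }
}
```

For page 12 with 10 rows per page, this gives 12 queries and 110 unneeded rows at a fetch size of 10, 3 queries and up to 140 unneeded rows at 50, and 1 query and up to 4990 unneeded rows at 5000.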
+
+Again, offset queries are inefficient by nature. Emulating them
+client-side is a compromise when you think you can get away with the
+performance hit. We recommend that you:
+
+* test your code at scale with the expected query patterns, to make sure
+  that your assumptions are correct;
+* set a hard limit on the highest possible page number, to prevent
+  malicious users from triggering queries that would skip a huge number
+  of rows.
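
The hard limit can be as simple as validating the requested page number before running any query. A minimal sketch follows; `PageLimit`, `MAX_PAGE`, and `checkPageNumber` are hypothetical names, and 20 is only the figure used in the earlier example, not a recommended value.

```java
final class PageLimit {

    /** Hypothetical application-level cap; pick a value that matches what you have tested. */
    static final int MAX_PAGE = 20;

    /** Returns the page number unchanged if it is within bounds, otherwise rejects the request. */
    static int checkPageNumber(int requested) {
        if (requested < 1 || requested > MAX_PAGE) {
            throw new IllegalArgumentException(
                    "page must be between 1 and " + MAX_PAGE + ", got " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(checkPageNumber(12));
    }
}
```

Rejecting the request up front means an out-of-range page number never triggers any of the skipped fetches.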