Commit cf8ee9b

Explain offset queries in paging doc.
1 parent 7b55a58 commit cf8ee9b

2 files changed

Lines changed: 46 additions & 5 deletions

faq/README.md

Lines changed: 0 additions & 4 deletions
@@ -10,7 +10,3 @@ page](../features/paging/) for more information.
 Native protocol v1 does not support paging, but you can emulate it in
 CQL with `LIMIT` and the `token()` function. See
 [this conversation](https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/U2KzAHruWO4/6vDmUVDDkOwJ) on the mailing list.
-
-There is no trivial solution for offset queries (e.g. jump to page 20
-directly). Cassandra does not implement them out of the box, see
-[CASSANDRA-6511](https://issues.apache.org/jira/browse/CASSANDRA-6511).

features/paging/README.md

Lines changed: 46 additions & 1 deletion
@@ -93,7 +93,7 @@ for (Row row : rs) {
 }
 ```
 
-### Manual paging
+### Saving and reusing the paging state
 
 Sometimes it is convenient to save the paging state in order to restore
 it later. For example, consider a stateless web service that displays a
@@ -215,3 +215,48 @@ There are two situations where you might want to use the unsafe API:
 
 [gpsu]: http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/ExecutionInfo.html#getPagingStateUnsafe()
 [spsu]: http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Statement.html#setPagingStateUnsafe(byte[])
+
+### Offset queries
+
+Saving the paging state works well when you only let the user move from
+one page to the next. But it doesn't allow random jumps (like "go
+directly to page 10"), because you can't fetch a page unless you have
+the paging state of the previous one. Such a feature would require
+*offset queries*, but they are not natively supported by Cassandra (see
+[CASSANDRA-6511](https://issues.apache.org/jira/browse/CASSANDRA-6511)).
+The rationale is that offset queries are inherently inefficient (the
+performance will always be linear in the number of rows skipped), so the
+Cassandra team doesn't want to encourage their use.
+
+If you really want offset queries, you can emulate them client-side.
+You'll still get linear performance, but maybe that's acceptable for
+your use case. For example, if each page holds 10 rows and you show at
+most 20 pages, this means you'll fetch at most 190 extra rows, which
+doesn't sound like a big deal.
+
+Suppose the page size is 10, the fetch size is 50, and the user
+asks for page 12 (rows 110 to 119, counting from 0):
+
+* execute the statement a first time (the result set contains rows 0 to
+49, but you're not going to use them, only the paging state);
+* execute the statement a second time with the paging state from the
+first query;
+* execute the statement a third time with the paging state from the
+second query. The result set now contains rows 100 to 149;
+* skip the first 10 rows of the iterator. Read the next 10 rows and
+discard the remaining ones.
+
+You'll want to experiment with the fetch size to find the best balance:
+too small means many background queries; too big means bigger messages
+and too many unneeded rows returned (we picked 50 above for the sake of
+example, but it's probably too small -- the default is 5000).
+
+Again, offset queries are inefficient by nature. Emulating them
+client-side is a compromise when you think you can get away with the
+performance hit. We recommend that you:
+
+* test your code at scale with the expected query patterns, to make sure
+that your assumptions are correct;
+* set a hard limit on the highest possible page number, to prevent
+malicious users from triggering queries that would skip a huge number
+of rows.
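
The arithmetic behind the page-12 walkthrough in the new "Offset queries" section can be sketched in plain Java. This is only an illustration under stated assumptions: `OffsetPlan`, `forPage` and `MAX_PAGE` are hypothetical names that do not exist in the driver, and the sketch computes how many executions are needed and how many rows to skip without issuing any query. The limit of 20 pages is borrowed from the example in the text, not from any driver default.

```java
// Illustrative sketch only: OffsetPlan, forPage and MAX_PAGE are hypothetical
// names and are not part of the DataStax Java driver.
final class OffsetPlan {
    // Hypothetical hard limit on the page number, as the text recommends.
    static final int MAX_PAGE = 20;

    // Total number of executions, including the one whose rows are used.
    final int queriesToExecute;
    // Rows to skip at the start of the last result set.
    final int rowsToSkip;
    // Rows to read after skipping (one page worth).
    final int rowsToRead;

    private OffsetPlan(int queriesToExecute, int rowsToSkip, int rowsToRead) {
        this.queriesToExecute = queriesToExecute;
        this.rowsToSkip = rowsToSkip;
        this.rowsToRead = rowsToRead;
    }

    // page is 1-based; pageSize is what the user sees, fetchSize is what
    // each execution of the statement returns.
    static OffsetPlan forPage(int page, int pageSize, int fetchSize) {
        if (page < 1 || page > MAX_PAGE) {
            throw new IllegalArgumentException(
                "page must be between 1 and " + MAX_PAGE);
        }
        if (pageSize < 1 || fetchSize < 1) {
            throw new IllegalArgumentException(
                "pageSize and fetchSize must be positive");
        }
        // Zero-based index of the first row of the requested page.
        int firstRow = Math.multiplyExact(page - 1, pageSize);
        // Executions whose rows are discarded and only the paging state kept.
        int discardedExecutions = firstRow / fetchSize;
        // Position of the first wanted row inside the last result set.
        int rowsToSkip = firstRow % fetchSize;
        return new OffsetPlan(discardedExecutions + 1, rowsToSkip, pageSize);
    }
}
```

With a page size of 10 and a fetch size of 50, `OffsetPlan.forPage(12, 10, 50)` yields three executions and ten rows to skip, matching the steps listed in the diff. If the requested page straddles a fetch boundary, the sketch assumes the caller keeps iterating the result set and lets it fetch further rows as needed.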
