Commit 8cabb98

Explain offset queries in paging doc.
1 parent 2edebf9 commit 8cabb98

2 files changed

Lines changed: 48 additions & 7 deletions

faq/README.md

Lines changed: 0 additions & 4 deletions
@@ -10,7 +10,3 @@ page](../features/paging/) for more information.
 Native protocol v1 does not support paging, but you can emulate it in
 CQL with `LIMIT` and the `token()` function. See
 [this conversation](https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/U2KzAHruWO4/6vDmUVDDkOwJ) on the mailing list.
-
-There is no trivial solution for offset queries (e.g. jump to page 20
-directly). Cassandra does not implement them out of the box, see
-[CASSANDRA-6511](https://issues.apache.org/jira/browse/CASSANDRA-6511).

features/paging/README.md

Lines changed: 48 additions & 3 deletions
@@ -93,7 +93,7 @@ for (Row row : rs) {
 }
 ```

-### Manual paging
+### Saving and reusing the paging state

 Sometimes it is convenient to save the paging state in order to restore
 it later. For example, consider a stateless web service that displays a
@@ -214,5 +214,50 @@ There are two situations where you might want to use the unsafe API:
 implementing your own validation logic (for example, signing the raw
 state with a private key).

-[gpsu]: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/ExecutionInfo.html#getPagingStateUnsafe()
-[spsu]: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setPagingStateUnsafe(byte[])
+[gpsu]: http://www.datastax.com/drivers/java/2.2/com/datastax/driver/core/ExecutionInfo.html#getPagingStateUnsafe()
+[spsu]: http://www.datastax.com/drivers/java/2.2/com/datastax/driver/core/Statement.html#setPagingStateUnsafe(byte[])
+
+### Offset queries
+
+Saving the paging state works well when you only let the user move from
+one page to the next. But it doesn't allow random jumps (like "go
+directly to page 10"), because you can't fetch a page unless you have
+the paging state of the previous one. Such a feature would require
+*offset queries*, but they are not natively supported by Cassandra (see
+[CASSANDRA-6511](https://issues.apache.org/jira/browse/CASSANDRA-6511)).
+The rationale is that offset queries are inherently inefficient (the
+performance will always be linear in the number of rows skipped), so the
+Cassandra team doesn't want to encourage their use.
+
+If you really want offset queries, you can emulate them client-side.
+You'll still get linear performance, but maybe that's acceptable for
+your use case. For example, if each page holds 10 rows and you show at
+most 20 pages, this means you'll fetch at most 190 extra rows, which
+doesn't sound like a big deal.
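
The 190 figure follows from skipping every page before the last one: (20 - 1) pages of 10 rows each. A minimal sketch of that arithmetic, assuming one-based page numbers; the class and method names are hypothetical and not part of the driver:

```java
final class SkippedRows {
    /**
     * Rows that are fetched and thrown away before a one-based page is reached.
     * Hypothetical helper for illustration only.
     */
    public static long rowsSkipped(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        // Every page before the requested one is skipped in full.
        return (long) (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        // Worst case from the text: page 20 with 10 rows per page.
        System.out.println(rowsSkipped(20, 10)); // prints 190
    }
}
```

The cost grows linearly with the page number, which is why the highest reachable page matters more than the page size.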
+
+For example, if the page size is 10, the fetch size is 50, and the user
+asks for page 12 (rows 110 to 119):
+
+* execute the statement a first time (the result set contains rows 0 to
+49, but you're not going to use them, only the paging state);
+* execute the statement a second time with the paging state from the
+first query;
+* execute the statement a third time with the paging state from the
+second query. The result set now contains rows 100 to 149;
+* skip the first 10 rows of the iterator. Read the next 10 rows and
+discard the remaining ones.
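
The steps above can be sketched without a cluster. In the following illustration an in-memory list stands in for the table, and an integer position stands in for the paging state; `execute`, `Chunk` and `page` are hypothetical names, not driver API. With the real driver, the paging state would instead be read from the execution info of one result set and set on the statement before the next execution.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetPagingSketch {
    /** Stand-in for one query execution: the fetched rows plus where to resume. */
    static final class Chunk {
        final List<Integer> rows;
        final int nextState; // plays the role of the paging state (assumption)

        Chunk(List<Integer> rows, int nextState) {
            this.rows = rows;
            this.nextState = nextState;
        }
    }

    /** Simulates executing the statement with a fetch size, resuming from a state (0 = start). */
    static Chunk execute(List<Integer> table, int fetchSize, int state) {
        int end = Math.min(table.size(), state + fetchSize);
        return new Chunk(new ArrayList<>(table.subList(state, end)), end);
    }

    /** Returns the rows of a one-based page, emulating an offset query client-side. */
    public static List<Integer> page(List<Integer> table, int page, int pageSize, int fetchSize) {
        int firstRow = (page - 1) * pageSize;
        int state = 0;
        // Executions whose rows are ignored; only their paging state is kept.
        for (int i = 0; i < firstRow / fetchSize; i++) {
            state = execute(table, fetchSize, state).nextState;
        }
        int toSkip = firstRow % fetchSize; // rows to skip in the first useful fetch
        List<Integer> result = new ArrayList<>();
        while (result.size() < pageSize) {
            Chunk chunk = execute(table, fetchSize, state);
            if (chunk.rows.isEmpty()) {
                break; // ran past the end of the data
            }
            for (int row : chunk.rows) {
                if (toSkip > 0) {
                    toSkip--;
                } else if (result.size() < pageSize) {
                    result.add(row);
                }
            }
            state = chunk.nextState;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            table.add(i);
        }
        // Page 12, page size 10, fetch size 50: two discarded executions, then skip 10 rows.
        System.out.println(page(table, 12, 10, 50));
    }
}
```

The final loop keeps fetching when a page straddles a fetch boundary, which is the situation you hit when the fetch size is not a multiple of the page size.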
+
+You'll want to experiment with the fetch size to find the best balance:
+too small means many background queries; too big means bigger messages
+and too many unneeded rows returned (we picked 50 above for the sake of
+example, but it's probably too small -- the default is 5000).
+
+Again, offset queries are inefficient by nature. Emulating them
+client-side is a compromise when you think you can get away with the
+performance hit. We recommend that you:
+
+* test your code at scale with the expected query patterns, to make sure
+that your assumptions are correct;
+* set a hard limit on the highest possible page number, to prevent
+malicious users from triggering queries that would skip a huge number
+of rows.
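
One way to apply the hard limit is to validate the requested page number before any query is executed. A minimal sketch, assuming a limit of 20 pages as in the earlier example; `PageLimit`, `MAX_PAGE` and `validatePage` are hypothetical names:

```java
final class PageLimit {
    /** Highest page a client may request; the value 20 is an assumption for illustration. */
    static final int MAX_PAGE = 20;

    /** Returns the page number unchanged if it is allowed, otherwise rejects the request. */
    public static int validatePage(int requested) {
        if (requested < 1 || requested > MAX_PAGE) {
            throw new IllegalArgumentException(
                "page must be between 1 and " + MAX_PAGE + ", got " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(validatePage(12)); // prints 12
    }
}
```

Rejecting the request outright, rather than silently clamping it to the last allowed page, makes the limit visible to callers; either choice bounds the number of rows skipped.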
