(improvement) cqltypes: cache apply_parameters for parameterized types by mykaul · Pull Request #10 · mykaul/python-driver

mykaul · 2026-03-13T11:00:44Z

Summary

Cache apply_parameters() results so parameterized CQL type classes (e.g. ListType(UTF8Type), MapType(UTF8Type, Int32Type)) are created via type() once and returned as stable singletons on subsequent calls
Cache is a class-level dict keyed on (cls, tuple(subtypes), names) — only the base _CassandraType.apply_parameters is cached; UserType and VectorType overrides remain uncached
Stable singletons also benefit downstream caches (deserializer cache, ParseDesc cache) for non-prepared queries, since id()-based lookups now see the same type objects across result sets

Benchmark results (median, pytest-benchmark)

Single apply_parameters call

Type	Before (original)	After (with cache)
List<UTF8>	5,332 ns	139 ns
Map<UTF8,Int32>	5,455 ns	134 ns
Set<UUID>	5,502 ns	132 ns
Frozen<Timestamp>	5,329 ns	134 ns

Batch apply (simulating read_type for result metadata)

Scenario	Before (original)	After (with cache)
4 parameterized types	21,914 ns	751 ns
8 parameterized types	43,845 ns	1,599 ns
10-col mixed metadata	27,673 ns	1,090 ns

All 116 unit tests pass (1 skipped — pre-existing test_datetype issue).

Copilot

Pull request overview

This PR improves performance in the CQL type system by caching _CassandraType.apply_parameters() results so repeated construction of parameterized CQL type classes reuses stable singleton class objects instead of repeatedly calling type().

Changes:

Add a class-level cache to _CassandraType.apply_parameters() keyed by (cls, subtypes, names) to reuse generated parameterized type classes.
Add a new pytest-based benchmark module to measure cached vs uncached apply_parameters() performance and include basic correctness checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
cassandra/cqltypes.py	Introduces `_apply_parameters_cache` and uses it in `_CassandraType.apply_parameters()` to avoid repeated dynamic class creation.
benchmarks/test_apply_parameters_benchmark.py	Adds performance benchmarks (and some correctness assertions) for cached vs uncached `apply_parameters()`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        subtypes = tuple(subtypes)
+        cache_key = (cls, subtypes, tuple(names) if names else names)
+        cached = cls._apply_parameters_cache.get(cache_key)


+        subtypes = tuple(subtypes)
+        cache_key = (cls, subtypes, tuple(names) if names else names)
+        cached = cls._apply_parameters_cache.get(cache_key)
+        if cached is not None:
+            return cached
        newname = cls.cass_parameterized_type_with(subtypes)
-        return type(newname, (cls,), {'subtypes': subtypes, 'cassname': cls.cassname, 'fieldnames': names})
+        result = type(newname, (cls,), {'subtypes': subtypes, 'cassname': cls.cassname, 'fieldnames': names})
+        cls._apply_parameters_cache[cache_key] = result
+        return result


+"""
+
+import pytest
+


+# ---------------------------------------------------------------------------
+# Correctness tests
+# ---------------------------------------------------------------------------
+
+
+def test_cached_returns_same_object():
+    """Cached apply_parameters returns the exact same type object."""
+    a = cqltypes.ListType.apply_parameters((cqltypes.UTF8Type,))
+    b = cqltypes.ListType.apply_parameters((cqltypes.UTF8Type,))
+    assert a is b
+
+
+def test_different_params_different_types():
+    """Different subtypes produce different cached types."""
+    a = cqltypes.ListType.apply_parameters((cqltypes.UTF8Type,))
+    b = cqltypes.ListType.apply_parameters((cqltypes.Int32Type,))
+    assert a is not b
+    assert a.subtypes == (cqltypes.UTF8Type,)
+    assert b.subtypes == (cqltypes.Int32Type,)
+
+
+def test_cached_type_attributes():
+    """Cached type has correct attributes."""
+    t = cqltypes.MapType.apply_parameters((cqltypes.UTF8Type, cqltypes.LongType))
+    assert t.subtypes == (cqltypes.UTF8Type, cqltypes.LongType)
+    assert issubclass(t, cqltypes.MapType)
+    assert t.cassname == cqltypes.MapType.cassname
+
+
+def test_cached_matches_uncached():
+    """Cached version produces equivalent types to uncached."""
+    for cls, subtypes in PARAM_TYPE_SPECS:
+        cached = cls.apply_parameters(subtypes)
+        uncached = apply_parameters_uncached(cls, subtypes)
+
+        assert cached.subtypes == uncached.subtypes
+        assert cached.cassname == uncached.cassname
+        assert issubclass(cached, cls)
+        assert issubclass(uncached, cls)
+
+
+def test_nested_parameterized_types():
+    """Nested parameterized types (e.g. List<Map<text,int>>) are cached."""
+    inner = cqltypes.MapType.apply_parameters((cqltypes.UTF8Type, cqltypes.Int32Type))
+    outer1 = cqltypes.ListType.apply_parameters((inner,))
+    outer2 = cqltypes.ListType.apply_parameters((inner,))
+    assert outer1 is outer2
+    assert outer1.subtypes == (inner,)


Introduce the data layer for Private Link client routes support: - ClientRoutesChangeType enum for CLIENT_ROUTES_CHANGE event types - ClientRouteProxy dataclass and ClientRoutesConfig for user-facing configuration - _Route frozen dataclass for immutable route records - _RouteStore for thread-safe route storage with atomic update/merge and preferred route selection that avoids unnecessary connection_id migration when multiple routes exist for the same host

Add _ClientRoutesHandler which manages the full lifecycle of dynamic address translation via system.client_routes: - initialize(): loads all routes at startup and on control connection reconnect - handle_client_routes_change(): processes CLIENT_ROUTES_CHANGE events with targeted merge or full refresh depending on event data - _query_all_routes_for_connections(): complete refresh query using connection_id IN (...) - _query_routes_for_change_event(): targeted query grouping by connection_id with host_id IN (...) per group - _execute_routes_query(): common query execution and result parsing with proxy address override support - resolve_host(): host_id to (address, port) resolution with DNS lookup

- ClientRoutesEndPointFactory: creates endpoints from system.peers rows by extracting host_id, deferring address translation and DNS resolution until connection time - ClientRoutesEndPoint: endpoint that resolves via _ClientRoutesHandler on each connection attempt, ensuring immediate reaction to route changes and CLIENT_ROUTES_CHANGE events

Cluster: - Add client_routes_config parameter with mutual exclusivity check against endpoint_factory - Create _ClientRoutesHandler and ClientRoutesEndPointFactory when client_routes_config is provided ControlConnection: - Register CLIENT_ROUTES_CHANGE event watcher when handler is present - Forward events to handler via _handle_client_routes_change - Trigger full route re-read on control connection reconnection

Cover ClientRouteEntry/ClientRoutesConfig validation, _RouteStore get/merge operations, _ClientRoutesHandler initialization, ClientRoutesEndPoint resolution with and without route mappings, and SSL check_hostname rejection with client_routes_config.

Add comprehensive integration tests covering: - TCP proxy and NLB emulator infrastructure for simulating private link connectivity - query_routes filtering with different connection/host ID combinations - Full private-link connectivity verifying all driver connections go exclusively through the NLB proxy - Dynamic route updates via REST API with driver reconnection through new proxy ports

Recently scylladb started to rely on the options "--auth-superuser-name" and "--auth-superuser-salted-password" to ensure that a cassandra/cassandra user exists for tests - without those options a default superuser no longer exists.

…ames Skip the regex scanner and stack-based parser in parse_casstype_args() when the type string has no parentheses. For simple types like 'AsciiType' or 'org.apache.cassandra.db.marshal.FloatType', go directly to lookup_casstype_simple() which is just a prefix strip + dict lookup. This avoids re.Scanner, re.split on ':' / '=>', int() try/except, and list-of-lists stack manipulation for the common case of non-parameterized types. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

The time.sleep(10) in setup_keyspace() is redundant because callers already ensure the cluster is fully ready before calling it: - use_cluster() calls start_cluster_wait_for_up() which uses wait_for_binary_proto=True + wait_other_notice=True, then wait_for_node_socket() per node - External cluster path (wait=False) had no sleep anyway Remove the wait parameter entirely and its associated sleep, saving 10s per cluster startup.

Replace fixed sleeps with condition-based polling to speed up tests: - simulacron/utils.py: replace 5s sleep with HTTP endpoint polling (max 15s timeout, typically <1s) - test_authentication.py: replace 10s sleep with auth readiness poll that tries connecting with default credentials - upgrade/__init__.py: replace 10s auth sleep with same polling pattern - upgrade/test_upgrade.py: replace 3x 20s sleeps (60s total) with control connection readiness polling Total potential saving: ~95s of unconditional waiting per test run.

Replace fixed sleeps with condition-based polling in four test files: - test_shard_aware.py: replace 25s of sleeps (5+10+5+5) with wait_until_not_raised polling for reconnection after shard connection close and iptables blocking - test_metrics.py: replace 15s of sleeps (5+5+5) with polling for cluster recovery and node-down detection - test_tablets.py: replace 13s of sleeps (3+10) with polling for metadata refresh and decommission completion - simulacron/test_connection.py: replace 20s of sleeps (10+10) with polling for quiescent pool state Total potential saving: ~73s of unconditional waiting.

… for invalidation The tablet tests were intermittently failing because: 1. get_query_trace() used the default 2s max_wait, which is too short under resource pressure (--smp 2). Increased to 10s. 2. test_tablets_invalidation_decommission_non_cc_node used a fixed time.sleep(2) hoping tablet metadata invalidation would complete. Replaced with wait_until polling for the tablet record to be purged (0.5s delay, 20 attempts = 10s budget).

- test_cluster.py: replace sleep(1) x10 iterations with connect(wait_for_all_pools=True) for deterministic pool readiness - test_query.py: replace sleep(5) with wait_until polling for 'Preparing all known prepared statements' log message - test_connection.py: replace sleep(2) with wait_until polling for host_down listener notification

…superuser config Use set_configuration_options() (the Python API behind `ccm updateconf`) to set auth_superuser_name and auth_superuser_salted_password directly in the YAML config instead of passing them via the SCYLLA_EXT_OPTS environment variable.

…ema_agreement min(self._timeout, total_timeout - elapsed) raises TypeError when control_connection_timeout is set to None, which is explicitly documented as a supported value (meaning no timeout). Guard the min() call so that when self._timeout is None, we use only the remaining schema agreement wait time.

Patch reactor.running to False in setUp() so that maybe_start() always enters the branch that spawns the reactor thread. Without this, leaked global reactor state from prior tests can leave reactor.running as True, causing maybe_start() to skip thread creation and the reactor.run mock to never be called — making the assertion in test_connection_initialization fail intermittently. Observed in CI on PyPy 3.11 + macOS x86 (Rosetta 2), where timing differences make the reactor state leak more likely.

The column kind filter at line 2744 used 'clustering_key' but system_schema.columns uses 'clustering' as the kind value. This caused clustering columns to not be excluded from the 'other columns' loop, resulting in them being processed twice (once as clustering key, once as regular column). The correct value 'clustering' was already used 6 lines above in the clustering key extraction loop.

ScyllaDB doesn't support triggers, so skip the triggers query when connected to ScyllaDB. This is detected by checking if the connection has shard awareness (using the existing _is_not_scylla() method). Changes to both SchemaParserV3 and SchemaParserV4: - Modified _query_all() to conditionally append triggers query only for non-ScyllaDB - Modified _query_all() response unpacking to use array slicing for cleaner code - Modified get_table() in V3 to conditionally query triggers This eliminates unnecessary failed queries to system_schema.triggers on ScyllaDB. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

- Fix spelling: 'tring' → 'string' in docstring - Remove extra 't' at end of comment - Refactor complex list comprehension for clarity - Use 'is None' instead of '== None' for None comparison Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>

Cache the results of _CassandraType.apply_parameters() in a class-level dict keyed by (cls, subtypes, names). This avoids the expensive type() metaclass machinery on repeated calls with the same type signature, which is the common case during result-set deserialization. Benchmark: 31.7x speedup (6.48 us/call -> 0.20 us/call) for cached hits.

mykaul force-pushed the perf/cache-apply-parameters branch from 6057c76 to 6556867 Compare March 13, 2026 12:03

mykaul marked this pull request as draft March 13, 2026 12:15

mykaul requested a review from Copilot March 13, 2026 12:15

Copilot started reviewing on behalf of mykaul March 13, 2026 12:15 View session

Copilot AI reviewed Mar 13, 2026

View reviewed changes

sylwiaszunejko and others added 25 commits March 17, 2026 23:47

Add optional query_params parameter to QueryMessage

9aa5ddc

Introduce skip_scylla_version_lt for integration tests

8bba6eb

Release 3.29.9: changelog, version and documentation

efdc08a

Specify auth superuser name for tests (scylladb#759)

fec90ae

Recently scylladb started to rely on the options "--auth-superuser-name" and "--auth-superuser-salted-password" to ensure that a cassandra/cassandra user exists for tests - without those options a default superuser no longer exists.

chore(deps): update dependency pygments to v2.20.0 [security]

44cf752

chore(deps): update dependency tornado to v6.5.5 [security]

ad12bed

Fix additional '== None' comparison for consistency

8e6c4d4

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>

mykaul force-pushed the perf/cache-apply-parameters branch from 6556867 to 90c77b5 Compare April 5, 2026 15:38

mykaul force-pushed the perf/cache-apply-parameters branch from 90c77b5 to 5d25b9b Compare April 5, 2026 17:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(improvement) cqltypes: cache apply_parameters for parameterized types#10

(improvement) cqltypes: cache apply_parameters for parameterized types#10
mykaul wants to merge 26 commits into
masterfrom
perf/cache-apply-parameters

mykaul commented Mar 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		"""

		import pytest

Conversation

mykaul commented Mar 13, 2026

Summary

Benchmark results (median, pytest-benchmark)

Single apply_parameters call

Batch apply (simulating read_type for result metadata)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants