Skip to content

Latest commit

 

History

History
101 lines (79 loc) · 5.37 KB

File metadata and controls

101 lines (79 loc) · 5.37 KB

Column Encryption

Overview

Support for client-side encryption of data was added in version 3.27.0 of the Python driver. When using this feature data will be encrypted on-the-fly according to a specified :class:`~.ColumnEncryptionPolicy` instance. This policy is also used to decrypt data in returned rows. If a prepared statement is used this decryption is transparent to the user; retrieved data will be decrypted and converted into the original type (according to definitions in the encryption policy). Support for simple (i.e. non-prepared) queries is also available, although in this case values must be manually encrypted and/or decrypted. The :class:`~.ColumnEncryptionPolicy` instance provides methods to assist with these operations.

Client-side encryption and decryption should work against all versions of Cassandra and DSE. It does not utilize any server-side functionality to do its work.

WARNING: Encryption format changes in 3.28.0

Python driver 3.28.0 introduces a new encryption format for data written by :class:`~.AES256ColumnEncryptionPolicy`. As a result, any encrypted data written by Python driver 3.27.0 will NOT be readable. If you upgraded from 3.27.0, you should re-encrypt your data with 3.28.0.

Configuration

Client-side encryption is enabled by creating an instance of a subclass of :class:`~.ColumnEncryptionPolicy` and adding information about columns to be encrypted to it. This policy is then supplied to :class:`~.Cluster` when it's created.

import os

from cassandra.policies import ColDesc
from cassandra.column_encryption.policies import AES256ColumnEncryptionPolicy, AES256_KEY_SIZE_BYTES

key = os.urandom(AES256_KEY_SIZE_BYTES)
cl_policy = AES256ColumnEncryptionPolicy()
col_desc = ColDesc('ks1','table1','column1')
cql_type = "int"
cl_policy.add_column(col_desc, key, cql_type)
cluster = Cluster(column_encryption_policy=cl_policy)

:class:`~.AES256ColumnEncryptionPolicy` is a subclass of :class:`~.ColumnEncryptionPolicy` which provides encryption and decryption via AES-256. This class is currently the only available column encryption policy implementation, although users can certainly implement their own by subclassing :class:`~.ColumnEncryptionPolicy`.

:class:`~.ColDesc` is a named tuple which uniquely identifies a column in a given keyspace and table. When we have this tuple, the encryption key and the CQL type contained by this column we can add the column to the policy using :func:`~.ColumnEncryptionPolicy.add_column`. Once we have added all column definitions to the policy we pass it along to the cluster.

The CQL type for the column only has meaning at the client; it is never sent to Cassandra. The encryption key is also never sent to the server; all the server ever sees are random bytes reflecting the encrypted data. As a result all columns containing client-side encrypted values should be declared with the CQL type "blob" at the Cassandra server.

Usage

Encryption

Client-side encryption shines most when used with prepared statements. A prepared statement is aware of information about the columns in the query it was built from and we can use this information to transparently encrypt any supplied parameters. For example, we can create a prepared statement to insert a value into column1 (as defined above) by executing the following code after creating a :class:`~.Cluster` in the manner described above:

session = cluster.connect()
prepared = session.prepare("insert into ks1.table1 (column1) values (?)")
session.execute(prepared, (1000,))

Our encryption policy will detect that "column1" is an encrypted column and take appropriate action.

As mentioned above client-side encryption can also be used with simple queries, although such use cases are certainly not transparent. :class:`~.ColumnEncryptionPolicy` provides a helper named :func:`~.ColumnEncryptionPolicy.encode_and_encrypt` which will convert an input value into bytes using the standard serialization methods employed by the driver. The result is then encrypted according to the configuration of the policy. Using this approach the example above could be implemented along the lines of the following:

session = cluster.connect()
session.execute("insert into ks1.table1 (column1) values (%s)",(cl_policy.encode_and_encrypt(col_desc, 1000),))

Decryption

Decryption of values returned from the server is always transparent. Whether we're executing a simple or prepared statement encrypted columns will be decrypted automatically and made available via rows just like any other result.

Limitations

:class:`~.AES256ColumnEncryptionPolicy` uses the implementation of AES-256 provided by the cryptography module. Any limitations of this module should be considered when deploying client-side encryption. Note specifically that a Rust compiler is required for modern versions of the cryptography package, although wheels exist for many common platforms.

Client-side encryption has been implemented for both the default Cython and pure Python row processing logic. This functionality has not yet been ported to the NumPy Cython implementation. During testing, the NumPy processing works on Python 3.7 but fails for Python 3.8.