Skip to content

Commit 2d8ff4f

Browse files
committed
remove use of SQL expressions in "modifiers" for regexp
Fixed issue where the :meth:`_sql.ColumnOperators.regexp_match` when using "flags" would not produce a "stable" cache key, that is, the cache key would keep changing each time causing cache pollution. The same issue existed for :meth:`_sql.ColumnOperators.regexp_replace` with both the flags and the actual replacement expression. The flags are now represented as fixed modifier strings rendered as safestrings rather than bound parameters, and the replacement expression is established within the primary portion of the "binary" element so that it generates an appropriate cache key. Note that as part of this change, the :paramref:`_sql.ColumnOperators.regexp_match.flags` and :paramref:`_sql.ColumnOperators.regexp_replace.flags` have been modified to render as literal strings only, whereas previously they were rendered as full SQL expressions, typically bound parameters. These parameters should always be passed as plain Python strings and not as SQL expression constructs; it's not expected that SQL expression constructs were used in practice for this parameter, so this is a backwards-incompatible change. The change also modifies the internal structure of the expression generated, for :meth:`_sql.ColumnOperators.regexp_replace` with or without flags, and for :meth:`_sql.ColumnOperators.regexp_match` with flags. Third party dialects which may have implemented regexp implementations of their own (no such dialects could be located in a search, so impact is expected to be low) would need to adjust the traversal of the structure to accommodate. Fixed issue in mostly-internal :class:`.CacheKey` construct where the ``__ne__()`` operator were not properly implemented, leading to nonsensical results when comparing :class:`.CacheKey` instances to each other. Fixes: #10042 Change-Id: I2e245f81d7ee7136ad04cf77be35f9745c5da5e5
1 parent e3bb7a6 commit 2d8ff4f

13 files changed

Lines changed: 216 additions & 131 deletions

File tree

Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
.. change::
2+
:tags: bug, sql
3+
:tickets: 10042
4+
:versions: 2.0.18
5+
6+
Fixed issue where the :meth:`_sql.ColumnOperators.regexp_match`
7+
when using "flags" would not produce a "stable" cache key, that
8+
is, the cache key would keep changing each time causing cache pollution.
9+
The same issue existed for :meth:`_sql.ColumnOperators.regexp_replace`
10+
with both the flags and the actual replacement expression.
11+
The flags are now represented as fixed modifier strings rendered as
12+
safestrings rather than bound parameters, and the replacement
13+
expression is established within the primary portion of the "binary"
14+
element so that it generates an appropriate cache key.
15+
16+
Note that as part of this change, the
17+
:paramref:`_sql.ColumnOperators.regexp_match.flags` and
18+
:paramref:`_sql.ColumnOperators.regexp_replace.flags` have been modified to
19+
render as literal strings only, whereas previously they were rendered as
20+
full SQL expressions, typically bound parameters. These parameters should
21+
always be passed as plain Python strings and not as SQL expression
22+
constructs; it's not expected that SQL expression constructs were used in
23+
practice for this parameter, so this is a backwards-incompatible change.
24+
25+
The change also modifies the internal structure of the expression
26+
generated, for :meth:`_sql.ColumnOperators.regexp_replace` with or without
27+
flags, and for :meth:`_sql.ColumnOperators.regexp_match` with flags. Third
28+
party dialects which may have implemented regexp implementations of their
29+
own (no such dialects could be located in a search, so impact is expected
30+
to be low) would need to adjust the traversal of the structure to
31+
accommodate.
32+
33+
34+
.. change::
35+
:tags: bug, sql
36+
:versions: 2.0.18
37+
38+
Fixed issue in mostly-internal :class:`.CacheKey` construct where the
39+
``__ne__()`` operator were not properly implemented, leading to nonsensical
40+
results when comparing :class:`.CacheKey` instances to each other.
41+
42+
43+

lib/sqlalchemy/dialects/mysql/base.py

Lines changed: 7 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1703,7 +1703,7 @@ def visit_is_not_distinct_from_binary(self, binary, operator, **kw):
17031703

17041704
def _mariadb_regexp_flags(self, flags, pattern, **kw):
17051705
return "CONCAT('(?', %s, ')', %s)" % (
1706-
self.process(flags, **kw),
1706+
self.render_literal_value(flags, sqltypes.STRINGTYPE),
17071707
self.process(pattern, **kw),
17081708
)
17091709

@@ -1721,7 +1721,7 @@ def _regexp_match(self, op_string, binary, operator, **kw):
17211721
text = "REGEXP_LIKE(%s, %s, %s)" % (
17221722
self.process(binary.left, **kw),
17231723
self.process(binary.right, **kw),
1724-
self.process(flags, **kw),
1724+
self.render_literal_value(flags, sqltypes.STRINGTYPE),
17251725
)
17261726
if op_string == " NOT REGEXP ":
17271727
return "NOT %s" % text
@@ -1736,25 +1736,22 @@ def visit_not_regexp_match_op_binary(self, binary, operator, **kw):
17361736

17371737
def visit_regexp_replace_op_binary(self, binary, operator, **kw):
17381738
flags = binary.modifiers["flags"]
1739-
replacement = binary.modifiers["replacement"]
17401739
if flags is None:
1741-
return "REGEXP_REPLACE(%s, %s, %s)" % (
1740+
return "REGEXP_REPLACE(%s, %s)" % (
17421741
self.process(binary.left, **kw),
17431742
self.process(binary.right, **kw),
1744-
self.process(replacement, **kw),
17451743
)
17461744
elif self.dialect.is_mariadb:
17471745
return "REGEXP_REPLACE(%s, %s, %s)" % (
17481746
self.process(binary.left, **kw),
1749-
self._mariadb_regexp_flags(flags, binary.right),
1750-
self.process(replacement, **kw),
1747+
self._mariadb_regexp_flags(flags, binary.right.clauses[0]),
1748+
self.process(binary.right.clauses[1], **kw),
17511749
)
17521750
else:
1753-
return "REGEXP_REPLACE(%s, %s, %s, %s)" % (
1751+
return "REGEXP_REPLACE(%s, %s, %s)" % (
17541752
self.process(binary.left, **kw),
17551753
self.process(binary.right, **kw),
1756-
self.process(replacement, **kw),
1757-
self.process(flags, **kw),
1754+
self.render_literal_value(flags, sqltypes.STRINGTYPE),
17581755
)
17591756

17601757

lib/sqlalchemy/dialects/oracle/base.py

Lines changed: 7 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1217,7 +1217,7 @@ def visit_regexp_match_op_binary(self, binary, operator, **kw):
12171217
return "REGEXP_LIKE(%s, %s, %s)" % (
12181218
string,
12191219
pattern,
1220-
self.process(flags, **kw),
1220+
self.render_literal_value(flags, sqltypes.STRINGTYPE),
12211221
)
12221222

12231223
def visit_not_regexp_match_op_binary(self, binary, operator, **kw):
@@ -1227,21 +1227,18 @@ def visit_not_regexp_match_op_binary(self, binary, operator, **kw):
12271227

12281228
def visit_regexp_replace_op_binary(self, binary, operator, **kw):
12291229
string = self.process(binary.left, **kw)
1230-
pattern = self.process(binary.right, **kw)
1231-
replacement = self.process(binary.modifiers["replacement"], **kw)
1230+
pattern_replace = self.process(binary.right, **kw)
12321231
flags = binary.modifiers["flags"]
12331232
if flags is None:
1234-
return "REGEXP_REPLACE(%s, %s, %s)" % (
1233+
return "REGEXP_REPLACE(%s, %s)" % (
12351234
string,
1236-
pattern,
1237-
replacement,
1235+
pattern_replace,
12381236
)
12391237
else:
1240-
return "REGEXP_REPLACE(%s, %s, %s, %s)" % (
1238+
return "REGEXP_REPLACE(%s, %s, %s)" % (
12411239
string,
1242-
pattern,
1243-
replacement,
1244-
self.process(flags, **kw),
1240+
pattern_replace,
1241+
self.render_literal_value(flags, sqltypes.STRINGTYPE),
12451242
)
12461243

12471244

lib/sqlalchemy/dialects/postgresql/base.py

Lines changed: 8 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1812,14 +1812,14 @@ def _regexp_match(self, base_op, binary, operator, kw):
18121812
return self._generate_generic_binary(
18131813
binary, " %s " % base_op, **kw
18141814
)
1815-
if isinstance(flags, elements.BindParameter) and flags.value == "i":
1815+
if flags == "i":
18161816
return self._generate_generic_binary(
18171817
binary, " %s* " % base_op, **kw
18181818
)
18191819
return "%s %s CONCAT('(?', %s, ')', %s)" % (
18201820
self.process(binary.left, **kw),
18211821
base_op,
1822-
self.process(flags, **kw),
1822+
self.render_literal_value(flags, sqltypes.STRINGTYPE),
18231823
self.process(binary.right, **kw),
18241824
)
18251825

@@ -1831,21 +1831,18 @@ def visit_not_regexp_match_op_binary(self, binary, operator, **kw):
18311831

18321832
def visit_regexp_replace_op_binary(self, binary, operator, **kw):
18331833
string = self.process(binary.left, **kw)
1834-
pattern = self.process(binary.right, **kw)
1834+
pattern_replace = self.process(binary.right, **kw)
18351835
flags = binary.modifiers["flags"]
1836-
replacement = self.process(binary.modifiers["replacement"], **kw)
18371836
if flags is None:
1838-
return "REGEXP_REPLACE(%s, %s, %s)" % (
1837+
return "REGEXP_REPLACE(%s, %s)" % (
18391838
string,
1840-
pattern,
1841-
replacement,
1839+
pattern_replace,
18421840
)
18431841
else:
1844-
return "REGEXP_REPLACE(%s, %s, %s, %s)" % (
1842+
return "REGEXP_REPLACE(%s, %s, %s)" % (
18451843
string,
1846-
pattern,
1847-
replacement,
1848-
self.process(flags, **kw),
1844+
pattern_replace,
1845+
self.render_literal_value(flags, sqltypes.STRINGTYPE),
18491846
)
18501847

18511848
def visit_empty_set_expr(self, element_types, **kw):

lib/sqlalchemy/sql/cache_key.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -465,6 +465,9 @@ def to_offline_string(
465465
def __eq__(self, other: Any) -> bool:
466466
return bool(self.key == other.key)
467467

468+
def __ne__(self, other: Any) -> bool:
469+
return not (self.key == other.key)
470+
468471
@classmethod
469472
def _diff_tuples(cls, left: CacheKey, right: CacheKey) -> str:
470473
ck1 = CacheKey(left, [])

lib/sqlalchemy/sql/compiler.py

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6366,11 +6366,9 @@ def visit_not_regexp_match_op_binary(self, binary, operator, **kw):
63666366
return self._generate_generic_binary(binary, " <not regexp> ", **kw)
63676367

63686368
def visit_regexp_replace_op_binary(self, binary, operator, **kw):
6369-
replacement = binary.modifiers["replacement"]
6370-
return "<regexp replace>(%s, %s, %s)" % (
6369+
return "<regexp replace>(%s, %s)" % (
63716370
binary.left._compiler_dispatch(self, **kw),
63726371
binary.right._compiler_dispatch(self, **kw),
6373-
replacement._compiler_dispatch(self, **kw),
63746372
)
63756373

63766374
def visit_try_cast(self, cast, **kwargs):

lib/sqlalchemy/sql/default_comparator.py

Lines changed: 29 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -361,24 +361,17 @@ def _regexp_match_impl(
361361
flags: Optional[str],
362362
**kw: Any,
363363
) -> ColumnElement[Any]:
364-
if flags is not None:
365-
flags_expr = coercions.expect(
364+
return BinaryExpression(
365+
expr,
366+
coercions.expect(
366367
roles.BinaryElementRole,
367-
flags,
368+
pattern,
368369
expr=expr,
369-
operator=operators.regexp_replace_op,
370-
)
371-
else:
372-
flags_expr = None
373-
return _boolean_compare(
374-
expr,
370+
operator=operators.comma_op,
371+
),
375372
op,
376-
pattern,
377-
flags=flags_expr,
378-
negate_op=operators.not_regexp_match_op
379-
if op is operators.regexp_match_op
380-
else operators.regexp_match_op,
381-
**kw,
373+
negate=operators.not_regexp_match_op,
374+
modifiers={"flags": flags},
382375
)
383376

384377

@@ -390,23 +383,27 @@ def _regexp_replace_impl(
390383
flags: Optional[str],
391384
**kw: Any,
392385
) -> ColumnElement[Any]:
393-
replacement = coercions.expect(
394-
roles.BinaryElementRole,
395-
replacement,
396-
expr=expr,
397-
operator=operators.regexp_replace_op,
398-
)
399-
if flags is not None:
400-
flags_expr = coercions.expect(
401-
roles.BinaryElementRole,
402-
flags,
403-
expr=expr,
404-
operator=operators.regexp_replace_op,
405-
)
406-
else:
407-
flags_expr = None
408-
return _binary_operate(
409-
expr, op, pattern, replacement=replacement, flags=flags_expr, **kw
386+
return BinaryExpression(
387+
expr,
388+
ExpressionClauseList._construct_for_list(
389+
operators.comma_op,
390+
type_api.NULLTYPE,
391+
coercions.expect(
392+
roles.BinaryElementRole,
393+
pattern,
394+
expr=expr,
395+
operator=operators.comma_op,
396+
),
397+
coercions.expect(
398+
roles.BinaryElementRole,
399+
replacement,
400+
expr=expr,
401+
operator=operators.comma_op,
402+
),
403+
group=False,
404+
),
405+
op,
406+
modifiers={"flags": flags},
410407
)
411408

412409

lib/sqlalchemy/sql/operators.py

Lines changed: 21 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1573,15 +1573,23 @@ def regexp_match(
15731573
15741574
:param pattern: The regular expression pattern string or column
15751575
clause.
1576-
:param flags: Any regular expression string flags to apply. Flags
1577-
tend to be backend specific. It can be a string or a column clause.
1576+
:param flags: Any regular expression string flags to apply, passed as
1577+
plain Python string only. These flags are backend specific.
15781578
Some backends, like PostgreSQL and MariaDB, may alternatively
15791579
specify the flags as part of the pattern.
15801580
When using the ignore case flag 'i' in PostgreSQL, the ignore case
15811581
regexp match operator ``~*`` or ``!~*`` will be used.
15821582
15831583
.. versionadded:: 1.4
15841584
1585+
.. versionchanged:: 1.4.48, 2.0.18 Note that due to an implementation
1586+
error, the "flags" parameter previously accepted SQL expression
1587+
objects such as column expressions in addition to plain Python
1588+
strings. This implementation did not work correctly with caching
1589+
and was removed; strings only should be passed for the "flags"
1590+
parameter, as these flags are rendered as literal inline values
1591+
within SQL expressions.
1592+
15851593
.. seealso::
15861594
15871595
:meth:`_sql.ColumnOperators.regexp_replace`
@@ -1618,13 +1626,22 @@ def regexp_replace(
16181626
:param pattern: The regular expression pattern string or column
16191627
clause.
16201628
:param pattern: The replacement string or column clause.
1621-
:param flags: Any regular expression string flags to apply. Flags
1622-
tend to be backend specific. It can be a string or a column clause.
1629+
:param flags: Any regular expression string flags to apply, passed as
1630+
plain Python string only. These flags are backend specific.
16231631
Some backends, like PostgreSQL and MariaDB, may alternatively
16241632
specify the flags as part of the pattern.
16251633
16261634
.. versionadded:: 1.4
16271635
1636+
.. versionchanged:: 1.4.48, 2.0.18 Note that due to an implementation
1637+
error, the "flags" parameter previously accepted SQL expression
1638+
objects such as column expressions in addition to plain Python
1639+
strings. This implementation did not work correctly with caching
1640+
and was removed; strings only should be passed for the "flags"
1641+
parameter, as these flags are rendered as literal inline values
1642+
within SQL expressions.
1643+
1644+
16281645
.. seealso::
16291646
16301647
:meth:`_sql.ColumnOperators.regexp_match`

0 commit comments

Comments
 (0)