
Commit 08c46ee

ORM executemany returning

Build on #5401 to allow the ORM to take advantage of executemany
INSERT + RETURNING:

* Implemented the feature; updated tests to support INSERT DEFAULT
  VALUES, which required a new compiler syntax,
  INSERT INTO table (anycol) VALUES (DEFAULT), that can then be
  iterated out for executemany.
* Added graceful degrade to plain executemany for PostgreSQL <= 8.2.
* Renamed EXECUTEMANY_DEFAULT to EXECUTEMANY_PLAIN.
* Fixed issue where unicode identifiers or parameter names wouldn't
  work with execute_values() under Py2K, because we have to encode the
  statement and therefore have to encode the insert_single_values_expr
  too.
* Corrected issue from #5401 to support executemany + return_defaults
  for a PK that is explicitly pre-generated, meaning we aren't actually
  getting RETURNING but still need to return it from
  compiled_parameters.

Fixes: #5263
Change-Id: Id68e5c158c4f9ebc33b61c06a448907921c2a657
1 parent 2d93873 commit 08c46ee

16 files changed

Lines changed: 911 additions & 367 deletions


doc/build/changelog/migration_14.rst

Lines changed: 73 additions & 1 deletion

@@ -594,6 +594,75 @@ as was present previously.
 
 :ticket:`4826`
 
+.. _change_5263:
+
+ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases
+---------------------------------------------------------------------------------
+
+The change in :ref:`change_5401` adds support for "executemany" + "RETURNING"
+at the same time in Core, which is now enabled for the psycopg2 dialect
+by default using the psycopg2 ``execute_values()`` extension. The ORM flush
+process now makes use of this feature such that the retrieval of newly generated
+primary key values and server defaults can be achieved while not losing the
+performance benefits of being able to batch INSERT statements together. Additionally,
+psycopg2's ``execute_values()`` extension itself provides a five-fold performance
+improvement over psycopg2's default "executemany" implementation, by rewriting
+an INSERT statement to include many "VALUES" expressions all in one statement
+rather than invoking the same statement repeatedly, as psycopg2 lacks the ability
+to PREPARE the statement ahead of time as would normally be expected for this
+approach to be performant.
+
+SQLAlchemy includes a :ref:`performance suite <examples_performance>` within
+its examples, where we can compare the times generated for the "bulk_inserts"
+runner against 1.3 and 1.4, revealing a 3x-5x speedup for most flavors
+of batch insert::
+
+    # 1.3
+    $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
+    test_flush_no_pk : (100000 iterations); total time 14.051527 sec
+    test_bulk_save_return_pks : (100000 iterations); total time 15.002470 sec
+    test_flush_pk_given : (100000 iterations); total time 7.863680 sec
+    test_bulk_save : (100000 iterations); total time 6.780378 sec
+    test_bulk_insert_mappings : (100000 iterations); total time 5.363070 sec
+    test_core_insert : (100000 iterations); total time 5.362647 sec
+
+    # 1.4 with enhancement
+    $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
+    test_flush_no_pk : (100000 iterations); total time 3.820807 sec
+    test_bulk_save_return_pks : (100000 iterations); total time 3.176378 sec
+    test_flush_pk_given : (100000 iterations); total time 4.037789 sec
+    test_bulk_save : (100000 iterations); total time 2.604446 sec
+    test_bulk_insert_mappings : (100000 iterations); total time 1.204897 sec
+    test_core_insert : (100000 iterations); total time 0.958976 sec
+
+Note that the ``execute_values()`` extension modifies the INSERT statement in the psycopg2
+layer, **after** it's been logged by SQLAlchemy. So with SQL logging, one will see the
+parameter sets batched together, but the joining of multiple "values" will not be visible
+on the application side::
+
+    2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine INSERT INTO a (data) VALUES (%(data)s) RETURNING a.id
+    2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine [generated in 0.00698s] ({'data': 'data 1'}, {'data': 'data 2'}, {'data': 'data 3'}, {'data': 'data 4'}, {'data': 'data 5'}, {'data': 'data 6'}, {'data': 'data 7'}, {'data': 'data 8'} ... displaying 10 of 4999 total bound parameter sets ... {'data': 'data 4998'}, {'data': 'data 4999'})
+    2020-06-27 19:08:18,254 INFO sqlalchemy.engine.Engine COMMIT
+
+The ultimate INSERT statement can be seen by enabling statement logging on the PostgreSQL side::
+
+    2020-06-27 19:08:18.169 EDT [26960] LOG:  statement: INSERT INTO a (data)
+    VALUES ('data 1'),('data 2'),('data 3'),('data 4'),('data 5'),('data 6'),('data
+    7'),('data 8'),('data 9'),('data 10'),('data 11'),('data 12'),
+    ... ('data 999'),('data 1000') RETURNING a.id
+
+    2020-06-27 19:08:18.175 EDT [26960] LOG:  statement: INSERT INTO a (data) VALUES ('data 1001'),('data
+    1002'),('data 1003'),('data 1004'),('data 1005'),('data 1006'),('data
+    1007'),('data 1008'),('data 1009'),('data 1010'),('data 1011'), ...
+
+The feature batches rows into groups of 1000 by default, which can be affected
+using the ``executemany_values_page_size`` argument documented at
+:ref:`psycopg2_executemany_mode`.
+
+:ticket:`5263`
+
 .. _change_orm_update_returning_14:
 
 ORM Bulk Update and Delete use RETURNING for "fetch" strategy when available

@@ -1591,7 +1660,10 @@ psycopg2 dialect features "execute_values" with RETURNING for INSERT statements
 The first half of a significant performance enhancement for PostgreSQL when
 using both Core and ORM, the psycopg2 dialect now uses
 ``psycopg2.extras.execute_values()`` by default for compiled INSERT statements
-and also implements RETURNING support in this mode.
+and also implements RETURNING support in this mode. The other half of this
+change is :ref:`change_5263` which allows the ORM to take advantage of
+RETURNING with executemany (i.e. batching of INSERT statements) so that ORM
+bulk inserts with psycopg2 are up to 400% faster depending on specifics.
 
 This extension method allows many rows to be INSERTed within a single
 statement, using an extended VALUES clause for the statement. While
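The VALUES-joining behavior described in the migration note above can be sketched in plain Python. This is a hypothetical illustration of the rewriting idea only: `join_values` and its arguments are invented names, not psycopg2's actual implementation, and real parameter escaping is performed by psycopg2 itself, never by string formatting.

```python
def join_values(statement, single_values_expr, parameter_sets):
    """Illustrative sketch: replace the single VALUES group in an INSERT
    statement with one group per parameter set, leaving any trailing
    RETURNING clause intact (the idea behind execute_values())."""
    # Render each parameter set through the single-row template, e.g.
    # "(%(data)s)" % {"data": "'data 1'"} -> "('data 1')"
    groups = [single_values_expr % params for params in parameter_sets]
    return statement.replace(single_values_expr, ",".join(groups))


stmt = "INSERT INTO a (data) VALUES (%(data)s) RETURNING a.id"
params = [{"data": "'data 1'"}, {"data": "'data 2'"}, {"data": "'data 3'"}]
print(join_values(stmt, "(%(data)s)", params))
# INSERT INTO a (data) VALUES ('data 1'),('data 2'),('data 3') RETURNING a.id
```

This mirrors why the SQLAlchemy log shows the single-row form while the PostgreSQL log shows the multi-VALUES form: the join happens inside the driver layer, after logging.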
Lines changed: 18 additions & 0 deletions

@@ -0,0 +1,18 @@
+.. change::
+    :tags: orm, performance, postgresql
+    :tickets: 5263
+
+    Implemented support for the psycopg2 ``execute_values()`` extension
+    within the ORM flush process via the enhancements to Core made
+    in :ticket:`5401`, so that this extension is used
+    both as a strategy to batch INSERT statements together and so that
+    RETURNING may now be used across multiple parameter sets to
+    retrieve primary key values back in batch. This allows nearly
+    all INSERT statements emitted by the ORM on behalf of PostgreSQL
+    to be submitted in batch and also via the ``execute_values()``
+    extension, which benches at five times faster than plain
+    executemany() for this particular backend.
+
+    .. seealso::
+
+        :ref:`change_5263`

lib/sqlalchemy/dialects/postgresql/psycopg2.py

Lines changed: 17 additions & 7 deletions

@@ -643,7 +643,7 @@ class PGIdentifierPreparer_psycopg2(PGIdentifierPreparer):
     pass
 
 
-EXECUTEMANY_DEFAULT = util.symbol("executemany_default", canonical=0)
+EXECUTEMANY_PLAIN = util.symbol("executemany_plain", canonical=0)
 EXECUTEMANY_BATCH = util.symbol("executemany_batch", canonical=1)
 EXECUTEMANY_VALUES = util.symbol("executemany_values", canonical=2)
 EXECUTEMANY_VALUES_PLUS_BATCH = util.symbol(
@@ -655,6 +655,12 @@ class PGIdentifierPreparer_psycopg2(PGIdentifierPreparer):
 class PGDialect_psycopg2(PGDialect):
     driver = "psycopg2"
     if util.py2k:
+        # turn off supports_unicode_statements for Python 2. psycopg2 supports
+        # unicode statements in Py2K. But! it does not support unicode *bound
+        # parameter names* because it uses the Python "%" operator to
+        # interpolate these into the string, and this fails. So for Py2K, we
+        # have to use full-on encoding for statements and parameters before
+        # passing to cursor.execute().
         supports_unicode_statements = False
 
     supports_server_side_cursors = True
@@ -714,7 +720,7 @@ def __init__(
         self.executemany_mode = util.symbol.parse_user_argument(
             executemany_mode,
             {
-                EXECUTEMANY_DEFAULT: [None],
+                EXECUTEMANY_PLAIN: [None],
                 EXECUTEMANY_BATCH: ["batch"],
                 EXECUTEMANY_VALUES: ["values_only"],
                 EXECUTEMANY_VALUES_PLUS_BATCH: ["values_plus_batch", "values"],
@@ -747,7 +753,12 @@ def initialize(self, connection):
             and self._hstore_oids(connection.connection) is not None
         )
 
-        # http://initd.org/psycopg/docs/news.html#what-s-new-in-psycopg-2-0-9
+        # PGDialect.initialize() checks server version for <= 8.2 and sets
+        # this flag to False if so
+        if not self.full_returning:
+            self.insert_executemany_returning = False
+            self.executemany_mode = EXECUTEMANY_PLAIN
+
         self.supports_sane_multi_rowcount = not (
             self.executemany_mode & EXECUTEMANY_BATCH
         )
@@ -876,17 +887,16 @@ def do_executemany(self, cursor, statement, parameters, context=None):
             executemany_values = (
                 "(%s)" % context.compiled.insert_single_values_expr
             )
+            if not self.supports_unicode_statements:
+                executemany_values = executemany_values.encode(self.encoding)
+
             # guard for statement that was altered via event hook or similar
             if executemany_values not in statement:
                 executemany_values = None
         else:
             executemany_values = None
 
         if executemany_values:
-            # Currently, SQLAlchemy does not pass "RETURNING" statements
-            # into executemany(), since no DBAPI has ever supported that
-            # until the introduction of psycopg2's executemany_values, so
-            # we are not yet using the fetch=True flag.
             statement = statement.replace(executemany_values, "%s")
             if self.executemany_values_page_size:
                 kwargs = {"page_size": self.executemany_values_page_size}
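The "guard for statement that was altered via event hook" logic in the diff above can be sketched as follows. `choose_executemany` is a made-up helper for illustration, not the dialect's real method, and it omits the Py2K encoding branch: the point is only that the single-row VALUES template must still appear verbatim in the statement for the `execute_values()` path to be safe, otherwise plain executemany is used.

```python
def choose_executemany(statement, insert_single_values_expr):
    """Return ("values", rewritten_statement) when the single-row VALUES
    group is still present, else ("plain", statement) unchanged."""
    executemany_values = "(%s)" % insert_single_values_expr
    if executemany_values in statement:
        # execute_values() receives the statement with a single "%s"
        # placeholder where the VALUES group used to be.
        return "values", statement.replace(executemany_values, "%s")
    # an event hook rewrote the SQL; fall back to plain executemany()
    return "plain", statement


mode, stmt = choose_executemany(
    "INSERT INTO a (data) VALUES (%(data)s)", "%(data)s"
)
print(mode, stmt)  # values INSERT INTO a (data) VALUES %s
```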

lib/sqlalchemy/engine/default.py

Lines changed: 4 additions & 3 deletions

@@ -824,7 +824,7 @@ def _init_compiled(
         if self.isinsert or self.isupdate or self.isdelete:
             self.is_crud = True
             self._is_explicit_returning = bool(compiled.statement._returning)
-            self._is_implicit_returning = (
+            self._is_implicit_returning = bool(
                 compiled.returning and not compiled.statement._returning
             )
 
@@ -1291,11 +1291,12 @@ def _setup_out_parameters(self, result):
         result.out_parameters = out_parameters
 
     def _setup_dml_or_text_result(self):
-        if self.isinsert and not self.executemany:
+        if self.isinsert:
             if (
                 not self._is_implicit_returning
                 and not self.compiled.inline
                 and self.dialect.postfetch_lastrowid
+                and not self.executemany
             ):
 
                 self._setup_ins_pk_from_lastrowid()
@@ -1375,7 +1376,7 @@ def _setup_ins_pk_from_empty(self):
         getter = self.compiled._inserted_primary_key_from_lastrowid_getter
 
         self.inserted_primary_key_rows = [
-            getter(None, self.compiled_parameters[0])
+            getter(None, param) for param in self.compiled_parameters
         ]
 
     def _setup_ins_pk_from_implicit_returning(self, result, rows):
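The `_setup_ins_pk_from_empty()` change above addresses the commit's last bullet point: with executemany and a pre-generated primary key, the key must be read back from every compiled parameter set, not just the first. A simplified sketch of that before/after, using invented names rather than SQLAlchemy internals:

```python
def inserted_pk_rows(compiled_parameters, pk_column="id"):
    """One primary-key row per parameter set. The pre-change behavior
    consulted only compiled_parameters[0], which yields a single row and
    is wrong for executemany with explicitly pre-generated PKs."""
    return [(params[pk_column],) for params in compiled_parameters]


rows = inserted_pk_rows([{"id": 10, "data": "a"}, {"id": 11, "data": "b"}])
print(rows)  # [(10,), (11,)]
```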
