This piece of code in the online_write_batch function in the Postgres online store pushes data in batches to the online store.
I was wondering whether it makes more sense to put the conn.commit() inside the for loop, or after the for loop. I would love to hear the different trade-offs between the two options!
Copy of the code snippet here:
batch_size = 5000
for i in range(0, len(insert_values), batch_size):
cur_batch = insert_values[i : i + batch_size]
execute_values(
cur,
sql.SQL(
"""
INSERT INTO {}
(entity_key, feature_name, value, event_ts, created_ts)
VALUES %s
ON CONFLICT (entity_key, feature_name) DO
UPDATE SET
value = EXCLUDED.value,
event_ts = EXCLUDED.event_ts,
created_ts = EXCLUDED.created_ts;
""",
).format(sql.Identifier(_table_id(project, table))),
cur_batch,
page_size=batch_size,
)
conn.commit() # << This is the point of interest. Should this be placed in the loop or after?
This piece of code in the
online_write_batchfunction in the Postgres online store pushes data in batches to the online store.I was wondering whether it makes more sense to put the
conn.commit()inside the for loop, or after the for loop. I would love to hear the different trade-offs between the two options!Copy of the code snippet here: