Add skipDuplicates parameter on @entity directive#6458
Conversation
Add kw::SKIP_DUPLICATES constant, skip_duplicates bool field to ObjectType, and parsing logic in ObjectType::new() defaulting to false when absent.
Added SkipDuplicatesRequiresImmutable error variant, bool_arg validation for skipDuplicates in validate_entity_directives(), and three test functions covering non-boolean value, mutable entity, and timeseries+skipDuplicates.
Add TypeInfo::skip_duplicates(), InputSchema::skip_duplicates(), and EntityType::skip_duplicates() following the is_immutable() three-layer delegation pattern. Object types return immutable && skip_duplicates; Interface and Aggregation types return false.
Add skip_duplicates: bool field to RowGroup struct alongside immutable, update RowGroup::new() to accept the parameter, and wire it from entity_type.skip_duplicates() in RowGroups::group_entry(). All other call sites (RowGroupForPerfTest, test helpers, example) pass false.
Modify RowGroup::append_row() so that when immutable=true and skip_duplicates=true, cross-block duplicate inserts and Overwrite/Remove operations log warnings and return Ok(()) instead of failing. Same-block duplicates remain allowed. Default behavior (skip_duplicates=false) is preserved exactly. Added Logger field to RowGroup/RowGroups with CacheWeight impl, threaded through all construction sites. 5 unit tests covering all scenarios.
Added skip_duplicates: bool field to Table struct in relational.rs, wired from EntityType::skip_duplicates() in Table::new(), copied in Table::new_like(), and defaulted to false in make_poi_table().
Added logger parameter to Layout::update() and Layout::delete() in relational.rs. When table.immutable && table.skip_duplicates, these methods now log a warning and return Ok(0) instead of returning an error. Default immutable behavior (skip_duplicates=false) is preserved. Updated all callers including deployment_store.rs and test files. Added 4 unit tests with SkipDupMink entity type to verify both skip_duplicates and default immutable behavior.
Add conditional ON CONFLICT (primary_key) DO NOTHING clause to InsertQuery::walk_ast() when table.immutable && table.skip_duplicates. This handles cross-batch duplicates where an entity committed in a previous batch is re-inserted due to the immutable entity cache skipping store lookups. Two unit tests verify: skip_duplicates inserts include ON CONFLICT, default immutable inserts do not.
Restructured Layout::insert() from if-let-Err to match to capture affected_rows from InsertQuery::execute(). Tracks expected vs actual row counts across both the main batch path and the row-by-row fallback path. Logs a warning when affected_rows < expected for skip_duplicates immutable tables.
Add end-to-end runner test exercising @entity(immutable: true, skipDuplicates: true) with duplicate entity inserts across blocks. - tests/runner-tests/skip-duplicates/: subgraph with Ping entity using skipDuplicates, block handler that saves same ID every block - tests/tests/runner_tests.rs: skip_duplicates() test syncs 4 blocks and verifies entity is queryable (indexing did not fail)
3a426e4 to
78c451c
Compare
lutter
left a comment
There was a problem hiding this comment.
One thing to consider here, too, is that there is really a strict hierarchy of mutability: mutable entities, immutable entities, and immutable with skipUpdates entities. That's currently expressed with 2 booleans. It might be clearer to express that with an enum
- Remove logs and the logger from the RowGroup - Simplify schema tests - Add ObjectMutability enum instead of two booleans
| } | ||
|
|
||
| #[derive(Debug, Clone, Copy, PartialEq)] | ||
| pub enum ObjectMutability { |
There was a problem hiding this comment.
Very minor: I would call this EntityMutability
There was a problem hiding this comment.
I also thought about this, but since the struct that contains this enum is called ObjectType, it seemed to make sense to call this ObjectMutability
| #[derive(Debug, Clone, Copy, PartialEq)] | ||
| pub enum ObjectMutability { | ||
| Mutable, | ||
| Immutable { skip_duplicates: bool }, |
There was a problem hiding this comment.
You could make this an enum without any fields since these variants are strictly hierarchical:
enum EntityMutability {
Mutable,
Immutable,
SkipUpdates
}Though I am fine with leaving this as is
There was a problem hiding this comment.
I tried this, but I didn't like that it required comments explaining what it means, and notes that checks for immutability should also check the SkipUpdates variant. I think I would leave it as is for now, but I agree that, with a bit of refactoring, it would make more sense to have separate variants (maybe also with a variant for the timeseries: true).
Summary
Add
skipDuplicatesparameter to the@entitydirective for immutable entities.@entity(immutable: true, skipDuplicates: true)skips duplicate inserts instead of failing indexing. This unblocks subgraph composition on Amp-powered subgraphs where SQL queries can produce the same entities across block ranges.Changes
skipDuplicatesfrom@entitydirective (requiresimmutable: true)ON CONFLICT (id) DO NOTHINGfor skip-duplicates INSERT queries, with affected-rows loggingDefault behavior (
skipDuplicates: false) is unchanged - duplicates still fail indexing.