Skip to content

Unify array structure for containing and marshaling associated data#18988

Closed
dereuromark wants to merge 3 commits into6.xfrom
6.x-assoc-data-structure
Closed

Unify array structure for containing and marshaling associated data#18988
dereuromark wants to merge 3 commits into6.xfrom
6.x-assoc-data-structure

Conversation

@dereuromark
Copy link
Copy Markdown
Member

Resolves comments and 6.x roadmap of
#17547

Supported Formats:

  // Old format (still supported)
  ['associated' => ['Articles' => ['associated' => ['Users', 'Comments']]]]

  // New unified format (like contain())
  ['associated' => ['Articles' => ['Users', 'Comments']]]

  // Dot notation (always supported)
  ['associated' => ['Articles.Users', 'Articles.Comments']]

But this PR could actually go into 5.next, since it seems "BC".
In 6.x we could then think about removing one of the different declarations.

@dereuromark dereuromark added this to the 6.0 milestone Oct 23, 2025
@dereuromark dereuromark requested a review from markstory October 23, 2025 04:10
Comment thread src/ORM/AssociationsNormalizerTrait.php Outdated
@markstory
Copy link
Copy Markdown
Member

In 6.x we could then think about removing one of the different declarations.

How? It seems like only one of them provides the full feature set. The other forms are for 'simple' cases. Maybe we could remove the 'dot notation' form, but I think that is also supported by contain().

@dereuromark
Copy link
Copy Markdown
Member Author

dereuromark commented Oct 28, 2025

Yeah, now they are now equal in support with this.
If we want dot notation removed, we would probably want to deprecate that on the contain() part too.

@ADmad
Copy link
Copy Markdown
Member

ADmad commented Oct 28, 2025

My issue isn't really with the dot notation. It's with array structure like this where the aliases and config keys can be mixed at the same level:

$usersQuery->contain([
    'Articles' => [
        'conditions' => [.....],   // option key
        'Comments',  // model alias
    ],
]);

I would prefer if the 2nd level associations were also under the config key:

$usersQuery->contain([
    'Articles' => [
        'conditions' => [.....],
        'associated' => ['Comments']
    ],
]);

@markstory
Copy link
Copy Markdown
Member

I can agree that

$usersQuery->contain([
    'Articles' => [
        'conditions' => [.....],   // option key
        'Comments',  // model alias
    ],
]);

is awkward. I'm on-board for not having that anymore. Would it suffice to deprecate only this call style in 5.x? That would allow us to remove the problematic scenario with a graceful upgrade path.

It looks like this call style can be detected by having a mix of string and int keys which should be simple enough to detect. I think this also sets us up better for contain options to use a builder pattern more easily in the future.

@dereuromark
Copy link
Copy Markdown
Member Author

If aliases were always UpperCaseCamel (upper case first letter), there would also be no collisions.
Same if all other keys would be underscored, e.g. _conditions etc.
But otherwise I agree that we should not use that mixed format anymore.

@ADmad
Copy link
Copy Markdown
Member

ADmad commented Oct 29, 2025

Would it suffice to deprecate only this call style in 5.x?

I would be happy with just that.

It looks like this call style can be detected by having a mix of string and int keys which should be simple enough to detect.

Not always, I think you can have this form too:

$usersQuery->contain([
    'Articles' => [
        'conditions' => [.....],   // option as key
        'Comments' => [..options for this inner model...]  // model alias as key
    ],
]);

I think this also sets us up better for contain options to use a builder pattern more easily in the future.

Yes, adding builders in future would be nice. I believe that will also make it easier for AI agents to write CakePHP code.

If aliases were always UpperCaseCamel (upper case first letter), there would also be no collisions.

Yes when naming the aliases as per convention the chances of collision are negligible but it's also about readability. I personally find that this mix of options and alias as same level adds unnecessary cognitive load.

Resolved conflict in src/ORM/Marshaller.php by keeping both changes:
- Documentation for new nested array format (from PR)
- Template type annotations (from 6.x)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dereuromark
Copy link
Copy Markdown
Member Author

So what course do we go?
What exactly do we deprecated, what do we propose people use?

@othercorey
Copy link
Copy Markdown
Contributor

Is the proposal to check if the isn't isn't one of the allowed option keys and if so suggest using associated ?

@dereuromark
Copy link
Copy Markdown
Member Author

My main concern would be that this verbose form is super hard to type for default case when u just want list them nested.
I hope we find another agreement on shortcutting.

@othercorey
Copy link
Copy Markdown
Contributor

Is is annoying. I'm not sure how to improve it with fluid getters other than with a closure - but we already do that to modify the query.

@dereuromark
Copy link
Copy Markdown
Member Author

The few times you need to set options, I would in this syntax prefer underscores. We could throw deprecations in 5.x.

@dereuromark
Copy link
Copy Markdown
Member Author

dereuromark commented Nov 19, 2025

So with what approach do we want to go moving forward?

$usersQuery->contain([
    'Articles' => [
        'conditions' => [.....],   // option key
        'Comments',  // model alias
    ],
]);

Seems fine to me as long as you stick to conventions (lower vs upper should never conflict) tbh

$usersQuery->contain([
    'Articles' => [
        '_conditions' => [.....],   // option key
        'Comments',  // model alias
    ],
]);

Prefixing options with underscore would be an option to separate.

Using associated key always is quite verbose and not sure if that's so user-friendly in the end.
The most common use case are those aliases, the options are not used too often.

Another idea:
Using a similar options builder than we did for paginator, and allow building this dynamically?

$usersQuery->contain(Contain::build()->...)

Not sure how the syntax for such a nesting would look like though.

@dereuromark
Copy link
Copy Markdown
Member Author

dereuromark commented Dec 12, 2025

We could even go as far as using options or even _options key only:

$usersQuery->contain([
    'Articles' => [
        'options' => ['conditions' => ..., '...' => ...],   // one single lowercase one (compared to CamelCase models)
        'Comments',  // model alias
    ],
]);

This was its easy to parse, in those few cases those are applied.
The 99% case is using the assoc aliases only, and nesting them - so this should remain the primary syntax for both contain and marshal case IMO

So: Current options are

  1. options wrapper key (cleanest approach)
    // Current problematic mix:
    ['Articles' => ['conditions' => [...], 'Comments']]

// Clean separation:
['Articles' => ['options' => ['conditions' => [...]], 'Comments']]
Everything not in options is treated as an association. No list needed.

  1. Runtime association lookup
    Instead of a hardcoded list, check against actual associations on the table:
  protected function extractAssociations(array $options, Table $table): array
  {
      $associations = [];
      $actualOptions = [];
      $associationNames = array_keys($table->associations()->keys());

      foreach ($options as $key => $value) {
          if (is_int($key)) {
              $associations[] = $value;
          } elseif (in_array($key, $associationNames, true) || $table->hasAssociation($key)) {
              $associations[$key] = $value;
          } else {
              $actualOptions[$key] = $value;
          }
      }
      return [$associations, $actualOptions];
  }

This flips the logic: instead of "known options list", anything matching an association name is an association, everything else is an option. No maintenance needed.

I think 1. would be an option for Cake 6 if we wanted to, "options" and "associations" being the only special keyword.
The 2. solution would actually be already implementable for 5.3 now as it is BC.

Replace getKnownOptions() with convention-based detection:
- Association names start with uppercase (CamelCase): Users, Articles
- Option keys start with lowercase (camelCase): onlyIds, conditions
- Special data keys start with underscore: _joinData, _ids

This eliminates the need to maintain a list of known keys that must
be updated whenever new marshalling/contain options are added.
@dereuromark
Copy link
Copy Markdown
Member Author

dereuromark commented Dec 13, 2025

Pushed a commit that addresses the concern about maintaining the known options list.

Solution: Use CakePHP naming conventions instead of a hardcoded list:

  • Association names start with uppercase (CamelCase): Users, Articles
  • Option keys start with lowercase (camelCase): onlyIds, conditions
  • Special data keys start with underscore: _joinData, _ids

This eliminates getKnownOptions() entirely - no list to maintain when new options are added.

The convention is already enforced by CakePHP's existing patterns (model names are always CamelCase, option keys are always camelCase), so this is fully backward compatible.

Maybe this should target 5.next then..

@dereuromark
Copy link
Copy Markdown
Member Author

@markstory Whats your take on this? This seems to make the current way of declaration then consistent.
Or do we have any alternative approaches?

@dereuromark dereuromark marked this pull request as ready for review April 11, 2026 09:55
@dereuromark dereuromark deleted the 6.x-assoc-data-structure branch April 11, 2026 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants