Skip to content

An initial attempt at bug #4926 - WIP#5130

Draft
rlerdorf wants to merge 1 commit intov6from
bug4926
Draft

An initial attempt at bug #4926 - WIP#5130
rlerdorf wants to merge 1 commit intov6from
bug4926

Conversation

@rlerdorf
Copy link
Copy Markdown
Member

@rlerdorf rlerdorf commented Oct 6, 2025

This one is definitely not ready yet. A summary of my discussion with the AI on this one:

Issue #4926: False Positives on Array Access with Mixed Types

Problem Statement

When assigning to different fields of arrays that have a mixed base type (e.g., from json_decode()), Phan emits false positive PhanTypeInvalidDimOffset warnings. The issue occurs because after assigning to one field, Phan creates a restrictive ArrayShapeType that only includes the assigned field, causing errors when accessing other fields.

Test Cases from Issue

Case 1: Simple Function Parameter

function test(mixed $bar) {
    if (is_array($bar)) {
        $bar["x"] = strlen($bar["x"]);
        $bar["y"] = strlen($bar["y"]); // FALSE POSITIVE: Invalid offset "y"
    }
}

Case 2: Assignment from json_decode

$barArr = json_decode('{}', true);
'@phan-var array{x:string,y:string} $barArr';
$barArr["x"] = strlen($barArr["x"]);
$barArr["y"] = strlen($barArr["y"]); // FALSE POSITIVE: Invalid offset "y"

Case 3: Assignment with List Annotation

$barArr = json_decode('{}', true);
'@phan-var list<array{x:string,y:string}> $barArr';
if (count($barArr) > 0) {
    $barArr[0]["x"] = strlen($barArr[0]["x"]);
    $barArr[0]["y"] = strlen($barArr[0]["y"]); // FALSE POSITIVE: Invalid offset "y"
}

Root Cause Analysis

AssignmentVisitor Issue (Write Side)

In src/Phan/Analysis/AssignmentVisitor.php, the visitDim() method handles assignments to array dimensions like $arr[$key] = $value.

Original Behavior:

// Line ~930 in visitDim()
$right_type = ArrayShapeType::fromFieldTypes([
    $dim_value => $this->right_type,
], false)->asRealUnionType();

This creates an array{x:int} when assigning $arr["x"] = 5, even if $arr originally had type mixed|array{x:string,y:string}. When this restrictive shape is merged with the original mixed type, subsequent accesses to $arr["y"] fail.

Why This Happens:

  • For new structures like $GLOBALS['a']['b'] = 'value', we WANT restrictive shapes to track what fields exist
  • For external mixed sources like json_decode(), we DON'T want restrictive shapes because the full structure might exist at runtime

UnionTypeVisitor Issue (Read Side)

In src/Phan/AST/UnionTypeVisitor.php, the resolveArrayShapeElementTypes() method handles reading from arrays like $arr[$key].

Original Behavior:
When resolveArrayShapeElementTypesForOffset() returns false (meaning array shapes in the union don't have the key), it would emit PhanTypeInvalidDimOffset errors even if OTHER types in the union (like mixed or generic arrays) could have the key.

Case 2 & 3 Discovery:
Cases 2 and 3 weren't failing on the WRITE side (AssignmentVisitor) but on the READ side. After fixing the write side for case 1, cases 2 and 3 still failed because:

  • The union type was list<array{x:int}>|list<array{x:string,y:string}> (from PHPDoc annotation + assignment inference)
  • When accessing $barArr[0]["y"], Phan checked if ANY array shape in the union had field "y"
  • The inferred array{x:int} didn't have "y", causing an error
  • But the PHPDoc annotated array{x:string,y:string} DID have "y", so no error should occur

ListType Discovery:
Case 3 revealed that ListType wasn't properly handled when extracting element types. The code treated ListType as "could have any key" which is incorrect - lists only accept integer keys >= 0.

Solutions Implemented

1. AssignmentVisitor Fix (Write Side)

File: src/Phan/Analysis/AssignmentVisitor.php
Lines: 885-936

Strategy: Detect whether the base array comes from an external mixed source vs. a new structure being created.

// Check if we should use array shape or generic array
// Fix for issue #4926: For mixed/unknown base types, use mixed element type
// to avoid false positives when accessing different fields
$has_mixed_type = !$expr_union_type->isEmpty() && $expr_union_type->hasMixedOrNonEmptyMixedType();

// Check if the base comes from an external mixed/generic array source
// vs. being a new structure we're creating (like GLOBALS['a'])
$use_generic_for_mixed = false;
if ($has_mixed_type) {
    // Find the root variable to check if it's from an external source
    $root_node = $expr_node;
    while ($root_node instanceof Node && $root_node->kind === \ast\AST_DIM) {
        $root_node = $root_node->children['expr'];
    }
    if ($root_node instanceof Node && $root_node->kind === \ast\AST_VAR) {
        $root_var_name = (new ContextNode($this->code_base, $this->context, $root_node))->getVariableName();
        // Hardcoded variables like $GLOBALS should use array shapes
        if (!Variable::isHardcodedVariableInScopeWithName($root_var_name, $this->context->isInGlobalScope())) {
            $root_type = UnionTypeVisitor::unionTypeFromNode(
                $this->code_base,
                $this->context,
                $root_node,
                false
            );
            // Use generic array if root has generic/mixed array types (external source)
            // Use array shape if root is undefined/empty (creating new structure)
            $use_generic_for_mixed = !$root_type->isEmpty() &&
                ($root_type->hasGenericArray() || $root_type->hasMixedOrNonEmptyMixedType());
        }
    }
}

if (!$has_mixed_type || !$use_generic_for_mixed) {
    // Base is not mixed OR we're creating a new structure → create array shape
    $right_type = ArrayShapeType::fromFieldTypes([
        $dim_value => $this->right_type,
    ], false)->asRealUnionType();
} else {
    // Base is mixed from external source (json_decode, etc.) → use generic array
    // This prevents false positives when the same array is accessed with different keys
    $key_type_enum = GenericArrayType::keyTypeFromUnionTypeValues($dim_type);
    $right_type = GenericArrayType::fromElementType(MixedType::instance(false), false, $key_type_enum)->asRealUnionType();
    if (!$right_type->hasRealTypeSet()) {
        $right_type = $right_type->withRealTypeSet(UnionType::typeSetFromString('non-empty-array'));
    }
}

Key Decisions:

  1. Traverse to root variable to determine if it's from an external source
  2. Hardcoded variables like $GLOBALS always use array shapes (for tracking fields)
  3. Variables with generic/mixed types use generic arrays (to avoid false positives)
  4. Undefined/empty variables use array shapes (for creating new structures)

Edge Case: GLOBALS Test
The tests/rasmus_files/src/0024_GLOBALS.php test showed that $GLOBALS must always use array shapes, not generic arrays, even though it has a mixed base type. This is because GLOBALS is being POPULATED, not read from an external source.

2. UnionTypeVisitor Fix (Read Side)

File: src/Phan/AST/UnionTypeVisitor.php
Lines: 2088-2140

Strategy: Don't emit errors if ANY type in the union (PHPDoc or real) could have the accessed key.

if ($resulting_element_type === false) {
    // Fix for issue #4926: Don't emit error if some array shapes in the union don't have the key
    // Check if ANY type (including PHPDoc types) could have the key
    $could_have_key_phpdoc = false;
    $could_have_key_real = self::couldRealTypesHaveKey($union_type->getRealTypeSet(), $dim_value);

    // Also check PHPDoc types - resolveArrayShapeElementTypesForOffset returns false
    // when it loops through PHPDoc types and finds none with the key, but there might
    // be generic arrays in the real types
    foreach ($union_type->getTypeSet() as $type) {
        if ($type instanceof ArrayShapeType) {
            if (isset($type->getFieldTypes()[$dim_value])) {
                $could_have_key_phpdoc = true;
                break;
            }
        } elseif ($type instanceof ListType) {
            // ListType can only have integer keys >= 0
            $filtered = \is_int($dim_value) ? $dim_value : \filter_var($dim_value, \FILTER_VALIDATE_INT);
            if (\is_int($filtered) && $filtered >= 0) {
                $could_have_key_phpdoc = true;
                break;
            }
        } elseif (!($type instanceof ArrayType) || $type instanceof GenericArrayType) {
            // Non-array-shape types (generic arrays, etc.) could have any key
            $could_have_key_phpdoc = true;
            break;
        }
    }

    $could_have_key = $could_have_key_phpdoc || $could_have_key_real;

    // XXX not sure what to do here. For now, just return null and only warn in cases where requested to.
    if ($check_invalid_dim && !$could_have_key) {
        // Only emit error if NO type in the union could possibly have this key
        $exception = new IssueException(
            Issue::fromType(Issue::TypeInvalidDimOffset)(
                $this->context->getFile(),
                $dim_node->lineno ?? $node->lineno,
                [StringUtil::jsonEncode($dim_value), ASTReverter::toShortString($node->children['expr']), (string)$union_type]
            )
        );
        if ($this->should_catch_issue_exception) {
            Issue::maybeEmitInstance($this->code_base, $this->context, $exception->getIssueInstance());
        } else {
            throw $exception;
        }
    }
    // $union_type is exclusively array shape types, but those don't contain the field $dim_value.
    // It's undefined (which becomes null)
    if ($could_have_key) {
        return NullType::instance(false)->asPHPDocUnionType();
    }
    return NullType::instance(false)->asRealUnionType();
}

Original Bug:
The original code had:

} elseif (!($type instanceof ArrayType) || $type instanceof ListType || $type instanceof GenericArrayType) {
    // Non-array-shape types (generic arrays, lists, etc.) could have any key
    $could_have_key_phpdoc = true;
    break;
}

This treated ListType as "could have any key" which is wrong. Lists can only have integer keys >= 0, not string keys like "aces".

Fix:
Split ListType into its own case with proper validation:

} elseif ($type instanceof ListType) {
    // ListType can only have integer keys >= 0
    $filtered = \is_int($dim_value) ? $dim_value : \filter_var($dim_value, \FILTER_VALIDATE_INT);
    if (\is_int($filtered) && $filtered >= 0) {
        $could_have_key_phpdoc = true;
        break;
    }
} elseif (!($type instanceof ArrayType) || $type instanceof GenericArrayType) {
    // Non-array-shape types (generic arrays, etc.) could have any key
    $could_have_key_phpdoc = true;
    break;
}

3. ListType Element Extraction

File: src/Phan/AST/UnionTypeVisitor.php
Lines: 2200-2216

Problem: When accessing $barArr[0] where $barArr is list<array{x:string,y:string}>, Phan wasn't extracting the element type array{x:string,y:string} from the list.

Solution:

} elseif ($type instanceof ListType) {
    // Fix for issue #4926: Extract element type from ListType
    $filtered = \is_int($dim_value) ? $dim_value : \filter_var($dim_value, \FILTER_VALIDATE_INT);
    if (\is_int($filtered) && $filtered >= 0) {
        // Valid int index for list - extract element type
        // ListType extends GenericArrayType which has genericArrayElementType() method
        $element_type = $type->genericArrayElementType();
        $element_union_type = $element_type->asPHPDocUnionType();
        if (!$element_union_type->isEmpty()) {
            if ($resulting_element_type instanceof UnionType) {
                $resulting_element_type = $resulting_element_type->withUnionType($element_union_type);
            } else {
                $resulting_element_type = $element_union_type;
            }
        }
    }
    continue;

Failed Attempts:

  1. First tried $type->asGenericArrayType($code_base)->genericArrayElementTypes(true, $code_base) - wrong signature
  2. Then tried $type->asGenericArrayType(KEY_INT) - still wrong, asGenericArrayType expects int enum
  3. Final solution: Use genericArrayElementType() (singular) which returns a Type, then convert to UnionType

Test Results

✅ What Works

Issue #4926 Test Cases:

  • ✅ Case 1: Simple mixed parameter - PASS
  • ✅ Case 2: json_decode with annotation - PASS
  • ✅ Case 3: json_decode with list annotation - PASS

Existing Tests:

  • ✅ GLOBALS test (0024_GLOBALS.php) - Still passes
  • ✅ shuffle/list test (0809_shuffle_converts_to_list.php) - Now correctly emits PhanTypeInvalidDimOffset for $deck['aces'] on a list<mixed>
  • ✅ 2041 out of 2053 tests passing (99.4%)

⚠️ What Doesn't Work (Test Regressions)

12 Test Failures:

  1. 0219_superglobal_type_check.php - Superglobal array access behavior changed
  2. 0327_references_sort_preserves_type.php - Reference/sort type preservation issue
  3. 0447_parse_url.php - parse_url return type handling
  4. 0459_array_key_exists_array.php - array_key_exists type narrowing changed
  5. 0523_regex_group_extraction.php - Regex group array shape inference
  6. 0608_regex_features.php - Regex array shape features
  7. 0653_instanceof_array_field.php - instanceof narrowing on array fields
  8. 0666_this_array_key_exists.php - array_key_exists on $this properties
  9. 0757_empty_nested.php - empty() checks on nested arrays
  10. 0792_list_type.php - List type behavior (may need expected output update)
  11. 0816_json_decode_objects.php - Missing PhanTypeArraySuspiciousNullable warning (false negative)
  12. 1121_literal_string_concat.php - Literal string type tracking

Analysis:

Most failures appear to be related to:

  • Type narrowing after array_key_exists/isset checks - The "could have key" logic may be too permissive
  • Array shape inference from operations - parse_url, preg_match, etc. create specific shapes
  • Mixed/generic array interactions - The fix may be affecting how unions of array shapes and generic arrays are resolved

Most Concerning: 0816_json_decode_objects.php

This is a FALSE NEGATIVE - a warning that SHOULD be emitted but isn't:

$content = json_decode($_GET['whatever']);
$content_type = is_object($content) ? $content->content_type : $content[0]->content_type;
// Should warn: PhanTypeArraySuspiciousNullable on line 3

Expected: PhanTypeArraySuspiciousNullable Suspicious array access to $content of nullable type ?bool|?float|?int|?list<mixed>|?string
Actual: No warning

The fix may have suppressed this warning because list<mixed> is in the union and we now consider it "could have any key".

Attempted Solutions That Failed

Attempt 1: Simple Mixed Check

if ($expr_union_type->hasMixedType()) {
    // Use generic array
}

Problem: Broke GLOBALS test because $GLOBALS has mixed type but needs array shapes.

Attempt 2: Simple Variable Check

$is_simple_variable = ($expr_node instanceof Node && $expr_node->kind === \ast\AST_VAR);
if ($has_mixed_type && !$is_simple_variable) {
    // Use generic array
}

Problem: Case 2 failed because $barArr["x"] is a simple variable.

Attempt 3: Check for Array Shapes in Union

$has_array_shapes = false;
foreach ($expr_union_type->getTypeSet() as $type) {
    if ($type instanceof ArrayShapeType) {
        $has_array_shapes = true;
        break;
    }
}
if ($has_mixed_type && !$has_array_shapes) {
    // Use generic array
}

Problem: Many mixed unions have array shapes from PHPDoc annotations, so this check was too broad.

Attempt 4: Wrong ListType Method Calls

// Tried: asGenericArrayType() with CodeBase
$element_types = $type->asGenericArrayType($code_base)->genericArrayElementTypes(true, $code_base);
// Error: asGenericArrayType() expects int, not CodeBase

// Tried: genericArrayElementTypes() on Type
$element_types = $type->genericArrayElementTypes(true, $code_base);
// Error: genericArrayElementTypes() only exists on UnionType, not Type

Solution: Use genericArrayElementType() (singular) which returns a Type, then convert.

Open Questions

1. Is the "Could Have Key" Logic Too Permissive?

The current fix checks if ANY type in a union could have a key. This means:

  • Union array{x:int}|mixed accessing ["y"] → No error (mixed could have "y")
  • Union array{x:int}|list<mixed> accessing ["y"] → No error (generic array could have "y")

Is this the right behavior? Or should we:

  • Require that the MAJORITY of types can have the key?
  • Emit warnings when SOME but not ALL types lack the key?
  • Only suppress errors when there's a non-array-shape generic type (mixed, array, list, etc.)?

2. Should Empty List Emit Warnings?

Test 0816_json_decode_objects.php expects a warning when accessing $content[0] where $content is ?bool|?float|?int|?list<mixed>|?string.

The list is nullable AND could be empty, so accessing [0] is suspicious. But our fix treats list<mixed> as "could have any integer key" which suppresses the nullable warning.

Should we:

  • Check if the list is NonEmptyListType vs regular ListType?
  • Always emit nullable warnings even when generic arrays are present?
  • Have different logic for nullable vs non-nullable unions?

3. Should We Track Assignment Context?

Currently we distinguish external sources by checking if the root variable has generic/mixed types. But what if:

  • A variable starts as array{} (empty)
  • PHPDoc annotates it as array{x:string,y:string}
  • We assign $arr["x"] = 5

Should the assignment create a restrictive array{x:int} or respect the PHPDoc annotation and keep the full shape?

4. Performance Impact?

The fix adds:

  • Root variable traversal for every array dimension assignment
  • Extra type checking for mixed/generic detection
  • Double iteration through union type sets (PHPDoc + real types)

Has this introduced any measurable performance penalty on large codebases?

Recommendations

Option A: Ship with Known Regressions

  • Document the 12 failing tests
  • Update expected output for tests where behavior change is acceptable
  • File follow-up issues for real bugs (like 0816_json_decode_objects.php)

Option B: Refine the Fix

  • Make "could have key" logic stricter (e.g., only when MixedType or non-shape GenericArrayType is present)
  • Special case nullable warnings to always emit even with generic types present
  • Add configuration option for strictness level

Option C: Hybrid Approach

  • Keep write-side fix (AssignmentVisitor) as-is
  • Refine read-side fix (UnionTypeVisitor) to be more conservative
  • Only suppress errors when there's an explicit generic array or MixedType (not ListType)

Files Modified

  1. src/Phan/Analysis/AssignmentVisitor.php

    • Lines 885-936: visitDim() method
    • Added logic to detect external mixed sources vs new structures
  2. src/Phan/AST/UnionTypeVisitor.php

    • Lines 2088-2140: resolveArrayShapeElementTypes() method
    • Lines 2103-2109: ListType validation in could-have-key check
    • Lines 2200-2216: ListType element extraction in resolveArrayShapeElementTypesForOffset()
  3. tests/files/src/4926_array_mixed_access.php (new)

    • Test file with all 3 cases from the issue
  4. tests/files/expected/4926_array_mixed_access.php.expected (new)

    • Empty file (expecting no warnings)

Test Commands

# Test individual cases
~/phan/phan --no-progress-bar -n /tmp/test_4926_case1.php
~/phan/phan --no-progress-bar -n /tmp/test_4926_case2.php
~/phan/phan --no-progress-bar -n /tmp/test_4926_case3.php

# Test all cases at once
~/phan/phan --no-progress-bar -n /tmp/test_4926_case1.php && \
~/phan/phan --no-progress-bar -n /tmp/test_4926_case2.php && \
~/phan/phan --no-progress-bar -n /tmp/test_4926_case3.php && \
echo "✓✓✓ All 3 cases PASS!"

# Test GLOBALS (important regression test)
~/phan/phan --no-progress-bar tests/rasmus_files/src/0024_GLOBALS.php

# Run full test suite
cd ~/phan && ./vendor/bin/phpunit

# Run specific failing test
./vendor/bin/phpunit --filter="testFiles.*0816_json_decode_objects"

Debugging Techniques Used

@phan-debug-var Annotation

$arr = ['x' => 1];
'@phan-debug-var $arr'; // Outputs: array{x:int}

Type Flow Tracing

Created test files with progressive steps to see where types changed:

  • /tmp/test_barr0_detailed.php - Showed when array shapes were created
  • /tmp/test_union_access.php - Demonstrated union type field access
  • /tmp/test_just_index0.php - Isolated ListType element extraction

Test-Driven Investigation

  1. Run test to see failure
  2. Add debug output to understand type flow
  3. Modify code with hypothesis
  4. Re-run test to validate
  5. Run GLOBALS test to check for regressions
  6. Iterate

Conclusion

The fix successfully resolves issue #4926 for all three reported cases by:

  1. Detecting external mixed sources and using generic arrays instead of restrictive shapes
  2. Checking if any type in a union could have an accessed key before emitting errors
  3. Properly handling ListType validation and element extraction

However, the fix introduces 12 test regressions that need investigation. The most concerning is a false negative in json_decode nullable array access. The root issue is balancing between:

Further refinement may be needed to find the right balance, possibly with configuration options for strictness levels or special handling for nullable types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant