|
| 1 | +# Hotspot Analysis from JMH Profiling |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +Allocation profiling of SimpleQueryBenchmark shows six allocation sites that together account for about half of the 77.6 GB allocated during the run. Targeted optimizations at these sites could yield measurable throughput improvements. |
| 6 | + |
| 7 | +## Methodology |
| 8 | + |
| 9 | +SimpleQueryBenchmark was run with async-profiler allocation tracking: |
| 10 | +- Baseline throughput: 903.495 ± 213.207 ops/s |
| 11 | +- Total allocations analyzed: 77.6 GB across the test run |
| 12 | +- Focus: allocation sites that account for more than 1% of the total |
| 13 | + |
| 14 | +## Top Allocation Hotspots |
| 15 | + |
| 16 | +### 1. ExecutionStrategyParameters (10.21% - 7.9GB) |
| 17 | + |
| 18 | +**Location:** `graphql.execution.ExecutionStrategyParameters` |
| 19 | + |
| 20 | +**Issue:** This object is created for every field resolution in the query execution tree. With nested queries, this creates thousands of instances per query. |
| 21 | + |
| 22 | +**Current Implementation:** |
| 23 | +```java |
| 24 | +private ExecutionStrategyParameters(ExecutionStepInfo executionStepInfo, |
| 25 | + Object source, |
| 26 | + Object localContext, |
| 27 | + MergedSelectionSet fields, |
| 28 | + NonNullableFieldValidator nonNullableFieldValidator, |
| 29 | + ResultPath path, |
| 30 | + MergedField currentField, |
| 31 | + ExecutionStrategyParameters parent, |
| 32 | + AlternativeCallContext alternativeCallContext) { |
| 33 | + this.executionStepInfo = assertNotNull(executionStepInfo, "executionStepInfo is null"); |
| 34 | + // ... 8 more field assignments |
| 35 | +} |
| 36 | +``` |
| 37 | + |
| 38 | +**Optimization Opportunities:** |
| 39 | +1. **Object Pooling**: Consider pooling ExecutionStrategyParameters objects for reuse |
| 40 | +2. **Reduce Field Count**: Review if all 9 fields are necessary or if some can be computed on-demand |
| 41 | +3. **Flyweight Pattern**: Share immutable state across instances where possible |
| 42 | + |
| 43 | +**Impact Estimate:** 2-3% throughput improvement |
| 44 | + |
| 45 | +### 2. LinkedHashMap + LinkedHashMap$Entry (11.68% combined - ~9.1GB) |
| 46 | + |
| 47 | +**Location:** Various (field arguments, variable maps, selection sets) |
| 48 | + |
| 49 | +**Issue:** LinkedHashMap is used throughout execution but often with small, known-size collections. |
| 50 | + |
| 51 | +**Optimization Opportunities:** |
| 52 | +1. **Pre-size collections**: When size is known, initialize with capacity |
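| 53 | +2. **Use array-backed storage for small maps**: For fewer than 5 entries, a linear scan over an ArrayList or array of key/value pairs may be faster than hashing and avoids per-entry node objects |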
| 54 | +3. **Immutable collections**: Use ImmutableMap for read-only data |
| 55 | + |
| 56 | +**Example Fix:** |
| 57 | +```java |
| 58 | +// Before: |
| 59 | +Map<String, Object> args = new LinkedHashMap<>(); |
| 60 | + |
| 61 | +// After (if size known): |
| 62 | +Map<String, Object> args = new LinkedHashMap<>((int) (expectedSize / 0.75) + 1); |
| 63 | +``` |
| 64 | + |
| 65 | +**Impact Estimate:** 1-2% throughput improvement |
| 66 | + |
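The array-backed idea for small maps could look roughly like the sketch below. `SmallArrayMap` is a hypothetical helper, not a class from graphql-java. It keeps keys and values in one flat array, so it allocates no per-entry node objects, and it preserves insertion order like LinkedHashMap. It assumes non-null keys and only pays off while the entry count stays small, since lookups are linear.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of graphql-java): insertion-ordered map over a flat array. */
final class SmallArrayMap {
    private Object[] table; // flat layout: key0, value0, key1, value1, ...
    private int entries;    // number of key/value pairs currently stored

    SmallArrayMap(int expectedEntries) {
        table = new Object[Math.max(1, expectedEntries) * 2];
    }

    /** Stores the value and returns the previous value for the key, or null if there was none. */
    Object put(String key, Object value) {
        for (int i = 0; i < entries * 2; i += 2) {
            if (table[i].equals(key)) {
                Object old = table[i + 1];
                table[i + 1] = value;
                return old;
            }
        }
        if (entries * 2 == table.length) {
            table = Arrays.copyOf(table, table.length * 2); // grow only when the size guess was too low
        }
        table[entries * 2] = key;
        table[entries * 2 + 1] = value;
        entries++;
        return null;
    }

    /** Linear scan; cheap for a handful of entries, slow for large maps. */
    Object get(String key) {
        for (int i = 0; i < entries * 2; i += 2) {
            if (table[i].equals(key)) {
                return table[i + 1];
            }
        }
        return null;
    }

    int size() {
        return entries;
    }
}
```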
| 67 | +### 3. ExecutionStepInfo (5.49% - 4.2GB) |
| 68 | + |
| 69 | +**Location:** `graphql.execution.ExecutionStepInfo` |
| 70 | + |
| 71 | +**Issue:** An instance is created for every field in the execution tree. The class has 8 fields, including a Supplier for arguments. |
| 72 | + |
| 73 | +**Current Allocation Pattern:** |
| 74 | +- Created via Builder pattern |
| 75 | +- An alternative constructor exists but is not heavily used |
| 76 | +- Contains `Supplier<ImmutableMapWithNullValues<String, Object>> arguments` |
| 77 | + |
| 78 | +**Optimization Opportunities:** |
| 79 | +1. **Prefer direct constructor**: Lines 84-98 show an optimized constructor (~1% faster) |
| 80 | +2. **Lazy argument resolution**: The arguments supplier allocates an IntraThreadMemoizedSupplier for every instance; avoid the wrapper where the arguments are already resolved |
| 81 | +3. **Cache common instances**: Root-level ExecutionStepInfo could be cached |
| 82 | + |
| 83 | +**Impact Estimate:** 1-2% throughput improvement |
| 84 | + |
| 85 | +### 4. ResultPath (3.38% - 2.6GB) |
| 86 | + |
| 87 | +**Location:** `graphql.execution.ResultPath` |
| 88 | + |
| 89 | +**Issue:** A new path object is created for each field traversal. Each instance is immutable and holds a reference to its parent. |
| 90 | + |
| 91 | +**Current Implementation:** |
| 92 | +```java |
| 93 | +private ResultPath(ResultPath parent, String segment) { |
| 94 | + this.parent = assertNotNull(parent, "Must provide a parent path"); |
| 95 | + this.segment = assertNotNull(segment, "Must provide a sub path"); |
| 96 | + this.toStringValue = initString(); // ← String allocation |
| 97 | + this.level = parent.level + 1; |
| 98 | +} |
| 99 | +``` |
| 100 | + |
| 101 | +**Optimization Opportunities:** |
| 102 | +1. **Lazy toString()**: `toStringValue` is computed eagerly but may not be used |
| 103 | +2. **Path interning**: Common paths could be cached/interned |
| 104 | +3. **StringBuilder pooling**: String building could use pooled StringBuilder |
| 105 | + |
| 106 | +**Impact Estimate:** 0.5-1% throughput improvement |
| 107 | + |
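A minimal sketch of the lazy `toString()` idea, assuming the string form is needed far less often than paths are created. `LazyPath` is a simplified stand-in written for illustration, not the real `graphql.execution.ResultPath`, and the `/segment` format is an assumption. The cached field is written without synchronization, which is safe here only because `String` is immutable and a concurrent recomputation produces an equal value.

```java
/** Simplified stand-in for a parent-linked path; not the real graphql.execution.ResultPath. */
final class LazyPath {
    private final LazyPath parent;  // null for the root path
    private final String segment;
    private final int level;
    private String toStringValue;   // built on the first toString() call, then reused

    private LazyPath(LazyPath parent, String segment, int level) {
        this.parent = parent;
        this.segment = segment;
        this.level = level;
    }

    static LazyPath root() {
        return new LazyPath(null, "", 0);
    }

    /** Creating a child no longer builds a String; it only links to the parent. */
    LazyPath segment(String name) {
        return new LazyPath(this, name, level + 1);
    }

    int getLevel() {
        return level;
    }

    /** Exposed only so the sketch can show that construction stays string-free. */
    boolean isStringComputed() {
        return toStringValue != null;
    }

    @Override
    public String toString() {
        String s = toStringValue; // read the field once into a local
        if (s == null) {
            s = (parent == null) ? "" : parent.toString() + "/" + segment;
            toStringValue = s;    // benign race: any thread computes an equal value
        }
        return s;
    }
}
```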
| 108 | +### 5. IntraThreadMemoizedSupplier (3.34% - 2.5GB) |
| 109 | + |
| 110 | +**Location:** `graphql.util.IntraThreadMemoizedSupplier` |
| 111 | + |
| 112 | +**Issue:** Created for every lazy-evaluated value, particularly in ExecutionStepInfo for arguments. |
| 113 | + |
| 114 | +**Current Implementation:** |
| 115 | +```java |
| 116 | +private T value = (T) SENTINEL; |
| 117 | +private final Supplier<T> delegate; |
| 118 | +``` |
| 119 | + |
| 120 | +**Optimization Opportunities:** |
| 121 | +1. **Avoid for already-resolved values**: If value is known, skip memoization wrapper |
| 122 | +2. **Direct value storage**: For hot paths, store value directly instead of wrapping |
| 123 | +3. **Reuse wrapper instances**: Pool for common access patterns |
| 124 | + |
| 125 | +**Impact Estimate:** 0.5-1% throughput improvement |
| 126 | + |
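The first opportunity above can be sketched as a factory that only wraps when there is something left to compute. `ArgumentSuppliers`, `lazy`, and `resolved` are hypothetical names, and `Memoized` is a simplified single-threaded stand-in for a wrapper like IntraThreadMemoizedSupplier rather than its actual code. Note that the capturing lambda returned by `resolved` is still a small object; a shared constant for very common values would avoid even that allocation.

```java
import java.util.function.Supplier;

/** Hypothetical factory: choose the cheapest Supplier for the situation. */
final class ArgumentSuppliers {

    /** Simplified memoizing wrapper; the sentinel distinguishes "not computed" from a null result. */
    static final class Memoized<T> implements Supplier<T> {
        private static final Object SENTINEL = new Object();
        private Object value = SENTINEL;
        private final Supplier<T> delegate;

        Memoized(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        @SuppressWarnings("unchecked")
        public T get() {
            if (value == SENTINEL) {
                value = delegate.get(); // computed at most once per wrapper
            }
            return (T) value;
        }
    }

    private ArgumentSuppliers() {
    }

    /** Value still has to be computed: pay for the memoizing wrapper. */
    static <T> Supplier<T> lazy(Supplier<T> delegate) {
        return new Memoized<>(delegate);
    }

    /** Value is already known: no sentinel field, no delegate, nothing to memoize. */
    static <T> Supplier<T> resolved(T value) {
        return () -> value;
    }
}
```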
| 127 | +### 6. String and byte[] (15.9% combined - 12.2GB) |
| 128 | + |
| 129 | +**Location:** Throughout codebase |
| 130 | + |
| 131 | +**Issue:** These allocations come from string operations, particularly path construction and error messages. |
| 132 | + |
| 133 | +**Optimization Opportunities:** |
| 134 | +1. **Reduce toString() calls**: Many classes compute string representation eagerly |
| 135 | +2. **String interning**: For common field names and type names |
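| 136 | +3. **Avoid repeated string concatenation**: When a string is built in a loop or across several steps, use a single StringBuilder instead of creating an intermediate String at each step |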
| 137 | +4. **Lazy error message construction**: Only build error strings when actually needed |
| 138 | + |
| 139 | +**Impact Estimate:** 2-3% throughput improvement |
| 140 | + |
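Lazy error message construction could be sketched as a check that takes a `Supplier<String>` instead of a pre-built message. The `Checks` class and the exception type are assumptions made for illustration. Whether this saves allocations in practice depends on the call site: a constant message costs nothing either way, and a capturing lambda is itself a small object unless the JIT eliminates it, so the gain is mainly for messages that concatenate or format values.

```java
import java.util.function.Supplier;

/** Hypothetical assertion helper that defers message construction to the failure path. */
final class Checks {
    private Checks() {
    }

    /**
     * Returns the value if it is non-null. The message supplier is invoked only when
     * the check fails, so the success path never builds the message string.
     */
    static <T> T assertNotNull(T value, Supplier<String> message) {
        if (value != null) {
            return value;
        }
        throw new IllegalStateException(message.get());
    }
}
```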
| 141 | +## Recommended Implementation Priority |
| 142 | + |
| 143 | +### High Impact, Low Risk (Implement First) |
| 144 | +1. **Pre-size LinkedHashMap collections** - Easy win, low risk |
| 145 | +2. **Lazy ResultPath.toStringValue** - Simple change, measurable impact |
| 146 | +3. **Avoid IntraThreadMemoizedSupplier for known values** - Clear optimization |
| 147 | + |
| 148 | +### Medium Impact, Medium Risk |
| 149 | +4. **Optimize ExecutionStepInfo construction** - Use direct constructor more |
| 150 | +5. **Cache common ExecutionStepInfo instances** - Requires careful lifecycle management |
| 151 | +6. **String interning for field/type names** - Needs memory analysis |
| 152 | + |
| 153 | +### High Impact, High Risk (Requires Deep Analysis) |
| 154 | +7. **Object pooling for ExecutionStrategyParameters** - Complex lifecycle |
| 155 | +8. **Flyweight pattern for shared state** - Significant architectural change |
| 156 | + |
| 157 | +## Validation Methodology |
| 158 | + |
| 159 | +For each optimization: |
| 160 | +1. Create isolated microbenchmark |
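| 161 | +2. Run with and without the optimization, using enough forks and iterations that the measurement error is smaller than the expected gain (the baseline error of ±213.207 ops/s is about 24% of the mean, far larger than any single estimate above) |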
| 162 | +3. Verify with allocation profiler |
| 163 | +4. Run full test suite |
| 164 | +5. Compare before/after on all three benchmarks |
| 165 | + |
| 166 | +## Next Steps |
| 167 | + |
| 168 | +1. Implement the three High Impact, Low Risk optimizations |
| 169 | +2. Re-run profiling to measure impact |
| 170 | +3. Document actual vs estimated improvements |
| 171 | +4. Iterate on remaining opportunities |
| 172 | + |