Prework for read/write streamlining #6316
Conversation
Force-pushed 0fb90c0 to 7982f05 (rebased onto main).
Pull request overview
This PR refactors Npgsql’s type info/converter resolution model as prework for simplifying read/write call sites by making PgTypeInfo abstract and introducing PgConcreteTypeInfo as the unified, authoritative carrier used during reads/writes. It also renames the “resolver” concepts to “provider” concepts (PgConverterResolver → PgConcreteTypeInfoProvider, PgResolverTypeInfo → PgProviderTypeInfo) to better reflect responsibilities and reduce duplicated logic around delayed resolution.
Changes:
- Introduce PgConcreteTypeInfo and make PgTypeInfo abstract, shifting resolution APIs to return concrete type infos.
- Replace converter resolvers with concrete type info providers and update call sites/mapping composition accordingly.
- Update tests and internal factories/converters to use the new provider + concrete type info model.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/Npgsql.Tests/TypeMapperTests.cs | Update test resolver to return PgConcreteTypeInfo. |
| test/Npgsql.Tests/ReaderTests.cs | Update test resolver to return PgConcreteTypeInfo. |
| test/Npgsql.Tests/CopyTests.cs | Rename test to reflect provider terminology. |
| src/Npgsql/TypeMapping/GlobalTypeMapper.cs | Adjust data type name inference to use provider/concrete type info APIs. |
| src/Npgsql/NpgsqlParameter`.cs | Update generic parameter implementation to request concrete type info. |
| src/Npgsql/NpgsqlParameter.cs | Replace converter resolution with concrete type info selection during binding. |
| src/Npgsql/NpgsqlBinaryExporter.cs | Update binary export binding to use provider default concrete type info. |
| src/Npgsql/Internal/TypeInfoMapping.cs | Refactor mapping composition to operate on providers and PgConcreteTypeInfo. |
| src/Npgsql/Internal/ResolverFactories/UnmappedTypeInfoResolverFactory.cs | Update dynamic range/multirange mapping composition for concrete type infos. |
| src/Npgsql/Internal/ResolverFactories/NetworkTypeInfoResolverFactory.cs | Use PgConcreteTypeInfo for IPAddress mapping. |
| src/Npgsql/Internal/ResolverFactories/AdoTypeInfoResolverFactory.cs | Update baseline mappings to use providers/concrete type infos (incl. temporal/bitstring/arrays). |
| src/Npgsql/Internal/ResolverFactories/AdoTypeInfoResolverFactory.Range.cs | Update range mappings to use DateTime type info providers. |
| src/Npgsql/Internal/ResolverFactories/AdoTypeInfoResolverFactory.Multirange.cs | Update multirange mappings to use DateTime type info providers. |
| src/Npgsql/Internal/PgTypeInfo.cs | Core refactor: make PgTypeInfo abstract, add PgProviderTypeInfo + PgConcreteTypeInfo, replace resolution APIs. |
| src/Npgsql/Internal/PgSerializerOptions.cs | Update UnspecifiedDBNullTypeInfo to be a PgConcreteTypeInfo. |
| src/Npgsql/Internal/PgConverterResolver.cs | Removed in favor of PgConcreteTypeInfoProvider. |
| src/Npgsql/Internal/PgConcreteTypeInfoProvider.cs | New public experimental provider abstraction with validation and internal plumbing. |
| src/Npgsql/Internal/PgComposingTypeInfoProvider.cs | New composing provider base to cache composed PgConcreteTypeInfo instances. |
| src/Npgsql/Internal/PgComposingConverterResolver.cs | Removed in favor of composing provider base. |
| src/Npgsql/Internal/DynamicTypeInfoResolver.cs | Update reflection-based mapping registration to call provider APIs. |
| src/Npgsql/Internal/Converters/Temporal/DateTimeTypeInfoProvider.cs | New DateTime provider implementation replacing the old resolver. |
| src/Npgsql/Internal/Converters/Temporal/DateTimeConverterResolver.cs | Removed in favor of DateTime type info provider. |
| src/Npgsql/Internal/Converters/PolymorphicConverterResolver.cs | Removed in favor of polymorphic array type info provider(s). |
| src/Npgsql/Internal/Converters/PolymorphicArrayTypeInfoProvider.cs | New provider for polymorphic array converter composition (read-time). |
| src/Npgsql/Internal/Converters/ObjectConverter.cs | Switch object converter binding to new concrete type info APIs. |
| src/Npgsql/Internal/Converters/NullableConverter.cs | Replace nullable composing resolver with nullable composing provider. |
| src/Npgsql/Internal/Converters/CastingConverter.cs | Replace casting composing resolver with casting composing provider and update ToNonBoxing. |
| src/Npgsql/Internal/Converters/BitStringConverters.cs | Replace polymorphic bitstring resolver with provider producing cached concrete infos. |
| src/Npgsql/Internal/Converters/ArrayConverter.cs | Refactor array converter composition to accept PgConcreteTypeInfo and use providers for polymorphic cases. |
| src/Npgsql/Internal/Composites/Metadata/CompositeFieldInfo.cs | Update composite field binding to use concrete type infos and provider terminology. |
roji
left a comment
OK, I gave this a somewhat impressionistic review; it generally makes sense to me, but my grasp of the details in this area isn't very strong, and this is also the first step in a direction where I don't have the final picture. But it all does seem to make sense.
@NinoFloris if you have some specific points/thoughts/questions you want to discuss around this don't hesitate to let me know.
(as always, I can, however, generate lots of uninteresting nits ;))
Force-pushed d2f0833 to 19a2538.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated no new comments.
Force-pushed 19a2538 to f67caf6.
@roji please take a look at this portion, you might find this interesting to think about.
For the completed picture: this was mostly completed in #6317; going commit by commit should give a decent overview. Essentially the finished rework will remove a lot of the lifetime paper cuts, easy mistakes, and duplication, having most of it handled centrally by PgTypeInfo. Once users are able to use the high-level pieces I'm much more confident in helping them author their own (non-toy) converters/providers for composing over other type infos.
This helps us later on in the restructuring as well
And make object converter use it
Force-pushed f67caf6 to 4bb21d4.
Pull request overview
Copilot reviewed 33 out of 33 changed files in this pull request and generated 3 comments.
Force-pushed 75c79ac to 3a812d6.
Pull request overview
Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/Npgsql/Internal/Converters/CompositeConverter.cs:184
- In the no-cached-state path, CompositeConverter.Write sets ElementState.Size to writeRequirement (a buffer requirement) and discards fieldState (WriteState = null). This can produce an incorrect length for variable-sized fields and can break converters/providers that require write state (e.g. late-bound/object scenarios). This fallback path should compute the actual field length (e.g. via GetSizeOrDbNull) and propagate the resulting fieldState into ElementState.WriteState.
if (data?[i] is not { } state)
{
var converter = field.GetWriteInfo(boxedInstance, out var writeRequirement, out var fieldState);
elementState = new()
{
Size = field.IsDbNull(converter, boxedInstance, ref fieldState) ? -1 : writeRequirement,
WriteState = null,
Converter = converter,
BufferRequirement = writeRequirement,
};
Pull request overview
Copilot reviewed 36 out of 36 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/Npgsql/Internal/Converters/ArrayConverterCore.cs:99
- When renting/using elemData from ArrayPool, elemData[...].WriteState isn't cleared before being passed by ref to GetSizeOrDbNull. Because pooled arrays can contain stale references, this can (a) incorrectly set anyWriteState = true, (b) pass the wrong writeState into element converters, and (c) dispose unrelated objects later. Ensure each slot's WriteState is explicitly initialized (e.g. set to null / default the tuple) before calling into element sizing, and similarly clear slots that don't receive provider state.
if (elemData is null)
{
arrayPool = ArrayPool<(Size, object?)>.Shared;
elemData = arrayPool.Rent(metadata.TotalElements);
}
var lastCount = metadata.LastDimension;
do
{
ref var elemState = ref elemData[indices.IndicesSum].WriteState;
var elemSize = elemOps.GetSizeOrDbNull(context, values, indices, ref elemState);
anyWriteState = anyWriteState || elemState is not null;
elemData[indices.IndicesSum].Size = elemSize ?? -1;
size = size.Combine(elemSize ?? 0);
public ArrayPool<(Size Size, object? WriteState)>? ArrayPool { get; set; }
public ArraySegment<(Size Size, object? WriteState)> Data { get; set; }
public bool AnyWriteState { get; set; }

public void Dispose()
MultiWriteState.Dispose iterates from Data.Offset to array.Length, not to the end of the active segment (Data.Offset + Data.Count). With pooled arrays this can dispose/clear stale WriteState entries outside the segment, potentially double-disposing unrelated objects and causing hard-to-diagnose failures. Limit the loop bounds to the active segment (and keep clearing only the active segment) before returning the array to the pool.
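The fix amounts to bounding the loop by the active segment rather than the backing array. A minimal self-contained sketch of the corrected Dispose (member names mirror the snippet above; Size is stood in by int, and the struct name is hypothetical):

```csharp
using System;
using System.Buffers;

// Sketch: dispose/clear only the active segment of a pooled backing array.
struct MultiWriteStateSketch : IDisposable
{
    public ArrayPool<(int Size, object? WriteState)>? ArrayPool { get; set; }
    public ArraySegment<(int Size, object? WriteState)> Data { get; set; }

    public void Dispose()
    {
        if (Data.Array is not { } array)
            return;

        // Iterate Offset..Offset+Count, not Offset..array.Length, so stale
        // entries past the active segment are never disposed or cleared.
        for (var i = Data.Offset; i < Data.Offset + Data.Count; i++)
        {
            if (array[i].WriteState is IDisposable disposable)
                disposable.Dispose();
            array[i] = default; // clear only slots we actually used
        }

        ArrayPool?.Return(array);
    }
}
```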
src/Npgsql/Internal/PgConverter.cs (outdated)
protected virtual bool IsDbNullValue(T? value, object? writeState) => throw new NotSupportedException();

[Obsolete("Use the overload without ref.")]
[EditorBrowsable(EditorBrowsableState.Never)]
protected virtual bool IsDbNullValue(T? value, ref object? writeState) => IsDbNullValue(value, writeState);
There was a problem hiding this comment.
The new IsDbNullValue(T?, object?) overload is now what IsDbNull ultimately calls, but its default implementation throws. Any existing converter that previously overrode only the (now-obsolete) IsDbNullValue(T?, ref object?) will no longer be invoked and will start throwing NotSupportedException at runtime. To preserve backward compatibility, consider making the new overload delegate to the ref overload by default (or otherwise detect/route to overridden implementations) so old overrides keep working.
Suggested change:

protected virtual bool IsDbNullValue(T? value, object? writeState)
{
    var refOverload = GetType().GetMethod(
        nameof(IsDbNullValue),
        System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic,
        binder: null,
        types: [typeof(T), typeof(object).MakeByRefType()],
        modifiers: null);
    if (refOverload?.DeclaringType != typeof(PgConverter<T>))
    {
        var state = writeState;
        return IsDbNullValue(value, ref state);
    }
    throw new NotSupportedException();
}

[Obsolete("Use the overload without ref.")]
[EditorBrowsable(EditorBrowsableState.Never)]
protected virtual bool IsDbNullValue(T? value, ref object? writeState)
{
    var nonRefOverload = GetType().GetMethod(
        nameof(IsDbNullValue),
        System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic,
        binder: null,
        types: [typeof(T), typeof(object)],
        modifiers: null);
    if (nonRefOverload?.DeclaringType != typeof(PgConverter<T>))
        return IsDbNullValue(value, writeState);
    throw new NotSupportedException();
}
public PgConcreteTypeInfo GetConcreteTypeInfo<T>(T? value, out object? writeState)
{
    if (this is not PgProviderTypeInfo providerTypeInfo)
    {
        writeState = null;
        return (PgConcreteTypeInfo)this;
GetConcreteTypeInfo assumes any non-provider PgTypeInfo is a PgConcreteTypeInfo and performs an unconditional cast. Since PgTypeInfo is public/abstract, a third-party could derive from it (or this project could add more derived types later), and this cast would throw InvalidCastException. Consider either sealing the hierarchy, or making concrete-type retrieval virtual/abstract so non-provider derived types can participate without being force-cast to PgConcreteTypeInfo.
Suggested change:

protected virtual PgConcreteTypeInfo? GetDefaultConcreteTypeInfo() => this as PgConcreteTypeInfo;

public PgConcreteTypeInfo GetConcreteTypeInfo<T>(T? value, out object? writeState)
{
    if (this is not PgProviderTypeInfo providerTypeInfo)
    {
        writeState = null;
        return GetDefaultConcreteTypeInfo()
            ?? throw new NotSupportedException($"{GetType().FullName} must override {nameof(GetDefaultConcreteTypeInfo)} to provide a concrete type info.");
if (elemData is null)
{
    elemDataArrayPool = ArrayPool<(Size, object?)>.Shared;
    elemData = elemDataArrayPool.Rent(metadata.TotalElements);
}

// ...

elemData[index].WriteState = state;
elemData is rented from ArrayPool and only elemData[index].WriteState is assigned when state != null; when state is null the slot is left untouched. Because pooled arrays may contain stale references, later code may observe/Dispose an unrelated WriteState. Clear each slot you touch (e.g., set elemData[index] = default or WriteState = null) before/when populating, regardless of whether provider produced a state.
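The pooled-array hazard is easy to reproduce outside Npgsql: ArrayPool<T>.Shared.Rent can hand back an array still holding references from a previous use. A small standalone illustration (not Npgsql code; whether Rent returns the same array is pool-dependent, hence "may"):

```csharp
using System;
using System.Buffers;

var pool = ArrayPool<(int Size, object? WriteState)>.Shared;

// First user writes state into the array and returns it without clearing.
var first = pool.Rent(16);
first[0].WriteState = new object();
pool.Return(first, clearArray: false);

// A later Rent may observe the stale reference in slot 0, so each slot
// must be defaulted before (or when) it is populated.
var second = pool.Rent(16);
for (var i = 0; i < 16; i++)
    second[i] = default; // clear stale Size/WriteState pairs

pool.Return(second, clearArray: true);
```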
Force-pushed a226b73 to 33edcfd.
Force-pushed 33edcfd to 9e3461f.
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/Npgsql/Internal/Converters/CompositeConverter.cs:181
- In the fallback path where there is no cached per-field state (data?[i] is not { } state), fieldState returned from GetWriteInfo (and potentially mutated by IsDbNull) is discarded by setting WriteState = null. This breaks converters/providers that rely on write state being passed into BeginNestedWrite/Write, and can also leak disposable state when the field resolves to NULL. Propagate the computed fieldState into elementState.WriteState (and dispose it if the field is written as NULL).
var converter = field.GetWriteInfo(boxedInstance, out var writeRequirement, out var fieldState);
elementState = new()
{
Size = field.IsDbNull(converter, boxedInstance, ref fieldState) ? -1 : writeRequirement,
WriteState = null,
foreach (var factory in _typeMappingResolvers)
    builder.AppendResolverFactory(factory);
var chain = builder.Build();
_typeMappingCachedVersion = version;
return _typeMappingOptions = new(PostgresMinimalDatabaseInfo.DefaultTypeCatalog, chain)
{
_typeMappingCachedVersion is updated before _typeMappingOptions is assigned. If building the chain or constructing PgSerializerOptions throws, the cached version may match the current version while _typeMappingOptions still points to an older instance, causing stale options to be returned on subsequent calls. Consider building into a local newOptions first, assign _typeMappingOptions = newOptions, and only then update _typeMappingCachedVersion (or update both in a consistent order).
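The ordering issue is the classic "publish last" pattern: compute everything into locals, assign the result, and only then bump the version, so a thrown exception can't leave the cache claiming to be current. A hedged sketch (field and member names follow the snippet above; the surrounding class and the builder type are assumptions, not the actual Npgsql code):

```csharp
// Sketch of the suggested ordering: build into locals, publish, then version.
PgSerializerOptions RebuildOptions(int version)
{
    var builder = new PgTypeInfoResolverChainBuilder(); // assumed builder type
    foreach (var factory in _typeMappingResolvers)
        builder.AppendResolverFactory(factory);

    // If Build() or the constructor throws, neither cached field has been
    // touched yet, so the old (version, options) pair stays consistent.
    var chain = builder.Build();
    var newOptions = new PgSerializerOptions(PostgresMinimalDatabaseInfo.DefaultTypeCatalog, chain);

    _typeMappingOptions = newOptions;     // publish the value first
    _typeMappingCachedVersion = version;  // only then mark the cache current
    return newOptions;
}
```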
Force-pushed 102d263 to a5a103f.
Pull request overview
Copilot reviewed 35 out of 35 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/Npgsql/Internal/Converters/ArrayConverterCore.cs:270
- When FixedSizeElements is true, the write loop ignores any per-element WriteState captured earlier (e.g. from the ArrayTypeInfoProvider provider phase) by setting elem to null and passing only a locally computed fixedSizeWriteState. This can drop required element state for nested writes and lead to incorrect serialization or state leaks. Consider always sourcing the per-element WriteState from state.Data even in fixed-size mode (you can still skip storing per-element sizes).
var offset = state.Data.Offset;
var fixedSizeElements = state.FixedSizeElements;
do
{
if (writer.ShouldFlush(sizeof(int)))
await writer.Flush(async, cancellationToken).ConfigureAwait(false);
var elem = !fixedSizeElements ? elemData?[offset + indices.IndicesSum] : null;
object? fixedSizeWriteState = null;
var length = fixedSizeElements
? ElemTypeDbNullable && IsDbNull(values, indices, ref fixedSizeWriteState) ? -1 : binaryRequirements.Write.Value
: elem.GetValueOrDefault().Size.Value;
writer.WriteInt32(length);
if (length is not -1)
{
using var _ = await writer.BeginNestedWrite(async, binaryRequirements.Write,
length, fixedSizeWriteState ?? elem?.WriteState, cancellationToken).ConfigureAwait(false);
await elemOps.Write(async, writer, values, indices, cancellationToken).ConfigureAwait(false);
var resolution = ResolveConverter(typeInfo!);
Converter = resolution.Converter;
PgTypeId = resolution.PgTypeId;
var concreteTypeInfo = GetConcreteTypeInfo(typeInfo!);
ResolveTypeInfo calls GetConcreteTypeInfo(typeInfo!), which for provider-backed type infos populates _writeState (via GetObjectConcreteTypeInfo(..., out _writeState)) even when validateValues is false (e.g. CommandBehavior.SchemaOnly). In SchemaOnly, Bind() is skipped, so _writeState is never disposed/cleared (and ResetBindingInfo() asserts _writeState == default when WriteSize is null). This can leak disposable state and leave the parameter in an inconsistent state. Consider avoiding populating _writeState during type resolution (or disposing it immediately in SchemaOnly), or adjusting ResetBindingInfo() to handle a non-null _writeState even when WriteSize is null.
Suggested change:

var concreteTypeInfo = GetConcreteTypeInfo(typeInfo!);
if (!validateValues)
{
    if (_writeState is IDisposable disposable)
        disposable.Dispose();
    _writeState = default;
}
Force-pushed e1347bc to 27196e1.
As it stands we support two kinds of PgTypeInfos: one directly wrapping a converter and one facilitating delayed converter resolution (PgConverterResolver). Unfortunately both return the same information from different places today, and that information must generally be cached for different lengths of time as well. This causes a bunch of accidental complexity (additional types and state) and duplicated logic ('things that must not be forgotten, or uncommonly used things go wrong') for the case where converter resolver type infos are passed in.
For example, the public PgConverterResolution, the internal PgConverterResolverInfo, and even to some extent the internal ColumnInfo (and the nested reader's ColumnInfo) exist because we need to handle that split (where an info could either be concrete or a delayed resolution) all the way down at the actual read and write call sites. Just to paint a picture: Npgsql alone is expected to implement this split correctly in all the following places: NpgsqlParameter, NpgsqlBinaryImporter/NpgsqlBinaryExporter, NpgsqlDataReader, NpgsqlNestedDataReader, ReplicationValue, CompositeConverter, and ObjectConverter. But also any plugin that wants to compose type infos to write some larger data type has to deal with this.

Unsurprisingly it would be much better if all of this could be centralized, and I recently got inspired to finally tackle this after reviewing #6005. The feature that the linked PR intends to enable deals with all the same complexity around value-dependent writing to provide general DataTable to pg (table) composite support, which they would like to release as a plugin. Today it's practically impossible to do that correctly, due to the knowledge required to replicate that split correctly for the composed, field-level type info handling needed for composites.
To improve the status quo this PR starts by making PgTypeInfo abstract and adding a new derived type to the hierarchy: PgConcreteTypeInfo. This will be the authoritative info source that gets used during reads and writes; no more split at all those sites. We also rename PgConverterResolver to PgConcreteTypeInfoProvider and PgResolverTypeInfo to PgProviderTypeInfo. These renames align the types with what they operate on and get rid of the overloaded resolver/resolution moniker, which will now just relate to IPgTypeInfoResolver. This is a change I've been meaning to make ever since we released 8.0; at the time it was just too late in the cycle to rework and think all of this through properly.

Shifting the PgTypeInfo hierarchy in this direction allows us to merge the concrete/resolver split back into one source after field and value binding. For most cases nothing really changes on the mapping authoring side except the type identity: mappings that don't rely on a resolver will now return PgConcreteTypeInfo. As most mappings rely on mapping.CreateInfo to create their PgTypeInfo we are able to change this in a binary compatible way (and if the PgTypeInfo constructor does happen to be used today, rare even in Npgsql, a small change to new up PgConcreteTypeInfo instead is sufficient).

The PgResolverTypeInfo and PgConverterResolver changes are more breaking, though also easily adapted to. Regardless, no users of this API surface have been found after a thorough GitHub search (searching for any direct references to these types); it's also quite unlikely private implementations exist. Additionally all of this API surface is marked experimental, and while I don't advocate for doing impactful breaking changes just because we can, it is intended to give us that kind of wiggle room.
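As I read it, the reworked hierarchy has roughly the following shape. The class and member names below are illustrative stand-ins (suffixed "Sketch"), not the actual Npgsql API surface:

```csharp
using System;

// Illustrative shape of the hierarchy after this PR.
public abstract class PgTypeInfoSketch
{
    // Read/write call sites only ever see a concrete info; provider-backed
    // infos resolve one based on the bound value or field.
    public abstract PgConcreteTypeInfoSketch GetConcreteTypeInfo(object? value);
}

// Directly wraps a converter: the single authoritative carrier at call sites.
public class PgConcreteTypeInfoSketch : PgTypeInfoSketch
{
    public override PgConcreteTypeInfoSketch GetConcreteTypeInfo(object? value) => this;
}

// Delays resolution to a provider (previously PgConverterResolver), which
// is expected to hand back cached PgConcreteTypeInfo instances.
public class PgProviderTypeInfoSketch : PgTypeInfoSketch
{
    readonly Func<object?, PgConcreteTypeInfoSketch> _provider;

    public PgProviderTypeInfoSketch(Func<object?, PgConcreteTypeInfoSketch> provider)
        => _provider = provider;

    public override PgConcreteTypeInfoSketch GetConcreteTypeInfo(object? value)
        => _provider(value);
}
```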
Finally, where current mappings do use the PgProviderTypeInfo functionality (like bitstring fields or datetime values), the provider implementation (previously a converter resolver) becomes responsible for returning *cached* PgConcreteTypeInfo instances. The unwillingness to demand this caching was the original sin that led to the split, and why the PgConverterResolution return type was a struct.
At the time it was unclear to me whether we could always cache the returned results (which otherwise would lead to an unavoidable allocation on every bind), as the providers may act on ever-changing values. In practice all implementations we authored already end up caching their produced converter instances. And at this point I'm also convinced that any useful implementation classifies fields and values into a few well-known categories it has converters for, and for which it is never a problem to cache the required variations.
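To make the caching expectation concrete: a provider classifies values into a few known categories and hands back one pre-built concrete info per category, so binding never allocates per value. A hypothetical sketch loosely modeled on the DateTime (UTC vs. local kind) case mentioned above; all types and names here are stand-ins, not the real Npgsql API:

```csharp
using System;

// Stand-in for a cached concrete type info.
sealed record FakeConcreteInfo(string DataTypeName);

// Hypothetical provider: one cached info per value category, created once
// and returned on every bind, so resolution never allocates per value.
sealed class FakeDateTimeProvider
{
    readonly FakeConcreteInfo _utcInfo = new("timestamptz");
    readonly FakeConcreteInfo _localInfo = new("timestamp");

    public FakeConcreteInfo GetConcreteTypeInfo(DateTime value)
        => value.Kind == DateTimeKind.Utc ? _utcInfo : _localInfo;
}
```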
The total work is split across PRs to make reviewing more practical. This PR mostly contains the nominal and type level restructuring, while the follow up PR will use these changes to actually deliver the simplification of the call sites. A final PR should provide the public surface that we can recommend to plugin authors interested in working with POCO or dynamic (e.g. datatable) conversions that interact correctly with nested field and value binding.