Add DNS SRV service discovery support (RFC 2782) #6554
Introduce a new `SrvHost` connection string property that enables a single
DNS name to represent an entire high-availability PostgreSQL cluster. When
set, Npgsql resolves `_postgresql._tcp.<SrvHost>` SRV records at
`NpgsqlDataSource` build time and uses the returned host/port pairs as the
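multi-host list, sorted by priority (ascending) then weight (descending). RFC 2782
specifies weighted random selection among records of equal priority, so the descending-weight order is a deterministic simplification of that rule.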
### Changes
**`NpgsqlConnectionStringBuilder`**
- New `SrvHost` string property, exposed as the `SrvHost` keyword in
connection strings (e.g. `SrvHost=cluster.example.com`).
- `PostProcessAndValidate` updated to allow a null/empty `Host` when `SrvHost`
is set, and to enforce mutual exclusivity: supplying both throws
`ArgumentException`.
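The validation rule can be sketched as a standalone check. `HostValidationSketch.Validate` is a hypothetical helper and not the PR's actual `PostProcessAndValidate` code. It also assumes that leaving both properties empty is an error.

```csharp
#nullable enable
using System;

// Hypothetical helper: not Npgsql's PostProcessAndValidate, just the rule it enforces.
public static class HostValidationSketch
{
    // Exactly one of Host / SrvHost must be set.
    public static void Validate(string? host, string? srvHost)
    {
        var hasHost = !string.IsNullOrEmpty(host);
        var hasSrv = !string.IsNullOrEmpty(srvHost);

        if (hasHost && hasSrv)
            throw new ArgumentException("Host and SrvHost are mutually exclusive; specify only one.");

        // Assumption: with SrvHost empty, Host is still required.
        if (!hasHost && !hasSrv)
            throw new ArgumentException("Either Host or SrvHost must be specified.");
    }
}
```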
**`SrvLookup.cs`** (new)
- Static helper class that queries `_postgresql._tcp.<srvHost>` via the
`DnsClient` NuGet package, sorts SRV records by priority / weight, strips
trailing FQDN dots, and returns a comma-separated `host:port,host:port,...`
string ready for use as `NpgsqlConnectionStringBuilder.Host`.
- `SortAndBuild(IEnumerable<SrvRecord>)` is `internal` so unit tests can
exercise sorting logic without a live DNS server.
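Here is a minimal sketch of the sorting step, under a few assumptions:

- `SrvEntry` is a hypothetical record type. The PR's `SrvRecord` may differ.
- `InvalidOperationException` stands in for the PR's `NpgsqlException`, so the snippet has no Npgsql dependency.

LINQ's `OrderBy`/`ThenByDescending` are stable. Records with equal priority and weight therefore keep their DNS response order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape for one SRV answer.
public sealed record SrvEntry(ushort Priority, ushort Weight, ushort Port, string Target);

public static class SrvSortSketch
{
    // Priority ascending, then weight descending; strip the trailing FQDN dot
    // and join into the multi-host "host:port,host:port" form used by Host.
    public static string SortAndBuild(IEnumerable<SrvEntry> records)
    {
        var hosts = records
            .OrderBy(r => r.Priority)
            .ThenByDescending(r => r.Weight)
            .Select(r => $"{r.Target.TrimEnd('.')}:{r.Port}")
            .ToList();

        // Stand-in for the PR's NpgsqlException on an empty result set.
        if (hosts.Count == 0)
            throw new InvalidOperationException("No SRV records found.");

        return string.Join(",", hosts);
    }
}
```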
**`NpgsqlSlimDataSourceBuilder` / `NpgsqlDataSourceBuilder`**
- New `SrvLookupClient` property (`DnsClient.ILookupClient?`). When `null`
(the default), the OS resolver is used. Inject a custom client in tests to
return deterministic SRV records without hitting DNS.
- `Build()` and `BuildMultiHost()` call `ResolveSrvIfNeeded()` before
`PostProcessAndValidate()`, expanding SRV results into the `Host` property
so the existing multi-host path handles all subsequent connection logic.
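The expansion step can be sketched as follows. `HostSettings` is a hypothetical stand-in for `NpgsqlConnectionStringBuilder`, and the `resolve` delegate stands in for the SRV lookup. Clearing `SrvHost` after expansion is an assumption. It lets the later mutual-exclusivity check see only `Host`.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the two relevant connection string properties.
public sealed class HostSettings
{
    public string? Host { get; set; }
    public string? SrvHost { get; set; }
}

public static class SrvExpansionSketch
{
    // Runs before validation: turns SrvHost into a "host:port,..." Host value
    // so the existing multi-host path handles everything afterwards.
    public static void ResolveSrvIfNeeded(HostSettings settings, Func<string, string> resolve)
    {
        if (string.IsNullOrEmpty(settings.SrvHost))
            return; // plain Host configuration: nothing to do

        if (!string.IsNullOrEmpty(settings.Host))
            throw new ArgumentException("Host and SrvHost are mutually exclusive.");

        // The resolver is assumed to query _postgresql._tcp.<SrvHost> itself.
        settings.Host = resolve(settings.SrvHost);
        settings.SrvHost = null; // assumption: cleared so validation sees only Host
    }
}
```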
**Dependency**
- `DnsClient 1.8.0` added to `Directory.Packages.props` and `Npgsql.csproj`.
### New test project: `Npgsql.SrvTests`
Isolated from the main `Npgsql.Tests` assembly (which requires a live
PostgreSQL server) so SRV unit tests can run on any machine without a
database.
Unit tests cover:
- Connection-string roundtrip and keyword parsing.
- `SrvHost` / `Host` mutual exclusivity.
- RFC 2782 sort order: priority ascending, weight descending.
- Trailing-dot stripping from FQDNs returned by DnsClient.
- Priority/weight ordering mirroring the real records at `mmatvei.ru`.
- Empty result set throws `NpgsqlException`.
`ResolveSrvLive` performs an end-to-end DNS lookup against
`_postgresql._tcp.mmatvei.ru` (four real SRV records, priorities 96–100)
and verifies ordering. The test skips automatically if the records are
unreachable. Set `NPGSQL_TEST_SRV_DNS=<ip>` to force a specific nameserver
(useful when the system resolver has a stale negative cache).
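A sketch of how a test might honor the `NPGSQL_TEST_SRV_DNS` override. It assumes the variable holds a bare IP address and that the nameserver listens on the standard DNS port 53. `TestNameserverSketch` is hypothetical.

```csharp
#nullable enable
using System;
using System.Net;

public static class TestNameserverSketch
{
    // Returns the nameserver to query when the variable holds a valid IP address,
    // or null to signal "use the system resolver".
    public static IPEndPoint? FromEnvironment(string variable = "NPGSQL_TEST_SRV_DNS")
    {
        var value = Environment.GetEnvironmentVariable(variable);
        return value is not null && IPAddress.TryParse(value, out var ip)
            ? new IPEndPoint(ip, 53) // assumption: standard DNS port
            : null;
    }
}
```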
### Usage
```csharp
using Npgsql;

// Connection string keyword
var dsFromConnectionString = NpgsqlDataSource.Create(
    "SrvHost=cluster.example.com;Database=app;Username=app_user");

// Builder API
var builder = new NpgsqlDataSourceBuilder();
builder.ConnectionStringBuilder.SrvHost = "cluster.example.com";
builder.ConnectionStringBuilder.Database = "app";
var dsFromBuilder = builder.Build();
```
### Connection string format
```
SrvHost=cluster.example.com;Database=mydb;Username=myuser;Password=...
```
### Notes
- SRV resolution happens once at `Build()` time. Rebuild the data source to
re-query DNS (this matches how `NpgsqlMultiHostDataSource` works today).
- `TargetSessionAttributes` values (e.g. `read-write`, `primary`, `standby`) work
unchanged with the resolved host list.
- `SrvHost` and `Host` are mutually exclusive; mixing them throws
`ArgumentException` at build time.
Made-with: Cursor
Neither pgjdbc nor pgx has this feature. In fact, the only related PRs there are the ones you submitted to them just yesterday. BTW the links to them are completely wrong, but what else can I expect from AI.
Yeah, the PR links are wrong, my bad. Unlike the RFC link, which is correct. And, of course, AI was used.
Implement DNS SRV resolution directly using UdpClient and manual DNS wire-format parsing, eliminating the DnsClient NuGet package. System DNS servers are obtained via NetworkInterface; the packet parser handles compressed names per RFC 1035. Public API surface is unchanged: only SrvHost is exposed; the internal SrvLookupClient override (used in tests) is replaced by calling SortAndBuild() directly with the now-internal SrvRecord type. Made-with: Cursor
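The commit above replaces DnsClient with hand-written RFC 1035 parsing. The sketch below gives a rough idea of what that involves. It is not the PR's code, and `DnsWireSketch` and its members are hypothetical. It covers three steps:

- building an SRV question (QTYPE 33, class IN),
- reading a possibly compressed name,
- decoding SRV RDATA (priority, weight, port, target).

Sending the packet with `UdpClient` and walking the answer section are omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DnsWireSketch
{
    // Standard query: 12-byte header, one question "<name> IN SRV", recursion desired.
    public static byte[] BuildSrvQuery(ushort id, string name)
    {
        var packet = new List<byte>
        {
            (byte)(id >> 8), (byte)id,           // ID
            0x01, 0x00,                          // flags: RD = 1
            0x00, 0x01,                          // QDCOUNT = 1
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // ANCOUNT, NSCOUNT, ARCOUNT = 0
        };
        foreach (var label in name.TrimEnd('.').Split('.'))
        {
            var bytes = Encoding.ASCII.GetBytes(label);
            if (bytes.Length is 0 or > 63) throw new ArgumentException("Invalid DNS label.", nameof(name));
            packet.Add((byte)bytes.Length);      // length-prefixed label
            packet.AddRange(bytes);
        }
        packet.Add(0);                           // root label ends QNAME
        packet.AddRange(new byte[] { 0x00, 0x21, 0x00, 0x01 }); // QTYPE = 33 (SRV), QCLASS = 1 (IN)
        return packet.ToArray();
    }

    // Reads a name at offset, following compression pointers (RFC 1035 §4.1.4).
    // On return, offset points just past the name at its original position.
    public static string ReadName(byte[] msg, ref int offset)
    {
        var labels = new List<string>();
        int pos = offset, jumps = 0;
        bool jumped = false;
        while (true)
        {
            byte len = msg[pos];
            if (len == 0) { pos++; break; }
            if ((len & 0xC0) == 0xC0)
            {
                if (++jumps > 16) throw new FormatException("Too many compression pointers.");
                if (!jumped) { offset = pos + 2; jumped = true; }
                pos = ((len & 0x3F) << 8) | msg[pos + 1]; // 14-bit offset into the message
                continue;
            }
            labels.Add(Encoding.ASCII.GetString(msg, pos + 1, len));
            pos += 1 + len;
        }
        if (!jumped) offset = pos;
        return string.Join(".", labels);
    }

    // SRV RDATA (RFC 2782): three big-endian 16-bit fields, then the target name.
    public static (ushort Priority, ushort Weight, ushort Port, string Target) ReadSrvRdata(byte[] msg, int offset)
    {
        ushort U16(int i) => (ushort)((msg[i] << 8) | msg[i + 1]);
        int nameOffset = offset + 6;
        return (U16(offset), U16(offset + 2), U16(offset + 4), ReadName(msg, ref nameOffset));
    }
}
```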
@vonzshik I've pushed a draft with no more DnsClient. Though I'm curious whether 200 lines of DNS wire-format parsing is a net win over one library. I'm happy to keep it either way, just asking.
Can you point us to the libpq discussion around this? We generally tend to follow libpq in terms of functionality like this - if this is accepted in the standard PostgreSQL client library, that provides good motivation for considering it here. Otherwise, some general thoughts:
Overall, I'd prefer for the feature to be actually requested by real-world users in order to address their actual problems, rather than proposed as additions into all drivers like this; in the history of Npgsql I don't recall a single user asking for it.
@roji Many thanks for your thoughtful reply!
There are several relevant discussions on pgsql-hackers. The most recent attempt, resolving a DNS A record into multiple IPs: https://www.postgresql.org/message-id/flat/AM9PR09MB49008B02CDF003054D5D4E00977DA%40AM9PR09MB4900.eurprd09.prod.outlook.com
It totally makes sense to wait for a libpq-approved design. I needed a prototype, and interested users can build Npgsql from this branch in the meantime. I'm happy to keep this PR open as a reference until the libpq thread lands, or to close it if you prefer.
On Windows, SRV lookups are available natively through the Win32 DnsQuery API. The gap is on non-Windows, which is why the wire-parsing code exists. That said, I agree it's not ideal to maintain it in a database driver, which was the original motivation for using DnsClient.
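On the non-Windows path, the wire-format parser first needs nameserver addresses, which the commit message above says come from `NetworkInterface`. Below is a minimal sketch of that step using only `System.Net.NetworkInformation`. The helper name is hypothetical, and a real implementation would also need a fallback when the list comes back empty.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;

public static class SystemDnsSketch
{
    // Collects the DNS servers configured on interfaces that are currently up,
    // dropping duplicates shared between interfaces.
    public static IReadOnlyList<IPAddress> GetDnsServers() =>
        NetworkInterface.GetAllNetworkInterfaces()
            .Where(nic => nic.OperationalStatus == OperationalStatus.Up)
            .SelectMany(nic => nic.GetIPProperties().DnsAddresses)
            .Distinct()
            .ToList();
}
```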
Agreed - and we do offer that. But it adds a latency hop and a cost that some users would rather avoid, especially for internal deployments.
Drivers using target_session_attrs already handle dead host removal gracefully. The harder problem is adding a new host to the cluster - and for that, eventual propagation via DNS is perfectly acceptable.
They already have it. AFAICT MySQL, SQL Server, MongoDB, and Valkey all support DNS SRV in their official clients.
I'm submitting this PR on behalf of our users who asked for this.
Thank you, I did not know that. And thanks for the rest of the context and the conversation as well, that's all useful. Let's see how the PostgreSQL folks react to this. If they decide that this is worth doing in libpq, that's definitely good motivation for us to at least consider it too; at that point we can work out the details of how to do SRV lookup etc.
Thanks! For the sake of correctness: MS SQL support for DNS SRV is not documented, and for Valkey only the Java driver supports DNS SRV. FWIW, there's an alternative approach. When a single hostname resolves to multiple A-record IPs, Npgsql iterates through them at the TCP level but not at the target_session_attrs level; NpgsqlMultiHostDataSource is only activated when Host contains a comma. pgx handles this with try_all_addrs: each IP from DNS is treated as an independent candidate, so failover and read/write routing work without listing hosts explicitly. SRV is essentially the same idea taken further: instead of multiple IPs behind one name, you get multiple hostnames with ports and priorities. Both are about representing a whole cluster as a single DNS entry. Would you be willing to have the feature for A records instead of, or along with, SRV records?
Unless I'm mistaken, Npgsql already does that; note the for loop just below over the different resolved addresses. This is old logic and was definitely not meant for round-robin or anything like that, but it does go through the resolved addresses in order, trying later ones if connections to earlier ones fail.
You're right that the loop is there. But it only continues to the next IP on a socket exception; on TCP success it returns immediately, and target_session_attrs is checked later. So my understanding is that role discovery doesn't benefit from the multi-IP iteration. But I may be missing something; happy to be corrected. If pg.example.com resolves to three IPs (one primary and two standbys) and the primary happens to be first, everything works. But if a standby responds first, Npgsql connects to it, exits the loop, and then fails the target_session_attrs check without ever trying the other IPs. The iteration is purely for TCP reachability, not for role discovery.
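The difference can be sketched in isolation. The loop below keeps going when an address is reachable but has the wrong role, which is the `try_all_addrs`-style behavior described above. The two delegates are hypothetical stand-ins for the TCP connect and the `target_session_attrs` check. This is not Npgsql's or pgx's actual code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Net;

public static class RoleAwareSketch
{
    // Returns the first address that both accepts a connection and passes the
    // role check, or null if none does.
    public static IPAddress? FirstMatching(
        IEnumerable<IPAddress> addresses,
        Func<IPAddress, bool> tryConnect,
        Func<IPAddress, bool> roleMatches)
    {
        foreach (var address in addresses)
        {
            if (!tryConnect(address))
                continue; // unreachable: plain TCP-level failover

            if (roleMatches(address))
                return address;

            // Reachable but wrong role (e.g. a standby when a primary is wanted):
            // move on to the next address instead of failing immediately.
        }
        return null;
    }
}
```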
## DNS SRV Service Discovery for PostgreSQL HA Clusters
### Problem
High-availability PostgreSQL deployments (Patroni, Stolon, etc.) expose several
nodes that change over time as primaries fail over or replicas are added.
Today, every client connection string must hard-code the full list of hosts (e.g. `Host=pg1:5432,pg2:5432,pg3:5432`).
Updating that list when topology changes requires redeploying every service that
connects to the database.
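### Solution
RFC 2782 DNS SRV records were designed to solve exactly this problem. The operator publishes a single DNS name, under which SRV records (`_postgresql._tcp.<name>`) list the cluster's hosts, ports, priorities, and weights.
Clients look up that name once and receive an ordered list of hosts. Topology
changes become a DNS update that clients pick up the next time they build the data source, with no redeployment required.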
This PR adds a `SrvHost` connection string property. When set, Npgsql queries `_postgresql._tcp.<SrvHost>` at `NpgsqlDataSource` build time, sorts the returned records by priority ascending, weight descending (RFC 2782 §3),
and passes the resulting `host:port,...` list into the existing `NpgsqlMultiHostDataSource` infrastructure. All existing features (`TargetSessionAttributes`, load balancing, health checks) work unchanged.
### API
**New connection string keyword**

| Keyword | Type | Default |
|---|---|---|
| `SrvHost` | `string` | `null` |

`SrvHost` and `Host` are mutually exclusive. Specifying both throws `ArgumentException` at build time.

**New builder property**
`SrvLookupClient` (`DnsClient.ILookupClient?`): leave as `null` to use the OS resolver. Inject a custom `ILookupClient` in tests to return deterministic records without a real DNS server.
### Usage
The resolved hosts behave identically to a hand-written
`Host=pg1:5432,pg2:5432,pg3:5432` list.
### Implementation Details
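**New files**

| File | Purpose |
|---|---|
| `src/Npgsql/SrvLookup.cs` | SRV lookup and sorting via the `DnsClient` NuGet package |
| `test/Npgsql.SrvTests/SrvLookupTests.cs` | Unit and live tests |
| `test/Npgsql.SrvTests/Npgsql.SrvTests.csproj` | Isolated test project |

**Modified files**

| File | Change |
|---|---|
| `NpgsqlConnectionStringBuilder.cs` | `SrvHost` property + mutual exclusivity check |
| `NpgsqlSlimDataSourceBuilder.cs` | `ResolveSrvIfNeeded()` called in `Build()` / `BuildMultiHost()` |
| `NpgsqlDataSourceBuilder.cs` | Forwards `SrvLookupClient` to the internal builder |
| `Npgsql.csproj` | `DnsClient 1.8.0` dependency |
| `Directory.Packages.props` | `DnsClient` package version |
| `PublicAPI.Unshipped.txt` | New public API entries |
| `Properties/AssemblyInfo.cs` | `InternalsVisibleTo` for `Npgsql.SrvTests` |

### Why `DnsClient`?
.NET's built-in `System.Net.Dns` only resolves host names to addresses (A/AAAA records) and has no SRV API. `DnsClient` is a widely used .NET library for SRV lookups. It is MIT-licensed, has no
transitive dependencies beyond .NET itself, and targets `netstandard2.0`.
### Resolution timing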
SRV records are resolved once at `NpgsqlDataSource` build time, not per
connection. This mirrors the model used for static multi-host connection
strings. Applications that need periodic re-discovery can rebuild the data
source (e.g. on a timer or after a connection error).
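Here is a minimal sketch of the rebuild pattern. It assumes the application keeps the current data source in a shared holder. `SwappableSource<T>` is a hypothetical helper, not part of Npgsql.

```csharp
using System.Threading;

// Hypothetical holder for an object that is periodically rebuilt, such as a data source.
public sealed class SwappableSource<T> where T : class
{
    private T _current;

    public SwappableSource(T initial) => _current = initial;

    // Readers get whichever instance is currently installed.
    public T Current => Volatile.Read(ref _current);

    // Atomically installs a freshly built instance and returns the previous one,
    // leaving it to the caller to decide when the old instance can be disposed.
    public T Swap(T next) => Interlocked.Exchange(ref _current, next);
}
```

Under this PR, a timer callback could call `Swap(NpgsqlDataSource.Create(connectionString))` to re-run SRV resolution. The caller would then dispose the returned instance once in-flight work on it has finished.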
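### Testing
**Unit tests (no database, no DNS required)**

`Npgsql.SrvTests` is an isolated project that does not inherit the assembly-level PostgreSQL
`[OneTimeSetUp]` from `Npgsql.Tests`, so all unit tests run on any developer machine.

Tests cover:
- `SrvHost` connection-string roundtrip and keyword parsing.
- `SrvHost` / `Host` mutual exclusivity throws `ArgumentException`.
- RFC 2782 sort order: priority ascending, weight descending.
- Trailing-dot stripping from FQDNs (`pg.example.com.` → `pg.example.com`).
- Priority/weight ordering mirroring the real records at `mmatvei.ru`.
- Empty result set throws `NpgsqlException`.

**Live integration test**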
`ResolveSrvLive` queries `_postgresql._tcp.mmatvei.ru`, where a set of real public SRV records (four records, priorities 96–100) is maintained for this purpose.
The test skips automatically when DNS is unavailable, so it never fails an
offline build. Override the nameserver via an environment variable if the system
resolver has a stale negative cache:

`NPGSQL_TEST_SRV_DNS=88.212.208.183 dotnet test test/Npgsql.SrvTests/`
### Prior Art
- pgx: `postgres+srv://` URI scheme and `LookupSRVFunc` hook.
- pgjdbc: `jdbc:postgresql+srv://` URI scheme using JNDI DNS.
- Npgsql (this PR): `SrvHost=` keyword approach, consistent with Npgsql's key-value connection string convention.