
NewAppenderWithColumns support #94

Merged
taniabogatsch merged 19 commits into duckdb:main from EtgarDev:feature/etgarsh/appender-with-columns on Jan 12, 2026

Conversation


@EtgarDev commented on Dec 25, 2025

This PR introduces a new, opt-in API for creating an appender that targets a specific subset of columns, avoiding the generation of very long queries when appending to very wide tables (thousands of columns).

The Problem

  • The current Appender always operates over the full table schema. On flush, DuckDB generates an INSERT INTO ... statement across all columns.
  • For extremely wide schemas, this can take a long time.

Solution

  • Add NewAppenderWithColumns, which restricts the appender's active columns to the provided subset via DuckDB's duckdb_appender_add_column before fetching the column types.
  • DuckDB then generates the INSERT INTO statement for the active columns only.

API

  • func NewAppenderWithColumns(driverConn driver.Conn, catalog, schema, table string, columns []string) (*Appender, error)
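
A minimal usage sketch (not from the PR itself). The import path, the `events` table, the empty-string catalog meaning the default catalog, and the assumption that AppendRow takes values for the active columns only, in constructor order, are mine:

```go
package main

import (
	"context"
	"log"

	duckdb "github.com/marcboeker/go-duckdb/v2" // import path assumed
)

func main() {
	// The appender API works on a raw driver connection, not database/sql.
	connector, err := duckdb.NewConnector("", nil)
	if err != nil {
		log.Fatal(err)
	}
	conn, err := connector.Connect(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Hypothetical wide table `events`; only two of its columns become
	// active, the rest keep their defaults (or NULL). An empty catalog
	// is assumed to mean the default catalog.
	a, err := duckdb.NewAppenderWithColumns(conn, "", "main", "events", []string{"id", "payload"})
	if err != nil {
		log.Fatal(err)
	}
	defer a.Close()

	// Assumption: AppendRow takes values for the active columns only,
	// in the order they were passed to the constructor.
	if err := a.AppendRow(int64(1), "hello"); err != nil {
		log.Fatal(err)
	}
	if err := a.Flush(); err != nil {
		log.Fatal(err)
	}
}
```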

Backward compatibility

  • No breaking changes. Existing code using NewAppender behaves exactly as before.
  • NewAppenderWithColumns is purely opt-in.

Results

I compared the two appenders across several scenarios, varying three parameters: rows to append, columns in the table, and the number of columns actually set (density).
I wanted to check how these parameters influence the column-restriction optimisation, to make sure we don't just optimise an edge case while degrading performance in the average/usual cases.

Conclusion: in most scenarios, restricting the columns showed a strong benefit, but as density (the fraction of non-null/non-default columns) grows, the benefit shrinks, and above a certain density (call it D) it is actually better not to restrict columns at all:

  • on wide tables, D is lower (for 40,000-column tables, D ≈ 30%)
  • on regular tables, D ≈ 100%

D(n) is high for small tables and decreases as the table gets wider.

Note: the comparison does not measure the time spent building the rows themselves on the client; it only measures the time spent in the constructors, AppendRow, and Flush.

Regular = NewAppender, WC = NewAppenderWithColumns; the Winner column shows the faster variant and its speedup.

| # | Rows | Cols | setCols | Density | Runs | Regular Time | WithColumns Time | Winner | Purpose |
|---|------|------|---------|---------|------|--------------|------------------|--------|---------|
| 1 | 7,800 | 40,000 | 1 | 0.0025% | 1 | 17.95s | 1.52s | WC ×11.8 | Ultra-Sparse Wide (baseline) |
| 2 | 7,800 | 40,000 | 2 | 0.005% | 1 | 17.93s | 1.47s | WC ×12.2 | Ultra-Sparse Wide |
| 3 | 7,800 | 40,000 | 10 | 0.025% | 1 | 18.88s | 1.59s | WC ×11.9 | Ultra-Sparse Wide |
| 4 | 7,800 | 100 | 1 | 1% | 100 | 32.37ms | 4.01ms | WC ×8.1 | Sparse Narrow |
| 5 | 7,800 | 100 | 100 | 100% | 100 | 8.65ms | 9.43ms | Reg ×1.09 | Fully Dense Narrow |
| 6 | 7,800 | 100 | 10 | 10% | 100 | 30.31ms | 4.34ms | WC ×7.0 | Mid-Density Narrow |
| 7 | 7,800 | 100 | 50 | 50% | 100 | 21.40ms | 6.76ms | WC ×3.2 | High-Density Narrow |
| 8 | 7,800 | 100 | 90 | 90% | 100 | 12.58ms | 8.86ms | WC ×1.4 | Near-Dense Narrow |
| 9 | 1 | 20 | 3 | 15% | 1000 | 446µs | 331µs | WC | Single-Row 15% |
| 10 | 10 | 100 | 100 | 100% | 100 | 1.20ms | 1.91ms | Reg | Tiny Batch Dense |
| 11 | 100 | 100 | 100 | 100% | 100 | 1.32ms | 2.10ms | Reg | Small Batch Dense |
| 12 | 1,000 | 100 | 100 | 100% | 100 | 2.20ms | 2.94ms | Reg | Small Batch Dense |
| 13 | 10 | 20 | 20 | 100% | 100 | 419µs | 478µs | Reg | Tiny Batch Small |
| 14 | 1 | 100 | 100 | 100% | 100 | 1.22ms | 1.87ms | Reg | Single-Row Dense |
| 15 | 7,800 | 20 | 20 | 100% | 100 | 2.26ms | 2.52ms | Reg | Fully Dense Small |
| 16 | 7,800 | 30 | 30 | 100% | 100 | 3.05ms | 3.36ms | Reg | Fully Dense Small |
| 17 | 7,800 | 50 | 50 | 100% | 100 | 4.62ms | 5.04ms | Reg | Fully Dense Small |
| 18 | 7,800 | 100 | 100 | 100% | 100 | 8.86ms | 9.76ms | Reg | Fully Dense Narrow |
| 19 | 20,000 | 40,000 | 1 | 0.0025% | 1 | 40.98s | 7.64s | WC ×5.4 | Sparse Wide (row scaling) |
| 20 | 7,800 | 5,000 | 1 | 0.02% | 1 | 1.80s | 186ms | WC ×9.6 | Sparse Medium-Wide |
| 21 | 7,800 | 40,000 | 8,000 | 20% | 1 | 19.86s | 8.64s | WC ×2.3 | Mid-Density Wide |
| 22 | 7,800 | 40,000 | 12,000 | 30% | 1 | 16.75s | 16.47s | ~Tie | Crossover Wide (D(40000) ≈ 30%) |
| 23 | 7,800 | 40,000 | 15,000 | 37.5% | 1 | 15.74s | 23.91s | Reg ×1.5 | High-Density Wide |
| 24 | 7,800 | 40,000 | 20,000 | 50% | 1 | 17.67s | 46.24s | Reg ×2.6 | Very High Density Wide |
| 25 | 7,800 | 40,000 | 40,000 | 100% | 1 | 11.65s | 182.7s | Reg ×15.7 | Fully Dense Wide |

comparison script
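
The full script is behind the collapsed link above. As a rough illustration of the methodology, here is a minimal, hypothetical timing harness that measures only the constructor, AppendRow, and Flush (rows are built before the clock starts); the function name, parameters, and import path are mine, not from the PR:

```go
package bench

import (
	"database/sql/driver"
	"time"

	duckdb "github.com/marcboeker/go-duckdb/v2" // import path assumed
)

// timeAppend measures only the constructor + AppendRow + Flush path,
// matching the note above; the rows slice is built by the caller,
// outside the measured window.
func timeAppend(conn driver.Conn, table string, cols []string, rows [][]driver.Value) (time.Duration, error) {
	start := time.Now()
	a, err := duckdb.NewAppenderWithColumns(conn, "", "main", table, cols)
	if err != nil {
		return 0, err
	}
	defer a.Close()

	for _, row := range rows {
		if err := a.AppendRow(row...); err != nil {
			return 0, err
		}
	}
	if err := a.Flush(); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}
```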

EtgarDev marked this pull request as ready for review on December 25, 2025

@taniabogatsch (Member) left a comment


Hi @EtgarDev - thanks for the PR! I left some comments.

- Return an error when the provided column subset exceeds the table column count.
- Add a corresponding test case to ensure proper error handling.
EtgarDev requested a review from taniabogatsch on January 7, 2026

@taniabogatsch (Member) left a comment


Just two small comments and then the PR looks ready to go from my side!

@taniabogatsch (Member)

Looks like you also need to merge main into this branch again?

@EtgarDev (Author)

@taniabogatsch I can see that golangci-lint fails on lines that weren't changed in this PR.
Was something changed?

@taniabogatsch (Member)

> I can see that golangci-lint fails on lines that weren't changed in this PR. Was something changed?

No - that's odd, indeed. IIRC that happened to me once or twice already, too, but I haven't seen it happen in a while... it feels like golangci-lint just (almost randomly) picks up some extra files to check and finds issues... maybe you could add the fix to the PR so that the tidy checks are also green?

@taniabogatsch (Member)

Once CI is green this is ready to go in from my side. :)

@taniabogatsch (Member)

Looks like the same tidy check has been fixed here - #98

So we can also just merge this PR afterwards and it should be fine.

[screenshot attached]

@taniabogatsch (Member)

EDIT: the fix has been merged now in #101, so it should be resolved once you re-sync with main.

@EtgarDev (Author)

Oh I was fixing it on my side while it was merged, reverting haha

@EtgarDev (Author)

> Oh I was fixing it on my side while it was merged, reverting haha

nvm a merge was enough, didn't need to revert

taniabogatsch merged commit e8d64fe into duckdb:main on Jan 12, 2026
17 checks passed
@taniabogatsch (Member)

Thanks!

