NewAppenderWithColumns support#94
Conversation
- Introduced `NewAppenderWithColumns` for creating appenders restricted to specific columns.
…sh/appender-with-columns
- Merge column activation & type inference into the same iteration.
- Add a sanity check for column count alignment.
- Extract column initialization logic into a dedicated `initTableColumns` function.
taniabogatsch
left a comment
Hi @EtgarDev - thanks for the PR! I left some comments.
…ppenderWithColumns`
- Return an error when the provided column subset exceeds the table column count.
- Add a corresponding test case to ensure proper error handling.
taniabogatsch
left a comment
Just two small comments and then the PR looks ready to go from my side!
Looks like you also need to merge
@taniabogatsch I can see that
No - that's odd, indeed. IIRC that happened to me once or twice already, too, but I haven't seen it happening in a bit... it feels like
Once CI is green this is ready to go in from my side. :)
Looks like the same tidy check has been fixed here: #98. So we can also just merge this PR afterwards and it should be fine.
EDIT EDIT: has been merged here now, so should be fixed once you re-sync with
Oh I was fixing it on my side while it was merged, reverting haha
…sh/appender-with-columns
nvm a merge was enough, didn't need to revert
Thanks!

This PR introduces a new opt-in API to create the appender with a specific set of columns to affect, eliminating the generation of very long queries when appending to very wide tables (with tons of columns).
The Problem
`Appender` always operates over the full table schema. On flush, DuckDB generates an `INSERT INTO ...` query across all columns.
Solution
`NewAppenderWithColumns` restricts the appender's active columns to the provided subset using DuckDB's `duckdb_appender_add_column`, before fetching the types. On flush, DuckDB generates an `INSERT INTO` query for the active columns only.
API
`func NewAppenderWithColumns(driverConn driver.Conn, catalog, schema, table string, columns []string) (*Appender, error)`
Backward compatibility
`NewAppender` behaves exactly as before. `NewAppenderWithColumns` is optional.
Results
I compared the two approaches across several scenarios, parameterized by (rows to append, columns in the table, number of columns to actually set (density)).
I wanted to check how those parameters influence this optimisation of setting the columns, to make sure that we don't just optimise an edge case while degrading performance of the average / usual cases.
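The sweep described above can be sketched as a simple scenario grid. The values below are hypothetical, not the grid the comparison script actually used.

```go
package main

import "fmt"

// scenario describes one benchmark configuration: how many rows are
// appended, how wide the table is, and how many columns are actually
// set (the rest stay NULL/default). Density is the set/width ratio.
type scenario struct {
	rows, tableCols, setCols int
}

func (s scenario) density() float64 {
	return float64(s.setCols) / float64(s.tableCols)
}

func main() {
	// Hypothetical grid; the PR's comparison script defines its own values.
	var grid []scenario
	for _, rows := range []int{1_000, 100_000} {
		for _, cols := range []int{10, 100, 1_000} {
			for _, set := range []int{1, cols / 2, cols} {
				grid = append(grid, scenario{rows, cols, set})
			}
		}
	}
	for _, s := range grid {
		fmt.Printf("rows=%d cols=%d set=%d density=%.0f%%\n",
			s.rows, s.tableCols, s.setCols, 100*s.density())
	}
}
```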
Conclusion - in most scenarios, setting the columns showed a strong benefit, but as the density (the share of non-null / non-default columns) grows, the benefit gets lower, and above a specific density (let's call it D) it's even better to not mention columns at all in the appender. In my runs, D ~ 30%, and D(n) is high for small tables and decreases as the table gets wider.
Note - in my comparison I don't measure the time of building the rows themselves in the client, but only the time spent in the constructors, `AppendRow`, and `Flush`.
comparison script