As part of 3.0 COPY IN/OUT has to be reimplemented. After looking into the current (2.2) implementation here are some problems:
- It uses exclusively text format, which is slower than binary and has more potential encoding/escaping issues (much like regular non-COPY data transfers). It should be possible to both read and write in binary.
- NpgsqlCopySerializer provides some functions for writing data types: AddString(), AddInt32(), etc. Very little data types are covered, and the "encoding" isn't implemented with the regular Npgsql type handling system (used for sending command parameters).
- There is no similar similar facility for reading data types (i.e. parsing columns coming in from a COPY OUT).
- Somewhat clunky API, see proposal below
New API Proposal
Binary
- Expose StartBinaryCopyIn() and StartBinaryCopyOut() on NpgsqlConnection. These accept the SQL command that starts the COPY operation (e.g. COPY data FROM STDIN WITH FORMAT BINARY).
- These methods return a Stream. A user can interact directly with the Stream, reading and writing byte[], in which case encoding/decoding (if done) is their responsibility. One use case for direct interaction is dumping a table to disk in binary format and loading it later.
- In addition, the user can use NpgsqlBinaryReader and NpgsqlBinaryWriter, similar to .NET BinaryReader and BinaryWriter, to have Npgsql encode and decode values, start new rows, etc.
Text
Although binary is the more efficient method, we still want to support text COPY operations to allow people to import/export, say, a CSV which they've prepared. In text mode Npgsql simply passes on textual data to and from the backend and does not actually deal with encoding/decoding, delimiters, etc. - this is the user's responsibility.
- Expose StartTextCopyIn() and StartTextCopyOut() on NpgsqlConnection. Like with binary, these accept the SQL command that starts the COPY operation.
- These methods return a TextWriter and a TextReader respectively.
- The user can then directly read or write textual data. It's their responsibility to make sure everything is properly formatted/parsed.
I don't think we should have an (immediate) goal of supporting async operation for COPY.
Any comments? I'm not very familiar with exactly how people use COPY so any feedback would be appreciated!
As part of 3.0 COPY IN/OUT has to be reimplemented. After looking into the current (2.2) implementation here are some problems:
New API Proposal
Binary
Text
Although binary is the more efficient method, we still want to support text COPY operations to allow people to import/export, say, a CSV which they've prepared. In text mode Npgsql simply passes on textual data to and from the backend and does not actually deal with encoding/decoding, delimiters, etc. - this is the user's responsibility.
I don't think we should have an (immediate) goal of supporting async operation for COPY.
Any comments? I'm not very familiar with exactly how people use COPY so any feedback would be appreciated!