
[SPARK-52601][SQL] Support primitive types in TransformingEncoder #51313

Closed

eejbyfeldt wants to merge 1 commit into apache:master from eejbyfeldt:SPARK-52601

Conversation

@eejbyfeldt
Contributor

@eejbyfeldt eejbyfeldt commented Jun 28, 2025

What changes were proposed in this pull request?

Support defining TransformingEncoder that has a primitive type as the input type.

Why are the changes needed?

To support defining TransformingEncoder that has a primitive type as the input type.

This came up for me when using a Scala 3 opaque type around a Long as a timestamp, where I wanted the encoder to encode it as a timestamp. Ideally Spark would have some way of encoding a microsecond timestamp without going through a java.sql.Timestamp or java.time.Instant. But this at least makes it possible to achieve something similar (though less efficiently) by defining a TransformingEncoder that takes a Long and returns a java.sql.Timestamp.
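The Long-to-Timestamp transformation described above can be sketched without Spark's internals. The following is a minimal illustration of the conversion pair that a Codec backing such a TransformingEncoder would perform; the object and method names (MicroTimestampCodec, encode, decode) are hypothetical, not part of this PR or of Spark's API:

```scala
import java.sql.Timestamp

// Sketch of the Long <-> java.sql.Timestamp conversion a TransformingEncoder's
// Codec would wrap. Names are illustrative, not Spark API.
object MicroTimestampCodec {
  // Encode microseconds since the epoch as a java.sql.Timestamp.
  // floorDiv/floorMod keep pre-epoch (negative) values correct.
  def encode(micros: Long): Timestamp = {
    val ts = new Timestamp(Math.floorDiv(micros, 1000000L) * 1000L)
    ts.setNanos((Math.floorMod(micros, 1000000L) * 1000L).toInt)
    ts
  }

  // Decode a java.sql.Timestamp back to microseconds since the epoch.
  def decode(ts: Timestamp): Long =
    Math.floorDiv(ts.getTime, 1000L) * 1000000L + ts.getNanos / 1000L
}
```

A Codec[Long, java.sql.Timestamp] built from a pair of functions like these, combined with the existing timestamp encoder, is exactly the shape this PR enables, since the input side of the transformation is the primitive Long.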

Does this PR introduce any user-facing change?

Yes, it allows TransformingEncoder to be used in more cases.

How was this patch tested?

New and existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions Bot added the SQL label Jun 28, 2025
@eejbyfeldt
Contributor Author

@hvanhovell You reviewed #50023, where AgnosticExpressionPathEncoder was added as a temporary migration path. I tried implementing some custom encoders on Spark 4.0.0 without it, using only the other AgnosticEncoders, but ran into two issues: this PR (#51313) and #51319. Would you be willing to review them? Or do you know who would be a good candidate?

Contributor

@hvanhovell hvanhovell left a comment


LGTM!

@hvanhovell
Contributor

Merged to master!

@hvanhovell
Contributor

@eejbyfeldt thanks for doing this! Can you create a backport for Spark 4.0?

eejbyfeldt added a commit to eejbyfeldt/spark that referenced this pull request Sep 16, 2025
Closes apache#51313 from eejbyfeldt/SPARK-52601.

Authored-by: Emil Ejbyfeldt <emil.ejbyfeldt@choreograph.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
@eejbyfeldt
Contributor Author

> @eejbyfeldt thanks for doing this! Can you create a backport for Spark 4.0?

Created the backport here #52354

dongjoon-hyun pushed a commit that referenced this pull request Sep 16, 2025
Backport of #51313 to 4.0 branch.


Closes #52354 from eejbyfeldt/SPARK-52601-4.0.

Authored-by: Emil Ejbyfeldt <emil.ejbyfeldt@choreograph.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
