Skip to content

Commit e28181d

Browse files
authored
[SQL] Implementation for user-defined functions (#1129)
* [SQL] Parser support for user-defined functions * [SQL] Document user-defined functions Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
1 parent 737b0c9 commit e28181d

39 files changed

Lines changed: 1114 additions & 231 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1616
- SQL: support for trigonometric functions `sin` and `cos` ([#1118](https://github.com/feldera/feldera/pull/1118))
1717
- SQL: support for mathematical constant `PI` ([#1123](https://github.com/feldera/feldera/pull/1123))
1818
- WebConsole: 'Inspect connector' button in the connector list in Pipeline Builder that opens a non-editable popup
19+
- SQL: Support for user-defined functions, declared in SQL and implemented
20+
in Rust ([#1129](https://github.com/feldera/feldera/pull/1129))
1921

2022
## [0.5.0] - 2023-12-05
2123

CONTRIBUTING.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ with `git commit -s`.
1313
Our team develops and tests using Linux and MacOS. Windows Subsystem for Linux works fine too.
1414

1515
The Feldera container images and CI workflows use Linux. You can see our setup in
16-
our [Dockerfile](deploy/Dockerfile) and [Earthfile](Earthfile) .
16+
our [Dockerfile](deploy/Dockerfile) and [Earthfile](Earthfile).
1717

1818
Our known dependencies are:
1919
- Runtime
@@ -29,6 +29,7 @@ Our known dependencies are:
2929
- Python 3
3030
- typescript
3131
- Redpanda or Kafka
32+
- Earthly (https://earthly.dev/get-earthly)
3233

3334
Additional dependencies are automatically installed by the Rust,
3435
maven, Python, and typescript build tools.

docs/contributors/compiler.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,9 @@ Usage: sql-to-dbsp [options] Input file to compile
5656
--ignoreOrder
5757
Ignore ORDER BY clauses at the end
5858
Default: false
59+
--udf
60+
Specify a Rust file containing implementations of user-defined functions
61+
Default: <empty string>
5962
-O
6063
Optimization level (0, 1, or 2)
6164
Default: 2

docs/sidebars.js

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,7 @@ const sidebars = {
108108
'sql/binary',
109109
'sql/array',
110110
'sql/datetime'
111+
'sql/udf'
111112
]
112113
},
113114
{

docs/sql/grammar.md

Lines changed: 22 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,14 @@
11
# SQL Grammar
22

3-
This is a short formal description of the grammar supported:
3+
This is a short formal description of the grammar supported in a BNF
4+
form.
5+
6+
- Constructs enclosed between `[]` are optional.
7+
- `*` denotes zero or many repetitions.
8+
- Uppercase words (`FUNCTION`) and single-quoted text (`')'`) indicate
9+
grammar terminals.
10+
- Parentheses `()` are used for grouping productions together.
11+
- The vertical bar `|` indicates alternation.
412

513
```
614
statementList:
@@ -9,6 +17,13 @@ statementList:
917
statement
1018
: createTableStatement
1119
| createViewStatement
20+
| createFunctionStatement
21+
22+
generalType
23+
: type [NOT NULL]
24+
25+
createFunctionStatement
26+
: CREATE FUNCTION name '(' [ columnDecl [, columnDecl ]* ] ')' RETURNS generalType
1227
1328
createTableStatement
1429
: CREATE TABLE name
@@ -20,7 +35,7 @@ createViewStatement
2035
AS query
2136
2237
tableElement
23-
: columnName type [NOT [NULL]] ( columnConstraint )*
38+
: columnName generalType ( columnConstraint )*
2439
| columnName
2540
| tableConstraint
2641
@@ -79,7 +94,7 @@ groupItem:
7994
| '(' expression [, expression ]* ')'
8095
8196
columnDecl
82-
: column type [ NOT NULL ]
97+
: column generalType
8398
8499
selectWithoutFrom
85100
: SELECT [ ALL | DISTINCT ]
@@ -105,7 +120,7 @@ tableExpression
105120
106121
joinCondition
107122
: ON booleanExpression
108-
| USING '(' column [, column ]* ')'
123+
| USING '(' column [, column ]* ')'
109124
110125
tableReference
111126
: tablePrimary [ pivot ] [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]
@@ -142,6 +157,9 @@ and is currently ignored.
142157
In `orderItem`, if expression is a positive integer n, it denotes the
143158
nth item in the `SELECT` clause.
144159

160+
SQL `CREATE FUNCTION` can be used to declare [user-defined
161+
functions](udf.md).
162+
145163
An aggregate query is a query that contains a `GROUP BY` or a `HAVING`
146164
clause, or aggregate functions in the `SELECT` clause. In the
147165
`SELECT`, `HAVING` and `ORDER` BY clauses of an aggregate query, all

docs/sql/udf.md

Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,126 @@
1+
# User-defined functions
2+
3+
The SQL statement `CREATE FUNCTION` can be used to declare new
4+
functions whose implementation is provided separately in Rust.
5+
6+
The implementation of these functions is provided to the compiler in a
7+
separate file, whose name is specified using the `--udf` flag.
8+
9+
The following example shows a user-defined function:
10+
11+
```sql
12+
CREATE FUNCTION contains_number(str VARCHAR NOT NULL, value INTEGER) RETURNS BOOLEAN NOT NULL
13+
CREATE VIEW V0 AS SELECT contains_number(CAST('YES: 10 NO:5' AS VARCHAR), 5)
14+
```
15+
16+
The function name capitalization obeys the same rules as table and
17+
view names: the names are converted by default to all-capitals, but if
18+
the name is quoted capitalization is unchanged. This is important,
19+
because the user must provide a rust implementation that matches the
20+
canonical function name. Here is a possible implementation of the
21+
function `contains_number` above:
22+
23+
```rs
24+
use sqllib::*;
25+
26+
pub fn CONTAINS_NUMBER(pos: &SourcePositionRange, str: String, value: Option<i32>) ->
27+
Result<bool, Box<dyn std::error::Error>> {
28+
match value {
29+
None => Err(format!(\"{}: null value\", pos).into()),
30+
Some(value) => Ok(str.contains(&format!(\"{}\", value).to_string())),
31+
}
32+
}
33+
```
34+
35+
The `use sqllib::types::*` directive imports the definitions of the
36+
standard Rust types that the compiler uses to implement SQL datatypes.
37+
The next section explains what these types are.
38+
39+
Notice the function implemented has an all-capitals name (which is not
40+
a standard convention for Rust), dictated by the default SQL
41+
capitalization rules.
42+
43+
Currently there is no type casting or type inference performed for the
44+
function arguments in the SQL program. For example, a call such as
45+
`CONTAINS_NUMBER('2010-10-20', '5')` will fail at SQL compilation time
46+
because the first argument has type `CHAR(8)` instead of `VARCHAR`,
47+
and the second argument has type `CHAR(1)` instead of `INTEGER`.
48+
49+
Clearly, user-defined functions can pose security problems, since the
50+
Rust implementation is only verified by the Rust compiler. Such
51+
functions are expected to have no side-effects, to be deterministic,
52+
and not to crash.
53+
54+
## Type representation in Rust
55+
56+
The following table shows the Rust representation of standard SQL data
57+
types. A nullable SQL type is represented by the corresponding rust
58+
`Option<>` type. Notice that some of these types are not standard
59+
Rust types, but are defined in the DBSP runtime library.
60+
61+
SQL | Rust
62+
-----------
63+
`BOOLEAN` | `bool`
64+
`TINYINT` | `i8`
65+
`SMALLINT` | `i16`
66+
`INT` | `i32`
67+
`BIGINT` | `i64`
68+
`DECIMAL`(p, s) | `Decimal`
69+
`REAL` | `F32`
70+
`DOUBLE` | `F64`
71+
`CHAR`(n) | `String`
72+
`VARCHAR`, `VARCHAR`(n) | `String`
73+
`BINARY`, `BINARY`(n) | `ByteArray`
74+
`NULL` | `()`
75+
`INTERVAL` | `ShortInterval`, `LongInterval`
76+
`TIME` | `Time`
77+
`TIMESTAMP` | `Timestamp`
78+
`DATE` | `Date`
79+
`ARRAY T` | `Vec<T>`
80+
81+
Multiple SQL types may be represented by the same Rust type. For
82+
example, `CHAR`, `CHAR(n)`, `VARCHAR(n)`, and `VARCHAR` are all
83+
represented by the standard Rust `String` type.
84+
85+
The SQL family of `INTERVAL` types translates to one of two Rust
86+
types: `ShortInterval` (representing intervals from days to seconds),
87+
and `LongInterval` (representing intervals from years to months).
88+
(Our dialect of SQL does not allow mixing the two kinds of intervals
89+
in a single expression.)
90+
91+
In the Rust implementation the function always has to return the type
92+
`Result<T, Box<dyn std::error::Error>>`, where `T` is the Rust
93+
equivalent of the expected return type of the SQL function. The Rust
94+
function should return an `Err` only when the function fails at
95+
runtime; in this case the returned error can use the source position
96+
information to indicate where the error has originated in the code.
97+
The function should return an error only for fatal conditions, similar
98+
to other SQL functions (e.g., array index out of bounds, arithmetic
99+
overflows, etc.).
100+
101+
## Source position information
102+
103+
The first argument passed to the Rust function is always `pos:
104+
&SourcePositionRange`. This argument indicates the position in the
105+
SQL source code of the call to this user-defined function. This
106+
information can be used to generate better runtime error messages when
107+
the user-defined function encounters an error. (Note: currently
108+
Calcite does not provide any source position information, but we hope
109+
to remedy this state of affairs soon.)
110+
111+
## Limitations
112+
113+
There can be only one function with each name.
114+
115+
Functions cannot have names identical to standard SQL library function
116+
names.
117+
118+
Polymorphic functions are not supported. For example, in SQL the
119+
addition operation operates on any numeric types; such an operation
120+
cannot be implemented as a single user-defined function.
121+
122+
The current type checker is very strict, and it requires the function
123+
arguments to have exactly the specified types. No casts are inserted
124+
by the compiler. That is why the previous example requires casting
125+
the string `'YES: 10 NO: 5'` to `VARCHAR`.
126+

sql-to-dbsp-compiler/SQL-compiler/out7670973118205954433.tmp

Whitespace-only changes.

sql-to-dbsp-compiler/SQL-compiler/src/main/codegen/config.fmpp

Lines changed: 3 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ data: {
1515
"org.apache.calcite.sql.ddl.SqlDdlNodes"
1616
"org.apache.calcite.sql.ddl.SqlCreateType"
1717
"org.dbsp.sqlCompiler.compiler.frontend.calciteCompiler.SqlExtendedColumnDeclaration"
18+
"org.dbsp.sqlCompiler.compiler.frontend.calciteCompiler.SqlCreateFunctionDeclaration"
1819
]
1920

2021
# List of new keywords. Example: "DATABASES", "TABLES". If the keyword is
@@ -28,7 +29,7 @@ data: {
2829
"SEMI"
2930
"SEQUENCES"
3031
"TEMP"
31-
"VOLATILE"
32+
#"VOLATILE"
3233
]
3334

3435
# List of keywords from "keywords" section that are not reserved.
@@ -883,13 +884,6 @@ data: {
883884
# List of methods for parsing custom SQL statements.
884885
# Return type of method implementation should be 'SqlNode'.
885886
statementParserMethods: [
886-
# The following are not yet released
887-
# "PostgresqlSqlShow()",
888-
# "PostgresqlSqlSetOption()",
889-
# "PostgresqlSqlBegin()",
890-
# "PostgresqlSqlDiscard()",
891-
# "PostgresqlSqlCommit()",
892-
# "PostgresqlSqlRollback()"
893887
]
894888

895889
# List of methods for parsing custom literals.
@@ -918,6 +912,7 @@ data: {
918912
"SqlCreateView"
919913
"SqlCreateExtendedTable"
920914
"SqlCreateType"
915+
"SqlCreateFunction"
921916
]
922917

923918
truncateStatementParserMethods: [

sql-to-dbsp-compiler/SQL-compiler/src/main/codegen/includes/ddl.ftl

Lines changed: 40 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -98,7 +98,7 @@ SqlExtendedColumnDeclaration ColumnAttribute(SqlExtendedColumnDeclaration column
9898
)
9999
}
100100

101-
SqlNodeList AttributeDefList() :
101+
SqlNodeList NonEmptyAttributeDefList() :
102102
{
103103
final Span s;
104104
final List<SqlNode> list = new ArrayList<SqlNode>();
@@ -114,6 +114,23 @@ SqlNodeList AttributeDefList() :
114114
}
115115
}
116116

117+
SqlNodeList AttributeDefList() :
118+
{
119+
final Span s;
120+
final List<SqlNode> list = new ArrayList<SqlNode>();
121+
}
122+
{
123+
<LPAREN> { s = span(); }
124+
( AttributeDef(list)
125+
(
126+
<COMMA> AttributeDef(list)
127+
)*
128+
)?
129+
<RPAREN> {
130+
return new SqlNodeList(list, s.end(this));
131+
}
132+
}
133+
117134
void AttributeDef(List<SqlNode> list) :
118135
{
119136
final SqlIdentifier id;
@@ -135,6 +152,27 @@ void AttributeDef(List<SqlNode> list) :
135152
}
136153
}
137154

155+
SqlCreateFunctionDeclaration SqlCreateFunction(Span s, boolean replace) :
156+
{
157+
final boolean ifNotExists;
158+
final SqlIdentifier id;
159+
final SqlNodeList parameters;
160+
final SqlDataTypeSpec type;
161+
final boolean nullable;
162+
}
163+
{
164+
<FUNCTION> ifNotExists = IfNotExistsOpt()
165+
id = SimpleIdentifier()
166+
parameters = AttributeDefList()
167+
<RETURNS>
168+
type = DataType()
169+
nullable = NullableOptDefaultTrue()
170+
{
171+
return new SqlCreateFunctionDeclaration(s.end(this), replace, ifNotExists,
172+
id, parameters, type.withNullable(nullable));
173+
}
174+
}
175+
138176
SqlCreate SqlCreateType(Span s, boolean replace) :
139177
{
140178
final SqlIdentifier id;
@@ -146,7 +184,7 @@ SqlCreate SqlCreateType(Span s, boolean replace) :
146184
id = CompoundIdentifier()
147185
<AS>
148186
(
149-
attributeDefList = AttributeDefList()
187+
attributeDefList = NonEmptyAttributeDefList()
150188
|
151189
type = DataType()
152190
)

sql-to-dbsp-compiler/SQL-compiler/src/main/codegen/includes/parserImpls.ftl

Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -73,15 +73,6 @@ TableCollectionType TableCollectionTypeOpt() :
7373
{ return TableCollectionType.UNSPECIFIED; }
7474
}
7575

76-
boolean VolatileOpt() :
77-
{
78-
}
79-
{
80-
<VOLATILE> { return true; }
81-
|
82-
{ return false; }
83-
}
84-
8576
/* Extra operators */
8677

8778
<DEFAULT, DQID, BTID> TOKEN :

0 commit comments

Comments
 (0)