|
| 1 | +# User-defined functions |
| 2 | + |
| 3 | +The SQL statement `CREATE FUNCTION` can be used to declare new |
| 4 | +functions whose implementation is provided separately in Rust. |
| 5 | + |
| 6 | +The implementation of these functions is provided to the compiler in a |
| 7 | +separate file, whose name is specified using the `--udf` flag. |
| 8 | + |
| 9 | +The following example shows a user-defined function: |
| 10 | + |
| 11 | +```sql |
| 12 | +CREATE FUNCTION contains_number(str VARCHAR NOT NULL, value INTEGER) RETURNS BOOLEAN NOT NULL |
| 13 | +CREATE VIEW V0 AS SELECT contains_number(CAST('YES: 10 NO:5' AS VARCHAR), 5) |
| 14 | +``` |
| 15 | + |
| 16 | +The function name capitalization obeys the same rules as table and |
| 17 | +view names: the names are converted by default to all-capitals, but if |
| 18 | +the name is quoted capitalization is unchanged. This is important, |
| 19 | +because the user must provide a rust implementation that matches the |
| 20 | +canonical function name. Here is a possible implementation of the |
| 21 | +function `contains_number` above: |
| 22 | + |
| 23 | +```rs |
| 24 | +use sqllib::*; |
| 25 | + |
| 26 | +pub fn CONTAINS_NUMBER(pos: &SourcePositionRange, str: String, value: Option<i32>) -> |
| 27 | + Result<bool, Box<dyn std::error::Error>> { |
| 28 | + match value { |
| 29 | + None => Err(format!(\"{}: null value\", pos).into()), |
| 30 | + Some(value) => Ok(str.contains(&format!(\"{}\", value).to_string())), |
| 31 | + } |
| 32 | +} |
| 33 | +``` |
| 34 | +
|
| 35 | +The `use sqllib::types::*` directive imports the definitions of the |
| 36 | +standard Rust types that the compiler uses to implement SQL datatypes. |
| 37 | +The next section explains what these types are. |
| 38 | +
|
| 39 | +Notice the function implemented has an all-capitals name (which is not |
| 40 | +a standard convention for Rust), dictated by the default SQL |
| 41 | +capitalization rules. |
| 42 | +
|
| 43 | +Currently there is no type casting or type inference performed for the |
| 44 | +function arguments in the SQL program. For example, a call such as |
| 45 | +`CONTAINS_NUMBER('2010-10-20', '5')` will fail at SQL compilation time |
| 46 | +because the first argument has type `CHAR(8)` instead of `VARCHAR`, |
| 47 | +and the second argument has type `CHAR(1)` instead of `INTEGER`. |
| 48 | +
|
| 49 | +Clearly, user-defined functions can pose security problems, since the |
| 50 | +Rust implementation is only verified by the Rust compiler. Such |
| 51 | +functions are expected to have no side-effects, to be deterministic, |
| 52 | +and not to crash. |
| 53 | +
|
| 54 | +## Type representation in Rust |
| 55 | +
|
| 56 | +The following table shows the Rust representation of standard SQL data |
| 57 | +types. A nullable SQL type is represented by the corresponding rust |
| 58 | +`Option<>` type. Notice that some of these types are not standard |
| 59 | +Rust types, but are defined in the DBSP runtime library. |
| 60 | +
|
| 61 | +SQL | Rust |
| 62 | +----------- |
| 63 | +`BOOLEAN` | `bool` |
| 64 | +`TINYINT` | `i8` |
| 65 | +`SMALLINT` | `i16` |
| 66 | +`INT` | `i32` |
| 67 | +`BIGINT` | `i64` |
| 68 | +`DECIMAL`(p, s) | `Decimal` |
| 69 | +`REAL` | `F32` |
| 70 | +`DOUBLE` | `F64` |
| 71 | +`CHAR`(n) | `String` |
| 72 | +`VARCHAR`, `VARCHAR`(n) | `String` |
| 73 | +`BINARY`, `BINARY`(n) | `ByteArray` |
| 74 | +`NULL` | `()` |
| 75 | +`INTERVAL` | `ShortInterval`, `LongInterval` |
| 76 | +`TIME` | `Time` |
| 77 | +`TIMESTAMP` | `Timestamp` |
| 78 | +`DATE` | `Date` |
| 79 | +`ARRAY T` | `Vec<T>` |
| 80 | +
|
| 81 | +Multiple SQL types may be represented by the same Rust type. For |
| 82 | +example, `CHAR`, `CHAR(n)`, `VARCHAR(n)`, and `VARCHAR` are all |
| 83 | +represented by the standard Rust `String` type. |
| 84 | +
|
| 85 | +The SQL family of `INTERVAL` types translates to one of two Rust |
| 86 | +types: `ShortInterval` (representing intervals from days to seconds), |
| 87 | +and `LongInterval` (representing intervals from years to months). |
| 88 | +(Our dialect of SQL does not allow mixing the two kinds of intervals |
| 89 | +in a single expression.) |
| 90 | +
|
| 91 | +In the Rust implementation the function always has to return the type |
| 92 | +`Result<T, Box<dyn std::error::Error>>`, where `T` is the Rust |
| 93 | +equivalent of the expected return type of the SQL function. The Rust |
| 94 | +function should return an `Err` only when the function fails at |
| 95 | +runtime; in this case the returned error can use the source position |
| 96 | +information to indicate where the error has originated in the code. |
| 97 | +The function should return an error only for fatal conditions, similar |
| 98 | +to other SQL functions (e.g., array index out of bounds, arithmetic |
| 99 | +overflows, etc.). |
| 100 | +
|
| 101 | +## Source position information |
| 102 | +
|
| 103 | +The first argument passed to the Rust function is always `pos: |
| 104 | +&SourcePositionRange`. This argument indicates the position in the |
| 105 | +SQL source code of the call to this user-defined function. This |
| 106 | +information can be used to generate better runtime error messages when |
| 107 | +the user-defined function encounters an error. (Note: currently |
| 108 | +Calcite does not provide any source position information, but we hope |
| 109 | +to remedy this state of affairs soon.) |
| 110 | +
|
| 111 | +## Limitations |
| 112 | +
|
| 113 | +There can be only one function with each name. |
| 114 | +
|
| 115 | +Functions cannot have names identical to standard SQL library function |
| 116 | +names. |
| 117 | +
|
| 118 | +Polymorphic functions are not supported. For example, in SQL the |
| 119 | +addition operation operates on any numeric types; such an operation |
| 120 | +cannot be implemented as a single user-defined function. |
| 121 | +
|
| 122 | +The current type checker is very strict, and it requires the function |
| 123 | +arguments to have exactly the specified types. No casts are inserted |
| 124 | +by the compiler. That is why the previous example requires casting |
| 125 | +the string `'YES: 10 NO: 5'` to `VARCHAR`. |
| 126 | +
|
0 commit comments