|
| 1 | + |
| 2 | +# The Flex Backend |
| 3 | + |
| 4 | +**The Flex Backend is experimental. Everything in here is subject to change.** |
| 5 | + |
| 6 | +The "Flex" backend, as the name suggests, allows for a more flexible |
| 7 | +configuration that tells osm2pgsql what OSM data to store in your database and |
| 8 | +exactly where and how. It is configured through a Lua file which |
| 9 | + |
| 10 | +* defines the structure of the output tables and |
| 11 | +* defines functions to map the OSM data to the database data format |
| 12 | + |
| 13 | +See also the example config files in the `flex-config` directory which contain |
| 14 | +lots of comments to get you started. |
| 15 | + |
| 16 | +## The Lua config file |
| 17 | + |
| 18 | +All configuration is done through the `osm2pgsql` object in Lua. It has the |
| 19 | +following fields: |
| 20 | + |
| 21 | +* `osm2pgsql.version`: The version of osm2pgsql as a string. |
| 22 | +* `osm2pgsql.srid`: The SRID set on the command line (with `-l|--latlong`, |
| 23 | + `-m|--merc`, or `-E|--proj`). |
| 24 | +* `osm2pgsql.mode`: Either `"create"` or `"append"` depending on the command |
| 25 | + line options (`--create` or `-a|--append`). |
| 26 | +* `osm2pgsql.stage`: Either `1` or `2` (1st/2nd stage processing of the data). |
| 27 | + See below. |
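As a minimal sketch of how these fields might be read at the top of a config file: the log messages and the refuse-to-append policy below are illustrative assumptions, not something the backend requires.

```lua
-- Runs inside osm2pgsql, which provides the global `osm2pgsql` object.
print('osm2pgsql version: ' .. osm2pgsql.version)
print('target SRID: ' .. osm2pgsql.srid)

-- Hypothetical policy: this particular config only supports initial imports.
if osm2pgsql.mode == 'append' then
    error('this config file does not support --append')
end
```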
| 28 | + |
| 29 | +The following functions are defined: |
| 30 | + |
| 31 | +* `osm2pgsql.define_node_table(name, columns)`: Define a node table. |
| 32 | +* `osm2pgsql.define_way_table(name, columns)`: Define a way table. |
| 33 | +* `osm2pgsql.define_relation_table(name, columns)`: Define a relation table. |
| 34 | +* `osm2pgsql.define_area_table(name, columns)`: Define an area table. |
| 35 | +* `osm2pgsql.define_table()`: Define a table. This is the more flexible |
| 36 | + function behind all the other `define_*_table()` functions. It gives you |
| 37 | + more control than the more convenient other functions. |
| 38 | +* `osm2pgsql.mark_way(id)`: Mark the OSM way with the specified id. This way |
| 39 | + will be processed (again) in stage 2. |
| 40 | +* `osm2pgsql.mark_relation(id)`: Mark the OSM relation with the specified id. |
| 41 | + This relation will be processed (again) in stage 2. |
| 42 | + |
| 43 | +You are expected to define one or more of the following functions: |
| 44 | + |
| 45 | +* `osm2pgsql.process_node()`: Called for each node. |
| 46 | +* `osm2pgsql.process_way()`: Called for each way. |
| 47 | +* `osm2pgsql.process_relation()`: Called for each relation. |
| 48 | + |
| 49 | +### Defining a table |
| 50 | + |
| 51 | +You have to define one or more tables where your data should end up. This |
| 52 | +is done with the `osm2pgsql.define_table()` function or one of the slightly |
| 53 | +more convenient functions `osm2pgsql.define_(node|way|relation|area)_table()`. |
| 54 | + |
| 55 | +Each table is either a *node table*, *way table*, *relation table*, or *area |
| 56 | +table*. This means that the data for that table comes primarily from a node, |
| 57 | +way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object |
| 58 | +id will be stored in the table so that later updates to those OSM objects (or |
| 59 | +deletions) will be properly reflected in the tables. Area tables are special: |
| 60 | +they can contain data derived from ways or from relations. Way ids will be |
| 61 | +stored as is; relation ids will be stored as negative numbers. |
| 62 | + |
| 63 | +With the `osm2pgsql.define_table()` function you can also define tables that |
| 64 | +* don't have any ids (such tables will never be updated by osm2pgsql), |
| 65 | +* take *any OSM object*, in which case the type of object is stored in an |
| 66 | + additional column, or |
| 67 | +* are in a specific PostgreSQL tablespace (set `data_tablespace`) or that |
| 68 | + get their indexes created in a specific tablespace (set `index_tablespace`). |
| 69 | + |
| 70 | +If you are using the `osm2pgsql.define_(node|way|relation|area)_table()` |
| 71 | +convenience functions, osm2pgsql will automatically create an id column named |
| 72 | +`(node|way|relation|area)_id`, respectively. If you want more control over |
| 73 | +the id column(s), use the `osm2pgsql.define_table()` function. |
| 74 | + |
| 75 | +Most tables will have a geometry column. (Currently only zero or one geometry |
| 76 | +columns are supported.) Which geometry types are possible depends on the type |
| 77 | +of the input data. For node tables you are pretty much restricted to point |
| 78 | +geometries, but there is a variety of options for relation tables, for |
| 79 | +instance. |
| 80 | + |
| 81 | +The supported geometry types are: |
| 82 | +* `point`: Point geometry, usually created from nodes. |
| 83 | +* `linestring`: Linestring geometry, usually created from ways. |
| 84 | +* `polygon`: Polygon geometry for area tables, created from ways or relations. |
| 85 | +* `multipoint`: Currently not used. |
| 86 | +* `multilinestring`: Created from (possibly split up) ways or relations. |
| 87 | +* `multipolygon`: For area tables, created from ways or relations. |
| 88 | +* `geometry`: Any kind of geometry. Also used for area tables that should hold |
| 89 | + both polygon and multipolygon geometries. |
| 90 | + |
| 91 | +A column of type `area` will be filled automatically with the area of the |
| 92 | +geometry. This will only work for (multi)polygons. |
| 93 | + |
| 94 | +In addition to id and geometry columns, each table can have any number of |
| 95 | +"normal" columns using any type supported by PostgreSQL. Some types are |
| 96 | +specially recognized by osm2pgsql: |
| 97 | + |
| 98 | +* `text`: A text string. |
| 99 | +* `boolean`: Interprets string values `"true"` and `"yes"` as `true` and all |
| 100 | + others as `false`. Boolean and integer values will also work in the usual |
| 101 | + way. |
| 102 | +* `int2`, `smallint`: 16-bit signed integer. Values too large to fit will be |
| 103 | + truncated in some unspecified way. |
| 104 | +* `int4`, `int`, `integer`: 32-bit signed integer. Values too large to fit will |
| 105 | + be truncated in some unspecified way. |
| 106 | +* `int8`, `bigint`: 64-bit signed integer. Values too large to fit will be |
| 107 | + truncated in some unspecified way. |
| 108 | +* `real`: A real number. |
| 109 | +* `hstore`: Automatically filled from a Lua table with only strings as keys |
| 110 | + and values. |
| 111 | +* `direction`: Interprets values `"true"`, `"yes"`, and `"1"` as 1, `"-1"` as |
| 112 | + `-1`, and everything else as `0`. Useful for `oneway` tags etc. |
| 113 | + |
| 114 | +Instead of the above types you can use any SQL type you want. If you do that |
| 115 | +you have to supply the PostgreSQL string representation for that type when |
| 116 | +adding data to such columns (or Lua nil to set the column to `NULL`). |
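To make the column types above more concrete, here is a sketch of two table definitions using the convenience functions. The table and column names are made up for this example; only the `column`/`type` syntax and the type names come from this document.

```lua
-- A way table; osm2pgsql adds the `way_id` column automatically.
local roads = osm2pgsql.define_way_table('roads', {
    { column = 'name',   type = 'text' },
    { column = 'toll',   type = 'boolean' },    -- "yes"/"true" become true
    { column = 'oneway', type = 'direction' },  -- stored as 1, -1, or 0
    { column = 'tags',   type = 'hstore' },     -- string keys and values only
    { column = 'geom',   type = 'linestring' },
})

-- An area table; osm2pgsql adds the `area_id` column automatically.
local buildings = osm2pgsql.define_area_table('buildings', {
    { column = 'tags', type = 'hstore' },
    { column = 'geom', type = 'geometry' },  -- holds polygons and multipolygons
    { column = 'area', type = 'area' },      -- filled in from the geometry
})
```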
| 117 | + |
| 118 | +### Processing callbacks |
| 119 | + |
| 120 | +You are expected to define one or more of the following functions: |
| 121 | + |
| 122 | +* `osm2pgsql.process_node(object)`: Called for each node. |
| 123 | +* `osm2pgsql.process_way(object)`: Called for each way. |
| 124 | +* `osm2pgsql.process_relation(object)`: Called for each relation. |
| 125 | + |
| 126 | +They all have a single argument of type table (here called `object`) and no |
| 127 | +return value. If you are not interested in all object types, you do not have |
| 128 | +to supply all the functions. |
| 129 | + |
| 130 | +These functions are called for each new or modified OSM object in the input |
| 131 | +file. No function is called for deleted objects; osm2pgsql will automatically |
| 132 | +delete all data in your database tables that was derived from deleted objects. |
| 133 | +Modifications are handled as deletions followed by creation of a "new" object, |
| 134 | +for which the functions are called. |
| 135 | + |
| 136 | +The parameter table (`object`) has the following fields: |
| 137 | + |
| 138 | +* `id`: The id of the node, way, or relation. |
| 139 | +* `tags`: A table with all the tags of the object. |
| 140 | +* `version`, `timestamp`, `changeset`, `uid`, and `user`: Attributes of the |
| 141 | + OSM object. These are only available if the `-x|--extra-attributes` option |
| 142 | + is used and the OSM input file actually contains those fields. The |
| 143 | + `timestamp` contains the time in seconds since the epoch (midnight |
| 144 | + 1970-01-01). |
| 145 | +* `grab_tag(KEY)`: Return the tag value of the specified key and remove the |
| 146 | + tag from the list of tags. (Example: `local name = object:grab_tag('name')`) |
| 147 | + This is often used when you want to store some tags in special columns and |
| 148 | + the rest of the tags in an hstore column. |
| 149 | +* `get_bbox()`: Get the bounding box of the current node or way. (It doesn't |
| 150 | + work for relations currently.) |
| 151 | + |
| 152 | +Ways have the following additional fields: |
| 153 | +* `is_closed`: A boolean telling you whether the way geometry is closed, i.e. |
| 154 | + the first and last node are the same. |
| 155 | +* `nodes`: An array with the way node ids. |
| 156 | + |
| 157 | +Relations have the following additional field: |
| 158 | +* `members`: An array with member tables. Each member table has the fields |
| 159 | + `type` (values `n`, `w`, or `r`), `ref` (member id) and `role`. |
| 160 | + |
| 161 | +You can do anything in those processing functions to decide what to do with |
| 162 | +this data. If you are not interested in that OSM object, simply return from the |
| 163 | +function. If you want to add the OSM object to some table call the `add_row()` |
| 164 | +function on that table: |
| 165 | + |
| 166 | +``` |
| 167 | +-- definition of the table: |
| 168 | +table_pois = osm2pgsql.define_node_table('pois', { |
| 169 | +    { column = 'tags', type = 'hstore' }, |
| 170 | +    { column = 'name', type = 'text' }, |
| 171 | +    { column = 'geom', type = 'point' }, |
| 172 | +}) |
| 173 | +-- ... |
| 174 | +function osm2pgsql.process_node(object) |
| 175 | +    -- ... |
| 176 | +    table_pois:add_row({ |
| 177 | +        tags = object.tags, |
| 178 | +        name = object.tags.name, |
| 179 | +        geom = { create = 'point' } |
| 180 | +    }) |
| 181 | +    -- ... |
| 182 | +end |
| 183 | +``` |
| 184 | + |
| 185 | +The `add_row()` function takes a single table parameter that describes what to |
| 186 | +fill into all the database columns. Any column not mentioned will be set to |
| 187 | +`NULL`. |
| 188 | + |
| 189 | +The geometry column is somewhat special. You have to define a *geometry |
| 190 | +transformation* that will be used to transform the OSM object data into |
| 191 | +a geometry that fits into the geometry column. See the next section for |
| 192 | +details. |
| 193 | + |
| 194 | +Note that you can't set the object id; this will be handled for you behind the |
| 195 | +scenes. |
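The following sketch shows how the object fields described in this section might be combined in a way callback. The table name, column names, and the filter are hypothetical, and it assumes an `int` column accepts a Lua number.

```lua
local lines = osm2pgsql.define_way_table('lines', {
    { column = 'name',       type = 'text' },
    { column = 'tags',       type = 'hstore' },
    { column = 'node_count', type = 'int' },
    { column = 'geom',       type = 'linestring' },
})

function osm2pgsql.process_way(object)
    -- Hypothetical filter: skip untagged ways and closed ways.
    if next(object.tags) == nil or object.is_closed then
        return
    end

    -- Move the name into its own column; grab_tag() removes it from
    -- object.tags, so only the remaining tags end up in the hstore column.
    local name = object:grab_tag('name')

    lines:add_row({
        name       = name,
        tags       = object.tags,
        node_count = #object.nodes,      -- `nodes` is an array of node ids
        geom       = { create = 'line' },
    })
end
```

The id column is not set here because osm2pgsql fills it in itself.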
| 196 | + |
| 197 | +## Geometry transformations |
| 198 | + |
| 199 | +Currently these geometry transformations are supported: |
| 200 | + |
| 201 | +* `{ create = 'point' }`. Only valid for nodes. Creates a 'point' geometry. |
| 202 | +* `{ create = 'line' }`. For ways or relations. Creates a 'linestring' or |
| 203 | + 'multilinestring' geometry. |
| 204 | +* `{ create = 'area' }`. For ways or relations. Creates a 'polygon' or |
| 205 | + 'multipolygon' geometry. |
| 206 | + |
| 207 | +Some of these transformations can have parameters: |
| 208 | + |
| 209 | +* The `line` transformation has an optional parameter `split_at`. If this |
| 210 | + is set to anything other than 0, linestrings longer than this value will |
| 211 | + be split up into parts no longer than this value. |
| 212 | +* The `area` transformation has an optional parameter `multi`. If this is |
| 213 | + set to `false` (the default), a multipolygon geometry will be split up into |
| 214 | + several polygons. If this is set to `true`, the multipolygon geometry is |
| 215 | + kept as one. It depends on this parameter whether you need a polygon |
| 216 | + or multipolygon geometry column. |
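A sketch of how these parameters might be passed follows. It assumes that the parameters sit next to `create` in the same Lua table, that `split_at` is measured in the units of the target projection, and that split parts are stored as separate linestring rows. None of this is spelled out above, so check it against the example configs. The table names and the value `100` are made up.

```lua
local segments = osm2pgsql.define_way_table('road_segments', {
    { column = 'geom', type = 'linestring' },
})

local landuse = osm2pgsql.define_relation_table('landuse', {
    { column = 'kind', type = 'text' },
    { column = 'geom', type = 'multipolygon' },  -- needed because multi = true
})

function osm2pgsql.process_way(object)
    if object.tags.highway then
        segments:add_row({
            -- Assumed syntax: split into parts no longer than 100 units.
            geom = { create = 'line', split_at = 100 },
        })
    end
end

function osm2pgsql.process_relation(object)
    if object.tags.type == 'multipolygon' and object.tags.landuse then
        landuse:add_row({
            kind = object.tags.landuse,
            -- Keep the multipolygon as one geometry instead of splitting it.
            geom = { create = 'area', multi = true },
        })
    end
end
```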
| 217 | + |
| 218 | +If no geometry transformation is set, osm2pgsql will, in some cases, assume |
| 219 | +a default transformation. These are the defaults: |
| 220 | + |
| 221 | +* For node tables, a `point` column gets the node location. |
| 222 | +* For way tables, a `linestring` column gets the complete way geometry, a |
| 223 | + `polygon` column gets the way geometry as area (if the way is closed and |
| 224 | + the area is valid). |
| 225 | + |
| 226 | +## Stages |
| 227 | + |
| 228 | +Osm2pgsql processes the data in up to two stages. You can mark ways or |
| 229 | +relations in stage 1 for processing in stage 2 by calling |
| 230 | +`osm2pgsql.mark_way(id)` or `osm2pgsql.mark_relation(id)`, respectively. If you |
| 231 | +don't mark any objects, nothing will be done in stage 2. |
| 232 | + |
| 233 | +You can look at `osm2pgsql.stage` to see in which stage you are. |
| 234 | + |
| 235 | +In stage 1 you can only look at each OSM object on its own. You can see |
| 236 | +its id and tags (and possibly timestamp, changeset, user, etc.), but you don't |
| 237 | +know how this OSM object relates to other OSM objects (for instance whether a |
| 238 | +way you are looking at is a member in a relation). If this is enough to decide |
| 239 | +in which database table(s) and with what data an OSM object should end up, |
| 240 | +then you can process the OSM object in stage 1. If, on the other hand, you |
| 241 | +need some extra information, you have to defer processing to the second stage. |
| 242 | + |
| 243 | +You want to do all the processing you can in stage 1, because it is faster |
| 244 | +and there is less memory overhead. For most use cases, stage 1 is enough. If |
| 245 | +it is not, use stage 1 to store information about OSM objects you will need |
| 246 | +in stage 2 in some global variable. In stage 2 you can read this information |
| 247 | +again and use it to decide where and how to store the data in the database. |
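One way the two stages could fit together is sketched below: route relations seen in stage 1 record their `ref` for each member way and mark that way, and the way callback writes rows only in stage 2. The table layout and the `refs_by_way` variable are hypothetical; `osm2pgsql.stage`, `osm2pgsql.mark_way()`, and the `members` fields are the ones described in this document.

```lua
local route_ways = osm2pgsql.define_way_table('route_ways', {
    { column = 'name',       type = 'text' },
    { column = 'route_refs', type = 'text' },
    { column = 'geom',       type = 'linestring' },
})

-- Filled in stage 1, read in stage 2: way id -> list of route refs.
local refs_by_way = {}

function osm2pgsql.process_relation(object)
    if object.tags.type ~= 'route' or not object.tags.ref then
        return
    end
    for _, member in ipairs(object.members) do
        if member.type == 'w' then
            local refs = refs_by_way[member.ref] or {}
            refs[#refs + 1] = object.tags.ref
            refs_by_way[member.ref] = refs
            osm2pgsql.mark_way(member.ref)  -- process this way again in stage 2
        end
    end
end

function osm2pgsql.process_way(object)
    -- In stage 1 nothing is known yet about relation membership.
    if osm2pgsql.stage == 1 then
        return
    end
    local refs = refs_by_way[object.id]
    if not refs then
        return
    end
    route_ways:add_row({
        name       = object.tags.name,
        route_refs = table.concat(refs, ','),
        geom       = { create = 'line' },
    })
end
```

In this sketch, ways that belong to no route relation are never written. A config that also needs those ways would have to handle them in stage 1.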
| 248 | + |
| 249 | +## Command line options |
| 250 | + |
| 251 | +Use the command line option `-O flex` or `--output=flex` to enable the flex |
| 252 | +backend and the `-S|--style` option to set the Lua config file. |
| 253 | + |
| 254 | +The following command line options have a somewhat different meaning when |
| 255 | +using the flex backend: |
| 256 | + |
| 257 | +* `-p|--prefix`: The table names you are setting in your Lua config files |
| 258 | + will *not* get this prefix. You can easily add the prefix in the Lua config |
| 259 | + yourself. |
| 260 | +* `-S|--style`: Use this to specify the Lua config file. Without it, osm2pgsql |
| 261 | + will not work, because it will try to read the default style file. |
| 262 | +* `-G|--multi-geometry` is not used. Instead, set the type of the geometry |
| 263 | + column to the type you want, i.e. `polygon` vs. `multipolygon`. |
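Since `-p|--prefix` is not applied to flex table names, a prefix can be added in the config itself, for instance like this (the prefix value and table layout are only examples):

```lua
-- Hypothetical prefix; change it in one place to rename all tables.
local prefix = 'planet_osm_'

local pois = osm2pgsql.define_node_table(prefix .. 'pois', {
    { column = 'name', type = 'text' },
    { column = 'geom', type = 'point' },
})
```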
| 264 | + |
| 265 | +The following command line options are ignored by `osm2pgsql` when using the |
| 266 | +flex backend, because they don't make sense in that context: |
| 267 | + |
| 268 | +* `-k|--hstore` |
| 269 | +* `-j|--hstore-all` |
| 270 | +* `-z|--hstore-column` |
| 271 | +* `--hstore-match-only` |
| 272 | +* `--hstore-add-index` |
| 273 | +* `-K|--keep-coastlines` (Coastline tags are not handled specially in the |
| 274 | + flex backend.) |
| 275 | +* `--tag-transform-script` (Set the Lua config file with the `-S|--style` |
| 276 | + option.) |
| 277 | +* `-G|--multi-geometry` (Use the `multi` option on the geometry transformation |
| 278 | + instead.) |
| 279 | +* The command line options to set the tablespace are ignored by the flex |
| 280 | + backend. Instead, use the `data_tablespace` or `index_tablespace` options |
| 281 | + when defining your table. |
| 282 | + |