
Commit 9c16722

New flex backend
This introduces a new "flex" backend which allows much more flexibility in choosing the database format and the transformation from OSM data to the database format. The user defines all this in a Lua script.
1 parent 76a3e78 commit 9c16722


49 files changed

+6268
-4
lines changed

README.md

Lines changed: 6 additions & 0 deletions
@@ -176,6 +176,12 @@ null backend for testing. For flexibility a new [multi](docs/multi.md)
176176
backend is also available which allows the configuration of custom
177177
PostgreSQL tables instead of those provided in the pgsql backend.
178178

179+
Also available is the new [flex](docs/flex.md) backend. It is much more
180+
flexible than the other backends. IT IS CURRENTLY EXPERIMENTAL AND SUBJECT
181+
TO CHANGE. The flex backend is only available if you have compiled osm2pgsql
182+
with Lua support. More details at
183+
https://github.com/openstreetmap/osm2pgsql/issues/1036 .
184+
179185
## LuaJIT support ##
180186

181187
To speed up Lua tag transformations, [LuaJIT](https://luajit.org/) can be optionally

docs/flex.md

Lines changed: 282 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,282 @@
1+
2+
# The Flex Backend
3+
4+
**The Flex Backend is experimental. Everything in here is subject to change.**
5+
6+
The "Flex" backend, as the name suggests, allows for a more flexible
7+
configuration that tells osm2pgsql what OSM data to store in your database and
8+
exactly where and how. It is configured through a Lua file which
9+
10+
* defines the structure of the output tables and
11+
* defines functions to map the OSM data to the database data format
12+
13+
See also the example config files in the `flex-config` directory which contain
14+
lots of comments to get you started.
15+
16+
## The Lua config file
17+
18+
All configuration is done through the `osm2pgsql` object in Lua. It has the
19+
following fields:
20+
21+
* `osm2pgsql.version`: The version of osm2pgsql as a string.
22+
* `osm2pgsql.srid`: The SRID set on the command line (with `-l|--latlong`,
23+
`-m|--merc`, or `-E|--proj`).
24+
* `osm2pgsql.mode`: Either `"create"` or `"append"` depending on the command
25+
line options (`--create` or `-a|--append`).
26+
* `osm2pgsql.stage`: Either `1` or `2` (1st/2nd stage processing of the data).
27+
See below.
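
As a rough illustration of these fields, a config file could log them when it is loaded. This is only a sketch: it assumes that output from Lua's standard `print` function is visible when osm2pgsql runs the script.

```lua
-- Sketch only: report how osm2pgsql was invoked, using the fields listed above.
print('osm2pgsql version: ' .. osm2pgsql.version)
print('SRID: ' .. tostring(osm2pgsql.srid))
print('mode: ' .. osm2pgsql.mode)              -- "create" or "append"
print('stage: ' .. tostring(osm2pgsql.stage))  -- 1 or 2
```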
28+
29+
The following functions are defined:
30+
31+
* `osm2pgsql.define_node_table(name, columns)`: Define a node table.
32+
* `osm2pgsql.define_way_table(name, columns)`: Define a way table.
33+
* `osm2pgsql.define_relation_table(name, columns)`: Define a relation table.
34+
* `osm2pgsql.define_area_table(name, columns)`: Define an area table.
35+
* `osm2pgsql.define_table()`: Define a table. This is the more flexible
36+
function behind all the other `define_*_table()` functions. It gives you
37+
more control than the more convenient other functions.
38+
* `osm2pgsql.mark_way(id)`: Mark the OSM way with the specified id. This way
39+
will be processed (again) in stage 2.
40+
* `osm2pgsql.mark_relation(id)`: Mark the OSM relation with the specified id.
41+
This relation will be processed (again) in stage 2.
42+
43+
You are expected to define one or more of the following functions:
44+
45+
* `osm2pgsql.process_node()`: Called for each node.
46+
* `osm2pgsql.process_way()`: Called for each way.
47+
* `osm2pgsql.process_relation()`: Called for each relation.
48+
49+
### Defining a table
50+
51+
You have to define one or more tables where your data should end up. This
52+
is done with the `osm2pgsql.define_table()` function or one of the slightly
53+
more convenient functions `osm2pgsql.define_(node|way|relation|area)_table()`.
54+
55+
Each table is either a *node table*, *way table*, *relation table*, or *area
56+
table*. This means that the data for that table comes primarily from a node,
57+
way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object
58+
id will be stored in the table so that later updates to those OSM objects (or
59+
deletions) will be properly reflected in the tables. Area tables are special:
60+
they can contain data derived from ways or from relations. Way ids will be
61+
stored as is, relation ids will be stored as negative numbers.
62+
63+
With the `osm2pgsql.define_table()` function you can also define tables that
64+
* don't have any ids, but those tables will never be updated by osm2pgsql
65+
* take *any OSM object*; in this case the type of object is stored in an
66+
additional column.
67+
* are in a specific PostgreSQL tablespace (set `data_tablespace`) or that
68+
get their indexes created in a specific tablespace (set `index_tablespace`).
69+
70+
If you are using the `osm2pgsql.define_(node|way|relation|area)_table()`
71+
convenience functions, osm2pgsql will automatically create an id column named
72+
`(node|way|relation|area)_id`, respectively. If you want more control over
73+
the id column(s), use the `osm2pgsql.define_table()` function.
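
A minimal sketch of the convenience functions follows. The table and column names are hypothetical; the column list uses the same format as the example later in this document.

```lua
-- Hypothetical way table; osm2pgsql adds a `way_id` column automatically.
local highways = osm2pgsql.define_way_table('highways', {
    { column = 'name', type = 'text' },
    { column = 'geom', type = 'linestring' },
})

-- Hypothetical area table; the automatic id column is `area_id`.
-- As described above, way ids are stored as is and relation ids as
-- negative numbers.
local buildings = osm2pgsql.define_area_table('buildings', {
    { column = 'name', type = 'text' },
    { column = 'geom', type = 'geometry' },
})
```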
74+
75+
Most tables will have a geometry column. (Currently only zero or one geometry
76+
columns are supported.) The types of the geometry column possible depend on
77+
the type of the input data. For node tables you are pretty much restricted
78+
to point geometries, but there is a variety of options for relation tables
79+
for instance.
80+
81+
The supported geometry types are:
82+
* `point`: Point geometry, usually created from nodes.
83+
* `linestring`: Linestring geometry, usually created from ways.
84+
* `polygon`: Polygon geometry for area tables, created from ways or relations.
85+
* `multipoint`: Currently not used.
86+
* `multilinestring`: Created from (possibly split up) ways or relations.
87+
* `multipolygon`: For area tables, created from ways or relations.
88+
* `geometry`: Any kind of geometry. Also used for area tables that should hold
89+
both polygon and multipolygon geometries.
90+
91+
A column of type `area` will be filled automatically with the area of the
92+
geometry. This will only work for (multi)polygons.
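
For instance, an area table could combine a polygon column with an `area` column. This is a sketch with hypothetical table and column names; the `landuse=forest` filter is only an example.

```lua
local forests = osm2pgsql.define_area_table('forests', {
    { column = 'geom', type = 'polygon' },
    { column = 'size', type = 'area' },  -- filled in automatically by osm2pgsql
})

function osm2pgsql.process_way(object)
    -- Only closed ways can form an area.
    if object.tags.landuse == 'forest' and object.is_closed then
        forests:add_row({
            geom = { create = 'area' }
            -- `size` is not set here; it is derived from the geometry.
        })
    end
end
```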
93+
94+
In addition to id and geometry columns, each table can have any number of
95+
"normal" columns using any type supported by PostgreSQL. Some types are
96+
specially recognized by osm2pgsql:
97+
98+
* `text`: A text string.
99+
* `boolean`: Interprets string values `"true"`, `"yes"` as `true` and all
100+
others as `false`. Boolean and integer values will also work in the usual
101+
way.
102+
* `int2`, `smallint`: 16bit signed integer. Values too large to fit will be
103+
truncated in some unspecified way.
104+
* `int4`, `int`, `integer`: 32bit signed integer. Values too large to fit will
105+
be truncated in some unspecified way.
106+
* `int8`, `bigint`: 64bit signed integer. Values too large to fit will be
107+
truncated in some unspecified way.
108+
* `real`: A real number.
109+
* `hstore`: Automatically filled from a Lua table with only strings as keys
110+
and values.
111+
* `direction`: Interprets values `"true"`, `"yes"`, and `"1"` as 1, `"-1"` as
112+
`-1`, and everything else as `0`. Useful for `oneway` tags etc.
113+
114+
Instead of the above types you can use any SQL type you want. If you do that
115+
you have to supply the PostgreSQL string representation for that type when
116+
adding data to such columns (or Lua nil to set the column to `NULL`).
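
The following sketch shows how several of these column types might be used together. The table name, column names, and tag choices are hypothetical, and the conversion of `lanes` with `tonumber` assumes the tag holds a whole number.

```lua
local roads = osm2pgsql.define_way_table('roads', {
    { column = 'name',   type = 'text' },
    { column = 'lanes',  type = 'int' },
    { column = 'toll',   type = 'boolean' },
    { column = 'oneway', type = 'direction' },
    { column = 'tags',   type = 'hstore' },
    { column = 'geom',   type = 'linestring' },
})

function osm2pgsql.process_way(object)
    if not object.tags.highway then
        return
    end
    roads:add_row({
        name   = object.tags.name,
        lanes  = tonumber(object.tags.lanes),  -- nil (NULL) if the tag is missing or not numeric
        toll   = object.tags.toll,             -- "yes" and "true" are stored as true
        oneway = object.tags.oneway,           -- "yes" becomes 1, "-1" becomes -1
        tags   = object.tags,                  -- string keys and values only
        geom   = { create = 'line' }
    })
end
```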
117+
118+
### Processing callbacks
119+
120+
You are expected to define one or more of the following functions:
121+
122+
* `osm2pgsql.process_node(object)`: Called for each node.
123+
* `osm2pgsql.process_way(object)`: Called for each way.
124+
* `osm2pgsql.process_relation(object)`: Called for each relation.
125+
126+
They all have a single argument of type table (here called `object`) and no
127+
return value. If you are not interested in all object types, you do not have
128+
to supply all the functions.
129+
130+
These functions are called for each new or modified OSM object in the input
131+
file. No function is called for deleted objects; osm2pgsql will automatically
132+
delete all data in your database tables that was derived from deleted objects.
133+
Modifications are handled as deletions followed by creation of a "new" object,
134+
for which the functions are called.
135+
136+
The parameter table (`object`) has the following fields:
137+
138+
* `id`: The id of the node, way, or relation.
139+
* `tags`: A table with all the tags of the object.
140+
* `version`, `timestamp`, `changeset`, `uid`, and `user`: Attributes of the
141+
OSM object. These are only available if the `-x|--extra-attributes` option
142+
is used and the OSM input file actually contains those fields. The
143+
`timestamp` contains the time in seconds since the epoch (midnight
144+
1970-01-01).
145+
* `grab_tag(KEY)`: Return the tag value of the specified key and remove the
146+
tag from the list of tags. (Example: `local name = object:grab_tag('name')`)
147+
This is often used when you want to store some tags in special columns and
148+
the rest of the tags in an hstore column.
149+
* `get_bbox()`: Get the bounding box of the current node or way. (It doesn't
150+
work for relations currently.)
151+
152+
Ways have the following additional fields:
153+
* `is_closed`: A boolean telling you whether the way geometry is closed, i.e.
154+
the first and last node are the same.
155+
* `nodes`: An array with the way node ids.
156+
157+
Relations have the following additional field:
158+
* `members`: An array with member tables. Each member table has the fields
159+
`type` (values `n`, `w`, or `r`), `ref` (member id) and `role`.
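
To get a feel for these fields, the callbacks can simply print what they receive. The sketch below writes nothing to the database; it assumes that `print` output is visible when osm2pgsql runs.

```lua
function osm2pgsql.process_way(object)
    -- grab_tag() returns the value and removes the tag from object.tags.
    local name = object:grab_tag('name')
    print('way ' .. tostring(object.id)
        .. ': ' .. tostring(#object.nodes) .. ' nodes'
        .. ', closed: ' .. tostring(object.is_closed)
        .. ', name: ' .. tostring(name))
end

function osm2pgsql.process_relation(object)
    for _, member in ipairs(object.members) do
        -- member.type is 'n', 'w', or 'r'
        print('relation ' .. tostring(object.id)
            .. ' has member ' .. member.type .. tostring(member.ref)
            .. ' with role "' .. tostring(member.role) .. '"')
    end
end
```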
160+
161+
You can do anything in those processing functions to decide what to do with
162+
this data. If you are not interested in that OSM object, simply return from the
163+
function. If you want to add the OSM object to some table call the `add_row()`
164+
function on that table:
165+
166+
```
-- definition of the table:
table_pois = osm2pgsql.define_node_table('pois', {
    { column = 'tags', type = 'hstore' },
    { column = 'name', type = 'text' },
    { column = 'geom', type = 'point' },
})
...
function osm2pgsql.process_node(object)
    ...
    table_pois:add_row({
        tags = object.tags,
        name = object.tags.name,
        geom = { create = 'point' }
    })
    ...
end
```
184+
185+
The `add_row()` function takes a single table parameter that describes what to
186+
fill into all the database columns. Any column not mentioned will be set to
187+
`NULL`.
188+
189+
The geometry column is somewhat special. You have to define a *geometry
190+
transformation* that will be used to transform the OSM object data into
191+
a geometry that fits into the geometry column. See the next section for
192+
details.
193+
194+
Note that you can't set the object id; this will be handled for you behind the
195+
scenes.
196+
197+
## Geometry transformations
198+
199+
Currently these geometry transformations are supported:
200+
201+
* `{ create = 'point'}`. Only valid for nodes, create a 'point' geometry.
202+
* `{ create = 'line'}`. For ways or relations. Create a 'linestring' or
203+
'multilinestring' geometry.
204+
* `{ create = 'area'}`. For ways or relations. Create a 'polygon' or
205+
'multipolygon' geometry.
206+
207+
Some of these transformations can have parameters:
208+
209+
* The `line` transformation has an optional parameter `split_at`. If this
210+
is set to anything other than 0, linestrings longer than this value will
211+
be split up into parts no longer than this value.
212+
* The `area` transformation has an optional parameter `multi`. If this is
213+
set to `false` (the default), a multipolygon geometry will be split up into
214+
several polygons. If this is set to `true`, the multipolygon geometry is
215+
kept as one. It depends on this parameter whether you need a polygon
216+
or multipolygon geometry column.
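
A sketch of both parameters is shown below. It assumes that the parameters are given as extra fields in the same Lua table as `create`. The table names and the `split_at` value of 100 are arbitrary, and the unit of `split_at` is not specified in this document.

```lua
local lines = osm2pgsql.define_way_table('lines', {
    { column = 'geom', type = 'linestring' },
})

local areas = osm2pgsql.define_area_table('areas', {
    -- multi = true keeps multipolygons whole, so the column must hold them.
    { column = 'geom', type = 'multipolygon' },
})

function osm2pgsql.process_way(object)
    if object.tags.highway then
        -- Split long linestrings into parts no longer than 100.
        lines:add_row({ geom = { create = 'line', split_at = 100 } })
    end
end

function osm2pgsql.process_relation(object)
    if object.tags.type == 'multipolygon' then
        areas:add_row({ geom = { create = 'area', multi = true } })
    end
end
```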
217+
218+
If no geometry transformation is set, osm2pgsql will, in some cases, assume
219+
a default transformation. These are the defaults:
220+
221+
* For node tables, a `point` column gets the node location.
222+
* For way tables, a `linestring` column gets the complete way geometry, a
223+
`polygon` column gets the way geometry as area (if the way is closed and
224+
the area is valid).
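
Relying on the default for a node table could look like the sketch below. The table name and the `railway=station` filter are hypothetical.

```lua
local stations = osm2pgsql.define_node_table('stations', {
    { column = 'name', type = 'text' },
    { column = 'geom', type = 'point' },
})

function osm2pgsql.process_node(object)
    if object.tags.railway == 'station' then
        -- No `geom` entry: per the defaults above, the point column of a
        -- node table gets the node location.
        stations:add_row({ name = object.tags.name })
    end
end
```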
225+
226+
## Stages
227+
228+
Osm2pgsql processes the data in up to two stages. You can mark ways or
229+
relations in stage 1 for processing in stage 2 by calling
230+
`osm2pgsql.mark_way(id)` or `osm2pgsql.mark_relation(id)`, respectively. If you
231+
don't mark any objects, nothing will be done in stage 2.
232+
233+
You can look at `osm2pgsql.stage` to see in which stage you are.
234+
235+
In stage 1 you can only look at each OSM object on its own. You can see
236+
its id and tags (and possibly timestamp, changeset, user, etc.), but you don't
237+
know how this OSM object relates to other OSM objects (for instance whether a
238+
way you are looking at is a member in a relation). If this is enough to decide
239+
in which database table(s) and with what data an OSM object should end up in,
240+
then you can process the OSM object in stage 1. If, on the other hand, you
241+
need some extra information, you have to defer processing to the second stage.
242+
243+
You want to do all the processing you can in stage 1, because it is faster
244+
and there is less memory overhead. For most use cases, stage 1 is enough. If
245+
it is not, use stage 1 to store information about OSM objects you will need
246+
in stage 2 in some global variable. In stage 2 you can read this information
247+
again and use it to decide where and how to store the data in the database.
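
The two-stage pattern described above can be sketched roughly as follows. This is an illustration and not a tested configuration: the table name, the choice of route relations, and the `way_refs` variable are hypothetical. It assumes that relations are seen in stage 1 and that marked ways are passed to `process_way()` again with `osm2pgsql.stage` set to `2`.

```lua
local route_ways = osm2pgsql.define_way_table('route_ways', {
    { column = 'refs', type = 'text' },
    { column = 'geom', type = 'linestring' },
})

-- State carried from stage 1 to stage 2: way id -> list of route refs.
local way_refs = {}

function osm2pgsql.process_relation(object)
    if object.tags.type ~= 'route' or not object.tags.ref then
        return
    end
    for _, member in ipairs(object.members) do
        if member.type == 'w' then
            way_refs[member.ref] = way_refs[member.ref] or {}
            table.insert(way_refs[member.ref], object.tags.ref)
            osm2pgsql.mark_way(member.ref)  -- process this way again in stage 2
        end
    end
end

function osm2pgsql.process_way(object)
    if osm2pgsql.stage == 1 then
        return  -- relation membership is not known yet
    end
    local refs = way_refs[object.id]
    if refs then
        route_ways:add_row({
            refs = table.concat(refs, ','),
            geom = { create = 'line' }
        })
    end
end
```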
248+
249+
## Command line options
250+
251+
Use the command line option `-O flex` or `--output=flex` to enable the flex
252+
backend and the `-S|--style` option to set the Lua config file.
253+
254+
The following command line options have a somewhat different meaning when
255+
using the flex backend:
256+
257+
* `-p|--prefix`: The table names you are setting in your Lua config files
258+
will *not* get this prefix. You can easily add the prefix in the Lua config
259+
yourself.
260+
* `-S|--style`: Use this to specify the Lua config file. Without it, osm2pgsql
261+
will not work, because it will try to read the default style file.
262+
* `-G|--multi-geometry` is not used. Instead, set the type of the geometry
263+
column to the type you want, i.e. `polygon` vs. `multipolygon`.
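
Because the `-p|--prefix` option is not applied to table names, a prefix can be added in the config file itself. In this sketch the prefix value and the table name are hypothetical.

```lua
-- Hypothetical prefix, kept in one place so it is easy to change.
local prefix = 'planet_osm_'

local points = osm2pgsql.define_node_table(prefix .. 'point', {
    { column = 'tags', type = 'hstore' },
    { column = 'geom', type = 'point' },
})
```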
264+
265+
The following command line options are ignored by `osm2pgsql` when using the
266+
flex backend, because they don't make sense in that context:
267+
268+
* `-k|--hstore`
269+
* `-j|--hstore-all`
270+
* `-z|--hstore-column`
271+
* `--hstore-match-only`
272+
* `--hstore-add-index`
273+
* `-K|--keep-coastlines` (Coastline tags are not handled specially in the
274+
flex backend.)
275+
* `--tag-transform-script` (Set the Lua config file with the `-S|--style`
276+
option.)
277+
* `-G|--multi-geometry` (Use the `multi` option on the geometry transformation
278+
instead.)
279+
* The command line options to set the tablespace are ignored by the flex
280+
backend; instead, use the `data_tablespace` or `index_tablespace` options
281+
when defining your table.
282+

docs/osm2pgsql.1

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -156,7 +156,8 @@ Specifies the output back\-end or database schema to use. Currently
156156
osm2pgsql supports \fBpgsql\fR, \fBgazetteer\fR and \fBnull\fR. \fBpgsql\fR is
157157
the default output back\-end / schema and is optimized for rendering with Mapnik.
158158
\fBgazetteer\fR is a db schema optimized for geocoding and is used by Nominatim.
159-
The \fBmulti\fR backend allows more customization of tables.
159+
The \fBmulti\fR backend allows more customization of tables. The experimental
160+
\fBflex\fR backend allows more flexible configuration.
160161
\fBnull\fR does not write any output and is only useful for testing or with
161162
\-\-slim for creating slim tables.
162163
.TP

flex-config/README.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
2+
# Flex Backend Configuration
3+
4+
**The Flex Backend is experimental. Everything in here is subject to change.**
5+
6+
See the [Flex Backend Documentation](../docs/flex.md) for all the details.
7+
8+
## Example config files
9+
10+
This directory contains example config files for the flex backend. All config
11+
files are documented extensively with inline comments.
12+
13+
If you are learning about the flex backend, read the config files in the
14+
following order (from easiest to understand to the more complex ones):
15+
16+
1. [simple.lua](simple.lua) -- Introduction to config file format
17+
2. [geometries.lua](geometries.lua) -- Geometry column options
18+
3. [data-types.lua](data-types.lua) -- Data types and how to handle them
19+
20+
After that you can dive into more advanced topics:
21+
22+
* [route-relations.lua](route-relations.lua) -- Use multi-stage processing
23+
to bring tags from relations to member ways
24+
* [unitable.lua](unitable.lua) -- Put all OSM data into a single table
25+
* [places.lua](places.lua) -- Creating JSON/JSONB columns
26+
27+
The "default" configuration is a full-featured but simple configuration that
28+
is a good starting point for your own real-world configuration:
29+
30+
* [default-config.lua](default-config.lua)
31+
32+
The following config file tries to be more or less compatible with the old
33+
osm2pgsql C transforms:
34+
35+
* [compatible.lua](compatible.lua)
36+
37+
## Dependencies
38+
39+
Some of the example files use the `inspect` Lua library to show debugging
40+
output. It is not needed for the actual functionality of the examples, so if
41+
you don't have the library, you can remove all uses of `inspect` and the
42+
scripts should still work.
43+
44+
The library is available from [the
45+
source](https://github.com/kikito/inspect.lua) or using
46+
[LuaRocks](https://luarocks.org/modules/kikito/inspect). Debian/Ubuntu users
47+
can install the `lua-inspect` package.
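
One way to make a config file work with or without the library is to load it defensively. This sketch is not taken from the example files; the fallback to `tostring` is an assumption about what is acceptable as debugging output.

```lua
-- Load `inspect` if it is installed; otherwise fall back to tostring,
-- which prints less detail but keeps the script working.
local ok, inspect = pcall(require, 'inspect')
if not ok then
    inspect = tostring
end

function osm2pgsql.process_node(object)
    -- Debugging output only; nothing is written to the database here.
    print(inspect(object.tags))
end
```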
48+
