Commit 546a584

Improve the metrics documentation autogenerator.

This uses a template file instead of pure Python, which makes it easier to read the template. It also uses a regular expression, instead of a prefix string, to choose the metrics for each section, which makes it possible to merge metrics that don't start with the same prefix, in turn making the documentation easier to understand.

Signed-off-by: Ben Pfaff <blp@feldera.com>
1 parent b0cc726 commit 546a584
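
To illustrate the second change described in the commit message, here is a minimal sketch of why a regular expression can merge metrics with different prefixes into one section where a prefix string cannot. The helper names `select_by_prefix` and `select_by_regex` are hypothetical and do not appear in the commit; the metric names come from the tables in the diff, and the pattern mirrors the `{{dbsp_|compaction_}}` placeholder in the template.

```python
import re

# A few metric names taken from the tables in this commit.
METRICS = [
    "compaction_stall_duration_seconds",
    "dbsp_step_latency_seconds",
    "dbsp_steps_total",
    "files_created_total",
    "storage_read_block_bytes",
]


def select_by_prefix(names, prefix):
    """Old approach: keep only names that start with `<prefix>_`."""
    return sorted(n for n in names if n.startswith(f"{prefix}_"))


def select_by_regex(names, pattern):
    """New approach: keep every name the pattern matches at its start."""
    regex = re.compile(pattern)
    return sorted(n for n in names if regex.match(n))


print(select_by_prefix(METRICS, "dbsp"))
print(select_by_regex(METRICS, "dbsp_|compaction_"))
```

Because `re.match` anchors only at the start of the string, a pattern such as `dbsp_|compaction_` behaves like a union of prefixes, which is what lets `compaction_stall_duration_seconds` join the DBSP table.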

4 files changed: +172 -155 lines changed

docs.feldera.com/docs/operations/metrics.md

Lines changed: 21 additions & 43 deletions
@@ -1,14 +1,9 @@
-<!-- This file is automatically generated. Do not edit!
-
-To regenerate this file, start a Feldera pipeline (any one will do)
-and obtain Prometheus metrics for it with, for example, `fda metrics
---format=prometheus` or `curl https://server/v0/metrics`, then run
-`metrics.py` from the same directory as this, like so:
+# Pipeline Metrics

-metrics.py < metrics.txt > metrics.md
--->
+<!-- This file is automatically generated. Do not edit!

-# Metrics
+To update the documentation, please edit metrics.md.in instead and
+then regenerate this file using the instructions in that file. -->

 This reference lists all of the metrics that Feldera exports through
 its `/metrics` endpoint in [Prometheus exposition format]. It is
@@ -19,10 +14,15 @@ All of the metrics exported by a particular Feldera pipeline are
 labeled with the pipeline's UUID as `pipeline`. Some metrics have
 additional labels, as documented below.

-[Prometheus exposition format]: https://prometheus.io/docs/instrumenting/exposition_formats
+See [Monitoring and Profiling] for a guide to setting up Prometheus
+and Grafana with Feldera. The [Feldera template dashboard] is a
+sample Grafana dashboard for Feldera.

+[Prometheus exposition format]: https://prometheus.io/docs/instrumenting/exposition_formats
+[Monitoring and Profiling]: /tutorials/monitoring
+[Feldera template dashboard]: https://raw.githubusercontent.com/feldera/feldera/main/deploy/grafana_dashboard.json

-## Process Metrics
+# Process Metrics

 These metrics report statistics for a running Feldera pipeline
 process. When a pipeline process is killed and restarts from a
@@ -64,18 +64,19 @@ which Feldera is built.

 | Name | Type | Description |
 | :--- | :--- | :---------- |
+| `compaction_stall_duration_seconds` |counter | Time in seconds a worker was stalled waiting for more merges to complete. |
 | `dbsp_operator_checkpoint_latency_seconds` |histogram | Latency of individual operator checkpoint operations in seconds. (Because checkpoints run in parallel across workers, these will not add to `feldera_checkpoint_latency_seconds`.) |
 | `dbsp_step_latency_seconds` |histogram | Latency of DBSP steps over the last 60 seconds or 1000 steps, whichever is less, in seconds |
 | `dbsp_steps_total` |counter | Total number of DBSP steps executed. |

 ## Record Processing

-These metrics report the status of record input, processing, and
-output as a whole. They are maintained consistently across checkpoint
-and resume.
+These metrics report overall counts of records as they pass through
+the pipeline. They accumulate across checkpoint and resume.

 | Name | Type | Description |
 | :--- | :--- | :---------- |
+| `output_buffered_batches` |gauge | Number of batches of records currently buffered by the output connector. |
 | `records_input_buffered` |gauge | Total number of records currently buffered by all endpoints. |
 | `records_input_total` |counter | Total number of input records received from all connectors. |
 | `records_late_total` |counter | Number of records dropped due to LATENESS annotations. |
@@ -88,6 +89,8 @@ to work with data larger than memory.

 | Name | Type | Description |
 | :--- | :--- | :---------- |
+| `files_created_total` |counter | Total number of files created. |
+| `files_deleted_total` |counter | Total number of files deleted. |
 | `storage_read_block_bytes` |histogram | Sizes in bytes of blocks read from storage. |
 | `storage_read_latency_seconds` |histogram | Read latency for storage blocks in seconds |
 | `storage_sync_latency_seconds` |histogram | Sync latency in seconds |
@@ -107,8 +110,8 @@ These metrics report the status of the pipeline.
 These metrics are per-input connector, labeled with `endpoint` set to
 the name of the input connector, which is either the name assigned in
 the SQL program or automatically generated as `unnamed-<number>`,
-where `<number>` is 1 for the first connector for a given table, 2 for
-the second, and so on.
+where `<number>` counts starting from 1 for the first connector for a
+given table.

 | Name | Type | Description |
 | :--- | :--- | :---------- |
@@ -123,8 +126,8 @@ the second, and so on.
 These metrics are per-output connector, labeled with `endpoint` set to
 the name of the output connector, which is either the name assigned in
 the SQL program or automatically generated as `unnamed-<number>`,
-where `<number>` is 1 for the first connector for a given view,
-2 for the second, and so on.
+where `<number>` counts starting from 1 for the first connector for a
+given view.

 | Name | Type | Description |
 | :--- | :--- | :---------- |
@@ -134,28 +137,3 @@ where `<number>` is 1 for the first connector for a given view,
 | `output_connector_errors_transport_total` |counter | Total number of errors encountered at the transport layer sending records. |
 | `output_connector_records_total` |counter | Total number of records sent by the output connector. |

-## Merge Status
-
-These metrics reports the status of the merger.
-
-| Name | Type | Description |
-| :--- | :--- | :---------- |
-| `compaction_stall_duration_seconds` |counter | Time in seconds a worker was stalled waiting for more merges to complete. |
-
-## Output Batches
-
-These metrics report output buffering status.
-
-| Name | Type | Description |
-| :--- | :--- | :---------- |
-| `output_buffered_batches` |gauge | Number of batches of records currently buffered by the output connector. |
-
-## File metrics
-
-These report use of files within Feldera storage.
-
-| Name | Type | Description |
-| :--- | :--- | :---------- |
-| `files_created_total` |counter | Total number of files created. |
-| `files_deleted_total` |counter | Total number of files deleted. |
-
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+<!-- This is a template file for metrics documentation.
+
+To regenerate the metrics documentation, start a Feldera pipeline (any
+one will do) and obtain Prometheus metrics for it with, for example,
+`fda metrics --format=prometheus` or `curl https://server/v0/metrics`.
+Save them to a file named `metrics.txt`, then run `metrics.py`
+from this directory, like so:
+
+metrics.py metrics.txt
+-->
+
+# Pipeline Metrics
+
+This reference lists all of the metrics that Feldera exports through
+its `/metrics` endpoint in [Prometheus exposition format]. It is
+automatically generated using the documentation embedded in Prometheus
+output.
+
+All of the metrics exported by a particular Feldera pipeline are
+labeled with the pipeline's UUID as `pipeline`. Some metrics have
+additional labels, as documented below.
+
+See [Monitoring and Profiling] for a guide to setting up Prometheus
+and Grafana with Feldera. The [Feldera template dashboard] is a
+sample Grafana dashboard for Feldera.
+
+[Prometheus exposition format]: https://prometheus.io/docs/instrumenting/exposition_formats
+[Monitoring and Profiling]: /tutorials/monitoring
+[Feldera template dashboard]: https://raw.githubusercontent.com/feldera/feldera/main/deploy/grafana_dashboard.json
+
+# Process Metrics
+
+These metrics report statistics for a running Feldera pipeline
+process. When a pipeline process is killed and restarts from a
+checkpoint, the new process's metrics are for it alone, not cumulative
+with any previous instantiations.
+
+These metrics are intended to match the standard [Prometheus
+definitions].
+
+[Prometheus definitions]: https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics
+
+{{process_}}
+
+## Feldera metrics
+
+These metrics report statistics for Feldera operations.
+
+{{feldera_}}
+
+## DBSP metrics
+
+These metrics report statistics for [DBSP], the low-level mechanism on
+which Feldera is built.
+
+[DBSP]: https://docs.rs/dbsp/latest/dbsp/
+
+{{dbsp_|compaction_}}
+
+## Record Processing
+
+These metrics report overall counts of records as they pass through
+the pipeline. They accumulate across checkpoint and resume.
+
+{{records_|output_buffered_}}
+
+## Storage Performance
+
+These metrics report the performance of storage, which allows Feldera
+to work with data larger than memory.
+
+{{storage_|files_}}
+
+## Pipeline Status
+
+These metrics report the status of the pipeline.
+
+{{pipeline_}}
+
+## Input Connectors
+
+These metrics are per-input connector, labeled with `endpoint` set to
+the name of the input connector, which is either the name assigned in
+the SQL program or automatically generated as `unnamed-<number>`,
+where `<number>` counts starting from 1 for the first connector for a
+given table.
+
+{{input_connector_}}
+
+## Output Connectors
+
+These metrics are per-output connector, labeled with `endpoint` set to
+the name of the output connector, which is either the name assigned in
+the SQL program or automatically generated as `unnamed-<number>`,
+where `<number>` counts starting from 1 for the first connector for a
+given view.
+
+{{output_connector_}}
+
Lines changed: 47 additions & 112 deletions
@@ -1,125 +1,60 @@
 #!/usr/bin/env python3

 import fileinput
+import re
 import sys

+# Read Prometheus metrics from the files provided on the command line,
+# or on stdin, and save their types and descriptions into `metrics`.
 metrics = {}
-for line in fileinput.input(encoding='utf-8'):
+for line in fileinput.input(encoding="utf-8"):
     comment, keyword, name, args = line.strip().split(maxsplit=3)
-    if comment == '#' and keyword in ['TYPE', 'HELP']:
+    if comment == "#" and keyword in ["TYPE", "HELP"]:
         metrics.setdefault(name, {})[keyword] = args

-print("""<!-- This file is automatically generated. Do not edit!
-
-To regenerate this file, start a Feldera pipeline (any one will do)
-and obtain Prometheus metrics for it with, for example, `fda metrics
---format=prometheus` or `curl https://server/v0/metrics`, then run
-`metrics.py` from the same directory as this, like so:
-
-metrics.py < metrics.txt > metrics.md
--->
-
-# Metrics
-
-This reference lists all of the metrics that Feldera exports through
-its `/metrics` endpoint in [Prometheus exposition format]. It is
-automatically generated using the documentation embedded in Prometheus
-output.
-
-All of the metrics exported by a particular Feldera pipeline are
-labeled with the pipeline's UUID as `pipeline`. Some metrics have
-additional labels, as documented below.
-
-[Prometheus exposition format]: https://prometheus.io/docs/instrumenting/exposition_formats
-
+# Read Markdown template from metrics.md.in, making some substitutions:
+#
+# - Delete lines up to the first `#`, and copy that line to the output.
+#
+# - Add a warning about the file being automatically generated, so
+#   that people don't edit it. (This has to go *after* the first `#`
+#   because docusaurus won't skip the comment when it goes looking
+#   for the page title.)
+#
+# - Copy the rest of the file to the output, substituting lines that
+#   are bracketed by {{}} by autogenerated metrics documentation.
+template = open("metrics.md.in", "r")
+output = open("metrics.md", "w")
+for line in template:
+    line = line.rstrip()
+    if line.startswith("#"):
+        break
+output.write(f"""{line}
+
+<!-- This file is automatically generated. Do not edit!
+
+To update the documentation, please edit metrics.md.in instead and
+then regenerate this file using the instructions in that file. -->
 """)
-
-def document_section(name, heading):
-    print(heading.strip())
-    print()
-
-    section_metrics = sorted([key for key in metrics.keys() if key.startswith(f"{name}_")])
-    assert section_metrics != []
-    print("| Name | Type | Description |")
-    print("| :--- | :--- | :---------- |")
-    for metric in section_metrics:
-        type_ = metrics[metric]['TYPE']
-        help = metrics[metric]['HELP']
-        print(f"| `{metric}` |{type_} | {help} |")
-        del metrics[metric]
-    print()
-
-document_section("process", """## Process Metrics
-
-These metrics report statistics for a running Feldera pipeline
-process. When a pipeline process is killed and restarts from a
-checkpoint, the new process's metrics are for it alone, not cumulative
-with any previous instantiations.
-
-These metrics are intended to match the standard [Prometheus
-definitions].
-
-[Prometheus definitions]: https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics""")
-
-document_section("feldera", """## Feldera metrics
-
-These metrics report statistics for Feldera operations.""")
-
-document_section("dbsp", """## DBSP metrics
-
-These metrics report statistics for [DBSP], the low-level mechanism on
-which Feldera is built.
-
-[DBSP]: https://docs.rs/dbsp/latest/dbsp/""")
-
-document_section("records", """## Record Processing
-
-These metrics report the status of record input, processing, and
-output as a whole. They are maintained consistently across checkpoint
-and resume.
-
-""")
-
-document_section("storage", """## Storage Performance
-
-These metrics report the performance of storage, which allows Feldera
-to work with data larger than memory.""")
-
-document_section("pipeline", """## Pipeline Status
-
-These metrics report the status of the pipeline.
-
-""")
-
-document_section("input_connector", """## Input Connectors
-
-These metrics are per-input connector, labeled with `endpoint` set to
-the name of the input connector, which is either the name assigned in
-the SQL program or automatically generated as `unnamed-<number>`,
-where `<number>` is 1 for the first connector for a given table, 2 for
-the second, and so on.""")
-
-document_section("output_connector", """## Output Connectors
-
-These metrics are per-output connector, labeled with `endpoint` set to
-the name of the output connector, which is either the name assigned in
-the SQL program or automatically generated as `unnamed-<number>`,
-where `<number>` is 1 for the first connector for a given view,
-2 for the second, and so on.""")
-
-document_section("compaction", """## Merge Status
-
-These metrics reports the status of the merger.
-""")
-
-document_section("output_buffered", """## Output Batches
-
-These metrics report output buffering status.""")
-
-document_section("files", """## File metrics
-
-These report use of files within Feldera storage.""")
+for line in template:
+    line = line.rstrip()
+    m = re.match(r"{{(.*)}}", line)
+    if m:
+        regex = re.compile(m.group(1))
+        matching_metrics = sorted([key for key in metrics.keys() if regex.match(key)])
+        assert matching_metrics != []
+        output.write("| Name | Type | Description |\n")
+        output.write("| :--- | :--- | :---------- |\n")
+        for metric in matching_metrics:
+            type_ = metrics[metric]["TYPE"]
+            help = metrics[metric]["HELP"]
+            output.write(f"| `{metric}` |{type_} | {help} |\n")
+            del metrics[metric]
+    else:
+        output.write(f"{line}\n")

 if len(metrics) > 0:
-    sys.stderr.write(f"error: the following metrics need documentation sections: {metrics.keys()}\n")
+    sys.stderr.write(
+        f"error: the following metrics need to be included in documentation: {metrics.keys()}\n"
+    )
     sys.exit(1)
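
The overall flow of the new script can be sketched on in-memory strings, which makes it easy to try without a running pipeline. This is only an illustration under stated assumptions, not the committed code: the functions `parse_metrics` and `render` are hypothetical names, the sample values and the `pipeline="example"` label are made up, and the length check in the parser assumes the input may contain sample lines with fewer than four whitespace-separated fields.

```python
import re


def parse_metrics(text):
    """Collect `# TYPE` and `# HELP` comments from Prometheus exposition text."""
    metrics = {}
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=3)
        # Skip sample lines and anything that is not `# TYPE|HELP <name> <args>`.
        if len(parts) == 4 and parts[0] == "#" and parts[1] in ("TYPE", "HELP"):
            _, keyword, name, args = parts
            metrics.setdefault(name, {})[keyword] = args
    return metrics


def render(template, metrics):
    """Expand `{{regex}}` lines into tables; return (markdown, undocumented names)."""
    remaining = dict(metrics)
    out = []
    for line in template.splitlines():
        line = line.rstrip()
        m = re.fullmatch(r"\{\{(.*)\}\}", line)
        if not m:
            out.append(line)
            continue
        regex = re.compile(m.group(1))
        out.append("| Name | Type | Description |")
        out.append("| :--- | :--- | :---------- |")
        # sorted() builds the full list first, so popping inside the loop is safe.
        for name in sorted(n for n in remaining if regex.match(n)):
            info = remaining.pop(name)
            out.append(f"| `{name}` |{info['TYPE']} | {info['HELP']} |")
    return "\n".join(out) + "\n", sorted(remaining)


# Made-up exposition text; real output comes from the `/metrics` endpoint.
SAMPLE = """\
# HELP dbsp_steps_total Total number of DBSP steps executed.
# TYPE dbsp_steps_total counter
dbsp_steps_total{pipeline="example"} 42
# HELP files_created_total Total number of files created.
# TYPE files_created_total counter
files_created_total{pipeline="example"} 7
"""

TEMPLATE = "## DBSP metrics\n\n{{dbsp_}}\n"

markdown, leftover = render(TEMPLATE, parse_metrics(SAMPLE))
print(markdown)
print("undocumented:", leftover)
```

Returning the leftover names plays the same role as the final `len(metrics) > 0` check in the script: any metric that no placeholder claimed is reported so that it can be added to the template.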
