Commit fe730f2

Ringbuf Support for Python API (iovisor#2989)
This pull request contains an implementation for ringbuf support in bcc's Python API. Fixes iovisor#2985. More specifically, the following are added:
- ringbuf helpers from libbpf API to libbcc
- a new RingBuf class to represent the ringbuf map
- BPF_RINGBUF_OUTPUT macro for BPF programs
- tests
- detailed documentation and examples
1 parent 156a7d1 commit fe730f2

12 files changed

Lines changed: 837 additions & 62 deletions

docs/reference_guide.md

Lines changed: 244 additions & 0 deletions
@@ -37,6 +37,11 @@ This guide is incomplete. If something feels missing, check the bcc and kernel s
- [1. bpf_trace_printk()](#1-bpf_trace_printk)
- [2. BPF_PERF_OUTPUT](#2-bpf_perf_output)
- [3. perf_submit()](#3-perf_submit)
- [4. BPF_RINGBUF_OUTPUT](#4-bpf_ringbuf_output)
- [5. ringbuf_output()](#5-ringbuf_output)
- [6. ringbuf_reserve()](#6-ringbuf_reserve)
- [7. ringbuf_submit()](#7-ringbuf_submit)
- [8. ringbuf_discard()](#8-ringbuf_discard)
- [Maps](#maps)
- [1. BPF_TABLE](#1-bpf_table)
- [2. BPF_HASH](#2-bpf_hash)
@@ -81,6 +86,8 @@ This guide is incomplete. If something feels missing, check the bcc and kernel s
- [2. trace_fields()](#2-trace_fields)
- [Output](#output)
- [1. perf_buffer_poll()](#1-perf_buffer_poll)
- [2. ring_buffer_poll()](#2-ring_buffer_poll)
- [3. ring_buffer_consume()](#3-ring_buffer_consume)
- [Maps](#maps)
- [1. get_table()](#1-get_table)
- [2. open_perf_buffer()](#2-open_perf_buffer)
@@ -89,6 +96,7 @@ This guide is incomplete. If something feels missing, check the bcc and kernel s
- [5. clear()](#5-clear)
- [6. print_log2_hist()](#6-print_log2_hist)
- [7. print_linear_hist()](#7-print_linear_hist)
- [8. open_ring_buffer()](#8-open_ring_buffer)
- [Helpers](#helpers)
- [1. ksym()](#1-ksym)
- [2. ksymname()](#2-ksymname)
@@ -647,6 +655,131 @@ Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=perf_submit+path%3Aexamples&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=perf_submit+path%3Atools&type=Code)

### 4. BPF_RINGBUF_OUTPUT

Syntax: ```BPF_RINGBUF_OUTPUT(name, page_cnt)```

Creates a BPF table for pushing out custom event data to user space via a ringbuf ring buffer. ```BPF_RINGBUF_OUTPUT``` has several advantages over ```BPF_PERF_OUTPUT```, summarized as follows:

- Buffer is shared across all CPUs, meaning no per-CPU allocation
- Supports two APIs for BPF programs
    - ```map.ringbuf_output()``` works like ```map.perf_submit()``` (covered in [ringbuf_output](#5-ringbuf_output))
    - ```map.ringbuf_reserve()```/```map.ringbuf_submit()```/```map.ringbuf_discard()```
      split the process of reserving buffer space and submitting events into two steps
      (covered in [ringbuf_reserve](#6-ringbuf_reserve), [ringbuf_submit](#7-ringbuf_submit), [ringbuf_discard](#8-ringbuf_discard))
- BPF APIs do not require access to a CPU ctx argument
- Superior performance and latency in userspace thanks to a shared ring buffer manager
- Supports two ways of consuming data in userspace (covered in [ring_buffer_poll](#2-ring_buffer_poll) and [ring_buffer_consume](#3-ring_buffer_consume))

Starting in Linux 5.8, this should be the preferred method for pushing per-event data to user space.

Example of both APIs:

```C
struct data_t {
    u32 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
};

// Creates a ringbuf called events with 8 pages of space, shared across all CPUs
BPF_RINGBUF_OUTPUT(events, 8);

int first_api_example(struct pt_regs *ctx) {
    struct data_t data = {};

    data.pid = bpf_get_current_pid_tgid();
    data.ts = bpf_ktime_get_ns();
    bpf_get_current_comm(&data.comm, sizeof(data.comm));

    events.ringbuf_output(&data, sizeof(data), 0 /* flags */);

    return 0;
}

int second_api_example(struct pt_regs *ctx) {
    struct data_t *data = events.ringbuf_reserve(sizeof(struct data_t));
    if (!data) { // Failed to reserve space
        return 1;
    }

    data->pid = bpf_get_current_pid_tgid();
    data->ts = bpf_ktime_get_ns();
    bpf_get_current_comm(&data->comm, sizeof(data->comm));

    events.ringbuf_submit(data, 0 /* flags */);

    return 0;
}
```

The output table is named ```events```. Data is either copied into it in a single step via ```events.ringbuf_output()```, or allocated in place via ```events.ringbuf_reserve()``` and then pushed to user space via ```events.ringbuf_submit()```.
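
The reserve/submit/discard flow can be pictured with a small userspace toy model. The sketch below is plain Python, not bcc or kernel code: the `Record` and `ToyRingBuf` names, the byte-count capacity, and the `consume()` method are all assumptions made for illustration. It only shows the life cycle described in this section: space is reserved first, and each reservation is then either submitted (delivered to the consumer) or discarded (skipped).

```python
class Record:
    """One reservation in the toy buffer (hypothetical, for illustration)."""

    def __init__(self, size):
        self.state = "reserved"        # becomes "submitted" or "discarded"
        self.payload = bytearray(size)


class ToyRingBuf:
    """Toy model of a shared ring buffer; not how bcc or the kernel store data."""

    def __init__(self, capacity):
        self.capacity = capacity       # total bytes available
        self.used = 0                  # bytes held by unconsumed records
        self.records = []              # oldest reservation first

    def reserve(self, size):
        # Like ringbuf_reserve(): hand back writable space, or None when full.
        if self.used + size > self.capacity:
            return None
        record = Record(size)
        self.records.append(record)
        self.used += size
        return record

    def submit(self, record):
        # Like ringbuf_submit(): the record becomes visible to the consumer.
        record.state = "submitted"

    def discard(self, record):
        # Like ringbuf_discard(): the space is released without delivery.
        record.state = "discarded"

    def output(self, data):
        # Like ringbuf_output(): reserve, copy, and submit in a single call.
        record = self.reserve(len(data))
        if record is None:
            return -1
        record.payload[:] = data
        self.submit(record)
        return 0

    def consume(self, callback):
        # Deliver records in order, stopping at one that is still reserved.
        delivered = 0
        while self.records and self.records[0].state != "reserved":
            record = self.records.pop(0)
            self.used -= len(record.payload)
            if record.state == "submitted":
                callback(bytes(record.payload))
                delivered += 1
        return delivered


rb = ToyRingBuf(capacity=16)
seen = []

kept = rb.reserve(8)
kept.payload[:] = b"event-01"
rb.submit(kept)

dropped = rb.reserve(8)
rb.discard(dropped)                    # never reaches the callback

full = rb.reserve(1)                   # None: both records are still unconsumed
delivered = rb.consume(seen.append)
```

In this model a failed `reserve()` is the only way an event is lost, which is why the BPF examples check the returned pointer before writing to it.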

Examples in situ: <!-- TODO -->
[search /examples](https://github.com/iovisor/bcc/search?q=BPF_RINGBUF_OUTPUT+path%3Aexamples&type=Code)

### 5. ringbuf_output()

Syntax: ```int ringbuf_output((void *)data, u64 data_size, u64 flags)```

Return: 0 on success

Flags:
- ```BPF_RB_NO_WAKEUP```: Do not send notification of new data availability
- ```BPF_RB_FORCE_WAKEUP```: Send notification of new data availability unconditionally

A method of the BPF_RINGBUF_OUTPUT table, for submitting custom event data to user space. This method works like ```perf_submit()```, although it does not require a ctx argument.

Examples in situ: <!-- TODO -->
[search /examples](https://github.com/iovisor/bcc/search?q=ringbuf_output+path%3Aexamples&type=Code)

### 6. ringbuf_reserve()

Syntax: ```void* ringbuf_reserve(u64 data_size)```

Return: Pointer to data struct on success, NULL on failure

A method of the BPF_RINGBUF_OUTPUT table, for reserving space in the ring buffer and simultaneously allocating a data struct for output. Every successful reservation must be followed by exactly one call to either ```ringbuf_submit``` or ```ringbuf_discard```.

Examples in situ: <!-- TODO -->
[search /examples](https://github.com/iovisor/bcc/search?q=ringbuf_reserve+path%3Aexamples&type=Code)

### 7. ringbuf_submit()

Syntax: ```void ringbuf_submit((void *)data, u64 flags)```

Return: Nothing, always succeeds

Flags:
- ```BPF_RB_NO_WAKEUP```: Do not send notification of new data availability
- ```BPF_RB_FORCE_WAKEUP```: Send notification of new data availability unconditionally

A method of the BPF_RINGBUF_OUTPUT table, for submitting custom event data to user space. Must be preceded by a call to ```ringbuf_reserve()``` to reserve space for the data.

Examples in situ: <!-- TODO -->
[search /examples](https://github.com/iovisor/bcc/search?q=ringbuf_submit+path%3Aexamples&type=Code)

### 8. ringbuf_discard()

Syntax: ```void ringbuf_discard((void *)data, u64 flags)```

Return: Nothing, always succeeds

Flags:
- ```BPF_RB_NO_WAKEUP```: Do not send notification of new data availability
- ```BPF_RB_FORCE_WAKEUP```: Send notification of new data availability unconditionally

A method of the BPF_RINGBUF_OUTPUT table, for discarding custom event data; userspace ignores the data associated with the discarded event. Must be preceded by a call to ```ringbuf_reserve()``` to reserve space for the data.

Examples in situ: <!-- TODO -->
[search /examples](https://github.com/iovisor/bcc/search?q=ringbuf_discard+path%3Aexamples&type=Code)

## Maps

Maps are BPF data stores, and are the basis for higher level object types including tables, hashes, and histograms.
@@ -1451,6 +1584,55 @@ Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=perf_buffer_poll+path%3Aexamples+language%3Apython&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=perf_buffer_poll+path%3Atools+language%3Apython&type=Code)

### 2. ring_buffer_poll()

Syntax: ```BPF.ring_buffer_poll(timeout=T)```

This polls from all open ringbuf ring buffers, calling the callback function that was provided when calling open_ring_buffer for each entry.

The timeout parameter is optional and measured in milliseconds. In its absence, polling continues until there is no more data or the callback returns a negative value.

Example:

```Python
# loop with callback to print_event
b["events"].open_ring_buffer(print_event)
while 1:
    try:
        b.ring_buffer_poll(30)
    except KeyboardInterrupt:
        exit()
```
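
A polling loop like the one above can also be wrapped in a function that returns on Ctrl-C instead of calling `exit()`. The sketch below relies only on the documented `ring_buffer_poll(timeout)` call; `poll_until_interrupted` and the `FakeBPF` stand-in are hypothetical names, and the stand-in exists solely so the loop can be exercised without bcc, root privileges, or a 5.8+ kernel.

```python
def poll_until_interrupted(bpf, timeout_ms=100):
    """Call bpf.ring_buffer_poll() until Ctrl-C; return the number of polls."""
    polls = 0
    try:
        while True:
            bpf.ring_buffer_poll(timeout_ms)
            polls += 1
    except KeyboardInterrupt:
        pass
    return polls


class FakeBPF:
    """Hypothetical stand-in for a bcc BPF object (not part of bcc)."""

    def __init__(self, pending, callback):
        self.pending = list(pending)
        self.callback = callback

    def ring_buffer_poll(self, timeout=-1):
        # Simulate Ctrl-C once the queued events run out.
        if not self.pending:
            raise KeyboardInterrupt
        event = self.pending.pop(0)
        self.callback(None, event, len(event))


received = []
fake = FakeBPF([b"open", b"close"], lambda ctx, data, size: received.append(data))
polls = poll_until_interrupted(fake, timeout_ms=30)
```

With a real program, `fake` would be the `BPF` object whose table has already been passed to `open_ring_buffer()`.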

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=ring_buffer_poll+path%3Aexamples+language%3Apython&type=Code)

### 3. ring_buffer_consume()

Syntax: ```BPF.ring_buffer_consume()```

This consumes from all open ringbuf ring buffers, calling the callback function that was provided when calling open_ring_buffer for each entry.

Unlike ```ring_buffer_poll```, this method **does not poll for data** before attempting to consume. This reduces latency at the expense of higher CPU consumption. If you are unsure which to use, use ```ring_buffer_poll```.

Example:

```Python
# loop with callback to print_event
b["events"].open_ring_buffer(print_event)
while 1:
    try:
        b.ring_buffer_consume()
    except KeyboardInterrupt:
        exit()
```

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=ring_buffer_consume+path%3Aexamples+language%3Apython&type=Code)

## Maps

Maps are BPF data stores, and are used in bcc to implement a table, and then higher level objects on top of tables, including hashes and histograms.
@@ -1694,6 +1876,68 @@ Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=print_linear_hist+path%3Aexamples+language%3Apython&type=Code),
[search /tools](https://github.com/iovisor/bcc/search?q=print_linear_hist+path%3Atools+language%3Apython&type=Code)

### 8. open_ring_buffer()

Syntax: ```table.open_ring_buffer(callback, ctx=None)```

This operates on a table as defined in BPF as BPF_RINGBUF_OUTPUT(), and associates the Python function ```callback``` to be called when data is available in the ringbuf ring buffer. This is part of the new (Linux 5.8+) recommended mechanism for transferring per-event data from kernel to user space. Unlike perf buffers, ringbuf sizes are specified within the BPF program, as part of the ```BPF_RINGBUF_OUTPUT``` macro. If the callback does not process data fast enough, the ring buffer can fill up, and events produced while it is full are lost. In this case, the events should be polled more frequently and/or the size of the ring buffer should be increased.
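
When deciding how far to increase the buffer, a rough sizing calculation can help. The sketch below is not part of bcc: `ringbuf_size_bytes` and `pages_for_events` are hypothetical helpers, and they assume that the macro sizes the map as `page_cnt` times the system page size, that the resulting size must be a power of two, and that each record carries an 8-byte header and is padded to a multiple of 8 bytes. Treat the result as a starting estimate, not a guarantee.

```python
import mmap


def ringbuf_size_bytes(page_cnt, page_size=mmap.PAGESIZE):
    """Bytes of buffer space for a given page_cnt (assumed: page_cnt * page size)."""
    if page_cnt <= 0 or page_cnt & (page_cnt - 1):
        raise ValueError("page_cnt must be a positive power of two: %d" % page_cnt)
    return page_cnt * page_size


def pages_for_events(event_size, max_pending, page_size=mmap.PAGESIZE, header_size=8):
    """Smallest power-of-two page_cnt that holds max_pending unconsumed events."""
    # Assumed record layout: a fixed header, padded up to a multiple of 8 bytes.
    record_size = (event_size + header_size + 7) // 8 * 8
    needed = record_size * max_pending
    pages = max(1, -(-needed // page_size))   # ceiling division
    return 1 << (pages - 1).bit_length()      # round up to a power of two


print(ringbuf_size_bytes(8))
print(pages_for_events(event_size=24, max_pending=1000, page_size=4096))
```

The value returned by `pages_for_events` is what would be passed as the second argument of ```BPF_RINGBUF_OUTPUT```.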

Example:

```Python
# process event
def print_event(ctx, data, size):
    event = ct.cast(data, ct.POINTER(Data)).contents
    [...]

# loop with callback to print_event
b["events"].open_ring_buffer(print_event)
while 1:
    try:
        b.ring_buffer_poll()
    except KeyboardInterrupt:
        exit()
```

Note that the data structure transferred will need to be declared in C in the BPF program. For example:

```C
// define output data structure in C
struct data_t {
    u32 pid;
    u64 ts;
    char comm[TASK_COMM_LEN];
};
BPF_RINGBUF_OUTPUT(events, 8);
[...]
```

In Python, you can either let bcc generate the data structure from the C declaration automatically (recommended):

```Python
def print_event(ctx, data, size):
    event = b["events"].event(data)
    [...]
```

or define it manually:

```Python
import ctypes as ct

# define output data structure in Python
TASK_COMM_LEN = 16    # linux/sched.h
class Data(ct.Structure):
    _fields_ = [("pid", ct.c_uint),         # u32 pid in the C struct
                ("ts", ct.c_ulonglong),     # u64 ts
                ("comm", ct.c_char * TASK_COMM_LEN)]

def print_event(ctx, data, size):
    event = ct.cast(data, ct.POINTER(Data)).contents
    [...]
```
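
To check that a manual definition lines up with the C struct, the cast can be exercised without loading a BPF program. The sketch below is a standalone illustration: `decode_event` and the hand-built `sample` record are hypothetical, and `pid` is declared as a 32-bit field to match `u32 pid` in `struct data_t`. A real callback would receive the address from bcc rather than from `ct.addressof()`.

```python
import ctypes as ct

TASK_COMM_LEN = 16  # linux/sched.h


class Data(ct.Structure):
    # Mirrors struct data_t: u32 pid, u64 ts, char comm[TASK_COMM_LEN]
    _fields_ = [("pid", ct.c_uint),
                ("ts", ct.c_ulonglong),
                ("comm", ct.c_char * TASK_COMM_LEN)]


def decode_event(data):
    """Turn the raw address a callback receives into a Data instance."""
    return ct.cast(data, ct.POINTER(Data)).contents


# Stand-in for a ring buffer record: a struct filled in by hand.
sample = Data(pid=1234, ts=5678, comm=b"bash")
event = decode_event(ct.addressof(sample))

print(event.pid, event.ts, event.comm.decode())
```

If a field width is wrong (for example a 64-bit type for `pid`), later fields are read from the wrong offsets, which is why the automatic ```event()``` route is the recommended one.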

Examples in situ:
[search /examples](https://github.com/iovisor/bcc/search?q=open_ring_buffer+path%3Aexamples+language%3Apython&type=Code)

## Helpers

Some helper methods provided by bcc. Note that since we're in Python, we can import any Python library and their methods, including, for example, the libraries: argparse, collections, ctypes, datetime, re, socket, struct, subprocess, sys, and time.

examples/ringbuf/ringbuf_output.py

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
#!/usr/bin/python3

import sys
import time

from bcc import BPF

src = r"""
BPF_RINGBUF_OUTPUT(buffer, 1 << 4);

struct event {
    char filename[16];
    int dfd;
    int flags;
    int mode;
};

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    struct event event = {};

    bpf_probe_read_user_str(event.filename, sizeof(event.filename), args->filename);

    event.dfd = args->dfd;
    event.flags = args->flags;
    event.mode = args->mode;

    // Copy the finished event into the ring buffer in one step
    buffer.ringbuf_output(&event, sizeof(event), 0);

    return 0;
}
"""

b = BPF(text=src)

def callback(ctx, data, size):
    event = b['buffer'].event(data)
    # filename is truncated to 16 bytes, so tolerate a cut-off multibyte character
    print("%-16s %10d %10d %10d" % (event.filename.decode('utf-8', 'replace'), event.dfd, event.flags, event.mode))

b['buffer'].open_ring_buffer(callback)

print("Printing openat() calls, ctrl-c to exit.")

print("%-16s %10s %10s %10s" % ("FILENAME", "DIR_FD", "FLAGS", "MODE"))

try:
    while 1:
        b.ring_buffer_poll()
        # or b.ring_buffer_consume()
        time.sleep(0.5)
except KeyboardInterrupt:
    sys.exit()

examples/ringbuf/ringbuf_submit.py

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
#!/usr/bin/python3

import sys
import time

from bcc import BPF

src = r"""
BPF_RINGBUF_OUTPUT(buffer, 1 << 4);

struct event {
    char filename[64];
    int dfd;
    int flags;
    int mode;
};

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    // Reserve space in the ring buffer and fill the event in place
    struct event *event = buffer.ringbuf_reserve(sizeof(struct event));
    if (!event) {
        return 1;
    }

    bpf_probe_read_user_str(event->filename, sizeof(event->filename), args->filename);

    event->dfd = args->dfd;
    event->flags = args->flags;
    event->mode = args->mode;

    buffer.ringbuf_submit(event, 0);
    // or, to discard: buffer.ringbuf_discard(event, 0);

    return 0;
}
"""

b = BPF(text=src)

def callback(ctx, data, size):
    event = b['buffer'].event(data)
    # filename is truncated to 64 bytes, so tolerate a cut-off multibyte character
    print("%-64s %10d %10d %10d" % (event.filename.decode('utf-8', 'replace'), event.dfd, event.flags, event.mode))

b['buffer'].open_ring_buffer(callback)

print("Printing openat() calls, ctrl-c to exit.")

print("%-64s %10s %10s %10s" % ("FILENAME", "DIR_FD", "FLAGS", "MODE"))

try:
    while 1:
        b.ring_buffer_consume()
        time.sleep(0.5)
except KeyboardInterrupt:
    sys.exit()
