Commit fe430e5

oomkill
1 parent 2187813 commit fe430e5

4 files changed

Lines changed: 137 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -83,6 +83,7 @@ Tools:
 - tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_examples.txt).
 - tools/[offcputime](tools/offcputime.py): Summarize off-CPU time by kernel stack trace. [Examples](tools/offcputime_example.txt).
 - tools/[offwaketime](tools/offwaketime.py): Summarize blocked time by kernel off-CPU stack and waker stack. [Examples](tools/offwaketime_example.txt).
+- tools/[oomkill](tools/oomkill.py): Trace the out-of-memory (OOM) killer. [Examples](tools/oomkill_example.txt).
 - tools/[opensnoop](tools/opensnoop.py): Trace open() syscalls. [Examples](tools/opensnoop_example.txt).
 - tools/[pidpersec](tools/pidpersec.py): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
 - tools/[runqlat](tools/runqlat.py): Run queue (scheduler) latency as a histogram. [Examples](tools/runqlat_example.txt).

man/man8/oomkill.8

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+.TH oomkill 8 "2016-02-09" "USER COMMANDS"
+.SH NAME
+oomkill \- Trace oom_kill_process(). Uses Linux eBPF/bcc.
+.SH SYNOPSIS
+.B oomkill
+.SH DESCRIPTION
+This traces the kernel out-of-memory killer, and prints basic details,
+including the system load averages at the time of the OOM kill. This can
+provide more context on the system state at the time: was it getting busier
+or steady, based on the load averages? This tool may also be useful to
+customize for investigations; for example, by adding other task_struct
+details at the time of OOM.
+
+This program is also a basic example of eBPF/bcc.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH EXAMPLES
+.TP
+Trace OOM kill events:
+#
+.B oomkill
+.SH FIELDS
+.TP
+Triggered by ...
+The process ID and process name of the task that was running when another task was OOM
+killed.
+.TP
+OOM kill of ...
+The process ID and name of the target process that was OOM killed.
+.TP
+loadavg
+Contents of /proc/loadavg. The first three numbers are 1, 5, and 15 minute
+load averages (where the average is an exponentially damped moving sum, and
+those numbers are constants in the equation); then there is the number of
+running tasks, a slash, and the total number of tasks; and then the last number
+is the last PID to be created.
+.SH OVERHEAD
+Negligible.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Brendan Gregg
+.SH SEE ALSO
+memleak(8)
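
The loadavg field documented in the man page has a fixed five-part layout, so it splits cleanly into typed values. The following is a minimal sketch in Python, assuming the /proc/loadavg format described under FIELDS. The `LoadAvg` type and `parse_loadavg` helper are hypothetical names used for illustration and are not part of bcc or this commit.

```python
from collections import namedtuple

# Hypothetical container for the five parts of a /proc/loadavg line.
LoadAvg = namedtuple(
    "LoadAvg", "one_min five_min fifteen_min running total last_pid")


def parse_loadavg(line):
    """Split a line such as '0.99 0.39 0.30 3/282 22724' into typed fields."""
    one, five, fifteen, tasks, last_pid = line.split()
    # The fourth field is "running/total" task counts.
    running, total = tasks.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(running), int(total), int(last_pid))


if __name__ == "__main__":
    print(parse_loadavg("0.99 0.39 0.30 3/282 22724"))
```

Keeping the fields typed makes it straightforward to compare the 1 minute and 15 minute values when post-processing the tool's output.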

tools/oomkill.py

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
+#!/usr/bin/env python
+#
+# oomkill Trace oom_kill_process(). For Linux, uses BCC, eBPF.
+#
+# This traces the kernel out-of-memory killer, and prints basic details,
+# including the system load averages. This can provide more context on the
+# system state at the time of OOM: was it getting busier or steady, based
+# on the load averages? This tool may also be useful to customize for
+# investigations; for example, by adding other task_struct details at the time
+# of OOM.
+#
+# Copyright 2016 Netflix, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 09-Feb-2016 Brendan Gregg Created this.
+
+from bcc import BPF
+from time import strftime
+
+# linux stats
+loadavg = "/proc/loadavg"
+
+# initialize BPF
+b = BPF(text="""
+#include <uapi/linux/ptrace.h>
+#include <linux/oom.h>
+void kprobe__oom_kill_process(struct pt_regs *ctx, struct oom_control *oc,
+    struct task_struct *p, unsigned int points, unsigned long totalpages)
+{
+    bpf_trace_printk("OOM kill of PID %d (\\"%s\\"), %d pages\\n", p->pid,
+        p->comm, totalpages);
+}
+""")
+
+# print output
+print("Tracing oom_kill_process()... Ctrl-C to end.")
+while 1:
+    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
+    with open(loadavg) as stats:
+        avgline = stats.read().rstrip()
+    print("%s Triggered by PID %d (\"%s\"), %s, loadavg: %s" % (
+        strftime("%H:%M:%S"), pid, task, msg, avgline))
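
The output loop in oomkill.py mixes three steps: reading a trace event, reading /proc/loadavg, and formatting the report line. One way to customize the tool is to pull the last two steps into small functions that can be exercised without root or BPF. The sketch below assumes Python 3. `read_loadavg` and `format_event` are hypothetical helper names, and the "unknown" fallback is an illustrative choice, not behavior of the committed tool.

```python
from time import strftime


def read_loadavg(path="/proc/loadavg"):
    """Return the load average line, or "unknown" if the file is unreadable."""
    try:
        with open(path) as stats:
            return stats.read().rstrip()
    except OSError:
        # Illustrative fallback so a missing file does not end the trace loop.
        return "unknown"


def format_event(pid, task, msg, avgline, now=None):
    """Build the one-line report in the same layout the tool prints."""
    if now is None:
        now = strftime("%H:%M:%S")
    return "%s Triggered by PID %d (\"%s\"), %s, loadavg: %s" % (
        now, pid, task, msg, avgline)
```

Inside the trace loop, the final print would then become `print(format_event(pid, task, msg, read_loadavg()))`, with `pid`, `task`, and `msg` taken from `b.trace_fields()` as in the original.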

tools/oomkill_example.txt

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
+Demonstrations of oomkill, the Linux eBPF/bcc version.
+
+
+oomkill is a simple program that traces the Linux out-of-memory (OOM) killer,
+and shows basic details on one line per OOM kill:
+
+# ./oomkill
+Tracing oom_kill_process()... Ctrl-C to end.
+21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
+21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932
+
+The first line shows that PID 22516, with process name "perl", was OOM killed
+when it reached 3850642 pages (usually 4 Kbytes per page). This OOM kill
+happened to be triggered by PID 3297, process name "ntpd", doing some memory
+allocation.
+
+The system log (dmesg) shows pages of details and system context about an OOM
+kill. What it currently lacks, however, is context on how the system had been
+changing over time. I've seen OOM kills where I wanted to know if the system
+was at steady state at the time, or if there had been a recent increase in
+workload that triggered the OOM event. oomkill provides some context: at the
+end of the line is the load average information from /proc/loadavg. For both
+of the oomkills here, we can see that the system was getting busier at the
+time (a higher 1 minute "average" of 0.99, compared to the 15 minute "average"
+of 0.30).
+
+oomkill can also be the basis of other tools and customizations. For example,
+you can edit it to include other task_struct details from the target PID at
+the time of the OOM kill.
+
+
+The following commands can be used to test this program, and invoke a memory
+consuming process that exhausts system memory and is OOM killed:
+
+sysctl -w vm.overcommit_memory=1 # always overcommit
+perl -e 'while (1) { $a .= "A" x 1024; }' # eat all memory
+
+WARNING: This exhausts system memory after disabling some overcommit checks.
+Only test in a lab environment.
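
The example file's reading of the output (a page count at roughly 4 Kbytes per page, and a 1 minute load average above the 15 minute one) can be reproduced from a captured line. This is a minimal Python sketch, assuming the exact line layout shown in the example output and a 4096-byte page size. `PAGE_SIZE`, `LINE_RE`, and `summarize` are hypothetical names for illustration, not part of the tool.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 Kbyte pages, which the example text calls usual

# Matches the one-line report format shown in the example output.
LINE_RE = re.compile(
    r'(?P<time>\S+) Triggered by PID (?P<trig_pid>\d+) \("(?P<trig_comm>[^"]*)"\), '
    r'OOM kill of PID (?P<pid>\d+) \("(?P<comm>[^"]*)"\), (?P<pages>\d+) pages, '
    r'loadavg: (?P<avg1>\S+) (?P<avg5>\S+) (?P<avg15>\S+) \S+ \S+$')


def summarize(line):
    """Return victim details, reported size in GiB, and a simple load trend."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # not an oomkill report line
    pages = int(m.group("pages"))
    avg1, avg15 = float(m.group("avg1")), float(m.group("avg15"))
    return {
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "gib": pages * PAGE_SIZE / 2.0 ** 30,
        # A higher 1 minute value than 15 minute value suggests rising load.
        "getting_busier": avg1 > avg15,
    }


if __name__ == "__main__":
    print(summarize('21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID '
                    '22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 '
                    '3/282 22724'))
```

For the first example line, the reported 3850642 pages work out to about 14.69 GiB under the 4 Kbyte assumption, and the trend flag is true because 0.99 exceeds 0.30.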
