Demonstrations of oomkill, the Linux eBPF/bcc version.


oomkill is a simple program that traces the Linux out-of-memory (OOM) killer,
and shows basic details on one line per OOM kill:

# ./oomkill
Tracing oom_kill_process()... Ctrl-C to end.
21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932

The first line shows that PID 22516, with process name "perl", was OOM killed
when it reached 3850642 pages (usually 4 Kbytes per page). This OOM kill
happened to be triggered by PID 3297, process name "ntpd", doing some memory
allocation.
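
To put the page count in more familiar units, it can be converted to bytes.
The following is a minimal sketch, not part of oomkill: the helper name
pages_to_bytes is hypothetical, and the 4096 passed in the example assumes
4 Kbyte pages, which may not match your system.

```python
import os


def pages_to_bytes(pages, page_size=None):
    """Convert a page count to bytes (hypothetical helper, not in oomkill).

    If page_size is not given, use the page size of the system this runs on.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size


# Page count from the first output line, assuming 4 Kbyte pages.
size = pages_to_bytes(3850642, page_size=4096)
print("%d bytes (%.2f Gbytes)" % (size, size / 2**30))
```

Under the 4 Kbyte assumption, 3850642 pages is roughly 14.7 Gbytes. Leaving
page_size unset uses the page size of the machine running the script, which
is only meaningful if that is the machine where the OOM kill happened.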

The system log (dmesg) shows pages of details and system context about an OOM
kill. What it currently lacks, however, is context on how the system had been
changing over time. I've seen OOM kills where I wanted to know if the system
was at steady state at the time, or if there had been a recent increase in
workload that triggered the OOM event. oomkill provides some context: at the
end of the line is the load average information from /proc/loadavg. For both
of the oomkills here, we can see that the system was getting busier at the
time (a higher 1-minute "average" of 0.99, compared to the 15-minute "average"
of 0.30).
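
The same comparison can be made in a script that post-processes oomkill
output. This is a rough sketch only: parse_loadavg and load_trend are
hypothetical names, and treating any 1-minute value above the 15-minute value
as "rising" is a simplifying assumption rather than something oomkill does.

```python
def parse_loadavg(text):
    """Split a /proc/loadavg style string into named fields."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),    # 1-minute load average
        "load5": float(fields[1]),    # 5-minute load average
        "load15": float(fields[2]),   # 15-minute load average
        "running": int(running),      # currently runnable scheduling entities
        "total": int(total),          # total scheduling entities
        "last_pid": int(fields[4]),   # most recently created PID
    }


def load_trend(avg):
    """Compare the 1-minute and 15-minute averages (assumed heuristic)."""
    if avg["load1"] > avg["load15"]:
        return "rising"
    if avg["load1"] < avg["load15"]:
        return "falling"
    return "steady"


# The loadavg portion of the first output line.
sample = parse_loadavg("0.99 0.39 0.30 3/282 22724")
print(sample["load1"], sample["load15"], load_trend(sample))
```

For the first line of output, the 1-minute value (0.99) is above the
15-minute value (0.30), so the sketch reports the load as rising.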

oomkill can also be the basis of other tools and customizations. For example,
you can edit it to include other task_struct details from the target PID at
the time of the OOM kill.


The following commands can be used to test this program by invoking a
memory-consuming process that exhausts system memory and is OOM killed:

sysctl -w vm.overcommit_memory=1              # always overcommit
perl -e 'while (1) { $a .= "A" x 1024; }'     # eat all memory

WARNING: This exhausts system memory after disabling some overcommit checks.
Only test in a lab environment.