Commit ad5d698

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "As a first remark I'd like to note that the way to build perf tooling
  has been simplified and sped up, in the future it should be enough for
  you to build perf via:

        cd tools/perf/
        make install

  (ie without the -j option.) The build system will figure out the
  number of CPUs and will do a parallel build+install.

  The various build system inefficiencies and breakages Linus reported
  against the v3.12 pull request should now be resolved - please
  (re-)report any remaining annoyances or bugs.

  Main changes on the perf kernel side:

   * Performance optimizations:
      . perf ring-buffer code optimizations, by Peter Zijlstra
      . perf ring-buffer code optimizations, by Oleg Nesterov
      . x86 NMI call-stack processing optimizations, by Peter Zijlstra
      . perf context-switch optimizations, by Peter Zijlstra
      . perf sampling speedups, by Peter Zijlstra
      . x86 Intel PEBS processing speedups, by Peter Zijlstra

   * Enhanced hardware support:
      . for Intel Ivy Bridge-EP uncore PMUs, by Zheng Yan
      . for Haswell transactions, by Andi Kleen, Peter Zijlstra

   * Core perf events code enhancements and fixes by Oleg Nesterov:
      . for uprobes, if fork() is called with pending ret-probes
      . for uprobes platform support code

   * New ABI details by Andi Kleen:
      . Report x86 Haswell TSX transaction abort cost as weight

  Main changes on the perf tooling side (some of these tooling changes
  utilize the above kernel side changes):

   * 'perf report/top' enhancements:

      . Convert callchain children list to rbtree, greatly reducing the
        time taken for callchain processing, from Namhyung Kim.

      . Add new COMM infrastructure, further improving histogram
        processing, from Frédéric Weisbecker, one fix from Namhyung Kim.

      . Add /proc/kcore based live-annotation improvements, including
        build-id cache support, multi map 'call' instruction navigation
        fixes, kcore address validation, objdump workarounds. From
        Adrian Hunter.

      . Show progress on histogram collapsing, that can take a long
        time, from Namhyung Kim.

      . Add --max-stack option to limit callchain stack scan in 'top'
        and 'report', improving callchain processing when reducing the
        stack depth is an option, from Waiman Long.

      . Add new option --ignore-vmlinux for perf top, from Willy
        Tarreau.

   * 'perf trace' enhancements:

      . 'perf trace' now can use 'perf probe' dynamic tracepoints to
        hook into the userspace -> kernel pathname copy so that it can
        map fds to pathnames without reading /proc/pid/fd/ symlinks.
        From Arnaldo Carvalho de Melo.

      . Show VFS path associated with fd in live sessions, using a
        'vfs_getname' 'perf probe' created dynamic tracepoint or by
        looking at /proc/pid/fd, from Arnaldo Carvalho de Melo.

      . Add 'trace' beautifiers for lots of syscall arguments, from
        Arnaldo Carvalho de Melo.

      . Implement more compact 'trace' output by suppressing zeroed
        args, from Arnaldo Carvalho de Melo.

      . Show thread COMM by default in 'trace', from Arnaldo Carvalho
        de Melo.

      . Add option to show full timestamp in 'trace', from David Ahern.

      . Add 'record' command in 'trace', to record raw_syscalls:*, from
        David Ahern.

      . Add summary option to dump syscall statistics in 'trace', from
        David Ahern.

      . Improve error messages in 'trace', providing hints about system
        configuration steps needed for using it, from Ramkumar
        Ramachandra.

      . 'perf trace' now emits hints as to why tracing is not possible,
        helping the user to set up the system to allow tracing in the
        desired permission granularity, telling if the problem is due
        to debugfs not being mounted or with not enough permission for
        !root, the /proc/sys/kernel/perf_event_paranoid value, etc.
        From Arnaldo Carvalho de Melo.

   * 'perf record' enhancements:

      . Check maximum frequency rate for record/top, emitting better
        error messages, from Jiri Olsa.

      . 'perf record' code cleanups, from David Ahern.

      . Improve write_output error message in 'perf record', from
        Adrian Hunter.

      . Allow specifying a B/K/M/G unit to the --mmap-pages argument,
        from Jiri Olsa.

      . Fix command line callchain attribute tests to handle the new
        -g/--call-chain semantics, from Arnaldo Carvalho de Melo.

   * 'perf kvm' enhancements:

      . Disable live kvm command if timerfd is not supported, from
        David Ahern.

      . Fix detection of non-core features, from David Ahern.

   * 'perf list' enhancements:

      . Add usage to 'perf list', from David Ahern.

      . Show error in 'perf list' if tracepoints not available, from
        Pekka Enberg.

   * 'perf probe' enhancements:

      . Support "$vars" meta argument syntax for local variables,
        allowing asking for all possible variables at a given probe
        point to be collected when it hits, from Masami Hiramatsu.

   * 'perf sched' enhancements:

      . Address the root cause of that 'perf sched' stack
        initialization build slowdown, by programmatically setting a
        big array after moving the global variable back to the stack.
        Fix from Adrian Hunter.

   * 'perf script' enhancements:

      . Set up output options for in-stream attributes, from Adrian
        Hunter.

      . Print addr by default for BTS in 'perf script', from Adrian
        Hunter.

   * 'perf stat' enhancements:

      . Improved messages when doing profiling in all or a subset of
        CPUs using a workload as the session delimiter, as in:

           'perf stat --cpu 0,2 sleep 10s'

        from Arnaldo Carvalho de Melo.

      . Add units to nanosec-based counters in 'perf stat', from David
        Ahern.

      . Remove bogus info when using 'perf stat' -e
        cycles/instructions, from Ramkumar Ramachandra.

   * 'perf lock' enhancements:

      . 'perf lock' fixes and cleanups, from Davidlohr Bueso.

   * 'perf test' enhancements:

      . Fixup PERF_SAMPLE_TRANSACTION handling in sample synthesizing
        and 'perf test', from Adrian Hunter.

      . Clarify the "sample parsing" test entry, from Arnaldo Carvalho
        de Melo.

      . Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" test,
        from Arnaldo Carvalho de Melo.

      . Memory leak fixes in 'perf test', from Felipe Pena.

   * 'perf bench' enhancements:

      . Change the procps-visible command name of individual benchmark
        tests, plus cleanups, from Ingo Molnar.

   * Generic perf tooling infrastructure/plumbing changes:

      . Separating data file properties from session, code
        reorganization from Jiri Olsa.

      . Fix version when building out of tree, as when using one of
        these:

          $ make help | grep perf
          perf-tar-src-pkg    - Build perf-3.12.0.tar source tarball
          perf-targz-src-pkg  - Build perf-3.12.0.tar.gz source tarball
          perf-tarbz2-src-pkg - Build perf-3.12.0.tar.bz2 source tarball
          perf-tarxz-src-pkg  - Build perf-3.12.0.tar.xz source tarball
          $

        from David Ahern.

      . Enhance option parse error message, showing just the help lines
        of the options affected, from Namhyung Kim.

      . libtraceevent updates from upstream trace-cmd repo, from Steven
        Rostedt.

      . Always use perf_evsel__set_sample_bit to set sample_type, from
        Adrian Hunter.

      . Memory and mmap leak fixes from Chenggang Qin.

      . Assorted build fixes from David Ahern and Jiri Olsa.

      . Speed up and prettify the build system, from Ingo Molnar.

      . Implement addr2line directly using libbfd, from Roberto
        Vitillo.

      . Separate the GTK support in a separate libperf-gtk.so DSO, that
        is only loaded when --gtk is specified, from Namhyung Kim.

      . perf bash completion fixes and improvements from Ramkumar
        Ramachandra.

      . Support for Openembedded/Yocto -dbg packages, from Ricardo
        Ribalda Delgado.

  And lots and lots of other fixes and code reorganizations that did
  not make it into the list, see the shortlog, diffstat and the Git log
  for details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (300 commits)
  uprobes: Fix the memory out of bound overwrite in copy_insn()
  uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()
  perf tools: Remove unneeded include
  perf record: Remove post_processing_offset variable
  perf record: Remove advance_output function
  perf record: Refactor feature handling into a separate function
  perf trace: Don't relookup fields by name in each sample
  perf tools: Fix version when building out of tree
  perf evsel: Ditch evsel->handler.data field
  uprobes: Export write_opcode() as uprobe_write_opcode()
  uprobes: Introduce arch_uprobe->ixol
  uprobes: Kill module_init() and module_exit()
  uprobes: Move function declarations out of arch
  perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support
  perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes
  perf: Factor out strncpy() in perf_event_mmap_event()
  tools/perf: Add required memory barriers
  perf: Fix arch_perf_out_copy_user default
  perf: Update a stale comment
  perf: Optimize perf_output_begin() -- address calculation
  ...
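One of the tooling items above, the B/K/M/G unit support for the --mmap-pages argument, boils down to plain suffix parsing. A stand-alone sketch of the idea (a hypothetical helper, not perf's actual parser; the function name `parse_size_arg` is made up):

```c
#include <stdlib.h>

/* Hypothetical sketch of B/K/M/G suffix handling, in the spirit of
 * the new --mmap-pages argument: returns a size in bytes, or -1 on
 * an unknown unit suffix. */
static long long parse_size_arg(const char *str)
{
	char *end;
	long long val = strtoll(str, &end, 10);

	switch (*end) {
	case '\0':		/* bare number, taken as-is */
	case 'B':
		break;
	case 'K':
		val <<= 10;
		break;
	case 'M':
		val <<= 20;
		break;
	case 'G':
		val <<= 30;
		break;
	default:
		return -1;	/* unknown unit */
	}
	return val;
}
```

perf itself then converts the byte count to pages; if I recall the option's behaviour correctly, the resulting page count also has to be a power of two.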
2 parents: ef1417a + caea6cf

212 files changed

Lines changed: 9109 additions & 3939 deletions


arch/powerpc/include/asm/uprobes.h

Lines changed: 1 addition & 7 deletions

@@ -37,6 +37,7 @@ typedef ppc_opcode_t uprobe_opcode_t;
 struct arch_uprobe {
 	union {
 		u8 insn[MAX_UINSN_BYTES];
+		u8 ixol[MAX_UINSN_BYTES];
 		u32 ainsn;
 	};
 };
@@ -45,11 +46,4 @@ struct arch_uprobe_task {
 	unsigned long saved_trap_nr;
 };
 
-extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
-extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
-extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
-extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
-extern int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
-extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
-extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
 #endif /* _ASM_UPROBES_H */
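The change above (and its x86 counterpart in the next file) stores the out-of-line execution copy `ixol` in a union with `insn`, so both views alias the same bytes and `struct arch_uprobe` does not grow. A user-space model of that layout (with a made-up `MAX_UINSN_BYTES` value, purely for illustration; this is not the kernel struct):

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UINSN_BYTES 8	/* illustrative value, not powerpc's */

/* Model of the union introduced above: the saved original instruction
 * and the execute-out-of-line (XOL) copy share the same storage, so
 * both members sit at offset 0 and the struct size is unchanged. */
struct arch_uprobe_model {
	union {
		uint8_t  insn[MAX_UINSN_BYTES];
		uint8_t  ixol[MAX_UINSN_BYTES];
		uint32_t ainsn;
	};
};
```

The anonymous union requires C11 (or the long-standing GNU extension the kernel relies on).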

arch/x86/include/asm/uprobes.h

Lines changed: 4 additions & 8 deletions

@@ -35,7 +35,10 @@ typedef u8 uprobe_opcode_t;
 
 struct arch_uprobe {
 	u16 fixups;
-	u8 insn[MAX_UINSN_BYTES];
+	union {
+		u8 insn[MAX_UINSN_BYTES];
+		u8 ixol[MAX_UINSN_BYTES];
+	};
 #ifdef CONFIG_X86_64
 	unsigned long rip_rela_target_address;
 #endif
@@ -49,11 +52,4 @@ struct arch_uprobe_task {
 	unsigned int saved_tf;
 };
 
-extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
-extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
-extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
-extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
-extern int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
-extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
-extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
 #endif /* _ASM_UPROBES_H */

arch/x86/kernel/cpu/perf_event.c

Lines changed: 2 additions & 2 deletions

@@ -1989,7 +1989,7 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
 		frame.return_address = 0;
 
 		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
-		if (bytes != sizeof(frame))
+		if (bytes != 0)
 			break;
 
 		if (!valid_user_frame(fp, sizeof(frame)))
@@ -2041,7 +2041,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
 		frame.return_address = 0;
 
 		bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
-		if (bytes != sizeof(frame))
+		if (bytes != 0)
 			break;
 
 		if (!valid_user_frame(fp, sizeof(frame)))
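Both hunks flip the success test from `bytes != sizeof(frame)` to `bytes != 0` because copy_from_user_nmi() was reworked in this series (see "perf: Fix arch_perf_out_copy_user default" in the shortlog) to return the number of bytes *not* copied, matching the copy_from_user() convention, instead of the number copied. A toy stand-in for the new convention (not the kernel API; `readable` just simulates how much of the source is accessible):

```c
#include <string.h>

/* Toy stand-in for the new copy_from_user_nmi() convention: returns
 * the number of bytes NOT copied, so 0 means complete success. */
static unsigned long copy_partial(void *dst, const void *src,
				  unsigned long n, unsigned long readable)
{
	unsigned long done = n < readable ? n : readable;

	memcpy(dst, src, done);
	return n - done;	/* remainder left uncopied */
}
```

The callchain walk above then continues only while the call returns 0, i.e. while a whole stack frame could be read.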

arch/x86/kernel/cpu/perf_event.h

Lines changed: 6 additions & 0 deletions

@@ -163,6 +163,11 @@ struct cpu_hw_events {
 	u64 intel_ctrl_host_mask;
 	struct perf_guest_switch_msr guest_switch_msrs[X86_PMC_IDX_MAX];
 
+	/*
+	 * Intel checkpoint mask
+	 */
+	u64 intel_cp_status;
+
 	/*
 	 * manage shared (per-core, per-cpu) registers
 	 * used on Intel NHM/WSM/SNB
@@ -440,6 +445,7 @@ struct x86_pmu {
 	int lbr_nr;			/* hardware stack size */
 	u64 lbr_sel_mask;		/* LBR_SELECT valid bits */
 	const int *lbr_sel_map;		/* lbr_select mappings */
+	bool lbr_double_abort;		/* duplicated lbr aborts */
 
 	/*
 	 * Extra registers for events
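The new intel_cp_status field is a per-counter bitmask: bit idx is set while counter idx runs in checkpointed mode, and the PMI handler (in the perf_event_intel.c changes that follow) ORs the mask into the overflow status word so checkpointed counters are always probed. The bookkeeping amounts to the following plain-C sketch (names mirror the kernel fields, but this is not kernel code):

```c
#include <stdint.h>

/* Sketch of intel_cp_status bookkeeping: one bit per counter index. */
static uint64_t cp_mark(uint64_t cp_status, int idx)
{
	return cp_status | (1ull << idx);	/* counter enters checkpointed mode */
}

static uint64_t cp_unmark(uint64_t cp_status, int idx)
{
	return cp_status & ~(1ull << idx);	/* counter disabled */
}

/* PMI path: force-probe checkpointed counters even when the hardware
 * overflow bit was cleared by the transaction rollback. */
static uint64_t pmi_scan_mask(uint64_t overflow_status, uint64_t cp_status)
{
	return overflow_status | cp_status;
}
```

Using 1ull (not 1) keeps the shift well-defined for counter indexes above 31.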

arch/x86/kernel/cpu/perf_event_intel.c

Lines changed: 73 additions & 5 deletions

@@ -190,9 +190,9 @@ static struct extra_reg intel_snbep_extra_regs[] __read_mostly = {
 	EVENT_EXTRA_END
 };
 
-EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3");
-EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
-EVENT_ATTR_STR(mem-stores, mem_st_snb, "event=0xcd,umask=0x2");
+EVENT_ATTR_STR(mem-loads,	mem_ld_nhm,	"event=0x0b,umask=0x10,ldlat=3");
+EVENT_ATTR_STR(mem-loads,	mem_ld_snb,	"event=0xcd,umask=0x1,ldlat=3");
+EVENT_ATTR_STR(mem-stores,	mem_st_snb,	"event=0xcd,umask=0x2");
 
 struct attribute *nhm_events_attrs[] = {
 	EVENT_PTR(mem_ld_nhm),
@@ -1184,6 +1184,11 @@ static void intel_pmu_disable_fixed(struct hw_perf_event *hwc)
 	wrmsrl(hwc->config_base, ctrl_val);
 }
 
+static inline bool event_is_checkpointed(struct perf_event *event)
+{
+	return (event->hw.config & HSW_IN_TX_CHECKPOINTED) != 0;
+}
+
 static void intel_pmu_disable_event(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -1197,6 +1202,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
 
 	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx);
 	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
+	cpuc->intel_cp_status &= ~(1ull << hwc->idx);
 
 	/*
	 * must disable before any actual event
@@ -1271,6 +1277,9 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	if (event->attr.exclude_guest)
 		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
 
+	if (unlikely(event_is_checkpointed(event)))
+		cpuc->intel_cp_status |= (1ull << hwc->idx);
+
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
 		intel_pmu_enable_fixed(hwc);
 		return;
@@ -1289,6 +1298,17 @@ static void intel_pmu_enable_event(struct perf_event *event)
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
 	x86_perf_event_update(event);
+	/*
+	 * For a checkpointed counter always reset back to 0. This
+	 * avoids a situation where the counter overflows, aborts the
+	 * transaction and is then set back to shortly before the
+	 * overflow, and overflows and aborts again.
+	 */
+	if (unlikely(event_is_checkpointed(event))) {
+		/* No race with NMIs because the counter should not be armed */
+		wrmsrl(event->hw.event_base, 0);
+		local64_set(&event->hw.prev_count, 0);
+	}
 	return x86_perf_event_set_period(event);
 }
 
@@ -1372,6 +1392,13 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 		x86_pmu.drain_pebs(regs);
 	}
 
+	/*
+	 * Checkpointed counters can lead to 'spurious' PMIs because the
+	 * rollback caused by the PMI will have cleared the overflow status
+	 * bit. Therefore always force probe these counters.
+	 */
+	status |= cpuc->intel_cp_status;
+
 	for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
 		struct perf_event *event = cpuc->events[bit];
 
@@ -1837,6 +1864,20 @@ static int hsw_hw_config(struct perf_event *event)
 			event->attr.precise_ip > 0))
 		return -EOPNOTSUPP;
 
+	if (event_is_checkpointed(event)) {
+		/*
+		 * Sampling of checkpointed events can cause situations where
+		 * the CPU constantly aborts because of a overflow, which is
+		 * then checkpointed back and ignored. Forbid checkpointing
+		 * for sampling.
+		 *
+		 * But still allow a long sampling period, so that perf stat
+		 * from KVM works.
+		 */
+		if (event->attr.sample_period > 0 &&
+		    event->attr.sample_period < 0x7fffffff)
+			return -EOPNOTSUPP;
+	}
 	return 0;
 }
 
@@ -2182,10 +2223,36 @@ static __init void intel_nehalem_quirk(void)
 	}
 }
 
-EVENT_ATTR_STR(mem-loads, mem_ld_hsw, "event=0xcd,umask=0x1,ldlat=3");
-EVENT_ATTR_STR(mem-stores, mem_st_hsw, "event=0xd0,umask=0x82")
+EVENT_ATTR_STR(mem-loads,	mem_ld_hsw,	"event=0xcd,umask=0x1,ldlat=3");
+EVENT_ATTR_STR(mem-stores,	mem_st_hsw,	"event=0xd0,umask=0x82")
+
+/* Haswell special events */
+EVENT_ATTR_STR(tx-start,	tx_start,	"event=0xc9,umask=0x1");
+EVENT_ATTR_STR(tx-commit,	tx_commit,	"event=0xc9,umask=0x2");
+EVENT_ATTR_STR(tx-abort,	tx_abort,	"event=0xc9,umask=0x4");
+EVENT_ATTR_STR(tx-capacity,	tx_capacity,	"event=0x54,umask=0x2");
+EVENT_ATTR_STR(tx-conflict,	tx_conflict,	"event=0x54,umask=0x1");
+EVENT_ATTR_STR(el-start,	el_start,	"event=0xc8,umask=0x1");
+EVENT_ATTR_STR(el-commit,	el_commit,	"event=0xc8,umask=0x2");
+EVENT_ATTR_STR(el-abort,	el_abort,	"event=0xc8,umask=0x4");
+EVENT_ATTR_STR(el-capacity,	el_capacity,	"event=0x54,umask=0x2");
+EVENT_ATTR_STR(el-conflict,	el_conflict,	"event=0x54,umask=0x1");
+EVENT_ATTR_STR(cycles-t,	cycles_t,	"event=0x3c,in_tx=1");
+EVENT_ATTR_STR(cycles-ct,	cycles_ct,	"event=0x3c,in_tx=1,in_tx_cp=1");
 
 static struct attribute *hsw_events_attrs[] = {
+	EVENT_PTR(tx_start),
+	EVENT_PTR(tx_commit),
+	EVENT_PTR(tx_abort),
+	EVENT_PTR(tx_capacity),
+	EVENT_PTR(tx_conflict),
+	EVENT_PTR(el_start),
+	EVENT_PTR(el_commit),
+	EVENT_PTR(el_abort),
+	EVENT_PTR(el_capacity),
+	EVENT_PTR(el_conflict),
+	EVENT_PTR(cycles_t),
+	EVENT_PTR(cycles_ct),
 	EVENT_PTR(mem_ld_hsw),
 	EVENT_PTR(mem_st_hsw),
 	NULL
@@ -2452,6 +2519,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		x86_pmu.cpu_events = hsw_events_attrs;
+		x86_pmu.lbr_double_abort = true;
 		pr_cont("Haswell events, ");
 		break;
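The hsw_hw_config() hunk rejects genuine sampling on checkpointed events (each overflow would abort the transaction, roll the counter back, and overflow again) while still allowing the very long periods that perf stat uses when run under KVM. The accepted and rejected ranges can be restated as a tiny predicate (sketch only, not kernel code; 0 stands in for -EOPNOTSUPP):

```c
#include <stdint.h>

/* Sketch of the hsw_hw_config() period check for checkpointed events:
 * returns 0 (reject) for a period short enough to really sample, and
 * 1 (allow) for pure counting (period 0) or a huge perf-stat-style
 * period that will effectively never fire. */
static int checkpointed_period_allowed(uint64_t sample_period)
{
	if (sample_period > 0 && sample_period < 0x7fffffff)
		return 0;
	return 1;
}
```

The 0x7fffffff cutoff is taken directly from the hunk above; everything at or beyond it counts as "long enough".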
