summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2020-08-14perf ftrace: Make option description initials all capital lettersArnaldo Carvalho de Melo
And improve a bit the -m description to state that a B/K/M/G suffix is needed. Cc: Changbin Du <changbin.du@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf build-ids: Fall back to debuginfod query if debuginfo not foundFrank Ch. Eigler
During a perf-record, use the -ldebuginfod API to query a debuginfod server, should the debug data not be found in the usual system locations. If successful, the usual $HOME/.debug dir is populated. Tested with: $ find . . ./ctags-debuginfo-5.8-26.fc31.x86_64.rpm ./usr ./usr/lib ./usr/lib/debug ./usr/lib/debug/.build-id ./usr/lib/debug/.build-id/ca ./usr/lib/debug/.build-id/ca/46f6ae6a0cee57d85abc1d461c49074248908d ./usr/lib/debug/.build-id/ca/46f6ae6a0cee57d85abc1d461c49074248908d.debug ./usr/lib/debug/usr ./usr/lib/debug/usr/bin ./usr/lib/debug/usr/bin/ctags-5.8-26.fc31.x86_64.debug $ debuginfod -F . ... $ rm -rf ~/.debug/ ; mkdir ~/.debug $ perf record make tags BUILD: Doing 'make -j8' parallel build GEN tags [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.107 MB perf.data (1483 samples) ] $ find ~/.debug | grep ctags /home/jolsa/.debug/usr/bin/ctags /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/elf /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/probes $ rm -rf ~/.debug/ ; mkdir ~/.debug $ DEBUGINFOD_URLS=http://localhost:8002 perf record make tags BUILD: Doing 'make -j8' parallel build GEN tags [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.108 MB perf.data (1531 samples) ] $ find ~/.debug | grep ctag /home/jolsa/.debug/usr/bin/ctags /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/debug /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/elf /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/probes Note the 'debug' file is created in the last run. Note that currently the debuginfo data are downloaded only on record path, we still need add this support to perf build-id/report.. and test ;-) Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Frank Ch. Eigler <fche@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf bench numa: Remove dead code in parse_nodes_opt()Peng Fan
In the function parse_nodes_opt(), the statement "return 0;" is dead code, remove it. Signed-off-by: Peng Fan <fanpeng@loongson.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1597401894-27549-1-git-send-email-fanpeng@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf stat: Update POWER9 metrics to utilize other metricsPaul A. Clarke
These changes take advantage of the new capability added in merge commit 00e4db51259a5f936fec1424b884f029479d3981 "Allow using computed metrics in calculating other metrics". The net is a simplification of the expressions for a handful of metrics, but no functional change. Signed-off-by: Paul Clarke <pc@us.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200813222155.268183-1-pc@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add change logChangbin Du
Add a change log after previous enhancements. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-19-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf: ftrace: Add set_tracing_options() to set all trace optionsChangbin Du
Now the __cmd_ftrace() becomes a bit long. This moves the trace option setting code to a separate function set_tracing_options(). Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-18-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option --tid to filter by thread idChangbin Du
This allows us to trace single thread instead of the whole process. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-17-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option -D/--delay to delay tracingChangbin Du
This adds an option '-D/--delay' to allow us to start tracing some times later after workload is launched. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-16-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf: ftrace: Allow set graph depth by '--graph-opts'Changbin Du
This is to have a consistent view of all graph tracer options. The original option '--graph-depth' is marked as deprecated. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-15-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for trace option tracing_threshChangbin Du
This adds an option '--graph-opts thresh' to setup trace duration threshold for funcgraph tracer. $ sudo ./perf ftrace -G '*' --graph-opts thresh=100 3) ! 184.060 us | } /* schedule */ 3) ! 185.600 us | } /* exit_to_usermode_loop */ 2) ! 225.989 us | } /* schedule_idle */ 2) # 4140.051 us | } /* do_idle */ Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-14-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option 'verbose' to show more info for graph tracerChangbin Du
Sometimes we want ftrace display more and longer information about the trace. $ sudo perf ftrace -G '*' 2) 0.979 us | mutex_unlock(); 2) 1.540 us | __fsnotify_parent(); 2) 0.433 us | fsnotify(); $ sudo perf ftrace -G '*' --graph-opts verbose 14160.770883 | 0) <...>-47814 | .... | 1.289 us | mutex_unlock(); 14160.770886 | 0) <...>-47814 | .... | 1.624 us | __fsnotify_parent(); 14160.770887 | 0) <...>-47814 | .... | 0.636 us | fsnotify(); 14160.770888 | 0) <...>-47814 | .... | 0.328 us | __sb_end_write(); 14160.770888 | 0) <...>-47814 | d... | 0.430 us | fpregs_assert_state_consistent(); 14160.770889 | 0) <...>-47814 | d... | | do_syscall_64() { 14160.770889 | 0) <...>-47814 | .... | | __x64_sys_close() { Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-13-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for tracing option 'irq-info'Changbin Du
This adds support to display irq context info for function tracer. To do this, just specify a '--func-opts irq-info' option. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-12-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for trace option funcgraph-irqsChangbin Du
This adds an option '--graph-opts noirqs' to filter out functions executed in irq context. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-11-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for trace option sleep-timeChangbin Du
This adds an option '--graph-opts nosleep-time' which allow us only to measure on-CPU time. This option is function_graph tracer only. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-10-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add support for tracing option 'func_stack_trace'Changbin Du
This adds support to display call trace for function tracer. To do this, just specify a '--func-opts call-graph' option. Example: $ sudo perf ftrace -T vfs_read --func-opts call-graph iio-sensor-prox-855 [003] 6168.369657: vfs_read <-ksys_read iio-sensor-prox-855 [003] 6168.369677: <stack trace> => vfs_read => ksys_read => __x64_sys_read => do_syscall_64 => entry_SYSCALL_64_after_hwframe ... Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-9-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf tools: Add general function to parse sublevel optionsChangbin Du
This factors out a general function perf_parse_sublevel_options() to parse sublevel options. The 'sublevel' options is something like the '--debug' options which allow more sublevel options. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-8-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option '--inherit' to trace children processesChangbin Du
This adds an option '--inherit' to allow us trace children processes spawned by our target. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-7-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Show trace column headerChangbin Du
This makes 'perf ftrace' display column header before printing trace. $ sudo perf ftrace # tracer: function # # entries-in-buffer/entries-written: 0/0 #P:8 # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <...>-9246 [006] 10726.262760: mutex_unlock <-rb_simple_write <...>-9246 [006] 10726.262764: __fsnotify_parent <-vfs_write <...>-9246 [006] 10726.262765: fsnotify <-vfs_write <...>-9246 [006] 10726.262766: __sb_end_write <-vfs_write <...>-9246 [006] 10726.262767: fpregs_assert_state_consistent <-do_syscall_64 Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-6-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option '-m/--buffer-size' to set per-cpu buffer sizeChangbin Du
This adds an option '-m/--buffer-size' to allow us set the size of per-cpu tracing buffer. Committer testing: Before running with this option: # find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat 1408 1408 1408 1408 1408 1408 1408 1408 1408 # Then, run: # perf ftrace -m 2048K | head -10 2) | mutex_unlock() { 2) ==========> | 2) | smp_irq_work_interrupt() { 2) | irq_enter() { 2) 0.121 us | rcu_irq_enter(); 2) 0.128 us | irqtime_account_irq(); 2) 0.719 us | } 2) | __wake_up() { 2) | __wake_up_common_lock() { 2) 0.105 us | _raw_spin_lock_irqsave(); # Now look at those tracefs knobs: # find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat 2048 2048 2048 2048 2048 2048 2048 2048 2048 # This should be similar to the -m option in the other perf tools, such as 'perf record', 'perf trace', etc. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-5-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Factor out function write_tracing_file_int()Changbin Du
We will reuse this function later. Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-4-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Add option '-F/--funcs' to list available functionsChangbin Du
This adds an option '-F/--funcs' to list all available functions to trace, which is read from tracing file 'available_filter_functions'. $ sudo ./perf ftrace -F | head trace_initcall_finish_cb initcall_blacklisted do_one_initcall do_one_initcall trace_initcall_start_cb run_init_process try_to_run_init_process match_dev_by_label match_dev_by_uuid rootfs_init_fs_context $ Committer notes: This is the same command line option and for the same purpose as in 'perf probe'. Signed-off-by: Changbin Du <changbin.du@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-3-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-14perf ftrace: Select function/function_graph tracer automaticallyChangbin Du
The '-g/-G' options have already implied function_graph tracer should be used instead of function tracer. So we don't need extra option '--tracer' in this case. This patch changes the behavior as below: - If '-g' or '-G' option is on, then function_graph tracer is used. - If '-T' or '-N' option is on, then function tracer is used. - The function_graph has priority over function tracer. - The option '--tracer' only take effect if neither -g/-G nor -T/-N is specified. Here are some examples. This will start tracing all functions using default tracer: $ sudo perf ftrace This will trace all functions using function graph tracer: $ sudo perf ftrace -G '*' This will trace function vfs_read using function graph tracer: $ sudo perf ftrace -G vfs_read This will trace function vfs_read using function tracer: $ sudo perf ftrace -T vfs_read Committer notes: Using '-h -G' will tell what that option is about, so to further clarify the above examples: # perf ftrace -h -G -G, --graph-funcs <func> Set graph filter on given functions # perf ftrace -h -g -g, --nograph-funcs <func> Set nograph filter on given functions # perf ftrace -h -T -T, --trace-funcs <func> trace given functions only # perf ftrace -h -N -N, --notrace-funcs <func> do not trace given functions # Signed-off-by: Changbin Du <changbin.du@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200808023141.14227-2-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "Some merge window fallout, some longer term fixes: 1) Handle headroom properly in lapbether and x25_asy drivers, from Xie He. 2) Fetch MAC address from correct r8152 device node, from Thierry Reding. 3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg, from Rouven Czerwinski. 4) Correct fdputs in socket layer, from Miaohe Lin. 5) Revert troublesome sockptr_t optimization, from Christoph Hellwig. 6) Fix TCP TFO key reading on big endian, from Jason Baron. 7) Missing CAP_NET_RAW check in nfc, from Qingyu Li. 8) Fix inet fastreuse optimization with tproxy sockets, from Tim Froidcoeur. 9) Fix 64-bit divide in new SFC driver, from Edward Cree. 10) Add a tracepoint for prandom_u32 so that we can more easily perform usage analysis. From Eric Dumazet. 11) Fix rwlock imbalance in AF_PACKET, from John Ogness" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits) net: openvswitch: introduce common code for flushing flows af_packet: TPACKET_V3: fix fill status rwlock imbalance random32: add a tracepoint for prandom_u32() Revert "ipv4: tunnel: fix compilation on ARCH=um" net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus net: ethernet: stmmac: Disable hardware multicast filter net: stmmac: dwmac1000: provide multicast filter fallback ipv4: tunnel: fix compilation on ARCH=um vsock: fix potential null pointer dereference in vsock_poll() sfc: fix ef100 design-param checking net: initialize fastreuse on inet_inherit_port net: refactor bind_bucket fastreuse into helper net: phy: marvell10g: fix null pointer dereference net: Fix potential memory leak in proto_register() net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc() net/nfc/rawsock.c: add CAP_NET_RAW check. hinic: fix strncpy output truncated compile warnings drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check net/tls: Fix kmap usage ...
2020-08-13selftests/bpf: Make test_varlen work with 32-bit user-space archAndrii Nakryiko
Despite bpftool generating data section memory layout that will work for 32-bit architectures on user-space side, BPF programs should be careful to not use ambiguous types like `long`, which have different size in 32-bit and 64-bit environments. Fix that in test by using __u64 explicitly, which is a recommended approach anyway. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-10-andriin@fb.com
2020-08-13tools/bpftool: Generate data section struct with conservative alignmentAndrii Nakryiko
The comment in the code describes this in good details. Generate such a memory layout that would work both on 32-bit and 64-bit architectures for user-space. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-9-andriin@fb.com
2020-08-13selftests/bpf: Correct various core_reloc 64-bit assumptionsAndrii Nakryiko
Ensure that types are memory layout- and field alignment-compatible regardless of 32/64-bitness mix of libbpf and BPF architecture. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-8-andriin@fb.com
2020-08-13libbpf: Enforce 64-bitness of BTF for BPF object filesAndrii Nakryiko
BPF object files are always targeting 64-bit BPF target architecture, so enforce that at BTF level as well. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-7-andriin@fb.com
2020-08-13selftests/bpf: Fix btf_dump test cases on 32-bit archesAndrii Nakryiko
Fix btf_dump test cases by hard-coding BPF's pointer size of 8 bytes for cases where it's impossible to deterimne the pointer size (no long type in BTF). In cases where it's known, validate libbpf correctly determines it as 8. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-6-andriin@fb.com
2020-08-13libbpf: Handle BTF pointer sizes more carefullyAndrii Nakryiko
With libbpf and BTF it is pretty common to have libbpf built for one architecture, while BTF information was generated for a different architecture (typically, but not always, BPF). In such case, the size of a pointer might differ betweem architectures. libbpf previously was always making an assumption that pointer size for BTF is the same as native architecture pointer size, but that breaks for cases where libbpf is built as 32-bit library, while BTF is for 64-bit architecture. To solve this, add heuristic to determine pointer size by searching for `long` or `unsigned long` integer type and using its size as a pointer size. Also, allow to override the pointer size with a new API btf__set_pointer_size(), for cases where application knows which pointer size should be used. User application can check what libbpf "guessed" by looking at the result of btf__pointer_size(). If it's not 0, then libbpf successfully determined a pointer size, otherwise native arch pointer size will be used. For cases where BTF is parsed from ELF file, use ELF's class (32-bit or 64-bit) to determine pointer size. Fixes: 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-5-andriin@fb.com
2020-08-13libbpf: Fix BTF-defined map-in-map initialization on 32-bit host archesAndrii Nakryiko
Libbpf built in 32-bit mode should be careful about not conflating 64-bit BPF pointers in BPF ELF file and host architecture pointers. This patch fixes issue of incorrect initializating of map-in-map inner map slots due to such difference. Fixes: 646f02ffdd49 ("libbpf: Add BTF-defined map-in-map support") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-4-andriin@fb.com
2020-08-13selftest/bpf: Fix compilation warnings in 32-bit modeAndrii Nakryiko
Fix compilation warnings emitted when compiling selftests for 32-bit platform (x86 in my case). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-3-andriin@fb.com
2020-08-13tools/bpftool: Fix compilation warnings in 32-bit modeAndrii Nakryiko
Fix few compilation warnings in bpftool when compiling in 32-bit mode. Abstract away u64 to pointer conversion into a helper function. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-2-andriin@fb.com
2020-08-13bpf, selftests: Add tests to sock_ops for loading skJohn Fastabend
Add tests to directly accesse sock_ops sk field. Then use it to ensure a bad pointer access will fault if something goes wrong. We do three tests: The first test ensures when we read sock_ops sk pointer into the same register that we don't fault as described earlier. Here r9 is chosen as the temp register. The xlated code is, 36: (7b) *(u64 *)(r1 +32) = r9 37: (61) r9 = *(u32 *)(r1 +28) 38: (15) if r9 == 0x0 goto pc+3 39: (79) r9 = *(u64 *)(r1 +32) 40: (79) r1 = *(u64 *)(r1 +0) 41: (05) goto pc+1 42: (79) r9 = *(u64 *)(r1 +32) The second test ensures the temp register selection does not collide with in-use register r9. Shown here r8 is chosen because r9 is the sock_ops pointer. The xlated code is as follows, 46: (7b) *(u64 *)(r9 +32) = r8 47: (61) r8 = *(u32 *)(r9 +28) 48: (15) if r8 == 0x0 goto pc+3 49: (79) r8 = *(u64 *)(r9 +32) 50: (79) r9 = *(u64 *)(r9 +0) 51: (05) goto pc+1 52: (79) r8 = *(u64 *)(r9 +32) And finally, ensure we didn't break the base case where dst_reg does not equal the source register, 56: (61) r2 = *(u32 *)(r1 +28) 57: (15) if r2 == 0x0 goto pc+1 58: (79) r2 = *(u64 *)(r1 +0) Notice it takes us an extra four instructions when src reg is the same as dst reg. One to save the reg, two to restore depending on the branch taken and a goto to jump over the second restore. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718355325.4728.4163036953345999636.stgit@john-Precision-5820-Tower
2020-08-13bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registersJohn Fastabend
Loads in sock_ops case when using high registers requires extra logic to ensure the correct temporary value is used. We need to ensure the temp register does not use either the src_reg or dst_reg. Lets add an asm test to force the logic is triggered. The xlated code is here, 30: (7b) *(u64 *)(r9 +32) = r7 31: (61) r7 = *(u32 *)(r9 +28) 32: (15) if r7 == 0x0 goto pc+2 33: (79) r7 = *(u64 *)(r9 +0) 34: (63) *(u32 *)(r7 +916) = r8 35: (79) r7 = *(u64 *)(r9 +32) Notice r9 and r8 are not used for temp registers and r7 is chosen. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718353345.4728.8805043614257933227.stgit@john-Precision-5820-Tower
2020-08-13bpf, selftests: Add tests for ctx access in sock_ops with single registerJohn Fastabend
To verify fix ("bpf: sock_ops ctx access may stomp registers in corner case") we want to force compiler to generate the following code when accessing a field with BPF_TCP_SOCK_GET_COMMON, r1 = *(u32 *)(r1 + 96) // r1 is skops ptr Rather than depend on clang to do this we add the test with inline asm to the tcpbpf test. This saves us from having to create another runner and ensures that if we break this again test_tcpbpf will crash. With above code we get the xlated code, 11: (7b) *(u64 *)(r1 +32) = r9 12: (61) r9 = *(u32 *)(r1 +28) 13: (15) if r9 == 0x0 goto pc+4 14: (79) r9 = *(u64 *)(r1 +32) 15: (79) r1 = *(u64 *)(r1 +0) 16: (61) r1 = *(u32 *)(r1 +2348) 17: (05) goto pc+1 18: (79) r9 = *(u64 *)(r1 +32) We also add the normal case where src_reg != dst_reg so we can compare code generation easily from llvm-objdump and ensure that case continues to work correctly. The normal code is xlated to, 20: (b7) r1 = 0 21: (61) r1 = *(u32 *)(r3 +28) 22: (15) if r1 == 0x0 goto pc+2 23: (79) r1 = *(u64 *)(r3 +0) 24: (61) r1 = *(u32 *)(r1 +2348) Where the temp variable is not used. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718351457.4728.3295119261717842496.stgit@john-Precision-5820-Tower
2020-08-13libbpf: Prevent overriding errno when logging errorsToke Høiland-Jørgensen
Turns out there were a few more instances where libbpf didn't save the errno before writing an error message, causing errno to be overridden by the printf() return and the error disappearing if logging is enabled. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200813142905.160381-1-toke@redhat.com
2020-08-13perf bench numa: Use numa_node_to_cpus() to bind tasks to nodesAlexander Gordeev
It is currently assumed that each node contains at most nr_cpus/nr_nodes CPUs and nodes' CPU ranges do not overlap. That assumption is generally incorrect as there are archs where a CPU number does not depend on to its node number. This update removes the described assumption by simply calling numa_node_to_cpus() interface and using the returned mask for binding CPUs to nodes. Also, variable types and names made consistent in functions using cpumask. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Balamuruhan S <bala24@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20200813113247.GA2014@oc3871087118.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13perf bench numa: Fix cpumask memory leak in node_has_cpus()Alexander Gordeev
Couple numa_allocate_cpumask() and numa_free_cpumask() functions Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Balamuruhan S <bala24@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20200813113041.GA1685@oc3871087118.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13tools build feature: Quote CC and CXX for their argumentsDaniel Díaz
When using a cross-compilation environment, such as OpenEmbedded, the CC an CXX variables are set to something more than just a command: there are arguments (such as --sysroot) that need to be passed on to the compiler so that the right set of headers and libraries are used. For the particular case that our systems detected, CC is set to the following: export CC="aarch64-linaro-linux-gcc --sysroot=/oe/build/tmp/work/machine/perf/1.0-r9/recipe-sysroot" Without quotes, detection is as follows: Auto-detecting system features: ... dwarf: [ OFF ] ... dwarf_getlocations: [ OFF ] ... glibc: [ OFF ] ... gtk2: [ OFF ] ... libbfd: [ OFF ] ... libcap: [ OFF ] ... libelf: [ OFF ] ... libnuma: [ OFF ] ... numa_num_possible_cpus: [ OFF ] ... libperl: [ OFF ] ... libpython: [ OFF ] ... libcrypto: [ OFF ] ... libunwind: [ OFF ] ... libdw-dwarf-unwind: [ OFF ] ... zlib: [ OFF ] ... lzma: [ OFF ] ... get_cpuid: [ OFF ] ... bpf: [ OFF ] ... libaio: [ OFF ] ... libzstd: [ OFF ] ... disassembler-four-args: [ OFF ] Makefile.config:414: *** No gnu/libc-version.h found, please install glibc-dev[el]. Stop. Makefile.perf:230: recipe for target 'sub-make' failed make[1]: *** [sub-make] Error 2 Makefile:69: recipe for target 'all' failed make: *** [all] Error 2 With CC and CXX quoted, some of those features are now detected. Fixes: e3232c2f39ac ("tools build feature: Use CC and CXX from parent") Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org> Reviewed-by: Thomas Hebb <tommyhebb@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20200812221518.2869003-1-daniel.diaz@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13perf tools: Fix module symbol processingJiri Olsa
The 'dso->kernel' condition is true also for kernel modules now, and there are several places that were omited by the initial change: - we need to identify modules separately in dso__process_kernel_symbol - we need to set 'dso->kernel' for module from buildid table - there's no need to use 'dso->kernel || kmodule' in one condition Committer testing: Before: # perf test -v object <SNIP> Objdump command is: objdump -z -d --start-address=0xffffffff813e682f --stop-address=0xffffffff813e68af /usr/lib/debug/lib/modules/5.7.14-200.fc32.x86_64/vmlinux Bytes read match those read by objdump Reading object code for memory address: 0xffffffffc02dc257 File is: /lib/modules/5.7.14-200.fc32.x86_64/kernel/arch/x86/crypto/crc32c-intel.ko.xz On file address is: 0xffffffffc02dc2e7 dso__data_read_offset failed test child finished with -1 ---- end ---- Object code reading: FAILED! # After: # perf test object 26: Object code reading : Ok # perf test object 26: Object code reading : Ok # perf test object 26: Object code reading : Ok # perf test object 26: Object code reading : Ok # perf test object 26: Object code reading : Ok # Fixes: 02213cec64bb ("perf maps: Mark module DSOs with kernel type") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2020-08-13perf tools: Rename 'enum dso_kernel_type' to 'enum dso_space_type'Jiri Olsa
Rename enum dso_kernel_type to enum dso_space_type, which seems like better fit. Committer notes: This is used with 'struct dso'->kernel, which once was a boolean, so DSO_SPACE__USER is zero, !zero means some sort of kernel space, be it the host kernel space or a guest kernel space. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13libperf: Fix man page typosRob Herring
Fix various typos and inconsistent capitalization of CPU in the libperf man pages. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200807193241.3904545-1-robh@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13perf test: Allow multiple probes in record+script_probe_vfs_getname.shMichael Petlan
Sometimes when adding a kprobe by perf, it results in multiple probe points, such as the following: # ./perf probe -l probe:vfs_getname (on getname_flags:73@fs/namei.c with pathname) probe:vfs_getname_1 (on getname_flags:73@fs/namei.c with pathname) probe:vfs_getname_2 (on getname_flags:73@fs/namei.c with pathname) # cat /sys/kernel/debug/tracing/kprobe_events p:probe/vfs_getname _text+5501804 pathname=+0(+0(%gpr31)):string p:probe/vfs_getname_1 _text+5505388 pathname=+0(+0(%gpr31)):string p:probe/vfs_getname_2 _text+5508396 pathname=+0(+0(%gpr31)):string In this test, we need to record all of them and expect any of them in the perf-script output, since it's not clear which one will be used for the desired syscall: # perf stat -e probe:vfs_getname\* -- touch /tmp/nic Performance counter stats for 'touch /tmp/nic': 31 probe:vfs_getname_2 0 probe:vfs_getname_1 1 probe:vfs_getname 0.001421826 seconds time elapsed 0.001506000 seconds user 0.000000000 seconds sys If the test relies only on probe:vfs_getname, it might easily miss the relevant data. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 20200722135845.29958-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13perf bench mem: Always memset source before memcpyVincent Whitchurch
For memcpy, the source pages are memset to zero only when --cycles is used. This leads to wildly different results with or without --cycles, since all sources pages are likely to be mapped to the same zero page without explicit writes. Before this fix: $ export cmd="./perf stat -e LLC-loads -- ./perf bench \ mem memcpy -s 1024MB -l 100 -f default" $ $cmd 2,935,826 LLC-loads 3.821677452 seconds time elapsed $ $cmd --cycles 217,533,436 LLC-loads 8.616725985 seconds time elapsed After this fix: $ $cmd 214,459,686 LLC-loads 8.674301124 seconds time elapsed $ $cmd --cycles 214,758,651 LLC-loads 8.644480006 seconds time elapsed Fixes: 47b5757bac03c338 ("perf bench mem: Move boilerplate memory allocation to the infrastructure") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel@axis.com Link: http://lore.kernel.org/lkml/20200810133404.30829-1-vincent.whitchurch@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13perf sched: Prefer sched_waking event when it existsDavid Ahern
Commit fbd705a0c618 ("sched: Introduce the 'trace_sched_waking' tracepoint") added sched_waking tracepoint which should be preferred over sched_wakeup when analyzing scheduling delays. Update 'perf sched record' to collect sched_waking events if it exists and fallback to sched_wakeup if it does not. Similarly, update timehist command to skip sched_wakeup events if the session includes sched_waking (ie., sched_waking is preferred over sched_wakeup). Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200807164844.44870-1-dsahern@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-13selftests: netfilter: kill running process onlyFabian Frederick
Avoid noise like the following: nft_flowtable.sh: line 250: kill: (4691) - No such process Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-13selftests: netfilter: add MTU arguments to flowtablesFabian Frederick
Add some documentation, default values defined in original script and Originator/Link/Responder arguments using getopts like in tools/power/cpupower/bench/cpufreq-bench_plot.sh Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-13selftests: netfilter: add checktool functionFabian Frederick
avoid repeating the same test for different toolcheck Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-12libbpf: Handle GCC built-in types for Arm NEONJean-Philippe Brucker
When building Arm NEON (SIMD) code from lib/raid6/neon.uc, GCC emits DWARF information using a base type "__Poly8_t", which is internal to GCC and not recognized by Clang. This causes build failures when building with Clang a vmlinux.h generated from an arm64 kernel that was built with GCC. vmlinux.h:47284:9: error: unknown type name '__Poly8_t' typedef __Poly8_t poly8x16_t[16]; ^~~~~~~~~ The polyX_t types are defined as unsigned integers in the "Arm C Language Extension" document (101028_Q220_00_en). Emit typedefs based on standard integer types for the GCC internal types, similar to those emitted by Clang. Including linux/kernel.h to use ARRAY_SIZE() incidentally redefined max(), causing a build bug due to different types, hence the seemingly unrelated change. Reported-by: Jakov Petrina <jakov.petrina@sartura.hr> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200812143909.3293280-1-jean-philippe@linaro.org
2020-08-12tools/bpftool: Make skeleton code C++17-friendly by dropping typeof()Andrii Nakryiko
Seems like C++17 standard mode doesn't recognize typeof() anymore. This can be tested by compiling test_cpp test with -std=c++17 or -std=c++1z options. The use of typeof in skeleton generated code is unnecessary, all types are well-known at the time of code generation, so remove all typeof()'s to make skeleton code more future-proof when interacting with C++ compilers. Fixes: 985ead416df3 ("bpftool: Add skeleton codegen command") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200812025907.1371956-1-andriin@fb.com