diff options
| author | Daniel Borkmann <daniel@iogearbox.net> | 2019-10-17 16:44:36 +0200 |
|---|---|---|
| committer | Daniel Borkmann <daniel@iogearbox.net> | 2019-10-17 16:44:59 +0200 |
| commit | 0142fdc8186ea15e0bff03422ddc7d23b2e71dfc (patch) | |
| tree | 7b4106d91e7dcc48dbd1bc70b47ef67ff6cfd81a /include/linux/bpf.h | |
| parent | eac9153f2b584c702cea02c1f1a57d85aa9aea42 (diff) | |
| parent | 580d656d80cfe616432fddd1ec35297a1454e05a (diff) | |
Merge branch 'bpf-btf-trace'
Alexei Starovoitov says:
====================
v2->v3:
- while trying to adopt btf-based tracing in production service realized
that disabling bpf_probe_read() was premature. The real tracing program
needs to see much more than this type safe tracking can provide.
With these patches the verifier will be able to see that skb->data
is a pointer to 'u8 *', but it cannot possibly know how many bytes
of it is readable. Hence bpf_probe_read() is necessary to do basic
packet reading from tracing program. Some helper can be introduced to
solve this particular problem, but there are other similar structures.
Another issue is bitfield reading. The support for bitfields
is coming to llvm. libbpf will be supporting it eventually as well,
but there will be corner cases where bpf_probe_read() is necessary.
The long term goal is still the same: get rid of probe_read eventually.
- fixed build issue with clang reported by Nathan Chancellor.
- addressed a ton of comments from Andrii.
bitfields and arrays are explicitly unsupported in btf-based tracking.
This will be improved in the future.
Right now the verifier is more strict than necessary.
In some cases it can fall back to 'scalar' instead of rejecting
the program, but rejection today allows to make better decisions
in the future.
- adjusted testcase to demo bitfield and skb->data reading.
v1->v2:
- addressed feedback from Andrii and Eric. Thanks a lot for review!
- added missing check at raw_tp attach time.
- Andrii noticed that expected_attach_type cannot be reused.
Had to introduce new field to bpf_attr.
- cleaned up logging nicely by introducing bpf_log() helper.
- rebased.
Revolutionize bpf tracing and bpf C programming.
C language allows any pointer to be typecasted to any other pointer
or convert integer to a pointer.
Though bpf verifier is operating at assembly level it has strict type
checking for fixed number of types.
Known types are defined in 'enum bpf_reg_type'.
For example:
PTR_TO_FLOW_KEYS is a pointer to 'struct bpf_flow_keys'
PTR_TO_SOCKET is a pointer to 'struct bpf_sock',
and so on.
When it comes to bpf tracing there are no types to track.
bpf+kprobe receives 'struct pt_regs' as input.
bpf+raw_tracepoint receives raw kernel arguments as an array of u64 values.
It was up to bpf program to interpret these integers.
Typical tracing program looks like:
int bpf_prog(struct pt_regs *ctx)
{
struct net_device *dev;
struct sk_buff *skb;
int ifindex;
skb = (struct sk_buff *) ctx->di;
bpf_probe_read(&dev, sizeof(dev), &skb->dev);
bpf_probe_read(&ifindex, sizeof(ifindex), &dev->ifindex);
}
Addressing mistakes will not be caught by C compiler or by the verifier.
The program above could have typecasted ctx->si to skb and page faulted
on every bpf_probe_read().
bpf_probe_read() allows reading any address and suppresses page faults.
Typical program has hundreds of bpf_probe_read() calls to walk
kernel data structures.
Not only tracing program would be slow, but there was always a risk
that bpf_probe_read() would read mmio region of memory and cause
unpredictable hw behavior.
With introduction of Compile Once Run Everywhere technology in libbpf
and in LLVM and BPF Type Format (BTF) the verifier is finally ready
for the next step in program verification.
Now it can use in-kernel BTF to type check bpf assembly code.
Equivalent program will look like:
struct trace_kfree_skb {
struct sk_buff *skb;
void *location;
};
SEC("raw_tracepoint/kfree_skb")
int trace_kfree_skb(struct trace_kfree_skb* ctx)
{
struct sk_buff *skb = ctx->skb;
struct net_device *dev;
int ifindex;
__builtin_preserve_access_index(({
dev = skb->dev;
ifindex = dev->ifindex;
}));
}
These patches teach bpf verifier to recognize kfree_skb's first argument
as 'struct sk_buff *' because this is what kernel C code is doing.
The bpf program cannot 'cheat' and say that the first argument
to kfree_skb raw_tracepoint is some other type.
The verifier will catch such type mismatch between bpf program
assumption of kernel code and the actual type in the kernel.
Furthermore skb->dev access is type tracked as well.
The verifier can see which field of skb is being read
in bpf assembly. It will match offset to type.
If bpf program has code:
struct net_device *dev = (void *)skb->len;
C compiler will not complain and generate bpf assembly code,
but the verifier will recognize that integer 'len' field
is being accessed at offsetof(struct sk_buff, len) and will reject
further dereference of 'dev' variable because it contains
integer value instead of a pointer.
Such sophisticated type tracking allows calling networking
bpf helpers from tracing programs.
This patchset allows calling bpf_skb_event_output() that dumps
skb data into perf ring buffer.
It greatly improves observability.
Now users can not only see packet lenth of the skb
about to be freed in kfree_skb() kernel function, but can
dump it to user space via perf ring buffer using bpf helper
that was previously available only to TC and socket filters.
See patch 10 for full example.
The end result is safer and faster bpf tracing.
Safer - because type safe direct load can be used most of the time
instead of bpf_probe_read().
Faster - because direct loads are used to walk kernel data structures
instead of bpf_probe_read() calls.
Note that such loads can page fault and are supported by
hidden bpf_probe_read() in interpreter and via exception table
if program is JITed.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Diffstat (limited to 'include/linux/bpf.h')
| -rw-r--r-- | include/linux/bpf.h | 39 |
1 files changed, 33 insertions, 6 deletions
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 282e28bf41ec..2c2c29b49845 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -16,6 +16,7 @@ #include <linux/u64_stats_sync.h> struct bpf_verifier_env; +struct bpf_verifier_log; struct perf_event; struct bpf_prog; struct bpf_map; @@ -23,6 +24,7 @@ struct sock; struct seq_file; struct btf; struct btf_type; +struct exception_table_entry; extern struct idr btf_idr; extern spinlock_t btf_idr_lock; @@ -211,6 +213,7 @@ enum bpf_arg_type { ARG_PTR_TO_INT, /* pointer to int */ ARG_PTR_TO_LONG, /* pointer to long */ ARG_PTR_TO_SOCKET, /* pointer to bpf_sock (fullsock) */ + ARG_PTR_TO_BTF_ID, /* pointer to in-kernel struct */ }; /* type of values returned from helper functions */ @@ -233,11 +236,17 @@ struct bpf_func_proto { bool gpl_only; bool pkt_access; enum bpf_return_type ret_type; - enum bpf_arg_type arg1_type; - enum bpf_arg_type arg2_type; - enum bpf_arg_type arg3_type; - enum bpf_arg_type arg4_type; - enum bpf_arg_type arg5_type; + union { + struct { + enum bpf_arg_type arg1_type; + enum bpf_arg_type arg2_type; + enum bpf_arg_type arg3_type; + enum bpf_arg_type arg4_type; + enum bpf_arg_type arg5_type; + }; + enum bpf_arg_type arg_type[5]; + }; + u32 *btf_id; /* BTF ids of arguments */ }; /* bpf_context is intentionally undefined structure. Pointer to bpf_context is @@ -281,6 +290,7 @@ enum bpf_reg_type { PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */ PTR_TO_TP_BUFFER, /* reg points to a writable raw tp's buffer */ PTR_TO_XDP_SOCK, /* reg points to struct xdp_sock */ + PTR_TO_BTF_ID, /* reg points to kernel struct */ }; /* The information passed from prog-specific *_is_valid_access @@ -288,7 +298,11 @@ enum bpf_reg_type { */ struct bpf_insn_access_aux { enum bpf_reg_type reg_type; - int ctx_field_size; + union { + int ctx_field_size; + u32 btf_id; + }; + struct bpf_verifier_log *log; /* for verbose logs */ }; static inline void @@ -375,6 +389,7 @@ struct bpf_prog_aux { u32 id; u32 func_cnt; /* used by non-func prog as the number of func progs */ u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */ + u32 attach_btf_id; /* in-kernel BTF type id to attach to */ bool verifier_zext; /* Zero extensions has been inserted by verifier. */ bool offload_requested; struct bpf_prog **func; @@ -416,6 +431,8 @@ struct bpf_prog_aux { * main prog always has linfo_idx == 0 */ u32 linfo_idx; + u32 num_exentries; + struct exception_table_entry *extable; struct bpf_prog_stats __percpu *stats; union { struct work_struct work; @@ -482,6 +499,7 @@ struct bpf_event_entry { bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp); int bpf_prog_calc_tag(struct bpf_prog *fp); +const char *kernel_type_name(u32 btf_type_id); const struct bpf_func_proto *bpf_get_trace_printk_proto(void); @@ -747,6 +765,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); +bool btf_ctx_access(int off, int size, enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info); +int btf_struct_access(struct bpf_verifier_log *log, + const struct btf_type *t, int off, int size, + enum bpf_access_type atype, + u32 *next_btf_id); +u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *, int); + #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { |
