Age | Commit message | Author
2018-01-07 | ipv6: Ignore dead routes during lookup | Ido Schimmel
Currently, dead routes are only present in the routing tables in case the 'ignore_routes_with_linkdown' sysctl is set. Otherwise, they are flushed. Subsequent patches are going to remove the reliance on this sysctl and make IPv6 more consistent with IPv4. Before this is done, we need to make sure dead routes are skipped during route lookup, so as to not cause packet loss. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
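The lookup behaviour described above can be sketched in user-space C. This is a minimal model, not the kernel code: `struct nexthop`, `select_nexthop()` and the `ignore_linkdown` parameter are hypothetical names, and the flag macros only mirror the values of RTNH_F_DEAD and RTNH_F_LINKDOWN. That 'linkdown' nexthops are skipped only when the sysctl is set is an assumption drawn from the surrounding commit messages.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the flag values; same numbers as RTNH_F_DEAD and
 * RTNH_F_LINKDOWN, redefined so the sketch builds in user space. */
#define NH_F_DEAD     0x01
#define NH_F_LINKDOWN 0x10

/* Hypothetical, simplified nexthop: just a device index and flags. */
struct nexthop {
    int ifindex;
    unsigned int flags;
};

/* Return the first usable nexthop, or NULL if there is none.
 * Dead nexthops are always skipped. Nexthops without carrier are skipped
 * only when ignore_linkdown is non-zero (assumed to model the
 * 'ignore_routes_with_linkdown' sysctl). */
static const struct nexthop *select_nexthop(const struct nexthop *nhs,
                                            size_t n, int ignore_linkdown)
{
    assert(nhs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (nhs[i].flags & NH_F_DEAD)
            continue;
        if (ignore_linkdown && (nhs[i].flags & NH_F_LINKDOWN))
            continue;
        return &nhs[i];
    }
    return NULL;
}
```

The point of the sketch is that the decision uses only flags stored in the nexthop, so a dead entry left in the table cannot be selected and cause packet loss.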
2018-01-07 | ipv6: Check nexthop flags in route dump instead of carrier | Ido Schimmel
Similar to previous patch, there is no need to check for the carrier of the nexthop device when dumping the route and we can instead check for the presence of the RTNH_F_LINKDOWN flag. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | ipv6: Check nexthop flags during route lookup instead of carrier | Ido Schimmel
Now that the RTNH_F_LINKDOWN flag is set in nexthops, we can avoid the need to dereference the nexthop device and check its carrier and instead check for the presence of the flag. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | ipv6: Set nexthop flags during route creation | Ido Schimmel
It is valid to install routes with a nexthop device that does not have a carrier, so we need to make sure they're marked accordingly. As explained in the previous patch, host and anycast routes are never marked with the 'linkdown' flag. Note that reject routes are unaffected, as these use the loopback device which always has a carrier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | ipv6: Set nexthop flags upon carrier change | Ido Schimmel
Similar to IPv4, when the carrier of a netdev changes we should toggle the 'linkdown' flag on all the nexthops using it as their nexthop device. This will later allow us to test for the presence of this flag during route lookup and dump. Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") host and anycast routes used the loopback netdev as their nexthop device and thus were not marked with the 'linkdown' flag. The patch preserves this behavior and allows one to ping the local address even when the nexthop device does not have a carrier and the 'ignore_routes_with_linkdown' sysctl is set. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | ipv6: Prepare to handle multiple netdev events | Ido Schimmel
To make IPv6 more in line with IPv4 we need to be able to respond differently to different netdev events. For example, when a netdev is unregistered all the routes using it as their nexthop device should be flushed, whereas when the netdev's carrier changes only the 'linkdown' flag should be toggled. Currently, this is not possible, as the function that traverses the routing tables is not aware of the triggering event. Propagate the triggering event down, so that it could be used in later patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
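A small user-space sketch of why the triggering event has to reach the table walker: the same traversal either flushes a route or only toggles a flag, depending on the event. All names here (`enum nh_event`, `struct route`, `handle_netdev_event()`) are hypothetical stand-ins, not the kernel's identifiers, and the flag macro only mirrors the value of RTNH_F_LINKDOWN.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as RTNH_F_LINKDOWN, redefined for user space. */
#define NH_F_LINKDOWN 0x10

/* Hypothetical event names standing in for the kernel's netdev events. */
enum nh_event {
    EV_UNREGISTER,      /* device goes away: flush its routes */
    EV_CARRIER_CHANGE   /* carrier toggled: only update the flag */
};

/* Hypothetical route entry with a single nexthop device. */
struct route {
    int ifindex;         /* nexthop device */
    unsigned int flags;  /* NH_F_* bits */
    int in_table;        /* 1 while the route is installed */
};

/* Walk the table once and apply the action that matches the event. */
static void handle_netdev_event(struct route *tbl, size_t n, int ifindex,
                                enum nh_event ev, int carrier_ok)
{
    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct route *rt = &tbl[i];

        if (!rt->in_table || rt->ifindex != ifindex)
            continue;

        switch (ev) {
        case EV_UNREGISTER:
            rt->in_table = 0;
            break;
        case EV_CARRIER_CHANGE:
            if (carrier_ok)
                rt->flags &= ~NH_F_LINKDOWN;
            else
                rt->flags |= NH_F_LINKDOWN;
            break;
        }
    }
}
```

Without the `ev` argument the walker could only do one thing for every event, which is the limitation the commit message describes.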
2018-01-07 | ipv6: Clear nexthop flags upon netdev up | Ido Schimmel
Previous patch marked nexthops with the 'dead' and 'linkdown' flags. Clear these flags when the netdev comes back up. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | ipv6: Mark dead nexthops with appropriate flags | Ido Schimmel
When a netdev is put administratively down or unregistered all the nexthops using it as their nexthop device should be marked with the 'dead' and 'linkdown' flags. Currently, when a route is dumped its nexthop device is tested and the flags are set accordingly. A similar check is performed during route lookup. Instead, we can simply mark the nexthops based on netdev events and avoid checking the netdev's state during route dump and lookup. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | ipv6: Remove redundant route flushing during namespace dismantle | Ido Schimmel
By the time fib6_net_exit() is executed all the netdevs in the namespace have been either unregistered or pushed back to the default namespace. That is because pernet subsys operations are always ordered before pernet device operations and therefore invoked after them during namespace dismantle. Thus, all the routing tables in the namespace are empty by the time fib6_net_exit() is invoked and the call to rt6_ifdown() can be removed. This allows us to simplify the condition in fib6_ifdown() as it's only ever called with an actual netdev. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-01-07 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a start of a framework for extending struct xdp_buff without having the overhead of populating every data at runtime. Idea is to have a new per-queue struct xdp_rxq_info that holds read-mostly data (currently that is, queue number and a pointer to the corresponding netdev) which is set up during rxqueue config time. When an XDP program is invoked, struct xdp_buff holds a pointer to struct xdp_rxq_info that the BPF program can then walk. The user facing BPF program that uses struct xdp_md for context can use these members directly, and the verifier rewrites context access transparently by walking the xdp_rxq_info and net_device pointers to load the data, from Jesper. 2) Redo the reporting of offload device information to user space such that it works in combination with network namespaces. The latter is reported through a device/inode tuple as similarly done in other subsystems as well (e.g. perf) in order to identify the namespace. For this to work, ns_get_path() has been generalized such that the namespace can be retrieved not only from a specific task (perf case), but also from a callback where we deduce the netns (ns_common) from a netdevice. bpftool support using the new uapi info and extensive test cases for test_offload.py in BPF selftests have been added as well, from Jakub. 3) Add two bpftool improvements: i) properly report the bpftool version such that it corresponds to the version from the kernel source tree. So pick the right linux/version.h from the source tree instead of the installed one. ii) fix bpftool and also bpf_jit_disasm build with binutils >= 2.9. The reason for the build breakage is that binutils library changed the function signature to select the disassembler.
Given this is needed in multiple tools, add a proper feature detection to the tools/build/features infrastructure, from Roman. 4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the stacktrace map. It is currently unimplemented, but there are use cases where user space needs to walk all stacktrace map entries e.g. for dumping or deleting map entries w/o having to close and recreate the map. Add BPF selftests along with it, from Yonghong. 5) Few follow-up cleanups for the bpftool cgroup code: i) rename the cgroup 'list' command into 'show' as we have it for other subcommands as well, ii) then alias the 'show' command such that 'list' is accepted which is also common practice in iproute2, and iii) remove couple of newlines from error messages using p_err(), from Jakub. 6) Two follow-up cleanups to sockmap code: i) remove the unused bpf_compute_data_end_sk_skb() function and ii) only build the sockmap infrastructure when CONFIG_INET is enabled since it's only aware of TCP sockets at this time, from John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 | Linux 4.15-rc7 | Linus Torvalds
2018-01-07 | Merge branch 'parisc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux | Linus Torvalds
Pull parisc fixes from Helge Deller: - Many small fixes to show the real physical addresses of devices instead of hashed addresses. - One important fix to unbreak 32-bit SMP support: We forgot to 16-byte align the spinlocks in the assembler code. - Qemu support: The host will get a chance to sleep when the parisc guest is idle. We use the same mechanism as the power architecture by overlaying the "or %r10,%r10,%r10" instruction which is simply a nop on real hardware. * 'parisc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: qemu idle sleep support parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel parisc: Show unhashed EISA EEPROM address parisc: Show unhashed HPA of Dino chip parisc: Show initial kernel memory layout unhashed parisc: Show unhashed hardware inventory
2018-01-07 | Merge tag 'apparmor-pr-2018-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor | Linus Torvalds
Pull apparmor fix from John Johansen: "This fixes a regression when the kernel feature set is reported as supporting mount and policy is pinned to a feature set that does not support mount mediation" * tag 'apparmor-pr-2018-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: fix regression in mount mediation when feature set is pinned
2018-01-07 | Merge tag 'led_fixes_for_4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds | Linus Torvalds
Pull LED fix from Jacek Anaszewski: "The commit 2b83ff96f51d for 4.15-rc6, which was fixing LED brightness setting after clearing delay_off broke the behavior on any alteration of delay_on{off} properties, due to use of a LED core helper that does too much for this particular case" * tag 'led_fixes_for_4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: leds: core: Fix regression caused by commit 2b83ff96f51d
2018-01-07 | Merge tag 'for-linus-20180107' of git://git.infradead.org/linux-mtd | Linus Torvalds
Pull MTD bugfix from Richard Weinberger: "A single fix for the pxa3xx NAND driver" * tag 'for-linus-20180107' of git://git.infradead.org/linux-mtd: mtd: nand: pxa3xx: Fix READOOB implementation
2018-01-07 | leds: core: Fix regression caused by commit 2b83ff96f51d | Jacek Anaszewski
Commit 2b83ff96f51d ("led: core: Fix brightness setting when setting delay_off=0") replaced del_timer_sync(&led_cdev->blink_timer) with led_stop_software_blink() in led_blink_set(), which additionally clears the LED_BLINK_SW flag and zeroes the blink_delay_on and blink_delay_off properties of struct led_classdev. Zeroing the latter wasn't required to fix the original issue but wasn't considered harmful. It nonetheless turned out to be harmful when a pointer to one or both properties is passed to led_blink_set(), as in ledtrig-timer.c. In such cases zeroes are passed later in the delay_on and/or delay_off arguments to led_blink_setup(), which results either in stopping the software blinking or in always setting the blinking frequency to 1Hz. Avoid using led_stop_software_blink() and add the single call required to clear the LED_BLINK_SW flag, which was the only modification needed to fix the original issue. Fixes: 2b83ff96f51d ("led: core: Fix brightness setting when setting delay_off=0") Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
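The aliasing problem described above can be reproduced in a few lines of user-space C. This is a simplified model, not the LED core: `struct led` and the four functions are hypothetical stand-ins, and the 500 ms fallback is an assumption chosen to match the "1Hz" behaviour mentioned in the commit message.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-in for struct led_classdev. */
struct led {
    unsigned long blink_delay_on;
    unsigned long blink_delay_off;
    int blink_sw;                   /* models the LED_BLINK_SW flag */
};

/* Models led_stop_software_blink(): clears the flag and zeroes both delays. */
static void stop_software_blink(struct led *l)
{
    l->blink_delay_on = 0;
    l->blink_delay_off = 0;
    l->blink_sw = 0;
}

/* Models the setup step: two zero delays fall back to an assumed
 * 500 ms on / 500 ms off default, i.e. 1 Hz. */
static void blink_setup(struct led *l, unsigned long *on, unsigned long *off)
{
    assert(l && on && off);

    if (*on == 0 && *off == 0) {
        *on = 500;
        *off = 500;
    }
    l->blink_delay_on = *on;
    l->blink_delay_off = *off;
    l->blink_sw = 1;
}

/* Regressed variant: if 'on'/'off' point into 'l', the caller's values
 * are wiped before blink_setup() reads them. */
static void blink_set_buggy(struct led *l, unsigned long *on, unsigned long *off)
{
    stop_software_blink(l);
    blink_setup(l, on, off);
}

/* Fixed variant: clear only the flag and leave the delays untouched. */
static void blink_set_fixed(struct led *l, unsigned long *on, unsigned long *off)
{
    l->blink_sw = 0;
    blink_setup(l, on, off);
}
```

Calling the buggy variant with `&led.blink_delay_on` and `&led.blink_delay_off` as arguments loses the requested delays, whereas the fixed variant preserves them.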
2018-01-06 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull vfs fixes from Al Viro: - untangle sys_close() abuses in xt_bpf - deal with register_shrinker() failures in sget() * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'" sget(): handle failures of register_shrinker() mm,vmscan: Make unregister_shrinker() no-op if register_shrinker() failed.
2018-01-06 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds
Pull KVM fixes from Radim Krčmář: "s390: - Two fixes for potential bitmap overruns in the cmma migration code x86: - Clear guest provided GPRs to defeat the Project Zero PoC for CVE 2017-5715" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: vmx: Scrub hardware GPRs at VM-exit KVM: s390: prevent buffer overrun on memory hotplug during migration KVM: s390: fix cmma migration for multiple memory slots
2018-01-06 | Merge branch 'bpf-stacktrace-map-next-key-support' | Daniel Borkmann
Yonghong Song says: ==================== The patch set implements bpf syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map. Patch #1 is the core implementation and Patch #2 implements a bpf test at tools/testing/selftests/bpf directory. Please see individual patch comments for details. Changelog: v1 -> v2: - For invalid key (key pointer is non-NULL), sets next_key to be the first valid key. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-06 | tools/bpf: add a bpf selftest for stacktrace | Yonghong Song
Added a bpf selftest in test_progs at tools directory for stacktrace. The test will populate a hashtable map and a stacktrace map at the same time with the same key, stackid. The user space will compare both maps, using BPF_MAP_LOOKUP_ELEM command and BPF_MAP_GET_NEXT_KEY command, to ensure that both have the same set of keys. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-06 | bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map | Yonghong Song
Currently, bpf syscall command BPF_MAP_GET_NEXT_KEY is not supported for stacktrace map. However, there are use cases where user space wants to enumerate all stacktrace map entries where BPF_MAP_GET_NEXT_KEY command will be really helpful. In addition, if user space wants to delete all map entries in order to save memory and does not want to close the map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve performance if map entries are sparsely populated. The implementation has similar behavior for BPF_MAP_GET_NEXT_KEY implementation in hashtab. If user provides a NULL key pointer or an invalid key, the first key is returned. Otherwise, the first valid key after the input parameter "key" is returned, or -ENOENT if no valid key can be found. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
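The iteration rule stated above can be modelled in plain C. This is a sketch under assumptions, not the kernel implementation: the map is reduced to a hypothetical `occupied[]` array indexed by stack id, and a key is treated as invalid when it is out of range or refers to an empty slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stacktrace map: occupied[i] != 0 means that
 * stack id i currently holds an entry.
 *
 * key == NULL or an invalid key: return the first valid key.
 * otherwise: return the first valid key after *key.
 * Returns 0 on success and -ENOENT when no valid key is found. */
static int stack_map_next_key(const uint8_t *occupied, uint32_t n_buckets,
                              const uint32_t *key, uint32_t *next_key)
{
    uint32_t id;

    assert(occupied != NULL && next_key != NULL);

    if (key == NULL || *key >= n_buckets || !occupied[*key])
        id = 0;             /* restart from the beginning */
    else
        id = *key + 1;      /* continue after the given key */

    while (id < n_buckets && !occupied[id])
        id++;

    if (id >= n_buckets)
        return -ENOENT;

    *next_key = id;
    return 0;
}
```

User space would typically start with a NULL key and feed each returned key back in until -ENOENT is reported, which visits every populated entry without scanning the empty ones itself.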
2018-01-06 | mtd: nand: pxa3xx: Fix READOOB implementation | Boris Brezillon
In the current driver, OOB bytes are accessed in raw mode, and when a page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the driver must read the whole spare area (64 bytes in case of a 2k page, 16 bytes for a 512 page). The driver was only reading the free OOB bytes, which was leaving some unread data in the FIFO and was somehow leading to a timeout. We could patch the driver to read ->spare_size + ->ecc_size instead of just ->spare_size when READOOB is requested, but we'd better make in-band and OOB accesses consistent. Since the driver is always accessing in-band data in non-raw mode (with the ECC engine enabled), we should also access OOB data in this mode. That's particularly useful when using the BCH engine because in this mode the free OOB bytes are also ECC protected. Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support") Cc: stable@vger.kernel.org Reported-by: Sean Nyekjær <sean.nyekjaer@prevas.dk> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Tested-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-01-06 | Merge tag 'powerpc-4.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux | Linus Torvalds
Pull powerpc fix from Michael Ellerman: "Just one fix to correctly return SEGV_ACCERR when we take a SEGV on a mapped region. The bug was introduced in the refactoring of the page fault handler we did in the previous release. Thanks to John Sperbeck" * tag 'powerpc-4.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR
2018-01-06 | Merge tag 'kvm-s390-master-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux | Radim Krčmář
KVM: s390: fixes for cmma migration Two fixes for potential bitmap overruns in the cmma migration code.
2018-01-06 | parisc: qemu idle sleep support | Helge Deller
Add qemu idle sleep support when running under qemu with SeaBIOS PDC firmware. Like the power architecture we use the "or" assembler instructions, which translate to nops on real hardware, to indicate that qemu shall idle sleep. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Richard Henderson <rth@twiddle.net> CC: stable@vger.kernel.org # v4.9+
2018-01-05 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input | Linus Torvalds
Pull input fixes from Dmitry Torokhov: "Just a few driver fixups, nothing exciting" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xen-kbdfront - do not advertise multi-touch pressure support Input: hideep - fix compile error due to missing include file Input: elants_i2c - do not clobber interrupt trigger on x86 Input: joystick/analog - riscv has get_cycles() Input: elantech - add new icbody type 15 Input: ims-pcu - fix typo in the error message
2018-01-05 | Merge tag 'iommu-v4.15-rc7' of git://github.com/awilliam/linux-vfio | Linus Torvalds
Pull IOMMU fixes from Alex Williamson: "Fixes via Will Deacon for arm-smmu-v3. - Fix duplicate Stream ID handling in arm-smmu-v3 - Fix arm-smmu-v3 page table ops double free" * tag 'iommu-v4.15-rc7' of git://github.com/awilliam/linux-vfio: iommu/arm-smmu-v3: Cope with duplicated Stream IDs iommu/arm-smmu-v3: Don't free page table ops twice
2018-01-05 | Merge tag 'arc-4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc | Linus Torvalds
Pull ARC fixes from Vineet Gupta: - platform updates for setting up clock correctly - fixes to accommodate newer gcc (__builtin_trap, removed inline asm modifier) - other fixes * tag 'arc-4.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: handle gcc generated __builtin_trap for older compiler ARC: handle gcc generated __builtin_trap() ARC: uaccess: dont use "l" gcc inline asm constraint modifier ARC: [plat-axs103] refactor the quad core DT quirk code ARC: [plat-axs103]: Set initial core pll output frequency ARC: [plat-hsdk]: Get rid of core pll frequency set in platform code ARC: [plat-hsdk]: Set initial core pll output frequency ARC: [plat-hsdk] Switch DisplayLink driver from fbdev to DRM arc: do not use __print_symbol() ARC: Fix detection of dual-issue enabled
2018-01-05 | Merge branch 'xdp_rxq_info' | Alexei Starovoitov
Jesper Dangaard Brouer says: ==================== V4: * Added reviewers/acks to patches * Fix patch desc in i40e that got out-of-sync with code * Add SPDX license headers for the two new files added in patch 14 V3: * Fixed bug in virtio_net driver * Removed export of xdp_rxq_info_init() V2: * Changed API exposed to drivers - Removed invocation of "init" in drivers, and only call "reg" (Suggested by Saeed) - Allow "reg" to fail and handle this in drivers (Suggested by David Ahern) * Removed the SINKQ qtype, instead allow to register as "unused" * Also fixed some drivers during testing on actual HW (noted in patches) There is a need for XDP to know more about the RX-queue a given XDP frame has arrived on, for both the XDP bpf-prog and the kernel side. Instead of extending struct xdp_buff each time new info is needed, this patchset takes a different approach. Struct xdp_buff is only extended with a pointer to a struct xdp_rxq_info (making it easier to extend later). This xdp_rxq_info contains information related to how the driver has set up the individual RX-queues. This is read-mostly information, and all xdp_buff frames (in the driver's napi_poll) point to the same xdp_rxq_info (per RX-queue). We stress this data/cache-line is for read-mostly info. This is NOT for dynamic per packet info, use the data_meta for such use-cases. This patchset starts out small, and only exposes ingress_ifindex and the RX-queue index to the XDP/BPF program. Access to tangible info like the ingress ifindex and RX queue index is fairly easy to comprehend. Other future use-cases could allow XDP frames to be recycled back to the originating device driver, by providing info on RX device and queue number. As XDP doesn't have driver feature flags, and eBPF code, due to bpf-tail-calls, cannot determine which XDP driver invokes it, this patchset has to update every driver that supports XDP.
For driver developers (review individual driver patches!): The xdp_rxq_info is tied to the driver's RX-ring(s). Whenever an RX-ring modification requires (temporarily) stopping RX frames, the xdp_rxq_info should (likely) also be unregistered and re-registered, especially if reallocating the pages in the ring. Make sure ethtool set_channels does the right thing. When replacing the XDP prog, if and only if the RX-ring needs to be changed, then also re-register the xdp_rxq_info. I'm Cc'ing the individual driver patches to the registered maintainers. Testing: I've only tested the NIC drivers I have hardware for. The general test procedure is to (DUT = Device Under Test): (1) run pktgen script pktgen_sample04_many_flows.sh (against DUT) (2) run samples/bpf program xdp_rxq_info --dev $DEV (on DUT) (3) runtime modify number of NIC queues via ethtool -L (on DUT) (4) runtime modify number of NIC ring-size via ethtool -G (on DUT) Patch based on git tree bpf-next (at commit fb982666e380c1632a): https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | samples/bpf: program demonstrating access to xdp_rxq_info | Jesper Dangaard Brouer
This sample program can be used for monitoring and reporting how many packets per second (pps) are received per NIC RX queue index and which CPU processed the packet. In itself it is a useful tool for quickly identifying RSS imbalance issues, see below. The default XDP action is XDP_PASS in order to provide a monitor mode. For benchmarking purposes it is possible to specify other XDP actions on the command line with --action. The output below shows an imbalanced RSS case where most RXQs deliver to CPU-0 while CPU-2 only gets packets from a single RXQ. At the CPU level the two CPUs are processing approximately the same amount, but at the rx_queue_index level it is clear that RXQ-2 receives much better service than the other RXQs, which all share CPU-0.

    Running XDP on dev:i40e1 (ifindex:3) action:XDP_PASS

    XDP stats        CPU      pps         issue-pps
    XDP-RX CPU       0        900,473     0
    XDP-RX CPU       2        906,921     0
    XDP-RX CPU       total    1,807,395

    RXQ stats        RXQ:CPU  pps         issue-pps
    rx_queue_index   0:0      180,098     0
    rx_queue_index   0:sum    180,098
    rx_queue_index   1:0      180,098     0
    rx_queue_index   1:sum    180,098
    rx_queue_index   2:2      906,921     0
    rx_queue_index   2:sum    906,921
    rx_queue_index   3:0      180,098     0
    rx_queue_index   3:sum    180,098
    rx_queue_index   4:0      180,082     0
    rx_queue_index   4:sum    180,082
    rx_queue_index   5:0      180,093     0
    rx_queue_index   5:sum    180,093

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | bpf: finally expose xdp_rxq_info to XDP bpf-programs | Jesper Dangaard Brouer
All XDP drivers have now been updated to set up xdp_rxq_info and assign it to xdp_buff->rxq. Thus, it is now safe to enable access to some of the xdp_rxq_info struct members. This patch extends xdp_md and exposes UAPI to userspace for ingress_ifindex and rx_queue_index. Access happens via a bpf instruction rewrite that loads the data directly from struct xdp_rxq_info. * ingress_ifindex maps to xdp_rxq_info->dev->ifindex * rx_queue_index maps to xdp_rxq_info->queue_index Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
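The two mappings listed above amount to pointer walks starting at the xdp_buff. The following user-space sketch spells them out; the `*_model` structs and the two loader functions are hypothetical and only keep the members named in the commit message, whereas in the kernel the equivalent loads are emitted by the verifier's context-access rewrite rather than written as C functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models that keep only the members named in the text. */
struct net_device_model {
    int ifindex;
};

struct xdp_rxq_info_model {
    struct net_device_model *dev;
    uint32_t queue_index;
};

struct xdp_buff_model {
    struct xdp_rxq_info_model *rxq;
};

/* ingress_ifindex: xdp_buff -> rxq -> dev -> ifindex */
static uint32_t load_ingress_ifindex(const struct xdp_buff_model *xdp)
{
    assert(xdp && xdp->rxq && xdp->rxq->dev);
    return (uint32_t)xdp->rxq->dev->ifindex;
}

/* rx_queue_index: xdp_buff -> rxq -> queue_index */
static uint32_t load_rx_queue_index(const struct xdp_buff_model *xdp)
{
    assert(xdp && xdp->rxq);
    return xdp->rxq->queue_index;
}
```

Because every frame received on one RX queue points at the same xdp_rxq_info, the per-packet cost is a single pointer assignment, and the extra dereferences are paid only by programs that read these fields.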
2018-01-05 | xdp: generic XDP handling of xdp_rxq_info | Jesper Dangaard Brouer
Hook points for xdp_rxq_info: * reg : netif_alloc_rx_queues * unreg: netif_free_rx_queues The net_device has some members (num_rx_queues + real_num_rx_queues) and a data-area (dev->_rx with struct netdev_rx_queue's) that were primarily used for exporting information about RPS (CONFIG_RPS) queues to sysfs (CONFIG_SYSFS). For generic XDP, extend struct netdev_rx_queue with the xdp_rxq_info, and remove some of the CONFIG_SYSFS ifdefs. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | virtio_net: setup xdp_rxq_info | Jesper Dangaard Brouer
The virtio_net driver doesn't dynamically change the RX-ring queue layout and backing pages, but instead rejects XDP setup if the conditions for XDP are not all met. Thus, the xdp_rxq_info also remains fairly static. This allows us to simply add the reg/unreg to the net_device open/close functions. Driver hook points for xdp_rxq_info: * reg : virtnet_open * unreg: virtnet_close V3: - bugfix, also setup xdp.rxq in receive_mergeable() - Tested bpf-sample prog inside guest on a virtio_net device Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | tun: setup xdp_rxq_info | Jesper Dangaard Brouer
Driver hook points for xdp_rxq_info: * reg : tun_attach * unreg: __tun_detach I've done some manual testing of this tun driver, but I would appreciate a good review and someone else running their use-case tests, as I'm not 100% sure I understand the tfile->detached semantics. V2: Removed the skb_array_cleanup() call from V1 by request from Jason Wang. Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | thunderx: setup xdp_rxq_info | Jesper Dangaard Brouer
This driver uses a bool scheme for "enable"/"disable" when setting up different resources. Thus, the hook points for xdp_rxq_info are placed in the same function call, nicvf_rcv_queue_config(). This is activated through enable/disable via nicvf_config_data_transfer(), which is tied into nicvf_stop()/nicvf_open(). Extend the driver packet handler call-path nicvf_rcv_pkt_handler() with a pointer to the given struct rcv_queue, in order to access the xdp_rxq_info data area (in nicvf_xdp_rx()). V2: Driver has no proper error path for failed XDP RX-queue info reg, as nicvf_rcv_queue_config is a void function. Cc: linux-arm-kernel@lists.infradead.org Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Robert Richter <rric@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | nfp: setup xdp_rxq_info | Jesper Dangaard Brouer
Driver hook points for xdp_rxq_info: * reg : nfp_net_rx_ring_alloc * unreg: nfp_net_rx_ring_free In struct nfp_net_rx_ring, moved member @size into a hole on 64-bit. Thus, the size remains the same after adding member @xdp_rxq. Cc: oss-drivers@netronome.com Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Simon Horman <simon.horman@netronome.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | bnxt_en: setup xdp_rxq_info | Jesper Dangaard Brouer
Driver hook points for xdp_rxq_info: * reg : bnxt_alloc_rx_rings * unreg: bnxt_free_rx_rings This driver should be updated to re-register when changing allocation mode of RX rings. Tested on actual hardware. Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | mlx4: setup xdp_rxq_info | Jesper Dangaard Brouer
Driver hook points for xdp_rxq_info: * reg : mlx4_en_create_rx_ring * unreg: mlx4_en_destroy_rx_ring Tested on actual hardware. Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg | Jesper Dangaard Brouer
The driver code in qede_free_fp_array() depends on kfree() being callable with a NULL pointer. This stems from the qede_alloc_fp_array() function, which (kz)allocs memory for either fp->txq or fp->rxq. This also simplifies error handling code in case of memory allocation failures, but xdp_rxq_info_unreg() needs to know the difference. Introduce xdp_rxq_info_is_reg() to handle the case where a memory allocation fails, and detect the failure path by seeing that xdp_rxq_info was not registered yet, which first happens after successful allocation in qede_init_fp(). Driver hook points for xdp_rxq_info: * reg : qede_init_fp * unreg: qede_free_fp_array Tested on actual hardware with samples/bpf program. V2: Driver has no proper error path for failed XDP RX-queue info reg, as qede_init_fp() is a void function. Cc: everest-linux-l2@cavium.com Cc: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
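The teardown guard described above can be sketched in user-space C. Everything here is a hypothetical model: the `*_model` structs, the state enum, the `unreg_calls` counter and the function names only imitate the reg / is_reg / unreg pattern and are not the kernel API or the qede driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical registration states; zeroed memory means "not registered". */
enum rxq_state { RXQ_NEW = 0, RXQ_REGISTERED, RXQ_UNREGISTERED };

struct rxq_info_model {
    enum rxq_state state;
    uint32_t queue_index;
};

/* Hypothetical fast-path object; rxq_info is NULL if allocation failed. */
struct fastpath_model {
    struct rxq_info_model *rxq_info;
};

static int unreg_calls; /* counts unregistrations, for illustration only */

static int rxq_info_reg(struct rxq_info_model *r, uint32_t queue_index)
{
    r->queue_index = queue_index;
    r->state = RXQ_REGISTERED;
    return 0;
}

static bool rxq_info_is_reg(const struct rxq_info_model *r)
{
    return r->state == RXQ_REGISTERED;
}

/* Unregistering something that was never registered is treated as a bug. */
static void rxq_info_unreg(struct rxq_info_model *r)
{
    assert(rxq_info_is_reg(r));
    r->state = RXQ_UNREGISTERED;
    unreg_calls++;
}

/* Shared teardown for the normal path and the allocation-failure path. */
static void free_fastpath(struct fastpath_model *fp)
{
    if (fp->rxq_info && rxq_info_is_reg(fp->rxq_info))
        rxq_info_unreg(fp->rxq_info);
    free(fp->rxq_info); /* free(NULL) is a no-op, like kfree(NULL) */
    fp->rxq_info = NULL;
}
```

With the is_reg check, one free routine serves three cases: the allocation failed, the allocation succeeded but initialization never ran, and the queue was fully registered.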
2018-01-05 | ixgbe: setup xdp_rxq_info | Jesper Dangaard Brouer
Driver hook points for xdp_rxq_info: * reg : ixgbe_setup_rx_resources() * unreg: ixgbe_free_rx_resources() Tested on actual hardware. V2: Fix ixgbe_set_ringparam, clear xdp_rxq_info in temp_ring Cc: intel-wired-lan@lists.osuosl.org Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | i40e: setup xdp_rxq_info | Jesper Dangaard Brouer
The i40e driver has a special "FDIR" RX-ring (I40E_VSI_FDIR) which is a sideband channel for configuring/updating the flow director tables. This (i40e_vsi_)type does not invoke XDP-ebpf code. As suggested by Björn (V2): Instead of marking this I40E_VSI_FDIR RX-ring a special case, reverse the logic and only select RX-rings of type I40E_VSI_MAIN to register xdp_rxq_info's for. Driver hook points for xdp_rxq_info: * reg : i40e_setup_rx_descriptors (via i40e_vsi_setup_rx_resources) * unreg: i40e_free_rx_resources (via i40e_vsi_free_rx_resources) Tested on actual hardware with samples/bpf program. V2: Fixed bug in i40e_set_ringparam (memset zero) + match on I40E_VSI_MAIN. V4: Update patch desc that got out-of-sync with code. Cc: intel-wired-lan@lists.osuosl.org Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 | xdp/mlx5: setup xdp_rxq_info | Jesper Dangaard Brouer
The mlx5 driver has a special drop-RQ queue (one per interface) that simply drops all incoming traffic. It helps the driver keep other HW objects (flow steering) alive across down/up operations. It is temporarily pointed to by flow steering objects during interface setup, and when the interface is down. It lacks many fields that are set in a regular RQ (for example its state is never switched to MLX5_RQC_STATE_RDY). (Thanks to Tariq Toukan for the explanation.) The XDP RX-queue info for this drop-RQ is marked as unused, which allows us to use the same takedown/free code path as for other RX-queues. Driver hook points for xdp_rxq_info: * reg : mlx5e_alloc_rq() * unused: mlx5e_alloc_drop_rq() * unreg : mlx5e_free_rq() Tested on actual hardware with samples/bpf program Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Matan Barak <matanb@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05xdp: base API for new XDP rx-queue info conceptJesper Dangaard Brouer
This patch only introduces the core data structures and API functions. All XDP-enabled drivers must use the API before this info can be used. XDP needs to know more about the RX-queue a given XDP frame arrived on, both in the XDP bpf-prog and on the kernel side. Instead of extending xdp_buff each time new info is needed, the patch creates a separate read-mostly struct xdp_rxq_info that contains this info. We stress that this data/cache-line is for read-only info. It is NOT for dynamic per-packet info; use data_meta for such use-cases. The performance advantage is that this info can be set up at RX-ring init time, instead of updating N members in xdp_buff. A possible (driver-level) micro-optimization is to do the xdp_buff->rxq assignment once per XDP/NAPI loop. The extra pointer deref only happens for programs needing access to this info (thus, no slowdown for existing use-cases). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
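The design described in this message (per-queue info filled in once, and a per-packet buffer that only carries a pointer to it) can be illustrated with a user-space sketch. The `model_*` types and functions below are hypothetical and exist only for this example; the field names are assumptions, and the code does not reproduce the kernel's struct xdp_rxq_info or xdp_buff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the read-mostly per-queue info. */
struct model_rxq_info {
    uint32_t ifindex;     /* device the RX-queue belongs to (assumed field) */
    uint32_t queue_index; /* index of the RX-queue on that device */
};

/* Hypothetical stand-in for the per-packet buffer descriptor. */
struct model_xdp_buff {
    const uint8_t *data;
    const uint8_t *data_end;
    const struct model_rxq_info *rxq; /* shared pointer, not copied per packet */
};

/* Filled in once, at RX-ring init time. */
static void model_rxq_info_init(struct model_rxq_info *rxq,
                                uint32_t ifindex, uint32_t queue_index)
{
    rxq->ifindex = ifindex;
    rxq->queue_index = queue_index;
}

/* A program that wants the queue index pays one extra pointer dereference. */
static uint32_t model_prog_rx_queue_index(const struct model_xdp_buff *xdp)
{
    assert(xdp->rxq != NULL);
    return xdp->rxq->queue_index;
}

/*
 * Poll loop over n packets. The rxq pointer is assigned once per loop;
 * only data/data_end change per packet. Returns how many packets the
 * "program" saw on want_queue.
 */
static size_t model_poll(const struct model_rxq_info *rxq,
                         const uint8_t *pkts[], const size_t lens[],
                         size_t n, uint32_t want_queue)
{
    struct model_xdp_buff xdp;
    size_t hits = 0;

    xdp.rxq = rxq; /* once per poll loop, not once per packet */
    for (size_t i = 0; i < n; i++) {
        xdp.data = pkts[i];
        xdp.data_end = pkts[i] + lens[i];
        if (model_prog_rx_queue_index(&xdp) == want_queue)
            hits++;
    }
    return hits;
}
```

Adding another per-queue field in this model means touching only `model_rxq_info` and its init function; the per-packet path stays at one pointer assignment per loop, which is the trade-off the message argues for.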
2018-01-05apparmor: fix regression in mount mediation when feature set is pinnedJohn Johansen
When the mount code was refactored for Labels it was not correctly updated to check whether policy supported mediation of the mount class. This causes a regression when the kernel feature set is reported as supporting mount and policy is pinned to a feature set that does not support mount mediation. BugLink: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882697#41 Fixes: 2ea3ffb7782a ("apparmor: add mount mediation") Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Cc: Stable <stable@vger.kernel.org> Signed-off-by: John Johansen <john.johansen@canonical.com>
2018-01-05Merge tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds
Pull btrfs fixes from David Sterba: "We have two more fixes for 4.15, both aimed for stable. The leak fix is obvious, the second patch fixes a bug revealed by the refcount API, when it behaves differently than previous atomic_t and reports refs going from 0 to 1 in one case" * tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes btrfs: Fix flush bio leak
2018-01-05Merge tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull XFS fixes from Darrick Wong: "I have just a few fixes for bugs and resource cleanup problems this week: - Fix resource cleanup of failed quota initialization - Fix integer overflow problems wrt s_maxbytes" * tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix s_maxbytes overflow problems xfs: quota: check result of register_shrinker() xfs: quota: fix missed destroy of qi_tree_lock
2018-01-05Merge tag 'mfd-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds
Pull MFD fix from Lee Jones: "Late bugfix to plug a leak in rtsx_pcr" * tag 'mfd-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: rtsx: Release IRQ during shutdown
2018-01-05Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull more x86 pti fixes from Thomas Gleixner: "Another small stash of fixes for fallout from the PTI work: - Fix the modules vs. KASAN breakage which was caused by making MODULES_END depend on the fixmap size. That was done when the cpu entry area moved into the fixmap, but now that we have a separate map space for that this is causing more issues than it solves. - Use the proper cache flush methods for the debugstore buffers as they are mapped/unmapped during runtime and not statically mapped at boot time like the rest of the cpu entry area. - Make the map layout of the cpu_entry_area consistent for 4 and 5 level paging and fix the KASLR vaddr_end wreckage. - Use PER_CPU_EXPORT for per cpu variable and while at it unbreak nvidia gfx drivers by dropping the GPL export. The subject line of the commit tells it the other way around, but I noticed that too late. - Fix the ASM alternative macros so they can be used in the middle of an inline asm block. - Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack vector is properly identified. The Spectre mitigations will come with their own bug bits later" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm x86/tlb: Drop the _GPL from the cpu_tlbstate export x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers x86/kaslr: Fix the vaddr_end mess x86/mm: Map cpu_entry_area at the same place on 4/5 level x86/mm: Set MODULES_END to 0xffffffffff000000
2018-01-05Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull EFI updates from Thomas Gleixner: - A fix for an add_efi_memmap parameter regression which ensures that the parameter is parsed before it is used. - Reinstate the virtual capsule mapping as the cached copy turned out to break Quark and other things. - Remove Matt Fleming as EFI co-maintainer. He stepped back a few days ago. Thanks Matt for all your great work! * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove Matt Fleming as EFI co-maintainer efi/capsule-loader: Reinstate virtual capsule mapping x86/efi: Fix kernel param add_efi_memmap regression
2018-01-05Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds
Pull s390 fixes from Martin Schwidefsky: "Four bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/dasd: fix wrongly assigned configuration data s390: fix preemption race in disable_sacf_uaccess s390/sclp: disable FORTIFY_SOURCE for early sclp code s390/pci: handle insufficient resources during dma tlb flush