| Age | Commit message (Collapse) | Author |
|
Add support to count upall packets, when kmod of openvswitch
upcall to count the number of packets for upcall succeed and
failed, which is a better way to see how many packets upcalled
on every interfaces.
Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rhashtable currently only does bh-safe synchronization making it impossible
to use from irq-safe contexts. Switch it to use irq-safe synchronization to
remove the restriction.
v2: Update the lock functions to return the ulong flags value and unlock
functions to take the value directly instead of passing around the
pointer. Suggested by Linus.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On kernels using retpoline as a spectrev2 mitigation,
optimize actions and filters that are compiled as built-ins into a direct call.
On subsequent patches we expose the classifiers and actions functions
and wire up the wrapper into tc.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The type definition should be visible even in configurations not using
CONFIG_NET_CLS_ACT.
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
write_seq sockets instead of snd_una.
This should have been the behavior from the start. Because processes
may now exist that rely on the established behavior, do not change
behavior of the existing option, but add the right behavior with a new
flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on
stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.
Intuitively the contract is that the counter is zero after the
setsockopt, so that the next write N results in a notification for
the last byte N - 1.
On idle sockets snd_una == write_seq and this holds for both. But on
sockets with data in transmission, snd_una records the unacked offset
in the stream. This depends on the ACK response from the peer. A
process cannot learn this in a race free manner (ioctl SIOCOUTQ is one
racy approach).
write_seq records the offset at the last byte written by the process.
This is a better starting point. It matches the intuitive contract in
all circumstances, unaffected by external behavior.
The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
Move the field in struct sock to avoid growing the socket (for some
common CONFIG variants). The UAPI interface so_timestamping.flags is
already int, so 32 bits wide.
Reported-by: Sotirios Delimanolis <sotodel@meta.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20221207143701.29861-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, can and netfilter.
Current release - new code bugs:
- bonding: ipv6: correct address used in Neighbour Advertisement
parsing (src vs dst typo)
- fec: properly scope IRQ coalesce setup during link up to supported
chips only
Previous releases - regressions:
- Bluetooth fixes for fake CSR clones (knockoffs):
- re-add ERR_DATA_REPORTING quirk
- fix crash when device is replugged
- Bluetooth:
- silence a user-triggerable dmesg error message
- L2CAP: fix u8 overflow, oob access
- correct vendor codec definition
- fix support for Read Local Supported Codecs V2
- ti: am65-cpsw: fix RGMII configuration at SPEED_10
- mana: fix race on per-CQ variable NAPI work_done
Previous releases - always broken:
- af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
avoid null-deref
- af_can: fix NULL pointer dereference in can_rcv_filter
- can: slcan: fix UAF with a freed work
- can: can327: flush TX_work on ldisc .close()
- macsec: add missing attribute validation for offload
- ipv6: avoid use-after-free in ip6_fragment()
- nft_set_pipapo: actually validate intervals in fields after the
first one
- mvneta: prevent oob access in mvneta_config_rss()
- ipv4: fix incorrect route flushing when table ID 0 is used, or when
source address is deleted
- phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"
* tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
s390/qeth: fix use-after-free in hsci
macsec: add missing attribute validation for offload
net: mvneta: Fix an out of bounds check
net: thunderbolt: fix memory leak in tbnet_open()
ipv6: avoid use-after-free in ip6_fragment()
net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
net: phy: mxl-gpy: add MDINT workaround
net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
xen/netback: don't call kfree_skb() under spin_lock_irqsave()
dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
ethernet: aeroflex: fix potential skb leak in greth_init_rings()
tipc: call tipc_lxc_xmit without holding node_read_lock
can: esd_usb: Allow REC and TEC to return to zero
can: can327: flush TX_work on ldisc .close()
can: slcan: fix freed work crash
can: af_can: fix NULL pointer dereference in can_rcv_filter
net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
ipv4: Fix incorrect route flushing when table ID 0 is used
ipv4: Fix incorrect route flushing when source address is deleted
...
|
|
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call. As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file. Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.
Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses. The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently
dropped the file type check with it allowing any file to slip through.
With the invarients broken, the d_name and parent accesses can now race
against renames and removals of arbitrary files and cause
use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now
that cgroupfs is implemented through kernfs, checking the file
operations needs to go through a layer of indirection. Instead, let's
check the superblock and dentry type.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft")
Cc: stable@kernel.org # v3.14+
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Expose port function commands to enable / disable migratable
capability, this is used to set the port function as migratable.
Live migration is the process of transferring a live virtual machine
from one physical host to another without disrupting its normal
operation.
In order for a VM to be able to perform LM, all the VM components must
be able to perform migration. e.g.: to be migratable.
In order for VF to be migratable, VF must be bound to VFIO driver with
migration support.
When migratable capability is enabled for a function of the port, the
device is making the necessary preparations for the function to be
migratable, which might include disabling features which cannot be
migrated.
Example of LM with migratable function configuration:
Set migratable of the VF's port function.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 migratable disable
$ devlink port function set pci/0000:06:00.0/2 migratable enable
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 migratable enable
Bind VF to VFIO driver with migration support:
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
$ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind
Attach VF to the VM.
Start the VM.
Perform LM.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Downstream patch requires to get other function GENERAL2 caps while
mlx5_vport_get_other_func_cap() gets only one type of caps (general).
Rename it to represent this and introduce a generic implementation
of mlx5_vport_get_other_func_cap().
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Expose port function commands to enable / disable RoCE, this is used to
control the port RoCE device capabilities.
When RoCE is disabled for a function of the port, function cannot create
any RoCE specific resources (e.g GID table).
It also saves system memory utilization. For example disabling RoCE enable a
VF/SF saves 1 Mbytes of system memory per function.
Example of a PCI VF port which supports function configuration:
Set RoCE of the VF's port function.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 roce enable
$ devlink port function set pci/0000:06:00.0/2 roce disable
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 roce disable
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce IFC related capabilities to enable setting VF to be able to
perform live migration. e.g.: to be migratable.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:
====================
ieee802154-next 2022-12-05
Miquel continued his work towards full scanning support. For this,
we now allow the creation of dedicated coordinator interfaces
to allow a PAN coordinator to serve in the network and set
the needed address filters with the hardware.
On top of this we have the first part to allow scanning for available
15.4 networks. A new netlink scan group, within the existing nl802154
API, was added.
In addition Miquel fixed two issues that have been introduced in the former
patches to free an skb correctly and clarifying an expression in the stack.
From David Girault we got tracing support when registering new PANs.
* tag 'ieee802154-for-net-next-2022-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
mac802154: Trace the registration of new PANs
ieee802154: Advertize coordinators discovery
mac802154: Allow the creation of coordinator interfaces
mac802154: Clarify an expression
mac802154: Move an skb free within the rx path
====================
Link: https://lore.kernel.org/r/20221205131909.1871790-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a cookie expires from the LRU and the LRU_DISCARD flag is set, but
the state machine has not run yet, it's possible another thread can call
fscache_use_cookie and begin to use it.
When the cookie_worker finally runs, it will see the LRU_DISCARD flag
set, transition the cookie->state to LRU_DISCARDING, which will then
withdraw the cookie. Once the cookie is withdrawn the object is removed
the below oops will occur because the object associated with the cookie
is now NULL.
Fix the oops by clearing the LRU_DISCARD bit if another thread uses the
cookie before the cookie_worker runs.
BUG: kernel NULL pointer dereference, address: 0000000000000008
...
CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
...
Call Trace:
netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
process_one_work+0x217/0x3e0
worker_thread+0x4a/0x3b0
kthread+0xd6/0x100
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Reported-by: Daire Byrne <daire.byrne@gmail.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Link: https://lore.kernel.org/r/20221117115023.1350181-1-dwysocha@redhat.com/ # v1
Link: https://lore.kernel.org/r/20221117142915.1366990-1-dwysocha@redhat.com/ # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch adds reset parameter to mtk_wed_rx_ring_setup signature
in order to align rx_ring_setup callback to tx_ring_setup one introduced
in 'commit 23dca7a90017 ("net: ethernet: mtk_wed: add reset to
tx_ring_setup callback")'
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/29c6e7a5469e784406cf3e2920351d1207713d05.1670239984.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add netlink based support for "ethtool -x <dev> [context x]"
command by implementing ETHTOOL_MSG_RSS_GET netlink message.
This is equivalent to functionality provided via ETHTOOL_GRSSH
in ioctl path. It sends RSS table, hash key and hash function
of an interface to user space.
This patch implements existing functionality available
in ioctl path and enables addition of new RSS context
based parameters in future.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Link: https://lore.kernel.org/r/20221202002555.241580-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Increasing SACK size and moving away from softirq, parts 2 & 3
Here are the second and third parts of patches in the process of moving
rxrpc from doing a lot of its stuff in softirq context to doing it in an
I/O thread in process context and thereby making it easier to support a
larger SACK table.
The full description is in the description for the first part[1] which is
already in net-next.
The second part includes some cleanups, adds some testing and overhauls
some tracing:
(1) Remove declaration of rxrpc_kernel_call_is_complete() as the
definition is no longer present.
(2) Remove the knet() and kproto() macros in favour of using tracepoints.
(3) Remove handling of duplicate packets from recvmsg. The input side
isn't now going to insert overlapping/duplicate packets into the
recvmsg queue.
(4) Don't use the rxrpc_conn_parameters struct in the rxrpc_connection or
rxrpc_bundle structs - rather put the members in directly.
(5) Extract the abort code from a received abort packet right up front
rather than doing it in multiple places later.
(6) Use enums and symbol lists rather than __builtin_return_address() to
indicate where a tracepoint was triggered for local, peer, conn, call
and skbuff tracing.
(7) Add a refcount tracepoint for the rxrpc_bundle struct.
(8) Implement an in-kernel server for the AFS rxperf testing program to
talk to (enabled by a Kconfig option).
This is tagged as rxrpc-next-20221201-a.
The third part introduces the I/O thread and switches various bits over to
running there:
(1) Fix call timers and call and connection workqueues to not hold refs on
the rxrpc_call and rxrpc_connection structs to thereby avoid messy
cleanup when the last ref is put in softirq mode.
(2) Split input.c so that the call packet processing bits are separate
from the received packet distribution bits. Call packet processing
gets bumped over to the call event handler.
(3) Create a per-local endpoint I/O thread. Barring some tiny bits that
still get done in softirq context, all packet reception, processing
and transmission is done in this thread. That will allow a load of
locking to be removed.
(4) Perform packet processing and error processing from the I/O thread.
(5) Provide a mechanism to process call event notifications in the I/O
thread rather than queuing a work item for that call.
(6) Move data and ACK transmission into the I/O thread. ACKs can then be
transmitted at the point they're generated rather than getting
delegated from softirq context to some process context somewhere.
(7) Move call and local processor event handling into the I/O thread.
(8) Move cwnd degradation to after packets have been transmitted so that
they don't shorten the window too quickly.
A bunch of simplifications can then be done:
(1) The input_lock is no longer necessary as exclusion is achieved by
running the code in the I/O thread only.
(2) Don't need to use sk->sk_receive_queue.lock to guard socket state
changes as the socket mutex should suffice.
(3) Don't take spinlocks in RCU callback functions as they get run in
softirq context and thus need _bh annotations.
(4) RCU is then no longer needed for the peer's error_targets list.
(5) Simplify the skbuff handling in the receive path by dropping the ref
in the basic I/O thread loop and getting an extra ref as and when we
need to queue the packet for recvmsg or another context.
(6) Get the peer address earlier in the input process and pass it to the
users so that we only do it once.
This is tagged as rxrpc-next-20221201-b.
Changes:
========
ver #2)
- Added a patch to change four assertions into warnings in rxrpc_read()
and fixed a checker warning from a __user annotation that should have
been removed..
- Change a min() to min_t() in rxperf as PAGE_SIZE doesn't seem to match
type size_t on i386.
- Three error handling issues in rxrpc_new_incoming_call():
- If not DATA or not seq #1, should drop the packet, not abort.
- Fix a goto that went to the wrong place, dropping a non-held lock.
- Fix an rcu_read_lock that should've been an unlock.
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: kafs-testing+fedora36_64checkkafs-build-144@auristor.com
Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/166982725699.621383.2358362793992993374.stgit@warthog.procyon.org.uk/ # v1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Tegra MGBE ethernet controller requires that the SERDES link is
powered-up after the PHY link is up, otherwise the link fails to
become ready following a resume from suspend. Add a variable to indicate
that the SERDES link must be powered-up after the PHY link.
Signed-off-by: Revanth Kumar Uppala <ruppala@nvidia.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper for drivers wanting to set SW IRQ coalescing
by default. The related sysfs attributes can be used to
override the default values.
Follow Jakub's suggestion and put this functionality into
net core so that drivers wanting to use software interrupt
coalescing per default don't have to open-code it.
Note that this function needs to be called before the
netdevice is registered.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.2
Third set of patches for v6.2. mt76 has a new driver for mt7996 Wi-Fi 7
devices and iwlwifi also got initial Wi-Fi 7 support. Otherwise
smaller features and fixes.
Major changes:
ath10k
- store WLAN firmware version in SMEM image table
mt76
- mt7996: new driver for MediaTek Wi-Fi 7 (802.11be) devices
- mt7986, mt7915: enable Wireless Ethernet Dispatch (WED) offload support
- mt7915: add ack signal support
- mt7915: enable coredump support
- mt7921: remain_on_channel support
- mt7921: channel context support
iwlwifi
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
* tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (144 commits)
wifi: ath10k: fix QCOM_SMEM dependency
wifi: mt76: mt7921e: add pci .shutdown() support
wifi: mt76: mt7915: mmio: fix naming convention
wifi: mt76: mt7996: add support to configure spatial reuse parameter set
wifi: mt76: mt7996: enable ack signal support
wifi: mt76: mt7996: enable use_cts_prot support
wifi: mt76: mt7915: rely on band_idx of mt76_phy
wifi: mt76: mt7915: enable per bandwidth power limit support
wifi: mt76: mt7915: introduce mt7915_get_power_bound()
mt76: mt7915: Fix PCI device refcount leak in mt7915_pci_init_hif2()
wifi: mt76: do not send firmware FW_FEATURE_NON_DL region
wifi: mt76: mt7921: Add missing __packed annotation of struct mt7921_clc
wifi: mt76: fix coverity overrun-call in mt76_get_txpower()
wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices
wifi: mt76: mt76x0: remove dead code in mt76x0_phy_get_target_power
wifi: mt76: mt7915: fix band_idx usage
wifi: mt76: mt7915: enable .sta_set_txpwr support
wifi: mt76: mt7915: add basedband Txpower info into debugfs
wifi: mt76: mt7915: add support to configure spatial reuse parameter set
wifi: mt76: mt7915: add missing MODULE_PARM_DESC
...
====================
Link: https://lore.kernel.org/r/20221202214254.D0D3DC433C1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix ambiguous TRIM and DISCARD args
- Fix removal of debugfs file for mmc_test
MMC host:
- mtk-sd: Add missing clk_disable_unprepare() in an error path
- sdhci: Fix I/O voltage switch delay for UHS-I SD cards
- sdhci-esdhc-imx: Fix CQHCI exit halt state check
- sdhci-sprd: Fix voltage switch"
* tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-sprd: Fix no reset data and command after voltage switch
mmc: sdhci: Fix voltage switch delay
mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse()
mmc: mmc_test: Fix removal of debugfs file
mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check
mmc: core: Fix ambiguous TRIM and DISCARD arg
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"15 hotfixes, 11 marked cc:stable.
Only three or four of the latter address post-6.0 issues, which is
hopefully a sign that things are converging"
* tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
mm/khugepaged: fix GUP-fast interaction by sending IPI
mm/khugepaged: take the right locks for page table retraction
mm: migrate: fix THP's mapcount on isolation
mm: introduce arch_has_hw_nonleaf_pmd_young()
mm: add dummy pmd_young() for architectures not having it
mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
madvise: use zap_page_range_single for madvise dontneed
mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
|
|
As per the specfication vendor codec id is defined.
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2127
Fixes: 9ae664028a9e ("Bluetooth: Add support for Read Local Supported Codecs V2")
Signed-off-by: Chethan T N <chethan.tumkur.narayan@intel.com>
Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
A patch series by a Qualcomm engineer essentially removed my
quirk/workaround because they thought it was unnecessary.
It wasn't, and it broke everything again:
https://patchwork.kernel.org/project/netdevbpf/list/?series=661703&archive=both&state=*
He argues that the quirk is not necessary because the code should check
if the dongle says if it's supported or not. The problem is that for
these Chinese CSR clones they say that it would work:
= New Index: 00:00:00:00:00:00 (Primary,USB,hci0)
= Open Index: 00:00:00:00:00:00
< HCI Command: Read Local Version Information (0x04|0x0001) plen 0
> HCI Event: Command Complete (0x0e) plen 12
> [hci0] 11.276039
Read Local Version Information (0x04|0x0001) ncmd 1
Status: Success (0x00)
HCI version: Bluetooth 5.0 (0x09) - Revision 2064 (0x0810)
LMP version: Bluetooth 5.0 (0x09) - Subversion 8978 (0x2312)
Manufacturer: Cambridge Silicon Radio (10)
...
< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0
> HCI Event: Command Complete (0x0e) plen 68
> [hci0] 11.668030
Read Local Supported Commands (0x04|0x0002) ncmd 1
Status: Success (0x00)
Commands: 163 entries
...
Read Default Erroneous Data Reporting (Octet 18 - Bit 2)
Write Default Erroneous Data Reporting (Octet 18 - Bit 3)
...
...
< HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0
= Close Index: 00:1A:7D:DA:71:XX
So bring it back wholesale.
Fixes: 63b1a7dd38bf ("Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING")
Fixes: e168f6900877 ("Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR")
Fixes: 766ae2422b43 ("Bluetooth: hci_sync: Check LMP feature bit instead of quirk")
Cc: stable@vger.kernel.org
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Tested-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
After commit 9ed7bfc79542 ("sctp: fix memory leak in
sctp_stream_outq_migrate()"), sctp_sched_set_sched() is the only
place calling sched->free(), and it can actually be replaced by
sched->free_sid() on each stream, and yet there's already a loop
to traverse all streams in sctp_sched_set_sched().
This patch adds a function sctp_sched_free_sched() where it calls
sched->free_sid() for each stream to replace sched->free() calls
in sctp_sched_set_sched() and then deletes the unused free member
from struct sctp_sched_ops.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/e10aac150aca2686cb0bd0570299ec716da5a5c0.1669849471.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds two new MPTCP netlink event types for PM listening
socket create and close, named MPTCP_EVENT_LISTENER_CREATED and
MPTCP_EVENT_LISTENER_CLOSED.
Add a new function mptcp_event_pm_listener() to push the new events
with family, port and addr to userspace.
Invoke mptcp_event_pm_listener() with MPTCP_EVENT_LISTENER_CREATED in
mptcp_listen() and mptcp_pm_nl_create_listen_socket(), invoke it with
MPTCP_EVENT_LISTENER_CLOSED in __mptcp_close_ssk().
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To do that, separate two scenarios:
- where it's the first MD5 key on the system, which means that enabling
of the static key may need to sleep;
- copying of an existing key from a listening socket to the request
socket upon receiving a signed TCP segment, where static key was
already enabled (when the key was added to the listening socket).
Now the life-time of the static branch for TCP-MD5 is until:
- last tcp_md5sig_info is destroyed
- last socket in time-wait state with MD5 key is closed.
Which means that after all sockets with TCP-MD5 keys are gone, the
system gets back the performance of disabled md5-key static branch.
While at here, provide static_key_fast_inc() helper that does ref
counter increment in atomic fashion (without grabbing cpus_read_lock()
on CONFIG_JUMP_LABEL=y). This is needed to add a new user for
a static_key when the caller controls the lifetime of another user.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
1. With CONFIG_JUMP_LABEL=n static_key_slow_inc() doesn't have any
protection against key->enabled refcounter overflow.
2. With CONFIG_JUMP_LABEL=y static_key_slow_inc_cpuslocked()
still may turn the refcounter negative as (v + 1) may overflow.
key->enabled is indeed a ref-counter as it's documented in multiple
places: top comment in jump_label.h, Documentation/staging/static-keys.rst,
etc.
As -1 is reserved for static key that's in process of being enabled,
functions would break with negative key->enabled refcount:
- for CONFIG_JUMP_LABEL=n negative return of static_key_count()
breaks static_key_false(), static_key_true()
- the ref counter may become 0 from negative side by too many
static_key_slow_inc() calls and lead to use-after-free issues.
These flaws result in that some users have to introduce an additional
mutex and prevent the reference counter from overflowing themselves,
see bpf_enable_runtime_stats() checking the counter against INT_MAX / 2.
Prevent the reference counter overflow by checking if (v + 1) > 0.
Change functions API to return whether the increment was successful.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This can be used to selectively disable feature flags for checksum offload,
scatter/gather or GSO by changing vif->netdev_features.
Removing features from vif->netdev_features does not affect the netdev
features themselves, but instead fixes up skbs in the tx path so that the
offloads are not needed in the driver.
Aside from making it easier to deal with vif type based hardware limitations,
this also makes it possible to optimize performance on hardware without native
GSO support by declaring GSO support in hw->netdev_features and removing it
from vif->netdev_features. This allows mac80211 to handle GSO segmentation
after the sta lookup, but before itxq enqueue, thus reducing the number of
unnecessary sta lookups, as well as some other per-packet processing.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20221010094338.78070-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
For ACKs generated inside the I/O thread, transmit the ACK at the point of
generation. Where the ACK is generated outside of the I/O thread, it's
offloaded to the I/O thread to transmit it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Add a tracepoint to log when a cwnd reset occurs due to lack of
transmission on a call.
Add stat counters to count transmission underflows (ie. when we have tx
window space, but sendmsg doesn't manage to keep up), cwnd resets and
transmission failures.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Move the functions from the call->processor and local->processor work items
into the domain of the I/O thread.
The call event processor, now called from the I/O thread, then takes over
the job of cranking the call state machine, processing incoming packets and
transmitting DATA, ACK and ABORT packets. In a future patch,
rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
for later transmission.
The call event processor becomes purely received-skb driven. It only
transmits things in response to events. We use "pokes" to queue a dummy
skb to make it do things like start/resume transmitting data. Timer expiry
also results in pokes.
The connection event processor, becomes similar, though crypto events, such
as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item
to avoid doing crypto in the I/O thread.
The local event processor is removed and VERSION response packets are
generated directly from the packet parser. Similarly, ABORTs generated in
response to protocol errors will be transmitted immediately rather than
being pushed onto a queue for later transmission.
Changes:
========
ver #2)
- Fix a couple of introduced lock context imbalances.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
A received skbuff needs a ref when it gets put on a call data queue or conn
packet queue, and rxrpc_input_packet() and co. jump through a lot of hoops
to avoid double-dropping the skbuff ref so that we can avoid getting a ref
when we queue the packet.
Change this so that the skbuff ref is unconditionally dropped by the caller
of rxrpc_input_packet(). An additional ref is then taken on the packet if
it is pushed onto a queue.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Move DATA transmission into the call processor work item. In a future
patch, this will be called from the I/O thread rather than being itsown
work item.
This will allow DATA transmission to be driven directly by incoming ACKs,
pokes and timers as those are processed.
The Tx queue is also split: The queue of packets prepared by sendmsg is now
places in call->tx_sendmsg and the packet dispatcher decants the packets
into call->tx_buffer as space becomes available in the transmission
window. This allows sendmsg to run ahead of the available space to try and
prevent an underflow in transmission.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Copy client call parameters into rxrpc_call earlier so that that can be
used to convey them to the connection code - which can then be offloaded to
the I/O thread.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Provide a means by which an event notification can be sent to a call such
that the I/O thread can process it rather than it being done in a separate
workqueue. This will allow a lot of locking to be removed.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Currently, rxrpc gives the connection's work item a ref on the connection
when it queues it - and this is called from the timer expiration function.
The problem comes when queue_work() fails (ie. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.
This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
(1) Don't give a ref to the work item.
(2) Simplify handling of service connections by adding a separate active
count so that the refcount isn't also used for this.
(3) Connection destruction for both client and service connections can
then be cleaned up by putting rxrpc_put_connection() out of line and
making a tidy progression through the destruction code (offloaded to a
workqueue if put from softirq or processor function context). The RCU
part of the cleanup then only deals with the freeing at the end.
(4) Make rxrpc_queue_conn() return immediately if it sees the active count
is -1 rather then queuing the connection.
(5) Make sure that the cleanup routine waits for the work item to
complete.
(6) Stash the rxrpc_net pointer in the conn struct so that the rcu free
routine can use it, even if the local endpoint has been freed.
Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.
Note the connection work item is mostly going to go away with the main
event work being transferred to the I/O thread, so the wait in (6) will
become obsolete.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Currently, rxrpc gives the call timer a ref on the call when it starts it
and this is passed along to the workqueue by the timer expiration function.
The problem comes when queue_work() fails (ie. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.
This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
Fix this by:
(1) Don't give a ref to the timer.
(2) Making the expiration function not do anything if the refcount is 0.
Note that this is more of an optimisation.
(3) Make sure that the cleanup routine waits for timer to complete.
However, this has a consequence that timer cannot give a ref to the work
item. Therefore the following fixes are also necessary:
(4) Don't give a ref to the work item.
(5) Make the work item return asap if it sees the ref count is 0.
(6) Make sure that the cleanup routine waits for the work item to
complete.
Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.
Note the call work item is going to go away with the work being transferred
to the I/O thread, so the wait in (6) will become obsolete.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the sk_buff tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Add a tracepoint for the rxrpc_bundle refcounting.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_call tracepoint
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_conn tracepoint
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_peer tracepoint
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_local tracepoint
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Remove the kproto() and _proto() debugging macros in preference to using
tracepoints for this.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
rxrpc_kernel_call_is_complete() has been removed, so remove its declaration
too.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
|
|
Implement an in-kernel rxperf server to allow kernel-based rxrpc services
to be tested directly, unlike with AFS where they're accessed by the
fileserver when the latter decides it wants to.
This is implemented as a module that, if loaded, opens UDP port 7009
(afs3-rmtsys) and listens on it for incoming calls. Calls can be generated
using the rxperf command shipped with OpenAFS, for example.
Changes
=======
ver #2)
- Use min_t() instead of min().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: Jakub Kicinski <kuba@kernel.org>
|
|
Correct wrong closing bracket.
Signed-off-by: Philipp Hortmann <philipp.g.hortmann@gmail.com>
Link: https://lore.kernel.org/r/20221114200135.GA100176@matrix-ESPRIMO-P710
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When building with -Wstringop-overflow, GCC's KASAN implementation does
not correctly perform bounds checking within some complex structures
when faced with literal offsets, and can get very confused. For example,
this warning is seen due to literal offsets into sturct ieee80211_hdr
that may or may not be large enough:
drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c: In function 'iwl_mvm_rx_mpdu_mq':
drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c:2022:29: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
2022 | *qc &= ~IEEE80211_QOS_CTL_A_MSDU_PRESENT;
In file included from drivers/net/wireless/intel/iwlwifi/mvm/fw-api.h:32,
from drivers/net/wireless/intel/iwlwifi/mvm/sta.h:15,
from drivers/net/wireless/intel/iwlwifi/mvm/mvm.h:27,
from drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c:10:
drivers/net/wireless/intel/iwlwifi/mvm/../fw/api/rx.h:559:16: note: at offset [78, 166] into destination object 'mpdu_len' of size 2
559 | __le16 mpdu_len;
| ^~~~~~~~
Refactor ieee80211_get_qos_ctl() to avoid using literal offsets,
requiring the creation of the actual structure that is described in the
comments. Explicitly choose the desired offset, making the code more
human-readable too. This is one of the last remaining warning to fix
before enabling -Wstringop-overflow globally.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97490
Link: https://github.com/KSPP/linux/issues/181
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Gregory Greenman <gregory.greenman@intel.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221130212641.never.627-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
ping_lookup() does not acquire the table spinlock, so iteration should
use hlist_nulls_for_each_entry_rcu().
Spotted during code review.
Fixes: dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20221129140644.28525-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|