Age | Commit message (Collapse) | Author |
|
Add a function to change the owner of a network device when it is moved
between network namespaces.
Currently, when moving network devices between network namespaces the
ownership of the corresponding sysfs entries is not changed. This leads
to problems when tools try to operate on the corresponding sysfs files.
This leads to a bug whereby a network device that is created in a
network namespaces owned by a user namespace will have its corresponding
sysfs entry owned by the root user of the corresponding user namespace.
If such a network device has to be moved back to the host network
namespace the permissions will still be set to the user namespaces. This
means unprivileged users can e.g. trigger uevents for such incorrectly
owned devices. They can also modify the settings of the device itself.
Both of these things are unwanted.
For example, workloads will create network devices in the host network
namespace. Other tools will then proceed to move such devices between
network namespaces owner by other user namespaces. While the ownership
of the device itself is updated in
net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
entry for the device is not:
drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 .
drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics
lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
However, if a device is created directly in the network namespace then
the device's sysfs permissions will be correctly updated:
drwxr-xr-x 5 root root 0 Jan 25 18:12 .
drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
-r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type
-r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len
-r--r--r-- 1 root root 4096 Jan 25 18:12 address
-r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast
-rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier
-r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes
-r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count
-r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count
-r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id
-r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port
-r--r--r-- 1 root root 4096 Jan 25 18:12 dormant
-r--r--r-- 1 root root 4096 Jan 25 18:12 duplex
-rw-r--r-- 1 root root 4096 Jan 25 18:12 flags
-rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout
-rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias
-r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex
-r--r--r-- 1 root root 4096 Jan 25 18:12 iflink
-r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode
-rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu
-r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type
-rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group
-r--r--r-- 1 root root 4096 Jan 25 18:12 operstate
-r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id
-r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name
-r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id
drwxr-xr-x 2 root root 0 Jan 25 18:12 power
-rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down
drwxr-xr-x 4 root root 0 Jan 25 18:12 queues
-r--r--r-- 1 root root 4096 Jan 25 18:12 speed
drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics
lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net
-rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len
-r--r--r-- 1 root root 4096 Jan 25 18:12 type
-rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent
Now, when creating a network device in a network namespace owned by a
user namespace and moving it to the host the permissions will be set to
the id that the user namespace root user has been mapped to on the host
leading to all sorts of permission issues:
458752
drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 .
drwxr-xr-x 9 root root 0 Jan 25 18:08 ..
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id
drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down
drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed
drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics
lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper to change the owner of a device's power entries. This
needs to happen when the ownership of a device is changed, e.g. when
moving network devices between network namespaces.
This function will be used to correctly account for ownership changes,
e.g. when moving network devices between network namespaces.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper to change the owner of a device's sysfs entries. This
needs to happen when the ownership of a device is changed, e.g. when
moving network devices between network namespaces.
This function will be used to correctly account for ownership changes,
e.g. when moving network devices between network namespaces.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper to change the owner of sysfs objects.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.
This mirrors how a kobject is added through driver core which in its guts is
done via kobject_add_internal() which in summary creates the main directory via
create_dir(), populates that directory with the groups associated with the
ktype of the kobject (if any) and populates the directory with the basic
attributes associated with the ktype of the kobject (if any). These are the
basic steps that are associated with adding a kobject in sysfs.
Any additional properties are added by the specific subsystem itself (not by
driver core) after it has registered the device. So for the example of network
devices, a network device will e.g. register a queue subdirectory under the
basic sysfs directory for the network device and than further subdirectories
within that queues subdirectory. But that is all specific to network devices
and they call the corresponding sysfs functions to do that directly when they
create those queue objects. So anything that a subsystem adds outside of what
driver core does must also be changed by it (That's already true for removal of
files it created outside of driver core.) and it's the same for ownership
changes.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add helpers to change the owner of sysfs groups.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper to change the owner of a sysfs link.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add helpers to change the owner of a sysfs files.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Lastly, fix the following checkpatch warning:
CHECK: Prefer kernel type 'u32' over 'u_int32_t'
#61: FILE: drivers/net/ethernet/cisco/enic/vnic_devcmd.h:653:
+ u_int32_t val[];
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This issue was detected with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 7458bd540fa0a90220b9e8c349d910d9dde9caf8 ("net: dsa:
bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278") as it causes
advanced congestion buffering issues with 7278 switch devices when using
their internal Giabit PHY. While this is being debugged, continue with
conservative defaults that work and do not cause packet loss.
Fixes: 7458bd540fa0 ("net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Looks like the iavf code actually experienced a race condition, when a
developer took code before the check for chain 0 was put to helper.
So use tc_cls_can_offload_and_chain0() helper instead of direct check and
move the check to _cb() so this is similar to i40e code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Return NULL instead of 0.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
|
|
drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:191:13:
sparse: warning: incorrect type in assignment (different base types)
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
|
|
Clang warns:
In file included from
../drivers/net/ethernet/mellanox/mlx5/core/main.c:73:
../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:4:9: warning:
'__MLX5_RSC_DUMP_H' is used as a header guard here, followed by #define
of a different macro [-Wheader-guard]
#ifndef __MLX5_RSC_DUMP_H
^~~~~~~~~~~~~~~~~
../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:5:9: note:
'__MLX5_RSC_DUMP__H' is defined here; did you mean '__MLX5_RSC_DUMP_H'?
#define __MLX5_RSC_DUMP__H
^~~~~~~~~~~~~~~~~~
__MLX5_RSC_DUMP_H
1 warning generated.
Make them match to get the intended behavior and remove the warning.
Fixes: 12206b17235a ("net/mlx5: Add support for resource dump")
Link: https://github.com/ClangBuiltLinux/linux/issues/897
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Fix a vxlan typo in the mlx5 driver documentation.
Signed-off-by: Hans Wippel <ndev@hwipl.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
We can avoid an indirect call per compressed completion wrapping the
completion handling call with the appropriate helper.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
We can avoid an indirect call per NAPI cycle wrapping the RX descriptors
posting call with the appropriate helper.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The current steps that are performed when the trust state changes, if
the channels are active:
1. The trust state is changed in hardware.
2. The new inline mode is calculated.
3. If the new inline mode is different, the channels are recreated using
the new inline mode.
This approach has some issues:
1. There is a time gap between changing trust state in hardware and
starting sending enough inline headers (the latter happens after
recreation of channels). It leads to failed transmissions and error
CQEs.
2. If the new channels fail to open, we'll be left with the old ones,
but the hardware will be configured for the new trust state, so the
interval when we can see TX errors never ends.
This patch fixes the issues above by moving the trust state change into
the preactivate hook that runs during the recreation of the channels
when no channels are active, so it eliminates the gap of partially
applied configuration. If the inline mode doesn't change with the change
of the trust state, the channels won't be recreated, just like before
this patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Sometimes the preactivate hook of mlx5e_safe_switch_channels needs more
parameters than just struct mlx5e_priv *. For such cases, a new
parameter (void *context) is added to preactivate hooks.
Some of the existing normal functions are currently used as preactivate
callbacks. To avoid adding an extra unused parameter, they are wrapped
in an automatic way using the MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX
macro.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently mlx5e_switch_priv_channels expects that the preactivate hook
doesn't fail, however, it can fail, because it may set hardware
parameters. This commit addresses this issue and provides a way to
recover from failures of the preactivate hook: the old channels are not
closed until the point where nothing can fail anymore, so in case
preactivate fails, the driver can roll back the old channels and
activate them again.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The number of queues is now updated by mlx5e_update_netdev_queues in a
centralized way, when no channels are active. Remove an extra occurrence
of netif_set_real_num_tx_queues to prepare it for the next commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently, mlx5e notifies the kernel about the number of queues and sets
the default XPS cpumasks when channels are activated. This
implementation has several corner cases, in which the kernel may not be
updated on time, or XPS cpumasks may be reset when not directly touched
by the user.
This commit fixes these corner cases to match the following expected
behavior:
1. The number of queues always corresponds to the number of channels
configured.
2. XPS cpumasks are set to driver's defaults on netdev attach.
3. XPS cpumasks set by user are not reset, unless the number of channels
changes. If the number of channels changes, they are reset to driver's
defaults. (In general case, when the number of channels increases or
decreases, it's not possible to guess how to convert the current XPS
cpumasks to work with the new number of channels, so we let the user
reconfigure it if they change the number of channels.)
XPS cpumasks are no longer stored per channel. Only one temporary
cpumask is used. The old stored cpumasks didn't reflect the user's
changes and were not used after applying them.
A scratchpad area is added to struct mlx5e_priv. As cpumask_var_t
requires allocation, and the preactivate hook can't fail, we need to
preallocate the temporary cpumask in advance. It's stored in the
scratchpad.
Fixes: 149e566fef81 ("net/mlx5e: Expand XPS cpumask to cover all online cpus")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mlx5e_ethtool_set_channels updates the indirection table before
switching to the new channels. If the switch fails, the indirection
table is new, but the channels are old, which is wrong. Fix it by using
the preactivate hook of mlx5e_safe_switch_channels to update the
indirection table at the stage when nothing can fail anymore.
As the code that updates the indirection table is now encapsulated into
a new function, use that function in the attach flow when the driver has
to reduce the number of channels, and prepare the code for the next
commit.
Fixes: 85082dba0a ("net/mlx5e: Correctly handle RSS indirection table when changing number of channels")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mlx5e_safe_switch_channels accepts a callback to be called before
activating new channels. It is intended to configure some hardware
parameters in cases where channels are recreated because some
configuration has changed.
Recently, this callback has started being used to update the driver's
internal MLX5E_STATE_XDP_OPEN flag, and the following patches also
intend to use this callback for software preparations. This patch
renames the hw_modify callback to preactivate, so that the name fits
better.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
As a preparation for one of the following commits, create a function to
encapsulate the code that notifies the kernel about the new amount of
RX and TX queues. The code will be called multiple times in the next
commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The LRO boolean state in params->lro_en must not be set in case
the NIC is not capable.
Enforce this check and remove the TODO comment.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
We shall always extract channel index out of the txq, regardless
of the relation between txq_ix and num channels. The extraction is
always valid, as if txq is smaller than number of channels,
txq_ix == priv->txq2sq[txq_ix]->ch_ix.
By doing so, we can remove an if clause from the select queue method,
and have one flow for all packets.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Jiri Pirko says:
====================
mlxsw: Implement ACL-dropped packets identification
mlxsw hardware allows to insert a ACL-drop action with a value defined
by user that would be later on passed with a dropped packet.
To implement this, use the existing TC action cookie and pass it to the
driver. As the cookie format coming down from TC and the mlxsw HW cookie
format is different, do the mapping of these two using idr and rhashtable.
The cookie is passed up from the HW through devlink_trap_report() to
drop_monitor code. A new metadata type is used for that.
Example:
$ tc qdisc add dev enp0s16np1 clsact
$ tc filter add dev enp0s16np1 ingress protocol ip pref 10 flower skip_sw dst_ip 192.168.1.2 action drop cookie 3b45fa38c8
^^^^^^^^^^
$ devlink trap set pci/0000:00:10.0 trap acl action trap
$ dropwatch
Initializing null lookup method
dropwatch> set hw true
setting hardware drops monitoring to 1
dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: ingress_flow_action_drop (acl_drops)
origin: hardware
input port ifindex: 30
input port name: enp0s16np1
cookie: 3b45fa38c8 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
timestamp: Fri Jan 24 17:10:53 2020 715387671 nsec
protocol: 0x800
length: 98
original length: 98
This way the user may insert multiple drop rules and monitor the dropped
packets with the information of which action caused the drop.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend existing devlink trap test to include metadata type for flow
action cookie.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new trap ACL which reports flow action cookie in a metadata. Allow
used to setup the cookie using debugfs file.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the cookie index received along with the packet to lookup original
flow_offload cookie binary and pass it down to devlink_trap_report().
Add "fa_cookie" metadata to the ACL trap.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case the received packet comes in due to one of ACL discard traps,
take the user_def_val_orig_pkt_len field from CQE and store it
in skb->cb as ACL cookie index.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Track cookies coming down to driver by flow_offload.
Assign a cookie_index to each unique cookie binary. Use previously
defined "Trap with userdef" flex action to ask HW to pass cookie_index
alongside with the dropped packets.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose "Trap action with userdef". It is the same as already
defined "Trap action" with a difference that it would ask the policy
engine to pass arbitrary value (userdef) alongside with received packets.
This would be later on used to carry cookie index.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add cookie argument to devlink_trap_report() allowing driver to pass in
the user cookie. Pass on the cookie down to drop monitor code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If driver passed along the cookie, push it through Netlink.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow driver to indicate cookie metadata for registered traps.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend struct flow_action_entry in order to hold TC action cookie
specified by user inserting the action.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A new set of changes:
* lots of small documentation fixes, from Jérôme Pouiller
* beacon protection (BIGTK) support from Jouni Malinen
* some initial code for TID configuration, from Tamizh chelvam
* I reverted some new API before it's actually used, because
it's wrong to mix controlled port and preauth
* a few other cleanups/fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a spelling mistake in a pr_err message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "stm32_pwr_wakeup" is optional per the binding and the driver
handles its absence gracefully. Request it with
platform_get_irq_byname_optional, so its absence doesn't needlessly
clutter the log.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The specification of a "eth-ck" and a "ptp_ref" clock is optional per
the binding and the driver handles them gracefully.
Demote the output to an info message accordingly.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jeremy Linton says:
====================
Add ACPI bindings to the genet
This patch series allows the BCM GENET, as used on the RPi4,
to attach when booted in an ACPI environment. The DSDT entry to
trigger this is seen below. Of note, the first patch adds a
small extension to the mdio layer which allows drivers to find
the mii_bus without firmware assistance. The fifth patch in
the set retrieves the MAC address from the umac registers
rather than carrying it directly in the DSDT. This of course
requires the firmware to pre-program it, so we continue to fall
back on a random one if it appears to be garbage.
v1 -> v2:
fail on missing phy-mode property
replace phy-mode internal property read string with
device_get_phy_mode() equivalent
rework mac address detection logic so that it merges
the acpi/DT case into device_get_mac_address()
allowing _DSD mac address properties.
some commit messages justifying why phy_find_first()
isn't the worst choice for this driver.
+ Device (ETH0)
+ {
+ Name (_HID, "BCM6E4E")
+ Name (_UID, 0)
+ Name (_CCA, 0x0)
+ Method (_STA)
+ {
+ Return (0xf)
+ }
+ Method (_CRS, 0x0, Serialized)
+ {
+ Name (RBUF, ResourceTemplate ()
+ {
+ Memory32Fixed (ReadWrite, 0xFd580000, 0x10000, )
+ Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive) { 0xBD }
+ Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive) { 0xBE }
+ })
+ Return (RBUF)
+ }
+ Name (_DSD, Package () {
+ ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
+ Package () {
+ Package () { "phy-mode", "rgmii-rxid" },
+ }
+ })
+ }
====================
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If one types "failed to get enet clock" or similar into google
there are ~370k hits. The vast majority are people debugging
problems unrelated to this adapter, or bragging about their
rpi's. Further, the DT clock bindings here are optional.
Given that its not a fatal situation with common DT based
systems, lets reduce the severity so people aren't seeing failure
messages in everyday operation.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ARM/ACPI machines should utilize self describing hardware
when possible. The MAC address on the BCMGENET can be
read from the adapter if a full featured firmware has already
programmed it. Lets try using the address already programmed,
if it appears to be valid.
It should be noted that while we move the macaddr logic below
the clock and power logic in the driver, none of that code will
ever be active in an ACPI environment as the device will be
attached to the acpi power domain, and brought to full power
with all clocks enabled immediately before the device probe
routine is called.
One side effect of the above tweak is that while its now
possible to read the MAC address via _DSD properties, it should
be avoided.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The rpi4 is capable of booting in ACPI mode with the latest
edk2-platform commits. As such it would be helpful if the genet
platform device were usable.
To achieve this we add a new MODULE_DEVICE_TABLE, and convert
a few dt specific methods to their generic device_ calls. Until
the next patch, ACPI based machines will fallback on random
mac addresses.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The unimac mdio driver falls back to scanning the
entire bus if its given an appropriate mask. In ACPI
mode we expect that the system is well behaved and
conforms to recent versions of the specification.
We then utilize phy_find_first(), and
phy_connect_direct() to find and attach to the
discovered phy during net_device open. While its
apparently possible to build a genet based device
with multiple phys on a single mdio bus, this works
for current machines. Further, this driver makes
a number of assumptions about the platform device,
mac, mdio and phy all being 1:1. Lastly, It also
avoids having to create references across the ACPI
namespace hierarchy.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|