Age | Commit message (Collapse) | Author |
|
In some SGMII use cases where both a fixed link external PHY and the
internal PCS/PMA PHY need to be configured, we should explicitly use a
phandle "pcs-phy" to get the reference to the PCS/PMA PHY. Otherwise, the
driver would use "phy-handle" in the DT as the reference to both the
external and the internal PCS/PMA PHY.
In other cases where the core is connected to a SFP cage, we could still
point phy-handle to the intenal PCS/PMA PHY, and let the driver connect
to the SFP module, if exist, via phylink.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Document the new pcs-handle attribute to support connecting to an
external PHY. For Xilinx's AXI Ethernet, this is used when the core
operates in SGMII or 1000Base-X modes and links through the internal
PCS/PMA PHY.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the struct member `phy_node` of struct axienet_local is not used by the
driver anymore after initialization. It might be a remnent of old code
and could be removed.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The call to axienet_mdio_setup should not depend on whether "phy-node"
pressents on the DT. Besides, since `lp->phy_node` is used if PHY is in
SGMII or 100Base-X modes, move it into the if statement. And the next patch
will remove `lp->phy_node` from driver's private structure and do an
of_node_put on it right away after use since it is not used elsewhere.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some cases, xdp tx_queue can get used before initialization.
1. interface up/down
2. ring buffer size change
When CPU cores are lower than maximum number of channels of sfc driver,
it creates new channels only for XDP.
When an interface is up or ring buffer size is changed, all channels
are initialized.
But xdp channels are always initialized later.
So, the below scenario is possible.
Packets are received to rx queue of normal channels and it is acted
XDP_TX and tx_queue of xdp channels get used.
But these tx_queues are not initialized yet.
If so, TX DMA or queue error occurs.
In order to avoid this problem.
1. initializes xdp tx_queues earlier than other rx_queue in
efx_start_channels().
2. checks whether tx_queue is initialized or not in efx_xdp_tx_buffers().
Splat looks like:
sfc 0000:08:00.1 enp8s0f1np1: TX queue 10 spurious TX completion id 250
sfc 0000:08:00.1 enp8s0f1np1: resetting (RECOVER_OR_ALL)
sfc 0000:08:00.1 enp8s0f1np1: MC command 0x80 inlen 100 failed rc=-22
(raw=22) arg=789
sfc 0000:08:00.1 enp8s0f1np1: has been disabled
Fixes: f28100cb9c96 ("sfc: fix lack of XDP TX queues - error XDP TX failed (-22)")
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current code can lead to the following race:
CPU0 CPU1
rxrpc_exit_net()
rxrpc_peer_keepalive_worker()
if (rxnet->live)
rxnet->live = false;
del_timer_sync(&rxnet->peer_keepalive_timer);
timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
leading to use-after-free.
syzbot report was:
ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
kfree+0xd6/0x310 mm/slab.c:3809
ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
ops_free_list net/core/net_namespace.c:174 [inline]
cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>
Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.
Currently there are only two such actions: ct() and set(). However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.
For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():
actions:clone(ct(commit),0)
Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.
Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case). The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.
Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.
New build time assertion should protect us from this problem if new
actions will be added in the future.
Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked. sample() and clone() actions
are mixing extra attributes into the user-provided action list. That
prevents some code generalization too.
Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst")
Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html
Reported-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tlb_remove_huge_tlb_entry only considers PMD_SIZE and PUD_SIZE when
updating the mmu_gather structure.
Unfortunately on arm64 there are two additional huge page sizes that
need to be covered: CONT_PTE_SIZE and CONT_PMD_SIZE. Where an end-user
attempts to employ contiguous huge pages, a VM_BUG_ON can be experienced
due to the fact that the tlb structure hasn't been correctly updated by
the relevant tlb_flush_p.._range() call from tlb_remove_huge_tlb_entry.
This patch adds inequality logic to the generic implementation of
tlb_remove_huge_tlb_entry s.t. CONT_PTE_SIZE and CONT_PMD_SIZE are
effectively covered on arm64. Also, as well as ptes, pmds and puds;
p4ds are now considered too.
Reported-by: David Hildenbrand <david@redhat.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/linux-mm/811c5c8e-b3a2-85d2-049c-717f17c3a03a@redhat.com/
Signed-off-by: Steve Capper <steve.capper@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220330112543.863-1-steve.capper@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The alternatives code must be `noinstr` such that it does not patch itself,
as the cache invalidation is only performed after all the alternatives have
been applied.
Mark patch_alternative() as `noinstr`. Mark branch_insn_requires_update()
and get_alt_insn() with `__always_inline` since they are both only called
through patch_alternative().
Booting a kernel in QEMU TCG with KCSAN=y and ARM64_USE_LSE_ATOMICS=y caused
a boot hang:
[ 0.241121] CPU: All CPU(s) started at EL2
The alternatives code was patching the atomics in __tsan_read4() from LL/SC
atomics to LSE atomics.
The following fragment is using LL/SC atomics in the .text section:
| <__tsan_unaligned_read4+304>: ldxr x6, [x2]
| <__tsan_unaligned_read4+308>: add x6, x6, x5
| <__tsan_unaligned_read4+312>: stxr w7, x6, [x2]
| <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304>
This LL/SC atomic sequence was to be replaced with LSE atomics. However since
the alternatives code was instrumentable, __tsan_read4() was being called after
only the first instruction was replaced, which led to the following code in memory:
| <__tsan_unaligned_read4+304>: ldadd x5, x6, [x2]
| <__tsan_unaligned_read4+308>: add x6, x6, x5
| <__tsan_unaligned_read4+312>: stxr w7, x6, [x2]
| <__tsan_unaligned_read4+316>: cbnz w7, <__tsan_unaligned_read4+304>
This caused an infinite loop as the `stxr` instruction never completed successfully,
so `w7` was always 0.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220405104733.11476-1-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
CONFIG_SATA_LPM_MOBILE_POLICY was renamed to CONFIG_SATA_LPM_POLICY in
commit 4dd4d3deb502 ("ata: ahci: Rename CONFIG_SATA_LPM_MOBILE_POLICY
configuration item").
This can potentially cause problems as users would invisibly lose
configuration policy defaults when they built the new kernel. To
avoid such problems, switch back to the old name (even if it's wrong).
Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
|
|
There is often not a MAC address available in an EEPROM accessible by
Linux with Marvell devices. Instead the bootload has the MAC address
and directly programs it into the hardware. So don't consider an error
from of_get_mac_address() has fatal. However, the check was added for
the case where there is a MAC address in an the EEPROM, but the EEPROM
has not probed yet, and -EPROBE_DEFER is returned. In that case the
error should be returned. So make the check specific to this error
code.
Cc: Mauri Sandberg <maukka@ext.kapsi.fi>
Reported-by: Thomas Walther <walther-it@gmx.de>
Fixes: 42404d8f1c01 ("net: mv643xx_eth: process retval from of_get_mac_address")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220405000404.3374734-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
performance optimization inside the kernel. It's added by the kernel
while parsing user-provided actions and should not be sent during the
flow dump as it's not part of the uAPI.
The issue doesn't cause any significant problems to the ovs-vswitchd
process, because reported actions are not really used in the
application lifecycle and only supposed to be shown to a human via
ovs-dpctl flow dump. However, the action list is still incorrect
and causes the following error if the user wants to look at the
datapath flows:
# ovs-dpctl add-dp system@ovs-system
# ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)"
# ovs-dpctl dump-flows
<flow match>, packets:0, bytes:0, used:never,
actions:clone(bad length 4, expected -1 for: action0(01 00 00 00),
ct(commit),0)
With the fix:
# ovs-dpctl dump-flows
<flow match>, packets:0, bytes:0, used:never,
actions:clone(ct(commit),0)
Additionally fixed an incorrect attribute name in the comment.
Fixes: b233504033db ("openvswitch: kernel datapath clone action")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
KS8851 selects MICREL_PHY, which depends on PTP_1588_CLOCK_OPTIONAL, so
make KS8851 also depend on PTP_1588_CLOCK_OPTIONAL.
Fixes kconfig warning and build errors:
WARNING: unmet direct dependencies detected for MICREL_PHY
Depends on [m]: NETDEVICES [=y] && PHYLIB [=y] && PTP_1588_CLOCK_OPTIONAL [=m]
Selected by [y]:
- KS8851 [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICREL [=y] && SPI [=y]
ld.lld: error: undefined symbol: ptp_clock_register referenced by micrel.c
net/phy/micrel.o:(lan8814_probe) in archive drivers/built-in.a
ld.lld: error: undefined symbol: ptp_clock_index referenced by micrel.c
net/phy/micrel.o:(lan8814_ts_info) in archive drivers/built-in.a
Reported-by: kernel test robot <lkp@intel.com>
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20220405065936.4105272-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Incorrect comparison in bitmask .reduce, from Jeremy Sowden.
2) Missing GFP_KERNEL_ACCOUNT for dynamically allocated objects,
from Vasily Averin.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: memcg accounting for dynamically allocated objects
netfilter: bitwise: fix reduce comparisons
====================
Link: https://lore.kernel.org/r/20220405100923.7231-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the last instance of acpi_bus_get_device(), added recently
by commit 87e59b36e5e2 ("spi: Support selection of the index of the
ACPI Spi Resource before alloc"), with acpi_fetch_acpi_dev() and
finally drop acpi_bus_get_device() that has no more users.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
|
|
Pull virtio fixes from Michael Tsirkin:
"Fixes and cleanups:
- A couple of mlx5 fixes related to cvq
- A couple of reverts dropping useless code (code that used it got
reverted earlier)"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa: mlx5: synchronize driver status with CVQ
vdpa: mlx5: prevent cvq work from hogging CPU
Revert "virtio_config: introduce a new .enable_cbs method"
Revert "virtio: use virtio_device_ready() in virtio_device_restore()"
|
|
After resuming from suspend-to-RAM, the MSRs that control CPU's
speculative execution behavior are not being restored on the boot CPU.
These MSRs are used to mitigate speculative execution vulnerabilities.
Not restoring them correctly may leave the CPU vulnerable. Secondary
CPU's MSRs are correctly being restored at S3 resume by
identify_secondary_cpu().
During S3 resume, restore these MSRs for boot CPU when restoring its
processor state.
Fixes: 772439717dbf ("x86/bugs/intel: Set proper CPU features and setup RDS")
Reported-by: Neelima Krishnan <neelima.krishnan@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The mechanism to save/restore MSRs during S3 suspend/resume checks for
the MSR validity during suspend, and only restores the MSR if its a
valid MSR. This is not optimal, as an invalid MSR will unnecessarily
throw an exception for every suspend cycle. The more invalid MSRs,
higher the impact will be.
Check and save the MSR validity at setup. This ensures that only valid
MSRs that are guaranteed to not throw an exception will be attempted
during suspend.
Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently when XDP rings are created, each descriptor gets its DD bit
set, which turns out to be the wrong approach as it can lead to a
situation where more descriptors get cleaned than it was supposed to,
e.g. when AF_XDP busy poll is run with a large batch size. In this
situation, the driver would request for more buffers than it is able to
handle.
Fix this by not setting the DD bits in ice_xdp_alloc_setup_rings(). They
should be initialized to zero instead.
Fixes: 9610bd988df9 ("ice: optimize XDP_TX workloads")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ICE_DOWN is dedicated for pf->state. Check for ICE_VSI_DOWN being set on
vsi->state in ice_xsk_wakeup().
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.
This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit b3873a5be757 ("net/i40e: Fix concurrency issues
between config flow and XSK")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- prevent deleting subvolume with active swapfile
- fix qgroup reserve limit calculation overflow
- remove device count in superblock and its item in one transaction so
they cant't get out of sync
- skip defragmenting an isolated sector, this could cause some extra IO
- unify handling of mtime/permissions in hole punch with fallocate
- zoned mode fixes:
- remove assert checking for only single mode, we have the
DUP mode implemented
- fix potential lockdep warning while traversing devices
when checking for zone activation
* tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: prevent subvol with swapfile from being deleted
btrfs: do not warn for free space inode in cow_file_range
btrfs: avoid defragging extents whose next extents are not targets
btrfs: fix fallocate to use file_modified to update permissions consistently
btrfs: remove device item and update super block in the same transaction
btrfs: fix qgroup reserve overflow the qgroup limit
btrfs: zoned: remove left over ASSERT checking for single profile
btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
|
|
At the moment the GIC IRQ domain translation routine happily converts
ACPI table GSI numbers below 16 to GIC SGIs (Software Generated
Interrupts aka IPIs). On the Devicetree side we explicitly forbid this
translation, actually the function will never return HWIRQs below 16 when
using a DT based domain translation.
We expect SGIs to be handled in the first part of the function, and any
further occurrence should be treated as a firmware bug, so add a check
and print to report this explicitly and avoid lengthy debug sessions.
Fixes: 64b499d8df40 ("irqchip/gic-v3: Configure SGIs as standard interrupts")
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220404110842.2882446-1-andre.przywara@arm.com
|
|
It turns out that our polling of RWP is totally wrong when checking
for it in the redistributors, as we test the *distributor* bit index,
whereas it is a different bit number in the RDs... Oopsie boo.
This is embarassing. Not only because it is wrong, but also because
it took *8 years* to notice the blunder...
Just fix the damn thing.
Fixes: 021f653791ad ("irqchip: gic-v3: Initial support for GICv3")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20220315165034.794482-2-maz@kernel.org
|
|
The way KVM drives GICv4.{0,1} is as follows:
- vcpu_load() makes the VPE resident, instructing the RD to start
scanning for interrupts
- just before entering the guest, we check that the RD has finished
scanning and that we can start running the vcpu
- on preemption, we deschedule the VPE by making it invalid on
the RD
However, we are preemptible between the first two steps. If it so
happens *and* that the RD was still scanning, we nonetheless write
to the GICR_VPENDBASER register while Dirty is set, and bad things
happen (we're in UNPRED land).
This affects both the 4.0 and 4.1 implementations.
Make sure Dirty is cleared before performing the deschedule,
meaning that its_clear_vpend_valid() becomes a sort of full VPE
residency barrier.
Reported-by: Jingyi Wang <wangjingyi11@huawei.com>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: 57e3cebd022f ("KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit")
Link: https://lore.kernel.org/r/4aae10ba-b39a-5f84-754b-69c2eb0a2c03@huawei.com
|
|
If devm_platform_ioremap_resource() fails, it never returns
NULL, replace NULL check with IS_ERR().
Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220316025100.1758413-1-yangyingliang@huawei.com
|
|
If MAILBOX is n, building fails:
drivers/irqchip/irq-qcom-mpm.o: In function `mpm_pd_power_off':
irq-qcom-mpm.c:(.text+0x174): undefined reference to `mbox_send_message'
irq-qcom-mpm.c:(.text+0x174): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `mbox_send_message'
Make QCOM_MPM depends on MAILBOX to fix this.
Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220317131956.30004-1-yuehaibing@huawei.com
|
|
In 6f98a4bfee72 ("random: block in /dev/urandom"), we tried to make a
successful try_to_generate_entropy() call *required* if the RNG was not
already initialized. Unfortunately, weird architectures and old
userspaces combined in TCG test harnesses, making that change still not
realistic, so it was reverted in 0313bc278dac ("Revert "random: block in
/dev/urandom"").
However, rather than making a successful try_to_generate_entropy() call
*required*, we can instead make it *best-effort*.
If try_to_generate_entropy() fails, it fails, and nothing changes from
the current behavior. If it succeeds, then /dev/urandom becomes safe to
use for free. This way, we don't risk the regression potential that led
to us reverting the required-try_to_generate_entropy() call before.
Practically speaking, this means that at least on x86, /dev/urandom
becomes safe. Probably other architectures with working cycle counters
will also become safe. And architectures with slow or broken cycle
counters at least won't be affected at all by this change.
So it may not be the glorious "all things are unified!" change we were
hoping for initially, but practically speaking, it makes a positive
impact.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Now that all in-kernel users of default_attrs for the kobj_type are gone
and converted to properly use the default_groups pointer instead, it can
be safely removed.
There is one standard way to create sysfs files in a kobj_type, and not
two like before, causing confusion as to which should be used.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the pseries vas sysfs code to use default_groups field
which has been the preferred way since aa30f47cf666 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Haren Myneni <haren@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/20220329142552.558339-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.
Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: Pudak, Filip <Filip.Pudak@windriver.com>
Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Tony Nguyen says:
====================
ice bug fixes
Alice Michael says:
There were a couple of bugs that have been found and
fixed by Anatolii in the ice driver. First he fixed
a bug on ring creation by setting the default value
for the teid. Anatolli also fixed a bug with deleting
queues in ice_vc_dis_qs_msg based on their enablement.
---
v2: Remove empty lines between tags
The following are changes since commit 458f5d92df4807e2a7c803ed928369129996bf96:
sfc: Do not free an empty page_ring
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE
====================
Link: https://lore.kernel.org/r/20220404183548.3422851-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Disable check for queue being enabled in ice_vc_dis_qs_msg, because
there could be a case when queues were created, but were not enabled.
We still need to delete those queues.
Normal workflow for VF looks like:
Enable path:
VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10)
VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6)
VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)
Disable path:
VIRTCHNL_OP_DISABLE_QUEUES (opcode 9)
VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)
The issue appears only in stress conditions when VF is enabled and
disabled very fast.
Eventually there will be a case, when queues are created by
VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by
VIRTCHNL_OP_ENABLE_QUEUES.
In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES,
because there is a check whether queues are enabled in
ice_vc_dis_qs_msg.
When we bring up the VF again, we will see the "Failed to set LAN Tx queue
context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This
happens because old 16 queues were not deleted and VF requests to create
16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to
find a parent node for first newly requested queue (because all nodes
are allocated to 16 old queues).
Testing Hints:
Just enable and disable VF fast enough, so it would be disabled before
reaching VIRTCHNL_OP_ENABLE_QUEUES.
while true; do
ip link set dev ens785f0v0 up
sleep 0.065 # adjust delay value for you machine
ip link set dev ens785f0v0 down
done
Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When VF is freshly created, but not brought up, ring->txq_teid
value is by default set to 0.
But 0 is a valid TEID. On some platforms the Root Node of
Tx scheduler has a TEID = 0. This can cause issues as shown below.
The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
Testing Hints:
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
ip link set dev ens785f0v0 up
ip link set dev ens785f0v0 down
If we have freshly created VF and quickly turn it on and off, so there
would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then
VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error:
[ 639.531454] disable queue 89 failed 14
[ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
[ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5
The reason for the fail is that we are trying to send AQ command to
delete queue 89, which has never been created and receive an "invalid
argument" error from firmware.
As this queue has never been created, it's teid and ring->txq_teid
have default value 0.
ice_dis_vsi_txq has a check against non-existent queues:
node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
if (!node)
continue;
But on some platforms the Root Node of Tx scheduler has a teid = 0.
Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is
pi->root), and we go further to submit an erroneous request to firmware.
Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This node pointer is returned by of_find_compatible_node() with
refcount incremented. Calling of_node_put() to aovid the refcount leak.
Fixes: d346c9e86d86 ("dpaa2-ptp: reuse ptp_qoriq driver")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220404125336.13427-1-linmq006@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to
use __GFP_ACCOUNT flag for objects that are dynamically
allocated from the packet path.
Such objects are allocated inside nft_expr_ops->init() callbacks
executed in task context while processing netlink messages.
In addition, this patch adds accounting to nft_set_elem_expr_clone()
used for the same purposes.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
There were a few patches left in drm-misc-next-fixes, let's bring them
into drm-misc-fixes.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
|
Let's start the 5.18 fixes cycle.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
|
Since not all compilers have a function attribute to disable KCOV
instrumentation, objtool can rewrite KCOV instrumentation in noinstr
functions as per commit:
f56dae88a81f ("objtool: Handle __sanitize_cov*() tail calls")
However, this has subtle interaction with the SLS validation from
commit:
1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation")
In that when a tail-call instrucion is replaced with a RET an
additional INT3 instruction is also written, but is not represented in
the decoded instruction stream.
This then leads to false positive missing INT3 objtool warnings in
noinstr code.
Instead of adding additional struct instruction objects, mark the RET
instruction with retpoline_safe to suppress the warning (since we know
there really is an INT3).
Fixes: 1cc1e4c8aab4 ("objtool: Add straight-line-speculation validation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220323230712.GA8939@worktop.programming.kicks-ass.net
|
|
Objtool reports:
arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx() falls through to next function poly1305_blocks_x86_64()
arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_emit_avx() falls through to next function poly1305_emit_x86_64()
arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2() falls through to next function poly1305_blocks_x86_64()
Which reads like:
0000000000000040 <poly1305_blocks_x86_64>:
40: f3 0f 1e fa endbr64
...
0000000000000400 <poly1305_blocks_avx>:
400: f3 0f 1e fa endbr64
404: 44 8b 47 14 mov 0x14(%rdi),%r8d
408: 48 81 fa 80 00 00 00 cmp $0x80,%rdx
40f: 73 09 jae 41a <poly1305_blocks_avx+0x1a>
411: 45 85 c0 test %r8d,%r8d
414: 0f 84 2a fc ff ff je 44 <poly1305_blocks_x86_64+0x4>
...
These are simple conditional tail-calls and *should* be recognised as
such by objtool, however due to a mistake in commit 08f87a93c8ec
("objtool: Validate IBT assumptions") this is failing.
Specifically, the jump_dest is +4, this means the instruction pointed
at will not be ENDBR and as such it will fail the second clause of
is_first_func_insn() that was supposed to capture this exact case.
Instead, have is_first_func_insn() look at the previous instruction.
Fixes: 08f87a93c8ec ("objtool: Validate IBT assumptions")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220322115125.811582125@infradead.org
|
|
The macro __WARN_FLAGS() uses a local variable named "f". This being a
common name, there is a risk of shadowing other variables.
For example, GCC would yield:
| In file included from ./include/linux/bug.h:5,
| from ./include/linux/cpumask.h:14,
| from ./arch/x86/include/asm/cpumask.h:5,
| from ./arch/x86/include/asm/msr.h:11,
| from ./arch/x86/include/asm/processor.h:22,
| from ./arch/x86/include/asm/timex.h:5,
| from ./include/linux/timex.h:65,
| from ./include/linux/time32.h:13,
| from ./include/linux/time.h:60,
| from ./include/linux/stat.h:19,
| from ./include/linux/module.h:13,
| from virt/lib/irqbypass.mod.c:1:
| ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu':
| ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow]
| 80 | __auto_type f = BUGFLAG_WARNING|(flags); \
| | ^
| ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS'
| 106 | __WARN_FLAGS(BUGFLAG_ONCE | \
| | ^~~~~~~~~~~~
| ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE'
| 1007 | WARN_ON_ONCE(func != (rcu_callback_t)~0L);
| | ^~~~~~~~~~~~
| In file included from ./include/linux/rbtree.h:24,
| from ./include/linux/mm_types.h:11,
| from ./include/linux/buildid.h:5,
| from ./include/linux/module.h:14,
| from virt/lib/irqbypass.mod.c:1:
| ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here
| 1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f)
| | ~~~~~~~~~~~~~~~^
For reference, sparse also warns about it, c.f. [1].
This patch renames the variable from f to __flags (with two underscore
prefixes as suggested in the Linux kernel coding style [2]) in order
to prevent collisions.
[1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+1sw@mail.gmail.com/
[2] Linux kernel coding style, section 12) Macros, Enums and RTL,
paragraph 5) namespace collisions when defining local variables in
macros resembling functions
https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl
Fixes: bfb1a7c91fb7 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm")
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20220324023742.106546-1-mailhol.vincent@wanadoo.fr
|
|
When enable a cgroup event, cpuctx->cgrp setting is conditional
on the current task cgrp matching the event's cgroup, so have to
do it for every new event. It brings complexity but no advantage.
To keep it simple, this patch would always set cpuctx->cgrp
when enable the first cgroup event, and reset to NULL when disable
the last cgroup event.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220329154523.86438-5-zhouchengming@bytedance.com
|
|
There is a race problem that can trigger WARN_ON_ONCE(cpuctx->cgrp)
in perf_cgroup_switch().
CPU1 CPU2
perf_cgroup_sched_out(prev, next)
cgrp1 = perf_cgroup_from_task(prev)
cgrp2 = perf_cgroup_from_task(next)
if (cgrp1 != cgrp2)
perf_cgroup_switch(prev, PERF_CGROUP_SWOUT)
cgroup_migrate_execute()
task->cgroups = ?
perf_cgroup_attach()
task_function_call(task, __perf_cgroup_move)
perf_cgroup_sched_in(prev, next)
cgrp1 = perf_cgroup_from_task(prev)
cgrp2 = perf_cgroup_from_task(next)
if (cgrp1 != cgrp2)
perf_cgroup_switch(next, PERF_CGROUP_SWIN)
__perf_cgroup_move()
perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN)
The commit a8d757ef076f ("perf events: Fix slow and broken cgroup
context switch code") want to skip perf_cgroup_switch() when the
perf_cgroup of "prev" and "next" are the same.
But task->cgroups can change in concurrent with context_switch()
in cgroup_migrate_execute(). If cgrp1 == cgrp2 in sched_out(),
cpuctx won't do sched_out. Then task->cgroups changed cause
cgrp1 != cgrp2 in sched_in(), cpuctx will do sched_in. So trigger
WARN_ON_ONCE(cpuctx->cgrp).
Even though __perf_cgroup_move() will be synchronized as the context
switch disables the interrupt, context_switch() still can see the
task->cgroups is changing in the middle, since task->cgroups changed
before sending IPI.
So we have to combine perf_cgroup_sched_in() into perf_cgroup_sched_out(),
unified into perf_cgroup_switch(), to fix the incosistency between
perf_cgroup_sched_out() and perf_cgroup_sched_in().
But we can't just compare prev->cgroups with next->cgroups to decide
whether to skip cpuctx sched_out/in since the prev->cgroups is changing
too. For example:
CPU1 CPU2
cgroup_migrate_execute()
prev->cgroups = ?
perf_cgroup_attach()
task_function_call(task, __perf_cgroup_move)
perf_cgroup_switch(task)
cgrp1 = perf_cgroup_from_task(prev)
cgrp2 = perf_cgroup_from_task(next)
if (cgrp1 != cgrp2)
cpuctx sched_out/in ...
task_function_call() will return -ESRCH
In the above example, prev->cgroups changing cause (cgrp1 == cgrp2)
to be true, so skip cpuctx sched_out/in. And later task_function_call()
would return -ESRCH since the prev task isn't running on cpu anymore.
So we would leave perf_events of the old prev->cgroups still sched on
the CPU, which is wrong.
The solution is that we should use cpuctx->cgrp to compare with
the next task's perf_cgroup. Since cpuctx->cgrp can only be changed
on local CPU, and we have irq disabled, we can read cpuctx->cgrp to
compare without holding ctx lock.
Fixes: a8d757ef076f ("perf events: Fix slow and broken cgroup context switch code")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220329154523.86438-4-zhouchengming@bytedance.com
|
|
Since we use perf_cgroup_set_timestamp() to start cgroup time and
set active to 1, then use update_cgrp_time_from_cpuctx() to stop
cgroup time and set active to 0.
We can use info->active directly to check if cgroup is active.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220329154523.86438-3-zhouchengming@bytedance.com
|
|
The current code pass task around for ctx_sched_in(), only
to get perf_cgroup of the task, then update the timestamp
of it and its ancestors and set them to active.
But we can use cpuctx->cgrp to get active perf_cgroup and
its ancestors since cpuctx->cgrp has been set before
ctx_sched_in().
This patch remove the task argument in ctx_sched_in()
and cleanup related code.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220329154523.86438-2-zhouchengming@bytedance.com
|
|
On Sapphire Rapids, the FRONTEND_RETIRED.MS_FLOWS event requires the
FRONTEND MSR value 0x8. However, the current FRONTEND MSR mask doesn't
support it.
Update intel_spr_extra_regs[] to support it.
Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1648482543-14923-2-git-send-email-kan.liang@linux.intel.com
|
|
The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR.
perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0
Performance counter stats for 'CPU(s) 0':
607,246 cpu/event=0xc0,umask=0x0/
0 cpu/event=0x0,umask=0x1/
The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which
doesn't work on the generic counters. However, current perf extends its
mask to the generic counters.
The pseudo event-code for a fixed counter must be 0x00. Check and avoid
extending the mask for the fixed counter event which using the
pseudo-encoding, e.g., ref-cycles and PREC_DIST event.
With the patch,
perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0
Performance counter stats for 'CPU(s) 0':
583,184 cpu/event=0xc0,umask=0x0/
583,048 cpu/event=0x0,umask=0x1/
Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1648482543-14923-1-git-send-email-kan.liang@linux.intel.com
|
|
It was reported that some perf event setup can make fork failed on
ARM64. It was the case of a group of mixed hw and sw events and it
failed in perf_event_init_task() due to armpmu_event_init().
The ARM PMU code checks if all the events in a group belong to the
same PMU except for software events. But it didn't set the event_caps
of inherited events and no longer identify them as software events.
Therefore the test failed in a child process.
A simple reproducer is:
$ perf stat -e '{cycles,cs,instructions}' perf bench sched messaging
# Running 'sched/messaging' benchmark:
perf: fork(): Invalid argument
The perf stat was fine but the perf bench failed in fork(). Let's
inherit the event caps from the parent.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20220328200112.457740-1-namhyung@kernel.org
|
|
The uncore PMU of the Raptor Lake is the same as Alder Lake.
Add new PCIIDs of IMC for Raptor Lake.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/1647366360-82824-4-git-send-email-kan.liang@linux.intel.com
|
|
Raptor Lake is Intel's successor to Alder lake. PPERF and SMI_COUNT MSRs
are also supported.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/1647366360-82824-3-git-send-email-kan.liang@linux.intel.com
|