path: root/drivers/accel/habanalabs
Age | Commit message | Author
2025-05-08 | accel/habanalabs: Don't build the driver on UML | Ingo Molnar
The following commit: 288a4ff0ad29 ("x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h>") removed the <asm/msr.h> include from the accel/habanalabs driver, which broke the build on UML: drivers/accel/habanalabs/common/habanalabs_ioctl.c:326:23: error: call to undeclared function 'rdtsc'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] Make the driver depend on 'X86 && X86_64', instead of just 'X86_64', thus it won't be built on UML. Suggested-by: Johannes Berg <johannes.berg@intel.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Cc: Ofir Bitton <obitton@habana.ai> Cc: Oded Gabbay <ogabbay@kernel.org> Link: https://lore.kernel.org/r/202505080003.0t7ewxGp-lkp@intel.com
2025-05-02 | x86/msr: Add explicit includes of <asm/msr.h> | Xin Li (Intel)
For historic reasons there are some TSC-related functions in the <asm/msr.h> header, even though there's an <asm/tsc.h> header. To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h> and to eventually eliminate the inclusion of <asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency to the source files that reference definitions from <asm/msr.h>. [ mingo: Clarified the changelog. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
2025-04-01 | Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds
Pull driver core updates from Greg KH: "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionality available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates" * tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits) rust: platform: require Send for Driver trait implementers rust: pci: require Send for Driver trait implementers rust: platform: impl Send + Sync for platform::Device rust: pci: impl Send + Sync for pci::Device rust: platform: fix unrestricted &mut platform::Device rust: pci: fix unrestricted &mut pci::Device rust: device: implement device context marker rust: pci: use to_result() in enable_device_mem() MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers rust/kernel/faux: mark Registration methods inline driver core: faux: only create the device if probe() succeeds rust/faux: Add missing parent argument to Registration::new() rust/faux: Drop #[repr(transparent)] from faux::Registration rust: io: fix devres test with new io accessor functions rust: io: rename `io::Io` accessors kernfs: Move dput() outside of the RCU section. efi: rci2: mark bin_attribute as __ro_after_init rapidio: constify 'struct bin_attribute' firmware: qemu_fw_cfg: constify 'struct bin_attribute' powerpc/perf/hv-24x7: Constify 'struct bin_attribute' ...
2025-03-16 | accel/habanalabs: convert timeouts to secs_to_jiffies() | Easwar Hariharan
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules:
    @depends on patch@ expression E; @@
    -msecs_to_jiffies
    +secs_to_jiffies
     (E
    - * \( 1000 \| MSEC_PER_SEC \)
     )
Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-3-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Chris Mason <clm@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Maol <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: Dongsheng Yang <dongsheng.yang@easystack.cn> Cc: Fabio Estevam <festevam@gmail.com> Cc: Frank Li <frank.li@nxp.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: James Bottomley <james.bottomley@HansenPartnership.com> Cc: James Smart <james.smart@broadcom.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: Mark Brown <broonie@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Nicolas Palix <nicolas.palix@imag.fr> Cc: Niklas Cassel <cassel@kernel.org> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Sebastian Reichel <sre@kernel.org> Cc: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> Cc: Takashi Iwai <tiwai@suse.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
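To illustrate the arithmetic the conversion removes, here is a simplified userspace model. It assumes a fixed tick rate of 250 Hz and models msecs_to_jiffies() as a round-up division; the real kernel helpers support every HZ configuration and handle overflow, so the sketch_* names below are hypothetical stand-ins, not the kernel implementations.

```c
/* Assumption: a fixed tick rate; real kernels pick HZ at build time. */
#define SKETCH_HZ 250UL
#define SKETCH_MSEC_PER_SEC 1000UL

/* Hypothetical model of msecs_to_jiffies(): scale up, then round up. */
static unsigned long sketch_msecs_to_jiffies(unsigned long ms)
{
	return (ms * SKETCH_HZ + SKETCH_MSEC_PER_SEC - 1) / SKETCH_MSEC_PER_SEC;
}

/* Hypothetical model of secs_to_jiffies(): a single multiplication. */
static unsigned long sketch_secs_to_jiffies(unsigned long s)
{
	return s * SKETCH_HZ;
}
```

For a timeout written as a whole multiple of 1000 ms, both paths yield the same tick count (5 * 1000 ms and 5 s both give 1250 ticks here); the seconds form simply skips the multiply-by-1000-then-divide step.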
2025-02-21 | accel/habanalabs: constify 'struct bin_attribute' | Thomas Weißschuh
The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20241216-sysfs-const-bin_attr-habanalabs-v1-1-b35463197efb@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-26 | Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds
Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein performs some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issue in nilfs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ...
2025-01-12 | drivers: remove get_task_comm() and print task comm directly | Yafang Shao
Since task->comm is guaranteed to be NUL-terminated, we can print it directly without the need to copy it into a separate buffer. This simplifies the code and avoids unnecessary operations. Link: https://lkml.kernel.org/r/20241219023452.69907-6-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> (For tty) Reviewed-by: Lyude Paul <lyude@redhat.com> (For nouveau) Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: "André Almeida" <andrealmeid@igalia.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Kalle Valo <kvalo@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12 | accel/habanalabs: convert timeouts to secs_to_jiffies() | Easwar Hariharan
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules:
    @@ constant C; @@
    - msecs_to_jiffies(C * 1000)
    + secs_to_jiffies(C)
    @@ constant C; @@
    - msecs_to_jiffies(C * MSEC_PER_SEC)
    + secs_to_jiffies(C)
Link: https://lkml.kernel.org/r/20241210-converge-secs-to-jiffies-v3-7-ddfefd7e9f2a@linux.microsoft.com Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Daniel Mack <daniel@zonque.org> Cc: David Airlie <airlied@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Florian Fainelli <florian.fainelli@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jack Wang <jinpu.wang@cloud.ionos.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: James Smart <james.smart@broadcom.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jeff Johnson <jjohnson@kernel.org> Cc: Jeff Johnson <quic_jjohnson@quicinc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeroen de Borst <jeroendb@google.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kalle Valo <kvalo@kernel.org> Cc: Louis Peens <louis.peens@corigine.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolas Palix <nicolas.palix@imag.fr> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Ofir Bitton <obitton@habana.ai> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Praveen Kaligineedi <pkaligineedi@google.com> Cc: Ray Jui <rjui@broadcom.com> Cc: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Scott Branden <sbranden@broadcom.com> Cc: Shailend Chand <shailend@google.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Simon Horman <horms@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Takashi Iwai <tiwai@suse.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-13 | Merge tag 'drm-misc-next-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next | Dave Airlie
[airlied: handle module ns conflict] drm-misc-next for 6.14: UAPI Changes: Cross-subsystem Changes: Core Changes: - Remove driver date from drm_driver Driver Changes: - amdxdna: New driver! - ivpu: Fix qemu crash when using passthrough - nouveau: expose GSP-RM logging buffers via debugfs - panfrost: Add MT8188 Mali-G57 MC3 support - panthor: misc improvements, - rockchip: Gamma LUT support - tidss: Misc improvements - virtio: convert to helpers, add prime support for scanout buffers - v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL - vc4: Add support for BCM2712 - vkms: Improvements all across the board - panels: - Introduce backlight quirks infrastructure - New panels: KDB KD116N2130B12 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241205-agile-straight-pegasus-aca7f4@houat
2024-12-05 | drm: remove driver date from struct drm_driver and all drivers | Jani Nikula
We stopped using the driver initialized date in commit 7fb8af6798e8 ("drm: deprecate driver date") and (eventually) started returning "0" for drm_version ioctl instead. Finish the job, and remove the unused date member from struct drm_driver, its initialization from drivers, along with the common DRIVER_DATE macros. v2: Also update drivers/accel (kernel test robot) Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Simon Ser <contact@emersion.fr> Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # msm Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/1f2bf2543aed270a06f6c707fd6ed1b78bf16712.1733322525.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-12-02 | module: Convert symbol namespace to string literal | Peter Zijlstra
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
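Why a macro-expanded namespace argument is undesirable can be seen with the two-level stringify idiom that the kernel's __stringify() (from <linux/stringify.h>) uses. The userspace sketch below re-creates that idiom; SKETCH_STRINGIFY, the FOO namespace, and the colliding macro are hypothetical. If the bare namespace token also happens to be a #define, the old style stringifies the macro's expansion instead of the name that was written, while a string literal passes through unchanged.

```c
#include <string.h>

/* Two-level stringification, in the style of the kernel's __stringify(). */
#define SKETCH_STRINGIFY_1(x) #x
#define SKETCH_STRINGIFY(x) SKETCH_STRINGIFY_1(x)

/* Hypothetical: a namespace name that collides with an existing macro. */
#define FOO some_other_token

/* Old style: a bare token is stringified, after macro expansion. */
static const char *old_style_ns = SKETCH_STRINGIFY(FOO);

/* New style: the caller writes the namespace as a string literal. */
static const char *new_style_ns = "FOO";

/* Helper: does the recorded namespace match the intended one? */
static int ns_matches(const char *ns, const char *expected)
{
	return strcmp(ns, expected) == 0;
}
```

Here old_style_ns ends up as "some_other_token", whereas new_style_ns stays "FOO", which is the behavior the string-literal conversion guarantees.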
2024-06-23 | accel/habanalabs: gradual sleep in polling memory macro | Didi Freiman
It’s better to avoid long sleeps right from the beginning of the polling since the data may be available much sooner than the sleep period. Because polling host memory is inexpensive, this change gradually increases the sleep time up to the user-requested period. Signed-off-by: Didi Freiman <dfreiman@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
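A minimal sketch of the gradual-sleep idea, assuming a doubling back-off that starts at a hypothetical 1 us and is capped at the caller's requested period. To keep the progression easy to inspect, it records the sleep schedule instead of actually sleeping; the driver's real polling macro, its starting value, and its growth factor may differ.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fill 'out' with the sleep durations (in us) a poller
 * would use for up to 'max_iters' iterations. The first sleep is short and
 * each later sleep doubles until it reaches 'sleep_us'; from then on the
 * requested period is used unchanged. Returns the number of entries.
 */
static size_t gradual_sleep_schedule(unsigned long sleep_us,
				     unsigned long *out, size_t max_iters)
{
	unsigned long cur = 1; /* assumed starting sleep, in microseconds */
	size_t i;

	for (i = 0; i < max_iters; i++) {
		if (cur > sleep_us)
			cur = sleep_us; /* never exceed the requested period */
		out[i] = cur;
		if (cur < sleep_us)
			cur *= 2;       /* grow while below the cap */
	}
	return i;
}
```

With a requested period of 10 us, the first six sleeps are 1, 2, 4, 8, 10 and 10 us, so data that arrives early is picked up quickly while long waits still settle at the requested interval.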
2024-06-23 | accel/habanalabs: move heartbeat work initialization to early init | Tomer Tayar
The device heartbeat work is currently initialized at device_heartbeat_schedule() which is called at the end of hl_device_init(). However hl_device_init() can fail at a previous step, and in such a case, a subsequent call to hl_device_fini() will lead to calling cleanup_resources() and accessing this work uninitialized. As there is no real need to re-initialize this work every time it is rescheduled, move this initialization to device_early_init() to be done once and early enough. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: print timestamp of last PQ heartbeat on EQ heartbeat failure | Tomer Tayar
The test packet which is sent to FW for the PQ heartbeat is also used as the trigger in FW to send the EQ heartbeat event. Add the time of the last sent packet to the debug info which is printed upon an EQ heartbeat failure. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: dump the EQ entries headers on EQ heartbeat failure | Tomer Tayar
Add a dump of the EQ entries headers upon an EQ heartbeat failure. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: revise print on EQ heartbeat failure | Tomer Tayar
Don't print the "previous EQ index" value in case of an EQ heartbeat failure, because it is incremented along with the EQ CI and is therefore redundant. In addition, as the CPU-CP PI is zeroed when it reaches a value that is twice the queue size, add a value of the CI with a similar wrap-around, to make it easier to compare the values. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
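The wrap-around comparison can be sketched as follows, assuming the consumer position is available as a free-running counter (a hypothetical representation; the driver's actual fields may differ). Reducing that counter modulo twice the queue size puts the CI on the same scale as a PI that is zeroed on reaching 2 * queue size, so the two values can be compared directly in a debug print.

```c
#include <stdint.h>

/*
 * Hypothetical: express a free-running consumer counter with the same
 * wrap-around as a producer index that resets at 2 * queue_size.
 */
static uint32_t ci_with_pi_wrap(uint64_t ci_counter, uint32_t queue_size)
{
	return (uint32_t)(ci_counter % (2ULL * queue_size));
}

/* Hypothetical: the ring slot either index refers to. */
static uint32_t ring_slot(uint32_t idx, uint32_t queue_size)
{
	return idx % queue_size;
}
```

For a queue of 4 entries, a consumer that has processed 13 entries reports CI 5, which lines up with a PI of 5 on the 0..7 scale, while both still map to ring slot 1.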
2024-06-23 | accel/habanalabs: add more info upon cpu pkt timeout | Farah Kassabri
In order to have better debuggability upon encountering FW issues, we are adding additional info once the CPU packet timeout expires. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: additional print in device-in-use info | Ilia Levi
When device release triggers a hard reset, there is a printout of the cause. Currently listed causes (that increment context refcount) are active command submissions and exported DMA buffer objects. In any other case, the printout emits "unknown reason". We identify and print another reason - allocated command buffers. Signed-off-by: Ilia Levi <illevi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalbs/gaudi2: reduce interrupt count to 128 | Ofir Bitton
Some systems allow a maximum number of 128 MSI-X interrupts. Hence we reduce the interrupt count to 128 instead of 512. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: disable EQ interrupt after disabling pci | Tal Cohen
When sending the disable PCI message to the firmware, there is a possibility that an EQ packet is already pending; disabling the EQ interrupt will prevent this from happening. The interrupt will be re-enabled after reset. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: change the heartbeat scheduling point | Farah Kassabri
Currently we schedule the heartbeat thread at late init, and only afterwards do we set the INTS_REGISTER packet, which enables events to be received from firmware. Init may take some time, and we want to give firmware 2 full cycles of the heartbeat thread after it receives INTS_REGISTER. This patch moves the heartbeat thread scheduling to after the driver is done with all initializations. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs/gaudi2: unsecure edma max outstanding register | Rakesh Ughreja
Network EDMAs use more outstanding transfers, so this register needs to be programmed by the EDMA firmware. Signed-off-by: Rakesh Ughreja <rughreja@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: remove timestamp registration debug prints | Ofir Bitton
There are several timestamp registration debug prints which spam the kernel log whenever dynamic debug is enabled. Remove those prints. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: add cpld ts cpld_timestamp cpucp | Vitaly Margolin
Add cpld_timestamp field to cpucp_info structure and return cpld timestamp as part of cpld version Signed-off-by: Vitaly Margolin <vmargolin@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: add a common handler for clock change events | Tomer Tayar
As the new dynamic EQ includes clock change events which are common and not ASIC-specific, add a common handler for these events. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs/gaudi2: add GAUDI2D revision support | Farah Kassabri
Gaudi2 with PCI revision ID with the value of '4' represents Gaudi2D device and should be detected and initialized as Gaudi2. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: move hl_eq_heartbeat_event_handle() to common code | Tomer Tayar
hl_eq_heartbeat_event_handle() doesn't have ASIC specific code, and therefore can be moved from Gaudi2-only code to common code, and possibly used for other ASICs. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: add an EQ size ASIC property | Tomer Tayar
Future supported ASICs might use the dynamic EQ mechanism with the firmware, and in that case the EQ size won't be equal to the default HL_EQ_SIZE_IN_BYTES value. Add an ASIC property to enable overriding this value. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs/gaudi2: assume hard-reset by FW upon MC SEI severe error | Tomer Tayar
FW initiates a hard reset upon an MC SEI severe error. Align the driver to expect this reset and avoid accessing the device until the reset is done. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs/gaudi2: revise return value handling in gaudi2_hbm_sei_handle_read_err() | Tomer Tayar
The return value in gaudi2_hbm_sei_handle_read_err() is boolean and not a bitmask, so there is no need for "|= true". In addition, rename the 'rc' variable, as no "return code" is returned here but rather an indication of whether a hard reset is required. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs/gaudi2: align interrupt names to table | Ariel Suller
When reporting TPC events, the dcore and the TPC within the dcore should be reported and propagated, not the general TPC number. Signed-off-by: Ariel Suller <asuller@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: check for errors after preboot is ready | Farah Kassabri
Driver should check and report any fatal errors detected by preboot, before it attempts to load the boot fit. Some errors may cause the driver to stop the boot process and mark the device as unusable. This check will allow the driver to fail and print the error reported by preboot and skip the time wasting attempt of trying to load the boot fit, which will fail due to the error. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: use msg_header instead of desc_header | Igal Zeltser
Struct comms_desc_header is deprecated and replaced by struct comms_msg_header. As a preparation for removing comms_desc_header from FW, all its usages in the code are replaced by comms_msg_header. Signed-off-by: Igal Zeltser <izeltser@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: add heartbeat debug info | Farah Kassabri
It is hard to debug the reason for heartbeat check failures. As an attempt to ease this task, this patch will provide more information when this failure happens. Heartbeat checks the communication with FW, so printing the CPU queue pi/ci and the counter of how many times that event was received would help in debugging the issue. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: add device name to invalidation failure msg | Ohad Sharabi
This addition helps log parsers better define the error without the need to go back and search the device name on former log lines. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: expose server type in debugfs | Tal Risin
Exposing server type through debugfs to enable easier access via scripts. Signed-off-by: Tal Risin <trisin@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: use parent device for trace events | Tomer Tayar
Trace events might still be recorded after the accel device is released, while the device name is no longer available. Modify the trace functions to use the parent device instead, which is still available at that point and is as informative as the device name. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: no CPUCP prints on heartbeat failure | Ohad Sharabi
If we detect a heartbeat failure while some daemon in the background sends CPUCP messages (via the driver interface), dmesg will be flooded. Instead, a slight refactor in hl_fw_send_cpu_message() returns -EAGAIN when the CPU is disabled (i.e. heartbeat failure), and only then. In turn, all calling functions that may be invoked by user space can issue prints only if the error code is not -EAGAIN. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
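The calling convention described in this commit can be sketched as follows. All names here are hypothetical stand-ins and do not reproduce the driver's hl_fw_send_cpu_message() signature: the sender returns -EAGAIN only when the CPU is disabled, and a user-reachable caller logs a failure only for other error codes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct sketch_dev {
	bool cpu_disabled;   /* set after a heartbeat failure */
	bool fw_error;       /* simulates some other firmware failure */
	unsigned int prints; /* number of error messages emitted */
};

/* Hypothetical sender: -EAGAIN is reserved for "CPU disabled". */
static int sketch_send_cpu_message(struct sketch_dev *dev)
{
	if (dev->cpu_disabled)
		return -EAGAIN;
	if (dev->fw_error)
		return -EIO;
	return 0;
}

/* Hypothetical user-reachable caller: stay quiet on -EAGAIN. */
static int sketch_user_query(struct sketch_dev *dev)
{
	int rc = sketch_send_cpu_message(dev);

	if (rc && rc != -EAGAIN) {
		fprintf(stderr, "CPU message failed: %d\n", rc);
		dev->prints++;
	}
	return rc;
}
```

The error code is still returned to the caller in every case; only the log line is suppressed, so a daemon polling a device whose CPU is disabled gets -EAGAIN without flooding dmesg.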
2024-06-23 | accel/habanalabs/gaudi2: align embedded specs headers | Ofir Bitton
Align embedded headers to latest release. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: restructure function that checks heartbeat received | Ohad Sharabi
The function returned an error code which isn't propagated up the stack (nor is it printed). The return value is only checked for =0 or !=0 which implies bool return value. The function signature is updated accordingly, renamed, and slightly refactored. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs/gaudi2: update interrupts related headers | Farah Kassabri
Align the interrupts related headers to latest release. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs: add device name to error print | Dani Liberman
The extra info will help in better traceability and debug. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23 | accel/habanalabs/gaudi2: use single function to compare FW versions | Ohad Sharabi
Currently, the code contains 2 types of FW version comparison functions: - hl_is_fw_sw_ver_[below/equal_or_greater]() - gaudi2 specific function of the type gaudi2_is_fw_ver_[below/above]x_y_z(). Moreover, some functions use the inner FW version, which should be used only at the development stage and not for version dependencies. Finally, some tests are done against deprecated FW versions, with which LKD should hold no compatibility. This commit aligns all APIs to a single function that just compares the versions and returns an integer indicator (similar in some way to strcmp()). In addition, this generic function now also considers the sub-minor FW version, and dead code that existed only for compatibility with deprecated FW versions is removed. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
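A strcmp()-style comparison over major, minor and sub-minor fields might look like the sketch below. The struct layout and function names are hypothetical, and the driver's real helper may take its arguments differently; the point is that one function returning negative, zero or positive can replace a family of "below" and "above" predicates.

```c
/* Hypothetical FW version triple. */
struct sketch_fw_ver {
	unsigned int major;
	unsigned int minor;
	unsigned int sub_minor;
};

/* Compare one field: negative, zero or positive, like strcmp(). */
static int cmp_field(unsigned int a, unsigned int b)
{
	return (a > b) - (a < b);
}

/*
 * Returns <0 if 'a' is older than 'b', 0 if equal, >0 if newer.
 * Fields are compared from most to least significant.
 */
static int sketch_fw_ver_cmp(const struct sketch_fw_ver *a,
			     const struct sketch_fw_ver *b)
{
	int rc = cmp_field(a->major, b->major);

	if (rc)
		return rc;
	rc = cmp_field(a->minor, b->minor);
	if (rc)
		return rc;
	return cmp_field(a->sub_minor, b->sub_minor);
}
```

A caller that needs "at least 1.15.2" would test sketch_fw_ver_cmp(&running, &required) >= 0, and "below" becomes < 0, so every former predicate maps onto one comparison.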
2024-02-26 | accel/habanalabs: modify pci health check | Ofir Bitton
Today we read the PCI VENDOR-ID in order to make sure the PCI link is healthy. Apparently the VENDOR-ID might be stored on the host, and hence reading it might not actually access the PCI bus. In order to make sure the PCI health check is reliable, we will start checking the DEVICE-ID instead. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 | accel/habanalabs: keep explicit size of reserved memory for FW | Tomer Tayar
The reserved memory for FW is currently saved in an ASIC property in units of MB, just like the value that comes from FW. Besides not being clear from the property's name, this also means that a conversion to the actual size is required everywhere the property is used. Modify the property to hold the size in bytes. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 | accel/habanalabs: handle reserved memory request when working with full FW | Tomer Tayar
Currently the reserved memory request from FW is handled when running with preboot only, but this request is relevant also when running with full FW. Modify to always handle this reservation request. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 | accel/habanalabs/hwmon: rate limit errors user can generate | Ofir Bitton
Fetching sensor data can fail due to various reasons. In order not to pollute the kernel log, those error prints must be rate limited. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
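The kernel already offers rate-limited print helpers such as dev_err_ratelimited() for this purpose. Purely to illustrate the underlying idea, the sketch below implements a simple interval/burst limiter in userspace: at most 'burst' messages are allowed per window of 'interval' time units. The struct, the explicit time argument, and the field names are assumptions, not the kernel's ratelimit implementation, and the sketch does not claim which helper this commit used.

```c
#include <stdbool.h>

struct sketch_ratelimit {
	unsigned long interval; /* window length, in caller time units */
	unsigned int burst;     /* messages allowed per window */
	unsigned long begin;    /* start of the current window */
	unsigned int printed;   /* messages allowed in the current window */
	unsigned int missed;    /* messages suppressed so far */
};

/* Returns true if a message at time 'now' may be printed. */
static bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, unsigned long now)
{
	if (now - rs->begin >= rs->interval) {
		/* The window has expired: start a new one. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With an interval of 5 and a burst of 2, messages at times 0 and 1 are printed, a third at time 2 is suppressed and counted, and printing resumes at time 5 when a new window starts. Counting suppressed messages lets a caller report how many were dropped.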
2024-02-26 | accel/habanalabs/gaudi2: drain event lacks rd/wr indication | Ofir Bitton
Due to a H/W issue, AXI drain event does not include a read/write indication, hence we remove this print. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 | accel/habanalabs: fix error print | Dani Liberman
The unmasking is for an event, and it can be an event other than RAZWI. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 | accel/habanalabs: initialize maybe-uninitialized variables | Tal Risin
Prevent static analysis warning. Signed-off-by: Tal Risin <trisin@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>