summaryrefslogtreecommitdiff
path: root/drivers/accel/habanalabs/common/firmware_if.c
AgeCommit message (Collapse)Author
2024-06-23accel/habanalabs: print timestamp of last PQ heartbeat on EQ heartbeat failureTomer Tayar
The test packet which is sent to FW for the PQ heartbeat is used also as the trigger in FW to send the EQ heartbeat event. Add the time of the last sent packet to the debug info which is printed upon a EQ heartbeat failure. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23accel/habanalabs: add more info upon cpu pkt timeoutFarah Kassabri
In order to have better debuggability upon encountering FW issues, We are adding additional info once CPU packet timeout expires. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23accel/habanalabs: check for errors after preboot is readyFarah Kassabri
Driver should check and report any fatal errors detected by preboot, before it attempts to load the boot fit. Some errors may cause the driver to stop the boot process and mark the device as unusable. This check will allow the driver to fail and print the error reported by preboot and skip the time wasting attempt of trying to load the boot fit, which will fail due to the error. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23accel/habanalabs: use msg_header instead of desc_headerIgal Zeltser
Struct comms_desc_header is deprecated and replaced by struct comms_msg_header. As a preparation for removing comms_desc_header from FW, all it's usage in code is replaced by comms_msg_header. Signed-off-by: Igal Zeltser <izeltser@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23accel/habanalabs: use parent device for trace eventsTomer Tayar
Trace events might still be recorded after the accel device is released, while the device name is no longer available. Modify the trace functions to use the parent device instead, which is available at that point and still informative as the device name. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23accel/habanalabs: no CPUCP prints on heartbeat failureOhad Sharabi
If we detected heartbet event while some daemon in the background send (via driver interface) CPUCP messages the dmesg will be flooded. Instead, a slight refactor in hl_fw_send_cpu_message() returns -EAGAIN when CPU is disabled (i.e. heartbeat failure) and only then. Later, all calling functions that may be invoked by user space can issue prints only if the error code is not -EAGAIN. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23accel/habanalabs/gaudi2: use single function to compare FW versionsOhad Sharabi
Currently, the code contains 2 types of FW version comparison functions: - hl_is_fw_sw_ver_[below/equal_or_greater]() - gaudi2 specific function of the type gaudi2_is_fw_ver_[below/above]x_y_z() Moreover, some functions use the inner FW version which shuold be only stage during development but not version dependencies. Finally, some tests are done to deprecated FW version to which LKD should hold no compatibility. This commit aligns all APIs to a single function that just compares the version and return an integers indicator (similar in some way to strcmp()). In addition, this generic function now considers also the sub-minor FW version and also remove dead code resulting in deprecated FW versions compatibility. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-02-26accel/habanalabs: keep explicit size of reserved memory for FWTomer Tayar
The reserved memory for FW is currently saved in an ASIC property in units of MB, just like the value that comes from FW. Except the fact that it is not clear from the property's name, it means also that a calculation to actual size is required everywhere that it is used. Modify the property to hold the size in bytes. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26accel/habanalabs: handle reserved memory request when working with full FWTomer Tayar
Currently the reserved memory request from FW is handled when running with preboot only, but this request is relevant also when running with full FW. Modify to always handle this reservation request. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26accel/habanalabs: fix error printDani Liberman
The unmasking is for event and it can be other event than RAZWI. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26accel/habanalabs: modify print for skip loading linux FW to debug logTomer Tayar
Skip loading a linux FW image into the device with the current supported ASICs is done for test purposes only. Moreover, for future supported ASICs it is possible that there won't be a need to load such an image. The print in such a case is therefore not needed in most cases, so replace the used dev_info() with dev_dbg(). Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-12-19accel/habanalabs/gaudi2: add signed dev info uAPIMoti Haimovski
User will provide a nonce via the INFO ioctl, and will retrieve the signed device info generated using given nonce. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-12-19accel/habanalabs: update device boot error checkFarah Kassabri
Use a predefined mask which set the device critical boot errors. Driver will fail and stop its loading, only upon detecting at least one of those errors defined in this mask. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09accel/habanalabs: update boot status printAriel Suller
FW shutdown preparation status was added to spec. Signed-off-by: Ariel Suller <asuller@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09accel/habanalabs: extend preboot timeout when preboot might take longerDafna Hirschfeld
There are cases such when FW runs MBIST, that preboot is expected to take longer than the usual. In such cases the firmware reports status SECURITY_READY/IN_PREBOOT and we extend the timeout waiting for it. This is currently implemented for Gaudi2 only. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09accel/habanalabs: move cpucp interface to linux/habanalabsDavid Meriin
The CPUCP interface is moved to a shared folder outside of accel as a pre-requisite to upstream the NIC drivers that will also include this file. Signed-off-by: David Meriin <dmeriin@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09accel/habanalabs: handle f/w reserved dram space requestDani Liberman
It is possible for FW to request reserved space in dram. If the device supports this option, it will retrieve the size from the f/w and will reserve it. Currently we add the common code infrastructure to support it. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09accel/habanalabs: dump temperature threshold boot errorOfir Bitton
Add dump of an error reported from f/w during boot time. This error indicates a failure with setting temperature threshold. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09accel/habanalabs: fix standalone preboot descriptor requestfarah kassabri
The preboot used to statically allocate memory for the comms descriptor on the device memory when driver requested the descriptor information. Now preboot moved to dynamic memory allocation where it wants to check the size the driver expects vs. what the f/w expects. Note there are no backward compatibility issues as older f/w versions simply ignore this value. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08accel/habanalabs: update state when loading boot fitKoby Elbaz
Any FW component we load must be followed by a corresponding state update. However, it seems that so far we skipped doing so for the bootfit case, so fix that. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08accel/habanalabs: poll for device status update following WFE cmdKoby Elbaz
Currently, we rely on COMMS protocol's ack to verify that WFE command has been acknowledged by the FW. However, this does not guarantee that the device status has been updated. Although unlikely, this could trigger a race since the driver expects the device to be halted at that stage, but it might not be. Therefore, we increase WFE's robustness by polling on the status register that will be updated once the device is actually halted. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05accel/habanalabs: do soft-reset using cpucp packetDafna Hirschfeld
This is done depending on the FW version. The cpucp method is preferable and saves scratchpads resource. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05accel/habanalabs: extract and save the FW's SW major/minor/sub-minorDafna Hirschfeld
It is not always possible to know the FW's SW version from the inner FW version. Therefore we should extract the general SW version in addition to the FW version and use it in functions like 'hl_is_fw_ver_below_1_9' etc. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05accel/habanalabs: rename fw_{major/minor}_version to fw_inner_{major/minor}_verDafna Hirschfeld
We later want to add fields for Firmware SW version. The current extracted FW version is the inner FW versioning so the new name is better and also better differentiate from the FW's SW version. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05accel/habanalabs: add helper to extract the FW major/minorDafna Hirschfeld
the helper is extract_u32_until_given_char and can later be used to also get the major/minor of the sw version. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-04-08accel/habanalabs: don't wait for STS_OK after sending COMMS WFEKoby Elbaz
Sending COMMS_GOTO_WFE instructs the FW's CPU to halt (WFE state). Once sent, FW's CPU isn't expected to continue communicating with LKD. Therefore, the stage of waiting for COMMS_STS_OK should be skipped or else waiting for COMMS_STS_OK will simply timeout, which will trigger unexpected behavior. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-04-08accel/habanalabs: improvements to FW ver extractionDafna Hirschfeld
1. Rename the func to hl_get_preboot_major_minor because we also set the extracted values in hdev fields. 2. Free the allocated string in the calling function which makes more sense Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-04-08accel/habanalabs: change COMMS warning messages to error levelKoby Elbaz
COMMS protocol is used for LKD <--> FW communication, and any communication failure between the two might turn out to be destructive, hence, it should be well emphasized. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-20accel/habanalabs: fix a missing-braces compilation warningTomer Tayar
Replace initialization of "struct cpucp_packet" from "{0} to "{}" to avoid a "missing braces around initializer" compilation warning. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26habanalabs: enhance info printed on FW load errorsMoti Haimovski
This commit enhances the following error messages to also provide the type of error occurred, this in order to ease debugging of errors detected during firmware-load. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26habanalabs: add set engines masks ASIC functionOhad Sharabi
This function shall be used whenever components enable/binning masks should be updated. Usage is in one of the below cases: - update user (or default) component masks - update when getting the masks from FW (either CPUCP or COMMS) Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26habanalabs: move driver to accel subsystemOded Gabbay
Now that we have a subsystem for compute accelerators, move the habanalabs driver to it. This patch only moves the files and fixes the Makefiles. Future patches will change the existing code to register to the accel subsystem and expose the accel device char files instead of the habanalabs device char files. Update the MAINTAINERS file to reflect this change. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>