linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-06-23	accel/habanalabs: dump the EQ entries headers on EQ heartbeat failure	Tomer Tayar
	Add a dump of the EQ entries headers upon a EQ heartbeat failure. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2024-06-23	accel/habanalabs: add an EQ size ASIC property	Tomer Tayar
	Future supported ASICs might use the dynamic EQ mechanism with the firmware, and in that case the EQ size won't be equal to the default HL_EQ_SIZE_IN_BYTES value. Add an ASIC property to enable overriding this value. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
2023-10-09	accel/habanalabs: split user interrupts pending list	farah kassabri
	Currently driver maintain one list for both pending user interrupts which seeks to wait till CQ reaches it's target value and also the ones that seeks to get timestamp records when the CQ reaches it's target value. This causes delay in handling the waiters which gets higher priority than the timestamp records. In order to solve this, let's split the list into two, one for each case and each one is protected by it's own spinlock. Waiters will be handled within the interrupt context first, then the timestamp records will be set. Freeing the timestamp related memory will be handled in a workqueue. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09	accel/habanalabs: optimize timestamp registration handler	farah kassabri
	Currently we use dynamic allocation inside the irq handler in order to allocate free node to be used for the free jobs. This operation is expensive, especially when we deal with large burst of events records that get released at the same time. The alternative is to have pre allocated pool of free nodes and just fetch nodes from this pool at irq handling time instead of allocating them. In case the pool becomes full, then the driver will fallback to dynamic allocations. As part of the optimization also update the unregister flow upon re-using a timestamp record, by making the operation much simpler and quicker. We already have the record in the registration flow and now we just seek to re-use with different interrupt. Therefore, no need to look for buffer according to the user handle. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09	accel/habanalabs: fix bug in timestamp interrupt handling	farah kassabri
	There is a potential race between user thread seeking to re-use a timestamp record with new interrupt id, while this record is still in the middle of interrupt handling and it is about to be freed. Imagine the driver set the record in_use to 0 and only then fill the free_node information. This might lead to unpleasant scenario where the new registration thread detects the record as free to use, and change the cq buff address. That will cause the free_node to get the wrong buffer address to put refcount to. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09	accel/habanalabs/gaudi2: add eq health check using irq	farah kassabri
	This is the second patch for applying the eq health check mechanism which will add support for the interrupt flow for gaudi2 asic. More info about the interrupt mechanism: set a dedicated msix for the eq error interrupt, and add interrupt handler for it. when FW detects some issue with EQ like EQ_FULL, it'll raise that interrupt and driver should reset the device. Driver will inform the FW which msix index to use through the already existing handshake mechanism which will send msix info message to fw. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09	accel/habanalabs: Allow single timestamp registration request at a time	farah kassabri
	Protect against concurrency of user requesting to register a timestamp offset (where the driver fills the timestamp when the command submission has finished executing) to a specific user interrupt ID. The protection is basically to allow only one timestamp registration request to be handled at a time. This is needed because the user can decide to re-use a timestamp offset (register an already registered offset, to a different interrupt ID). This means the request will cause the timestamp node to move from one interrupt list to another interrupt list. In such scenario, without proper protection, we could end up adding the same node twice to the interrupts wait lists. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08	accel/habanalabs: add event queue extra validation	Ofir Bitton
	In order to increase reliability of the event queue interface, we apply to Gaudi2 the same mechanism we have in Gaudi1. The extra validation is basically checking that the received event index matches the expected index. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-04-08	accel/habanalabs: remove completion from abnormal interrupt work name	Tomer Tayar
	Decoder abnormal interrupts are for errors and not for completion, so rename the relevant work and work function to not include 'completion'. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-04-08	accel/habanalabs: print event type when device is disabled	Tal Cohen
	When the device is in disabled state, the driver isn't suppose to receive any events from FW. Printing the event type, as part of the message that was already printed, shall help to get more info if this unexpected message is received. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-20	accel/habanalabs: add handling for unexpected user event	Ofir Bitton
	In order for the user to be aware of unexpected events in Gaudi2 that aren't assigned to a specific engine, we are adding the handling of this dedicated interrupt. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-15	accel/habanalabs: remove hl_irq_handler_default()	Tomer Tayar
	hl_irq_handler_default() is not used and can be removed. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-15	accel/habanalabs: fix print in hl_irq_handler_eq()	Tomer Tayar
	"eq_base[eq->ci].hdr.ctl" is used directly in a print without a le32_to_cpu() conversion. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-15	accel/habanalabs: add support for TPC assert	Ofir Bitton
	In order to allow TPC engines to raise an assert, we must expose the relevant MSIX interrupt to the user so he will configure the engine correctly. In addition, we implement the corresponding interrupt handler that will notify the user upon such an event. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-15	accel/habanalabs: capture interrupt timestamp in handler	Ofir Bitton
	In order for interrupt timestamp to be more accurate we should capture it during the interrupt handling rather than in threaded irq context. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-03-15	accel/habanalabs: change user interrupt to threaded IRQ	Tal Cohen
	We prefer not to handle the user interrupt job inside the interrupt context. Instead, use threaded IRQ to handle the user interrupts. This will allow to avoid disabling interrupts when the user process registers for a new event and to avoid long handling inside an interrupt. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-01-26	habanalabs: optimize command submission completion timestamp	Ofir Bitton
	Completion timestamp is taken during the actual command submission release. As the release happens in a work queue, the timestamp taken is not accurate. Hence, we will take the timestamp in the interrupt handler itself while propagating it to the release function. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26	habanalabs: refactor user interrupt type	Ofir Bitton
	In order to support more user interrupt types in the future, we enumerate the user interrupt type instead of using a boolean. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26	habanalabs: move driver to accel subsystem	Oded Gabbay
	Now that we have a subsystem for compute accelerators, move the habanalabs driver to it. This patch only moves the files and fixes the Makefiles. Future patches will change the existing code to register to the accel subsystem and expose the accel device char files instead of the habanalabs device char files. Update the MAINTAINERS file to reflect this change. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>