Age | Commit message (Collapse) | Author |
|
There are a few issues in this code. If *ppos is non-zero then the
first part of the buffer is not initialized. We never initialize the
last character of the buffer. The return is not checked so it's
possible that none of the buffer is initialized.
This is debugfs code which is root only and the impact of these bugs is
very small. However, it's still worth fixing. To fix this:
1) Check that *ppos is zero.
2) Use copy_from_user() instead of simple_write_to_buffer().
3) Explicitly add a NUL terminator.
Fixes: e2b67859ab6e ("crypto: qat - add heartbeat error simulator")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
During the PCI AER system's error recovery process, the kernel driver
may encounter a race condition with freeing the reset_data structure's
memory. If the device restart will take more than 10 seconds the function
scheduling that restart will exit due to a timeout, and the reset_data
structure will be freed. However, this data structure is used for
completion notification after the restart is completed, which leads
to a UAF bug.
This results in a KFENCE bug notice.
BUG: KFENCE: use-after-free read in adf_device_reset_worker+0x38/0xa0 [intel_qat]
Use-after-free read at 0x00000000bc56fddf (in kfence-#142):
adf_device_reset_worker+0x38/0xa0 [intel_qat]
process_one_work+0x173/0x340
To resolve this race condition, the memory associated to the container
of the work_struct is freed on the worker if the timeout expired,
otherwise on the function that schedules the worker.
The timeout detection can be done by checking if the caller is
still waiting for completion or not by using completion_done() function.
Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The implementation of the Rate Limiting (RL) feature includes the cleanup
of all SLAs during device shutdown. For each SLA, the firmware is notified
of the removal through an admin message, the data structures that take
into account the budgets are updated and the memory is freed.
However, this explicit cleanup is not necessary as (1) the device is
reset, and the firmware state is lost and (2) all RL data structures
are freed anyway.
In addition, if the device is unresponsive, for example after a PCI
AER error is detected, the admin interface might not be available.
This might slow down the shutdown sequence and cause a timeout in
the recovery flows which in turn makes the driver believe that the
device is not recoverable.
Fix by replacing the explicit SLAs removal with just a free of the
SLA data structures.
Fixes: d9fb8408376e ("crypto: qat - add rate limiting feature to qat_4xxx")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Rework the AER reset and recovery flow to take into account root port
integrated devices that gets reset between the error detected and the
slot reset callbacks.
In adf_error_detected() the devices is gracefully shut down. The worker
threads are disabled, the error conditions are notified to listeners and
through PFVF comms and finally the device is reset as part of
adf_dev_down().
In adf_slot_reset(), the device is brought up again. If SRIOV VFs were
enabled before reset, these are re-enabled and VFs are notified of
restarting through PFVF comms.
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When the driver detects an heartbeat failure, it starts the recovery
flow. Set a limit so that the number of events is limited in case the
heartbeat status is read too frequently.
Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Expose the `auto_reset` sysfs attribute to configure the driver to reset
the device when a fatal error is detected.
When auto reset is enabled, the driver resets the device when it detects
either an heartbeat failure or a fatal error through an interrupt.
This patch is based on earlier work done by Shashank Gupta.
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Notify a fatal error condition and optionally reset the device in
the following cases:
* if the device reports an uncorrectable fatal error through an
interrupt
* if the heartbeat feature detects that the device is not
responding
This patch is based on earlier work done by Shashank Gupta.
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When a Physical Function (PF) is reset, SR-IOV gets disabled, making the
associated Virtual Functions (VFs) unavailable. Even after reset and
using pci_restore_state, VFs remain uncreated because the numvfs still
at 0. Therefore, it's necessary to reconfigure SR-IOV to re-enable VFs.
This commit introduces the ADF_SRIOV_ENABLED configuration flag to cache
the SR-IOV enablement state. SR-IOV is only re-enabled if it was
previously configured.
This commit also introduces a dedicated workqueue without
`WQ_MEM_RECLAIM` flag for enabling SR-IOV during Heartbeat and CPM error
resets, preventing workqueue flushing warning.
This patch is based on earlier work done by Shashank Gupta.
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Update the PFVF logic to handle restart and recovery. This adds the
following functions:
* adf_pf2vf_notify_fatal_error(): allows the PF to notify VFs that the
device detected a fatal error and requires a reset. This sends to
VF the event `ADF_PF2VF_MSGTYPE_FATAL_ERROR`.
* adf_pf2vf_wait_for_restarting_complete(): allows the PF to wait for
`ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE` events from active VFs
before proceeding with a reset.
* adf_pf2vf_notify_restarted(): enables the PF to notify VFs with
an `ADF_PF2VF_MSGTYPE_RESTARTED` event after recovery, indicating that
the device is back to normal. This prompts VF drivers switch back to
use the accelerator for workload processing.
These changes improve the communication and synchronization between PF
and VF drivers during system restart and recovery processes.
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Disable arbitration to avoid new requests to be processed before
resetting a device.
This is needed so that new requests are not fetched when an error is
detected.
Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add error notify method to report a fatal error event to all the
subsystems registered. In addition expose an API,
adf_notify_fatal_error(), that allows to trigger a fatal error
notification asynchronously in the context of a workqueue.
This will be invoked when a fatal error is detected by the ISR or
through Heartbeat.
Signed-off-by: Furong Zhou <furong.zhou@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add a mechanism that allows to inject a heartbeat error for testing
purposes.
A new attribute `inject_error` is added to debugfs for each QAT device.
Upon a write on this attribute, the driver will inject an error on the
device which can then be detected by the heartbeat feature.
Errors are breaking the device functionality thus they require a
device reset in order to be recovered.
This functionality is not compiled by default, to enable it
CRYPTO_DEV_QAT_ERROR_INJECTION must be set.
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Markas Rapoportas <markas.rapoportas@intel.com>
Signed-off-by: Mun Chun Yep <mun.chun.yep@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.
So, use the purpose specific kcalloc_node() function instead of the
argument count * size in the kzalloc_node() function.
Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The use of array_size() leads gcc to assume the memcpy() can have a larger
limit than actually possible, which triggers a string fortification warning:
In file included from include/linux/string.h:296,
from include/linux/bitmap.h:12,
from include/linux/cpumask.h:12,
from include/linux/sched.h:16,
from include/linux/delay.h:23,
from include/linux/iopoll.h:12,
from drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:3:
In function 'fortify_memcpy_chk',
inlined from 'adf_gen4_init_thd2arb_map' at drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.c:401:3:
include/linux/fortify-string.h:579:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
579 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/fortify-string.h:588:4: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Add an explicit range check to avoid this.
Fixes: 5da6a2d5353e ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The commit "crypto: qat - generate dynamically arbiter mappings"
introduced a regression on qat_402xx devices.
This is reported when the driver probes the device, as indicated by
the following error messages:
4xxx 0000:0b:00.0: enabling device (0140 -> 0142)
4xxx 0000:0b:00.0: Generate of the thread to arbiter map failed
4xxx 0000:0b:00.0: Direct firmware load for qat_402xx_mmp.bin failed with error -2
The root cause of this issue was the omission of a necessary function
pointer required by the mapping algorithm during the implementation.
Fix it by adding the missing function pointer.
Fixes: 5da6a2d5353e ("crypto: qat - generate dynamically arbiter mappings")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The thread-to-arbiter mapping describes which arbiter can assign jobs
to an acceleration engine thread.
The existing mappings are functionally correct, but hardcoded and not
optimized.
Replace the static mappings with an algorithm that generates optimal
mappings, based on the loaded configuration.
The logic has been made common so that it can be shared between all
QAT GEN4 devices.
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Expose through debugfs ring pair telemetry data for QAT GEN4 devices.
This allows to gather metrics about the PCIe channel and device TLB for
a selected ring pair. It is possible to monitor maximum 4 ring pairs at
the time per device.
For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.
This patch is based on earlier work done by Wojciech Ziemba.
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Expose through debugfs device telemetry data for QAT GEN4 devices.
This allows to gather metrics about the performance and the utilization
of a device. In particular, statistics on (1) the utilization of the
PCIe channel, (2) address translation, when SVA is enabled and (3) the
internal engines for crypto and data compression.
If telemetry is supported by the firmware, the driver allocates a DMA
region and a circular buffer. When telemetry is enabled, through the
`control` attribute in debugfs, the driver sends to the firmware, via
the admin interface, the `TL_START` command. This triggers the device to
periodically gather telemetry data from hardware registers and write it
into the DMA memory region. The device writes into the shared region
every second.
The driver, every 500ms, snapshots the DMA shared region into the
circular buffer. This is then used to compute basic metric
(min/max/average) on each counter, every time the `device_data` attribute
is queried.
Telemetry counters are exposed through debugfs in the folder
/sys/kernel/debug/qat_<device>_<BDF>/telemetry.
For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.
This patch is based on earlier work done by Wojciech Ziemba.
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Extend the admin interface with two new public APIs to enable
and disable the telemetry feature: adf_send_admin_tl_start() and
adf_send_admin_tl_stop().
The first, sends to the firmware, through the ICP_QAT_FW_TL_START
message, the IO address where the firmware will write telemetry
metrics and a list of ring pairs (maximum 4) to be monitored.
It returns the number of accelerators of each type supported by
this hardware. After this message is sent, the firmware starts
periodically reporting telemetry data using by writing into the
dma buffer specified as input.
The second, sends the admin message ICP_QAT_FW_TL_STOP
which stops the reporting of telemetry data.
This patch is based on earlier work done by Wojciech Ziemba.
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
GET_DEV() macro expansion relies on struct pci_dev being defined.
Include <linux/pci.h> at adf_accel_devices.h.
Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add support for 420xx devices by including a new device driver that
supports such devices, updates to the firmware loader and capabilities.
Compared to 4xxx devices, 420xx devices have more acceleration engines
(16 service engines and 1 admin) and support the wireless cipher
algorithms ZUC and Snow 3G.
Signed-off-by: Jie Wang <jie.wang@intel.com>
Co-developed-by: Dong Xie <dong.xie@intel.com>
Signed-off-by: Dong Xie <dong.xie@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Relocate the structures adf_fw_objs and adf_fw_config from the file
adf_4xxx_hw_data.c to the newly created adf_fw_config.h.
These structures will be used by new device drivers.
This does not introduce any functional change.
Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Move logic that is common between QAT GEN4 accelerators to the
qat_common folder. This includes addresses of CSRs, setters and
configuration logic.
When moved, functions and defines have been renamed from 4XXX to GEN4.
Code specific to the device is moved to the file adf_gen4_hw_data.c.
Code related to configuration is moved to the newly created
adf_gen4_config.c.
This does not introduce any functional change.
Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add accel_dev as parameter of the function uof_get_num_objs().
This is in preparation for the introduction of the QAT 420xx driver as
it will allow to reconfigure the ae_mask when a configuration that does
not require all AEs is loaded on the device.
This does not introduce any functional change.
Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Move the function get_service_enabled() from adf_4xxx_hw_data.c to
adf_cfg_services.c and rename it as adf_get_service_enabled().
This function is not specific to the 4xxx and will be used by
other QAT drivers.
This does not introduce any functional change.
Signed-off-by: Jie Wang <jie.wang@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
There is a possibility that the function adf_devmgr_pci_to_accel_dev()
might return a NULL pointer.
Add a NULL pointer check in the function rp2srv_show().
Fixes: dbc8876dd873 ("crypto: qat - add rp2svc sysfs attribute")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: David Guckian <david.guckian@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
If the function validate_user_input() returns an error, the error path
attempts to unlock an unacquired mutex.
Acquire the mutex before calling validate_user_input(). This is not
strictly necessary but simplifies the code.
Fixes: d9fb8408376e ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The input argument `sla_in` is a pointer to a structure that contains
the parameters of the SLA which is being added or updated.
If this pointer is NULL, the function should return an error as
the data required for the algorithm is not available.
By mistake, the logic jumps to the error path which dereferences
the pointer.
This results in a warnings reported by the static analyzer Smatch when
executed without a database:
drivers/crypto/intel/qat/qat_common/adf_rl.c:871 add_update_sla()
error: we previously assumed 'sla_in' could be null (see line 812)
This issue was not found in internal testing as the pointer cannot be
NULL. The function add_update_sla() is only called (indirectly) by
the rate limiting sysfs interface implementation in adf_sysfs_rl.c
which ensures that the data structure is allocated and valid. This is
also proven by the fact that Smatch executed with a database does not
report such error.
Fix it by returning with error if the pointer `sla_in` is NULL.
Fixes: d9fb8408376e ("crypto: qat - add rate limiting feature to qat_4xxx")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The qat_rl sysfs attribute group is registered within the adf_dev_start()
function, alongside other driver components.
If any of the functions preceding the group registration fails,
the adf_dev_start() function returns, and the caller, to undo the
operation, invokes adf_dev_stop() followed by adf_dev_shutdown().
However, the current flow lacks information about whether the
registration of the qat_rl attribute group was successful or not.
In cases where this condition is encountered, an error similar to
the following might be reported:
4xxx 0000:6b:00.0: Starting device qat_dev0
4xxx 0000:6b:00.0: qat_dev0 started 9 acceleration engines
4xxx 0000:6b:00.0: Failed to send init message
4xxx 0000:6b:00.0: Failed to start device qat_dev0
sysfs group 'qat_rl' not found for kobject '0000:6b:00.0'
...
sysfs_remove_groups+0x2d/0x50
adf_sysfs_rl_rm+0x44/0x70 [intel_qat]
adf_rl_stop+0x2d/0xb0 [intel_qat]
adf_dev_stop+0x33/0x1d0 [intel_qat]
adf_dev_down+0xf1/0x150 [intel_qat]
...
4xxx 0000:6b:00.0: qat_dev0 stopped 9 acceleration engines
4xxx 0000:6b:00.0: Resetting device qat_dev0
To prevent attempting to remove attributes from a group that has not
been added yet, a flag named 'sysfs_added' is introduced. This flag
is set to true upon the successful registration of the attribute group.
Fixes: d9fb8408376e ("crypto: qat - add rate limiting feature to qat_4xxx")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The qat_ras sysfs attribute group is registered within the
adf_dev_start() function, alongside other driver components.
If any of the functions preceding the group registration fails,
the adf_dev_start() function returns, and the caller, to undo the
operation, invokes adf_dev_stop() followed by adf_dev_shutdown().
However, the current flow lacks information about whether the
registration of the qat_ras attribute group was successful or not.
In cases where this condition is encountered, an error similar to
the following might be reported:
4xxx 0000:6b:00.0: Starting device qat_dev0
4xxx 0000:6b:00.0: qat_dev0 started 9 acceleration engines
4xxx 0000:6b:00.0: Failed to send init message
4xxx 0000:6b:00.0: Failed to start device qat_dev0
sysfs group 'qat_ras' not found for kobject '0000:6b:00.0'
...
sysfs_remove_groups+0x29/0x50
adf_sysfs_stop_ras+0x4b/0x80 [intel_qat]
adf_dev_stop+0x43/0x1d0 [intel_qat]
adf_dev_down+0x4b/0x150 [intel_qat]
...
4xxx 0000:6b:00.0: qat_dev0 stopped 9 acceleration engines
4xxx 0000:6b:00.0: Resetting device qat_dev0
To prevent attempting to remove attributes from a group that has not
been added yet, a flag named 'sysfs_added' is introduced. This flag
is set to true upon the successful registration of the attribute group.
Fixes: 532d7f6bc458 ("crypto: qat - add error counters")
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The "ring" variable has an upper bounds check but nothing checks for
negatives. This code uses kstrtouint() already and it was obviously
intended to be declared as unsigned int. Make it so.
Fixes: dbc8876dd873 ("crypto: qat - add rp2svc sysfs attribute")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
If a request has the flag CRYPTO_TFM_REQ_MAY_BACKLOG set, the function
qat_alg_send_message_maybacklog(), enqueues it in a backlog list if
either (1) there is already at least one request in the backlog list, or
(2) the HW ring is nearly full or (3) the enqueue to the HW ring fails.
If an interrupt occurs right before the lock in qat_alg_backlog_req() is
taken and the backlog queue is being emptied, then there is no request
in the HW queues that can trigger a subsequent interrupt that can clear
the backlog queue. In addition subsequent requests are enqueued to the
backlog list and not sent to the hardware.
Fix it by holding the lock while taking the decision if the request
needs to be included in the backlog queue or not. This synchronizes the
flow with the interrupt handler that drains the backlog queue.
For performance reasons, the logic has been changed to try to enqueue
first without holding the lock.
Fixes: 386823839732 ("crypto: qat - add backlog mechanism")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/af9581e2-58f9-cc19-428f-6f18f1f83d54@redhat.com/T/
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The file adf_cfg_services.h cannot be included in header files since it
instantiates the structure adf_cfg_services. Move that structure to its
own file and export the symbol.
This does not introduce any functional change.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add the attribute `num_rps` to the `qat` attribute group. This returns
the number of ring pairs that a single device has. This allows to know
the maximum value that can be set to the attribute `rp2svc`.
Signed-off-by: Ciunas Bennett <ciunas.bennett@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add the attribute `rp2svc` to the `qat` attribute group. This provides a
way for a user to query a specific ring pair for the type of service
that is currently configured for.
When read, the service will be returned for the defined ring pair.
When written to this value will be stored as the ring pair to return
the service of.
Signed-off-by: Ciunas Bennett <ciunas.bennett@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add an interface for the rate limiting feature which allows to add,
remove and modify a QAT SLA (Service Level Agreement).
This adds a new sysfs attribute group, `qat_rl`, which can be accessed
from /sys/bus/pci/devices/<BUS:DEV:FUNCTION> with the following
hierarchy:
|-+ qat_rl
|---- id (RW) # SLA identifier
|---- cir (RW) # Committed Information Rate
|---- pir (RW) # Peak Information Rate
|---- srv (RW) # Service to be rate limited
|---- rp (RW) (HEX) # Ring pairs to be rate limited
|---- cap_rem (RW) # Remaining capability for a service
|---- sla_op (WO) # Allows to perform an operation on an SLA
The API works by setting the appropriate RW attributes and then
issuing a command through the `sla_op`. For example, to create an SLA, a
user needs to input the necessary data into the attributes cir, pir, srv
and rp and then write into `sla_op` the command `add` to execute the
operation.
The API also provides `cap_rem` attribute to get information about
the remaining device capability within a certain service which is
required when setting an SLA.
Signed-off-by: Ciunas Bennett <ciunas.bennett@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The Rate Limiting (RL) feature allows to control the rate of requests
that can be submitted on a ring pair (RP). This allows sharing a QAT
device among multiple users while ensuring a guaranteed throughput.
The driver provides a mechanism that allows users to set policies, that
are programmed to the device. The device is then enforcing those policies.
Configuration of RL is accomplished through entities called SLAs
(Service Level Agreement). Each SLA object gets a unique identifier
and defines the limitations for a single service across up to four
ring pairs (RPs count allocated to a single VF).
The rate is determined using two fields:
* CIR (Committed Information Rate), i.e., the guaranteed rate.
* PIR (Peak Information Rate), i.e., the maximum rate achievable
when the device has available resources.
The rate values are expressed in permille scale i.e. 0-1000.
Ring pair selection is achieved by providing a 64-bit mask, where
each bit corresponds to one of the ring pairs.
This adds an interface and logic that allow to add, update, retrieve
and remove an SLA.
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The QAT firmware provides a mechanism to retrieve its capabilities
through the init admin interface.
Add logic to retrieve the firmware capability mask from the firmware
through the init/admin channel. This mask reports if the
power management, telemetry and rate limiting features are supported.
The fw capabilities are stored in the accel_dev structure and are used
to detect if a certain feature is supported by the firmware loaded
in the device.
This is supported only by devices which have an admin AE.
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Some enums use the macro BIT. Include bits.h as it is missing.
Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The admin API is growing and deserves its own include.
Move it from adf_common_drv.h to adf_admin.h.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The 4xxx drivers hardcode the ring to service mapping. However, when
additional configurations where added to the driver, the mappings were
not updated. This implies that an incorrect mapping might be reported
through pfvf for certain configurations.
Add an algorithm that computes the correct ring to service mapping based
on the firmware loaded on the device.
Fixes: 0cec19c761e5 ("crypto: qat - add support for compression for 4xxx")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The adf_fw_config structures hardcode a bit mask that represents the
acceleration engines (AEs) where a certain firmware image will have to
be loaded to. Remove the hardcoded masks and replace them with defines.
This does not introduce any functional change.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The logic that selects the correct adf_fw_config structure based on the
configured service is replicated twice in the uof_get_name() and
uof_get_ae_mask() functions. Refactor the code so that there is no
replication.
This does not introduce any functional change.
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add logic to count correctable, non fatal and fatal error for QAT GEN4
devices.
These counters are reported through sysfs attributes in the group
qat_ras.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Introduce ras counters interface for counting QAT specific device
errors and expose them through the newly created qat_ras sysfs
group attribute.
This adds the following attributes:
- errors_correctable: number of correctable errors
- errors_nonfatal: number of uncorrectable non fatal errors
- errors_fatal: number of uncorrectable fatal errors
- reset_error_counters: resets all counters
These counters are initialized during device bring up and cleared
during device shutdown and are applicable only to QAT GEN4 devices.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add logic to detect, report and handle uncorrectable errors reported
through the ERRSOU3 register in QAT GEN4 devices.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add the function adf_get_aram_base() which allows to return the
base address of the aram bar.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add logic to detect, report and handle correctable and uncorrectable
errors related to the compression hardware.
These are detected through the EXPRPSSMXLT, EXPRPSSMCPR and EXPRPSSMDCPR
registers.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add logic to detect, report and handle uncorrectable errors reported
through the ERRSOU2 register in QAT GEN4 devices.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add logic to detect and report uncorrectable errors reported through
the ERRSOU1 register in QAT GEN4 devices.
This also introduces the adf_dev_err_mask structure as part of
adf_hw_device_data which will allow to provide different error masks
per device generation.
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|