Age | Commit message (Collapse) | Author |
|
Now that we have a subsystem for compute accelerators, move the
habanalabs driver to it.
This patch only moves the files and fixes the Makefiles. Future
patches will change the existing code to register to the accel
subsystem and expose the accel device char files instead of the
habanalabs device char files.
Update the MAINTAINERS file to reflect this change.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Move the habanalabs.h uapi file from include/uapi/misc to
include/uapi/drm, and rename it to habanalabs_accel.h.
This is required before moving the actual driver to the accel
subsystem.
Update MAINTAINERS file accordingly.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The dma-buf private object is freed if a call to dma_buf_fd() fails,
and because a file was already associated with the dma-buf in
dma_buf_export(), the release op will be called and will use this
object.
Mark the 'priv' field as NULL in this case, and avoid accessing it from
the release op.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This patch fixes a bug that was found in the dmabuf flow.
Bug description as found on Gaudi2 device:
1. User allocates 4MB of device memory
- Note that although the allocation size was 4MB the HMMU allocated
a full page of 768MB to back the request.
- The user gets a memory handle that points to a single page (768MB)
- Mapping the handle, the user gets virtual address to the start of
the page.
2. User exports the buffer
3. User registers the exported buffer in the importer. This flow has
a callback to the exporter which in turn converts the phys_page_pack
to an SG list for the importer. This SG list is of single entry of
size 768MB. However, the size that was passed to the importer was
only 4MB.
The solution for this is to make sure the importer gets exposure only
to the exported size.
This will be done by fixing the SG created by the exporter to be of
the total size of the actual exported memory requested by the user.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
A previous commit deprecated the option to export from handle, leaving
the code with no support for devices with virtual memory.
This commit modifies the export API in a way that unifies the uAPI to
user address for both cases (i.e. with and without MMU support) and add
the actual support for devices with virtual memory.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Validate export parameters in a dedicated function instead of in the
main export flow.
This will be useful later when support to export dmabuf for devices
with virtual memory will be added.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The API to the user which allows exporting DMA buffer from handle is
deprecated here. It was never used as it is relevant only for Gaudi2,
and the user stack has yet to add support for dmabuf in Gaudi2.
Looking forward, a modified API to export DMA buffer for ASICs that
supports virtual memory will be added.
Until the new API will be ready- exporting DMA buffer will not be
supported for ASICs with virtual memory.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the large set of char/misc and other driver subsystem changes
for 6.2-rc1. Nothing earth-shattering in here at all, just a lot of
new driver development and minor fixes.
Highlights include:
- fastrpc driver updates
- iio new drivers and updates
- habanalabs driver updates for new hardware and features
- slimbus driver updates
- speakup module parameters added to aid in boot time configuration
- i2c probe_new conversions for lots of different drivers
- other small driver fixes and additions
One semi-interesting change in here is the increase of the number of
misc dynamic minors available to 1048448 to handle new huge-cpu
systems.
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (521 commits)
extcon: usbc-tusb320: Convert to i2c's .probe_new()
extcon: rt8973: Convert to i2c's .probe_new()
extcon: fsa9480: Convert to i2c's .probe_new()
extcon: max77843: Replace irqchip mask_invert with unmask_base
chardev: fix error handling in cdev_device_add()
mcb: mcb-parse: fix error handing in chameleon_parse_gdd()
drivers: mcb: fix resource leak in mcb_probe()
coresight: etm4x: fix repeated words in comments
coresight: cti: Fix null pointer error on CTI init before ETM
coresight: trbe: remove cpuhp instance node before remove cpuhp state
counter: stm32-lptimer-cnt: fix the check on arr and cmp registers update
misc: fastrpc: Add dma_mask to fastrpc_channel_ctx
misc: fastrpc: Add mmap request assigning for static PD pool
misc: fastrpc: Safekeep mmaps on interrupted invoke
misc: fastrpc: Add support for audiopd
misc: fastrpc: Rework fastrpc_req_munmap
misc: fastrpc: Use fastrpc_map_put in fastrpc_map_create on fail
misc: fastrpc: Add fastrpc_remote_heap_alloc
misc: fastrpc: Add reserved mem support
misc: fastrpc: Rename audio protection domain to root
...
|
|
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.
FOLL_FORCE in this case seems to be due to copy-and-past from other
drivers. Let's just remove it.
Link: https://lkml.kernel.org/r/20221116102659.70287-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Oded Gabbay <ogabbay@kernel.org>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Current implementation is fixing the page size to PAGE_SIZE whereas the
input page size may be different.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
To avoid memory corruption in kernel memory while using timestamp
registration nodes, zero the kernel buff memory when its allocated.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Use the simplified API that calculates distance between two devices.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
When doing sizeof() and giving as argument a dereference of
a pointer-to-a-pointer object, clang will issue a warning.
Eliminate the warning by passing struct <name>*
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The code used the mmu mutex to protect access to the context's page
tables and invalidation of the MMU cache. Because pgt are per
context, the mmu mutex was a member of the context object.
The problem is that the device has a single MMU invalidation h/w
(per MMU). Therefore, the mmu mutex should not be a property of the
context but a property of the device.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Set the addresses for userspace command buffer dynamically
instead of hard-coded. There is no reason for it to
be hard-coded.
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
the size of a block is always 'block->end - block->start + 1'
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Several munmap() calls can be done or a mapped H/W block that has a
larger size than a page size.
Releasing the object should be done only when all mapped range is
unmapped.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In hl_hw_block_mmap(), the vma's 'vm_private_data' and 'vm_ops' fields
are assigned before filling the content of the private data.
In between there is a call to the ASIC hw_block_mmap() function, and if
it fails, the vma close function will be called with a bad private data
value.
Fix the order of assignments to avoid this issue.
In hl_hw_block_mmap() the vma's 'vm_private_data and vm_ops are assigned
before setting the
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
map_block() sets the block id handle even if get_hw_block_id() fails,
and in this case it uses block id 0 which might be a valid id.
Modify it to set the handle only if get_hw_block_id() succeeds.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Gaudi2 has new MMU units. A PMMU for device->host accesses, and HMMU
for HBM accesses.
The page tables of both MMUs are located in the host's memory (referred
to in the code as host-resident pgt).
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code need to initialize,
finalize and submit workloads to the Gaudi2 ASIC.
It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.
It contains new debugfs entry to dump razwi events. razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Because in future ASICs the driver will allow the user to set the
page size we need to make sure this data is propagated in all APIs.
In addition, since this is already an ASIC property we no longer need
ASIC function for it.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
free_device_memory() ends with if and else, each has a return statement,
followed by another return statement that can never be reached.
Restructure the function and remove this dead code.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
We dropped support for page sizes that are not power of 2.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
There is no need to do memory scrub when unmapping anymore as it is
an overhead as long as we have a single user at any given time.
Remove that code and change return value of free_phys_pg_pack to void
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
If hl_mmu_prefetch_cache_range() fails then this code calls
mutex_unlock(&ctx->mmu_lock) when it's no longer holding the mutex.
Fixes: 9e495e24003e ("habanalabs: do MMU prefetch as deferred work")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This argument is unused by the function.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When user requests to prefetch the MMU translations, the driver will
not block the user until prefetch is done.
Instead, the prefetch work will be delegated to a WQ which will do it
in the background.
This way, the prefetch may progress without blocking the user at all.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, buffers from multiple flows pass through the same infra.
This way, in logs, we are unable to distinguish between buffers that
came from separate flows.
To address this problem, add a "topic" to buffer behavior
descriptor - a string identifier that will be used to identify in logs
the flow this buffer relates to.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The new unified memory manager uses page offset to pass buffer handle
during the mmap operation. One problem with this approach is that it
requires the handle to always be divisible by the page size, else, the
user would not be able to pass it correctly as an argument to the mmap
system call.
Previously, this was achieved by shifting the handle left after alloc
operation, and shifting it right before get operation. This was done in
the user code. This creates code duplication, and, what's worse,
requires some knowledge from the user regarding the handle internal
structure, hurting the encapsulation.
This patch encloses all the page shifts inside memory manager functions.
This way, the user can take the handle as a black box, and simply use
it, without any concert about how it actually works.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch let the user decide whether the translations done in the
page tables will be fetched directly to the STLB right after the map.
We want to let the user control whether to perform prefetch upon map
operation.
To do so a memory flag was added, to be used in the MAP ioctl, called
HL_MEM_PREFETCH and if set- the mappings will be fetched directly to
the STLB after map operation.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of using for_each_sg when iterating sgt that contains dma
entries, use the more proper for_each_sgtable_dma_sg macro.
In addition, both Goya and Gaudi have the exact same implementation
of the asic function that encapsulate the usage of this macro, so
it is better to move that implementation to the common code.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
drivers/misc/habanalabs/common/memory.c:2137:28: warning: symbol 'hl_ts_behavior' was not declared. Should it be static?
Fixes: 4d530e7d121a ("habanalabs: convert ts to use unified memory manager")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The out of memory message is rephrased to more subtle expression as out
of memory may be caused by the user in case of, for example, greedy
allocation.
In addition the user is also being notified by an error code.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is necessary pre-requisite for future ASIC support, where MMU
TLB prefetch is supported.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
With the introduction of the unified memory manager infrastructure, the
timestamp buffers can be converted to use it.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Looking forward we will need to report to the user what is the default
page size used.
This will be done more conveniently by explicitly updating the property
rather than to rely on a "0 meaning default" value.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
allmodconfig builds on 32-bit architectures fail with the following error.
drivers/misc/habanalabs/common/memory.c: In function 'alloc_device_memory':
drivers/misc/habanalabs/common/memory.c:153:49: error:
cast from pointer to integer of different size
Fix the typecast. While at it, drop other unnecessary typecasts associated
with the same commit.
Fixes: e8458e20e0a3c ("habanalabs: make sure device mem alloc is page aligned")
Cc: Ohad Sharabi <osharabi@habana.ai>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220404134859.3278599-1-linux@roeck-us.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Working with MMU that supports multiple page sizes requires that mapping
of a page of a certain size will be aligned to the same size (e.g. the
physical address of 32MB page shall be aligned to 32MB).
To achieve this the gen_poll allocation is now using the "align" variant
to comply with the alignment requirements.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
In future ASICs the MMU will be able to work with multiple page sizes,
thus a new flag is added to allow the user to set the requested page
size.
This flag is added since the whole DRAM is allocated for the user and
the user also should be familiar with the memory usage use case.
As such, the user may choose to "over allocate" memory in favor of
performance (for instance- large page allocations covers more memory
in less TLB entries).
For example: say available page sizes are of 1MB and 32MB. If user
wants to allocate 40MB the user can either set page size to 1MB and
allocate the exact amount of memory (but will result in 40 TLB entries)
or the user can use 32MB pages, "waste" 8MB of physical memory but
occupy only 2 TLB entries.
Note that this feature will be available only to ASIC that supports
multiple DRAM page sizes.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Use of vfree(), vmalloc_user(), vmalloc() and remap_vmalloc_range()
requires this include in some architectures.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
When the code iterates over the free list of physical pages nodes, it
deletes the physical page node which is used as the iterator.
Therefore, we need to use the safe version of the iteration to prevent
use-after-free.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
The name of the property is hints_range_reservation
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Timestamp registration API allows the user to register
a timestamp record event which will make the driver set
timestamp when CQ counter reaches the target value
and write it to a specific location specified
by the user.
This is a non blocking API, unlike the wait_for_interrupt
which is a blocking one.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
This is not something we can do a workaround. It is clearly an error
and we should notify the user that it is an error.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Freeing phys_pg_pack includes calling to scrubbing functions of the
device's memory, taking locks and possibly even calling reset.
This is not something that should be done while holding a device-wide
spinlock.
Therefore, save the relevant objects on a local linked-list and after
releasing the spinlock, traverse that list and free the phys_pg_pack
objects.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Unify variables related to device reset, which will help us to
add some new reset functionality in future patches.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
-ENOTTY is returned in case of error in the ioctl arguments themselves,
such as function that doesn't exists.
In all other cases, where the error is in the arguments of the custom
data structures that we define that are passed in the various ioctls,
we need to return -EINVAL.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Fix missing fields, descriptions not according to kernel-doc style.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
|
Currently there is a deadlock in driver in scenarios where MMU
cache invalidation fails. The issue is basically device reset
being performed without releasing the MMU mutex.
The solution is to skip device reset as it is not necessary.
In addition we introduce a slight code refactor that prints the
invalidation error from a single location.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|