summaryrefslogtreecommitdiff
path: root/drivers/iommu/intel-iommu.c
AgeCommit message (Collapse)Author
2014-03-04iommu/vt-d: Introduce macro for_each_dev_scope() to walk device scope entriesJiang Liu
Introduce for_each_dev_scope()/for_each_active_dev_scope() to walk {active} device scope entries. This will help following RCU lock related patches. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Fix error in detect ATS capabilityJiang Liu
Current Intel IOMMU driver only matches a PCIe root port with the first DRHD unit with the samge segment number. It will report false result if there are multiple DRHD units with the same segment number, thus fail to detect ATS capability for some PCIe devices. This patch refines function dmar_find_matched_atsr_unit() to search all DRHD units with the same segment number. An example DMAR table entries as below: [1D0h 0464 2] Subtable Type : 0002 <Root Port ATS Capability> [1D2h 0466 2] Length : 0028 [1D4h 0468 1] Flags : 00 [1D5h 0469 1] Reserved : 00 [1D6h 0470 2] PCI Segment Number : 0000 [1D8h 0472 1] Device Scope Entry Type : 02 [1D9h 0473 1] Entry Length : 08 [1DAh 0474 2] Reserved : 0000 [1DCh 0476 1] Enumeration ID : 00 [1DDh 0477 1] PCI Bus Number : 00 [1DEh 0478 2] PCI Path : [02, 00] [1E0h 0480 1] Device Scope Entry Type : 02 [1E1h 0481 1] Entry Length : 08 [1E2h 0482 2] Reserved : 0000 [1E4h 0484 1] Enumeration ID : 00 [1E5h 0485 1] PCI Bus Number : 00 [1E6h 0486 2] PCI Path : [03, 00] [1E8h 0488 1] Device Scope Entry Type : 02 [1E9h 0489 1] Entry Length : 08 [1EAh 0490 2] Reserved : 0000 [1ECh 0492 1] Enumeration ID : 00 [1EDh 0493 1] PCI Bus Number : 00 [1EEh 0494 2] PCI Path : [03, 02] [1F0h 0496 1] Device Scope Entry Type : 02 [1F1h 0497 1] Entry Length : 08 [1F2h 0498 2] Reserved : 0000 [1F4h 0500 1] Enumeration ID : 00 [1F5h 0501 1] PCI Bus Number : 00 [1F6h 0502 2] PCI Path : [03, 03] [1F8h 0504 2] Subtable Type : 0002 <Root Port ATS Capability> [1FAh 0506 2] Length : 0020 [1FCh 0508 1] Flags : 00 [1FDh 0509 1] Reserved : 00 [1FEh 0510 2] PCI Segment Number : 0000 [200h 0512 1] Device Scope Entry Type : 02 [201h 0513 1] Entry Length : 08 [202h 0514 2] Reserved : 0000 [204h 0516 1] Enumeration ID : 00 [205h 0517 1] PCI Bus Number : 40 [206h 0518 2] PCI Path : [02, 00] [208h 0520 1] Device Scope Entry Type : 02 [209h 0521 1] Entry Length : 08 [20Ah 0522 2] Reserved : 0000 [20Ch 0524 1] Enumeration ID : 00 [20Dh 0525 1] PCI Bus Number : 40 [20Eh 0526 2] PCI Path : [02, 02] [210h 0528 1] Device Scope Entry Type : 02 [211h 0529 1] Entry Length : 08 [212h 0530 2] Reserved : 0000 [214h 0532 1] Enumeration ID : 00 [215h 0533 1] PCI Bus Number : 40 [216h 0534 2] PCI Path : [03, 00] [218h 0536 2] Subtable Type : 0002 <Root Port ATS Capability> [21Ah 0538 2] Length : 0020 [21Ch 0540 1] Flags : 00 [21Dh 0541 1] Reserved : 00 [21Eh 0542 2] PCI Segment Number : 0000 [220h 0544 1] Device Scope Entry Type : 02 [221h 0545 1] Entry Length : 08 [222h 0546 2] Reserved : 0000 [224h 0548 1] Enumeration ID : 00 [225h 0549 1] PCI Bus Number : 80 [226h 0550 2] PCI Path : [02, 00] [228h 0552 1] Device Scope Entry Type : 02 [229h 0553 1] Entry Length : 08 [22Ah 0554 2] Reserved : 0000 [22Ch 0556 1] Enumeration ID : 00 [22Dh 0557 1] PCI Bus Number : 80 [22Eh 0558 2] PCI Path : [02, 02] [230h 0560 1] Device Scope Entry Type : 02 [231h 0561 1] Entry Length : 08 [232h 0562 2] Reserved : 0000 [234h 0564 1] Enumeration ID : 00 [235h 0565 1] PCI Bus Number : 80 [236h 0566 2] PCI Path : [03, 00] [238h 0568 2] Subtable Type : 0002 <Root Port ATS Capability> [23Ah 0570 2] Length : 0020 [23Ch 0572 1] Flags : 00 [23Dh 0573 1] Reserved : 00 [23Eh 0574 2] PCI Segment Number : 0000 [240h 0576 1] Device Scope Entry Type : 02 [241h 0577 1] Entry Length : 08 [242h 0578 2] Reserved : 0000 [244h 0580 1] Enumeration ID : 00 [245h 0581 1] PCI Bus Number : C0 [246h 0582 2] PCI Path : [02, 00] [248h 0584 1] Device Scope Entry Type : 02 [249h 0585 1] Entry Length : 08 [24Ah 0586 2] Reserved : 0000 [24Ch 0588 1] Enumeration ID : 00 [24Dh 0589 1] PCI Bus Number : C0 [24Eh 0590 2] PCI Path : [02, 02] [250h 0592 1] Device Scope Entry Type : 02 [251h 0593 1] Entry Length : 08 [252h 0594 2] Reserved : 0000 [254h 0596 1] Enumeration ID : 00 [255h 0597 1] PCI Bus Number : C0 [256h 0598 2] PCI Path : [03, 00] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Check for NULL pointer when freeing IOMMU data structureJiang Liu
Domain id 0 will be assigned to invalid translation without allocating domain data structure if DMAR unit supports caching mode. So in function free_dmar_iommu(), we should check whether the domain pointer is NULL, otherwise it will cause system crash as below: [ 6.790519] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 [ 6.799520] IP: [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430 [ 6.806493] PGD 0 [ 6.817972] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 6.823303] Modules linked in: [ 6.826862] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1+ #126 [ 6.834252] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.R00.1402050741 02/05/2014 [ 6.845951] task: ffff880455a80000 ti: ffff880455a88000 task.ti: ffff880455a88000 [ 6.854437] RIP: 0010:[<ffffffff810e2dc8>] [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430 [ 6.864154] RSP: 0000:ffff880455a89ce0 EFLAGS: 00010046 [ 6.870179] RAX: 0000000000000046 RBX: 0000000000000002 RCX: 0000000000000000 [ 6.878249] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000c8 [ 6.886318] RBP: ffff880455a89d40 R08: 0000000000000002 R09: 0000000000000001 [ 6.894387] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880455a80000 [ 6.902458] R13: 0000000000000000 R14: 00000000000000c8 R15: 0000000000000000 [ 6.910520] FS: 0000000000000000(0000) GS:ffff88045b800000(0000) knlGS:0000000000000000 [ 6.919687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.926198] CR2: 00000000000000c8 CR3: 0000000001e0e000 CR4: 00000000001407f0 [ 6.934269] Stack: [ 6.936588] ffffffffffffff10 ffffffff810f59db 0000000000000010 0000000000000246 [ 6.945219] ffff880455a89d10 0000000000000000 ffffffff82bcb980 0000000000000046 [ 6.953850] 0000000000000000 0000000000000000 0000000000000002 0000000000000000 [ 6.962482] Call Trace: [ 6.965300] [<ffffffff810f59db>] ? vprintk_emit+0x4fb/0x5a0 [ 6.971716] [<ffffffff810e3185>] lock_acquire+0x185/0x200 [ 6.977941] [<ffffffff821fbbee>] ? init_dmars+0x839/0xa1d [ 6.984167] [<ffffffff81870b06>] _raw_spin_lock_irqsave+0x56/0x90 [ 6.991158] [<ffffffff821fbbee>] ? init_dmars+0x839/0xa1d [ 6.997380] [<ffffffff821fbbee>] init_dmars+0x839/0xa1d [ 7.003410] [<ffffffff8147d575>] ? pci_get_dev_by_id+0x75/0xd0 [ 7.010119] [<ffffffff821fc146>] intel_iommu_init+0x2f0/0x502 [ 7.016735] [<ffffffff821a7947>] ? iommu_setup+0x27d/0x27d [ 7.023056] [<ffffffff821a796f>] pci_iommu_init+0x28/0x52 [ 7.029282] [<ffffffff81002162>] do_one_initcall+0xf2/0x220 [ 7.035702] [<ffffffff810a4a29>] ? parse_args+0x2c9/0x450 [ 7.041919] [<ffffffff8219d1b1>] kernel_init_freeable+0x1c9/0x25b [ 7.048919] [<ffffffff8219c8d2>] ? do_early_param+0x8a/0x8a [ 7.055336] [<ffffffff8184d3f0>] ? rest_init+0x150/0x150 [ 7.061461] [<ffffffff8184d3fe>] kernel_init+0xe/0x100 [ 7.067393] [<ffffffff8187b5fc>] ret_from_fork+0x7c/0xb0 [ 7.073518] [<ffffffff8184d3f0>] ? rest_init+0x150/0x150 [ 7.079642] Code: 01 76 18 89 05 46 04 36 01 41 be 01 00 00 00 e9 2f 02 00 00 0f 1f 80 00 00 00 00 41 be 01 00 00 00 e9 1d 02 00 00 0f 1f 44 00 00 <49> 81 3e c0 31 34 82 b8 01 00 00 00 0f 44 d8 41 83 ff 01 0f 87 [ 7.104944] RIP [<ffffffff810e2dc8>] __lock_acquire+0x11f8/0x1430 [ 7.112008] RSP <ffff880455a89ce0> [ 7.115988] CR2: 00000000000000c8 [ 7.119784] ---[ end trace 13d756f0f462c538 ]--- [ 7.125034] note: swapper/0[1] exited with preempt_count 1 [ 7.131285] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 [ 7.131285] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Fix incorrect iommu_count for si_domainJiang Liu
The iommu_count field in si_domain(static identity domain) is initialized to zero and never increases. It will underflow when tearing down iommu unit in function free_dmar_iommu() and leak memory. So refine code to correctly manage si_domain->iommu_count. Warning message caused by si_domain memory leak: [ 14.609681] IOMMU: Setting RMRR: [ 14.613496] Ignoring identity map for HW passthrough device 0000:00:1a.0 [0xbdcfd000 - 0xbdd1dfff] [ 14.623809] Ignoring identity map for HW passthrough device 0000:00:1d.0 [0xbdcfd000 - 0xbdd1dfff] [ 14.634162] IOMMU: Prepare 0-16MiB unity mapping for LPC [ 14.640329] Ignoring identity map for HW passthrough device 0000:00:1f.0 [0x0 - 0xffffff] [ 14.673360] IOMMU: dmar init failed [ 14.678157] kmem_cache_destroy iommu_devinfo: Slab cache still has objects [ 14.686076] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59 [ 14.694176] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012 [ 14.707412] 0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff880c2cc37c00 [ 14.716407] ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00 [ 14.725468] ffffffff81dc7a6a ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711 [ 14.734464] Call Trace: [ 14.737453] [<ffffffff8156223d>] dump_stack+0x4d/0x66 [ 14.743430] [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100 [ 14.750279] [<ffffffff81dc7a6a>] intel_iommu_init+0x122/0x56a [ 14.757035] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d [ 14.763491] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52 [ 14.769846] [<ffffffff81000342>] do_one_initcall+0x122/0x180 [ 14.776506] [<ffffffff81077738>] ? parse_args+0x1e8/0x320 [ 14.782866] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c [ 14.789994] [<ffffffff81d84833>] ? do_early_param+0x88/0x88 [ 14.796556] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0 [ 14.802626] [<ffffffff8154ffce>] kernel_init+0xe/0x130 [ 14.808698] [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0 [ 14.814963] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0 [ 14.821640] kmem_cache_destroy iommu_domain: Slab cache still has objects [ 14.829456] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59 [ 14.837562] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012 [ 14.850803] 0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88102c1ee3c0 [ 14.861222] ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00 [ 14.870284] ffffffff81dc7a76 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711 [ 14.879271] Call Trace: [ 14.882227] [<ffffffff8156223d>] dump_stack+0x4d/0x66 [ 14.888197] [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100 [ 14.895034] [<ffffffff81dc7a76>] intel_iommu_init+0x12e/0x56a [ 14.901781] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d [ 14.908238] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52 [ 14.914594] [<ffffffff81000342>] do_one_initcall+0x122/0x180 [ 14.921244] [<ffffffff81077738>] ? parse_args+0x1e8/0x320 [ 14.927598] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c [ 14.934738] [<ffffffff81d84833>] ? do_early_param+0x88/0x88 [ 14.941309] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0 [ 14.947380] [<ffffffff8154ffce>] kernel_init+0xe/0x130 [ 14.953430] [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0 [ 14.959689] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0 [ 14.966299] kmem_cache_destroy iommu_iova: Slab cache still has objects [ 14.973923] CPU: 12 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #59 [ 14.982020] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012 [ 14.995263] 0000000000000000 ffff88042dd33db0 ffffffff8156223d ffff88042cb5c980 [ 15.004265] ffff88042dd33dc8 ffffffff811790b1 ffff880c2d3533b8 ffff88042dd33e00 [ 15.013322] ffffffff81dc7a82 ffffffff81b1e8e0 ffffffff81f84058 ffffffff81d8a711 [ 15.022318] Call Trace: [ 15.025238] [<ffffffff8156223d>] dump_stack+0x4d/0x66 [ 15.031202] [<ffffffff811790b1>] kmem_cache_destroy+0xf1/0x100 [ 15.038038] [<ffffffff81dc7a82>] intel_iommu_init+0x13a/0x56a [ 15.044786] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d [ 15.051242] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52 [ 15.057601] [<ffffffff81000342>] do_one_initcall+0x122/0x180 [ 15.064254] [<ffffffff81077738>] ? parse_args+0x1e8/0x320 [ 15.070608] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c [ 15.077747] [<ffffffff81d84833>] ? do_early_param+0x88/0x88 [ 15.084300] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0 [ 15.090362] [<ffffffff8154ffce>] kernel_init+0xe/0x130 [ 15.096431] [<ffffffff815756ac>] ret_from_fork+0x7c/0xb0 [ 15.102693] [<ffffffff8154ffc0>] ? rest_init+0xd0/0xd0 [ 15.189273] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Reduce duplicated code to handle virtual machine domainsJiang Liu
Reduce duplicated code to handle virtual machine domains, there's no functionality changes. It also improves code readability. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Free resources if failed to create domain for PCIe endpointJiang Liu
Enhance function get_domain_for_dev() to release allocated resources if failed to create domain for PCIe endpoint, otherwise the allocated resources will get lost. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Simplify function get_domain_for_dev()Jiang Liu
Function get_domain_for_dev() is a little complex, simplify it by factoring out dmar_search_domain_by_dev_info() and dmar_insert_dev_info(). Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Move private structures and variables into intel-iommu.cJiang Liu
Move private structures and variables into intel-iommu.c, which will help to simplify locking policy for hotplug. Also delete redundant declarations. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Avoid caching stale domain_device_info when hot-removing PCI deviceJiang Liu
Function device_notifier() in intel-iommu.c only remove domain_device_info data structure associated with a PCI device when handling PCI device driver unbinding events. If a PCI device has never been bound to a PCI device driver, there won't be BUS_NOTIFY_UNBOUND_DRIVER event when hot-removing the PCI device. So associated domain_device_info data structure may get lost. On the other hand, if iommu_pass_through is enabled, function iommu_prepare_static_indentify_mapping() will create domain_device_info data structure for each PCIe to PCIe bridge and PCIe endpoint, no matter whether there are drivers associated with those PCIe devices or not. So those domain_device_info data structures will get lost when hot-removing the assocated PCIe devices if they have never bound to any PCI device driver. To be even worse, it's not only an memory leak issue, but also an caching of stale information bug because the memory are kept in device_domain_list and domain->devices lists. Fix the bug by trying to remove domain_device_info data structure when handling BUS_NOTIFY_DEL_DEVICE event. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Avoid caching stale domain_device_info and fix memory leakJiang Liu
Function device_notifier() in intel-iommu.c fails to remove device_domain_info data structures for PCI devices if they are associated with si_domain because iommu_no_mapping() returns true for those PCI devices. This will cause memory leak and caching of stale information in domain->devices list. So fix the issue by not calling iommu_no_mapping() and skipping check of iommu_pass_through. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-03-04iommu/vt-d: Avoid double free of g_iommus on error recovery pathJiang Liu
Array 'g_iommus' may be freed twice on error recovery path in function init_dmars() and free_dmar_iommu(), thus cause random system crash as below. [ 6.774301] IOMMU: dmar init failed [ 6.778310] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [ 6.785615] software IO TLB [mem 0x76bcf000-0x7abcf000] (64MB) mapped at [ffff880076bcf000-ffff88007abcefff] [ 6.796887] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC [ 6.804173] Modules linked in: [ 6.807731] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1+ #108 [ 6.815122] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.R00.1402050741 02/05/2014 [ 6.836000] task: ffff880455a80000 ti: ffff880455a88000 task.ti: ffff880455a88000 [ 6.844487] RIP: 0010:[<ffffffff8143eea6>] [<ffffffff8143eea6>] memcpy+0x6/0x110 [ 6.853039] RSP: 0000:ffff880455a89cc8 EFLAGS: 00010293 [ 6.859064] RAX: ffff006568636163 RBX: ffff00656863616a RCX: 0000000000000005 [ 6.867134] RDX: 0000000000000005 RSI: ffffffff81cdc439 RDI: ffff006568636163 [ 6.875205] RBP: ffff880455a89d30 R08: 000000000001bc3b R09: 0000000000000000 [ 6.883275] R10: 0000000000000000 R11: ffffffff81cdc43e R12: ffff880455a89da8 [ 6.891338] R13: ffff006568636163 R14: 0000000000000005 R15: ffffffff81cdc439 [ 6.899408] FS: 0000000000000000(0000) GS:ffff88045b800000(0000) knlGS:0000000000000000 [ 6.908575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.915088] CR2: ffff88047e1ff000 CR3: 0000000001e0e000 CR4: 00000000001407f0 [ 6.923160] Stack: [ 6.925487] ffffffff8143c904 ffff88045b407e00 ffff006568636163 ffff006568636163 [ 6.934113] ffffffff8120a1a9 ffffffff81cdc43e 0000000000000007 0000000000000000 [ 6.942747] ffff880455a89da8 ffff006568636163 0000000000000007 ffffffff81cdc439 [ 6.951382] Call Trace: [ 6.954197] [<ffffffff8143c904>] ? vsnprintf+0x124/0x6f0 [ 6.960323] [<ffffffff8120a1a9>] ? __kmalloc_track_caller+0x169/0x360 [ 6.967716] [<ffffffff81440e1b>] kvasprintf+0x6b/0x80 [ 6.973552] [<ffffffff81432bf1>] kobject_set_name_vargs+0x21/0x70 [ 6.980552] [<ffffffff8143393d>] kobject_init_and_add+0x4d/0x90 [ 6.987364] [<ffffffff812067c9>] ? __kmalloc+0x169/0x370 [ 6.993492] [<ffffffff8102dbbc>] ? cache_add_dev+0x17c/0x4f0 [ 7.000005] [<ffffffff8102ddfa>] cache_add_dev+0x3ba/0x4f0 [ 7.006327] [<ffffffff821a87ca>] ? i8237A_init_ops+0x14/0x14 [ 7.012842] [<ffffffff821a87f8>] cache_sysfs_init+0x2e/0x61 [ 7.019260] [<ffffffff81002162>] do_one_initcall+0xf2/0x220 [ 7.025679] [<ffffffff810a4a29>] ? parse_args+0x2c9/0x450 [ 7.031903] [<ffffffff8219d1b1>] kernel_init_freeable+0x1c9/0x25b [ 7.038904] [<ffffffff8219c8d2>] ? do_early_param+0x8a/0x8a [ 7.045322] [<ffffffff8184d5e0>] ? rest_init+0x150/0x150 [ 7.051447] [<ffffffff8184d5ee>] kernel_init+0xe/0x100 [ 7.057380] [<ffffffff8187b87c>] ret_from_fork+0x7c/0xb0 [ 7.063503] [<ffffffff8184d5e0>] ? rest_init+0x150/0x150 [ 7.069628] Code: 89 e5 53 48 89 fb 75 16 80 7f 3c 00 75 05 e8 d2 f9 ff ff 48 8b 43 58 48 2b 43 50 88 43 4e 5b 5d c3 90 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b [ 7.094960] RIP [<ffffffff8143eea6>] memcpy+0x6/0x110 [ 7.100856] RSP <ffff880455a89cc8> [ 7.104864] ---[ end trace b5d3fdc6c6c28083 ]--- [ 7.110142] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 7.110142] [ 7.120540] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-29Merge tag 'iommu-updates-v3.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU Updates from Joerg Roedel: "A few patches have been queued up for this merge window: - improvements for the ARM-SMMU driver (IOMMU_EXEC support, IOMMU group support) - updates and fixes for the shmobile IOMMU driver - various fixes to generic IOMMU code and the Intel IOMMU driver - some cleanups in IOMMU drivers (dev_is_pci() usage)" * tag 'iommu-updates-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits) iommu/vt-d: Fix signedness bug in alloc_irte() iommu/vt-d: free all resources if failed to initialize DMARs iommu/vt-d, trivial: clean sparse warnings iommu/vt-d: fix wrong return value of dmar_table_init() iommu/vt-d: release invalidation queue when destroying IOMMU unit iommu/vt-d: fix access after free issue in function free_dmar_iommu() iommu/vt-d: keep shared resources when failed to initialize iommu devices iommu/vt-d: fix invalid memory access when freeing DMAR irq iommu/vt-d, trivial: simplify code with existing macros iommu/vt-d, trivial: use defined macro instead of hardcoding iommu/vt-d: mark internal functions as static iommu/vt-d, trivial: clean up unused code iommu/vt-d, trivial: check suitable flag in function detect_intel_iommu() iommu/vt-d, trivial: print correct domain id of static identity domain iommu/vt-d, trivial: refine support of 64bit guest address iommu/vt-d: fix resource leakage on error recovery path in iommu_init_domains() iommu/vt-d: fix a race window in allocating domain ID for virtual machines iommu/vt-d: fix PCI device reference leakage on error recovery path drm/msm: Fix link error with !MSM_IOMMU iommu/vt-d: use dedicated bitmap to track remapping entry allocation status ...
2014-01-21intel-iommu: fix off-by-one in pagetable freeingAlex Williamson
dma_pte_free_level() has an off-by-one error when checking whether a pte is completely covered by a range. Take for example the case of attempting to free pfn 0x0 - 0x1ff, ie. 512 entries covering the first 2M superpage. The level_size() is 0x200 and we test: static void dma_pte_free_level(... ... if (!(0 > 0 || 0x1ff < 0 + 0x200)) { ... } Clearly the 2nd test is true, which means we fail to take the branch to clear and free the pagetable entry. As a result, we're leaking pagetables and failing to install new pages over the range. This was found with a PCI device assigned to a QEMU guest using vfio-pci without a VGA device present. The first 1M of guest address space is mapped with various combinations of 4K pages, but eventually the range is entirely freed and replaced with a 2M contiguous mapping. intel-iommu errors out with something like: ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083) In this case 5c2b8003 is the pointer to the previous leaf page that was neither freed nor cleared and 849c00083 is the superpage entry that we're trying to replace it with. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-09iommu/vt-d: free all resources if failed to initialize DMARsJiang Liu
Enhance intel_iommu_init() to free all resources if failed to initialize DMAR hardware. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d, trivial: clean sparse warningsJiang Liu
Clean up most sparse warnings in Intel DMA and interrupt remapping drivers. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d: fix access after free issue in function free_dmar_iommu()Jiang Liu
Function free_dmar_iommu() may access domain->iommu_lock by spin_unlock_irqrestore(&domain->iommu_lock, flags); after freeing corresponding domain structure. Sample stack dump: [ 8.912818] ========================= [ 8.917072] [ BUG: held lock freed! ] [ 8.921335] 3.13.0-rc1-gerry+ #12 Not tainted [ 8.926375] ------------------------- [ 8.930629] swapper/0/1 is freeing memory ffff880c23b56040-ffff880c23b5613f, with a lock still held there! [ 8.941675] (&(&domain->iommu_lock)->rlock){......}, at: [<ffffffff81dc775c>] init_dmars+0x72c/0x95b [ 8.952582] 1 lock held by swapper/0/1: [ 8.957031] #0: (&(&domain->iommu_lock)->rlock){......}, at: [<ffffffff81dc775c>] init_dmars+0x72c/0x95b [ 8.968487] [ 8.968487] stack backtrace: [ 8.973602] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc1-gerry+ #12 [ 8.981556] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012 [ 8.994742] ffff880c23b56040 ffff88042dd33c98 ffffffff815617fd ffff88042dd38b28 [ 9.003566] ffff88042dd33cd0 ffffffff810a977a ffff880c23b56040 0000000000000086 [ 9.012403] ffff88102c4923c0 ffff88042ddb4800 ffffffff81b1e8c0 ffff88042dd33d28 [ 9.021240] Call Trace: [ 9.024138] [<ffffffff815617fd>] dump_stack+0x4d/0x66 [ 9.030057] [<ffffffff810a977a>] debug_check_no_locks_freed+0x15a/0x160 [ 9.037723] [<ffffffff811aa1c2>] kmem_cache_free+0x62/0x5b0 [ 9.044225] [<ffffffff81465e27>] domain_exit+0x197/0x1c0 [ 9.050418] [<ffffffff81dc7788>] init_dmars+0x758/0x95b [ 9.056527] [<ffffffff81dc7dfa>] intel_iommu_init+0x351/0x438 [ 9.063207] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d [ 9.069601] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52 [ 9.075910] [<ffffffff81000342>] do_one_initcall+0x122/0x180 [ 9.082509] [<ffffffff81077738>] ? parse_args+0x1e8/0x320 [ 9.088815] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c [ 9.095895] [<ffffffff81d84833>] ? do_early_param+0x88/0x88 [ 9.102396] [<ffffffff8154f580>] ? rest_init+0xd0/0xd0 [ 9.108410] [<ffffffff8154f58e>] kernel_init+0xe/0x130 [ 9.114423] [<ffffffff81574a2c>] ret_from_fork+0x7c/0xb0 [ 9.120612] [<ffffffff8154f580>] ? rest_init+0xd0/0xd0 Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d: keep shared resources when failed to initialize iommu devicesJiang Liu
Data structure drhd->iommu is shared between DMA remapping driver and interrupt remapping driver, so DMA remapping driver shouldn't release drhd->iommu when it failed to initialize IOMMU devices. Otherwise it may cause invalid memory access to the interrupt remapping driver. Sample stack dump: [ 13.315090] BUG: unable to handle kernel paging request at ffffc9000605a088 [ 13.323221] IP: [<ffffffff81461bac>] qi_submit_sync+0x15c/0x400 [ 13.330107] PGD 82f81e067 PUD c2f81e067 PMD 82e846067 PTE 0 [ 13.336818] Oops: 0002 [#1] SMP [ 13.340757] Modules linked in: [ 13.344422] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.13.0-rc1-gerry+ #7 [ 13.352474] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012 [ 13.365659] Workqueue: events work_for_cpu_fn [ 13.370774] task: ffff88042ddf00d0 ti: ffff88042ddee000 task.ti: ffff88042dde e000 [ 13.379389] RIP: 0010:[<ffffffff81461bac>] [<ffffffff81461bac>] qi_submit_sy nc+0x15c/0x400 [ 13.389055] RSP: 0000:ffff88042ddef940 EFLAGS: 00010002 [ 13.395151] RAX: 00000000000005e0 RBX: 0000000000000082 RCX: 0000000200000025 [ 13.403308] RDX: ffffc9000605a000 RSI: 0000000000000010 RDI: ffff88042ddb8610 [ 13.411446] RBP: ffff88042ddef9a0 R08: 00000000000005d0 R09: 0000000000000001 [ 13.419599] R10: 0000000000000000 R11: 000000000000005d R12: 000000000000005c [ 13.427742] R13: ffff88102d84d300 R14: 0000000000000174 R15: ffff88042ddb4800 [ 13.435877] FS: 0000000000000000(0000) GS:ffff88043de00000(0000) knlGS:00000 00000000000 [ 13.445168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 13.451749] CR2: ffffc9000605a088 CR3: 0000000001a0b000 CR4: 00000000000407f0 [ 13.459895] Stack: [ 13.462297] ffff88042ddb85d0 000000000000005d ffff88042ddef9b0 0000000000000 5d0 [ 13.471147] 00000000000005c0 ffff88042ddb8000 000000000000005c 0000000000000 015 [ 13.480001] ffff88042ddb4800 0000000000000282 ffff88042ddefa40 ffff88042ddef ac0 [ 13.488855] Call Trace: [ 13.491771] [<ffffffff8146848d>] modify_irte+0x9d/0xd0 [ 13.497778] [<ffffffff8146886d>] intel_setup_ioapic_entry+0x10d/0x290 [ 13.505250] [<ffffffff810a92a6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [ 13.512824] [<ffffffff810346b0>] ? default_init_apic_ldr+0x60/0x60 [ 13.519998] [<ffffffff81468be0>] setup_ioapic_remapped_entry+0x20/0x30 [ 13.527566] [<ffffffff8103683a>] io_apic_setup_irq_pin+0x12a/0x2c0 [ 13.534742] [<ffffffff8136673b>] ? acpi_pci_irq_find_prt_entry+0x2b9/0x2d8 [ 13.544102] [<ffffffff81037fd5>] io_apic_setup_irq_pin_once+0x85/0xa0 [ 13.551568] [<ffffffff8103816f>] ? mp_find_ioapic_pin+0x8f/0xf0 [ 13.558434] [<ffffffff81038044>] io_apic_set_pci_routing+0x34/0x70 [ 13.565621] [<ffffffff8102f4cf>] mp_register_gsi+0xaf/0x1c0 [ 13.572111] [<ffffffff8102f5ee>] acpi_register_gsi_ioapic+0xe/0x10 [ 13.579286] [<ffffffff8102f33f>] acpi_register_gsi+0xf/0x20 [ 13.585779] [<ffffffff81366b86>] acpi_pci_irq_enable+0x171/0x1e3 [ 13.592764] [<ffffffff8146d771>] pcibios_enable_device+0x31/0x40 [ 13.599744] [<ffffffff81320e9b>] do_pci_enable_device+0x3b/0x60 [ 13.606633] [<ffffffff81322248>] pci_enable_device_flags+0xc8/0x120 [ 13.613887] [<ffffffff813222f3>] pci_enable_device+0x13/0x20 [ 13.620484] [<ffffffff8132fa7e>] pcie_port_device_register+0x1e/0x510 [ 13.627947] [<ffffffff810a92a6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [ 13.635510] [<ffffffff810a947d>] ? trace_hardirqs_on+0xd/0x10 [ 13.642189] [<ffffffff813302b8>] pcie_portdrv_probe+0x58/0xc0 [ 13.648877] [<ffffffff81323ba5>] local_pci_probe+0x45/0xa0 [ 13.655266] [<ffffffff8106bc44>] work_for_cpu_fn+0x14/0x20 [ 13.661656] [<ffffffff8106fa79>] process_one_work+0x369/0x710 [ 13.668334] [<ffffffff8106fa02>] ? process_one_work+0x2f2/0x710 [ 13.675215] [<ffffffff81071d56>] ? worker_thread+0x46/0x690 [ 13.681714] [<ffffffff81072194>] worker_thread+0x484/0x690 [ 13.688109] [<ffffffff81071d10>] ? cancel_delayed_work_sync+0x20/0x20 [ 13.695576] [<ffffffff81079c60>] kthread+0xf0/0x110 [ 13.701300] [<ffffffff8108e7bf>] ? local_clock+0x3f/0x50 [ 13.707492] [<ffffffff81079b70>] ? kthread_create_on_node+0x250/0x250 [ 13.714959] [<ffffffff81574d2c>] ret_from_fork+0x7c/0xb0 [ 13.721152] [<ffffffff81079b70>] ? kthread_create_on_node+0x250/0x250 Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d: fix invalid memory access when freeing DMAR irqJiang Liu
In function free_dmar_iommu(), it sets IRQ handler data to NULL before calling free_irq(), which will cause invalid memory access because free_irq() will access IRQ handler data when calling function dmar_msi_mask(). So only set IRQ handler data to NULL after calling free_irq(). Sample stack dump: [ 13.094010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 13.103215] IP: [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0 [ 13.110104] PGD 0 [ 13.112614] Oops: 0000 [#1] SMP [ 13.116585] Modules linked in: [ 13.120260] CPU: 60 PID: 1 Comm: swapper/0 Tainted: G W 3.13.0-rc1-gerry+ #9 [ 13.129367] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012 [ 13.142555] task: ffff88042dd38010 ti: ffff88042dd32000 task.ti: ffff88042dd32000 [ 13.151179] RIP: 0010:[<ffffffff810a97cd>] [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0 [ 13.160867] RSP: 0000:ffff88042dd33b78 EFLAGS: 00010046 [ 13.166969] RAX: 0000000000000046 RBX: 0000000000000002 RCX: 0000000000000000 [ 13.175122] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000048 [ 13.183274] RBP: ffff88042dd33bd8 R08: 0000000000000002 R09: 0000000000000001 [ 13.191417] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88042dd38010 [ 13.199571] R13: 0000000000000000 R14: 0000000000000048 R15: 0000000000000000 [ 13.207725] FS: 0000000000000000(0000) GS:ffff88103f200000(0000) knlGS:0000000000000000 [ 13.217014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 13.223596] CR2: 0000000000000048 CR3: 0000000001a0b000 CR4: 00000000000407e0 [ 13.231747] Stack: [ 13.234160] 0000000000000004 0000000000000046 ffff88042dd33b98 ffffffff810a567d [ 13.243059] ffff88042dd33c08 ffffffff810bb14c ffffffff828995a0 0000000000000046 [ 13.251969] 0000000000000000 0000000000000000 0000000000000002 0000000000000000 [ 13.260862] Call Trace: [ 13.263775] [<ffffffff810a567d>] ? trace_hardirqs_off+0xd/0x10 [ 13.270571] [<ffffffff810bb14c>] ? vprintk_emit+0x23c/0x570 [ 13.277058] [<ffffffff810ab1e3>] lock_acquire+0x93/0x120 [ 13.283269] [<ffffffff814623f7>] ? dmar_msi_mask+0x47/0x70 [ 13.289677] [<ffffffff8156b449>] _raw_spin_lock_irqsave+0x49/0x90 [ 13.296748] [<ffffffff814623f7>] ? dmar_msi_mask+0x47/0x70 [ 13.303153] [<ffffffff814623f7>] dmar_msi_mask+0x47/0x70 [ 13.309354] [<ffffffff810c0d93>] irq_shutdown+0x53/0x60 [ 13.315467] [<ffffffff810bdd9d>] __free_irq+0x26d/0x280 [ 13.321580] [<ffffffff810be920>] free_irq+0xf0/0x180 [ 13.327395] [<ffffffff81466591>] free_dmar_iommu+0x271/0x2b0 [ 13.333996] [<ffffffff810a947d>] ? trace_hardirqs_on+0xd/0x10 [ 13.340696] [<ffffffff81461a17>] free_iommu+0x17/0x50 [ 13.346597] [<ffffffff81dc75a5>] init_dmars+0x691/0x77a [ 13.352711] [<ffffffff81dc7afd>] intel_iommu_init+0x351/0x438 [ 13.359400] [<ffffffff81d8a711>] ? iommu_setup+0x27d/0x27d [ 13.365806] [<ffffffff81d8a739>] pci_iommu_init+0x28/0x52 [ 13.372114] [<ffffffff81000342>] do_one_initcall+0x122/0x180 [ 13.378707] [<ffffffff81077738>] ? parse_args+0x1e8/0x320 [ 13.385016] [<ffffffff81d850e8>] kernel_init_freeable+0x1e1/0x26c [ 13.392100] [<ffffffff81d84833>] ? do_early_param+0x88/0x88 [ 13.398596] [<ffffffff8154f8b0>] ? rest_init+0xd0/0xd0 [ 13.404614] [<ffffffff8154f8be>] kernel_init+0xe/0x130 [ 13.410626] [<ffffffff81574d6c>] ret_from_fork+0x7c/0xb0 [ 13.416829] [<ffffffff8154f8b0>] ? rest_init+0xd0/0xd0 [ 13.422842] Code: ec 99 00 85 c0 8b 05 53 05 a5 00 41 0f 45 d8 85 c0 0f 84 ff 00 00 00 8b 05 99 f9 7e 01 49 89 fe 41 89 f7 85 c0 0f 84 03 01 00 00 <49> 8b 06 be 01 00 00 00 48 3d c0 0e 01 82 0f 44 de 41 83 ff 01 [ 13.450191] RIP [<ffffffff810a97cd>] __lock_acquire+0x4d/0x12a0 [ 13.458598] RSP <ffff88042dd33b78> [ 13.462671] CR2: 0000000000000048 [ 13.466551] ---[ end trace c5bd26a37c81d760 ]--- Reviewed-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d, trivial: simplify code with existing macrosJiang Liu
Simplify vt-d related code with existing macros and introduce a new macro for_each_active_drhd_unit() to enumerate all active DRHD unit. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d, trivial: clean up unused codeJiang Liu
Remove dead code from VT-d related files. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org> Conflicts: drivers/iommu/dmar.c
2014-01-09iommu/vt-d, trivial: print correct domain id of static identity domainJiang Liu
Field si_domain->id is set by iommu_attach_domain(), so we should only print domain id for static identity domain after calling iommu_attach_domain(si_domain, iommu), otherwise it's always zero. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d, trivial: refine support of 64bit guest addressJiang Liu
In Intel IOMMU driver, it calculate page table level from adjusted guest address width as 'level = (agaw - 30) / 9', which assumes (agaw -30) could be divided by 9. On the other hand, 64bit is a valid agaw and (64 - 30) can't be divided by 9, so it needs special handling. This patch enhances Intel IOMMU driver to correctly handle 64bit agaw. It's mainly for code readability because there's no hardware supporting 64bit agaw yet. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d: fix resource leakage on error recovery path in iommu_init_domains()Jiang Liu
Release allocated resources on error recovery path in function iommu_init_domains(). Also improve printk messages in iommu_init_domains(). Acked-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09iommu/vt-d: fix a race window in allocating domain ID for virtual machinesJiang Liu
Function intel_iommu_domain_init() may be concurrently called by upper layer without serialization, so use atomic_t to protect domain id allocation. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-07iommu/vt-d: Use dev_is_pci() to check whether it is pci deviceYijing Wang
Use PCI standard marco dev_is_pci() instead of directly compare pci_bus_type to check whether it is pci device. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-11-01iommu/vt-d: Use list_for_each_entry_safe() for dmar_domain->devices traversalYijing Wang
Replace list_for_each_safe() + list_entry() with the simpler list_for_each_entry_safe(). Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-11-01iommu/vt-d: Fixed interaction of VFIO_IOMMU_MAP_DMA with IOMMU address limitsJulian Stecklina
The BUG_ON in drivers/iommu/intel-iommu.c:785 can be triggered from userspace via VFIO by calling the VFIO_IOMMU_MAP_DMA ioctl on a vfio device with any address beyond the addressing capabilities of the IOMMU. The problem is that the ioctl code calls iommu_iova_to_phys before it calls iommu_map. iommu_map handles the case that it gets addresses beyond the addressing capabilities of its IOMMU. intel_iommu_iova_to_phys does not. This patch fixes iommu_iova_to_phys to return NULL for addresses beyond what the IOMMU can handle. This in turn causes the ioctl call to fail in iommu_map and (correctly) return EFAULT to the user with a helpful warning message in the kernel log. Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-08-14intel-iommu: Fix leaks in pagetable freeingAlex Williamson
At best the current code only seems to free the leaf pagetables and the root. If you're unlucky enough to have a large gap (like any QEMU guest with more than 3G of memory), only the first chunk of leaf pagetables are freed (plus the root). This is a massive memory leak. This patch re-writes the pagetable freeing function to use a recursive algorithm and manages to not only free all the pagetables, but does it without any apparent performance loss versus the current broken version. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-06-20iommu/{vt-d,amd}: Remove multifunction assumption around groupingAlex Williamson
If a device is multifunction and does not have ACS enabled then we assume that the entire package lacks ACS and use function 0 as the base of the group. The PCIe spec however states that components are permitted to implement ACS on some, none, or all of their applicable functions. It's therefore conceivable that function 0 may be fully independent and support ACS while other functions do not. Instead use the lowest function of the slot that does not have ACS enabled as the base of the group. This may be the current device, which is intentional. So long as we use a consistent algorithm, all the non-ACS functions will be grouped together and ACS functions will get separate groups. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-05-02Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'ppc/pamu', 'core' and ↵Joerg Roedel
'arm/tegra' into next
2013-04-23iommu: Move swap_pci_ref function to drivers/iommu/pci.h.Varun Sethi
The swap_pci_ref function is used by the IOMMU API code for swapping pci device pointers, while determining the iommu group for the device. Currently this function was being implemented for different IOMMU drivers. This patch moves the function to a new file, drivers/iommu/pci.h so that the implementation can be shared across various IOMMU drivers. Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-04-23iommu/vt-d: Disable translation if already enabledTakao Indoh
This patch disables translation(dma-remapping) before its initialization if it is already enabled. This is needed for kexec/kdump boot. If dma-remapping is enabled in the first kernel, it need to be disabled before initializing its page table during second kernel boot. Wei Hu also reported that this is needed when second kernel boots with intel_iommu=off. Basically iommu->gcmd is used to know whether translation is enabled or disabled, but it is always zero at boot time even when translation is enabled since iommu->gcmd is initialized without considering such a case. Therefor this patch synchronizes iommu->gcmd value with global command register when iommu structure is allocated. Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-04-02iommu/fsl: Make iova dma_addr_t in the iommu_iova_to_phys API.Varun Sethi
This is required in case of PAMU, as it can support a window size of up to 64G (even on 32bit). Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2013-02-25Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm merge from Dave Airlie: "Highlights: - TI LCD controller KMS driver - TI OMAP KMS driver merged from staging - drop gma500 stub driver - the fbcon locking fixes - the vgacon dirty like zebra fix. - open firmware videomode and hdmi common code helpers - major locking rework for kms object handling - pageflip/cursor won't block on polling anymore! - fbcon helper and prime helper cleanups - i915: all over the map, haswell power well enhancements, valleyview macro horrors cleaned up, killing lots of legacy GTT code, - radeon: CS ioctl unification, deprecated UMS support, gpu reset rework, VM fixes - nouveau: reworked thermal code, external dp/tmds encoder support (anx9805), fences sleep instead of polling, - exynos: all over the driver fixes." Lovely conflict in radeon/evergreen_cs.c between commit de0babd60d8d ("drm/radeon: enforce use of radeon_get_ib_value when reading user cmd") and the new changes that modified that evergreen_dma_cs_parse() function. * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (508 commits) drm/tilcdc: only build on arm drm/i915: Revert hdmi HDP pin checks drm/tegra: Add list of framebuffers to debugfs drm/tegra: Fix color expansion drm/tegra: Split DC_CMD_STATE_CONTROL register write drm/tegra: Implement page-flipping support drm/tegra: Implement VBLANK support drm/tegra: Implement .mode_set_base() drm/tegra: Add plane support drm/tegra: Remove bogus tegra_framebuffer structure drm: Add consistency check for page-flipping drm/radeon: Use generic HDMI infoframe helpers drm/tegra: Use generic HDMI infoframe helpers drm: Add EDID helper documentation drm: Add HDMI infoframe helpers video: Add generic HDMI infoframe helpers drm: Add some missing forward declarations drm: Move mode tables to drm_edid.c drm: Remove duplicate drm_mode_cea_vic() gma500: Fix n, m1 and m2 clock limits for sdvo and lvds ...
2013-02-19Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Main changes: - Multiple MSI support added to the APIC, PCI and AHCI code - acked by all relevant maintainers, by Alexander Gordeev. The advantage is that multiple AHCI ports can have multiple MSI irqs assigned, and can thus spread to multiple CPUs. [ Drivers can make use of this new facility via the pci_enable_msi_block_auto() method ] - x86 IOAPIC code from interrupt remapping cleanups from Joerg Roedel: These patches move all interrupt remapping specific checks out of the x86 core code and replaces the respective call-sites with function pointers. As a result the interrupt remapping code is better abstraced from x86 core interrupt handling code. - Various smaller improvements, fixes and cleanups." * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess x86, kvm: Fix intialization warnings in kvm.c x86, irq: Move irq_remapped out of x86 core code x86, io_apic: Introduce eoi_ioapic_pin call-back x86, msi: Introduce x86_msi.compose_msi_msg call-back x86, irq: Introduce setup_remapped_irq() x86, irq: Move irq_remapped() check into free_remapped_irq x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq() x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core x86, irq: Add data structure to keep AMD specific irq remapping information x86, irq: Move irq_remapping_enabled declaration to iommu code x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin x86, io_apic: Move irq_remapping_enabled checks out of check_timer() x86, io_apic: Convert setup_ioapic_entry to function pointer x86, io_apic: Introduce set_affinity function pointer x86, msi: Use IRQ remapping specific setup_msi_irqs routine x86, hpet: Introduce x86_msi_ops.setup_hpet_msi x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging x86, io_apic: Introduce x86_io_apic_ops.disable() x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume ...
2013-02-20intel/iommu: force writebuffer-flush quirk on Gen 4 ChipsetsDaniel Vetter
We already have the quirk entry for the mobile platform, but also reports on some desktop versions. So be paranoid and set it everywhere. References: http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg33138.html Cc: stable@vger.kernel.org Cc: David Woodhouse <dwmw2@infradead.org> Cc: "Sankaran, Rajesh" <rajesh.sankaran@intel.com> Reported-and-tested-by: Mihai Moldovan <ionic@ionic.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-28x86, irq: Move irq_remapping_enabled declaration to iommu codeJoerg Roedel
Remove the last left-over from this flag from x86 code. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-23iommu/intel: disable DMAR for g4x integrated gfxDaniel Vetter
DMAR support on g4x/gm45 integrated gpus seems to be totally busted. So don't bother, but instead disable it by default to allow distros to unconditionally enable DMAR support. v2: Actually wire up the right quirk entry, spotted by Adam Jackson. Note that according to intel marketing materials only g45 and gm45 support DMAR/VT-d. So we have reports for all relevant gen4 pci ids by now. Still, keep all the other gen4 ids in the quirk table in case the marketing stuff confused me again, which would not be the first time. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51921 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=538163 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=538163 Cc: Adam Jackson <ajax@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: stable@vger.kernel.org Acked-By: David Woodhouse <David.Woodhouse@intel.com> Tested-by: stathis <stathis@npcglib.org> Tested-by: Mihai Moldovan <ionic@ionic.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-01-03Drivers: iommu: remove __dev* attributes.Greg Kroah-Hartman
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Ohad Ben-Cohen <ohad@wizery.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Omar Ramirez Luna <omar.luna@linaro.org> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Hiroshi Doyu <hdoyu@nvidia.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Bharat Nihalani <bnihalani@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-20Merge tag 'iommu-updates-v3.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "A few new features this merge-window. The most important one is probably, that dma-debug now warns if a dma-handle is not checked with dma_mapping_error by the device driver. This requires minor changes to some architectures which make use of dma-debug. Most of these changes have the respective Acks by the Arch-Maintainers. Besides that there are updates to the AMD IOMMU driver for refactor the IOMMU-Groups support and to make sure it does not trigger a hardware erratum. The OMAP changes (for which I pulled in a branch from Tony Lindgren's tree) have a conflict in linux-next with the arm-soc tree. The conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is deleted in the arm-soc tree. It is safe to delete the file too so solve the conflict. Similar changes are done in the arm-soc tree in the common clock framework migration. A missing hunk from the patch in the IOMMU tree will be submitted as a seperate patch when the merge-window is closed." * tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (29 commits) ARM: dma-mapping: support debug_dma_mapping_error ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks iommu/omap: Adapt to runtime pm iommu/omap: Migrate to hwmod framework iommu/omap: Keep mmu enabled when requested iommu/omap: Remove redundant clock handling on ISR iommu/amd: Remove obsolete comment iommu/amd: Don't use 512GB pages iommu/tegra: smmu: Move bus_set_iommu after probe for multi arch iommu/tegra: gart: Move bus_set_iommu after probe for multi arch iommu/tegra: smmu: Remove unnecessary PTC/TLB flush all tile: dma_debug: add debug_dma_mapping_error support sh: dma_debug: add debug_dma_mapping_error support powerpc: dma_debug: add debug_dma_mapping_error support mips: dma_debug: add debug_dma_mapping_error support microblaze: dma-mapping: support debug_dma_mapping_error ia64: dma_debug: add debug_dma_mapping_error support c6x: dma_debug: add debug_dma_mapping_error support ARM64: dma_debug: add debug_dma_mapping_error support intel-iommu: Prevent devices with RMRRs from being placed into SI Domain ...
2012-12-20intel-iommu: Free old page tables before creating superpageWoodhouse, David
The dma_pte_free_pagetable() function will only free a page table page if it is asked to free the *entire* 2MiB range that it covers. So if a page table page was used for one or more small mappings, it's likely to end up still present in the page tables... but with no valid PTEs. This was fine when we'd only be repopulating it with 4KiB PTEs anyway but the same virtual address range can end up being reused for a *large-page* mapping. And in that case were were trying to insert the large page into the second-level page table, and getting a complaint from the sanity check in __domain_mapping() because there was already a corresponding entry. This was *relatively* harmless; it led to a memory leak of the old page table page, but no other ill-effects. Fix it by calling dma_pte_clear_range (hopefully redundant) and dma_pte_free_pagetable() before setting up the new large page. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Tested-by: Ravi Murty <Ravi.Murty@intel.com> Tested-by: Sudeep Dutt <sudeep.dutt@intel.com> Cc: stable@kernel.org [3.0+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-16Merge branches 'iommu/fixes', 'dma-debug', 'x86/amd', 'x86/vt-d', ↵Joerg Roedel
'arm/tegra' and 'arm/omap' into next
2012-11-21intel-iommu: Prevent devices with RMRRs from being placed into SI DomainTom Mingarelli
This patch is to prevent non-USB devices that have RMRRs associated with them from being placed into the SI Domain during init. This fixes the issue where the RMRR info for devices being placed in and out of the SI Domain gets lost. Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com> Tested-by: Shuah Khan <shuah.khan@hp.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joro@8bytes.org>
2012-11-17intel-iommu: Fix lookup in add deviceAlex Williamson
We can't assume this device exists, fall back to the bridge itself. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Matthew Thode <prometheanfire@gentoo.org> Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <joro@8bytes.org>
2012-10-08Merge tag 'iommu-updates-v3.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "This time the IOMMU updates contain a bunch of fixes and cleanups to various IOMMU drivers and the DMA debug code. New features are the code for IRQ remapping support with the AMD IOMMU (preperation for that was already merged in the last release) and a debugfs interface to export some statistics in the NVidia Tegra IOMMU driver." * tag 'iommu-updates-v3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (27 commits) iommu/amd: Remove obsolete comment line dma-debug: Remove local BUS_NOTIFY_UNBOUND_DRIVER define iommu/amd: Fix possible use after free in get_irq_table() iommu/amd: Report irq remapping through IOMMU-API iommu/amd: Print message to system log when irq remapping is enabled iommu/irq: Use amd_iommu_irq_ops if supported iommu/amd: Make sure irq remapping still works on dma init failure iommu/amd: Add initialization routines for AMD interrupt remapping iommu/amd: Add call-back routine for HPET MSI iommu/amd: Implement MSI routines for interrupt remapping iommu/amd: Add IOAPIC remapping routines iommu/amd: Add routines to manage irq remapping tables iommu/amd: Add IRTE invalidation routine iommu/amd: Make sure IOMMU is not considered to translate itself iommu/amd: Split device table initialization into irq and dma part iommu/amd: Check if IOAPIC information is correct iommu/amd: Allocate data structures to keep track of irq remapping tables iommu/amd: Add slab-cache for irq remapping tables iommu/amd: Keep track of HPET and IOAPIC device ids iommu/amd: Fix features reporting ...
2012-09-18intel-iommu: Default to non-coherent for domains unattached to iommusAlex Williamson
domain_update_iommu_coherency() currently defaults to setting domains as coherent when the domain is not attached to any iommus. This allows for a window in domain_context_mapping_one() where such a domain can update context entries non-coherently, and only after update the domain capability to clear iommu_coherency. This can be seen using KVM device assignment on VT-d systems that do not support coherency in the ecap register. When a device is added to a guest, a domain is created (iommu_coherency = 0), the device is attached, and ranges are mapped. If we then hot unplug the device, the coherency is updated and set to the default (1) since no iommus are attached to the domain. A subsequent attach of a device makes use of the same dmar domain (now marked coherent) updates context entries with coherency enabled, and only disables coherency as the last step in the process. To fix this, switch domain_update_iommu_coherency() to use the safer, non-coherent default for domains not attached to iommus. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Donald Dutile <ddutile@redhat.com> Acked-by: Donald Dutile <ddutile@redhat.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-09-13Merge commit 'v3.6-rc5' into nextBjorn Helgaas
* commit 'v3.6-rc5': (1098 commits) Linux 3.6-rc5 HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured Remove user-triggerable BUG from mpol_to_str xen/pciback: Fix proper FLR steps. uml: fix compile error in deliver_alarm() dj: memory scribble in logi_dj Fix order of arguments to compat_put_time[spec|val] xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory. powerpc: Don't use __put_user() in patch_instruction powerpc: Make sure IPI handlers see data written by IPI senders powerpc: Restore correct DSCR in context switch powerpc: Fix DSCR inheritance in copy_thread() powerpc: Keep thread.dscr and thread.dscr_inherit in sync powerpc: Update DSCR on all CPUs when writing sysfs dscr_default powerpc/powernv: Always go into nap mode when CPU is offline powerpc: Give hypervisor decrementer interrupts their own handler powerpc/vphn: Fix arch_update_cpu_topology() return value ARM: gemini: fix the gemini build ... Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/rapidio/devices/tsi721.c
2012-08-23PCI: Introduce pci_pcie_type(dev) to replace pci_dev->pcie_typeYijing Wang
Introduce an inline function pci_pcie_type(dev) to extract PCIe device type from pci_dev->pcie_flags_reg field, and prepare for removing pci_dev->pcie_type. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-08-06iommu/intel: Fix ACS path checkingAlex Williamson
SR-IOV can create buses without a bridge. There may be other cases where this happens as well. In these cases skip to the parent bus and continue testing devices there. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-08-03iommu/intel: add missing free_domain_memJulia Lawall
Add missing free_domain_mem on failure path after alloc_domain. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @km exists@ local idexpression e; expression e1,e2,e3; type T,T1; identifier f; @@ * e = alloc_domain(...) ... when any when != e = e1 when != e1 = (T)e when != e1(...,(T)e,...) when != &e->f if(...) { ... when != e2(...,(T1)e,...) when != e3 = e when forall ( return <+...e...+>; | * return ...; ) } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>