diff options
| author | Anthoine Bourgeois <anthoine.bourgeois@vates.tech> | 2025-07-21 09:34:54 +0000 | 
|---|---|---|
| committer | Juergen Gross <jgross@suse.com> | 2025-07-23 13:11:03 +0200 | 
| commit | 114a2de6fa86d99ed9546cc9113a3cad58beef79 (patch) | |
| tree | b563984e1689b029f3ea4529c2d781b2afabb358 /tools/perf/scripts/python/syscall-counts-by-pid.py | |
| parent | 70045cf6593cbf0740956ea9b7b4269142c6ee38 (diff) | |
xen/netfront: Fix TX response spurious interrupts
We found at Vates that there are lot of spurious interrupts when
benchmarking the xen-net PV driver frontend. This issue appeared with a
patch that addresses security issue XSA-391 (b27d47950e48 "xen/netfront:
harden netfront against event channel storms"). On an iperf benchmark,
spurious interrupts can represent up to 50% of the interrupts.
Spurious interrupts are interrupts that are rised for nothing, there is
no work to do. This appends because the function that handles the
interrupts ("xennet_tx_buf_gc") is also called at the end of the request
path to garbage collect the responses received during the transmission
load.
The request path is doing the work that the interrupt handler should
have done otherwise. This is particurary true when there is more than
one vcpu and get worse linearly with the number of vcpu/queue.
Moreover, this problem is amplifyed by the penalty imposed by a spurious
interrupt. When an interrupt is found spurious the interrupt chip will
delay the EOI to slowdown the backend. This delay will allow more
responses to be handled by the request path and then there will be more
chance the next interrupt will not find any work to do, creating a new
spurious interrupt.
This causes performance issue. The solution here is to remove the calls
from the request path and let the interrupt handler do the processing of
the responses. This approch removes most of the spurious interrupts
(<0.05%) and also has the benefit of freeing up cycles in the request
path, allowing it to process more work, which improves performance
compared to masking the spurious interrupt one way or another.
This optimization changes a part of the code that is present since the
net frontend driver was upstreamed. There is no similar pattern in the
other xen PV drivers. Since the first commit of xen-netfront is a blob
that doesn't explain all the design choices I can only guess why this
specific mecanism was here. This could have been introduce to compensate
a slow backend at the time (maybe the backend was fixed or optimize
later) or a small queue. In 18 years, both frontend and backend gain lot
of features and optimizations that could have obsolete the feature of
reaping completions from the TX path.
Some vif throughput performance figures from a 8 vCPUs, 4GB of RAM HVM
guest(s):
Without this patch on the :
vm -> dom0: 4.5Gb/s
vm -> vm:   7.0Gb/s
Without XSA-391 patch (revert of b27d47950e48):
vm -> dom0: 8.3Gb/s
vm -> vm:   8.7Gb/s
With XSA-391 and this patch:
vm -> dom0: 11.5Gb/s
vm -> vm:   12.6Gb/s
v2:
- add revewed and tested by tags
- resend with the maintainers in the recipients list
v3:
- remove Fixes tag but keep the commit ref in the explanation
- add a paragraph on why this code was here
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@vates.tech>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250721093316.23560-1-anthoine.bourgeois@vates.tech>
Diffstat (limited to 'tools/perf/scripts/python/syscall-counts-by-pid.py')
0 files changed, 0 insertions, 0 deletions
