On PPC there are instructions to store element from vector(e.g. stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail constant in memset.
This patch tries to add callback in SDAG to allow the reuse.
Differential D138883
[SelectionDAG][PowerPC] Memset reuse vector element for tail store tingwang on Nov 28 2022, 5:29 PM. Authored by
Details On PPC there are instructions to store element from vector(e.g. stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail constant in memset. This patch tries to add callback in SDAG to allow the reuse.
Diff Detail
Event TimelineComment Actions I think this is useful, but we should ensure we can get rid of the swap that this introduces (in a separate patch).
Comment Actions Update according to comments:
Comment Actions Realized I need to use PPCTTIImpl::getVectorInstrCost() API to determine the cost of instructions. I'm working on it now. Comment Actions Changes in this update: (2) Refactored logic inside getMemsetStores(). For PPC CombineStoreAndExtract is beneficial on some specific element index according to endianness and instruction (StoreAndExtract elements on other indexes requires vector permutation, which makes the whole idea less attractive). Since this is platform independent logic, I'm querying the cost for indexes, and pick the least cost to do the combine.
Comment Actions I would expect not only memset, some consecutive stores could also reuse the result of vector split, see https://godbolt.org/z/77aMvncb4. void foo(long a[3]) { a[0] = 12; a[1] = 12; a[2] = 12; } foo(long*): # @foo(long*) .quad .Lfunc_begin0 .quad .TOC.@tocbase .quad 0 .Lfunc_begin0: xxlxor 0, 0, 0 li 4, 12 xxsplti32dx 0, 1, 12 std 4, 16(3) stxv 0, 0(3) blr .long 0 .quad 0 We don't reuse the result of xxsplti32dx. Comment Actions Sure. The posted IR could be handled by DAGCombiner::mergeConsecutiveStores(), and I agree similar combine can be applied there. But for this case, memset stores are volatile, and DAGCombiner::getStoreMergeCandidates() does not accept volatile store currently.
Comment Actions Update patch as following test case pattern changed: Comment Actions While I am not principally against this approach, it doesn't really give me a great feeling going in this direction. The issue is more widespread than just memset/memcpy/memmove as Kai's example illustrates. Comment Actions I will try to get similar results by DAG combine. Thanks to Nemanja and Kai's insight! |