The VSX D-form load/store instructions of POWER9 require the offset to be a multiple of 16, and a helper, `isOffsetMultipleOf`, is used to check this.
So far, it handles the FrameIndex + offset case but not the FrameIndex-without-offset case. Because of this, we miss opportunities to use D-form instructions when accessing an object or array allocated on the stack.
For example, an X-form store (stxvx) is used for int a[4] = {0}; instead of stxv. For larger arrays, a D-form instruction is not used when accessing the first 16 bytes. Using D-form instructions reduces the instruction count as well as register pressure.
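The intended behavior can be illustrated with a small standalone model. This is only a sketch, not the code in PPCISelDAGToDAG.cpp: `AddrModel`, `frameObject`, `frameObjectPlus`, and `isOffsetMultipleOfModel` are hypothetical names, and the slot-alignment check is an assumption about what a frame-index address needs in order for its final displacement to stay a multiple of 16.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an address operand. This is NOT an LLVM type.
struct AddrModel {
  bool IsFrameIndex;   // base of the address is a stack object
  unsigned SlotAlign;  // alignment of that object in bytes (frame index only)
  bool HasOffset;      // a constant is added to the base
  int64_t Offset;      // the constant, meaningful only if HasOffset
};

// Hypothetical helpers that build the two address shapes discussed above.
AddrModel frameObject(unsigned Align) { return {true, Align, false, 0}; }
AddrModel frameObjectPlus(unsigned Align, int64_t Off) {
  return {true, Align, true, Off};
}

// Returns true if the modelled address can use an instruction whose
// displacement must be a multiple of Val.
bool isOffsetMultipleOfModel(const AddrModel &A, unsigned Val) {
  assert(Val != 0 && "Val must be non-zero");
  if (A.IsFrameIndex) {
    // Assumption: an under-aligned slot cannot be proven suitable,
    // because its final position is not known yet.
    if (A.SlotAlign % Val != 0)
      return false;
    // FrameIndex without offset: the implicit offset is 0.
    if (!A.HasOffset)
      return true;
  }
  if (A.HasOffset) {
    // The displacement must fit in a signed 16-bit immediate.
    bool Fits = A.Offset >= INT16_MIN && A.Offset <= INT16_MAX;
    return Fits && (A.Offset % Val) == 0;
  }
  return false;
}
```

In this model, `frameObject(16)` is accepted for `Val == 16` in the same way as `frameObjectPlus(16, 32)`, which corresponds to the case the patch adds.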
lib/Target/PowerPC/PPCISelDAGToDAG.cpp, line 3961: Can we just combine this with the above? Perhaps with something like:
lib/Target/PowerPC/PPCISelDAGToDAG.cpp, line 3961: Thanks for the advice. Is this better?
I assume that there are cases where the frame index that doesn't have an offset actually ends up being an address with some displacement and that's the purpose of this patch. What I mean is that I assume that this will sometimes lead to something like:
    li 4, 16
    stxvx 2, 3, 4
Since we'd never have something like:
    li 4, 0
    stxvx 2, 3, 4
as that is simply:
stxvx 2, 0, 3
If we have cases of the former, please add a test case (i.e. FrameIndex is used without an offset, but we end up needing a non-zero immediate). If we have a case of the latter, that should be fixed more generally (i.e. we should never have a zero input to an instruction where zero in a register field means literal zero).
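The "zero in a register field means literal zero" rule can be sketched as a tiny model of the X-form effective-address calculation. The function name and the register-file array are hypothetical and for illustration only. The rule itself, that an RA field of 0 contributes the constant 0 rather than the contents of r0, is the one the comment above relies on.

```cpp
#include <array>
#include <cstdint>

// Effective address of an X-form access such as `stxvx XS, RA, RB`.
// An RA field of 0 denotes the constant 0, not the contents of r0.
uint64_t xFormEffectiveAddress(const std::array<uint64_t, 32> &GPR,
                               unsigned RA, unsigned RB) {
  uint64_t Base = (RA == 0) ? 0 : GPR[RA];
  return Base + GPR[RB];
}
```

Under this model, `stxvx 2, 0, 3` addresses exactly the location held in r3, so a preceding `li 4, 0` would be redundant.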
Even if the offset into the stack object is zero, the offset from the stack pointer ($x1) is not zero. So we do not have li 4, 0 before stxvx; instead we have something like addi 4, 1, 32.
For example, after instruction selection without this patch, the generated code for int a[12] = {0}; looks like:
    %1:g8rc = ADDI8 %stack.0.a, 0
    STXVX %0:vsrc, $zero8, %1:g8rc :: (store 16 into %ir.0)
    STXV %0:vsrc, 32, %stack.0.a :: (store 16 into %ir.0 + 32)
    STXV %0:vsrc, 16, %stack.0.a :: (store 16 into %ir.0 + 16)
Then %stack.0.a is resolved in the Prologue/Epilogue Insertion & Frame Finalization pass and the code becomes
    renamable $x3 = ADDI8 $x1, 32
    STXVX renamable $vsl0, $zero8, renamable $x3 :: (store 16 into %ir.0)
    STXV renamable $vsl0, 64, $x1 :: (store 16 into %ir.0 + 32)
    STXV killed renamable $vsl0, 48, $x1 :: (store 16 into %ir.0 + 16)
With this patch, these snippets become
    STXV %0:vsrc, 32, %stack.0.a :: (store 16 into %ir.0 + 32)
    STXV %0:vsrc, 16, %stack.0.a :: (store 16 into %ir.0 + 16)
    STXV %0:vsrc, 0, %stack.0.a :: (store 16 into %ir.0)
and
    STXV renamable $vsl0, 64, $x1 :: (store 16 into %ir.0 + 32)
    STXV renamable $vsl0, 48, $x1 :: (store 16 into %ir.0 + 16)
    STXV killed renamable $vsl0, 32, $x1 :: (store 16 into %ir.0)
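The displacement arithmetic in these snippets can be checked with a short sketch. The function names are hypothetical. The object position of 32 bytes from $x1 is taken from the ADDI8 $x1, 32 in the output without the patch, and the encodability test assumes the stxv displacement must be a multiple of 16 that fits in a signed 16-bit value.

```cpp
#include <cstdint>

// Displacement from the stack pointer once the frame object's position
// is known: object position plus the offset within the object.
int64_t resolveFrameDisp(int64_t ObjOffsetFromSP, int64_t OffsetInObject) {
  return ObjOffsetFromSP + OffsetInObject;
}

// True if the displacement can be used by stxv under the stated
// assumption (multiple of 16, signed 16-bit range).
bool isEncodableStxvDisp(int64_t Disp) {
  return Disp >= INT16_MIN && Disp <= INT16_MAX && Disp % 16 == 0;
}
```

With the object at 32 bytes from $x1, offsets 0, 16, and 32 within the array resolve to 32, 48, and 64, matching the three STXV instructions above. If the object were placed at an offset that is not a multiple of 16, such as 40, an in-object offset of 0 would no longer yield an encodable displacement.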
Can we just combine this with the above? Perhaps with something like:
    if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(
            AddrOp.getOpcode() == ISD::ADD ? AddrOp.getOperand(0) : AddrOp))