This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset
ClosedPublic

Authored by inouehrs on Mar 30 2018, 1:34 AM.

Details

Summary

VSX D-form load/store instructions of POWER9 require the offset be a multiple of 16 and a helper`isOffsetMultipleOf` is used to check this.
So far, it handles FrameIndex + offset case, but not handling FrameIndex without offset case. Due to this, we are missing opportunities to exploit D-form instructions when accessing an object or array allocated on stack.
For example, x-form store (stxvx) is used for int a[4] = {0}; instead of stxv. For larger arrays, D-form instruction is not used when accessing the first 16-byte. Using D-form instructions reduces instructions as well as reducing register pressure.

Diff Detail

Repository
rL LLVM

Event Timeline

inouehrs created this revision.Mar 30 2018, 1:34 AM
inouehrs retitled this revision from [PowerPC] allow D-form load/store when accessing FrameIndex without offset to [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset .Mar 30 2018, 1:36 AM
nemanjai added inline comments.Apr 3 2018, 3:15 PM
lib/Target/PowerPC/PPCISelDAGToDAG.cpp
3956 ↗(On Diff #140390)

Can we just combine this with the above? Perhaps with something like:
if (FrameIndexSDNode *FI == dyn_cast<FrameIndexSDNode>(AddrOp.getOpcode() == ISD::ADD ? AddrOp.getOperand(0) : AddrOp))

inouehrs updated this revision to Diff 140904.Apr 3 2018, 10:30 PM
inouehrs marked an inline comment as done.
inouehrs added inline comments.
lib/Target/PowerPC/PPCISelDAGToDAG.cpp
3956 ↗(On Diff #140390)

Thanks for the advice. Is this better?

I assume that there are cases where the frame index that doesn't have an offset actually ends up being an address with some displacement and that's the purpose of this patch. What I mean is that I assume that this will sometimes lead to something like:

li 4, 16
stxvx 2, 3, 4

Since we'd never have something like:

li 4, 0
stxvx 2, 3, 4

as that is simply:
stxvx 2, 0, 3

If we have cases of the former, please add a test case (i.e. FrameIndex is used without an offset, but we end up needing a non-zero immediate). If we have a case of the latter, that should be fixed more generally (i.e. we should never have a zero input to an instruction where zero in a register field means literal zero).

inouehrs marked an inline comment as done.EditedApr 4 2018, 11:06 PM

Even the offset to the stack object is zero, the offset to stack pointer ($x1) is not zero. So we do not have li 4, 0 before stxvx, but we have something like addi 4, 1, 32.

For example, after the instruction selection without this patch, generated code for int a[12] = {0}; looks like:

%1:g8rc = ADDI8 %stack.0.a, 0
STXVX %0:vsrc, $zero8, %1:g8rc :: (store 16 into %ir.0)
STXV %0:vsrc, 32, %stack.0.a :: (store 16 into %ir.0 + 32)
STXV %0:vsrc, 16, %stack.0.a :: (store 16 into %ir.0 + 16)

Then %stack.0.a is resolved in Prologue/Epilogue Insertion & Frame Finalization pass and the code become

renamable $x3 = ADDI8 $x1, 32
STXVX renamable $vsl0, $zero8, renamable $x3 :: (store 16 into %ir.0)
STXV renamable $vsl0, 64, $x1 :: (store 16 into %ir.0 + 32)
STXV killed renamable $vsl0, 48, $x1 :: (store 16 into %ir.0 + 16)

With this patch these snippets become

STXV %0:vsrc, 32, %stack.0.a :: (store 16 into %ir.0 + 32)
STXV %0:vsrc, 16, %stack.0.a :: (store 16 into %ir.0 + 16)
STXV %0:vsrc, 0, %stack.0.a :: (store 16 into %ir.0)

and

STXV renamable $vsl0, 64, $x1 :: (store 16 into %ir.0 + 32)
STXV renamable $vsl0, 48, $x1 :: (store 16 into %ir.0 + 16)
STXV killed renamable $vsl0, 32, $x1 :: (store 16 into %ir.0)
inouehrs updated this revision to Diff 141107.Apr 4 2018, 11:26 PM
  • make unit tests more strict
nemanjai accepted this revision.Apr 5 2018, 6:59 AM

LGTM.

This revision is now accepted and ready to land.Apr 5 2018, 6:59 AM
This revision was automatically updated to reflect the committed changes.