This patch fixes the load and store quadword instructions on
PowerPC to use correct offset and base address.
Details
- Reviewers
nemanjai stefanp lkail - Group Reviewers
Restricted Project - Commits
- rG3d259a82da3e: [PowerPC] Fix LQ-STQ instructions to use correct offset and base
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I think the root cause here is lq/stq is lacking x-form. There are LQX_PSEUDO and STQX_PSEUDO defined to assist accessing big offset inside stack/heap, but they are expanded pre-RA. I think we should expand them in PPCAtomicExpandPass(which is after prologepilog) and make lq/stq's x-form mapping to LQX_PSEUDO/STQX_PSEUDO.
This really doesn't seem like an instruction we should be expanding prior to RA (i.e. there is no reason for it to be a PPCCustomInserterPseudo). It should probably just be a PPCPostRAExpPseudo. That would be expanded in PPCInstrInfo::expandPostRAPseudo(). Wouldn't that be late enough?
In any case, this works if the offset fits but not if it doesn't. So we're just kicking the can down the road a little bit. We need to add the X-Form pseudo to the ImmToIdxMap and expand it at the appropriate time.
That would be expanded in PPCInstrInfo::expandPostRAPseudo()
Looks also viable as long as we do it after prologepilog.
Note that we might need an additional register in outs to keep the result of the sum of two registers in memrr.
// handle x-form during isel. def LQX_PSEUDO : PPCPostRAExpPseudo<(outs g8prc:$RTp, g8rc:$scratch), (ins memrr:$src), "#LQX_PSEUDO", []>; def STQX_PSEUDO : PPCPostRAExpPseudo<(outs g8rc:$scratch), (ins g8prc:$RSp, memrr:$dst), "#STQX_PSEUDO", []>;
This fix looks making offset handling more complex. We can make it easier by add LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap.
ImmToIdxMap[PPC::LQ] = PPC::LQX_PSEUDO; ImmToIdxMap[PPC::STQ] = PPC::STQX_PSEUDO;
And expand PPC::LQX_PSEUDO and PPC::STQX_PSEUDO post RA(They are expanded in PPCTargetLowering::EmitInstrWithCustomInserter right now). Something like
$x6 = LQX_PSEUDO $x0, $x1 => $x3 = ADD $x0, $x1 $x6 = LQ 0($x3)
Thus we make LQ/STQ fits in how we are handling frame index now, no more code is needed to handling the offset for LQ/STQ(When LQ/STQ are selected by ISEL, alignment is guaranteed to be 16 bytes since they are for atomic operations).
Thanks for the suggestion. Adding LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap fixes both cases:
- Offset fits in the instruction
- Offset does not fit in the instruction
I have taken out the test case for the unaligned case since alignment is guaranteed to be 16 bytes for atomic operations.
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll | ||
---|---|---|
2 | What is this test case actually testing? It produces the same code with and without this patch. |
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll | ||
---|---|---|
2 | Sorry, this isn't actually true. However with this patch, the frame index isn't actually removed during frame index elimination. So that doesn't test the intent of this test case. |
If this IR is compiled with -mattr=+quadword-atomics, we emit #STQX_PSEUDO which is definitely not what we want:
%struct.StructA = type { [16 x i8] } @s1 = dso_local global i128 324929342, align 16 ; Function Attrs: mustprogress noinline nounwind optnone uwtable define dso_local void @_Z4testv() #0 { entry: %s2 = alloca %struct.StructA, align 16 %s3 = alloca %struct.StructA, align 16 %arr = alloca [997003 x i8], align 1 %tmp = alloca %struct.StructA, align 16 call void @llvm.memcpy.p0.p0.i64(ptr align 16 %tmp, ptr align 16 @s1, i64 16, i1 false) %0 = load i128, ptr %tmp, align 16 store atomic i128 %0, ptr %s2 seq_cst, align 16 ret void } ; Function Attrs: argmemonly nofree nounwind willreturn declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #1 attributes #0 = { noinline optnone }
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll | ||
---|---|---|
35 | Please get rid of all of this and just compile with -mcpu=pwr10. |
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll | ||
---|---|---|
28 | This does not look right, I am looking into it. |
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll | ||
---|---|---|
28 | Updated test case and now it looks good. |
llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp | ||
---|---|---|
496 | This looks redundant. Test cases can pass without this change. | |
1589 | Expanding two PPCCustomInserterPseudo instructions here looks odd. I would expect these two instructions expanded after prologepilog, in postrapseudos. This is making us expand these two instructions twice in backend code. It's more adequate to make them PPCPostRAExpPseudo rather than PPCCustomInserterPseudo. |
llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp | ||
---|---|---|
496 | It would be difficult to write a test case that ensures all registers are allocated and we need to rely on this in order to guarantee that the register scavenger is able to spill a register to be able to scavenge one. | |
1589 | This code is the only place we convert D-Form instructions to X-Form instructions post-RA. Expanding them here makes sense because:
|
This looks redundant. Test cases can pass without this change.