This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix LQ-STQ instructions to use correct offset and base
ClosedPublic

Authored by saghir on Jun 1 2022, 12:16 PM.

Details

Summary

This patch fixes the load and store quadword instructions on
PowerPC to use correct offset and base address.

Diff Detail

Event Timeline

saghir created this revision.Jun 1 2022, 12:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 12:16 PM
saghir requested review of this revision.Jun 1 2022, 12:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 12:16 PM
saghir added reviewers: Restricted Project, nemanjai, stefanp.Jun 1 2022, 12:17 PM
saghir updated this revision to Diff 433505.Jun 1 2022, 1:02 PM

Use string substitution blocks for matching register and offset.

lkail added a subscriber: lkail.EditedJun 1 2022, 3:59 PM

I think the root cause here is lq/stq is lacking x-form. There are LQX_PSEUDO and STQX_PSEUDO defined to assist accessing big offset inside stack/heap, but they are expanded pre-RA. I think we should expand them in PPCAtomicExpandPass(which is after prologepilog) and make lq/stq's x-form mapping to LQX_PSEUDO/STQX_PSEUDO.

I think the root cause here is lq/stq is lacking x-form. There are LQX_PSEUDO and STQX_PSEUDO defined to assist accessing big offset inside stack/heap, but they are expanded pre-RA. I think we should expand them in PPCAtomicExpandPass(which is after prologepilog) and make lq/stq's x-form mapping to LQX_PSEUDO/STQX_PSEUDO.

This really doesn't seem like an instruction we should be expanding prior to RA (i.e. there is no reason for it to be a PPCCustomInserterPseudo). It should probably just be a PPCPostRAExpPseudo. That would be expanded in PPCInstrInfo::expandPostRAPseudo(). Wouldn't that be late enough?

nemanjai requested changes to this revision.Jun 1 2022, 5:56 PM

In any case, this works if the offset fits but not if it doesn't. So we're just kicking the can down the road a little bit. We need to add the X-Form pseudo to the ImmToIdxMap and expand it at the appropriate time.

This revision now requires changes to proceed.Jun 1 2022, 5:56 PM
lkail added a comment.Jun 1 2022, 6:35 PM

That would be expanded in PPCInstrInfo::expandPostRAPseudo()

Looks also viable as long as we do it after prologepilog.

Note that we might need an additional register in outs to keep the result of the sum of two registers in memrr.

// handle x-form during isel.
def LQX_PSEUDO : PPCPostRAExpPseudo<(outs g8prc:$RTp, g8rc:$scratch),
                                    (ins memrr:$src), "#LQX_PSEUDO", []>;
def STQX_PSEUDO : PPCPostRAExpPseudo<(outs g8rc:$scratch),
                                     (ins g8prc:$RSp, memrr:$dst),
                                     "#STQX_PSEUDO", []>;
saghir updated this revision to Diff 436760.Jun 14 2022, 6:06 AM

Handle unaligned offsets and offsets that do not fit in the instruction.

lkail added a comment.EditedJun 14 2022, 6:57 AM

This fix looks making offset handling more complex. We can make it easier by add LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap.

ImmToIdxMap[PPC::LQ]  = PPC::LQX_PSEUDO;
ImmToIdxMap[PPC::STQ] = PPC::STQX_PSEUDO;

And expand PPC::LQX_PSEUDO and PPC::STQX_PSEUDO post RA(They are expanded in PPCTargetLowering::EmitInstrWithCustomInserter right now). Something like

$x6 = LQX_PSEUDO $x0, $x1
=>
$x3 = ADD $x0, $x1
$x6 = LQ 0($x3)

Thus we make LQ/STQ fits in how we are handling frame index now, no more code is needed to handling the offset for LQ/STQ(When LQ/STQ are selected by ISEL, alignment is guaranteed to be 16 bytes since they are for atomic operations).

saghir updated this revision to Diff 436955.EditedJun 14 2022, 2:52 PM

Added LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap, and rebased.

This fix looks making offset handling more complex. We can make it easier by add LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap.

ImmToIdxMap[PPC::LQ]  = PPC::LQX_PSEUDO;
ImmToIdxMap[PPC::STQ] = PPC::STQX_PSEUDO;

And expand PPC::LQX_PSEUDO and PPC::STQX_PSEUDO post RA(They are expanded in PPCTargetLowering::EmitInstrWithCustomInserter right now). Something like

$x6 = LQX_PSEUDO $x0, $x1
=>
$x3 = ADD $x0, $x1
$x6 = LQ 0($x3)

Thus we make LQ/STQ fits in how we are handling frame index now, no more code is needed to handling the offset for LQ/STQ(When LQ/STQ are selected by ISEL, alignment is guaranteed to be 16 bytes since they are for atomic operations).

Thanks for the suggestion. Adding LQX_PSEUDO and STQX_PSEUDO to ImmToIdxMap fixes both cases:

  • Offset fits in the instruction
  • Offset does not fit in the instruction

I have taken out the test case for the unaligned case since alignment is guaranteed to be 16 bytes for atomic operations.

nemanjai added inline comments.Jun 14 2022, 4:40 PM
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
13–19

No regex please. Lets produce actual instructions that show the registers as well as offsets.

llvm/test/CodeGen/PowerPC/LQ-STQ.ll
13

Same as above.

nemanjai added inline comments.Jun 14 2022, 4:55 PM
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
2

What is this test case actually testing? It produces the same code with and without this patch.

nemanjai added inline comments.Jun 14 2022, 5:05 PM
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
2

Sorry, this isn't actually true. However with this patch, the frame index isn't actually removed during frame index elimination. So that doesn't test the intent of this test case.

nemanjai requested changes to this revision.Jun 14 2022, 5:42 PM

If this IR is compiled with -mattr=+quadword-atomics, we emit #STQX_PSEUDO which is definitely not what we want:

%struct.StructA = type { [16 x i8] }

@s1 = dso_local global i128 324929342, align 16

; Function Attrs: mustprogress noinline nounwind optnone uwtable
define dso_local void @_Z4testv() #0 {
entry:
  %s2 = alloca %struct.StructA, align 16
  %s3 = alloca %struct.StructA, align 16
  %arr = alloca [997003 x i8], align 1
  %tmp = alloca %struct.StructA, align 16
  call void @llvm.memcpy.p0.p0.i64(ptr align 16 %tmp, ptr align 16 @s1, i64 16, i1 false)
  %0 = load i128, ptr %tmp, align 16
  store atomic i128 %0, ptr %s2 seq_cst, align 16
  ret void
}

; Function Attrs: argmemonly nofree nounwind willreturn
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #1

attributes #0 = { noinline optnone }
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
35

Please get rid of all of this and just compile with -mcpu=pwr10.

This revision now requires changes to proceed.Jun 14 2022, 5:42 PM
saghir updated this revision to Diff 437257.Jun 15 2022, 11:12 AM

Address review comments

saghir added inline comments.Jun 15 2022, 11:17 AM
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
28

This does not look right, I am looking into it.

saghir updated this revision to Diff 437345.Jun 15 2022, 2:35 PM

updated test case.

saghir updated this revision to Diff 437347.Jun 15 2022, 2:41 PM

update test case output

saghir added inline comments.Jun 15 2022, 2:43 PM
llvm/test/CodeGen/PowerPC/LQ-STQ-32bit-offset.ll
28

Updated test case and now it looks good.

lkail requested changes to this revision.Jun 15 2022, 3:45 PM
lkail added inline comments.
llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
496

This looks redundant. Test cases can pass without this change.

1589

Expanding two PPCCustomInserterPseudo instructions here looks odd.

I would expect these two instructions expanded after prologepilog, in postrapseudos. This is making us expand these two instructions twice in backend code. It's more adequate to make them PPCPostRAExpPseudo rather than PPCCustomInserterPseudo.

This revision now requires changes to proceed.Jun 15 2022, 3:45 PM
nemanjai added inline comments.Jun 15 2022, 4:56 PM
llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
496

It would be difficult to write a test case that ensures all registers are allocated and we need to rely on this in order to guarantee that the register scavenger is able to spill a register to be able to scavenge one.

1589

This code is the only place we convert D-Form instructions to X-Form instructions post-RA. Expanding them here makes sense because:

  1. We create them here to begin with
  2. Producing X-Forms is what this portion of the code already does
  3. We have the register scavenger here and a slot saved for the scavenger to ensure it can always find a free GPR. If we do this expansion somewhere later, there is no guarantee that the scavenger we acquire will be able to scavenge a register.
lkail accepted this revision as: lkail.Jun 15 2022, 6:23 PM

LGTM as nemanja has detailed explanation for my concern.

nemanjai accepted this revision.Jun 16 2022, 7:56 AM

LGTM. Thanks for the fix and all the updates.

This revision is now accepted and ready to land.Jun 16 2022, 7:56 AM
This revision was landed with ongoing or failed builds.Jun 16 2022, 8:47 AM
This revision was automatically updated to reflect the committed changes.