This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] [Peephole] fold frame offset by using index form to save add.
ClosedPublic

Authored by shchenz on Aug 15 2019, 7:36 PM.

Details

Summary

Before pass PEI, all stack variables has a placeholder 0 as its offset to frame base. And in pass PEI, the real offset is filled. Like:

Before PEI:

renamable $x6 = ADDI8 %stack.0, 0
renamable $x6 = ADD8 killed renamable $x6, renamable $x5
STW killed renamable $r3, 4, killed renamable $x6 :: (store 4 into %ir.14, !tbaa !2)

after PEI:

renamable $x6 = ADDI8 $x1, -80      ;;; 0 is replaced with -80
renamable $x6 = ADD8 killed renamable $x6, renamable $x5
STW killed renamable $r3, 4, killed renamable $x6 :: (store 4 into %ir.14, !tbaa !2)

After PEI there is a peephole opt opportunity to combine above -80 in ADDI8 with 4 in the STW to eliminate unnecessary ADD8.

So for above case, optimized instruction should be like:

renamable $x6 = ADDI8 $x1, -76
STWX killed renamable $r3, renamable $x5, killed renamable $x6 :: (store 4 into %ir.6, !tbaa !2)

This patch is to implement such peephole opt.

Diff Detail

Event Timeline

shchenz created this revision.Aug 15 2019, 7:36 PM
shchenz edited reviewers, added: steven.zhang; removed: qshanz.Aug 15 2019, 7:36 PM
jsji added a reviewer: Restricted Project.Aug 27 2019, 7:47 PM
stefanp requested changes to this revision.Aug 29 2019, 12:08 PM
stefanp added a subscriber: stefanp.

I like what this patch is doing. Overall, I only have some refactoring comments.

I do think that there are more opportunities here (however they may be a separate patch):

renamable $x6 = ADDI8 $x1, -80
renamable $x7 = ADD8 (killed?) renamable $x6, renamable $x5   ;; x6 here may or may not be killed...
STW killed renamable $r3, 4, killed renamable $x7 :: (store 4 into %ir.14, !tbaa !2)

Can also be turned into this:

[renamable $x6 = ADDI8 $x1, -80] ? ;; May or may not be there...
renamable $x7 = ADD8 $x1, renamable $x5
STW killed renamable $r3, -76, killed renamable $x7 :: (store 4 into %ir.14, !tbaa !2)

There are some advantages to using this approach too:

  1. You don't need an XForm for the Store. You keep the same D form instruction.
  2. You don't need to have the result of ADDI killed. If it is killed you can remove the instruction. If it is not killed you leave it but you still win even if you leave it because you have broken the dependency between the two instructions.

One disadvantage that I can see:

  1. Some loads/stores require the immediate to be aligned in some way which will limit which immediates you can use for them.

My suggestion is more of an addition to what you already have. I think what you have and what I was describing will catch different opportunities.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2539

nit:

This opt tries to convert following imm form to index form to save add for stack variables.

to

This opt tries to convert the following imm form to an index form to save an add for stack variables.
2600

You know at this point in the code that this MI is either an ADD4 or and ADD8.
You can create another function to check an operand and then pass in ADDMI->getOperand(1) and ADDMI->getOperand(2). That way you can get rid of the loop. I think the loop makes it look like there could be a number of operands but as far as I know the two adds only ever have 2 input operands.

2602

If you keep the loop this can be an early continue.

if (!ADDMI->getOperand(Idx).isKill())
  continue;

You are only really interested in operands that are killed.

2633

You already use the assumption that you have exactly two operands of interest here. :)

This revision now requires changes to proceed.Aug 29 2019, 12:08 PM
shchenz updated this revision to Diff 218399.Sep 2 2019, 9:06 PM

address @stefanp comments

I like what this patch is doing. Overall, I only have some refactoring comments.

I do think that there are more opportunities here (however they may be a separate patch):

renamable $x6 = ADDI8 $x1, -80
renamable $x7 = ADD8 (killed?) renamable $x6, renamable $x5   ;; x6 here may or may not be killed...
STW killed renamable $r3, 4, killed renamable $x7 :: (store 4 into %ir.14, !tbaa !2)

Can also be turned into this:

[renamable $x6 = ADDI8 $x1, -80] ? ;; May or may not be there...
renamable $x7 = ADD8 $x1, renamable $x5
STW killed renamable $r3, -76, killed renamable $x7 :: (store 4 into %ir.14, !tbaa !2)

There are some advantages to using this approach too:

  1. You don't need an XForm for the Store. You keep the same D form instruction.
  2. You don't need to have the result of ADDI killed. If it is killed you can remove the instruction. If it is not killed you leave it but you still win even if you leave it because you have broken the dependency between the two instructions.

One disadvantage that I can see:

  1. Some loads/stores require the immediate to be aligned in some way which will limit which immediates you can use for them.

My suggestion is more of an addition to what you already have. I think what you have and what I was describing will catch different opportunities.

Yes, this is indeed an opportunity for stack variables to save one ADDI. Thanks for pointing it out. This seems not easy to mix together with current patch since this patch will convert dform to xform while above opportunity does not require load/store form changing.

Maybe need to implement it in another patch.

I do have a couple additional comments.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2703

Since HasImmForm is only used here you can put it directly into the if statement.

However, I'm a little confused as to why you have to call instrHasImmForm and then check the return. It seems like you start with the immediate form and then move on to get the X-Form for it and then you ask to see if there is an immediate. Is it to fill in the III object with the correct flags?

If that is the case we need to have another function to set the flags correctly for an arbitrary immediate opcode. That, however is out of the scope of this patch.

llvm/lib/Target/PowerPC/PPCRegisterInfo.h
68

Instead of returning the map like this I would prefer to have a couple of functions along the lines of hasIndexFormforImm and getIndexFormFormImm. You don't need to use my function names but I think you get the idea. The register info owns that map and should be able to answer questions about it.

amyk added a subscriber: amyk.Sep 13 2019, 9:04 PM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2549

FraemBaseReg -> FrameBaseReg

2677

nit: Capitalize the W on wrong.

2706

This may be a silly question, but I am curious as to what this line means.

2713

MI.getOperand(III.OpNoForForwarding) is used a few times here, maybe it can be its own variable for more clarity?

shchenz updated this revision to Diff 220269.Sep 15 2019, 7:36 PM
shchenz marked 2 inline comments as done.

address Stefan and Amy Comments

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2703

Yes, the flags in III is what this patch needs later. Because the opcode handling in instrHasImmForm and ImmToIdxMap is not the same, so I do a little redundant checking for load/store opcodes to avoid functionality issue.

We already have a plan to redesign the mapping representation between index form and displacement form by using InstrMapping in td files. After that I think we can avoid such code.

2706

I think it means "Is the instruction summing the operand". In load/store instruction, it means we get the EA by summing its operands. For example, for instruction grA = ld Imm, grB, the EA is calculated by Imm + grB

Here we want to put OffsetImm of load/store to ADDI immediate operand, so we must ensure load/store is IsSummingOperands?

shchenz updated this revision to Diff 220325.Sep 16 2019, 6:29 AM

remove one useless comment.

Hi @stefanp @amyk , any more comments? Thanks.

I took this patch and applied it to master today and I noticed that one of the LIT tests fails.
The test is:

test-suite/MultiSource/Benchmarks/tramp3d-v4/

It segfaults with this patch and runs fine with out it.
I took a quick look at it and it seems that in this basic block:

; predecessors: %bb.11, %bb.12, %bb.13
  successors: %bb.21(0x80000000); %bb.21(100.00%)
  liveins: $r6, $r7, $r9, $x4, $x5, $x10, $x12, $x30, $r8
  renamable $x3 = ADDI8 $x1, 32
  STW killed renamable $r8, 32, $x1 :: (store 4 into %ir.arrayidx.i.i.i.i.i.i.i.i424, align 8), (store 4 into %ir.129, align 8)
  STW killed renamable $r6, 36, $x1 :: (store 4 into %ir.132)
  STW killed renamable $r7, 44, $x1 :: (store 4 into %ir.134)
  STW killed renamable $r9, 52, $x1 :: (store 4 into %ir.136)
  renamable $x3 = ADD8 killed renamable $x3, killed renamable $x5
  renamable $x5 = nsw ADD8 killed renamable $x10, killed renamable $x12
  renamable $r3 = LWZ 4, killed renamable $x3 :: (load 4 from %ir.137)
  renamable $x5 = nsw SUBF8 renamable $x4, killed renamable $x5
  STW8 killed renamable $x4, 24, renamable $x30 :: (store 4 into %ir.141)
  renamable $r3 = ADD4 killed renamable $r3, killed renamable $r5, implicit $x5
  B %bb.21

We replace

renamable $x3 = ADDI8 $x1, 32
renamable $x3 = ADD8 killed renamable $x3, killed renamable $x5
renamable $r3 = LWZ 4, killed renamable $x3 :: (load 4 from %ir.137)

with:

renamable $x3 = ADDI8 $x1, 36
renamable $r3 = LWZX killed $x5, killed $x3 :: (load 4 from %ir.137)

However, it looks like x5 is redefined before the load so using it in the LWZX is not safe because it is not the same x5 as before.

shchenz planned changes to this revision.Oct 8 2019, 2:07 AM

I took this patch and applied it to master today and I noticed that one of the LIT tests fails.

Sorry, I didn't do a LIT test recently, I really should do it.

shchenz updated this revision to Diff 224003.Oct 9 2019, 2:20 AM

ScaleReg can not redefined between ADD and Imm instruction.

stefanp accepted this revision.Oct 24 2019, 1:08 PM

LGTM.
Sorry for taking so long to get to this.

This revision is now accepted and ready to land.Oct 24 2019, 1:08 PM

Thanks for your reviewing. @stefanp

This revision was automatically updated to reflect the committed changes.