This is an archive of the discontinued LLVM Phabricator instance.

[Power9] Spill gprs to vector registers rather than stack
ClosedPublic

Authored by syzaara on Jun 29 2017, 9:12 AM.

Details

Summary

This patch updates register allocation to enable spilling gprs to vector registers rather than the stack. A new register class is added which is a super class of G8RC and VSFRC, called GPFPRC. The getLargestLegalSuperClass then returns GPFPRC for an input of G8RC. The patch also adds post RA pseudo instructions (VSRSPILL_LD, VSRSPILL_ST) for spilling a register of the new class used in LoadRegFromStackSlot and StoreRegToStackSlot. These are then expanded after register allocation to either a scalar or vector load.

Diff Detail

Event Timeline

syzaara created this revision.Jun 29 2017, 9:12 AM
inouehrs edited edge metadata.Jun 29 2017, 10:35 AM

Looks interesting.
This patch potentially increases the number of VSR save/restore in method prologue/epilogue (depending on which VSR is selected for spilling). Is my understanding correct?

lib/Target/PowerPC/PPCInstrInfo.cpp
52

I feel this name somewhat misleading. MTVSR instruction may be used for other purposes. NumGPRtoVSRSpill or somthing?

test/CodeGen/PowerPC/gpr-vsr-spill2.ll
25

What's the intention of this complicated test case without spills to VSR?

inouehrs added inline comments.Jun 29 2017, 10:55 AM
test/CodeGen/PowerPC/gpr-vsr-spill.ll
19

Actually, I cannot catch why we need spill here.
The inline-asm clobbers all gprs but r30 and r31. So why we don't just use r30 and r31 for %a and %b?

syzaara added inline comments.Jun 29 2017, 11:10 AM
test/CodeGen/PowerPC/gpr-vsr-spill2.ll
25

This case shows how a spill of the new reg class is handled. Here we spilled a GPR to GPFPR where the new reg was also a gpr. We then needed to spill the new GPFPR using either a scalar store or vector store depending on the allocated register.

syzaara added inline comments.Jun 29 2017, 11:31 AM
test/CodeGen/PowerPC/gpr-vsr-spill.ll
19

Yes, but we need a register to save the result of the add. The result register used for the add is r30 and so one of the input parameters is spilled.

hfinkel added inline comments.Jul 3 2017, 8:20 PM
test/CodeGen/PowerPC/gpr-vsr-spill2.ll
1

Having this as an IR-level test seems fragile. Could you make this into a (simpler) MIR test that shows the behavior?

nemanjai added inline comments.Jul 7 2017, 2:02 PM
lib/Target/PowerPC/PPCInstrInfo.cpp
1994

Well, this is a pseudo that requires being expandPostRAPseudo()-ed. Wouldn't we want to say return expandPostRAPseudo(MI) here?

nemanjai edited edge metadata.Jul 11 2017, 8:53 AM

Overall, I like the patch. Seems quite nice and simple. However, I'm really not a fan of the naming convention. It is not clear to me why someone would be expected to make the connection between "GPFPRC" and "VSRSPILL". I think those should use the same base name to make the connection clear. Furthermore, I don't really think you should convey what registers are in the register class, but what the register class is used for. Perhaps the class and the related artifacts should be something like SPILLTOVSRRC and SPILLTOVSR_LD, etc. Perhaps other reviewers can chime in here as well.

Also, I think an important opportunity is lost since we don't do this for GPRRC. Of course, it doesn't have to be done in this patch, but I think a comment including a FIXME indicating this limitation is in order. Then if this turns out to be a performance win, we can follow this up with a patch that handles the 32-bit registers as well.

syzaara updated this revision to Diff 108980.Jul 31 2017, 12:56 PM

Only spill to volatile vsrs as spilling to non-volatile increases prologue/epilogue and leads to performance degradation.

syzaara added inline comments.Jul 31 2017, 1:11 PM
test/CodeGen/PowerPC/gpr-vsr-spill2.ll
1

I tried to create an MIR case using this, but the limitation with MIR tests identified in https://reviews.llvm.org/D33562 with MachineFunctionInfo not being saved/dumped as part of emitting .mir leads to machine verified errors. I tried to change the global vars to local vars to get around this limitation. However, doing that no longer reproduces the narrowed case so I will leave this as an IR test.

Other than a few minor inline comments, this LGTM.

Perhaps some of the other reviewers want to chime in on this. Otherwise please address those nits and commit.

lib/Target/PowerPC/PPCInstrInfo.cpp
910

I think for most (all?) other conditions, we have the source first. Please stick to that convention here as well.

2002

Just a nit. The register you're spilling is the source and the stack slot you're spilling it to is the target. So calling it TargetReg is a bit misleading when it's a store. :)

lib/Target/PowerPC/PPCRegisterInfo.cpp
53

Nit: it's not actually called gp8rc but g8rc if I remember correctly.

323

// For Power9 we allow the user to enable GPR to vector spills.

Since we don't currently enable it by default even on Power9.

326

Please add a check for ELFv2 ABI. We are allowing spills only to the volatile VSR's, so we want to enable this only on the ABI where the VSR's we've selected are actually volatile.

lib/Target/PowerPC/PPCRegisterInfo.td
308

// Allow spilling GPR's into caller-saved VSR's.

test/CodeGen/PowerPC/gpr-vsr-spill2.ll
2

As implemented, this test case doesn't really test anything meaningful. It really just tests that there's a reg-to-reg copy (implemented as a move-register) followed by a spill of the target register. The two could be separated by arbitrary amount of code (including redefinition of the register).

Unless you can add more meaningful testing to this complicated test case, I would simply get rid of it.

nemanjai accepted this revision.Sep 18 2017, 6:30 AM

Forgot to accept :).

This revision is now accepted and ready to land.Sep 18 2017, 6:30 AM
syzaara updated this revision to Diff 116199.Sep 21 2017, 8:57 AM

Addressed review comments.

This revision was automatically updated to reflect the committed changes.