This is an archive of the discontinued LLVM Phabricator instance.

[X86] Allow folding of reloads from stack slots when loading a subreg of the spilled reg
ClosedPublic

Authored by mkuper on Nov 10 2016, 1:18 PM.

Details

Summary

Currently we don't support subregs in InlineSpiller::foldMemoryOperand() because targets may not deal with them correctly.

It seems that X86, at least, already does the right thing when folding a load from a stack slot when it only needs the low (e.g. not "ah") subreg.

This fixes PR30832.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 77542.Nov 10 2016, 1:18 PM
mkuper retitled this revision from to [X86] Allow folding of reloads from stack slots when loading a subreg of the spilled reg.
mkuper updated this object.
mkuper added reviewers: RKSimon, craig.topper, zvi, MatzeB, wmi.
mkuper added a subscriber: llvm-commits.
mkuper updated this revision to Diff 77545.Nov 10 2016, 1:20 PM

Add accidentally missing negative test.

MatzeB added inline comments.Nov 10 2016, 2:48 PM
include/llvm/Target/TargetInstrInfo.h
820–828 ↗(On Diff #77545)

Shouldn't we better fix the existing foldMemoryOperand() implementations in the targets instead of adding more API? A "fix" in the sense to actually check the SubReg and return nullptr if it is set would be enough.

mkuper added inline comments.Nov 10 2016, 3:10 PM
include/llvm/Target/TargetInstrInfo.h
820–828 ↗(On Diff #77545)

I'm somewhat wary of changing the default behavior to be more permissive.

It would be very easy to "fix" all the in-tree targets by sinking the SubReg check into their foldMemoryOperandImpl(), but it has the potential of introducing really nasty bugs for downstream out-of-tree targets.

craig.topper added inline comments.Nov 10 2016, 11:13 PM
lib/Target/X86/X86InstrInfo.h
387 ↗(On Diff #77545)

Fix the formatting here to line up the arguments.

zvi added inline comments.Nov 10 2016, 11:42 PM
lib/Target/X86/X86InstrInfo.h
383 ↗(On Diff #77545)

Can you please clarify the last sentence in the comment?

mkuper added inline comments.Nov 11 2016, 12:35 AM
lib/Target/X86/X86InstrInfo.h
383 ↗(On Diff #77545)

Basically, this is the same as the LoadMI parameter to InlineSpiller::foldMemoryOperand(). There it's described as "Load instruction to use instead of stack slot when non-null." but I'm not sure that's any clearer.

There are 3 cases InlineSpiller::foldMemoryOperand() supports:

  1. Fold a store to a stack slot into a def.
  2. Fold a load from a stack slot into a use.
  3. Fold an existing load instruction into a use.

Cases (2) and (3) are distinguished by actually passing the load instruction we want to fold through the API.
I can try to write that down more concisely (but hopefully in a clearer way than it is right now).

Note that this doesn't actually use LoadMI, just verifies it's non-null, but, conceivably, it could be useful. Do you think it would be better if I just pass a bool here for now, and change the API if we eventually need the instruction?

387 ↗(On Diff #77545)

Whoops, will do, thanks.

RKSimon edited edge metadata.Nov 11 2016, 1:48 AM

Nice! I have an outstanding poor codegen issue with the inability to split+fold scalarization cases, do you think we would be able to expand this patch in the future to handle those cases?

__m128i popcnt1(__m128i *in) {
  return (__m128i) { __builtin_popcountll(in[0][0]), __builtin_popcountll(in[0][1]) };
}

popcnt1(long long __vector(2)*):
        vmovdqu (%rdi), %xmm0
        vmovq   %xmm0, %rax
        vpextrq $1, %xmm0, %rcx
        popcntq %rax, %rax
        popcntq %rcx, %rcx
        vmovq   %rcx, %xmm0
        vmovq   %rax, %xmm1
        vpunpcklqdq     %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm0[0]
        retq

Which would be better as:

popcnt1(long long __vector(2)*):
        popcntq (%rdi), %rax
        popcntq 8(%rdi), %rcx
        vmovq   %rcx, %xmm0
        vmovq   %rax, %xmm1
        vpunpcklqdq     %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm0[0]
        retq
test/CodeGen/X86/partial-fold.ll
1 ↗(On Diff #77545)

Add i686 tests as well and regenerate with utils\update_llc_test_checks.py

Nice! I have an outstanding poor codegen issue with the inability to split+fold scalarization cases, do you think we would be able to expand this patch in the future to handle those cases?

I don't know the spill/reload code well enough to even tell if it's happening for similar reasons or not.
But CC me on the PR, I'll take a look.

test/CodeGen/X86/partial-fold.ll
1 ↗(On Diff #77545)

Sure, I'll add i686.
I'd prefer not to use update_llc_test_checks here - there's a *lot* of push/pop noise because of forcing all the regs to be used.

Nice! I have an outstanding poor codegen issue with the inability to split+fold scalarization cases, do you think we would be able to expand this patch in the future to handle those cases?

I don't know the spill/reload code well enough to even tell if it's happening for similar reasons or not.
But CC me on the PR, I'll take a look.

PR30986

test/CodeGen/X86/partial-fold.ll
1 ↗(On Diff #77545)

That's fine - it might mean that you have to split this into 32/64 versions of the file as the existing inline asm will fail on i686

wmi added inline comments.Nov 11 2016, 9:47 AM
lib/CodeGen/TargetInstrInfo.cpp
547 ↗(On Diff #77545)

In which case we will have (SubRegSize % 8) != 0?

lib/Target/X86/X86InstrInfo.cpp
6281 ↗(On Diff #77545)

Why only folding from stack-slot is supported but folding from load is not?

mkuper added inline comments.Nov 11 2016, 9:51 AM
lib/CodeGen/TargetInstrInfo.cpp
547 ↗(On Diff #77545)

It will never happen for x86, but it's possible in theory, and this is target-independent code.

lib/Target/X86/X86InstrInfo.cpp
6281 ↗(On Diff #77545)

Because it's the simple case. :-)
I need to make sure the other case makes sense (and find a test-case), and assuming it does, I'll send a follow-up patch.

I'll add a TODO.

mkuper updated this revision to Diff 77632.Nov 11 2016, 9:58 AM
mkuper edited edge metadata.

(Hopefully) addressed comments.

zvi edited edge metadata.Nov 12 2016, 11:31 AM

Thanks for improving the documentation, Michael. LGTM.

Is it worth adding the partial-fold test files to trunk with current codegen?

lib/CodeGen/TargetInstrInfo.cpp
553 ↗(On Diff #77632)

Do we need an assert(MemSize && "Empty stack slot") here?

Is it worth adding the partial-fold test files to trunk with current codegen?

I think that's rather less useful when not using update_llc_test_checks, since you don't see the full diff anyway. And, as I wrote above, I think update_llc_test_checks really does more harm than good in this case.
But I can add it anyway with just the relevant checks if you want the diff to be more explicit. What do you think?

lib/CodeGen/TargetInstrInfo.cpp
553 ↗(On Diff #77632)

Sure, I'll add one.

qcolombet added inline comments.Nov 14 2016, 1:10 PM
include/llvm/Target/TargetInstrInfo.h
821 ↗(On Diff #77632)

I am not sure I understand the intent. I believe this is just a problem of wording.
Basically, are you referring to this case:
v1 = ld

op v1.sub

or this case:
v1.sub = ld

op v1.sub

I believe this is the former and adding the example would help :).

mkuper added inline comments.Nov 14 2016, 1:32 PM
include/llvm/Target/TargetInstrInfo.h
821 ↗(On Diff #77632)

Yes, it's the former, I'll add an example, thanks!

mkuper updated this revision to Diff 77878.Nov 14 2016, 1:50 PM
mkuper edited edge metadata.

Updated per comments, and rebased to show test diffs.

Also moved the code that computes MemSize to be before calling into the target's foldMemoryOperandImpl() because, apparently, the X86 implementation feels free to change MI, so accessing MI.getOperand(Idx) after the Impl call may produce unexpected results.

gberry added a subscriber: gberry.Nov 22 2016, 11:03 AM
MatzeB edited edge metadata.Nov 22 2016, 3:09 PM

What about changing isSubregFoldable() to not take any parameters and just return true/false, so all the decision logic can stay in foldMemoryOperand()? At least that keeps the newly added API as simple as possible.
It also seems to match the usage pattern in https://reviews.llvm.org/D27002.

lib/CodeGen/TargetInstrInfo.cpp
574 ↗(On Diff #77632)

(can't remove this comment because of phabricator bug http://llvm.org/PR30572)

What about changing isSubregFoldable() to not take any parameters and just return true/false, so all the decision logic can stay in foldMemoryOperand()?

Do you mean in the target's foldMemoryOperandImpl()? (The decision needs to be target-dependent)
If you do, I think that can work. Not sure it's going to be a net improvement, but I'll post a patch.

What about changing isSubregFoldable() to not take any parameters and just return true/false, so all the decision logic can stay in foldMemoryOperand()?

Do you mean in the target's foldMemoryOperandImpl()? (The decision needs to be target-dependent)
If you do, I think that can work. Not sure it's going to be a net improvement, but I'll post a patch.

Yep, I would expect that to make the code (slightly) easier to follow as the logic is not split accross two callbacks anymore.

mkuper updated this revision to Diff 78986.Nov 22 2016, 4:50 PM
mkuper edited edge metadata.

Updated with suggested API.

MatzeB accepted this revision.Nov 22 2016, 5:46 PM
MatzeB edited edge metadata.

Thanks.

I wanted to propose to only perform the check in the `foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI, unsigned OpNum,

ArrayRef<MachineOperand> MOs, MachineBasicBlock::iterator InsertPt, unsigned Size, unsigned Align, bool AllowCommute)`

variant, as the other two finally call into that anyway. However I see a MI.setDesc(get(NewOpc)); and I currently fail to reason about whether that common function may actually still abort after we changed the instruction type (soo much code I'm not that familiar with...).

So this LGTM as is.

This revision is now accepted and ready to land.Nov 22 2016, 5:46 PM
This revision was automatically updated to reflect the committed changes.