This is an archive of the discontinued LLVM Phabricator instance.

[X86] Recognize a splat of negate in isFNEG
ClosedPublic

Authored by eraman on Jun 21 2018, 5:54 PM.

Details

Summary

Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB) instructions in more cases. For example, the following sequence

a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)

generates an fsub and fma without this patch and an fnma with this change.

Diff Detail

Repository
rL LLVM

Event Timeline

eraman created this revision.Jun 21 2018, 5:54 PM

I've updated avx2-fma-fneg-combine.ll to get rid of those comment changes please can you rebase?

eraman updated this revision to Diff 152475.Jun 22 2018, 7:26 AM

Rebase after r335342

RKSimon added inline comments.Jun 22 2018, 8:21 AM
test/CodeGen/X86/avx2-fma-fneg-combine.ll
131 ↗(On Diff #152475)

Sorry I should have asked you to commit this test to trunk as well so this patch shows the codegen diff

Does this patch deal with the case if we negate the scalar before doing the splat?

define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {

%t0 = fsub float -0.0, %a
%t1 = insertelement <8 x float> undef, float %t0, i32 0
%t2 = shufflevector <8 x float> %t1, <8 x float> undef, <8 x i32> zeroinitializer
%t3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)
ret <8 x float> %t3

}

That transform was suggested as a canonicalization in:
https://bugs.llvm.org/show_bug.cgi?id=37463

No, this doesn't deal with this. Should I start from the negate and push it
down if it is transitively used by fma?

eraman updated this revision to Diff 152511.Jun 22 2018, 10:38 AM

Rebase after adding the test case at r335367

No, this doesn't deal with this. Should I start from the negate and push it
down if it is transitively used by fma?

That seems backwards. Why don't we start from the fma and pattern match an fneg of 1 of its operands (looking through a splat as needed)?

Looks like there's already logic in place for this in X86's combineFMA() with a possibly related FIXME comment?

RKSimon added inline comments.Jun 22 2018, 11:45 AM
lib/Target/X86/X86ISelLowering.cpp
31185 ↗(On Diff #152511)

Please move the isFNEG as a NFC commit

31210 ↗(On Diff #152511)

Can this be written as a llvm::any_of pattern?

31214 ↗(On Diff #152511)

Early out if !SVOp || !SVOp->isSplat()

31215 ↗(On Diff #152511)

Don't use auto for non-obvious (casts etc.) cases.

31218 ↗(On Diff #152511)

Its unlikely that Op1 isn't undef, but if you have cases of this you could handle both ops but testing SVOp->getSplatIndex()

I am going to take the approach suggested by Sanjay and expand isFNEG to
handle splat of a negated scalar. Then, the current combineFMA should take
care of the rest. I got the isFNEG to work with shuffle (and work on the
current test case) but haven't yet handled the insertelement case. I am out
traveling for the next ten days and will send a revised patch after I am
back.

eraman updated this revision to Diff 154035.Jul 3 2018, 6:16 PM

Implement this by expanding the patterns generated by isFNEG

eraman retitled this revision from [X86] Recognize an fnma in the presence of an intervening shuffle. to [X86] Recognize a splat of negate in isFNEG.Jul 3 2018, 6:17 PM
eraman edited the summary of this revision. (Show Details)
RKSimon added inline comments.Jul 4 2018, 5:37 AM
test/CodeGen/X86/avx2-fma-fneg-combine.ll
139 ↗(On Diff #154035)

Please can you commit this test to trunk so the patch shows the codegen diff?

eraman updated this revision to Diff 154342.Jul 5 2018, 6:00 PM
eraman edited the summary of this revision. (Show Details)

Rebase after r336404

eraman marked an inline comment as done.Jul 9 2018, 4:16 PM
eraman added inline comments.
test/CodeGen/X86/avx2-fma-fneg-combine.ll
139 ↗(On Diff #154035)

Forgot to respond here, but I have committed the test and rebased the patch after the commit.

RKSimon added inline comments.Jul 10 2018, 3:41 AM
lib/Target/X86/X86ISelLowering.cpp
36774 ↗(On Diff #154342)
if (auto *SVOp = dyn_cast<ShuffleVectorSDNode>(Op.getNode()))
36849 ↗(On Diff #154342)
if (Opc == ISD::FSUB)
  std::swap(Op0, Op1);
return Negate(Op0, Op1);
36856 ↗(On Diff #154342)

Now that we're creating new nodes inside isFNEG() I wonder if we should try to avoid repetition of isFNEG() in combineXor etc. - Maybe split combineFneg into combineFneg and combineFnegPatterns which take the isFNEG result?

eraman marked 3 inline comments as done.Jul 11 2018, 12:17 PM
eraman added inline comments.
lib/Target/X86/X86ISelLowering.cpp
36856 ↗(On Diff #154342)

Instead of splitting combineFneg, I have removed the assert below and returned early if isFNEG returns an empty node.

eraman updated this revision to Diff 155045.Jul 11 2018, 12:18 PM

Address review comments.

RKSimon added inline comments.Jul 16 2018, 9:25 AM
lib/Target/X86/X86ISelLowering.cpp
36829 ↗(On Diff #155045)

auto *BV = dyn_cast<BuildVectorSDNode>(Op1)

36834 ↗(On Diff #155045)

Constant *C

eraman marked 2 inline comments as done.Jul 16 2018, 4:02 PM
eraman added inline comments.
lib/Target/X86/X86ISelLowering.cpp
36829 ↗(On Diff #155045)

I have also changed the following line to
if (auto *CN = BV->getConstantFPSplatNode())

eraman updated this revision to Diff 155779.Jul 16 2018, 4:03 PM
eraman marked an inline comment as done.

Address Simon's comments.

RKSimon added inline comments.Jul 17 2018, 2:57 AM
lib/Target/X86/X86ISelLowering.cpp
36852 ↗(On Diff #155779)

Are there any circumstances that this isn't a ConstantFP? getTargetConstantFromNode peeks through bitcasts so don't you need to use dyn_cast_or_null?

36863 ↗(On Diff #155779)

dyn_cast_or_null?

eraman updated this revision to Diff 155968.Jul 17 2018, 1:57 PM

Change cast_or_null to dyn_cast_or_null

lib/Target/X86/X86ISelLowering.cpp
36852 ↗(On Diff #155779)

I have changed it to dyn_cast_or_null, but thinking about it I don't think that is needed. First, the current code reads

if (Op1.getOpcode() == X86ISD::VBROADCAST) {

if (auto *C = getTargetConstantFromNode(Op1.getOperand(0)))
  if (isSignMask(cast<ConstantFP>(C)))

The x86 vbroadcast instruction broadcasts floating point values, so I think the cast<ConstantFP> is right.

RKSimon added inline comments.Jul 18 2018, 7:41 AM
lib/Target/X86/X86ISelLowering.cpp
36852 ↗(On Diff #155779)

If you look at getTargetConstantFromNode the first thing it does is call peekThroughBitcasts so it can have any type. In fact we should probably be checking that the vector's element size is correct as well.

This is the kind of thing a fuzz test finds in 3 months time......

eraman added inline comments.Jul 26 2018, 11:06 AM
lib/Target/X86/X86ISelLowering.cpp
36852 ↗(On Diff #155779)

I don't get the comment about vector's element size. Which verctor's element size should be checked here?

RKSimon added inline comments.Jul 30 2018, 6:30 AM
lib/Target/X86/X86ISelLowering.cpp
36852 ↗(On Diff #155779)

Looking at this again, you could simplify a lot of this by using getTargetConstantBitsFromNode instead to extract all the bits for you, avoiding all the BROADCAST/BUILD_VECTOR special cases as getTargetConstantBitsFromNode should do all of that already.

@eraman Please can you take a look at rL338358 - this shows how to use getTargetConstantBitsFromNode to avoid a lot of extra checks you need from using getTargetConstantFromNode directly

eraman updated this revision to Diff 158802.Aug 2 2018, 11:08 AM

Updates to work with r338358.

@eraman Please can you take a look at rL338358 - this shows how to use getTargetConstantBitsFromNode to avoid a lot of extra checks you need from using getTargetConstantFromNode directly

I have updated the code to make use of this. I had to extend getTargetConstantBitsFromNode to support build vector of ConstantFPSDNodes. PTAL.

RKSimon added inline comments.Aug 3 2018, 3:36 AM
lib/Target/X86/X86ISelLowering.cpp
36937 ↗(On Diff #158802)

unsigned - only use auto when the type is very obvious

36965 ↗(On Diff #158802)

Do we have test coverage for ignoring the undefs?

36970 ↗(On Diff #158802)
for (unsigned I = 0, E = EltBits.size(); I < E; I++)
36980 ↗(On Diff #158802)

Why did you create the lambda? Why not just inline Negate?

eraman marked 3 inline comments as done.Aug 3 2018, 10:24 AM
eraman added inline comments.
lib/Target/X86/X86ISelLowering.cpp
36965 ↗(On Diff #158802)

test7 of avx2-fma-fneg-combine.ll has fsub with all but one elements of the constant being undef.

36980 ↗(On Diff #158802)

Carryover from an initial version where I thought the lambda made sense.

eraman updated this revision to Diff 159040.Aug 3 2018, 10:25 AM
eraman marked an inline comment as done.

Address review comments.

RKSimon accepted this revision.Aug 6 2018, 7:10 AM

LGTM

This revision is now accepted and ready to land.Aug 6 2018, 7:10 AM
This revision was automatically updated to reflect the committed changes.