This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Expand combining of FP logical operations to sign-setting FP operations
ClosedPublic

Authored by nemanjai on Mar 15 2018, 4:27 PM.

Details

Summary

This patch does two things:

  1. Expand the DAG combiner to handle vectors when combining FP logical ops and to produce negated absolute value
  2. Enable this on PPC

I tried to split this out to two patches to handle the target independent part separately from the PPC part, however I'm not sure how to write a test case for it without a target that enables this on both scalar and vector types. I'd be happy to split it if someone has an idea on how to produce a test case.

I would appreciate it if target owners can try out this patch as it of course has the potential to cause infinite loops if they have combines that will convert ISD::FABS and ISD::FNEG into bitcasts and logical ops and their target is set to allow these combines for the respective type.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai created this revision.Mar 15 2018, 4:27 PM

Ping. Do any other targets care about this? The sign manipulation instructions are definitely faster on PPC.

lebedev.ri added inline comments.
test/CodeGen/PowerPC/float-logic-ops.ll
4 ↗(On Diff #138649)

Separate by newline

35 ↗(On Diff #138649)

Would be nice to at least one test with one undef element, and with bad non-splat constant.

RKSimon added a subscriber: RKSimon.
RKSimon added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8933 ↗(On Diff #138649)

We have ISD::isConstOrConstSplat - can we use that instead and merge these 2 paths?

AArch64 doesn't have a nabs instruction; orr is probably faster than fabs+fneg (unless some implementations have a transition penalty between int SIMD and float SIMD). But we could accept fneg(fabs(x)) as the canonical form and pattern-match it.

AArch64 doesn't have a nabs instruction; orr is probably faster than fabs+fneg (unless some implementations have a transition penalty between int SIMD and float SIMD). But we could accept fneg(fabs(x)) as the canonical form and pattern-match it.

If you think it's worthwhile, I could make hasBitPreservingFPLogic() return an enum with None, FabsOnly, FnegOnly, SingleNodeOnly, All or something along those lines... and then use that to guide which transformations to actually apply.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8933 ↗(On Diff #138649)

I think that will fail if some of the elements are undef. And that seems unnecessary. Something like this:

%conv = bitcast <4 x float> %a to <4 x i32>
%and = and <4 x i32> %conv, <i32 2147483647, i32 undef, i32 undef, i32 2147483647>

should be perfectly fine to transform.

nemanjai updated this revision to Diff 149236.May 30 2018, 11:43 PM

Updated the test to include invalid (non-splat) values as well as undefs as part of the splat.

RKSimon added inline comments.May 31 2018, 3:23 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8933 ↗(On Diff #138649)

Its not out of the question to add an 'allowUndef' optional argument (default to false) to ISD::isConstOrConstSplat

Thinking about it a bit more, we probably want to canonicalize to fabs+fneg anyway on AArch64, rather than the other way, because the fneg could combine with a later operation. And pattern-matching it is pretty easy, anyway.

nemanjai updated this revision to Diff 168416.Oct 4 2018, 5:27 PM

Since there seems to be overall agreement that this is the direction we want to go in, here's an update that removes the unsightly dual path for scalars and vectors.
Sorry about the delay in producing this update.

spatel added a comment.Oct 7 2018, 9:40 AM

I'm trying to improve vector-demanded-elements folds in D52912 which means we need to do better at recognizing patterns with undefs.

So I extracted 2 parts of this patch in:
rL343865
rL343940

So rebasing should allow you to shrink this patch.

nemanjai updated this revision to Diff 168788.Oct 9 2018, 6:45 AM

Rebased to account for Sanjay's patches.

spatel added a comment.Oct 9 2018, 7:58 AM

How about enabling the hook for PPC before adding to the generic DAG combine...
Right now, you'll get something horrible for normal fabs (and fneg?) without this patch, right?

	xscvdpspn 0, 1
	xxsldwi 0, 0, 0, 3
	mfvsrwz 3, 0
	rlwinm 3, 3, 0, 1, 31
	mtvsrd 0, 3
	xxsldwi 0, 0, 0, 1
	xscvspdpn 1, 0

How about enabling the hook for PPC before adding to the generic DAG combine...
Right now, you'll get something horrible for normal fabs (and fneg?) without this patch, right?

	xscvdpspn 0, 1
	xxsldwi 0, 0, 0, 3
	mfvsrwz 3, 0
	rlwinm 3, 3, 0, 1, 31
	mtvsrd 0, 3
	xxsldwi 0, 0, 0, 1
	xscvspdpn 1, 0

Yes. Although that is the worst case (since PPC f32 isn't 32 bits in a register).
No doubt, the PPC-specific portion of this needs to go in and can certainly go in first. I can commit this as the PPC portion first (with the fnabs test missing). And then the generic part (adding the fnabs test).
The question is do I need to split this into two patches and post for review or can I just commit it as two separate patches (or maybe just commit the PPC-portion and post the generic portion for review)?

How about enabling the hook for PPC before adding to the generic DAG combine...
Right now, you'll get something horrible for normal fabs (and fneg?) without this patch, right?

	xscvdpspn 0, 1
	xxsldwi 0, 0, 0, 3
	mfvsrwz 3, 0
	rlwinm 3, 3, 0, 1, 31
	mtvsrd 0, 3
	xxsldwi 0, 0, 0, 1
	xscvspdpn 1, 0

Yes. Although that is the worst case (since PPC f32 isn't 32 bits in a register).
No doubt, the PPC-specific portion of this needs to go in and can certainly go in first. I can commit this as the PPC portion first (with the fnabs test missing). And then the generic part (adding the fnabs test).
The question is do I need to split this into two patches and post for review or can I just commit it as two separate patches (or maybe just commit the PPC-portion and post the generic portion for review)?

Please extract and commit the PPC change. That's a pure win in all cases AFAICT.
Then we can come back to the fnabs part here. I don't think there will be any controversy based on the earlier comments, but let me add a couple of non-PPC tests, so we're sure about that.

nemanjai updated this revision to Diff 168869.Oct 9 2018, 2:07 PM

Update to account for already committed PPC-specific changes.

spatel accepted this revision.Oct 9 2018, 2:32 PM

LGTM

Nit: it would be nicer to check in the PPC tests with baseline asm as a preliminary step. That way, we'd just see the asm diff there too.

AArch64 and x86 don't have vector diffs, but for different reasons.

AArch:

bool hasBitPreservingFPLogic(EVT VT) const override {
  // FIXME: Is this always true? It should be true for vectors at least.
  return VT == MVT::f32 || VT == MVT::f64;
}

x86:
We're already producing the optimal 'or' instruction, and either that transform has already occurred, or this transform fires and gets folded to the same thing (I didn't step through the debug).

This revision is now accepted and ready to land.Oct 9 2018, 2:32 PM
This revision was automatically updated to reflect the committed changes.