Page MenuHomePhabricator

[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold
ClosedPublic

Authored by lebedev.ri on May 21 2019, 3:16 PM.

Details

Summary

The main motivation is shown by all these neg instructions that are now created.
In particular, the @reg32_lshr_by_negated_unfolded_sub_b test.

AArch64 test changes all look good (neg created), or neutral.

X86 changes look neutral (vectors), or good (neg / xor eax, eax created).

I'm not sure about X86/ragreedy-hoist-spill.ll, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving PR41952.

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.May 21 2019, 3:16 PM

Better test coverage.

lebedev.ri marked an inline comment as done.May 22 2019, 10:36 AM
lebedev.ri added inline comments.
test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
6

@arsenm I believe D62263 recovers (and improves upon?) from changes here, somewhat, PTAL.

bjope added a subscriber: bjope.May 22 2019, 12:04 PM
bjope added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2926

Not sure if it is super important, or common, in reality. But maybe this should be trigger for an add-like-or as well?

An ADD with no common bits set in the operands is canonicalized into OR by DAGCombiner, so there could be lots of OR:s out there that really are ADD:s.

lebedev.ri added inline comments.May 22 2019, 12:17 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2926

Yes, correct observation. I did think about it when writing,
but i did not see any easy way to do that here.
Any suggestions?

bjope added inline comments.May 23 2019, 2:49 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2926

We could introduce some helper to make it easier to match an "add-like" node. However, checking "noCommonBitsSet" to detect if an OR is add-like currently involves ValueTracking, so it is not a cheap operation. We probably want to see that we get some payback if doing that, and not just wasting time.

Since we already match on ISD::ADD in several places above, it might have been too much to ask for in this patch. So I think we can save it for later.

lebedev.ri marked 2 inline comments as done.May 24 2019, 8:58 AM

Do x86 test changes look good?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2926

Since we already match on ISD::ADD in several places above, it might have been too much to ask for in this patch. So I think we can save it for later.

I agree. Thank you for taking a look.

test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
6

I've verified that with that patch, the original test (with non-auto-generated check lines) passes, so i'm going to assume this is ok.

The x86 changes look OK to me.

The x86 changes look OK to me.

Thank you.
That means this patch is only waiting for @arsenm to comment whether D62263 sufficiently addresses this AMDGPU regression.

LGTM

Thank you for the review!

Time to land first few of these patches. :)

RKSimon accepted this revision.May 28 2019, 9:18 AM

LGTM

This revision is now accepted and ready to land.May 28 2019, 9:18 AM

Rebased, NFC.

lebedev.ri reopened this revision.May 28 2019, 12:04 PM

One of the patches seems to have introduced a hang in test-suite, reverted.

This revision is now accepted and ready to land.May 28 2019, 12:04 PM
lebedev.ri reopened this revision.May 30 2019, 9:05 AM

And once more.
rL362109 + rL362110

This revision is now accepted and ready to land.May 30 2019, 9:05 AM

Unfortunately, we (Google) are seeing some regressions on our internal benchmarks on x86_64 -- mostly around 2%, but in some cases a good bit more -- from this revision and some of the subsequent revisions in the series, notably r362217. Just wanted to give you a heads-up at this point -- we're still working to try to understand the underlying reasons and get a reproducing test-case, but it looks like these are going to block our internal release until they're addressed.