This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE
ClosedPublic

Authored by RKSimon on Jan 7 2019, 4:38 AM.

Diff Detail

Event Timeline

RKSimon created this revision.Jan 7 2019, 4:38 AM
RKSimon added inline comments.Jan 7 2019, 5:18 AM
test/CodeGen/ARM/lowerMUL-newload.ll
28 ↗(On Diff #180459)

This just looks like we're missing something for the ARMISD::VMULL lowering

test/CodeGen/X86/avx512-any_extend_load.ll
53 ↗(On Diff #180459)

Simplifying to ANY_EXTEND prevents PACKSS/PACKUS from working

test/CodeGen/X86/combine-sra.ll
252 ↗(On Diff #180459)

We'd been relying on the v4i64 ashr expansion

test/CodeGen/X86/vector-blend.ll
956 ↗(On Diff #180459)

Haven't worked out the problem here yet

test/CodeGen/X86/vector-trunc-widen.ll
77 ↗(On Diff #180459)

We'd been relying on the v8i64 ashr expansion

craig.topper added inline comments.Jan 7 2019, 4:33 PM
test/CodeGen/X86/vector-blend.ll
956 ↗(On Diff #180459)

I think we need to call SimplifyDemandedBits on Conditions of SHRUNKBLEND. We only do it when we convert from VSELECT to SHRUNKBLEND.

craig.topper added inline comments.
test/CodeGen/X86/vector-blend.ll
956 ↗(On Diff #180459)

Patch here D56421

RKSimon updated this revision to Diff 181284.Jan 11 2019, 8:28 AM
huihuiz added inline comments.
test/CodeGen/ARM/lowerMUL-newload.ll
28 ↗(On Diff #180459)

Using "CHECK-NEXT" and matching with the exact register names will make this test cast very sensitive to scheduling and register allocation changes.
Use pattern matching should be a better approach.

RKSimon added inline comments.Jan 23 2019, 1:20 PM
test/CodeGen/ARM/lowerMUL-newload.ll
28 ↗(On Diff #180459)

But it stops people missing/hiding codegen changes that need to be kept an eye on, including register allocation changes.

This argument has been going on for years now, and we've tended to see that the benefits of update_llc_test_checks.py outweighs any difficulties.

More importantly, do you have any insights as to how to improve ARMISD::VMULL lowering?

RKSimon updated this revision to Diff 196070.Apr 22 2019, 7:29 AM

rebase - still showing a number of regressions that are proving tricky to fix

Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2019, 7:29 AM
RKSimon updated this revision to Diff 211912.Jul 26 2019, 4:56 AM

rebase + vector support for truncate(srl(x,c)) case

RKSimon updated this revision to Diff 214598.Aug 12 2019, 4:18 AM

rebase - most of the remaining x86 issues should be fixed by D66004

RKSimon updated this revision to Diff 262099.May 5 2020, 6:58 AM
RKSimon retitled this revision from [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (WIP) to [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE.
RKSimon edited the summary of this revision. (Show Details)

Add support for ANY_EXTEND ops to ARM's LowerMUL.

This fixes the main MULL regression but I'm not sure how to fix the ADDW regression which seems to be a purely isel pattern - @t.p.northover @efriedma @huihuiz any thoughts?

RKSimon planned changes to this revision.Jul 4 2020, 1:23 AM

The pattern in question comes out of https://github.com/llvm/llvm-project/blob/0fa0cf8638b0777a1a44feebf78a63865e48ecf6/llvm/lib/Target/ARM/ARMInstrNEON.td#L3100 , and it traces out to https://github.com/llvm/llvm-project/blob/0fa0cf8638b0777a1a44feebf78a63865e48ecf6/llvm/lib/Target/ARM/ARMInstrNEON.td#L4216 .

Probably we want to do what the Hexagon backend does: def asext: PatFrags<(ops node:$Rs), [(sext node:$Rs), (anyext node:$Rs)]>;.

RKSimon updated this revision to Diff 291969.Sep 15 2020, 10:41 AM

rebase - avg.ll regressions now fixed

RKSimon planned changes to this revision.Sep 15 2020, 10:41 AM
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/combine-sra.ll
248–249

Appears to be a regression

llvm/test/CodeGen/X86/vector-trunc.ll
71–72

Appears to be a regression

388–399

I'm not very sure it's an improvement

note: this is still a wip

RKSimon retitled this revision from [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE to [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (WIP).Sep 15 2020, 11:00 AM
yubing added a subscriber: yubing.Dec 27 2020, 5:49 PM
arsenm added inline comments.Jan 4 2021, 11:28 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
8693–8695 ↗(On Diff #291969)

return getOpcode() == ANY_EXTEND but I'm guessing this is just a placeholder function anyway

RKSimon updated this revision to Diff 314868.Jan 6 2021, 5:41 AM

rebase (still WIP though)

RKSimon planned changes to this revision.Jan 6 2021, 5:41 AM
foad added a subscriber: foad.Jan 7 2021, 3:27 AM
RKSimon added inline comments.Jan 18 2021, 9:07 AM
llvm/test/CodeGen/AArch64/lowerMUL-newload.ll
362

@dmgreen What do you think is the best way to extend D93833 to handle multiply-add/sub as well? Handle in DAG or refactor the isel patterns to accept sanyext/zanyext (I didn't get very far with my initial attempt with this approach as a lot of the patfrags were hardcodded)?

RKSimon updated this revision to Diff 317383.Jan 18 2021, 9:51 AM

@dmgreen I've added ANY_EXTEND matching in isSignExtended to give you an indication of effect on codegen

I only looked at the ARM equivalent. From what I remember, the sequence of events was something like:

  • One of the two operands to the mul was converted from a sext to an anyext. The other was not due to having multiple uses.
  • That anyext was folded into a load to produce a zextload (we don't produce a vector anyext load)
  • We couldn't match anything due one operand being a sext and the other being a zextload.

So in that case we would either need to use demanded bits know the top bits are not needed when converting it to a mull, create an anyextload instead of a zextload or handle multiple uses so both inputs turn into anyext or zextloads.

I'm happy for the isSignExtended change, as far as I understand that should be fine. The ARM side may be harder to fix, and as the test seems to only added for correctness - it doesn't seem like something that should hold up this patch. We should have fixed the majority of cases and if more come up we can tackle them as needed. I would be happy with this patch so long as the X86 changes are OK.

RKSimon updated this revision to Diff 317511.Jan 19 2021, 3:35 AM
RKSimon retitled this revision from [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (WIP) to [DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE.
RKSimon edited the summary of this revision. (Show Details)

rebase - if we're happy with the AARCH64/ARM changes - any more comments?

xbolva00 added inline comments.
llvm/test/CodeGen/X86/combine-sra.ll
248–249
RKSimon added inline comments.Jan 19 2021, 7:38 AM
llvm/test/CodeGen/X86/combine-sra.ll
248–249

Looking at this now - the set of combines that was necessary to get to the old codegen is pretty impressive.....

RKSimon updated this revision to Diff 317867.Jan 20 2021, 6:41 AM

rebase - the last x86 regression (combine-sra.ll) should now be fixed.

Seems fine to me, thanks.

llvm/test/CodeGen/X86/vector-trunc.ll
388–399

Looks like for pre-SSE41 we still fail to detect high bits as zeros?

RKSimon added inline comments.Jan 20 2021, 7:12 AM
llvm/test/CodeGen/X86/vector-trunc.ll
388–399

pre-SSE41 we don't have packusdw (I've no idea why this wasn't included in SSE2 with the rest of them...) so we have a fallback to continue to use packssdw

Yes, looks good now.

lebedev.ri accepted this revision.Jan 20 2021, 7:22 AM
This revision is now accepted and ready to land.Jan 20 2021, 7:22 AM
xbolva00 accepted this revision.Jan 20 2021, 7:22 AM
This revision was landed with ongoing or failed builds.Jan 20 2021, 7:40 AM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Jan 20 2021, 11:07 AM

This caused asserts in Chromium. See https://bugs.chromium.org/p/chromium/issues/detail?id=1168629#c2 for a reproducer.

I've reverted in a51226057fc30510ac86b32a36a9769ddbf4c318 in the meantime.