Page MenuHomePhabricator

[SelectionDAG] Better legalization for FSHL and FSHR
ClosedPublic

Authored by foad on Mar 31 2020, 9:59 AM.

Details

Summary

In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
foad marked an inline comment as done.Jun 18 2020, 7:42 AM

Rebase. Improve isNonZeroModBitWidth.

RKSimon added inline comments.Jun 25 2020, 1:24 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6276

@foad We've lost this functionality - this needs adding to the new expansion code or to DAGCombiner

foad marked an inline comment as done.Jun 25 2020, 4:13 AM
foad added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6276

TargetLowering::expandROT does this: "// If a rotate in the other direction is supported, use it." Are there tests showing that it's not working for some reason?

RKSimon added inline comments.Jun 25 2020, 4:59 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6276

Yes the AMDGPU/fshl.ll tests look like they've regressed because of this

reverse ping - any luck on the amdgpu regressions?

foad updated this revision to Diff 277695.Jul 14 2020, 12:58 AM
foad marked an inline comment as done.

Expand FSHL as FSHR or vice versa if the target only supports one.

foad added a comment.Jul 14 2020, 1:00 AM

reverse ping - any luck on the amdgpu regressions?

Yes should be fixed now.

Looks good to me.

RKSimon added inline comments.Jul 15 2020, 1:31 AM
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

Any luck with this?

foad marked an inline comment as done.Jul 15 2020, 2:16 AM
foad added inline comments.
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

No. I think it would need a target-specific lowering to take advantage of the fact that shift by 32 is well defined on PowerPC. Can I leave this for a later patch, and/or could a PowerPC expert volunteer to help with it?

RKSimon added inline comments.Jul 16 2020, 1:10 AM
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

So we're not just missing a micro optimization we can add to the powerpc backend as part of this patch? @hfinkel Any suggestions?

foad marked an inline comment as done.Jul 16 2020, 1:59 AM
foad added inline comments.
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

I can't think of a simple micro optimization that would help. For PowerPC it would be better to expand fshl(X, Y, Z) as X << (Z & 31) | Y >> (32 - (Z & 31)) (where the right shift supports shifting by 32) which would be better than either the "before" or the "after" code shown in this diff.

RKSimon added inline comments.Jul 16 2020, 3:05 AM
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

Are you able to add the PPC custom lowering to this patch?

foad marked an inline comment as done.Jul 16 2020, 6:40 AM
foad added inline comments.
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

I'd prefer to do it as a separate patch first: D83948.

foad updated this revision to Diff 283168.Aug 5 2020, 2:17 AM

Rebase now D83948 has landed.

This fixes the PowerPC regression, but there's a new regression in some newly added RISCV test cases.

foad updated this revision to Diff 283190.Aug 5 2020, 3:58 AM

RISCV updates to avoid code quality regressions from changes in the way funnel shifts are lowered.

foad added a subscriber: lewis-revill.

RISCV updates to avoid code quality regressions from changes in the way funnel shifts are lowered.

@asb @lewis-revill could you take a look at my RISCV changes please? Code quality seems to be as good or better in all lit test cases, I just had to change some RISCV patterns to match the new lowering.

@asb Any thoughts on the RISCV changes? This is the last blocker on the patch.

arsenm added inline comments.Aug 13 2020, 7:14 AM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
146

dyn_cast rather than isa and cast?

foad updated this revision to Diff 285979.Aug 17 2020, 5:33 AM

Rebase and address a review comment.

foad added inline comments.Aug 17 2020, 5:35 AM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
146

I refactored it a little to use isa and getConstantOperandVal, which seems to be the prevailing style in this file.

@asb any objections to me approving this?

RKSimon accepted this revision.Aug 21 2020, 2:00 AM

LGTM

This revision is now accepted and ready to land.Aug 21 2020, 2:00 AM
asb added a comment.Aug 21 2020, 2:27 AM

No objections accepting from the RISC-V side. I don't have a set of execution tests for the fshl/fshr changes to validate them, but I didn't see anything obviously wrong. The changes primarily affect the (still experimental) bitmanip extension also, so the bar for review is somewhat lower. The change does seem to improve srliw lowering in a few places in the GCC torture suite as well (in a way that is both correct and beneficial).

This revision was landed with ongoing or failed builds.Aug 21 2020, 2:33 AM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Aug 21 2020, 5:10 AM
bjope added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6159

Our OOT target got legal types that aren't pow-of-2. So this kind of broke some funnel-shift tests that happened to work earlier.

I realize that it might be hard to verify the behavior for non-pow-of-2 types upstream, so not sure if I'm welcome to do patches upstream that fixes the problem (although there might be other OOT targets in the same situation).

BTW: I haven't figured out exactly what part of the code below that only works for pow-of-2 types yet. I see some checks for isPowerOf2_32(BW) further down in the code, so maybe the assert isn't needed at all?

foad added inline comments.Aug 21 2020, 5:29 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6159

I believe that only the conversion fshl(X, Y, Z) <--> fshr(X, Y, -Z) depends on bitwidth being a power of two. So pushing the assertion inside the "if" might be a quick fix, assuming your target has equally good support for shifting in both directions?

The correct conversion for non-power-of-two would be something like: fshl(X, Y, Z) -> fshr(X, Y, BW - (Z % BW)) and vice versa. Personally I'd be happy to have this code upstream, though I don't know if there would be any way of testing it.

bjope added inline comments.Aug 21 2020, 5:33 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6159

Yes, I realize this is related to the new code below.
So we could just add it as a condition in the if-statement related to swapping between FSHL<->FSHR.

Now it is a bit weird to have the assert here, guarding all the code below, when the general case below handle non-pow-of-2 BW explicitly.

foad added inline comments.Aug 21 2020, 5:37 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6159

So we could just add it as a condition in the if-statement related to swapping between FSHL<->FSHR.

Good idea, maybe with a comment explaining how to implement properly.

Now it is a bit weird to have the assert here, guarding all the code below, when the general case below handle non-pow-of-2 BW explicitly.

Agreed. Sorry about that!

bjope added inline comments.Aug 21 2020, 5:43 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6159

No worries.
Patch seem good in general. I saw some lowering improvements for important cases (and only some minor regression, at least in instruction count, for weird types like i37 but that isn't really important for my backend).

bjope added inline comments.Aug 21 2020, 10:31 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6163

Is this transform even correct?

Assume we got fshl 3, 2, 0, then the result would be 3. But this code could turn it into fshr 3, 2, (0-0), then the result would become 2.

bjope added inline comments.Aug 21 2020, 1:58 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6163

Maybe one can limit the condition to

if (!isOperationLegalOrCustom(Node->getOpcode(), VT) &&
      isOperationLegalOrCustom(RevOpcode, VT) &&
      isPowerOf2_32(BW) && isNonZeroModBitWidth(Z, BW)) {

Although, isNonZeroModBitWidth currently allow undef values. And I wonder if it is allowed to turn fshl X, Y, undef into fshr X, Y, (0-UNDEF) , considering that we for example know that fshl -1, 0, undef can't be zero, but fshr -1, 0, undef can be zero. So one probably needs to restrict the condition further by adding a isNonZeroNonUndefModBitWidth helper and use that here.

foad added inline comments.Aug 22 2020, 1:11 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6163

You're right. I'll take a proper look on Monday. Of course this is just an optimization so it's safe to disable it completely if it's causing problems. I think I added it to fix some code quality regressions in the AMDGPU tests.

xbolva00 added inline comments.
llvm/test/CodeGen/X86/vector-fshl-512.ll
672

Is this better sequence? Looks longer.

bjope added inline comments.Mon, Aug 24, 12:31 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6163

Proposed workaround/fixup here: https://reviews.llvm.org/D86430

foad added inline comments.Mon, Aug 24, 1:16 AM
llvm/test/CodeGen/X86/vector-fshl-512.ll
672

I agree it's longer but my knowledge of AVX-512 is limited. It's using the expansion of fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW))
but it looks like the "Y >> 1 >> expr" part is generating messy code. Compared to the old code:

vpsrlw %xmm4, %ymm5, %ymm5
vpsrlw %xmm4, %ymm1, %ymm1

I would expect to see just a couple more instructions here:

vpsrlw $1, %ymm5, %ymm5
vpsrlw %xmm4, %ymm5, %ymm5
vpsrlw $1, %ymm1, %ymm1
vpsrlw %xmm4, %ymm1, %ymm1

which would be offset by not needing the vpcmpeqw/vpternlogq at the end.

The fuzzer has found a failing case here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25150

define i33 @fshr_multi_use(i33 %a) {
bb:
  %b = tail call i33 @llvm.fshr.i33(i33 %a, i33 %a, i33 1)
  %e = and i33 %b, 31
  ret i33 %e
}
foad added a comment.Mon, Aug 24, 5:35 AM

The fuzzer has found a failing case here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25150

define i33 @fshr_multi_use(i33 %a) {
bb:
  %b = tail call i33 @llvm.fshr.i33(i33 %a, i33 %a, i33 1)
  %e = and i33 %b, 31
  ret i33 %e
}

D86449.

dmajor added a subscriber: dmajor.Mon, Aug 24, 8:59 AM

We're seeing a break in the Firefox build after this change:

WidenVectorResult #0: t29: v2i32 = rotl t25, t28, /builds/worker/checkouts/gecko/gfx/angle/checkout/src/image_util/loadimage.cpp:640

Do not know how to widen the result of this operator!

The reproducers are available from the upper-left pane of https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313707074&repo=try&lineNumber=14721 if you click "Show Job Info".

Will this be covered by one of the existing followups?

We're seeing a break in the Firefox build after this change:

WidenVectorResult #0: t29: v2i32 = rotl t25, t28, /builds/worker/checkouts/gecko/gfx/angle/checkout/src/image_util/loadimage.cpp:640

Do not know how to widen the result of this operator!

The reproducers are available from the upper-left pane of https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313707074&repo=try&lineNumber=14721 if you click "Show Job Info".

Will this be covered by one of the existing followups?

We're also seeing this in our internal testing. I was about to put together a patch for it if its not already being worked on.

foad added a comment.Mon, Aug 24, 9:55 AM

We're seeing a break in the Firefox build after this change:

WidenVectorResult #0: t29: v2i32 = rotl t25, t28, /builds/worker/checkouts/gecko/gfx/angle/checkout/src/image_util/loadimage.cpp:640

Do not know how to widen the result of this operator!

The reproducers are available from the upper-left pane of https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313707074&repo=try&lineNumber=14721 if you click "Show Job Info".

Will this be covered by one of the existing followups?

We're also seeing this in our internal testing. I was about to put together a patch for it if its not already being worked on.

I'm not working on it yet as I don't have a small test case yet. I imagine the fix is just to add cases for ROTL and ROTR at LegalizeVectorTypes.cpp:2866.

We're seeing a break in the Firefox build after this change:

WidenVectorResult #0: t29: v2i32 = rotl t25, t28, /builds/worker/checkouts/gecko/gfx/angle/checkout/src/image_util/loadimage.cpp:640

Do not know how to widen the result of this operator!

The reproducers are available from the upper-left pane of https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=313707074&repo=try&lineNumber=14721 if you click "Show Job Info".

Will this be covered by one of the existing followups?

We're also seeing this in our internal testing. I was about to put together a patch for it if its not already being worked on.

I'm not working on it yet as I don't have a small test case yet. I imagine the fix is just to add cases for ROTL and ROTR at LegalizeVectorTypes.cpp:2866.

Should be fixed after 43465a43755498e11b14ceb46e278bd127b3b3d7

ro added a subscriber: ro.Tue, Aug 25, 9:05 AM

This patch broke sparcv9-sun-solaris2.11 RelWithDebInfo build: cf. Bug 47303.

foad added a comment.Wed, Aug 26, 1:58 AM
In D77152#2236467, @ro wrote:

This patch broke sparcv9-sun-solaris2.11 RelWithDebInfo build: cf. Bug 47303.

D86601.