Page MenuHomePhabricator

[SelectionDAG] Better legalization for FSHL and FSHR
Needs ReviewPublic

Authored by foad on Mar 31 2020, 9:59 AM.

Details

Summary

In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.

Diff Detail

Unit TestsFailed

TimeTest
70 mslinux > Polly.Isl/Ast::alias_checks_with_empty_context.ll
Script: -- : 'RUN: at line 1'; opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-import-jscop-dir=/mnt/disks/ssd0/agent/llvm-project/polly/test/Isl/Ast -polly-codegen-verify -polly-ast -analyze < /mnt/disks/ssd0/agent/llvm-project/polly/test/Isl/Ast/alias_checks_with_empty_context.ll | FileCheck /mnt/disks/ssd0/agent/llvm-project/polly/test/Isl/Ast/alias_checks_with_empty_context.ll

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This is a work in progress but I'd appreciate any feedback.

  • Does this seem like the right thing to do in general?

I think SelectionDAGBuilder expanding this is nonsense, so yes. The expansion is also duplicated in two places, which is also bad.

  • How should I implement integer expansion for funnel shifts?

This is already done?

  • Any advice on fixing any of the code quality regressions in the tests?
foad added a comment.Mar 31 2020, 12:15 PM
  • How should I implement integer expansion for funnel shifts?

This is already done?

I mean they're not handled in DAGTypeLegalizer::ExpandIntegerResult. This is what causes some X86 tests to crash.

foad marked 2 inline comments as done.Apr 1 2020, 1:26 AM
foad added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
1122–1123

You mean generate AND explicitly instead of UREM? I don't think there's any need to do that. Something else in SelectionDAG cleans it up.

foad updated this revision to Diff 254244.Apr 1 2020, 10:21 AM
foad marked an inline comment as done.

Translate fshl/fshr intrinsics to ROTL/ROTR up front whenever possible.
Implement integer promotion for ROTL/ROTR in addition to FSHL/FSHR.
Improve expandROT.
This removes most of the major code quality regressions.
Some tests still crash because I haven't implemented integer expansion.

@foad Any progress on fixing the crashes?

foad added a comment.Apr 15 2020, 10:07 AM

@foad Any progress on fixing the crashes?

Short answer: no not yet.

I haven't thought of any clever way to implement ExpandIntegerResult for funnel shifts, so I just want to lower them and run ExpandIntegerResult on the lowered code instead. But I'm not sure how to implement that.

@foad Any progress on fixing the crashes?

Short answer: no not yet.

I haven't thought of any clever way to implement ExpandIntegerResult for funnel shifts, so I just want to lower them and run ExpandIntegerResult on the lowered code instead. But I'm not sure how to implement that.

I believe this should be possible by calling ReplaceValueWith on the expandFunnelShift result (and returning an empty SDValue).

@foad D77316 might not be needed for funnel shift lowering now that D80489 has landed

foad updated this revision to Diff 266175.May 26 2020, 5:34 AM

Rebase.

foad marked 3 inline comments as done.May 26 2020, 5:39 AM

@foad D77316 might not be needed for funnel shift lowering now that D80489 has landed

True, but we get a slight PowerPC regression instead.

llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

This is a slight regression. We need one more instruction overall because of the two clrlwi instructions to mask both normal and the inverted shift amounts.

141–143

Ditto.

178–179

Ditto.

RKSimon added a subscriber: hfinkel.

IIRC powerpc scalar shift ops can use shift amounts upto the bitwidth (not bitwidth-1) so there might be custom funnel shifts that they can do with that - @hfinkel @nemanjai any thoughts?

foad updated this revision to Diff 266888.May 28 2020, 8:35 AM

Implement ExpandIntegerResult for funnel shifts and rotates.
Simpler PromoteIntegerResult for rotates.

IIRC powerpc scalar shift ops can use shift amounts upto the bitwidth (not bitwidth-1) so there might be custom funnel shifts that they can do with that - @hfinkel @nemanjai any thoughts?

The reg+reg shifts consider enough bits of the register that contains the shift amount to shift by bitwidth * 2 - 1 bits. For example (i64:x << y) == 0 for 64 <= y < 128. The immediate forms do not have this property (i.e. there is no way to encode (i64:x << 64) since the immediate field is 6 bits.
But even for the reg+reg versions, I can't think of how we can utilize this.

foad updated this revision to Diff 267228.May 29 2020, 7:11 AM

Improve expansion of FSHL/FSHR by non-zero constant amount

foad added a comment.May 29 2020, 7:43 AM

IIRC powerpc scalar shift ops can use shift amounts upto the bitwidth (not bitwidth-1) so there might be custom funnel shifts that they can do with that - @hfinkel @nemanjai any thoughts?

The reg+reg shifts consider enough bits of the register that contains the shift amount to shift by bitwidth * 2 - 1 bits. For example (i64:x << y) == 0 for 64 <= y < 128. The immediate forms do not have this property (i.e. there is no way to encode (i64:x << 64) since the immediate field is 6 bits.
But even for the reg+reg versions, I can't think of how we can utilize this.

That's perfect. It means that for e.q. the fshl_i32 test case, the original code is fine and we don't even need the iseleq to handle the shift-by-zero case:

; CHECK-NEXT:    andi. 5, 5, 31
; CHECK-NEXT:    subfic 6, 5, 32
; CHECK-NEXT:    slw 5, 3, 5
; CHECK-NEXT:    srw 4, 4, 6
; CHECK-NEXT:    or 4, 5, 4
; CHECK-NEXT:    iseleq 3, 3, 4 # not needed!

Unfortunately I think we'd need a PowerPC-specific lowering to implement this. (It doesn't matter about the immediate forms because funnel shifts by immediate-0 can be folded away.)

foad marked an inline comment as done.May 29 2020, 7:47 AM
foad added inline comments.
llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
1666–1672

Can anyone suggest what might have caused this regression? We get two presumably redundant movsd instructions:

	psrlq	$50, %xmm1
	movsd	%xmm1, %xmm1            # xmm1 = xmm1[0,1]
	psllq	$14, %xmm0
	movsd	%xmm0, %xmm0            # xmm0 = xmm0[0,1]
	orpd	%xmm1, %xmm0
foad added a comment.May 29 2020, 7:51 AM

This is functionally complete now and could be committed if we're willing to accept the remaining regressions.

foad added a comment.Jun 15 2020, 5:36 AM

ping?

I agree, ping!

spatel added inline comments.Jun 16 2020, 5:25 AM
llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
1666–1672

I looked at the debug output for this case.
It only shows up for x86-32 because we legalize the shift amount constant there with:

t4: v2i64 = BUILD_VECTOR Constant:i64<14>, Constant:i64<14>

-->

  t15: v4i32 = BUILD_VECTOR Constant:i32<14>, undef:i32, Constant:i32<14>, undef:i32
t13: v2i64 = bitcast t15

And that then leads to a big mess when legalizing the 'rotl' node.

We convert an x86 vector-shift-by-variable node into an x86-vector-shift-by-constant node very late which alters an x86 MOVSD node to have the same operands.

But we've already processed that node in the last combining stage, so it survives to isel. We could probably zap that in DAGToDAG pre-processing.

This seems like an unlikely pattern/problem, and there are no existing tests that show it, so I don't think we need to hold this patch up for this regression.

RKSimon added inline comments.Jun 17 2020, 7:01 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6132

I'd recommend C->getAPIntValue().urem(BW) instead - the fuzzer will catch this at some point........

6137

C->getAPIntValue().urem(BW)

6147

You might be able to use ISD::matchUnaryPredicate to do some of this - so you just need to provide a predicate lambda

RKSimon added inline comments.Jun 18 2020, 1:37 AM
llvm/test/CodeGen/AMDGPU/fshl.ll
28

IIRC amdgpu supports fshr but not fshl - could we do more in the expansion to handle this - invert the shift amount and use the legal opcode?

arsenm added inline comments.Jun 18 2020, 4:51 AM
llvm/test/CodeGen/AMDGPU/fshl.ll
28

I think this is what was here before; the alignbit is gone

I was also wondering if there was a tricky way to use the 32 bit version in the 64 bit implementation

foad marked 4 inline comments as done.Jun 18 2020, 7:40 AM
foad added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6147

Nice tip, thanks!

llvm/test/CodeGen/AMDGPU/fshl.ll
28

If we had a test like fshl_i32 but with the inputs in vgprs then I guess it would show a regression here.

foad updated this revision to Diff 271716.Jun 18 2020, 7:42 AM
foad marked an inline comment as done.

Rebase. Improve isNonZeroModBitWidth.

RKSimon added inline comments.Jun 25 2020, 1:24 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6274

@foad We've lost this functionality - this needs adding to the new expansion code or to DAGCombiner

foad marked an inline comment as done.Jun 25 2020, 4:13 AM
foad added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6274

TargetLowering::expandROT does this: "// If a rotate in the other direction is supported, use it." Are there tests showing that it's not working for some reason?

RKSimon added inline comments.Jun 25 2020, 4:59 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6274

Yes the AMDGPU/fshl.ll tests look like they've regressed because of this

reverse ping - any luck on the amdgpu regressions?

foad updated this revision to Diff 277695.Tue, Jul 14, 12:58 AM
foad marked an inline comment as done.

Expand FSHL as FSHR or vice versa if the target only supports one.

foad added a comment.Tue, Jul 14, 1:00 AM

reverse ping - any luck on the amdgpu regressions?

Yes should be fixed now.

Looks good to me.

RKSimon added inline comments.Wed, Jul 15, 1:31 AM
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

Any luck with this?

foad marked an inline comment as done.Wed, Jul 15, 2:16 AM
foad added inline comments.
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

No. I think it would need a target-specific lowering to take advantage of the fact that shift by 32 is well defined on PowerPC. Can I leave this for a later patch, and/or could a PowerPC expert volunteer to help with it?

RKSimon added inline comments.Thu, Jul 16, 1:10 AM
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

So we're not just missing a micro optimization we can add to the powerpc backend as part of this patch? @hfinkel Any suggestions?

foad marked an inline comment as done.Thu, Jul 16, 1:59 AM
foad added inline comments.
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

I can't think of a simple micro optimization that would help. For PowerPC it would be better to expand fshl(X, Y, Z) as X << (Z & 31) | Y >> (32 - (Z & 31)) (where the right shift supports shifting by 32) which would be better than either the "before" or the "after" code shown in this diff.

RKSimon added inline comments.Thu, Jul 16, 3:05 AM
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

Are you able to add the PPC custom lowering to this patch?

foad marked an inline comment as done.Thu, Jul 16, 6:40 AM
foad added inline comments.
llvm/test/CodeGen/PowerPC/funnel-shift.ll
24–26

I'd prefer to do it as a separate patch first: D83948.

foad updated this revision to Diff 283168.Wed, Aug 5, 2:17 AM

Rebase now D83948 has landed.

This fixes the PowerPC regression, but there's a new regression in some newly added RISCV test cases.

foad updated this revision to Diff 283190.Wed, Aug 5, 3:58 AM

RISCV updates to avoid code quality regressions from changes in the way funnel shifts are lowered.

foad added a subscriber: lewis-revill.

RISCV updates to avoid code quality regressions from changes in the way funnel shifts are lowered.

@asb @lewis-revill could you take a look at my RISCV changes please? Code quality seems to be as good or better in all lit test cases, I just had to change some RISCV patterns to match the new lowering.

@asb Any thoughts on the RISCV changes? This is the last blocker on the patch.