Page MenuHomePhabricator

[TargetLowering] Improve expansion of FSHL/FSHR
ClosedPublic

Authored by foad on Apr 2 2020, 5:46 AM.

Details

Summary

Use an extra shift-by-1 instead of a compare and select to handle the
shift-by-zero case. This sometimes saves one instruction (if the compare
couldn't be combined with a previous instruction). It also works better
on targets that don't have good select instructions.

Note that currently this change doesn't affect most targets because
expandFunnelShift is not used because funnel shift intrinsics are
lowered early in SelectionDAGBuilder. But there is work afoot to change
that; see D77152.

Diff Detail

Event Timeline

foad created this revision.Apr 2 2020, 5:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2020, 5:46 AM
foad marked an inline comment as done.Apr 2 2020, 5:49 AM
foad added inline comments.
llvm/test/CodeGen/X86/fshl.ll
123–124

The andb is redundant here and in a bunch of other i32/i64 test cases. I would have thought the simplifyDemandedBits machinery could work that out. Any ideas why this isn't already working?

foad marked an inline comment as done.Apr 2 2020, 8:39 AM
foad added inline comments.
llvm/test/CodeGen/X86/fshl.ll
123–124

Answering my own question, X86InstrCompiler.td has patterns that match (shift x (and y, 31)) ==> (shift x, y) but it has no chance of matching when there is an intervening sub or xor as in (shift x (xor (and y, 31), 31)).

RKSimon added inline comments.Apr 2 2020, 10:24 AM
llvm/test/CodeGen/X86/fshl.ll
123–124

It might be possible to extend X86DAGToDAGISel.isUnneededShiftMask to handle this - @craig.topper might be able to advise.

RKSimon added inline comments.Apr 3 2020, 2:12 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6050

Incidently, DAGCombiner::MatchRotate doesn't currently recognise this pattern to CREATE funnel shifts / rotates - X86 handles it in combineOrShiftToFunnelShift, which is the last part that needs to be moved to DAGCombiner before we can close PR40081.

RKSimon added inline comments.Apr 15 2020, 9:52 AM
llvm/test/CodeGen/X86/fshl.ll
123–124

@craig.topper What do you reckon? Could we handle the xor(and(x,bw-1),bw-1) pattern directly or even try to call SimplifyDemandedBits/SimplifyMultipleUseDemandedBits ?

craig.topper added inline comments.Apr 15 2020, 10:52 AM
llvm/test/CodeGen/X86/fshl.ll
123–124

Node creation inside of Select requires maintaining topological sorting so calling SimplifyDemandedBits/SimplifyMultipleUseDemandedBits is problematic.

Why isn't this combine from DAGCombiner.cpp visitXOR kicking in "fold (xor (and x, y), y) -> (and (not x), y) "? Is it because the (and x, y) has another use by the other shift until the isel pattern skips over it?

RKSimon added inline comments.Apr 15 2020, 11:45 AM
llvm/test/CodeGen/X86/fshl.ll
123–124

Almost certainly - I'm happy for the xor+and fix to be handled later if you are - the slow-case codegen looks better even with this.

RKSimon added inline comments.Apr 17 2020, 7:11 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6071–6082

Shouldn't this be using ShVT?

ping? I'm waiting on this to copy the same for the GlobalISel expansion

RKSimon added inline comments.Wed, May 13, 9:59 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6071–6082

SDValue One = DAG.getConstant(1, DL, ShVT);

foad updated this revision to Diff 263983.Thu, May 14, 6:12 AM

Rebase. Fix VT -> ShVT.

RKSimon accepted this revision.Thu, May 14, 7:43 AM

LGTM - cheers

This revision is now accepted and ready to land.Thu, May 14, 7:43 AM
This revision was automatically updated to reflect the committed changes.