If value tracking can confirm that a shift amount is less than the type bitwidth, then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel shift / rotate intrinsic pattern.
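To make the intent concrete, here is a minimal illustrative example (not taken from the patch's tests; the function name is made up): the shift amount is masked with 15 rather than the canonical bw-1, so only value tracking can prove it is below 64, and the or+shifts pattern should still become @llvm.fshl.i64.

define i64 @fshl_narrow_mask(i64 %x, i64 %y, i64 %a) {
  ; known bits: %mask is in [0, 15], so it is provably < 64
  %mask = and i64 %a, 15
  %shl = shl i64 %x, %mask
  %sub = sub nuw nsw i64 64, %mask
  %shr = lshr i64 %y, %sub
  %r = or i64 %shl, %shr
  ret i64 %r
}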
Event Timeline
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2097: // (shl ShVal, L) | (lshr ShVal, (Width - L)) iff L < Width

2099–2100: Either this needs a motivational comment, or this can be dropped.

----------------------------------------
define i64 @src(i64 %x, i64 %y, i64 %a) {
%0:
  %mask = and i64 %a, -1
  %shl = shl i64 %x, %mask
  %sub = sub nsw nuw i64 64, %mask
  %shr = lshr i64 %y, %sub
  %r = or i64 %shl, %shr
  ret i64 %r
}
=>
define i64 @tgt(i64 %x, i64 %y, i64 %a) {
%0:
  %r = fshl i64 %x, i64 %y, i64 %a
  ret i64 %r
}
Transformation seems to be correct!
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2099–2100: We can correctly convert to the intrinsic because the intrinsic makes guarantees about the shift amount (modulo bitwidth) that may not be present in the original IR. But without proper masking/select for UB shift amounts, the expanded code will be worse in codegen. So this is an irreversible transform, and we can't do it universally. So even with the computeKnownBits restriction in this patch, this transform may not be allowed.

For example, can we fix this in codegen:

$ cat fsh.ll
define i32 @fake_fshl(i32 %x, i32 %y, i32 %a) {
  %mask = and i32 %a, 31
  %shl = shl i32 %x, %mask
  %sub = sub nuw nsw i32 32, %mask
  %shr = lshr i32 %y, %sub
  %r = or i32 %shl, %shr
  ret i32 %r
}

define i32 @fshl(i32 %x, i32 %y, i32 %a) {
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %a)
  ret i32 %r
}
declare i32 @llvm.fshl.i32(i32, i32, i32)

$ llc -o - fsh.ll -mtriple=riscv64
fake_fshl:                              # @fake_fshl
  sllw a0, a0, a2
  addi a3, zero, 32
  sub a2, a3, a2
  srlw a1, a1, a2
  or a0, a0, a1
  ret
fshl:                                   # @fshl
  andi a2, a2, 31
  sll a0, a0, a2
  not a2, a2
  slli a1, a1, 32
  srli a1, a1, 1
  srl a1, a1, a2
  or a0, a0, a1
  ret
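For context, a target without a native funnel shift has to expand a variable-amount fshl in a way that is safe for every amount, since the intrinsic masks the amount modulo the bit width. A rough IR-level sketch of such an expansion, mirroring the riscv64 fshl output above (illustrative only, not the exact legalization code; the function name is made up):

define i32 @fshl_expanded(i32 %x, i32 %y, i32 %z) {
  %s = and i32 %z, 31       ; amount modulo 32
  %hi = shl i32 %x, %s
  %notz = xor i32 %z, -1    ; ~z
  %inv = and i32 %notz, 31  ; equals 31 - (%z & 31)
  %y1 = lshr i32 %y, 1      ; pre-shift so the next shift is always < 32
  %lo = lshr i32 %y1, %inv
  %r = or i32 %hi, %lo
  ret i32 %r
}

The extra mask/not/pre-shift operations are exactly what the original or+shifts pattern did not have, which is why the transform is hard to undo once the shift-amount information has been folded into the intrinsic.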
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2099–2100:
Yep, the intrinsic has an implicit masking, as we can see by said masking being dropped after the transform.

Yep, that is why I'm asking for more blurb.
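An illustrative before/after of that implicit masking (made-up function name, roughly what the existing fold does for the canonically masked pattern): the explicit and disappears because @llvm.fshl.i64 already takes its amount modulo 64.

define i64 @rot_mask(i64 %x, i64 %y, i64 %a) {
  %mask = and i64 %a, 63
  %shl = shl i64 %x, %mask
  %sub = sub nuw nsw i64 64, %mask
  %shr = lshr i64 %y, %sub
  %r = or i64 %shl, %shr
  ret i64 %r
}
=>
define i64 @rot_mask(i64 %x, i64 %y, i64 %a) {
  ; the 'and %a, 63' is dropped: fshl masks the amount modulo 64 itself
  %r = call i64 @llvm.fshl.i64(i64 %x, i64 %y, i64 %a)
  ret i64 %r
}
declare i64 @llvm.fshl.i64(i64, i64, i64)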
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2099–2100: At least part of the riscv64 issue is due to it deciding to promote the fshl to i64 but not the or+shifts pattern (iirc the sllw/srlw variants are i32 shifts with mod32 amounts that sext the results to i64).

SelectionDAG has 15 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t16: i64 = shl t4, Constant:i64<32>
  t6: i64,ch = CopyFromReg t0, Register:i64 %2
  t21: i64 = and t6, Constant:i64<31>
  t18: i64 = fshl t2, t16, t21
  t13: ch,glue = CopyToReg t0, Register:i64 $x10, t18
  t14: ch = RISCVISD::RET_FLAG t13, Register:i64 $x10, t13:1
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2099–2100: We might be able to improve DAGTypeLegalizer::PromoteIntRes_FunnelShift, in the case where the promoted type is at least 2x the original type, to emit just a shift(or(shl(x,bw),y),urem(z,bw)) pattern unless the promoted fshl/fshr is legal.
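For reference, a hedged IR-level sketch of that idea (illustrative name; assuming an i32 fshl promoted to i64): concatenate the two halves into the wide type, shift by the amount modulo the original width, and take the high half back out, avoiding the not/pre-shift sequence of the generic expansion.

define i32 @fshl_via_i64(i32 %x, i32 %y, i32 %z) {
  %xw = zext i32 %x to i64
  %yw = zext i32 %y to i64
  %xh = shl i64 %xw, 32
  %concat = or i64 %xh, %yw       ; x:y as a single i64
  %zm = and i32 %z, 31            ; z mod 32
  %zw = zext i32 %zm to i64
  %shifted = shl i64 %concat, %zw
  %hi = lshr i64 %shifted, 32     ; high 32 bits hold fshl(x, y, z)
  %r = trunc i64 %hi to i32
  ret i32 %r
}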
Just so I don't unnecessarily waste time on this - are your main/blocking concerns that backend legalization/lowering is still poor or is it more fundamental than that?
I'm mainly interested in a better spelled-out motivation for the knownbits restriction.
Agree - the default backend expansion seems fine for most targets. If the riscv case is an outlier that can be fixed, then it's probably ok to proceed here (but we should file a bug to raise awareness).
So it is indeed trying to prevent degradation of code quality in case of expansion.
(Maybe funnel shift intrinsics should simply track whether or not they were originally UB-safe?)
And that highlights my point: we might know that the shift amount is safe not from an explicit mask,
but e.g. from range metadata on a load,
which means expansion will still introduce a masking op that wasn't there originally.
How bad, performance-wise, would that be?
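A concrete, made-up example of that situation: the only thing bounding the shift amount is !range metadata on the load, so there is no and instruction for a later expansion to fold away. This is essentially the shape behind the godbolt example below.

define i64 @fshl_range_amt(i64 %x, i64 %y, i64* %aptr) {
  %a = load i64, i64* %aptr, !range !0   ; known to be in [0, 64) only via metadata
  %shl = shl i64 %x, %a
  %sub = sub nuw nsw i64 64, %a
  %shr = lshr i64 %y, %sub
  %r = or i64 %shl, %shr
  ret i64 %r
}
!0 = !{i64 0, i64 64}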
So ValueTracking/SelectionDAG::ComputeKnownBits won't be able to check range data for us?
Aha! I was confused for a moment about whether that info was one of those things that survives into the back-end, and it does:
https://godbolt.org/z/6MWecW
SelectionDAG has 25 nodes:
  t0: ch = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
  t12: i64 = build_pair t2, t4
  t6: i32,ch = CopyFromReg t0, Register:i32 %2
  t8: i32,ch = CopyFromReg t0, Register:i32 %3
  t13: i64 = build_pair t6, t8
  t11: i32,ch = load<(load 4 from %fixed-stack.0)> t0, FrameIndex:i32<-1>, undef:i32
  t15: i64,ch = load<(load 8 from %ir.aptr, !range !0)> t0, t11, undef:i32
  t16: i64 = fshl t12, t13, t15
  t19: i32 = extract_element t16, Constant:i32<0>
  t21: ch,glue = CopyToReg t0, Register:i32 $r0, t19
  t18: i32 = extract_element t16, Constant:i32<1>
  t23: ch,glue = CopyToReg t21, Register:i32 $r1, t18, t21:1
  t24: ch = ARMISD::RET_FLAG t23, Register:i32 $r0, Register:i32 $r1, t23:1
So this seems fine to me.
Alive2 complains about one of the test cases:
define i64 @fshr_sub_mask(i64 %x, i64 %y, i64 %a) {
  %mask = and i64 %a, 63
  %shr = lshr i64 %x, %mask
  %sub = sub nsw nuw i64 64, %mask
  %shl = shl i64 %y, %sub
  %r = or i64 %shl, %shr
  ret i64 %r
}
=>
define i64 @fshr_sub_mask(i64 %x, i64 %y, i64 %a) {
  %r = fshr i64 %x, i64 %y, i64 %a
  ret i64 %r
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i64 %x = #x0000100000000000 (17592186044416)
i64 %y = #x0000000000000000 (0)
i64 %a = #x0000000000000027 (39)

Source:
i64 %mask = #x0000000000000027 (39)
i64 %shr = #x0000000000000020 (32)
i64 %sub = #x0000000000000019 (25)
i64 %shl = #x0000000000000000 (0)
i64 %r = #x0000000000000020 (32)

Target:
i64 %r = #x0000000000000000 (0)
Source value: #x0000000000000020 (32)
Target value: #x0000000000000000 (0)
This broke armv7 (tested with both armv7-w64-mingw32 and armv7-linux-gnueabihf) builds of compiler-rt's __udivmoddi4 function; after this commit, it miscalculates __udivmoddi4(883547321287490176, 128).
Thanks @nlopes, it looks like the args aren't getting swapped for some reason - I'll investigate.
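For what it's worth, plugging the counterexample into the funnel-shift semantics suggests the operands just need to swap for this orientation of the pattern: %x feeds the lshr (the low/right side) and %y feeds the shl, so the expected target would presumably be something like:

define i64 @fshr_sub_mask(i64 %x, i64 %y, i64 %a) {
  ; note the swapped operands relative to the failing target above
  %r = call i64 @llvm.fshr.i64(i64 %y, i64 %x, i64 %a)
  ret i64 %r
}
declare i64 @llvm.fshr.i64(i64, i64, i64)

With that order, the counterexample evaluates to (x >> 39) | (y << 25) = 32 in both source and target.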