This is an archive of the discontinued LLVM Phabricator instance.


Differential D146637

[InstCombine] Try to recognize bswap pattern when calling funnel shifts
ClosedPublic

Authored by junaire on Mar 22 2023, 9:00 AM.

Details

Reviewers

RKSimon
spatel
nikic

Commits

rGcea938390ea7: [InstCombine] Try to recognize bswap pattern when calling funnel shifts

Summary

Alive2: https://alive2.llvm.org/ce/z/dxxD7B
Fixes: https://github.com/llvm/llvm-project/issues/60690

Signed-off-by: Jun Zhang <jun@junz.org>

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

junaire created this revision. Mar 22 2023, 9:00 AM

Herald added a project: Restricted Project. Mar 22 2023, 9:00 AM

Herald added a subscriber: hiraditya.

junaire requested review of this revision. Mar 22 2023, 9:00 AM

Herald added a subscriber: llvm-commits.

junaire added a parent revision: D146636: Precommit tests for #60690. Mar 22 2023, 9:00 AM

Harbormaster completed remote builds in B221039: Diff 507380. Mar 22 2023, 9:01 AM

RKSimon added inline comments. Mar 22 2023, 9:52 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1780	Why not use test matchBSwapOrBitReverse here?

RKSimon added reviewers: spatel, nikic. Mar 22 2023, 10:14 AM

Herald added a subscriber: StephenFan. Mar 22 2023, 10:14 AM

As @RKSimon said, this transform is in the wrong place. It has nothing to do with shuffled intrinsic operands...

This revision now requires changes to proceed. Mar 22 2023, 10:38 AM

Put the transform in the right place.

junaire retitled this revision from "[InstCombine] Try to recognize bswap pattern when calling fshl" to "[InstCombine] Try to recognize bswap pattern when calling funnel shifts". Mar 22 2023, 10:14 PM

junaire edited the summary of this revision.

Harbormaster completed remote builds in B221219: Diff 507606. Mar 22 2023, 11:04 PM

Update existing tests

nikic added inline comments. Mar 23 2023, 1:21 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1797–1801	This code block can probably be dropped now, if it's already handled by the preceding?

junaire added inline comments. Mar 23 2023, 1:23 AM

llvm/test/Transforms/InstCombine/fsh.ll
677	Looks like this transform somehow breaks the existing optimization? Any thoughts about how to fix it? My idea is to add another default parameter called `bool ForceTrunc = true` and ask `matchBSwapOrBitReverse` not to truncate in this case. However, I don't know if this is appropriate and please let me know if you have better solutions...

nikic added inline comments. Mar 23 2023, 1:24 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1797–1801	Or you can move your code after the existing one, which will avoid the regression in fshl_mask_args_same2. It shouldn't be necessary to handle this specially, but clearly we're missing a zext(shl(trunc)) fold.

junaire added inline comments. Mar 23 2023, 1:57 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1797–1801	No, that doesn't work, since the existing transform depends heavily on the bit width (it is only valid for i16, but the test uses i32). Or maybe we can leave it as is and add a zext(shl(trunc)) fold in the future?

Harbormaster completed remote builds in B221243: Diff 507635. Mar 23 2023, 2:01 AM

LGTM

llvm/test/Transforms/InstCombine/fsh.ll
677	We should just add support for folding this trunc+shl+zext pattern. I've filed https://github.com/llvm/llvm-project/issues/61650 to track.

This revision is now accepted and ready to land. Mar 23 2023, 4:05 AM

junaire mentioned this in D146636: Precommit tests for #60690. Mar 23 2023, 7:51 PM

This revision was landed with ongoing or failed builds. Mar 23 2023, 7:51 PM

Closed by commit rGcea938390ea7: [InstCombine] Try to recognize bswap pattern when calling funnel shifts (authored by junaire).

This revision was automatically updated to reflect the committed changes.

junaire added a commit: rGcea938390ea7: [InstCombine] Try to recognize bswap pattern when calling funnel shifts.

Revision Contents

Path                                                    Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp    4 lines
llvm/test/Transforms/InstCombine/bswap.ll               24 lines
llvm/test/Transforms/InstCombine/fsh.ll                 9 lines

Diff 507951

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

[Preceding lines not shown. Enclosing context: if (match(II->getArgOperand(2), m_ImmConstant(ShAmtC))) {]

       // fshr X, Y, C --> fshl X, Y, (BitWidth - C)
       Constant *LeftShiftC = ConstantExpr::getSub(WidthC, ShAmtC);
       Module *Mod = II->getModule();
       Function *Fshl = Intrinsic::getDeclaration(Mod, Intrinsic::fshl, Ty);
       return CallInst::Create(Fshl, { Op0, Op1, LeftShiftC });
     }
     assert(IID == Intrinsic::fshl &&
            "All funnel shifts by simple constants should go left");

     // fshl(X, 0, C) --> shl X, C
     // fshl(X, undef, C) --> shl X, C
     if (match(Op1, m_ZeroInt()) || match(Op1, m_Undef()))
       return BinaryOperator::CreateShl(Op0, ShAmtC);

     // fshl(0, X, C) --> lshr X, (BW-C)
     // fshl(undef, X, C) --> lshr X, (BW-C)
     if (match(Op0, m_ZeroInt()) || match(Op0, m_Undef()))
       return BinaryOperator::CreateLShr(Op1,
                                         ConstantExpr::getSub(WidthC, ShAmtC));

     // fshl i16 X, X, 8 --> bswap i16 X (reduce to more-specific form)
     if (Op0 == Op1 && BitWidth == 16 && match(ShAmtC, m_SpecificInt(8))) {
       Module *Mod = II->getModule();
       Function *Bswap = Intrinsic::getDeclaration(Mod, Intrinsic::bswap, Ty);
       return CallInst::Create(Bswap, { Op0 });
     }
+    if (Instruction *BitOp =
+            matchBSwapOrBitReverse(*II, /*MatchBSwaps*/ true,
+                                   /*MatchBitReversals*/ true))
+      return BitOp;
   }

   // Left or right might be masked.
   if (SimplifyDemandedInstructionBits(*II))
     return &CI;

   // The shift amount (operand 2) of a funnel shift is modulo the bitwidth,
   // so only the low bits of the shift amount are demanded if the bitwidth is

[Following lines not shown.]
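
The comments in the hunk above state several funnel-shift identities. As a rough illustration only, the sketch below models a 32-bit funnel shift in plain C++ so those identities can be checked by hand. The helper names fshl32 and fshr32 are hypothetical and are not LLVM APIs. The fshr-to-fshl rewrite is exercised here only for shift amounts strictly between 0 and the bit width.

```cpp
#include <cstdint>

// Hypothetical model of llvm.fshl.i32: concatenate Hi:Lo, shift left by
// Amt modulo 32, and keep the upper 32 bits.
std::uint32_t fshl32(std::uint32_t Hi, std::uint32_t Lo, std::uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi; // avoid an undefined shift by 32 in C++
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// Hypothetical model of llvm.fshr.i32: concatenate Hi:Lo, shift right by
// Amt modulo 32, and keep the lower 32 bits.
std::uint32_t fshr32(std::uint32_t Hi, std::uint32_t Lo, std::uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Lo; // avoid an undefined shift by 32 in C++
  return (Hi << (32 - Amt)) | (Lo >> Amt);
}

// fshr X, Y, C --> fshl X, Y, (BitWidth - C), checked for 0 < C < 32.
bool fshrMatchesFshl(std::uint32_t X, std::uint32_t Y, std::uint32_t C) {
  return fshr32(X, Y, C) == fshl32(X, Y, 32 - C);
}
```

With both data operands equal, a funnel shift is a rotate, and rotating by half the bit width gives the same result in either direction. That is consistent with the fshl and fshr test cases in bswap.ll ending up with the same output.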

llvm/test/Transforms/InstCombine/bswap.ll

[Preceding lines not shown.]

 ;
   %t4 = lshr i64 %t0, 56
   %t5 = or i64 %t3, %t4
   %t6 = trunc i64 %t5 to i32
   ret i32 %t6
 }

 define i64 @PR60690_call_fshl(i64 %result) {
 ; CHECK-LABEL: @PR60690_call_fshl(
-; CHECK-NEXT: [[AND_I:%.*]] = lshr i64 [[RESULT:%.*]], 8
-; CHECK-NEXT: [[SHR_I:%.*]] = and i64 [[AND_I]], 71777214294589695
-; CHECK-NEXT: [[AND1_I:%.*]] = shl i64 [[RESULT]], 8
-; CHECK-NEXT: [[SHL_I:%.*]] = and i64 [[AND1_I]], -71777214294589696
-; CHECK-NEXT: [[OR_I:%.*]] = or i64 [[SHR_I]], [[SHL_I]]
-; CHECK-NEXT: [[AND_I7:%.*]] = shl i64 [[OR_I]], 16
-; CHECK-NEXT: [[SHL_I8:%.*]] = and i64 [[AND_I7]], -281470681808896
-; CHECK-NEXT: [[AND1_I9:%.*]] = lshr i64 [[OR_I]], 16
-; CHECK-NEXT: [[SHR_I10:%.*]] = and i64 [[AND1_I9]], 281470681808895
-; CHECK-NEXT: [[OR_I11:%.*]] = or i64 [[SHL_I8]], [[SHR_I10]]
-; CHECK-NEXT: [[OR_I12:%.*]] = tail call i64 @llvm.fshl.i64(i64 [[OR_I11]], i64 [[OR_I11]], i64 32)
+; CHECK-NEXT: [[OR_I12:%.*]] = call i64 @llvm.bswap.i64(i64 [[RESULT:%.*]])
 ; CHECK-NEXT: ret i64 [[OR_I12]]
 ;
   %and.i = lshr i64 %result, 8
   %shr.i = and i64 %and.i, 71777214294589695
   %and1.i = shl i64 %result, 8
   %shl.i = and i64 %and1.i, -71777214294589696
   %or.i = or i64 %shr.i, %shl.i
   %and.i7 = shl i64 %or.i, 16
   %shl.i8 = and i64 %and.i7, -281470681808896
   %and1.i9 = lshr i64 %or.i, 16
   %shr.i10 = and i64 %and1.i9, 281470681808895
   %or.i11 = or i64 %shl.i8, %shr.i10
   %or.i12 = tail call i64 @llvm.fshl.i64(i64 %or.i11, i64 %or.i11, i64 32)
   ret i64 %or.i12
 }
 declare i64 @llvm.fshl.i64(i64, i64, i64)

 define i64 @PR60690_call_fshr(i64 %result) {
 ; CHECK-LABEL: @PR60690_call_fshr(
-; CHECK-NEXT: [[AND_I:%.*]] = lshr i64 [[RESULT:%.*]], 8
-; CHECK-NEXT: [[SHR_I:%.*]] = and i64 [[AND_I]], 71777214294589695
-; CHECK-NEXT: [[AND1_I:%.*]] = shl i64 [[RESULT]], 8
-; CHECK-NEXT: [[SHL_I:%.*]] = and i64 [[AND1_I]], -71777214294589696
-; CHECK-NEXT: [[OR_I:%.*]] = or i64 [[SHR_I]], [[SHL_I]]
-; CHECK-NEXT: [[AND_I7:%.*]] = shl i64 [[OR_I]], 16
-; CHECK-NEXT: [[SHL_I8:%.*]] = and i64 [[AND_I7]], -281470681808896
-; CHECK-NEXT: [[AND1_I9:%.*]] = lshr i64 [[OR_I]], 16
-; CHECK-NEXT: [[SHR_I10:%.*]] = and i64 [[AND1_I9]], 281470681808895
-; CHECK-NEXT: [[OR_I11:%.*]] = or i64 [[SHL_I8]], [[SHR_I10]]
-; CHECK-NEXT: [[OR_I12:%.*]] = call i64 @llvm.fshl.i64(i64 [[OR_I11]], i64 [[OR_I11]], i64 32)
+; CHECK-NEXT: [[OR_I12:%.*]] = call i64 @llvm.bswap.i64(i64 [[RESULT:%.*]])
 ; CHECK-NEXT: ret i64 [[OR_I12]]
 ;
   %and.i = lshr i64 %result, 8
   %shr.i = and i64 %and.i, 71777214294589695
   %and1.i = shl i64 %result, 8
   %shl.i = and i64 %and1.i, -71777214294589696
   %or.i = or i64 %shr.i, %shl.i
   %and.i7 = shl i64 %or.i, 16
   %shl.i8 = and i64 %and.i7, -281470681808896
   %and1.i9 = lshr i64 %or.i, 16
   %shr.i10 = and i64 %and1.i9, 281470681808895
   %or.i11 = or i64 %shl.i8, %shr.i10
   %or.i12 = tail call i64 @llvm.fshr.i64(i64 %or.i11, i64 %or.i11, i64 32)
   ret i64 %or.i12
 }
 declare i64 @llvm.fshr.i64(i64, i64, i64)
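
To see why the PR60690 tests can collapse to a single bswap, it may help to replay the IR in ordinary C++. The sketch below mirrors the three stages of the test body (swap bytes within 16-bit lanes, swap 16-bit lanes within 32-bit lanes, rotate by 32) and compares the result with a straightforward byte reversal. All function names are hypothetical. The hex masks are the decimal constants from the test rewritten in hex.

```cpp
#include <cstdint>

// Stage 1: swap adjacent bytes inside every 16-bit lane.
// 0x00FF00FF00FF00FF is 71777214294589695; its complement is the shl mask.
std::uint64_t swapBytesInHalfwords(std::uint64_t X) {
  return ((X >> 8) & 0x00FF00FF00FF00FFULL) |
         ((X << 8) & 0xFF00FF00FF00FF00ULL);
}

// Stage 2: swap adjacent 16-bit lanes inside every 32-bit lane.
// 0x0000FFFF0000FFFF is 281470681808895; its complement is the shl mask.
std::uint64_t swapHalfwordsInWords(std::uint64_t X) {
  return ((X << 16) & 0xFFFF0000FFFF0000ULL) |
         ((X >> 16) & 0x0000FFFF0000FFFFULL);
}

// Stage 3: rotate by half the width, which is what fshl(X, X, 32) and
// fshr(X, X, 32) both compute for i64.
std::uint64_t rotateBy32(std::uint64_t X) {
  return (X << 32) | (X >> 32);
}

// The whole pattern from the test body.
std::uint64_t bswapViaFunnelShift(std::uint64_t X) {
  return rotateBy32(swapHalfwordsInWords(swapBytesInHalfwords(X)));
}

// Reference byte reversal, one byte at a time.
std::uint64_t bswapReference(std::uint64_t X) {
  std::uint64_t R = 0;
  for (int I = 0; I < 8; ++I) {
    R = (R << 8) | (X & 0xFF);
    X >>= 8;
  }
  return R;
}
```

For example, 0x0102030405060708 becomes 0x0201040306050807 after stage 1, 0x0403020108070605 after stage 2, and 0x0807060504030201 after the rotate, which is the byte-swapped input.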

llvm/test/Transforms/InstCombine/fsh.ll

[Preceding lines not shown.]

 ;
   %t1 = and i32 %a, 4294901760 ; 0xffff0000
   %t2 = call i32 @llvm.fshl.i32(i32 %t1, i32 %t1, i32 16)
   ret i32 %t2
 }

 define i32 @fshl_mask_args_same2(i32 %a) {
 ; CHECK-LABEL: @fshl_mask_args_same2(
-; CHECK-NEXT: [[T1:%.*]] = shl i32 [[A:%.*]], 8
-; CHECK-NEXT: [[T2:%.*]] = and i32 [[T1]], 65280
+; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[A:%.*]] to i16
+; CHECK-NEXT: [[REV:%.*]] = shl i16 [[TRUNC]], 8
+; CHECK-NEXT: [[T2:%.*]] = zext i16 [[REV]] to i32
 ; CHECK-NEXT: ret i32 [[T2]]
 ;
   %t1 = and i32 %a, 255
   %t2 = call i32 @llvm.fshl.i32(i32 %t1, i32 %t1, i32 8)
   ret i32 %t2
 }

 define i32 @fshl_mask_args_same3(i32 %a) {
 ; CHECK-LABEL: @fshl_mask_args_same3(
-; CHECK-NEXT: [[T2:%.*]] = shl i32 [[A:%.*]], 24
-; CHECK-NEXT: ret i32 [[T2]]
+; CHECK-NEXT: [[REV:%.*]] = shl i32 [[A:%.*]], 24
+; CHECK-NEXT: ret i32 [[REV]]
 ;
   %t1 = and i32 %a, 255
   %t2 = call i32 @llvm.fshl.i32(i32 %t1, i32 %t1, i32 24)
   ret i32 %t2
 }

 define i32 @fshl_mask_args_different(i32 %a) {
 ; CHECK-LABEL: @fshl_mask_args_different(

[Following lines not shown.]
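
The change to fshl_mask_args_same2 swaps a shl+and sequence for trunc+shl+zext, which the review treats as a missed fold rather than a miscompile. As a sanity check only, the sketch below writes the original test body and both CHECK sequences as plain C++ so their results can be compared. The function names are hypothetical.

```cpp
#include <cstdint>

// Old CHECK lines: and(shl a, 8), 65280
std::uint32_t shlThenMask(std::uint32_t A) {
  return (A << 8) & 65280u;
}

// New CHECK lines: zext(shl(trunc a to i16, 8))
std::uint32_t truncShlZext(std::uint32_t A) {
  std::uint16_t Trunc = static_cast<std::uint16_t>(A);        // trunc i32 -> i16
  std::uint16_t Rev = static_cast<std::uint16_t>(Trunc << 8); // shl i16, wraps at 16 bits
  return Rev;                                                 // zext i16 -> i32
}

// Original test body: fshl(a & 255, a & 255, 8), a rotate left by 8.
std::uint32_t originalPattern(std::uint32_t A) {
  std::uint32_t T1 = A & 255u;
  return (T1 << 8) | (T1 >> 24);
}
```

All three forms move the low byte of the input into bits 8 through 15 and clear everything else, so the new output is equivalent but one instruction longer.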