This is an archive of the discontinued LLVM Phabricator instance.

Differential D68103

[InstCombine] Simplify shift-by-sext to shift-by-zext
Closed · Public

Authored by lebedev.ri on Sep 26 2019, 2:32 PM.

Details

Reviewers

spatel
nikic
RKSimon

Commits

rG269f1bea0d50: [InstCombine] Simplify shift-by-sext to shift-by-zext
rL373106: [InstCombine] Simplify shift-by-sext to shift-by-zext

Summary

This is valid for any sext bitwidth pair:

Processing /tmp/opt.ll..

----------------------------------------
  %signed = sext %y
  %r = shl %x, %signed
  ret %r
=>
  %unsigned = zext %y
  %r = shl %x, %unsigned
  ret %r
  %signed = sext %y

Done: 2016
Optimization is correct!

(This does not hold for funnel shifts, where it is invalid for, e.g., i6 -> i7.)
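Why the fold is valid for plain shifts but not for funnel shifts can be checked by brute force. The sketch below is a small C++ model, not LLVM code: the helpers sext, zext, shiftFoldIsValid and funnelFoldIsValid are hypothetical names introduced here, and widths are limited to 16 bits. It relies on two LangRef facts: shl/lshr/ashr by an amount >= the bit width produce poison, while fshl/fshr use the amount modulo the bit width. If the sign bit of %y is clear, sext and zext agree; if it is set, the sign-extended amount is at least the bit width, so the original shift is already poison and the zext form is a valid refinement.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of LLVM integer casts for widths up to 16 bits. Values are held in
// uint32_t and masked to the width being modelled.
static uint32_t lowMask(unsigned bits) { return (uint32_t(1) << bits) - 1; }

// sext iM -> iN: copy bit M-1 into bits M..N-1.
static uint32_t sext(uint32_t y, unsigned m, unsigned n) {
  y &= lowMask(m);
  if (y & (uint32_t(1) << (m - 1)))
    y |= lowMask(n) & ~lowMask(m);
  return y;
}

// zext iM -> iN: the low M bits, unchanged.
static uint32_t zext(uint32_t y, unsigned m) { return y & lowMask(m); }

// shl/lshr/ashr by an amount >= N is poison, so swapping sext for zext is a
// refinement iff every in-range sext'd amount equals the zext'd amount.
// The shifted operand and the shift kind do not affect this condition.
static bool shiftFoldIsValid(unsigned m, unsigned n) {
  assert(m >= 1 && m < n && n <= 16);
  for (uint32_t y = 0; y <= lowMask(m); ++y) {
    uint32_t s = sext(y, m, n);
    if (s < n && s != zext(y, m))
      return false;
  }
  return true;
}

// fshl/fshr never yield poison for a large amount; they use it modulo N.
// The fold is then valid only if both amounts agree modulo N for every y.
static bool funnelFoldIsValid(unsigned m, unsigned n) {
  assert(m >= 1 && m < n && n <= 16);
  for (uint32_t y = 0; y <= lowMask(m); ++y)
    if (sext(y, m, n) % n != zext(y, m) % n)
      return false;
  return true;
}
```

With these definitions, shiftFoldIsValid(m, n) holds for every 1 <= m < n <= 16, while funnelFoldIsValid(6, 7) is false: the i6 value -32 sign-extends to 96 in i7 (96 mod 7 = 5), but zero-extends to 32 (32 mod 7 = 4).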

The main motivation is C++ semantics:

int shl(int a, char b) {
    return a << b;
}

ends up as

%3 = sext i8 %1 to i32
%4 = shl i32 %0, %3

https://godbolt.org/z/0jgqUq
As this shows, the sext is too pessimistic; a zext would do.
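On the C++ side, the sext comes from integer promotion: with a signed char (the common case on x86-64), b is sign-extended to int before the shift. Because a shift by a negative amount, or by at least the width of int, is undefined behaviour in C++, only amounts in [0, width) can be observed, and for those the sign- and zero-extended values coincide. A minimal check, assuming a conventional platform; the function name is hypothetical:

```cpp
#include <cassert>
#include <climits>

// Models `int shl(int a, char b) { return a << b; }`: b is promoted to int
// first. With a signed char this promotion is a sign extension (LLVM
// `sext i8 %b to i32`); zero-extending would be `(unsigned char)b`.
// `a << b` is undefined in C++ if b is negative or >= the width of int, so
// only amounts in [0, width) can be observed. Returns true if, for every such
// amount, the sign- and zero-extended values agree.
static bool extensionsAgreeOnDefinedShiftAmounts() {
  const int width = static_cast<int>(sizeof(int) * CHAR_BIT);
  for (int v = CHAR_MIN; v <= CHAR_MAX; ++v) {
    const char b = static_cast<char>(v);
    const int signExtended = b;                              // promotion
    const int zeroExtended = static_cast<unsigned char>(b);  // zext
    const bool shiftIsDefined = signExtended >= 0 && signExtended < width;
    if (shiftIsDefined && signExtended != zeroExtended)
      return false;
  }
  return true;
}
```

If char is unsigned on the target, the promotion is already a zero extension and the check holds trivially.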

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. · Sep 26 2019, 2:32 PM

Herald added a subscriber: hiraditya. · Sep 26 2019, 2:32 PM

Revisited tests.

We can quite easily have more than one shift by the same sext.
I'm not quite sure where the fix for that should be.
Here I'm proposing to do it in InstCombine, by checking
that all uses of such a sext are shifts.
This is obviously O(N^2) iff the sext is also used by a non-shift.
Thoughts?
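To make the complexity concern concrete, here is a toy cost model (hypothetical; UserKind and countUserScans are illustrative names, not LLVM API). It assumes the multi-use variant scans every user of the sext each time one of its shift users is visited, and rewrites all users at once when every user is a shift. In the worst case, with a single non-shift user at the end of the use list, each of the N - 1 shift visits rescans all N users.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not LLVM's data structures: the kinds of users a
// single sext has, in use-list order.
enum class UserKind { Shift, Other };

// Simulates visiting each shift user of one sext once. On each visit, the
// multi-use variant scans the sext's users to check that all are shifts; if
// so, every user is rewritten to a zext at once and later visits see no sext.
// Returns the total number of use-list entries inspected.
static std::size_t countUserScans(const std::vector<UserKind> &users) {
  std::size_t inspected = 0;
  bool sextReplaced = false;
  for (UserKind visited : users) {
    if (visited != UserKind::Shift || sextReplaced)
      continue; // only a shift of a still-present sext triggers the check
    bool allShifts = true;
    for (UserKind u : users) {
      ++inspected;
      if (u != UserKind::Shift) {
        allShifts = false;
        break; // check fails; this shift is left alone
      }
    }
    if (allShifts)
      sextReplaced = true;
  }
  return inspected;
}
```

For 100 shift users the model inspects 100 entries in total; making the last user a non-shift raises that to 99 * 100 = 9900.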

There are a handful of existing transforms in InstCombine that look at users rather than operands, so it's not unprecedented. But I'm not sure if there's another/better way.
One possibility would be to ignore the usual hasOneUse() constraint because we're ok with the trade-off (better analysis despite a potential extra instruction).
You could add the basic fold (with m_OneUse()) as a preliminary and non-controversial step? That would reduce risk and make it clear that whatever extra code that we use is specifically for the multi-use case.
cc @efriedma in case he has advice.

In D68103#1686130, @spatel wrote:

> There are a handful of existing transforms in InstCombine that look at users rather than operands, so it's not unprecedented. But I'm not sure if there's another/better way.

The other obvious alternative I know of is putting the non-single-use handling into AggressiveInstCombine.

> One possibility would be to ignore the usual hasOneUse() constraint because we're ok with the trade-off (better analysis despite a potential extra instruction).

Hmm, or that. But are we sure no other pass will ever do the opposite transform and replace that zext with a sext?

> You could add the basic fold (with m_OneUse()) as a preliminary and non-controversial step? That would reduce risk and make it clear that whatever extra code that we use is specifically for the multi-use case.

Will split the patch up.

> cc @efriedma in case he has advice.

I will fully understand if the greedy variants aren't good for InstCombine proper;
I can put the current-ish variant into AggressiveInstCombine, where it won't have the O(N^2) problem either.
It's just that, IIRC, AIC only runs at -O3, not even -O2?

Back to the least greedy version of the patch; this should raise no concerns whatsoever.
The greedy version can be added later in another review.

In D68103#1686144, @lebedev.ri wrote:

> I can put the current-ish variant into AggressiveInstCombine, where it won't have the O(N^2) problem either.
> It's just that, IIRC, AIC only runs at -O3, not even -O2?

That's correct, but it was limited to -O3 only because that was the easy, non-controversial path forward. If someone is willing to quantify the perf vs. compile-time trade-off for AIC to show it's worth it, then it could be moved to -O2.

LGTM - it would be best to send the multi-use example to llvm-dev, so we can get some consensus on what's best.

This revision is now accepted and ready to land. · Sep 27 2019, 10:53 AM

lebedev.ri mentioned this in D68150: [InstCombine] Simplify shift-by-sext to shift-by-zext - ignore use count on sext. · Sep 27 2019, 11:05 AM

Closed by commit rL373106: [InstCombine] Simplify shift-by-sext to shift-by-zext (authored by lebedevri). · Sep 27 2019, 11:10 AM

This revision was automatically updated to reflect the committed changes.

Diffusion mentioned this in rL373128: [PatternMatch] Add m_SExtOrSelf(), m_ZExtOrSExtOrSelf() matchers + unittests. · Sep 27 2019, 2:51 PM

lebedev.ri mentioned this in rG8c39d016705e: [PatternMatch] Add m_SExtOrSelf(), m_ZExtOrSExtOrSelf() matchers + unittests. · Sep 27 2019, 2:54 PM

lebedev.ri mentioned this in D68654: [CVP} Replace SExt with ZExt if the input is known-non-negative. · Oct 8 2019, 11:15 AM

Diffusion mentioned this in rL374112: [CVP} Replace SExt with ZExt if the input is known-non-negative. · Oct 8 2019, 1:31 PM

lebedev.ri mentioned this in rG354ba6985ca0: [CVP} Replace SExt with ZExt if the input is known-non-negative. · Oct 8 2019, 1:36 PM

Revision Contents

Path                                                     Size
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp    7 lines
llvm/test/Transforms/InstCombine/load-cmp.ll             2 lines
llvm/test/Transforms/InstCombine/shift-by-signext.ll     24 lines

Diff 222196

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp

(context elided: 235 lines; the lines below close dropRedundantMaskingOfLeftShiftInput(BinaryOperator *OuterShift, ...))
   Builder.Insert(NewShift);
   return BinaryOperator::Create(Instruction::And, NewShift, NewMask);
 }

 Instruction *InstCombiner::commonShiftTransforms(BinaryOperator &I) {
   Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
   assert(Op0->getType() == Op1->getType());

+  // If the shift amount is a one-use `sext`, we can demote it to `zext`.
+  Value *Y;
+  if (match(Op1, m_OneUse(m_SExt(m_Value(Y))))) {
+    Value *NewExt = Builder.CreateZExt(Y, I.getType(), Op1->getName());
+    return BinaryOperator::Create(I.getOpcode(), Op0, NewExt);
+  }
+
   // See if we can fold away this shift.
   if (SimplifyDemandedInstructionBits(I))
     return &I;

   // Try to fold constant and into select arguments.
   if (isa<Constant>(Op0))
     if (SelectInst *SI = dyn_cast<SelectInst>(Op1))
       if (Instruction *R = FoldOpIntoSelect(I, SI))
(context elided: 864 lines)

llvm/test/Transforms/InstCombine/load-cmp.ll

(context elided: 99 lines)
   %P = getelementptr inbounds [10 x i16], [10 x i16]* @G16, i32 0, i32 %X
   %Q = load i16, i16* %P
   %R = icmp sle i16 %Q, 73
   ret i1 %R
 }

 define i1 @test4_i16(i16 %X) {
 ; CHECK-LABEL: @test4_i16(
-; CHECK-NEXT: [[TMP1:%.*]] = sext i16 [[X:%.*]] to i32
+; CHECK-NEXT: [[TMP1:%.*]] = zext i16 [[X:%.*]] to i32
 ; CHECK-NEXT: [[TMP2:%.*]] = lshr i32 933, [[TMP1]]
 ; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[TMP2]], 1
 ; CHECK-NEXT: [[R:%.*]] = icmp ne i32 [[TMP3]], 0
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %P = getelementptr inbounds [10 x i16], [10 x i16]* @G16, i32 0, i16 %X
   %Q = load i16, i16* %P
   %R = icmp sle i16 %Q, 73
(context elided: 200 lines)

llvm/test/Transforms/InstCombine/shift-by-signext.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt %s -instcombine -S | FileCheck %s

 ; If we have a shift by sign-extended value, we can replace sign-extension
 ; with zero-extension.

 define i32 @t0_shl(i32 %x, i8 %shamt) {
 ; CHECK-LABEL: @t0_shl(
-; CHECK-NEXT: [[SHAMT_WIDE:%.*]] = sext i8 [[SHAMT:%.*]] to i32
-; CHECK-NEXT: [[R:%.*]] = shl i32 [[X:%.*]], [[SHAMT_WIDE]]
+; CHECK-NEXT: [[SHAMT_WIDE1:%.*]] = zext i8 [[SHAMT:%.*]] to i32
+; CHECK-NEXT: [[R:%.*]] = shl i32 [[X:%.*]], [[SHAMT_WIDE1]]
 ; CHECK-NEXT: ret i32 [[R]]
 ;
   %shamt_wide = sext i8 %shamt to i32
   %r = shl i32 %x, %shamt_wide
   ret i32 %r
 }
 define i32 @t1_lshr(i32 %x, i8 %shamt) {
 ; CHECK-LABEL: @t1_lshr(
-; CHECK-NEXT: [[SHAMT_WIDE:%.*]] = sext i8 [[SHAMT:%.*]] to i32
-; CHECK-NEXT: [[R:%.*]] = lshr i32 [[X:%.*]], [[SHAMT_WIDE]]
+; CHECK-NEXT: [[SHAMT_WIDE1:%.*]] = zext i8 [[SHAMT:%.*]] to i32
+; CHECK-NEXT: [[R:%.*]] = lshr i32 [[X:%.*]], [[SHAMT_WIDE1]]
 ; CHECK-NEXT: ret i32 [[R]]
 ;
   %shamt_wide = sext i8 %shamt to i32
   %r = lshr i32 %x, %shamt_wide
   ret i32 %r
 }
 define i32 @t2_ashr(i32 %x, i8 %shamt) {
 ; CHECK-LABEL: @t2_ashr(
-; CHECK-NEXT: [[SHAMT_WIDE:%.*]] = sext i8 [[SHAMT:%.*]] to i32
-; CHECK-NEXT: [[R:%.*]] = ashr i32 [[X:%.*]], [[SHAMT_WIDE]]
+; CHECK-NEXT: [[SHAMT_WIDE1:%.*]] = zext i8 [[SHAMT:%.*]] to i32
+; CHECK-NEXT: [[R:%.*]] = ashr i32 [[X:%.*]], [[SHAMT_WIDE1]]
 ; CHECK-NEXT: ret i32 [[R]]
 ;
   %shamt_wide = sext i8 %shamt to i32
   %r = ashr i32 %x, %shamt_wide
   ret i32 %r
 }

 define <2 x i32> @t3_vec_shl(<2 x i32> %x, <2 x i8> %shamt) {
 ; CHECK-LABEL: @t3_vec_shl(
-; CHECK-NEXT: [[SHAMT_WIDE:%.*]] = sext <2 x i8> [[SHAMT:%.*]] to <2 x i32>
-; CHECK-NEXT: [[R:%.*]] = shl <2 x i32> [[X:%.*]], [[SHAMT_WIDE]]
+; CHECK-NEXT: [[SHAMT_WIDE1:%.*]] = zext <2 x i8> [[SHAMT:%.*]] to <2 x i32>
+; CHECK-NEXT: [[R:%.*]] = shl <2 x i32> [[X:%.*]], [[SHAMT_WIDE1]]
 ; CHECK-NEXT: ret <2 x i32> [[R]]
 ;
   %shamt_wide = sext <2 x i8> %shamt to <2 x i32>
   %r = shl <2 x i32> %x, %shamt_wide
   ret <2 x i32> %r
 }
 define <2 x i32> @t4_vec_lshr(<2 x i32> %x, <2 x i8> %shamt) {
 ; CHECK-LABEL: @t4_vec_lshr(
-; CHECK-NEXT: [[SHAMT_WIDE:%.*]] = sext <2 x i8> [[SHAMT:%.*]] to <2 x i32>
-; CHECK-NEXT: [[R:%.*]] = lshr <2 x i32> [[X:%.*]], [[SHAMT_WIDE]]
+; CHECK-NEXT: [[SHAMT_WIDE1:%.*]] = zext <2 x i8> [[SHAMT:%.*]] to <2 x i32>
+; CHECK-NEXT: [[R:%.*]] = lshr <2 x i32> [[X:%.*]], [[SHAMT_WIDE1]]
 ; CHECK-NEXT: ret <2 x i32> [[R]]
 ;
   %shamt_wide = sext <2 x i8> %shamt to <2 x i32>
   %r = lshr <2 x i32> %x, %shamt_wide
   ret <2 x i32> %r
 }
 define <2 x i32> @t5_vec_ashr(<2 x i32> %x, <2 x i8> %shamt) {
 ; CHECK-LABEL: @t5_vec_ashr(
-; CHECK-NEXT: [[SHAMT_WIDE:%.*]] = sext <2 x i8> [[SHAMT:%.*]] to <2 x i32>
-; CHECK-NEXT: [[R:%.*]] = ashr <2 x i32> [[X:%.*]], [[SHAMT_WIDE]]
+; CHECK-NEXT: [[SHAMT_WIDE1:%.*]] = zext <2 x i8> [[SHAMT:%.*]] to <2 x i32>
+; CHECK-NEXT: [[R:%.*]] = ashr <2 x i32> [[X:%.*]], [[SHAMT_WIDE1]]
 ; CHECK-NEXT: ret <2 x i32> [[R]]
 ;
   %shamt_wide = sext <2 x i8> %shamt to <2 x i32>
   %r = ashr <2 x i32> %x, %shamt_wide
   ret <2 x i32> %r
 }

 define i32 @t6_twoshifts(i32 %x, i8 %shamt) {
(context elided: 104 lines)