This is an archive of the discontinued LLVM Phabricator instance.


Differential D107766

[AggressiveInstCombine] Add shift instructions to `TruncInstCombine` DAG
Abandoned · Public

Authored by anton-afanasyev on Aug 9 2021, 7:31 AM.


Details

Reviewers

RKSimon
spatel
ABataev
dtemirbulatov
lebedev.ri
nikic

Summary

Add `shl`, `lshr` and `ashr` instructions to the DAG post-dominated by `trunc`,
allowing TruncInstCombine to reduce bitwidth of expressions containing shifts.

Fixes PR50555.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

anton-afanasyev created this revision.Aug 9 2021, 7:31 AM

Herald added a subscriber: hiraditya.Aug 9 2021, 7:31 AM

anton-afanasyev requested review of this revision.Aug 9 2021, 7:31 AM

Herald added a project: Restricted Project.Aug 9 2021, 7:31 AM

Herald added a subscriber: llvm-commits.

lebedev.ri added a reviewer: lebedev.ri.Aug 9 2021, 7:34 AM

We can not do this transform as proposed here,
it increases the instruction count.

Could you point me at the test for this change?
It should only contain lshr-of-zext, there should not be any trunc;
please add a test where zext has an extra use.

This revision now requires changes to proceed.Aug 9 2021, 7:40 AM

Add test with zext having extra use

Harbormaster completed remote builds in B118680: Diff 365189.Aug 9 2021, 8:14 AM

anton-afanasyev added inline comments.Aug 9 2021, 8:33 AM

llvm/test/Transforms/InstCombine/pr50555.ll
9

In D107766#2934536, @lebedev.ri wrote:

We can not do this transform as proposed here, it increases the instruction count. Could you point me at the test for this change? It should only contain lshr-of-zext, there should not be any trunc; please add a test where zext has an extra use.

@lebedev.ri: Yes, you're right, it increases instruction count, adding new `zext` when the old one has an extra use. But it also simplifies `lshr` to lower bits type making it simpler. Isn't it a good compromise? And also, after `zext` sinking, it can trigger a chain of changes combining `zext` with other instrs like in cases below: `add`, `trunc` and so on.

I think this can be adjusted in SLP vectorizer. We have MinBWs container in there, to try to operate on non-wide instructions. Probably need to tweak it to handle this case. Did you try to modify collectValuesToDemote function?

FWIW i agree that it obviously improves the SLP snippet in question,
i'm just not sure this is the right way to do it. Sorry.

llvm/test/Transforms/InstCombine/pr50555.ll
9

@anton-afanasyev wrote:

@lebedev.ri: Yes, you're right, it increases instruction count, adding new zext when the old one has an extra use.

To be noted, this is the profitability heuristics of instcombine: don't increase instruction count.

@anton-afanasyev wrote:

But it also simplifies lshr to lower bits type making it simpler. Isn't it a good compromise? And also, after zext sinking, it can trigger a chain of changes combining zext with other instrs like in cases below: add, trunc and so on.

Sure. But it still increases instruction count.

If we want to solve this as an instcombine (or maybe aggressive-instcombine) problem, we have to expand the pattern to make it clearly profitable. I'm not sure how to generalize it, but we can do the narrowing starting from the trunc and remove an instruction:
https://alive2.llvm.org/ce/z/lwtDwZ

define i16 @src(i8 %x) {
  %z = zext i8 %x to i32
  %s = lshr i32 %z, 1
  %a = add nuw nsw i32 %s, %z
  %s2 = lshr i32 %a, 2
  %t = trunc i32 %s2 to i16
  ret i16 %t
}

define i16 @tgt(i8 %x) {
  %z = zext i8 %x to i16
  %s = lshr i16 %z, 1
  %a = add nuw nsw i16 %s, %z
  %s2 = lshr i16 %a, 2
  ret i16 %s2
}
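As a rough cross-check of the narrowing shown above, the two functions can be mirrored in plain C++ and compared over every i8 input. This is only a sketch: the helper name `narrowingIsEquivalent` is made up for illustration, and unlike the Alive2 link it says nothing about poison or the `nuw`/`nsw` flags.

```cpp
#include <cstdint>

// Mirrors @src: arithmetic in 32 bits, truncated to 16 bits at the end.
std::uint16_t src(std::uint8_t x) {
  std::uint32_t z = x;                    // zext i8 %x to i32
  std::uint32_t s = z >> 1;               // lshr i32 %z, 1
  std::uint32_t a = s + z;                // add nuw nsw i32 %s, %z
  std::uint32_t s2 = a >> 2;              // lshr i32 %a, 2
  return static_cast<std::uint16_t>(s2);  // trunc i32 %s2 to i16
}

// Mirrors @tgt: the same arithmetic done directly in 16 bits, no trunc.
std::uint16_t tgt(std::uint8_t x) {
  std::uint16_t z = x;                                    // zext i8 %x to i16
  std::uint16_t s = static_cast<std::uint16_t>(z >> 1);   // lshr i16 %z, 1
  std::uint16_t a = static_cast<std::uint16_t>(s + z);    // add i16 %s, %z
  return static_cast<std::uint16_t>(a >> 2);              // lshr i16 %a, 2
}

// Hypothetical helper: true when both versions agree on all 256 inputs.
bool narrowingIsEquivalent() {
  for (unsigned x = 0; x <= 0xFF; ++x)
    if (src(static_cast<std::uint8_t>(x)) != tgt(static_cast<std::uint8_t>(x)))
      return false;
  return true;
}
```

The largest intermediate value is 127 + 255 = 382, which fits in 16 bits, so the narrow add cannot wrap.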

Thanks to all, I've moved this fix to aggressive-instcombine, where it is even planned in a TODO: section.

llvm/test/Transforms/InstCombine/pr50555.ll
9

@lebedev.ri wrote:

To be noted, this is the profitability heuristics of instcombine: don't increase instruction count.

Ok, I see, thanks for noting this.

Move fix to AggressiveInstCombine

Harbormaster completed remote builds in B118701: Diff 365222.Aug 9 2021, 10:52 AM

Remove InstCombine tests

Harbormaster completed remote builds in B118704: Diff 365227.Aug 9 2021, 11:00 AM

Nice, this seems to fit naturally there.
That being said, you probably still want some standalone tests for the pattern in question, both positive and negative ones - what's the requirement on the shift amount?

In D107766#2935073, @lebedev.ri wrote:

Nice, this seems to fit naturally there.
That being said, you probably still want some standalone tests for the pattern in question, both positive and negative ones - what's the requirement on the shift amount?

Agree - aggressive-instcombine doesn't get nearly as much testing as regular instcombine, so we need more tests to be confident it doesn't over-reach.
Leaving the shift amount off of the getRelevantOperands() list doesn't work on this example (crash):

define i16 @sh_amt(i8 %x, i8 %sh1) {
  %z = zext i8 %x to i32
  %zs = zext i8 %sh1 to i32
  %s = lshr i32 %z, %zs
  %a = add nuw nsw i32 %s, %z
  %s2 = lshr i32 %a, 2
  %t = trunc i32 %s2 to i16
  ret i16 %t
}

Some observations for logical right-shift.
this will have a hard time with variable shift amounts.
You need to avoid creating out-of-bounds shifts, there are two obvious options:

https://alive2.llvm.org/ce/z/XShcju <- shift amount needs to be less than target width, and truncation should only drop zeros.
https://alive2.llvm.org/ce/z/QiDPV7 <- could saturate the shift amount if you know that %x has more leading zeros than the number of bits to be truncated

We might already have this logic somewhere, not sure.
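The first option above (shift amount below the target width, truncation dropping only zeros) can be illustrated with a small exhaustive check in plain C++. This is a sketch under the assumption of an i32 to i16 narrowing; the function names are invented for illustration and are not LLVM APIs.

```cpp
#include <cstdint>

// Wide form: lshr in i32, then trunc to i16. Caller keeps amt < 32.
std::uint16_t lshrWideThenTrunc(std::uint32_t x, unsigned amt) {
  return static_cast<std::uint16_t>(x >> amt);
}

// Narrow form: trunc to i16 first, then lshr in i16. Caller keeps amt < 16.
std::uint16_t truncThenLshrNarrow(std::uint32_t x, unsigned amt) {
  std::uint16_t n = static_cast<std::uint16_t>(x);
  return static_cast<std::uint16_t>(n >> amt);
}

// Checks every x whose upper 16 bits are zero against every shift amount
// that is in bounds for the narrow type.
bool lshrNarrowingHoldsWhenHighBitsAreZero() {
  for (std::uint32_t x = 0; x <= 0xFFFFu; ++x)
    for (unsigned amt = 0; amt < 16; ++amt)
      if (lshrWideThenTrunc(x, amt) != truncThenLshrNarrow(x, amt))
        return false;
  return true;
}
```

If the truncation would drop a set bit, the two forms diverge: for `x = 0x10000` and a shift by 1, the wide form yields `0x8000` while the narrow form yields `0`.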

In D107766#2936935, @lebedev.ri wrote:

Some observations for logical right-shift.
this will have a hard time with variable shift amounts.
You need to avoid creating out-of-bounds shifts, there are two obvious options:

https://alive2.llvm.org/ce/z/XShcju <- shift amount needs to be less than target width, and truncation should only drop zeros.

https://alive2.llvm.org/ce/z/QiDPV7 <- could saturate the shift amount if you know that %x has more leading zeros than the number of bits to be truncated

We might already have this logic somewhere, not sure.

Yes, thanks for your observations, I'm already working on it: https://alive2.llvm.org/ce/z/XcCJ9Q
The vector case also needs special care so that the transform does not become more poisonous.
TruncInstCombine already has appropriate logic but needs to be tweaked.
For now I'm assuming that the shift amount is constant (int or vector). I'm not sure that a transform adding a check for a variable shift amount is good.

In D107766#2937021, @anton-afanasyev wrote:

In D107766#2936935, @lebedev.ri wrote:

Some observations for logical right-shift.
this will have a hard time with variable shift amounts.
You need to avoid creating out-of-bounds shifts, there are two obvious options:

https://alive2.llvm.org/ce/z/XShcju <- shift amount needs to be less than target width, and truncation should only drop zeros.

https://alive2.llvm.org/ce/z/QiDPV7 <- could saturate the shift amount if you know that %x has more leading zeros than the number of bits to be truncated

We might already have this logic somewhere, not sure.

Yes, thanks for your observations, I'm already working on it: https://alive2.llvm.org/ce/z/XcCJ9Q
The vector case also needs special care so that the transform does not become more poisonous.
TruncInstCombine already has appropriate logic but needs to be tweaked.
For now I'm assuming that the shift amount is constant (int or vector).

I'm not sure that a transform adding a check for a variable shift amount is good.

Define good. I think supporting variable shifts will take exactly two lines:
compute knownbits of the shift amount, and get the maximal shift amount via KnownBits::getMaxValue().
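The two steps described above can be sketched without LLVM as follows. `KnownBits32` is a simplified, hypothetical stand-in for `llvm::KnownBits` (which tracks `Zero` and `One` masks as `APInt`s and whose `getMaxValue()` returns the complement of `Zero`); the other helper names are assumptions made for this example only.

```cpp
#include <cstdint>

// Simplified stand-in for llvm::KnownBits, fixed to 32-bit values.
struct KnownBits32 {
  std::uint32_t Zero; // bits known to be 0
  std::uint32_t One;  // bits known to be 1

  // Largest value consistent with the known bits: every bit that is not
  // known to be zero is assumed to be one.
  std::uint32_t maxValue() const { return ~Zero; }
};

// Known bits of `zext i8 %v to i32`: upper 24 bits are zero, low 8 unknown.
KnownBits32 knownBitsOfZExtI8() { return {0xFFFFFF00u, 0u}; }

// Known bits of `and i32 %v, Mask`: every bit cleared in Mask is known zero.
KnownBits32 knownBitsOfAndWithMask(std::uint32_t Mask) { return {~Mask, 0u}; }

// A variable shift stays in bounds for the narrow type when the largest
// possible shift amount is below the narrow bit width.
bool shiftAmountFitsWidth(const KnownBits32 &Amt, unsigned NarrowWidth) {
  return Amt.maxValue() < NarrowWidth;
}
```

With these definitions, an amount that is only known to come from a `zext i8` can be as large as 255, so it is not provably in bounds for i16, whereas an amount masked with 15 is.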

Define good. I think supporting variable shifts will take exactly two lines:
compute knownbits of the shift amount, and get the maximal shift amount via KnownBits::getMaxValue().

Hmm, how could we compute knownbits of the shift amount at compile time? Do you mean analyzing the DAG for the shift amount Value, taking knownbits recursively?
Also I don't believe this computation makes sense: in most cases, when the shift amount is variable, its first byte is unknown. For instance, how could knownbits help to optimize @spatel's example?

define i16 @sh_amt(i8 %x, i8 %sh1) {
  %z = zext i8 %x to i32
  %zs = zext i8 %sh1 to i32
  %s = lshr i32 %z, %zs
  %a = add nuw nsw i32 %s, %z
  %s2 = lshr i32 %a, 2
  %t = trunc i32 %s2 to i16
  ret i16 %t
}

anton-afanasyev planned changes to this revision.Aug 10 2021, 11:28 AM

anton-afanasyev retitled this revision from [InstCombine] Get rid of `hasOneUses()` when swapping `lshr` and `zext` to [AggressiveInstCombine] Add `lshr` and `ashr` instructions to TruncInstCombine DAG.

anton-afanasyev edited the summary of this revision.

In D107766#2937675, @anton-afanasyev wrote:

Define good. I think supporting variable shifts will take exactly two lines:
compute knownbits of the shift amount, and get the maximal shift amount via KnownBits::getMaxValue().

Hmm, how could we compute knownbits of the shift amount at compile time? Do you mean analyzing the DAG for the shift amount Value, taking knownbits recursively?

You've seen llvm::computeKnownBits(), right?

Also I don't believe this computation makes sense: in most cases, when the shift amount is variable, its first byte is unknown. For instance, how could knownbits help to optimize @spatel's example?
define i16 @sh_amt(i8 %x, i8 %sh1) {
  %z = zext i8 %x to i32
  %zs = zext i8 %sh1 to i32
  %s = lshr i32 %z, %zs
  %a = add nuw nsw i32 %s, %z
  %s2 = lshr i32 %a, 2
  %t = trunc i32 %s2 to i16
  ret i16 %t
}

I find this comment to be highly inflammatory.

Just because there's a large number of cases where it won't help doesn't mean it can never help with anything.
https://alive2.llvm.org/ce/z/RkkBTy <- we have no idea what %y is, but we can tell it's less than the target bitwidth.

AggressiveInstCombine runs only with -O3, right? Do we know how expensive it would be for -O2?

xbolva00 added a reviewer: nikic.Aug 12 2021, 5:40 AM

In D107766#2941350, @xbolva00 wrote:

AggressiveInstCombine runs only with -O3, right? Do we know how expensive it would be for -O2?

Yes - only at O3 currently. That's mainly because nobody has bothered to see if it was worth fighting over to include at -O2.

That question is much easier to answer since we have compile-time-tracker. But I'm not sure how to answer the cost question directly - I think we can approximate it by just removing the pass from O3 and checking the difference.

The result seems to be a consistent but very small cost (0.04% geomean here):
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright

Update, add shl

Add negative and positive tests

Harbormaster completed remote builds in B119534: Diff 366400.Aug 13 2021, 11:18 PM

Hmm, how could we compute knownbits of the shift amount at compile time? Do you mean analyzing the DAG for the shift amount Value, taking knownbits recursively?

You've seen llvm::computeKnownBits(), right?

Thanks, used it.

anton-afanasyev retitled this revision from [AggressiveInstCombine] Add `lshr` and `ashr` instructions to TruncInstCombine DAG to [AggressiveInstCombine] Add shift instructions to `TruncInstCombine` DAG.Aug 13 2021, 11:28 PM

anton-afanasyev edited the summary of this revision.

Fix test

Harbormaster completed remote builds in B119536: Diff 366403.Aug 13 2021, 11:35 PM

I think we want to do this in three steps.
lshr is easy and obvious, but for ashr we want to count *sign* bits.
Haven't really thought about shl
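To illustrate why ashr needs sign bits rather than leading zeros, the sketch below models an i32 to i16 narrowing in plain C++. The `ashr` helper is written with unsigned arithmetic so it does not rely on how a compiler shifts negative values; all names here are invented for the example and are not LLVM APIs.

```cpp
#include <cstdint>

// Arithmetic shift right of a Width-bit value held in the low bits of V.
// Requires 0 < Width <= 32 and Amt < Width.
std::uint32_t ashr(std::uint32_t V, unsigned Width, unsigned Amt) {
  std::uint32_t Mask = Width == 32 ? 0xFFFFFFFFu : ((1u << Width) - 1u);
  V &= Mask;
  bool Negative = ((V >> (Width - 1)) & 1u) != 0;
  std::uint32_t Shifted = V >> Amt;
  if (Negative && Amt != 0)
    Shifted |= Mask & ~(Mask >> Amt); // fill vacated high bits with the sign
  return Shifted & Mask;
}

// Wide form: ashr in i32, then trunc to i16.
std::uint16_t ashrWideThenTrunc(std::uint32_t X, unsigned Amt) {
  return static_cast<std::uint16_t>(ashr(X, 32, Amt));
}

// Narrow form: trunc to i16, then ashr in i16.
std::uint16_t truncThenAshrNarrow(std::uint32_t X, unsigned Amt) {
  return static_cast<std::uint16_t>(ashr(X & 0xFFFFu, 16, Amt));
}

// sext i16 -> i32: the upper 17 bits of the result are all copies of the sign.
std::uint32_t sext16(std::uint16_t V) {
  std::uint32_t W = V;
  return (W & 0x8000u) ? (W | 0xFFFF0000u) : W;
}

// Checks all sign-extended inputs against all in-bounds narrow shift amounts.
bool ashrNarrowingHoldsForSignExtendedInputs() {
  for (std::uint32_t V = 0; V <= 0xFFFFu; ++V)
    for (unsigned Amt = 0; Amt < 16; ++Amt) {
      std::uint32_t X = sext16(static_cast<std::uint16_t>(V));
      if (ashrWideThenTrunc(X, Amt) != truncThenAshrNarrow(X, Amt))
        return false;
    }
  return true;
}
```

A zero-extended input shows the difference from lshr: for `X = 0x00008000` and a shift by 1, the wide form gives `0x4000`, but the narrow form sees a negative i16 and gives `0xC000`.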

In D107766#2945161, @lebedev.ri wrote:

I think we want to do this in three steps.
lshr is easy and obvious, but for ashr we want to count *sign* bits.
Haven't really thought about shl

Do you mean splitting this into three separate patches?
shl is simpler than both right shifts since no bits move from the truncated part to the untruncated one.
The condition used for shl here is necessary and sufficient, whereas it is only sufficient for the right shifts.
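The claim about shl can be spot-checked in plain C++: a left shift never moves bits from the truncated high part into the kept low part, so the only requirement modeled here is that the shift amount stays below the narrow width. This is a sampled check over a few high-half patterns for an assumed i32 to i16 narrowing, not a proof, and the function names are hypothetical.

```cpp
#include <cstdint>

// Wide form: shl in i32, then trunc to i16. Caller keeps Amt < 32.
std::uint16_t shlWideThenTrunc(std::uint32_t X, unsigned Amt) {
  return static_cast<std::uint16_t>(X << Amt);
}

// Narrow form: trunc to i16, then shl in i16. Caller keeps Amt < 16.
std::uint16_t truncThenShlNarrow(std::uint32_t X, unsigned Amt) {
  std::uint32_t N = X & 0xFFFFu; // trunc i32 -> i16
  return static_cast<std::uint16_t>(N << Amt);
}

// Tries every low half with a handful of high halves and every shift amount
// that is in bounds for the narrow type.
bool shlNarrowingHoldsOnSamples() {
  const std::uint32_t HighParts[] = {0x0000u, 0x0001u, 0xA5A5u, 0xFFFFu};
  for (std::uint32_t High : HighParts)
    for (std::uint32_t Low = 0; Low <= 0xFFFFu; ++Low)
      for (unsigned Amt = 0; Amt < 16; ++Amt) {
        std::uint32_t X = (High << 16) | Low;
        if (shlWideThenTrunc(X, Amt) != truncThenShlNarrow(X, Amt))
          return false;
      }
  return true;
}
```

Shift amounts of 16 or more are left out on purpose: the narrow i16 shift would be out of bounds there, which is the case the legality condition has to exclude.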

RKSimon added inline comments.Aug 15 2021, 3:06 AM

llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll
2–3	Should this be moved to be a phase ordering test do you think?

anton-afanasyev added inline comments.Aug 15 2021, 3:37 AM

llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll
2–3	Do you think it's more of a test that slp-vectorizer follows aggressive-instcombine? Ok, moved.

Move SLPVectorizer test to PhaseOrdering

Harbormaster completed remote builds in B119600: Diff 366483.Aug 15 2021, 3:38 AM

Fix test move

Harbormaster completed remote builds in B119601: Diff 366484.Aug 15 2021, 3:41 AM

In D107766#2945416, @anton-afanasyev wrote:

In D107766#2945161, @lebedev.ri wrote:

I think we want to do this in three steps.
lshr is easy and obvious, but for ashr we want to count *sign* bits.
Haven't really thought about shl

Do you mean splitting this into three separate patches?

Yes.

shl is simpler than both right shifts since no bits move from the truncated part to the untruncated one.
The condition used for shl here is necessary and sufficient, whereas it is only sufficient for the right shifts.

That is kind of my point.
At least the left and right shifts have different legality rules,
and different right-shifts also have slightly different rules.
Not having to deal with everything at once will strictly simplify review.

anton-afanasyev mentioned this in D108091: [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG.Aug 15 2021, 10:39 AM

In D107766#2945525, @lebedev.ri wrote:

That is kind of my point.
At least the left and right shifts have different legality rules,
and different right-shifts also have slightly different rules.
Not having to deal with everything at once will strictly simplify review.

Ok, start from shl: https://reviews.llvm.org/D108091

RKSimon added inline comments.Aug 16 2021, 9:16 AM

llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll

2–3

This should be in the X86 sub-directory - look at other tests in there for examples as we don't specify explicit passes:
e.g.

; RUN: opt -O2 -S < %s | FileCheck %s --check-prefixes=SSE
; RUN: opt -O2 -S -mattr=avx < %s | FileCheck %s --check-prefixes=AVX
; RUN: opt -passes='default<O2>' -S < %s | FileCheck %s --check-prefixes=SSE
; RUN: opt -passes='default<O2>' -S -mattr=avx < %s | FileCheck %s --check-prefixes=AVX

spatel added inline comments.Aug 16 2021, 9:33 AM

llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll
2–3	Right - the goal of PhaseOrdering tests is to make sure that >1 passes are interacting as expected and that we get the expected results from the typical (-On) pass pipelines in 'opt'.

anton-afanasyev marked 2 inline comments as done.Aug 16 2021, 11:38 AM

anton-afanasyev added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll
2–3	Sure, thanks! Moved to subdirectory, changed to -O3 option.

Address comments

Harbormaster completed remote builds in B119755: Diff 366688.Aug 16 2021, 11:39 AM

(Feel free to post lshr patch after landing D108091)

anton-afanasyev mentioned this in rG8f8f9260a95f: [Test][AggressiveInstCombine] Add test for shifts.Aug 17 2021, 2:40 AM

anton-afanasyev mentioned this in rG1f3e35b6d165: [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG.Aug 17 2021, 3:17 AM

anton-afanasyev mentioned this in D108201: [AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG.Aug 17 2021, 4:15 AM

lshr case: https://reviews.llvm.org/D108201

anton-afanasyev mentioned this in rGcfb6dfcbd13b: [AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG.Aug 18 2021, 12:22 PM

Is there anything left to do on this?

In D107766#2953168, @RKSimon wrote:

Is there anything left to do on this?

Rebase this?
ashr is left.

Yes, I'm going to add ashr. Also planning to add AssumptionCache to use it for computeKnownBits(). And investigate question about including AIC to -O2.

And investigate question about including AIC to -O2.

Yeah, AIC for O2+ - it would be great if possible..

anton-afanasyev mentioned this in D108355: [AggressiveInstCombine] Add arithmetic shift right instr to `TruncInstCombine` DAG.Aug 19 2021, 2:02 AM

ashr part: https://reviews.llvm.org/D108355

anton-afanasyev mentioned this in rGbed587631f90: [AggressiveInstCombine] Add arithmetic shift right instr to `TruncInstCombine`….Aug 24 2021, 12:41 AM

@anton-afanasyev reverse ping - is there anything left on this patch?

This was already committed as a series of patches, abandoning.

anton-afanasyev mentioned this in D113179: [Passes] Move AggressiveInstCombine after InstCombine.Nov 4 2021, 3:47 AM

anton-afanasyev mentioned this in rGc34d157fc739: [Passes] Move AggressiveInstCombine after InstCombine.Dec 4 2021, 3:24 AM

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp	2 lines
llvm/test/Transforms/InstCombine/2008-01-21-MulTrunc.ll	8 lines
llvm/test/Transforms/InstCombine/apint-cast.ll	8 lines
llvm/test/Transforms/InstCombine/cast.ll	12 lines
llvm/test/Transforms/InstCombine/pr50555.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll	91 lines
Diff 365189

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp

[1,097 lines not shown]
if (match(Op0, m_Shl(m_Value(X), m_APInt(ShOp1))) && ShOp1->ult(BitWidth)) {
return BinaryOperator::CreateAnd(NewShl, ConstantInt::get(Ty, Mask));		return BinaryOperator::CreateAnd(NewShl, ConstantInt::get(Ty, Mask));
}		}
assert(*ShOp1 == ShAmt);		assert(*ShOp1 == ShAmt);
// (X << C) >>u C --> X & (-1 >>u C)		// (X << C) >>u C --> X & (-1 >>u C)
APInt Mask(APInt::getLowBitsSet(BitWidth, BitWidth - ShAmt));		APInt Mask(APInt::getLowBitsSet(BitWidth, BitWidth - ShAmt));
return BinaryOperator::CreateAnd(X, ConstantInt::get(Ty, Mask));		return BinaryOperator::CreateAnd(X, ConstantInt::get(Ty, Mask));
}		}

if (match(Op0, m_OneUse(m_ZExt(m_Value(X)))) &&		if (match(Op0, m_ZExt(m_Value(X))) &&
(!Ty->isIntegerTy() || shouldChangeType(Ty, X->getType()))) {		(!Ty->isIntegerTy() || shouldChangeType(Ty, X->getType()))) {
assert(ShAmt < X->getType()->getScalarSizeInBits() &&		assert(ShAmt < X->getType()->getScalarSizeInBits() &&
"Big shift not simplified to zero?");		"Big shift not simplified to zero?");
// lshr (zext iM X to iN), C --> zext (lshr X, C) to iN		// lshr (zext iM X to iN), C --> zext (lshr X, C) to iN
Value *NewLShr = Builder.CreateLShr(X, ShAmt);		Value *NewLShr = Builder.CreateLShr(X, ShAmt);
return new ZExtInst(NewLShr, Ty);		return new ZExtInst(NewLShr, Ty);
}		}

[267 lines not shown]

llvm/test/Transforms/InstCombine/2008-01-21-MulTrunc.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"

	define i16 @test1(i16 %a) {			define i16 @test1(i16 %a) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: [[C:%.*]] = lshr i16 [[A:%.*]], 8			; CHECK-NEXT: [[TMP1:%.*]] = lshr i16 [[A:%.*]], 8
	; CHECK-NEXT: [[D:%.*]] = mul i16 [[A]], 5			; CHECK-NEXT: [[D:%.*]] = mul i16 [[A]], 5
	; CHECK-NEXT: [[E:%.*]] = or i16 [[C]], [[D]]			; CHECK-NEXT: [[E:%.*]] = or i16 [[D]], [[TMP1]]
	; CHECK-NEXT: ret i16 [[E]]			; CHECK-NEXT: ret i16 [[E]]
	;			;
	%b = zext i16 %a to i32 ; <i32> [#uses=2]			%b = zext i16 %a to i32 ; <i32> [#uses=2]
	%c = lshr i32 %b, 8 ; <i32> [#uses=1]			%c = lshr i32 %b, 8 ; <i32> [#uses=1]
	%d = mul i32 %b, 5 ; <i32> [#uses=1]			%d = mul i32 %b, 5 ; <i32> [#uses=1]
	%e = or i32 %c, %d ; <i32> [#uses=1]			%e = or i32 %c, %d ; <i32> [#uses=1]
	%f = trunc i32 %e to i16 ; <i16> [#uses=1]			%f = trunc i32 %e to i16 ; <i16> [#uses=1]
	ret i16 %f			ret i16 %f
	}			}

	define <2 x i16> @test1_vec(<2 x i16> %a) {			define <2 x i16> @test1_vec(<2 x i16> %a) {
	; CHECK-LABEL: @test1_vec(			; CHECK-LABEL: @test1_vec(
	; CHECK-NEXT: [[C:%.*]] = lshr <2 x i16> [[A:%.*]], <i16 8, i16 8>			; CHECK-NEXT: [[TMP1:%.*]] = lshr <2 x i16> [[A:%.*]], <i16 8, i16 8>
	; CHECK-NEXT: [[D:%.*]] = mul <2 x i16> [[A]], <i16 5, i16 5>			; CHECK-NEXT: [[D:%.*]] = mul <2 x i16> [[A]], <i16 5, i16 5>
	; CHECK-NEXT: [[E:%.*]] = or <2 x i16> [[C]], [[D]]			; CHECK-NEXT: [[E:%.*]] = or <2 x i16> [[D]], [[TMP1]]
	; CHECK-NEXT: ret <2 x i16> [[E]]			; CHECK-NEXT: ret <2 x i16> [[E]]
	;			;
	%b = zext <2 x i16> %a to <2 x i32>			%b = zext <2 x i16> %a to <2 x i32>
	%c = lshr <2 x i32> %b, <i32 8, i32 8>			%c = lshr <2 x i32> %b, <i32 8, i32 8>
	%d = mul <2 x i32> %b, <i32 5, i32 5>			%d = mul <2 x i32> %b, <i32 5, i32 5>
	%e = or <2 x i32> %c, %d			%e = or <2 x i32> %c, %d
	%f = trunc <2 x i32> %e to <2 x i16>			%f = trunc <2 x i32> %e to <2 x i16>
	ret <2 x i16> %f			ret <2 x i16> %f
	[33 lines not shown]

llvm/test/Transforms/InstCombine/apint-cast.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"

	; Tests to make sure elimination of casts is working correctly			; Tests to make sure elimination of casts is working correctly

	define i17 @test1(i17 %a) {			define i17 @test1(i17 %a) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: [[C:%.*]] = lshr i17 [[A:%.*]], 8			; CHECK-NEXT: [[TMP1:%.*]] = lshr i17 [[A:%.*]], 8
	; CHECK-NEXT: [[D:%.*]] = shl i17 [[A]], 8			; CHECK-NEXT: [[D:%.*]] = shl i17 [[A]], 8
	; CHECK-NEXT: [[E:%.*]] = or i17 [[C]], [[D]]			; CHECK-NEXT: [[E:%.*]] = or i17 [[D]], [[TMP1]]
	; CHECK-NEXT: ret i17 [[E]]			; CHECK-NEXT: ret i17 [[E]]
	;			;
	%b = zext i17 %a to i37 ; <i37> [#uses=2]			%b = zext i17 %a to i37 ; <i37> [#uses=2]
	%c = lshr i37 %b, 8 ; <i37> [#uses=1]			%c = lshr i37 %b, 8 ; <i37> [#uses=1]
	%d = shl i37 %b, 8 ; <i37> [#uses=1]			%d = shl i37 %b, 8 ; <i37> [#uses=1]
	%e = or i37 %c, %d ; <i37> [#uses=1]			%e = or i37 %c, %d ; <i37> [#uses=1]
	%f = trunc i37 %e to i17 ; <i17> [#uses=1]			%f = trunc i37 %e to i17 ; <i17> [#uses=1]
	ret i17 %f			ret i17 %f
	}			}

	define i167 @test2(i167 %a) {			define i167 @test2(i167 %a) {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK-NEXT: [[C:%.*]] = lshr i167 [[A:%.*]], 9			; CHECK-NEXT: [[TMP1:%.*]] = lshr i167 [[A:%.*]], 9
	; CHECK-NEXT: [[D:%.*]] = shl i167 [[A]], 8			; CHECK-NEXT: [[D:%.*]] = shl i167 [[A]], 8
	; CHECK-NEXT: [[E:%.*]] = or i167 [[C]], [[D]]			; CHECK-NEXT: [[E:%.*]] = or i167 [[D]], [[TMP1]]
	; CHECK-NEXT: ret i167 [[E]]			; CHECK-NEXT: ret i167 [[E]]
	;			;
	%b = zext i167 %a to i577 ; <i577> [#uses=2]			%b = zext i167 %a to i577 ; <i577> [#uses=2]
	%c = lshr i577 %b, 9 ; <i577> [#uses=1]			%c = lshr i577 %b, 9 ; <i577> [#uses=1]
	%d = shl i577 %b, 8 ; <i577> [#uses=1]			%d = shl i577 %b, 8 ; <i577> [#uses=1]
	%e = or i577 %c, %d ; <i577> [#uses=1]			%e = or i577 %c, %d ; <i577> [#uses=1]
	%f = trunc i577 %e to i167 ; <i167> [#uses=1]			%f = trunc i577 %e to i167 ; <i167> [#uses=1]
	ret i167 %f			ret i167 %f
	}			}

llvm/test/Transforms/InstCombine/cast.ll

[466 lines not shown]
;
%t5 = shl i32 %t, 8		%t5 = shl i32 %t, 8
%t32 = or i32 %t21, %t5		%t32 = or i32 %t21, %t5
%r = trunc i32 %t32 to i16		%r = trunc i32 %t32 to i16
ret i16 %r		ret i16 %r
}		}

define i16 @test40(i16 %a) {		define i16 @test40(i16 %a) {
; ALL-LABEL: @test40(		; ALL-LABEL: @test40(
; ALL-NEXT: [[T21:%.*]] = lshr i16 [[A:%.*]], 9		; ALL-NEXT: [[TMP1:%.*]] = lshr i16 [[A:%.*]], 9
; ALL-NEXT: [[T5:%.*]] = shl i16 [[A]], 8		; ALL-NEXT: [[T5:%.*]] = shl i16 [[A]], 8
; ALL-NEXT: [[T32:%.*]] = or i16 [[T21]], [[T5]]		; ALL-NEXT: [[T32:%.*]] = or i16 [[T5]], [[TMP1]]
; ALL-NEXT: ret i16 [[T32]]		; ALL-NEXT: ret i16 [[T32]]
;		;
%t = zext i16 %a to i32		%t = zext i16 %a to i32
%t21 = lshr i32 %t, 9		%t21 = lshr i32 %t, 9
%t5 = shl i32 %t, 8		%t5 = shl i32 %t, 8
%t32 = or i32 %t21, %t5		%t32 = or i32 %t21, %t5
%r = trunc i32 %t32 to i16		%r = trunc i32 %t32 to i16
ret i16 %r		ret i16 %r
}		}

define <2 x i16> @test40vec(<2 x i16> %a) {		define <2 x i16> @test40vec(<2 x i16> %a) {
; ALL-LABEL: @test40vec(		; ALL-LABEL: @test40vec(
; ALL-NEXT: [[T21:%.*]] = lshr <2 x i16> [[A:%.*]], <i16 9, i16 9>		; ALL-NEXT: [[TMP1:%.*]] = lshr <2 x i16> [[A:%.*]], <i16 9, i16 9>
; ALL-NEXT: [[T5:%.*]] = shl <2 x i16> [[A]], <i16 8, i16 8>		; ALL-NEXT: [[T5:%.*]] = shl <2 x i16> [[A]], <i16 8, i16 8>
; ALL-NEXT: [[T32:%.*]] = or <2 x i16> [[T21]], [[T5]]		; ALL-NEXT: [[T32:%.*]] = or <2 x i16> [[T5]], [[TMP1]]
; ALL-NEXT: ret <2 x i16> [[T32]]		; ALL-NEXT: ret <2 x i16> [[T32]]
;		;
%t = zext <2 x i16> %a to <2 x i32>		%t = zext <2 x i16> %a to <2 x i32>
%t21 = lshr <2 x i32> %t, <i32 9, i32 9>		%t21 = lshr <2 x i32> %t, <i32 9, i32 9>
%t5 = shl <2 x i32> %t, <i32 8, i32 8>		%t5 = shl <2 x i32> %t, <i32 8, i32 8>
%t32 = or <2 x i32> %t21, %t5		%t32 = or <2 x i32> %t21, %t5
%r = trunc <2 x i32> %t32 to <2 x i16>		%r = trunc <2 x i32> %t32 to <2 x i16>
ret <2 x i16> %r		ret <2 x i16> %r
[1,578 lines not shown]
;
%D = trunc <3 x i32> %C to <3 x i8>		%D = trunc <3 x i32> %C to <3 x i8>
ret <3 x i8> %D		ret <3 x i8> %D
}		}

define <2 x i8> @trunc_lshr_zext_uses1(<2 x i8> %A) {		define <2 x i8> @trunc_lshr_zext_uses1(<2 x i8> %A) {
; ALL-LABEL: @trunc_lshr_zext_uses1(		; ALL-LABEL: @trunc_lshr_zext_uses1(
; ALL-NEXT: [[B:%.*]] = zext <2 x i8> [[A:%.*]] to <2 x i32>		; ALL-NEXT: [[B:%.*]] = zext <2 x i8> [[A:%.*]] to <2 x i32>
; ALL-NEXT: call void @use_v2i32(<2 x i32> [[B]])		; ALL-NEXT: call void @use_v2i32(<2 x i32> [[B]])
; ALL-NEXT: [[C:%.*]] = lshr <2 x i8> [[A]], <i8 6, i8 6>		; ALL-NEXT: [[TMP1:%.*]] = lshr <2 x i8> [[A]], <i8 6, i8 6>
; ALL-NEXT: ret <2 x i8> [[C]]		; ALL-NEXT: ret <2 x i8> [[TMP1]]
;		;
%B = zext <2 x i8> %A to <2 x i32>		%B = zext <2 x i8> %A to <2 x i32>
call void @use_v2i32(<2 x i32> %B)		call void @use_v2i32(<2 x i32> %B)
%C = lshr <2 x i32> %B, <i32 6, i32 6>		%C = lshr <2 x i32> %B, <i32 6, i32 6>
%D = trunc <2 x i32> %C to <2 x i8>		%D = trunc <2 x i32> %C to <2 x i8>
ret <2 x i8> %D		ret <2 x i8> %D
}		}

[82 lines not shown]

llvm/test/Transforms/InstCombine/pr50555.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s

	define void @zext_used_twice(i32* %a, i8 %b) {			define void @zext_used_twice(i32* %a, i8 %b) {
	; CHECK-LABEL: @zext_used_twice(			; CHECK-LABEL: @zext_used_twice(
	; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[B:%.*]] to i32			; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[B:%.*]] to i32
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ZEXT]], 1			; CHECK-NEXT: [[TMP1:%.*]] = lshr i8 [[B]], 1
	; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[SHR]], [[ZEXT]]			; CHECK-NEXT: [[SHR:%.*]] = zext i8 [[TMP1]] to i32
				; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[ZEXT]], [[SHR]]
	; CHECK-NEXT: store i32 [[ADD]], i32* [[A:%.*]], align 4			; CHECK-NEXT: store i32 [[ADD]], i32* [[A:%.*]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%zext = zext i8 %b to i32			%zext = zext i8 %b to i32
	%shr = lshr i32 %zext, 1			%shr = lshr i32 %zext, 1
	%add = add nsw i32 %zext, %shr			%add = add nsw i32 %zext, %shr
	store i32 %add, i32* %a, align 4			store i32 %add, i32* %a, align 4
	ret void			ret void
	}			}



	define void @trunc_one_add(i16* %a, i8 %b) {			define void @trunc_one_add(i16* %a, i8 %b) {
	; CHECK-LABEL: @trunc_one_add(			; CHECK-LABEL: @trunc_one_add(
	; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[B:%.*]] to i32			; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[B:%.*]] to i16
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ZEXT]], 1			; CHECK-NEXT: [[TMP1:%.*]] = lshr i8 [[B]], 1
	; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[SHR]], [[ZEXT]]			; CHECK-NEXT: [[SHR:%.*]] = zext i8 [[TMP1]] to i16
	; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[ADD]] to i16			; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i16 [[ZEXT]], [[SHR]]
	; CHECK-NEXT: store i16 [[TRUNC]], i16* [[A:%.*]], align 2			; CHECK-NEXT: store i16 [[ADD]], i16* [[A:%.*]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%zext = zext i8 %b to i32			%zext = zext i8 %b to i32
	%shr = lshr i32 %zext, 1			%shr = lshr i32 %zext, 1
	%add = add nsw i32 %zext, %shr			%add = add nsw i32 %zext, %shr
	%trunc = trunc i32 %add to i16			%trunc = trunc i32 %add to i16
	store i16 %trunc, i16* %a, align 2			store i16 %trunc, i16* %a, align 2
	ret void			ret void
	[24 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-- -instcombine -slp-vectorizer -dce -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-- -instcombine -slp-vectorizer -dce -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-- -mcpu=corei7-avx -instcombine -slp-vectorizer -dce -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-- -mcpu=corei7-avx -instcombine -slp-vectorizer -dce -S | FileCheck %s --check-prefixes=AVX
				RKSimon: Should this be moved to be a phase ordering test, do you think?
				anton-afanasyev (author): Do you think it's more of a test that slp-vectorizer follows aggressive-instcombine? OK, moved.
				RKSimon: This should be in the X86 sub-directory. Look at other tests in there for examples, as we don't specify explicit passes, e.g.:
				; RUN: opt -O2 -S < %s | FileCheck %s --check-prefixes=SSE
				; RUN: opt -O2 -S -mattr=avx < %s | FileCheck %s --check-prefixes=AVX
				; RUN: opt -passes='default<O2>' -S < %s | FileCheck %s --check-prefixes=SSE
				; RUN: opt -passes='default<O2>' -S -mattr=avx < %s | FileCheck %s --check-prefixes=AVX
				spatel: Right, the goal of PhaseOrdering tests is to make sure that >1 passes are interacting as expected and that we get the expected results from the typical (-On) pass pipelines in 'opt'.
				anton-afanasyev (author): Sure, thanks! Moved to subdirectory, changed to -O3 option.

	define void @trunc_through_one_add(i16* noalias %0, i8* noalias readonly %1) {			define void @trunc_through_one_add(i16* noalias %0, i8* noalias readonly %1) {
	; SSE-LABEL: @trunc_through_one_add(			; SSE-LABEL: @trunc_through_one_add(
	; SSE-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <4 x i8>*			; SSE-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <8 x i8>*
	; SSE-NEXT: [[TMP4:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1			; SSE-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1
	; SSE-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; SSE-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i16>
	; SSE-NEXT: [[TMP6:%.*]] = lshr <4 x i32> [[TMP5]], <i32 1, i32 1, i32 1, i32 1>			; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i8> [[TMP4]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; SSE-NEXT: [[TMP7:%.*]] = add nuw nsw <4 x i32> [[TMP6]], [[TMP5]]			; SSE-NEXT: [[TMP7:%.*]] = zext <8 x i8> [[TMP6]] to <8 x i16>
	; SSE-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2>			; SSE-NEXT: [[TMP8:%.*]] = add nuw nsw <8 x i16> [[TMP7]], [[TMP5]]
	; SSE-NEXT: [[TMP9:%.*]] = trunc <4 x i32> [[TMP8]] to <4 x i16>			; SSE-NEXT: [[TMP9:%.*]] = lshr <8 x i16> [[TMP8]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
	; SSE-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP0:%.*]] to <4 x i16>*			; SSE-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP0:%.*]] to <8 x i16>*
	; SSE-NEXT: store <4 x i16> [[TMP9]], <4 x i16>* [[TMP10]], align 2			; SSE-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* [[TMP10]], align 2
	; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 4			; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
	; SSE-NEXT: [[TMP12:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 4			; SSE-NEXT: [[TMP12:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
	; SSE-NEXT: [[TMP13:%.*]] = bitcast i8* [[TMP11]] to <4 x i8>*			; SSE-NEXT: [[TMP13:%.*]] = bitcast i8* [[TMP11]] to <8 x i8>*
	; SSE-NEXT: [[TMP14:%.*]] = load <4 x i8>, <4 x i8>* [[TMP13]], align 1			; SSE-NEXT: [[TMP14:%.*]] = load <8 x i8>, <8 x i8>* [[TMP13]], align 1
	; SSE-NEXT: [[TMP15:%.*]] = zext <4 x i8> [[TMP14]] to <4 x i32>			; SSE-NEXT: [[TMP15:%.*]] = zext <8 x i8> [[TMP14]] to <8 x i16>
	; SSE-NEXT: [[TMP16:%.*]] = lshr <4 x i32> [[TMP15]], <i32 1, i32 1, i32 1, i32 1>			; SSE-NEXT: [[TMP16:%.*]] = lshr <8 x i8> [[TMP14]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; SSE-NEXT: [[TMP17:%.*]] = add nuw nsw <4 x i32> [[TMP16]], [[TMP15]]			; SSE-NEXT: [[TMP17:%.*]] = zext <8 x i8> [[TMP16]] to <8 x i16>
	; SSE-NEXT: [[TMP18:%.*]] = lshr <4 x i32> [[TMP17]], <i32 2, i32 2, i32 2, i32 2>			; SSE-NEXT: [[TMP18:%.*]] = add nuw nsw <8 x i16> [[TMP17]], [[TMP15]]
	; SSE-NEXT: [[TMP19:%.*]] = trunc <4 x i32> [[TMP18]] to <4 x i16>			; SSE-NEXT: [[TMP19:%.*]] = lshr <8 x i16> [[TMP18]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
	; SSE-NEXT: [[TMP20:%.*]] = bitcast i16* [[TMP12]] to <4 x i16>*			; SSE-NEXT: [[TMP20:%.*]] = bitcast i16* [[TMP12]] to <8 x i16>*
	; SSE-NEXT: store <4 x i16> [[TMP19]], <4 x i16>* [[TMP20]], align 2			; SSE-NEXT: store <8 x i16> [[TMP19]], <8 x i16>* [[TMP20]], align 2
	; SSE-NEXT: [[TMP21:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
	; SSE-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
	; SSE-NEXT: [[TMP23:%.*]] = bitcast i8* [[TMP21]] to <4 x i8>*
	; SSE-NEXT: [[TMP24:%.*]] = load <4 x i8>, <4 x i8>* [[TMP23]], align 1
	; SSE-NEXT: [[TMP25:%.*]] = zext <4 x i8> [[TMP24]] to <4 x i32>
	; SSE-NEXT: [[TMP26:%.*]] = lshr <4 x i32> [[TMP25]], <i32 1, i32 1, i32 1, i32 1>
	; SSE-NEXT: [[TMP27:%.*]] = add nuw nsw <4 x i32> [[TMP26]], [[TMP25]]
	; SSE-NEXT: [[TMP28:%.*]] = lshr <4 x i32> [[TMP27]], <i32 2, i32 2, i32 2, i32 2>
	; SSE-NEXT: [[TMP29:%.*]] = trunc <4 x i32> [[TMP28]] to <4 x i16>
	; SSE-NEXT: [[TMP30:%.*]] = bitcast i16* [[TMP22]] to <4 x i16>*
	; SSE-NEXT: store <4 x i16> [[TMP29]], <4 x i16>* [[TMP30]], align 2
	; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 12
	; SSE-NEXT: [[TMP32:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 12
	; SSE-NEXT: [[TMP33:%.*]] = bitcast i8* [[TMP31]] to <4 x i8>*
	; SSE-NEXT: [[TMP34:%.*]] = load <4 x i8>, <4 x i8>* [[TMP33]], align 1
	; SSE-NEXT: [[TMP35:%.*]] = zext <4 x i8> [[TMP34]] to <4 x i32>
	; SSE-NEXT: [[TMP36:%.*]] = lshr <4 x i32> [[TMP35]], <i32 1, i32 1, i32 1, i32 1>
	; SSE-NEXT: [[TMP37:%.*]] = add nuw nsw <4 x i32> [[TMP36]], [[TMP35]]
	; SSE-NEXT: [[TMP38:%.*]] = lshr <4 x i32> [[TMP37]], <i32 2, i32 2, i32 2, i32 2>
	; SSE-NEXT: [[TMP39:%.*]] = trunc <4 x i32> [[TMP38]] to <4 x i16>
	; SSE-NEXT: [[TMP40:%.*]] = bitcast i16* [[TMP32]] to <4 x i16>*
	; SSE-NEXT: store <4 x i16> [[TMP39]], <4 x i16>* [[TMP40]], align 2
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @trunc_through_one_add(			; AVX-LABEL: @trunc_through_one_add(
	; AVX-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <8 x i8>*			; AVX-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <16 x i8>*
	; AVX-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1			; AVX-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[TMP3]], align 1
	; AVX-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i32>			; AVX-NEXT: [[TMP5:%.*]] = zext <16 x i8> [[TMP4]] to <16 x i16>
	; AVX-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[TMP5]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; AVX-NEXT: [[TMP6:%.*]] = lshr <16 x i8> [[TMP4]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; AVX-NEXT: [[TMP7:%.*]] = add nuw nsw <8 x i32> [[TMP6]], [[TMP5]]			; AVX-NEXT: [[TMP7:%.*]] = zext <16 x i8> [[TMP6]] to <16 x i16>
	; AVX-NEXT: [[TMP8:%.*]] = lshr <8 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>			; AVX-NEXT: [[TMP8:%.*]] = add nuw nsw <16 x i16> [[TMP7]], [[TMP5]]
	; AVX-NEXT: [[TMP9:%.*]] = trunc <8 x i32> [[TMP8]] to <8 x i16>			; AVX-NEXT: [[TMP9:%.*]] = lshr <16 x i16> [[TMP8]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
	; AVX-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP0:%.*]] to <8 x i16>*			; AVX-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP0:%.*]] to <16 x i16>*
	; AVX-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* [[TMP10]], align 2			; AVX-NEXT: store <16 x i16> [[TMP9]], <16 x i16>* [[TMP10]], align 2
	; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
	; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
	; AVX-NEXT: [[TMP13:%.*]] = bitcast i8* [[TMP11]] to <8 x i8>*
	; AVX-NEXT: [[TMP14:%.*]] = load <8 x i8>, <8 x i8>* [[TMP13]], align 1
	; AVX-NEXT: [[TMP15:%.*]] = zext <8 x i8> [[TMP14]] to <8 x i32>
	; AVX-NEXT: [[TMP16:%.*]] = lshr <8 x i32> [[TMP15]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; AVX-NEXT: [[TMP17:%.*]] = add nuw nsw <8 x i32> [[TMP16]], [[TMP15]]
	; AVX-NEXT: [[TMP18:%.*]] = lshr <8 x i32> [[TMP17]], <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
	; AVX-NEXT: [[TMP19:%.*]] = trunc <8 x i32> [[TMP18]] to <8 x i16>
	; AVX-NEXT: [[TMP20:%.*]] = bitcast i16* [[TMP12]] to <8 x i16>*
	; AVX-NEXT: store <8 x i16> [[TMP19]], <8 x i16>* [[TMP20]], align 2
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%3 = load i8, i8* %1, align 1			%3 = load i8, i8* %1, align 1
	%4 = zext i8 %3 to i32			%4 = zext i8 %3 to i32
	%5 = lshr i32 %4, 1			%5 = lshr i32 %4, 1
	%6 = add nuw nsw i32 %5, %4			%6 = add nuw nsw i32 %5, %4
	%7 = lshr i32 %6, 2			%7 = lshr i32 %6, 2
	%8 = trunc i32 %7 to i16			%8 = trunc i32 %7 to i16
	[443 lines not shown]
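
The trailing `lshr i32 %6, 2` in `@trunc_through_one_add` is the kind of case that makes shifts delicate to narrow: a logical right shift can be moved into the narrower type only when the bits the truncation would drop are already zero. A minimal sketch of that condition follows, assuming plain Python integers as a model of `i32` and `i16`; the helper names are hypothetical and are not taken from the patch.

```python
def lshr_then_trunc(x: int, amt: int) -> int:
    """Model of: trunc i32 (lshr i32 x, amt) to i16."""
    return ((x & 0xFFFFFFFF) >> amt) & 0xFFFF


def trunc_then_lshr(x: int, amt: int) -> int:
    """Model of: lshr i16 (trunc i32 x to i16), amt."""
    return (x & 0xFFFF) >> amt


def value_before_second_shift(b: int) -> int:
    """Model of %6 in the test: (zext(b) lshr 1) + zext(b) for an i8 input b."""
    z = b & 0xFF
    return (z >> 1) + z


def narrowing_is_safe_for_test() -> bool:
    """Check both shift orders agree for every value the test can produce."""
    return all(
        lshr_then_trunc(value_before_second_shift(b), 2)
        == trunc_then_lshr(value_before_second_shift(b), 2)
        for b in range(256)
    )
```

For the values in this test the operand is at most 382, so both orders agree. For an arbitrary `i32` such as `0x10000` shifted by 1 they do not: shifting first gives `0x8000`, truncating first gives `0`.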
