Add `shl`, `lshr` and `ashr` instructions to the DAG post-dominated by `trunc`, allowing TruncInstCombine to reduce bitwidth of expressions containing shifts. Fixes PR50555.
Event Timeline
We cannot do this transform as proposed here; it increases the instruction count.
Could you point me at the test for this change?
It should only contain lshr-of-zext, and there should not be any trunc;
please add a test where zext has an extra use.
llvm/test/Transforms/InstCombine/pr50555.ll:9 (On Diff #365189)

@lebedev.ri: Yes, you're right, it increases the instruction count by adding a new zext when the old one has an extra use. But it also narrows the lshr to a smaller bit width, which simplifies it. Isn't that a good compromise? Also, after sinking the zext, it can trigger a chain of further combines with other instructions, such as add and trunc.
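For reference, a minimal sketch of the narrowing under discussion (illustrative IR, not taken from the patch):

```llvm
; Before: lshr-of-zext, with the shift done in the wide type.
%z = zext i8 %x to i32
%s = lshr i32 %z, 1

; After: the shift is done in the narrow type and the zext is sunk below it.
; If %z has another use, the original zext must stay around, so the
; instruction count grows by one -- which is the objection above.
%s8 = lshr i8 %x, 1
%s  = zext i8 %s8 to i32
```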
I think this can be adjusted in the SLP vectorizer. We have the MinBWs container in there, to try to operate on non-wide instructions. It probably needs a tweak to handle this case. Did you try to modify the collectValuesToDemote function?
FWIW, I agree that it obviously improves the SLP snippet in question; I'm just not sure this is the right way to do it. Sorry.
llvm/test/Transforms/InstCombine/pr50555.ll:9 (On Diff #365189)

To be noted, this is the profitability heuristic of instcombine: don't increase the instruction count.

Sure. But it still increases the instruction count.
If we want to solve this as an instcombine (or maybe aggressive-instcombine) problem, we have to expand the pattern to make it clearly profitable. I'm not sure how to generalize it, but we can do the narrowing starting from the trunc and remove an instruction:
https://alive2.llvm.org/ce/z/lwtDwZ
```llvm
define i16 @src(i8 %x) {
  %z = zext i8 %x to i32
  %s = lshr i32 %z, 1
  %a = add nuw nsw i32 %s, %z
  %s2 = lshr i32 %a, 2
  %t = trunc i32 %s2 to i16
  ret i16 %t
}

define i16 @tgt(i8 %x) {
  %z = zext i8 %x to i16
  %s = lshr i16 %z, 1
  %a = add nuw nsw i16 %s, %z
  %s2 = lshr i16 %a, 2
  ret i16 %s2
}
```
Thanks to all. I've moved this fix to aggressive-instcombine, where it is even planned in the TODO section.
llvm/test/Transforms/InstCombine/pr50555.ll:9 (On Diff #365189)

Ok, I see, thanks for noting this.
Nice, this seems to fit naturally there.
That being said, you probably still want some standalone tests for the pattern in question, both positive and negative ones - what's the requirement on the shift amount?
Agree - aggressive-instcombine doesn't get nearly as much testing as regular instcombine, so we need more tests to be confident it doesn't over-reach.
Leaving the shift amount off the getRelevantOperands() list doesn't work on this example (crash):
```llvm
define i16 @sh_amt(i8 %x, i8 %sh1) {
  %z = zext i8 %x to i32
  %zs = zext i8 %sh1 to i32
  %s = lshr i32 %z, %zs
  %a = add nuw nsw i32 %s, %z
  %s2 = lshr i32 %a, 2
  %t = trunc i32 %s2 to i16
  ret i16 %t
}
```
Some observations for logical right-shift: this will have a hard time with variable shift amounts.
You need to avoid creating out-of-bounds shifts; there are two obvious options:
- https://alive2.llvm.org/ce/z/XShcju <- shift amount needs to be less than target width, and truncation should only drop zeros.
- https://alive2.llvm.org/ce/z/QiDPV7 <- could saturate the shift amount if you know that %x has more leading zeros than the number of bits to be truncated
We might already have this logic somewhere, not sure.
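A minimal sketch of the first option (hypothetical test IR, not from the patch): the shift amount is a constant below the target width, and the zext guarantees that the truncation only drops zeros:

```llvm
define i16 @narrow_lshr(i16 %x) {
  %z = zext i16 %x to i32
  %s = lshr i32 %z, 3       ; 3 < 16, and the high 16 bits of %s are zero
  %t = trunc i32 %s to i16  ; drops only zeros
  ret i16 %t
}

; can be narrowed to:
define i16 @narrow_lshr.tgt(i16 %x) {
  %s = lshr i16 %x, 3
  ret i16 %s
}
```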
Yes, thanks for your observations; I'm already working on it: https://alive2.llvm.org/ce/z/XcCJ9Q

Special care is also needed for the vector case so that the transform does not become more poisonous.
TruncInstCombine already has appropriate logic but needs to be tweaked.
For now I'm assuming that the shift amount is a constant (scalar or vector). I'm not sure that extending the transform with a check for variable shift amounts is worthwhile.
> I'm not sure that extending the transform with a check for variable shift amounts is worthwhile.
Define "good". I think supporting variable shifts will take exactly two lines: compute the known bits of the shift amount, and get the maximal shift amount via KnownBits::getMaxValue().
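Roughly like this (a hedged sketch, not the actual patch; shiftAmtFitsInWidth is a hypothetical helper):

```cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/KnownBits.h"

// Hypothetical helper: conservatively bound a variable shift amount.
static bool shiftAmtFitsInWidth(const llvm::Value *ShAmt, unsigned TargetWidth,
                                const llvm::DataLayout &DL) {
  llvm::KnownBits Known = llvm::computeKnownBits(ShAmt, DL);
  // getMaxValue() is the largest value consistent with the known-zero bits,
  // i.e. a safe upper bound on the runtime shift amount.
  return Known.getMaxValue().ult(TargetWidth);
}
```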
> Define "good". I think supporting variable shifts will take exactly two lines: compute the known bits of the shift amount, and get the maximal shift amount via KnownBits::getMaxValue().
Hmm, how could we compute the known bits of the shift amount at compile time? Do you mean analyzing the DAG for the shift amount Value, taking known bits recursively?

Also, I don't believe this computation makes sense: in most cases, when the shift amount is variable, its low byte is unknown. For instance, how could known bits help optimize @spatel's example?
```llvm
define i16 @sh_amt(i8 %x, i8 %sh1) {
  %z = zext i8 %x to i32
  %zs = zext i8 %sh1 to i32
  %s = lshr i32 %z, %zs
  %a = add nuw nsw i32 %s, %z
  %s2 = lshr i32 %a, 2
  %t = trunc i32 %s2 to i16
  ret i16 %t
}
```
You've seen llvm::computeKnownBits(), right?
> Also, I don't believe this computation makes sense: in most cases, when the shift amount is variable, its low byte is unknown. For instance, how could known bits help optimize @spatel's example (the @sh_amt function above)?
I find this comment to be highly inflammatory.
Just because there's a large number of cases it won't help with doesn't mean it can't ever help with anything.
https://alive2.llvm.org/ce/z/RkkBTy <- we have no idea what %y is, but we can tell it's less than the target bitwidth.
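For instance, something along these lines (a hedged reconstruction of that idea; the exact IR behind the link is not reproduced here):

```llvm
define i8 @src(i8 %x, i16 %y) {
  %z = zext i8 %x to i16
  %m = and i16 %y, 7        ; %y is unknown, but known bits prove %m < 8
  %s = lshr i16 %z, %m
  %t = trunc i16 %s to i8
  ret i8 %t
}

; can be narrowed to:
define i8 @tgt(i8 %x, i16 %y) {
  %m = and i16 %y, 7
  %mt = trunc i16 %m to i8
  %s = lshr i8 %x, %mt
  ret i8 %s
}
```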
AggressiveInstCombine runs only with -O3, right? Do we know how expensive it would be for -O2?
Yes - only at O3 currently. That's mainly because nobody has bothered to see if it was worth fighting over to include at -O2.
That question is much easier to answer since we have compile-time-tracker. But I'm not sure how to answer the cost question directly - I think we can approximate it by just removing the pass from O3 and checking the difference.
The result seems to be a consistent but very small cost (0.04% geomean here):
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright
> Hmm, how could we compute the known bits of the shift amount at compile time? Do you mean analyzing the DAG for the shift amount Value, taking known bits recursively?
>
> You've seen llvm::computeKnownBits(), right?
Thanks, used it.
I think we want to do this in three steps.
lshr is easy and obvious, but for ashr we want to count *sign* bits.
I haven't really thought about shl.
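For context, a minimal sketch of the sign-bit counting idea (a hypothetical helper built on llvm::ComputeNumSignBits, not the patch itself; the shift-amount bound is checked separately, as above):

```cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Value.h"

// For ashr, the bits dropped by the trunc must all be copies of the sign
// bit. ComputeNumSignBits returns how many of the top bits are known to
// equal the sign bit, so narrowing SrcWidth -> DstWidth is safe when the
// operand has more than SrcWidth - DstWidth known sign bits.
static bool ashrNarrowingLooksSafe(const llvm::Value *Op, unsigned SrcWidth,
                                   unsigned DstWidth,
                                   const llvm::DataLayout &DL) {
  return llvm::ComputeNumSignBits(Op, DL) > SrcWidth - DstWidth;
}
```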
Do you mean splitting this into three separate patches?
shl is simpler than both right shifts, since it moves no bits from the truncated part into the untruncated one. The condition used for shl here is necessary and sufficient, whereas it is only sufficient for the right shifts.
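A hedged illustration for shl (hypothetical test IR, not from the patch): a constant shift amount below the target width is enough, because the kept low bits never depend on the dropped high bits:

```llvm
define i16 @narrow_shl(i32 %x) {
  %s = shl i32 %x, 5        ; 5 < 16: the kept low 16 bits of %s depend
  %t = trunc i32 %s to i16  ; only on the low 16 bits of %x
  ret i16 %t
}

; can be narrowed to:
define i16 @narrow_shl.tgt(i32 %x) {
  %xt = trunc i32 %x to i16
  %s = shl i16 %xt, 5
  ret i16 %s
}
```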
llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll:3 (On Diff #366403)

Should this be moved to be a phase-ordering test, do you think?
llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll:3 (On Diff #366403)

Do you mean it's more a test that the SLP vectorizer follows aggressive-instcombine? Ok, moved.
Yes.
> shl is simpler than both right shifts, since it moves no bits from the truncated part into the untruncated one. The condition used for shl here is necessary and sufficient, whereas it is only sufficient for the right shifts.
That is kind of my point.
At least the left and right shifts have different legality rules,
and different right-shifts also have slightly different rules.
Not having to deal with everything at once will strictly simplify review.
llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll:3 (On Diff #366403)

This should be in the X86 sub-directory - look at other tests in there for examples, as we don't specify explicit passes:

```
; RUN: opt -O2 -S < %s | FileCheck %s --check-prefixes=SSE
; RUN: opt -O2 -S -mattr=avx < %s | FileCheck %s --check-prefixes=AVX
; RUN: opt -passes='default<O2>' -S < %s | FileCheck %s --check-prefixes=SSE
; RUN: opt -passes='default<O2>' -S -mattr=avx < %s | FileCheck %s --check-prefixes=AVX
```
llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll:3 (On Diff #366403)

Right - the goal of PhaseOrdering tests is to make sure that multiple passes interact as expected and that we get the expected results from the typical (-On) pass pipelines in 'opt'.
llvm/test/Transforms/SLPVectorizer/X86/pr50555.ll:3 (On Diff #366403)

Sure, thanks! Moved it to the subdirectory and changed it to the -O3 option.
Yes, I'm going to add ashr. I'm also planning to add AssumptionCache for use with computeKnownBits(), and to investigate the question of including AIC at -O2.
> ...and to investigate the question of including AIC at -O2.
Yeah, AIC for O2+ - it would be great if possible.