This is an archive of the discontinued LLVM Phabricator instance.


Differential D108201

[AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG
ClosedPublic

Authored by anton-afanasyev on Aug 17 2021, 4:15 AM.


Details

Reviewers

lebedev.ri
spatel
RKSimon

Commits

rGcfb6dfcbd13b: [AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG

Summary

Add lshr instruction to the DAG post-dominated by trunc, allowing
TruncInstCombine to reduce bitwidth of expressions containing
these instructions.

We should be shifting by less than the target bitwidth.
Also it is sufficient to require that all truncated bits
of the value-to-be-shifted are zeros: https://alive2.llvm.org/ce/z/_LytbB

Alive2 variable-length proof:
https://godbolt.org/z/1srE1aqzf => s/32/8/ => https://alive2.llvm.org/ce/z/StwPia

Part of https://reviews.llvm.org/D107766
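The sufficient condition stated in the summary can also be checked by brute force for small widths. The following is a minimal sketch, not code from the patch: `lshrNarrowingHolds` is a hypothetical helper that models `trunc` with a bit mask and compares shifting in the wide type against shifting in the narrow type for every shift amount below the target bitwidth.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLVM): exhaustively compares
//   trunc(lshr(x, s))   computed in the `wide` type
// against
//   lshr(trunc(x), s)   computed in the `narrow` type
// for every shift amount s < narrow. When requireZeroHighBits is set, only
// values whose truncated (high) bits are all zero are considered, which is
// the sufficient condition from the summary.
bool lshrNarrowingHolds(unsigned wide, unsigned narrow,
                        bool requireZeroHighBits) {
  assert(narrow > 0 && narrow < wide && wide <= 16);
  const uint32_t mask = (1u << narrow) - 1u; // models trunc to `narrow` bits
  for (uint32_t x = 0; x < (1u << wide); ++x) {
    if (requireZeroHighBits && (x >> narrow) != 0)
      continue; // precondition violated, skip this value
    for (unsigned s = 0; s < narrow; ++s) {
      uint32_t wideThenTrunc = (x >> s) & mask;
      uint32_t truncThenNarrow = (x & mask) >> s;
      if (wideThenTrunc != truncThenNarrow)
        return false;
    }
  }
  return true;
}
```

With the precondition enabled the check passes for 8 to 4 bits and for 16 to 8 bits; with it disabled a counterexample exists (for instance x = 0x10 shifted by 1 when narrowing 8 to 4 bits), which is why the patch has to inspect the known bits of the shifted value.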

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

anton-afanasyev created this revision. Aug 17 2021, 4:15 AM

Herald added a subscriber: hiraditya. Aug 17 2021, 4:15 AM

anton-afanasyev requested review of this revision. Aug 17 2021, 4:15 AM

Herald added a project: Restricted Project. Aug 17 2021, 4:15 AM

Herald added a subscriber: llvm-commits.

anton-afanasyev mentioned this in D107766: [AggressiveInstCombine] Add shift instructions to `TruncInstCombine` DAG. Aug 17 2021, 4:16 AM

Harbormaster completed remote builds in B119873: Diff 366858. Aug 17 2021, 4:54 AM

lebedev.ri edited the summary of this revision. Aug 17 2021, 5:31 AM

lebedev.ri edited the summary of this revision. Aug 17 2021, 6:14 AM

We don't actually need *all* the high bits to be zeros,
only the ones that would be potentially shifted-in,
iff we don't truncate them away first.
E.g.: 0b11100111, target bit width of 4, and shift amount of 0..1: https://alive2.llvm.org/ce/z/jJ85EE

But indeed, we only have the 'min' bitwidth, we can't really model that here,
especially because 'max' bitwidth would differ for different 'min' bitwidths...

So LG, i guess.
@spatel ?
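The 0b11100111 example above can be replayed outside Alive2. This is a small sketch under the same assumptions as the comment (8-bit source, target bit width of 4); `narrowLShrMatches` is a hypothetical name, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if, for the 8-bit value x and every shift
// amount in [0, maxShift], shifting first and truncating to `narrow` bits
// gives the same result as truncating first and shifting in the narrow type.
bool narrowLShrMatches(uint8_t x, unsigned narrow, unsigned maxShift) {
  assert(narrow > 0 && narrow < 8 && maxShift < narrow);
  const uint32_t mask = (1u << narrow) - 1u; // models trunc to `narrow` bits
  for (unsigned s = 0; s <= maxShift; ++s) {
    uint32_t wideThenTrunc = (uint32_t(x) >> s) & mask;
    uint32_t truncThenNarrow = (uint32_t(x) & mask) >> s;
    if (wideThenTrunc != truncThenNarrow)
      return false;
  }
  return true;
}
```

For x = 0xE7 (0b11100111) the results agree for shift amounts 0..1 even though the high bits are not all zero, because only bit 4 (which is zero) can be shifted into the low nibble. Allowing a shift amount of 2 pulls in bit 5, which is set, and the results diverge.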

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
293

This revision is now accepted and ready to land. Aug 17 2021, 7:34 AM

We don't actually need *all* the high bits to be zeros,
only the ones that would be potentially shifted-in,

Sure, that's why it is only a sufficient condition, not a necessary one. I hope it covers the most common case in practice.

spatel mentioned this in D108091: [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG. Aug 17 2021, 11:52 AM

Rebase after shl bugfix

Harbormaster completed remote builds in B120101: Diff 367177. Aug 18 2021, 5:06 AM

In D108201#2951907, @anton-afanasyev wrote:

Rebase after shl bugfix

Add tests similar to 0988488ed461 (multiple widths) and 803270c0c691 (64-bit) for lshr patterns?

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
295–296	This wasn't updated with the suggested change to use getActiveBits(). IIUC, we can make this more efficient (avoid computeKnownBits in some cases) by hoisting the common check for MinBitWidth >= OrigBitWidth above the extra check for lshr:

unsigned MinBitWidth = KnownRHS.getMaxValue()
                           .uadd_sat(APInt(SrcBitWidth, 1))
                           .getLimitedValue(SrcBitWidth);
if (MinBitWidth >= OrigBitWidth)
  return nullptr;
if (I->getOpcode() == Instruction::LShr) {
  KnownBits KnownLHS = computeKnownBits(I->getOperand(0), DL);
  if (KnownLHS.getMaxValue().getActiveBits() >= OrigBitWidth)
    return nullptr;
}
Itr.second.MinBitWidth = MinBitWidth;

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
212	Similar to the previous `shl` example - this is another simplification to 0 that isn't handled by -instsimplify (but regular -instcombine gets it).

In D108201#2952032, @spatel wrote:

In D108201#2951907, @anton-afanasyev wrote:

Rebase after shl bugfix

Add tests similar to 0988488ed461 (multiple widths) and 803270c0c691 (64-bit) for lshr patterns?

Sure, added tests

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
293	Thanks, used `getActiveBits()`
295–296	Thanks, duplicated check for efficiency.
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
212	Ok, but here it is simple negative test, checking we do not fold to poisonous `lshr i16 x, 16`.

Address comments

Harbormaster completed remote builds in B120125: Diff 367206. Aug 18 2021, 7:45 AM

spatel added inline comments. Aug 18 2021, 8:10 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298	We already returned if MinBitWidth based on KnownRHS was too big, so this std::max is redundant?
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
212	Yes, not a problem - just pointing out that there's another opportunity to improve InstSimplify and run some kind of cleanup in this pass.

anton-afanasyev marked 2 inline comments as done. Aug 18 2021, 9:31 AM

anton-afanasyev added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298	No, it isn't: we still use this updated MinBitWidth by storing it in the instruction's Info:

Itr.second.MinBitWidth = MinBitWidth;

This value is then used when computing the common MinBitWidth in the getMinBitWidth() function.

spatel added inline comments. Aug 18 2021, 9:49 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298	Ah, ok. Do we have a test to exercise that path? I didn't see any test failures when I made the change locally.

Add test to exercise uncovered path

Harbormaster completed remote builds in B120164: Diff 367264. Aug 18 2021, 11:13 AM

anton-afanasyev added inline comments. Aug 18 2021, 11:14 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
297–298	Ok, added test to cover that path. With your change we get incorrect transformation: https://alive2.llvm.org/ce/z/fZ6jFb
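To make the point of this thread concrete, here is a simplified model of the per-instruction bound for `lshr`. It is a sketch, not the patch itself: it uses plain integers in place of `KnownBits`/`APInt`, assumes the shift is performed in a type of `origBitWidth` bits, and returns 0 where the patch returns `nullptr`. The names `activeBits` and `minBitWidthForLShr` and the `useMax` switch are hypothetical; `useMax = false` mimics the variant without `std::max`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of bits needed to represent v (0 for v == 0). This stands in for
// what getActiveBits() reports on the maximum possible LHS value.
unsigned activeBits(uint64_t v) {
  unsigned n = 0;
  while (v != 0) {
    ++n;
    v >>= 1;
  }
  return n;
}

// Hypothetical model of the MinBitWidth recorded for one lshr instruction.
// maxShift: largest possible shift amount; maxLHS: largest possible value of
// the shifted operand. Returns 0 when narrowing has to be abandoned.
unsigned minBitWidthForLShr(uint64_t maxShift, uint64_t maxLHS,
                            unsigned origBitWidth, bool useMax = true) {
  assert(origBitWidth > 0 && origBitWidth <= 64);
  // "shift amount + 1", clamped to the bit width (saturating add in the patch).
  unsigned minBitWidth =
      maxShift >= origBitWidth ? origBitWidth : unsigned(maxShift + 1);
  if (minBitWidth >= origBitWidth)
    return 0;
  unsigned lhsBits = activeBits(maxLHS);
  if (lhsBits >= origBitWidth)
    return 0;
  // Recording the LHS width is what keeps later stages from choosing a type
  // that would cut off set bits of the value being shifted.
  if (useMax)
    minBitWidth = std::max(minBitWidth, lhsBits);
  return minBitWidth;
}
```

For a 16-bit value zero-extended to 32 bits and shifted by 1, the model records 16 with `std::max` and only 2 without it. If such an expression were then evaluated at 8 bits, x = 0x0100 would give 0x80 in the wide computation but 0 in the narrow one, which illustrates the kind of miscompile the recorded width is meant to prevent.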

LGTM

This revision was landed with ongoing or failed builds. Aug 18 2021, 12:22 PM

Closed by commit rGcfb6dfcbd13b: [AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG (authored by anton-afanasyev).

This revision was automatically updated to reflect the committed changes.

anton-afanasyev added a commit: rGcfb6dfcbd13b: [AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG.

looks like this caused a crash in some chromium code

c++ repro,

$ cat t.cpp
struct a {
  typedef unsigned char b;
};
struct c {
  static unsigned char *d(unsigned char *g, unsigned *p, int q) {
    *p = *g;
    if (*p)
      *p = q;
    return g;
  }
};
template <typename h> void i(int *, int, int, unsigned *, typename h::b *) {
  char *e = 0;
  unsigned char j;
  unsigned f;
  c::d(&j, &f, 33);
  *e = f >> 2;
}
int k, l, m;
unsigned n;
void o() { i<a>(&k, l, m, &n, (unsigned char *)o); }

$ clang -cc1 -fno-delete-null-pointer-checks -O3 -fsanitize=fuzzer-no-link -emit-llvm t.cpp

Also made an IR repro,

$ cat repro.ll
; ModuleID = 't.cpp'
source_filename = "t.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

$_Z1iI1aEvPiiiPjPNT_1bE = comdat any

; Function Attrs: mustprogress nounwind null_pointer_is_valid optforfuzzing
define linkonce_odr void @_Z1iI1aEvPiiiPjPNT_1bE(i32* %0, i32 %1, i32 %2, i32* %3, i8* %4) local_unnamed_addr #0 comdat {
_ZN1c1dEPhPji.exit:
  %shr = lshr i32 33, 2
  %conv = trunc i32 %shr to i8
  store i8 %conv, i8* null, align 536870912, !tbaa !6
  ret void
}

attributes #0 = { mustprogress nounwind null_pointer_is_valid optforfuzzing "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 14.0.0"}
!2 = !{!3, !3, i64 0}
!3 = !{!"int", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{!4, !4, i64 0}

$ opt -aggressive-instcombine -S repro.ll

In D108201#2956220, @akhuang wrote:

looks like this caused a crash in some chromium code

Hi @akhuang, I'm not able to reproduce this crash. Could you please describe your environment, crash output, and stack trace?

> opt -aggressive-instcombine -S repro.ll 
; ModuleID = 'repro.ll'
source_filename = "repro.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@k = local_unnamed_addr global i32 0, align 4
@l = local_unnamed_addr global i32 0, align 4
@m = local_unnamed_addr global i32 0, align 4
@n = local_unnamed_addr global i32 0, align 4

; Function Attrs: mustprogress nofree norecurse nosync nounwind null_pointer_is_valid optforfuzzing willreturn
define dso_local void @_Z1ov() local_unnamed_addr #0 {
entry:
  store i8 8, i8* null, align 536870912, !tbaa !2
  ret void
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind null_pointer_is_valid optforfuzzing willreturn "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 14.0.0 (git@github.com:llvm/llvm-project.git b3a45e286fdfa73dd758472363dfafe7543cc077)"}
!2 = !{!3, !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C++ TBAA"}

> clang -cc1 -fno-delete-null-pointer-checks -O3 -fsanitize=fuzzer-no-link -emit-llvm  t.cpp
> cat t.ll
(the same output)

spatel mentioned this in rGdd19f342fa21: [AggressiveInstCombine] guard against applying instruction flags with constant…. Aug 20 2021, 9:22 AM

In D108201#2956690, @anton-afanasyev wrote:

In D108201#2956220, @akhuang wrote:

looks like this caused a crash in some chromium code

Hi @akhuang, I'm not able to reproduce this crash. Could you please describe your environment, crash output, and stack trace?

This should repro in a debug build, but might not be visible in release+asserts; I see it when building with clang on macOS, but it doesn't seem to repro on godbolt.
I think it's a simple bug and fixed with:
dd19f342fa21
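A toy model may help show why a debug build would trip here. This is an assumption-laden sketch and not the actual fix: it assumes the crash came from treating a constant-folded result (the repro shifts the constant 33 by 2) as an instruction when copying the `exact` flag. `Value`, `Constant`, `LShrInst`, `createLShr` and `copyExactFlag` below are stand-ins written for this illustration, with `dynamic_cast` playing the role of a checked cast.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy stand-ins for an IR value hierarchy (hypothetical, not LLVM classes).
struct Value {
  virtual ~Value() = default;
};
struct Constant : Value {
  uint32_t Val;
  explicit Constant(uint32_t V) : Val(V) {}
};
struct LShrInst : Value {
  bool IsExact = false;
};

// Toy builder: folds to a Constant when both operands are constants, the way
// a folding IR builder may, and otherwise creates an instruction.
std::unique_ptr<Value> createLShr(const Value &LHS, const Value &RHS) {
  auto *CL = dynamic_cast<const Constant *>(&LHS);
  auto *CR = dynamic_cast<const Constant *>(&RHS);
  if (CL && CR) {
    assert(CR->Val < 32 && "shift amount must be below the bit width");
    return std::make_unique<Constant>(CL->Val >> CR->Val);
  }
  return std::make_unique<LShrInst>();
}

// Guarded flag transfer: only touch the flag when the result really is an
// instruction. Returns whether the flag was applied.
bool copyExactFlag(Value &Res, bool Exact) {
  if (auto *I = dynamic_cast<LShrInst *>(&Res)) {
    I->IsExact = Exact;
    return true;
  }
  return false;
}
```

In this model, shifting the constant 33 right by 2 yields the constant 8 (matching the `store i8 8` in the output above), and the guarded helper simply skips the flag instead of asserting on a non-instruction.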

This should repro in a debug build, but might not be visible in release+asserts; I see it when building with clang on macOS, but it doesn't seem to repro on godbolt.
I think it's a simple bug and fixed with:
dd19f342fa21

Thank you!

In D108201#2957620, @anton-afanasyev wrote:

This should repro in a debug build, but might not be visible in release+asserts; I see it when building with clang on macOS, but it doesn't seem to repro on godbolt.
I think it's a simple bug and fixed with:
dd19f342fa21

Thank you!

Easy fix, but it does point to a larger problem (for the 3rd time in this patch set...) - we're seeing unsimplified IR, but this pass is not really prepared to handle that.
Maybe we'll sort this out when we try to make this pass run at -O2 instead of only -O3 (by having it run directly after -instcombine for example).

For reference, the exact bug was also responsible for:
https://llvm.org/PR51553
(and again, I can't remember why we run AIC before regular IC in the -O3 pipeline...)

anton-afanasyev mentioned this in D113179: [Passes] Move AggressiveInstCombine after InstCombine. Nov 4 2021, 3:47 AM

anton-afanasyev mentioned this in rGc34d157fc739: [Passes] Move AggressiveInstCombine after InstCombine. Dec 4 2021, 3:24 AM

Not sure if it has been raised elsewhere already, but it looks like this is causing a regression: https://github.com/llvm/llvm-project/issues/51922

In D108201#3249134, @fhahn wrote:

Not sure if it has been raised elsewhere already, but it looks like this is causing a regression: https://github.com/llvm/llvm-project/issues/51922

@fhahn Thanks for pointing this out! Looks like I've missed this during bugzilla migration.

Revision Contents

Path	Size
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp	26 lines
llvm/test/Transforms/AggressiveInstCombine/pr50555.ll	24 lines
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll	75 lines
llvm/test/Transforms/PhaseOrdering/X86/pr50555.ll	217 lines

Diff 367300

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp

[58 lines not shown; context: case Instruction::SExt:]
     break;
   case Instruction::Add:
   case Instruction::Sub:
   case Instruction::Mul:
   case Instruction::And:
   case Instruction::Or:
   case Instruction::Xor:
   case Instruction::Shl:
+  case Instruction::LShr:
     Ops.push_back(I->getOperand(0));
     Ops.push_back(I->getOperand(1));
     break;
   case Instruction::Select:
     Ops.push_back(I->getOperand(1));
     Ops.push_back(I->getOperand(2));
     break;
   default:

[51 lines not shown; context: case Instruction::SExt:]
       break;
     case Instruction::Add:
     case Instruction::Sub:
     case Instruction::Mul:
     case Instruction::And:
     case Instruction::Or:
     case Instruction::Xor:
     case Instruction::Shl:
+    case Instruction::LShr:
     case Instruction::Select: {
       SmallVector<Value *, 2> Operands;
       getRelevantOperands(I, Operands);
       append_range(Worklist, Operands);
       break;
     }
     default:
       // TODO: Can handle more cases here:
       // 1. shufflevector, extractelement, insertelement
       // 2. udiv, urem
-      // 3. lshr, ashr
+      // 3. ashr
       // 4. phi node(and loop handling)
       // ...
       return false;
     }
   return true;
 }

[116 lines not shown; context: for (auto *U : I->users())]
           return nullptr;
         DesiredBitWidth = ExtInstBitWidth;
       }
   unsigned OrigBitWidth =
       CurrentTruncInst->getOperand(0)->getType()->getScalarSizeInBits();
-  // Initialize MinBitWidth for `shl` instructions with the minimum number
-  // that is greater than shift amount (i.e. shift amount + 1).
+  // Initialize MinBitWidth for shift instructions with the minimum number
+  // that is greater than shift amount (i.e. shift amount + 1). For `lshr`
+  // adjust MinBitWidth so that all potentially truncated bits of
+  // the value-to-be-shifted are zeros.
   // Also normalize MinBitWidth not to be greater than source bitwidth.
   for (auto &Itr : InstInfoMap) {
     Instruction *I = Itr.first;
-    if (I->getOpcode() == Instruction::Shl) {
+    if (I->getOpcode() == Instruction::Shl ||
+        I->getOpcode() == Instruction::LShr) {
       KnownBits KnownRHS = computeKnownBits(I->getOperand(1), DL);
       const unsigned SrcBitWidth = KnownRHS.getBitWidth();
       unsigned MinBitWidth = KnownRHS.getMaxValue()
                                  .uadd_sat(APInt(SrcBitWidth, 1))
                                  .getLimitedValue(SrcBitWidth);
       if (MinBitWidth >= OrigBitWidth)
         return nullptr;
+      if (I->getOpcode() == Instruction::LShr) {
+        KnownBits KnownLHS = computeKnownBits(I->getOperand(0), DL);
+        MinBitWidth =
+            std::max(MinBitWidth, KnownLHS.getMaxValue().getActiveBits());
+        if (MinBitWidth >= OrigBitWidth)
+          return nullptr;
+      }
       Itr.second.MinBitWidth = MinBitWidth;
     }
   }
   // Calculate minimum allowed bit-width allowed for shrinking the currently
   // visited truncate's operand.
   unsigned MinBitWidth = getMinBitWidth();

Inline comments on this hunk:

At `if (MinBitWidth >= OrigBitWidth)`:

lebedev.ri (Done), suggested change:

       MinBitWidth = std::max(MinBitWidth,
-                             SrcBitWidth - KnownLHS.countMinLeadingZeros());
+                             KnownLHS.getMaxValue().getActiveBits());
     }
     MinBitWidth = std::min(MinBitWidth, SrcBitWidth);

anton-afanasyev (Author, Done): Thanks, used `getActiveBits()`

At `KnownBits KnownLHS = computeKnownBits(I->getOperand(0), DL);`:

spatel (Done): This wasn't updated with the suggested change to use getActiveBits().
IIUC, we can make this more efficient (avoid computeKnownBits in some cases) by hoisting the common check for MinBitWidth >= OrigBitWidth above the extra check for lshr:

unsigned MinBitWidth = KnownRHS.getMaxValue()
                           .uadd_sat(APInt(SrcBitWidth, 1))
                           .getLimitedValue(SrcBitWidth);
if (MinBitWidth >= OrigBitWidth)
  return nullptr;
if (I->getOpcode() == Instruction::LShr) {
  KnownBits KnownLHS = computeKnownBits(I->getOperand(0), DL);
  if (KnownLHS.getMaxValue().getActiveBits() >= OrigBitWidth)
    return nullptr;
}
Itr.second.MinBitWidth = MinBitWidth;

anton-afanasyev (Author, Done): Thanks, duplicated check for efficiency.

At `std::max(MinBitWidth, KnownLHS.getMaxValue().getActiveBits());`:

spatel (Done): We already returned if MinBitWidth based on KnownRHS was too big, so this std::max is redundant?

anton-afanasyev (Author, Done): No, it isn't: we still use this updated MinBitWidth by setting it to instruction Info:

Itr.second.MinBitWidth = MinBitWidth;

This value is then used while computing common MinBitWidth in getMinBitWidth() function.

spatel (Done): Ah, ok. Do we have a test to exercise that path? I didn't see any test failures when I made the change locally.

anton-afanasyev (Author, Done): Ok, added test to cover that path. With your change we get incorrect transformation: https://alive2.llvm.org/ce/z/fZ6jFb

[75 lines not shown; context: case Instruction::SExt: {]
       break;
     }
     case Instruction::Add:
     case Instruction::Sub:
     case Instruction::Mul:
     case Instruction::And:
     case Instruction::Or:
     case Instruction::Xor:
-    case Instruction::Shl: {
+    case Instruction::Shl:
+    case Instruction::LShr: {
       Value *LHS = getReducedOperand(I->getOperand(0), SclTy);
       Value *RHS = getReducedOperand(I->getOperand(1), SclTy);
       Res = Builder.CreateBinOp((Instruction::BinaryOps)Opc, LHS, RHS);
+      // Preserve `exact` flag since truncation doesn't change exactness
+      if (Opc == Instruction::LShr)
+        cast<Instruction>(Res)->setIsExact(I->isExact());
       break;
     }
     case Instruction::Select: {
       Value *Op0 = I->getOperand(0);
       Value *LHS = getReducedOperand(I->getOperand(1), SclTy);
       Value *RHS = getReducedOperand(I->getOperand(2), SclTy);
       Res = Builder.CreateSelect(Op0, LHS, RHS);
       break;
[66 lines not shown]
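The claim in the new code comment, that truncation does not change exactness, can be sanity-checked by brute force. The sketch below is illustrative only and assumes the shift amount is below the narrow width, as the pass requires; `isExactLShr` and `exactnessSurvivesNarrowing` are hypothetical helpers, with "exact" modeled as "no set bits are shifted out".

```cpp
#include <cassert>
#include <cstdint>

// An lshr by s is `exact` when none of the s low bits shifted out is set.
bool isExactLShr(uint32_t x, unsigned s) {
  assert(s < 32);
  return (x & ((1u << s) - 1u)) == 0;
}

// Checks, for every `wide`-bit value and every shift amount s < narrow, that
// the shift is exact in the wide type if and only if it is exact after the
// value has been truncated to `narrow` bits.
bool exactnessSurvivesNarrowing(unsigned wide, unsigned narrow) {
  assert(narrow > 0 && narrow < wide && wide <= 16);
  const uint32_t mask = (1u << narrow) - 1u; // models trunc to `narrow` bits
  for (uint32_t x = 0; x < (1u << wide); ++x)
    for (unsigned s = 0; s < narrow; ++s)
      if (isExactLShr(x, s) != isExactLShr(x & mask, s))
        return false;
  return true;
}
```

The property holds because the bits shifted out are the low s bits, and with s below the narrow width those bits are untouched by truncation.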

llvm/test/Transforms/AggressiveInstCombine/pr50555.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s

 define void @trunc_one_add(i16* %a, i8 %b) {
 ; CHECK-LABEL: @trunc_one_add(
-; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[B:%.*]] to i32
-; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ZEXT]], 1
-; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[ZEXT]], [[SHR]]
-; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[ADD]] to i16
-; CHECK-NEXT: store i16 [[TRUNC]], i16* [[A:%.*]], align 2
+; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[B:%.*]] to i16
+; CHECK-NEXT: [[SHR:%.*]] = lshr i16 [[ZEXT]], 1
+; CHECK-NEXT: [[ADD:%.*]] = add i16 [[ZEXT]], [[SHR]]
+; CHECK-NEXT: store i16 [[ADD]], i16* [[A:%.*]], align 2
 ; CHECK-NEXT: ret void
 ;
   %zext = zext i8 %b to i32
   %shr = lshr i32 %zext, 1
   %add = add nsw i32 %zext, %shr
   %trunc = trunc i32 %add to i16
   store i16 %trunc, i16* %a, align 2
   ret void
 }

 define void @trunc_two_adds(i16* %a, i8 %b, i8 %c) {
 ; CHECK-LABEL: @trunc_two_adds(
-; CHECK-NEXT: [[ZEXT1:%.*]] = zext i8 [[B:%.*]] to i32
-; CHECK-NEXT: [[ZEXT2:%.*]] = zext i8 [[C:%.*]] to i32
-; CHECK-NEXT: [[ADD1:%.*]] = add nuw nsw i32 [[ZEXT1]], [[ZEXT2]]
-; CHECK-NEXT: [[SHR1:%.*]] = lshr i32 [[ADD1]], 1
-; CHECK-NEXT: [[ADD2:%.*]] = add nuw nsw i32 [[ADD1]], [[SHR1]]
-; CHECK-NEXT: [[SHR2:%.*]] = lshr i32 [[ADD2]], 2
-; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[SHR2]] to i16
-; CHECK-NEXT: store i16 [[TRUNC]], i16* [[A:%.*]], align 2
+; CHECK-NEXT: [[ZEXT1:%.*]] = zext i8 [[B:%.*]] to i16
+; CHECK-NEXT: [[ZEXT2:%.*]] = zext i8 [[C:%.*]] to i16
+; CHECK-NEXT: [[ADD1:%.*]] = add i16 [[ZEXT1]], [[ZEXT2]]
+; CHECK-NEXT: [[SHR1:%.*]] = lshr i16 [[ADD1]], 1
+; CHECK-NEXT: [[ADD2:%.*]] = add i16 [[ADD1]], [[SHR1]]
+; CHECK-NEXT: [[SHR2:%.*]] = lshr i16 [[ADD2]], 2
+; CHECK-NEXT: store i16 [[SHR2]], i16* [[A:%.*]], align 2
 ; CHECK-NEXT: ret void
 ;
   %zext1 = zext i8 %b to i32
   %zext2 = zext i8 %c to i32
   %add1 = add nuw nsw i32 %zext1, %zext2
   %shr1 = lshr i32 %add1, 1
   %add2 = add nuw nsw i32 %add1, %shr1
   %shr2 = lshr i32 %add2, 2
   %trunc = trunc i32 %shr2 to i16
   store i16 %trunc, i16* %a, align 2
   ret void
 }

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll

Show First 20 Lines • Show All 184 Lines • ▼ Show 20 Lines	;
%z = zext i8 %x to i32		%z = zext i8 %x to i32
%s = shl nsw i32 %z, 15		%s = shl nsw i32 %z, 15
%t = trunc i32 %s to i16		%t = trunc i32 %s to i16
ret i16 %t		ret i16 %t
}		}

define i16 @lshr_15(i16 %x) {		define i16 @lshr_15(i16 %x) {
; CHECK-LABEL: @lshr_15(		; CHECK-LABEL: @lshr_15(
; CHECK-NEXT: [[ZEXT:%.]] = zext i16 [[X:%.]] to i32		; CHECK-NEXT: [[LSHR:%.]] = lshr i16 [[X:%.]], 15
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[ZEXT]], 15		; CHECK-NEXT: ret i16 [[LSHR]]
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[LSHR]] to i16
; CHECK-NEXT: ret i16 [[TRUNC]]
;		;
%zext = zext i16 %x to i32		%zext = zext i16 %x to i32
%lshr = lshr i32 %zext, 15		%lshr = lshr i32 %zext, 15
%trunc = trunc i32 %lshr to i16		%trunc = trunc i32 %lshr to i16
ret i16 %trunc		ret i16 %trunc
}		}

; Negative test		; Negative test

define i16 @lshr_16(i16 %x) {		define i16 @lshr_16(i16 %x) {
; CHECK-LABEL: @lshr_16(		; CHECK-LABEL: @lshr_16(
; CHECK-NEXT: [[ZEXT:%.]] = zext i16 [[X:%.]] to i32		; CHECK-NEXT: [[ZEXT:%.]] = zext i16 [[X:%.]] to i32
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[ZEXT]], 16		; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[ZEXT]], 16
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[LSHR]] to i16		; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[LSHR]] to i16
; CHECK-NEXT: ret i16 [[TRUNC]]		; CHECK-NEXT: ret i16 [[TRUNC]]
;		;
%zext = zext i16 %x to i32		%zext = zext i16 %x to i32
%lshr = lshr i32 %zext, 16		%lshr = lshr i32 %zext, 16
		spatelUnsubmitted Done Reply Inline Actions Similar to the previous `shl` example - this is another simplification to 0 that isn't handled by -instsimplify (but regular -instcombine gets it). spatel: Similar to the previous `shl` example - this is another simplification to 0 that isn't handled…
		anton-afanasyevAuthorUnsubmitted Done Reply Inline Actions Ok, but here it is simple negative test, checking we do not fold to poisonous `lshr i16 x, 16`. anton-afanasyev: Ok, but here it is simple negative test, checking we do not fold to poisonous `lshr i16 x, 16`.
		spatelUnsubmitted Done Reply Inline Actions Yes, not a problem - just pointing out that there's another opportunity to improve InstSimplify and run some kind of cleanup in this pass. spatel: Yes, not a problem - just pointing out that there's another opportunity to improve InstSimplify…
%trunc = trunc i32 %lshr to i16		%trunc = trunc i32 %lshr to i16
ret i16 %trunc		ret i16 %trunc
}		}

; Negative test		; Negative test

define i16 @lshr_var_shift_amount(i8 %x, i8 %amt) {		define i16 @lshr_var_shift_amount(i8 %x, i8 %amt) {
; CHECK-LABEL: @lshr_var_shift_amount(		; CHECK-LABEL: @lshr_var_shift_amount(
Show All 11 Lines	;
%a = add i32 %s, %z		%a = add i32 %s, %z
%s2 = lshr i32 %a, 2		%s2 = lshr i32 %a, 2
%t = trunc i32 %s2 to i16		%t = trunc i32 %s2 to i16
ret i16 %t		ret i16 %t
}		}

define i16 @lshr_var_bounded_shift_amount(i8 %x, i8 %amt) {		define i16 @lshr_var_bounded_shift_amount(i8 %x, i8 %amt) {
; CHECK-LABEL: @lshr_var_bounded_shift_amount(		; CHECK-LABEL: @lshr_var_bounded_shift_amount(
; CHECK-NEXT: [[Z:%.]] = zext i8 [[X:%.]] to i32		; CHECK-NEXT: [[Z:%.]] = zext i8 [[X:%.]] to i16
; CHECK-NEXT: [[ZA:%.]] = zext i8 [[AMT:%.]] to i32		; CHECK-NEXT: [[ZA:%.]] = zext i8 [[AMT:%.]] to i16
; CHECK-NEXT: [[ZA2:%.*]] = and i32 [[ZA]], 15		; CHECK-NEXT: [[ZA2:%.*]] = and i16 [[ZA]], 15
; CHECK-NEXT: [[S:%.*]] = lshr i32 [[Z]], [[ZA2]]		; CHECK-NEXT: [[S:%.*]] = lshr i16 [[Z]], [[ZA2]]
; CHECK-NEXT: [[A:%.*]] = add i32 [[S]], [[Z]]		; CHECK-NEXT: [[A:%.*]] = add i16 [[S]], [[Z]]
; CHECK-NEXT: [[S2:%.*]] = lshr i32 [[A]], 2		; CHECK-NEXT: [[S2:%.*]] = lshr i16 [[A]], 2
; CHECK-NEXT: [[T:%.*]] = trunc i32 [[S2]] to i16		; CHECK-NEXT: ret i16 [[S2]]
; CHECK-NEXT: ret i16 [[T]]
;		;
%z = zext i8 %x to i32		%z = zext i8 %x to i32
%za = zext i8 %amt to i32		%za = zext i8 %amt to i32
%za2 = and i32 %za, 15		%za2 = and i32 %za, 15
%s = lshr i32 %z, %za2		%s = lshr i32 %z, %za2
%a = add i32 %s, %z		%a = add i32 %s, %z
%s2 = lshr i32 %a, 2		%s2 = lshr i32 %a, 2
%t = trunc i32 %s2 to i16		%t = trunc i32 %s2 to i16
Show All 16 Lines	;
%and = and i64 %sext, 4294967295		%and = and i64 %sext, 4294967295
%shl = lshr i64 %zext, %and		%shl = lshr i64 %zext, %and
%trunc = trunc i64 %shl to i32		%trunc = trunc i64 %shl to i32
ret i32 %trunc		ret i32 %trunc
}		}

define void @lshr_big_dag(i16* %a, i8 %b, i8 %c) {		define void @lshr_big_dag(i16* %a, i8 %b, i8 %c) {
; CHECK-LABEL: @lshr_big_dag(		; CHECK-LABEL: @lshr_big_dag(
; CHECK-NEXT: [[ZEXT1:%.]] = zext i8 [[B:%.]] to i32		; CHECK-NEXT: [[ZEXT1:%.]] = zext i8 [[B:%.]] to i16
; CHECK-NEXT: [[ZEXT2:%.]] = zext i8 [[C:%.]] to i32		; CHECK-NEXT: [[ZEXT2:%.]] = zext i8 [[C:%.]] to i16
; CHECK-NEXT: [[ADD1:%.*]] = add i32 [[ZEXT1]], [[ZEXT2]]		; CHECK-NEXT: [[ADD1:%.*]] = add i16 [[ZEXT1]], [[ZEXT2]]
; CHECK-NEXT: [[SFT1:%.*]] = and i32 [[ADD1]], 15		; CHECK-NEXT: [[SFT1:%.*]] = and i16 [[ADD1]], 15
; CHECK-NEXT: [[SHR1:%.*]] = lshr i32 [[ADD1]], [[SFT1]]		; CHECK-NEXT: [[SHR1:%.*]] = lshr i16 [[ADD1]], [[SFT1]]
; CHECK-NEXT: [[ADD2:%.*]] = add i32 [[ADD1]], [[SHR1]]		; CHECK-NEXT: [[ADD2:%.*]] = add i16 [[ADD1]], [[SHR1]]
; CHECK-NEXT: [[SFT2:%.*]] = and i32 [[ADD2]], 7		; CHECK-NEXT: [[SFT2:%.*]] = and i16 [[ADD2]], 7
; CHECK-NEXT: [[SHR2:%.*]] = lshr i32 [[ADD2]], [[SFT2]]		; CHECK-NEXT: [[SHR2:%.*]] = lshr i16 [[ADD2]], [[SFT2]]
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[SHR2]] to i16		; CHECK-NEXT: store i16 [[SHR2]], i16* [[A:%.*]], align 2
; CHECK-NEXT: store i16 [[TRUNC]], i16* [[A:%.*]], align 2
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%zext1 = zext i8 %b to i32		%zext1 = zext i8 %b to i32
%zext2 = zext i8 %c to i32		%zext2 = zext i8 %c to i32
%add1 = add i32 %zext1, %zext2		%add1 = add i32 %zext1, %zext2
%sft1 = and i32 %add1, 15		%sft1 = and i32 %add1, 15
%shr1 = lshr i32 %add1, %sft1		%shr1 = lshr i32 %add1, %sft1
%add2 = add i32 %add1, %shr1		%add2 = add i32 %add1, %shr1
%sft2 = and i32 %add2, 7		%sft2 = and i32 %add2, 7
%shr2 = lshr i32 %add2, %sft2		%shr2 = lshr i32 %add2, %sft2
%trunc = trunc i32 %shr2 to i16		%trunc = trunc i32 %shr2 to i16
store i16 %trunc, i16* %a, align 2		store i16 %trunc, i16* %a, align 2
ret void		ret void
}		}

define i16 @lshr_smaller_bitwidth(i8 %x) {		define i16 @lshr_smaller_bitwidth(i8 %x) {
; CHECK-LABEL: @lshr_smaller_bitwidth(		; CHECK-LABEL: @lshr_smaller_bitwidth(
; CHECK-NEXT: [[ZEXT:%.]] = zext i8 [[X:%.]] to i16		; CHECK-NEXT: [[ZEXT:%.]] = zext i8 [[X:%.]] to i16
; CHECK-NEXT: [[LSHR:%.*]] = lshr i16 [[ZEXT]], 1		; CHECK-NEXT: [[LSHR:%.*]] = lshr i16 [[ZEXT]], 1
; CHECK-NEXT: [[ZEXT2:%.*]] = zext i16 [[LSHR]] to i32		; CHECK-NEXT: [[LSHR2:%.*]] = lshr i16 [[LSHR]], 2
; CHECK-NEXT: [[LSHR2:%.*]] = lshr i32 [[ZEXT2]], 2		; CHECK-NEXT: ret i16 [[LSHR2]]
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[LSHR2]] to i16
; CHECK-NEXT: ret i16 [[TRUNC]]
;		;
%zext = zext i8 %x to i16		%zext = zext i8 %x to i16
%lshr = lshr i16 %zext, 1		%lshr = lshr i16 %zext, 1
%zext2 = zext i16 %lshr to i32		%zext2 = zext i16 %lshr to i32
%lshr2 = lshr i32 %zext2, 2		%lshr2 = lshr i32 %zext2, 2
%trunc = trunc i32 %lshr2 to i16		%trunc = trunc i32 %lshr2 to i16
ret i16 %trunc		ret i16 %trunc
}		}

define i16 @lshr_larger_bitwidth(i8 %x) {		define i16 @lshr_larger_bitwidth(i8 %x) {
; CHECK-LABEL: @lshr_larger_bitwidth(		; CHECK-LABEL: @lshr_larger_bitwidth(
; CHECK-NEXT: [[ZEXT:%.]] = zext i8 [[X:%.]] to i64		; CHECK-NEXT: [[ZEXT:%.]] = zext i8 [[X:%.]] to i16
; CHECK-NEXT: [[LSHR:%.*]] = lshr i64 [[ZEXT]], 1		; CHECK-NEXT: [[LSHR:%.*]] = lshr i16 [[ZEXT]], 1
; CHECK-NEXT: [[ZEXT2:%.*]] = trunc i64 [[LSHR]] to i32		; CHECK-NEXT: [[AND:%.*]] = lshr i16 [[LSHR]], 2
; CHECK-NEXT: [[AND:%.*]] = lshr i32 [[ZEXT2]], 2		; CHECK-NEXT: ret i16 [[AND]]
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[AND]] to i16
; CHECK-NEXT: ret i16 [[TRUNC]]
;		;
%zext = zext i8 %x to i64		%zext = zext i8 %x to i64
%lshr = lshr i64 %zext, 1		%lshr = lshr i64 %zext, 1
%zext2 = trunc i64 %lshr to i32		%zext2 = trunc i64 %lshr to i32
%and = lshr i32 %zext2, 2		%and = lshr i32 %zext2, 2
%trunc = trunc i32 %and to i16		%trunc = trunc i32 %and to i16
ret i16 %trunc		ret i16 %trunc
}		}
Show All 12 Lines	;
%zext2 = zext i16 %lshr to i32		%zext2 = zext i16 %lshr to i32
%lshr2 = lshr i32 %zext2, 2		%lshr2 = lshr i32 %zext2, 2
%trunc = trunc i32 %lshr2 to i8		%trunc = trunc i32 %lshr2 to i8
ret i8 %trunc		ret i8 %trunc
}		}

define <2 x i16> @lshr_vector(<2 x i8> %x) {		define <2 x i16> @lshr_vector(<2 x i8> %x) {
; CHECK-LABEL: @lshr_vector(		; CHECK-LABEL: @lshr_vector(
; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i32>		; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>
; CHECK-NEXT: [[ZA:%.*]] = and <2 x i32> [[Z]], <i32 7, i32 8>		; CHECK-NEXT: [[ZA:%.*]] = and <2 x i16> [[Z]], <i16 7, i16 8>
; CHECK-NEXT: [[S:%.*]] = lshr <2 x i32> [[Z]], [[ZA]]		; CHECK-NEXT: [[S:%.*]] = lshr <2 x i16> [[Z]], [[ZA]]
; CHECK-NEXT: [[A:%.*]] = add <2 x i32> [[S]], [[Z]]		; CHECK-NEXT: [[A:%.*]] = add <2 x i16> [[S]], [[Z]]
; CHECK-NEXT: [[S2:%.*]] = lshr <2 x i32> [[A]], <i32 4, i32 5>		; CHECK-NEXT: [[S2:%.*]] = lshr <2 x i16> [[A]], <i16 4, i16 5>
; CHECK-NEXT: [[T:%.*]] = trunc <2 x i32> [[S2]] to <2 x i16>		; CHECK-NEXT: ret <2 x i16> [[S2]]
; CHECK-NEXT: ret <2 x i16> [[T]]
;		;
%z = zext <2 x i8> %x to <2 x i32>		%z = zext <2 x i8> %x to <2 x i32>
%za = and <2 x i32> %z, <i32 7, i32 8>		%za = and <2 x i32> %z, <i32 7, i32 8>
%s = lshr <2 x i32> %z, %za		%s = lshr <2 x i32> %z, %za
%a = add <2 x i32> %s, %z		%a = add <2 x i32> %s, %z
%s2 = lshr <2 x i32> %a, <i32 4, i32 5>		%s2 = lshr <2 x i32> %a, <i32 4, i32 5>
%t = trunc <2 x i32> %s2 to <2 x i16>		%t = trunc <2 x i32> %s2 to <2 x i16>
ret <2 x i16> %t		ret <2 x i16> %t
Show All 38 Lines	;
%a = add <2 x i32> %s, %z		%a = add <2 x i32> %s, %z
%s2 = lshr <2 x i32> %a, <i32 16, i32 5>		%s2 = lshr <2 x i32> %a, <i32 16, i32 5>
%t = trunc <2 x i32> %s2 to <2 x i16>		%t = trunc <2 x i32> %s2 to <2 x i16>
ret <2 x i16> %t		ret <2 x i16> %t
}		}

define i16 @lshr_exact(i16 %x) {		define i16 @lshr_exact(i16 %x) {
; CHECK-LABEL: @lshr_exact(		; CHECK-LABEL: @lshr_exact(
; CHECK-NEXT: [[ZEXT:%.*]] = zext i16 [[X:%.*]] to i32		; CHECK-NEXT: [[LSHR:%.*]] = lshr exact i16 [[X:%.*]], 15
; CHECK-NEXT: [[LSHR:%.*]] = lshr exact i32 [[ZEXT]], 15		; CHECK-NEXT: ret i16 [[LSHR]]
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[LSHR]] to i16
; CHECK-NEXT: ret i16 [[TRUNC]]
;		;
%zext = zext i16 %x to i32		%zext = zext i16 %x to i32
%lshr = lshr exact i32 %zext, 15		%lshr = lshr exact i32 %zext, 15
%trunc = trunc i32 %lshr to i16		%trunc = trunc i32 %lshr to i16
ret i16 %trunc		ret i16 %trunc
}		}

; Negative test		; Negative test
▲ Show 20 Lines • Show All 43 Lines • Show Last 20 Lines
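
The positive tests above share one shape: a value whose bits above the target width are known to be zero is shifted right by less than the target width and then truncated. A minimal C++ sketch of why that narrowing is sound for the i32 to i16 case follows. It is not part of the patch; the function names are hypothetical and plain unsigned integers stand in for the IR values.

```cpp
#include <cassert>
#include <cstdint>

// Wide form, as in the original IR: shift in i32, then truncate to i16.
// `wide` stands for the i32 operand of the lshr.
inline uint16_t shiftThenTrunc(uint32_t wide, unsigned amt) {
  return static_cast<uint16_t>(wide >> amt);
}

// Narrow form, as in the expected output: truncate to i16 first, then shift.
inline uint16_t truncThenShift(uint32_t wide, unsigned amt) {
  return static_cast<uint16_t>(static_cast<uint16_t>(wide) >> amt);
}

// Exhaustively compare both forms for every i32 value whose upper 16 bits
// are zero and every shift amount below the target width of 16.
inline bool narrowingIsSoundForZeroHighBits() {
  for (uint32_t v = 0; v <= 0xFFFFu; ++v)
    for (unsigned amt = 0; amt < 16; ++amt)
      if (shiftThenTrunc(v, amt) != truncThenShift(v, amt))
        return false;
  return true;
}
```

If a bit above the target width is set, the two forms can disagree: for the value 0x10000 shifted by 1, the wide form moves bit 16 into bit 15, while the narrow form has already discarded it. That is the situation the negative tests are meant to guard against.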

llvm/test/Transforms/PhaseOrdering/X86/pr50555.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -O3 -S -mtriple=x86_64-- | FileCheck %s --check-prefixes=SSE		; RUN: opt < %s -O3 -S -mtriple=x86_64-- | FileCheck %s --check-prefixes=SSE
; RUN: opt < %s -O3 -S -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX		; RUN: opt < %s -O3 -S -mtriple=x86_64-- -mattr=avx | FileCheck %s --check-prefixes=AVX

define void @trunc_through_one_add(i16* noalias %0, i8* noalias readonly %1) {		define void @trunc_through_one_add(i16* noalias %0, i8* noalias readonly %1) {
; SSE-LABEL: @trunc_through_one_add(		; SSE-LABEL: @trunc_through_one_add(
; SSE-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <4 x i8>*		; SSE-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <8 x i8>*
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i8>, <4 x i8>* [[TMP3]], align 1		; SSE-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1
; SSE-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>		; SSE-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i16>
; SSE-NEXT: [[TMP6:%.*]] = lshr <4 x i32> [[TMP5]], <i32 1, i32 1, i32 1, i32 1>		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i16> [[TMP5]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; SSE-NEXT: [[TMP7:%.*]] = add nuw nsw <4 x i32> [[TMP6]], [[TMP5]]		; SSE-NEXT: [[TMP7:%.*]] = add nuw nsw <8 x i16> [[TMP6]], [[TMP5]]
; SSE-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP8:%.*]] = lshr <8 x i16> [[TMP7]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
; SSE-NEXT: [[TMP9:%.*]] = trunc <4 x i32> [[TMP8]] to <4 x i16>		; SSE-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP0:%.*]] to <8 x i16>*
; SSE-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP0:%.*]] to <4 x i16>*		; SSE-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* [[TMP9]], align 2
; SSE-NEXT: store <4 x i16> [[TMP9]], <4 x i16>* [[TMP10]], align 2		; SSE-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 4		; SSE-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
; SSE-NEXT: [[TMP12:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 4		; SSE-NEXT: [[TMP12:%.*]] = bitcast i8* [[TMP10]] to <8 x i8>*
; SSE-NEXT: [[TMP13:%.*]] = bitcast i8* [[TMP11]] to <4 x i8>*		; SSE-NEXT: [[TMP13:%.*]] = load <8 x i8>, <8 x i8>* [[TMP12]], align 1
; SSE-NEXT: [[TMP14:%.*]] = load <4 x i8>, <4 x i8>* [[TMP13]], align 1		; SSE-NEXT: [[TMP14:%.*]] = zext <8 x i8> [[TMP13]] to <8 x i16>
; SSE-NEXT: [[TMP15:%.*]] = zext <4 x i8> [[TMP14]] to <4 x i32>		; SSE-NEXT: [[TMP15:%.*]] = lshr <8 x i16> [[TMP14]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; SSE-NEXT: [[TMP16:%.*]] = lshr <4 x i32> [[TMP15]], <i32 1, i32 1, i32 1, i32 1>		; SSE-NEXT: [[TMP16:%.*]] = add nuw nsw <8 x i16> [[TMP15]], [[TMP14]]
; SSE-NEXT: [[TMP17:%.*]] = add nuw nsw <4 x i32> [[TMP16]], [[TMP15]]		; SSE-NEXT: [[TMP17:%.*]] = lshr <8 x i16> [[TMP16]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
; SSE-NEXT: [[TMP18:%.*]] = lshr <4 x i32> [[TMP17]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP18:%.*]] = bitcast i16* [[TMP11]] to <8 x i16>*
; SSE-NEXT: [[TMP19:%.*]] = trunc <4 x i32> [[TMP18]] to <4 x i16>		; SSE-NEXT: store <8 x i16> [[TMP17]], <8 x i16>* [[TMP18]], align 2
; SSE-NEXT: [[TMP20:%.*]] = bitcast i16* [[TMP12]] to <4 x i16>*
; SSE-NEXT: store <4 x i16> [[TMP19]], <4 x i16>* [[TMP20]], align 2
; SSE-NEXT: [[TMP21:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
; SSE-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
; SSE-NEXT: [[TMP23:%.*]] = bitcast i8* [[TMP21]] to <4 x i8>*
; SSE-NEXT: [[TMP24:%.*]] = load <4 x i8>, <4 x i8>* [[TMP23]], align 1
; SSE-NEXT: [[TMP25:%.*]] = zext <4 x i8> [[TMP24]] to <4 x i32>
; SSE-NEXT: [[TMP26:%.*]] = lshr <4 x i32> [[TMP25]], <i32 1, i32 1, i32 1, i32 1>
; SSE-NEXT: [[TMP27:%.*]] = add nuw nsw <4 x i32> [[TMP26]], [[TMP25]]
; SSE-NEXT: [[TMP28:%.*]] = lshr <4 x i32> [[TMP27]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP29:%.*]] = trunc <4 x i32> [[TMP28]] to <4 x i16>
; SSE-NEXT: [[TMP30:%.*]] = bitcast i16* [[TMP22]] to <4 x i16>*
; SSE-NEXT: store <4 x i16> [[TMP29]], <4 x i16>* [[TMP30]], align 2
; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 12
; SSE-NEXT: [[TMP32:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 12
; SSE-NEXT: [[TMP33:%.*]] = bitcast i8* [[TMP31]] to <4 x i8>*
; SSE-NEXT: [[TMP34:%.*]] = load <4 x i8>, <4 x i8>* [[TMP33]], align 1
; SSE-NEXT: [[TMP35:%.*]] = zext <4 x i8> [[TMP34]] to <4 x i32>
; SSE-NEXT: [[TMP36:%.*]] = lshr <4 x i32> [[TMP35]], <i32 1, i32 1, i32 1, i32 1>
; SSE-NEXT: [[TMP37:%.*]] = add nuw nsw <4 x i32> [[TMP36]], [[TMP35]]
; SSE-NEXT: [[TMP38:%.*]] = lshr <4 x i32> [[TMP37]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP39:%.*]] = trunc <4 x i32> [[TMP38]] to <4 x i16>
; SSE-NEXT: [[TMP40:%.*]] = bitcast i16* [[TMP32]] to <4 x i16>*
; SSE-NEXT: store <4 x i16> [[TMP39]], <4 x i16>* [[TMP40]], align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @trunc_through_one_add(		; AVX-LABEL: @trunc_through_one_add(
; AVX-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <8 x i8>*		; AVX-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP1:%.*]] to <16 x i8>*
; AVX-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1		; AVX-NEXT: [[TMP4:%.*]] = load <16 x i8>, <16 x i8>* [[TMP3]], align 1
; AVX-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i32>		; AVX-NEXT: [[TMP5:%.*]] = zext <16 x i8> [[TMP4]] to <16 x i16>
; AVX-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[TMP5]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>		; AVX-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP5]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; AVX-NEXT: [[TMP7:%.*]] = add nuw nsw <8 x i32> [[TMP6]], [[TMP5]]		; AVX-NEXT: [[TMP7:%.*]] = add nuw nsw <16 x i16> [[TMP6]], [[TMP5]]
; AVX-NEXT: [[TMP8:%.*]] = lshr <8 x i32> [[TMP7]], <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>		; AVX-NEXT: [[TMP8:%.*]] = lshr <16 x i16> [[TMP7]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
; AVX-NEXT: [[TMP9:%.*]] = trunc <8 x i32> [[TMP8]] to <8 x i16>		; AVX-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP0:%.*]] to <16 x i16>*
; AVX-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP0:%.*]] to <8 x i16>*		; AVX-NEXT: store <16 x i16> [[TMP8]], <16 x i16>* [[TMP9]], align 2
; AVX-NEXT: store <8 x i16> [[TMP9]], <8 x i16>* [[TMP10]], align 2
; AVX-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
; AVX-NEXT: [[TMP12:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
; AVX-NEXT: [[TMP13:%.*]] = bitcast i8* [[TMP11]] to <8 x i8>*
; AVX-NEXT: [[TMP14:%.*]] = load <8 x i8>, <8 x i8>* [[TMP13]], align 1
; AVX-NEXT: [[TMP15:%.*]] = zext <8 x i8> [[TMP14]] to <8 x i32>
; AVX-NEXT: [[TMP16:%.*]] = lshr <8 x i32> [[TMP15]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
; AVX-NEXT: [[TMP17:%.*]] = add nuw nsw <8 x i32> [[TMP16]], [[TMP15]]
; AVX-NEXT: [[TMP18:%.*]] = lshr <8 x i32> [[TMP17]], <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
; AVX-NEXT: [[TMP19:%.*]] = trunc <8 x i32> [[TMP18]] to <8 x i16>
; AVX-NEXT: [[TMP20:%.*]] = bitcast i16* [[TMP12]] to <8 x i16>*
; AVX-NEXT: store <8 x i16> [[TMP19]], <8 x i16>* [[TMP20]], align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%3 = load i8, i8* %1, align 1		%3 = load i8, i8* %1, align 1
%4 = zext i8 %3 to i32		%4 = zext i8 %3 to i32
%5 = lshr i32 %4, 1		%5 = lshr i32 %4, 1
%6 = add nuw nsw i32 %5, %4		%6 = add nuw nsw i32 %5, %4
%7 = lshr i32 %6, 2		%7 = lshr i32 %6, 2
%8 = trunc i32 %7 to i16		%8 = trunc i32 %7 to i16
▲ Show 20 Lines • Show All 133 Lines • ▼ Show 20 Lines	;
%127 = trunc i32 %126 to i16		%127 = trunc i32 %126 to i16
%128 = getelementptr inbounds i16, i16* %0, i64 15		%128 = getelementptr inbounds i16, i16* %0, i64 15
store i16 %127, i16* %128, align 2		store i16 %127, i16* %128, align 2
ret void		ret void
}		}

define void @trunc_through_two_adds(i16* noalias %0, i8* noalias readonly %1, i8* noalias readonly %2) {		define void @trunc_through_two_adds(i16* noalias %0, i8* noalias readonly %1, i8* noalias readonly %2) {
; SSE-LABEL: @trunc_through_two_adds(		; SSE-LABEL: @trunc_through_two_adds(
; SSE-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP1:%.*]] to <4 x i8>*		; SSE-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP1:%.*]] to <8 x i8>*
; SSE-NEXT: [[TMP5:%.*]] = load <4 x i8>, <4 x i8>* [[TMP4]], align 1		; SSE-NEXT: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[TMP4]], align 1
; SSE-NEXT: [[TMP6:%.*]] = zext <4 x i8> [[TMP5]] to <4 x i32>		; SSE-NEXT: [[TMP6:%.*]] = zext <8 x i8> [[TMP5]] to <8 x i16>
; SSE-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP2:%.*]] to <4 x i8>*		; SSE-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP2:%.*]] to <8 x i8>*
; SSE-NEXT: [[TMP8:%.*]] = load <4 x i8>, <4 x i8>* [[TMP7]], align 1		; SSE-NEXT: [[TMP8:%.*]] = load <8 x i8>, <8 x i8>* [[TMP7]], align 1
; SSE-NEXT: [[TMP9:%.*]] = zext <4 x i8> [[TMP8]] to <4 x i32>		; SSE-NEXT: [[TMP9:%.*]] = zext <8 x i8> [[TMP8]] to <8 x i16>
; SSE-NEXT: [[TMP10:%.*]] = add nuw nsw <4 x i32> [[TMP9]], [[TMP6]]		; SSE-NEXT: [[TMP10:%.*]] = add nuw nsw <8 x i16> [[TMP9]], [[TMP6]]
; SSE-NEXT: [[TMP11:%.*]] = lshr <4 x i32> [[TMP10]], <i32 1, i32 1, i32 1, i32 1>		; SSE-NEXT: [[TMP11:%.*]] = lshr <8 x i16> [[TMP10]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <4 x i32> [[TMP11]], [[TMP10]]		; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <8 x i16> [[TMP11]], [[TMP10]]
; SSE-NEXT: [[TMP13:%.*]] = lshr <4 x i32> [[TMP12]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP13:%.*]] = lshr <8 x i16> [[TMP12]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
; SSE-NEXT: [[TMP14:%.*]] = trunc <4 x i32> [[TMP13]] to <4 x i16>		; SSE-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP0:%.*]] to <8 x i16>*
; SSE-NEXT: [[TMP15:%.*]] = bitcast i16* [[TMP0:%.*]] to <4 x i16>*		; SSE-NEXT: store <8 x i16> [[TMP13]], <8 x i16>* [[TMP14]], align 2
; SSE-NEXT: store <4 x i16> [[TMP14]], <4 x i16>* [[TMP15]], align 2		; SSE-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
; SSE-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 4		; SSE-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i64 8
; SSE-NEXT: [[TMP17:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i64 4		; SSE-NEXT: [[TMP17:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
; SSE-NEXT: [[TMP18:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 4		; SSE-NEXT: [[TMP18:%.*]] = bitcast i8* [[TMP15]] to <8 x i8>*
; SSE-NEXT: [[TMP19:%.*]] = bitcast i8* [[TMP16]] to <4 x i8>*		; SSE-NEXT: [[TMP19:%.*]] = load <8 x i8>, <8 x i8>* [[TMP18]], align 1
; SSE-NEXT: [[TMP20:%.*]] = load <4 x i8>, <4 x i8>* [[TMP19]], align 1		; SSE-NEXT: [[TMP20:%.*]] = zext <8 x i8> [[TMP19]] to <8 x i16>
; SSE-NEXT: [[TMP21:%.*]] = zext <4 x i8> [[TMP20]] to <4 x i32>		; SSE-NEXT: [[TMP21:%.*]] = bitcast i8* [[TMP16]] to <8 x i8>*
; SSE-NEXT: [[TMP22:%.*]] = bitcast i8* [[TMP17]] to <4 x i8>*		; SSE-NEXT: [[TMP22:%.*]] = load <8 x i8>, <8 x i8>* [[TMP21]], align 1
; SSE-NEXT: [[TMP23:%.*]] = load <4 x i8>, <4 x i8>* [[TMP22]], align 1		; SSE-NEXT: [[TMP23:%.*]] = zext <8 x i8> [[TMP22]] to <8 x i16>
; SSE-NEXT: [[TMP24:%.*]] = zext <4 x i8> [[TMP23]] to <4 x i32>		; SSE-NEXT: [[TMP24:%.*]] = add nuw nsw <8 x i16> [[TMP23]], [[TMP20]]
; SSE-NEXT: [[TMP25:%.*]] = add nuw nsw <4 x i32> [[TMP24]], [[TMP21]]		; SSE-NEXT: [[TMP25:%.*]] = lshr <8 x i16> [[TMP24]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; SSE-NEXT: [[TMP26:%.*]] = lshr <4 x i32> [[TMP25]], <i32 1, i32 1, i32 1, i32 1>		; SSE-NEXT: [[TMP26:%.*]] = add nuw nsw <8 x i16> [[TMP25]], [[TMP24]]
; SSE-NEXT: [[TMP27:%.*]] = add nuw nsw <4 x i32> [[TMP26]], [[TMP25]]		; SSE-NEXT: [[TMP27:%.*]] = lshr <8 x i16> [[TMP26]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
; SSE-NEXT: [[TMP28:%.*]] = lshr <4 x i32> [[TMP27]], <i32 2, i32 2, i32 2, i32 2>		; SSE-NEXT: [[TMP28:%.*]] = bitcast i16* [[TMP17]] to <8 x i16>*
; SSE-NEXT: [[TMP29:%.*]] = trunc <4 x i32> [[TMP28]] to <4 x i16>		; SSE-NEXT: store <8 x i16> [[TMP27]], <8 x i16>* [[TMP28]], align 2
; SSE-NEXT: [[TMP30:%.*]] = bitcast i16* [[TMP18]] to <4 x i16>*
; SSE-NEXT: store <4 x i16> [[TMP29]], <4 x i16>* [[TMP30]], align 2
; SSE-NEXT: [[TMP31:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
; SSE-NEXT: [[TMP32:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i64 8
; SSE-NEXT: [[TMP33:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
; SSE-NEXT: [[TMP34:%.*]] = bitcast i8* [[TMP31]] to <4 x i8>*
; SSE-NEXT: [[TMP35:%.*]] = load <4 x i8>, <4 x i8>* [[TMP34]], align 1
; SSE-NEXT: [[TMP36:%.*]] = zext <4 x i8> [[TMP35]] to <4 x i32>
; SSE-NEXT: [[TMP37:%.*]] = bitcast i8* [[TMP32]] to <4 x i8>*
; SSE-NEXT: [[TMP38:%.*]] = load <4 x i8>, <4 x i8>* [[TMP37]], align 1
; SSE-NEXT: [[TMP39:%.*]] = zext <4 x i8> [[TMP38]] to <4 x i32>
; SSE-NEXT: [[TMP40:%.*]] = add nuw nsw <4 x i32> [[TMP39]], [[TMP36]]
; SSE-NEXT: [[TMP41:%.*]] = lshr <4 x i32> [[TMP40]], <i32 1, i32 1, i32 1, i32 1>
; SSE-NEXT: [[TMP42:%.*]] = add nuw nsw <4 x i32> [[TMP41]], [[TMP40]]
; SSE-NEXT: [[TMP43:%.*]] = lshr <4 x i32> [[TMP42]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP44:%.*]] = trunc <4 x i32> [[TMP43]] to <4 x i16>
; SSE-NEXT: [[TMP45:%.*]] = bitcast i16* [[TMP33]] to <4 x i16>*
; SSE-NEXT: store <4 x i16> [[TMP44]], <4 x i16>* [[TMP45]], align 2
; SSE-NEXT: [[TMP46:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 12
; SSE-NEXT: [[TMP47:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i64 12
; SSE-NEXT: [[TMP48:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 12
; SSE-NEXT: [[TMP49:%.*]] = bitcast i8* [[TMP46]] to <4 x i8>*
; SSE-NEXT: [[TMP50:%.*]] = load <4 x i8>, <4 x i8>* [[TMP49]], align 1
; SSE-NEXT: [[TMP51:%.*]] = zext <4 x i8> [[TMP50]] to <4 x i32>
; SSE-NEXT: [[TMP52:%.*]] = bitcast i8* [[TMP47]] to <4 x i8>*
; SSE-NEXT: [[TMP53:%.*]] = load <4 x i8>, <4 x i8>* [[TMP52]], align 1
; SSE-NEXT: [[TMP54:%.*]] = zext <4 x i8> [[TMP53]] to <4 x i32>
; SSE-NEXT: [[TMP55:%.*]] = add nuw nsw <4 x i32> [[TMP54]], [[TMP51]]
; SSE-NEXT: [[TMP56:%.*]] = lshr <4 x i32> [[TMP55]], <i32 1, i32 1, i32 1, i32 1>
; SSE-NEXT: [[TMP57:%.*]] = add nuw nsw <4 x i32> [[TMP56]], [[TMP55]]
; SSE-NEXT: [[TMP58:%.*]] = lshr <4 x i32> [[TMP57]], <i32 2, i32 2, i32 2, i32 2>
; SSE-NEXT: [[TMP59:%.*]] = trunc <4 x i32> [[TMP58]] to <4 x i16>
; SSE-NEXT: [[TMP60:%.*]] = bitcast i16* [[TMP48]] to <4 x i16>*
; SSE-NEXT: store <4 x i16> [[TMP59]], <4 x i16>* [[TMP60]], align 2
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @trunc_through_two_adds(		; AVX-LABEL: @trunc_through_two_adds(
; AVX-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP1:%.*]] to <8 x i8>*		; AVX-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP1:%.*]] to <16 x i8>*
; AVX-NEXT: [[TMP5:%.*]] = load <8 x i8>, <8 x i8>* [[TMP4]], align 1		; AVX-NEXT: [[TMP5:%.*]] = load <16 x i8>, <16 x i8>* [[TMP4]], align 1
; AVX-NEXT: [[TMP6:%.*]] = zext <8 x i8> [[TMP5]] to <8 x i32>		; AVX-NEXT: [[TMP6:%.*]] = zext <16 x i8> [[TMP5]] to <16 x i16>
; AVX-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP2:%.*]] to <8 x i8>*		; AVX-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP2:%.*]] to <16 x i8>*
; AVX-NEXT: [[TMP8:%.*]] = load <8 x i8>, <8 x i8>* [[TMP7]], align 1		; AVX-NEXT: [[TMP8:%.*]] = load <16 x i8>, <16 x i8>* [[TMP7]], align 1
; AVX-NEXT: [[TMP9:%.*]] = zext <8 x i8> [[TMP8]] to <8 x i32>		; AVX-NEXT: [[TMP9:%.*]] = zext <16 x i8> [[TMP8]] to <16 x i16>
; AVX-NEXT: [[TMP10:%.*]] = add nuw nsw <8 x i32> [[TMP9]], [[TMP6]]		; AVX-NEXT: [[TMP10:%.*]] = add nuw nsw <16 x i16> [[TMP9]], [[TMP6]]
; AVX-NEXT: [[TMP11:%.*]] = lshr <8 x i32> [[TMP10]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>		; AVX-NEXT: [[TMP11:%.*]] = lshr <16 x i16> [[TMP10]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <8 x i32> [[TMP11]], [[TMP10]]		; AVX-NEXT: [[TMP12:%.*]] = add nuw nsw <16 x i16> [[TMP11]], [[TMP10]]
; AVX-NEXT: [[TMP13:%.*]] = lshr <8 x i32> [[TMP12]], <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>		; AVX-NEXT: [[TMP13:%.*]] = lshr <16 x i16> [[TMP12]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
; AVX-NEXT: [[TMP14:%.*]] = trunc <8 x i32> [[TMP13]] to <8 x i16>		; AVX-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP0:%.*]] to <16 x i16>*
; AVX-NEXT: [[TMP15:%.*]] = bitcast i16* [[TMP0:%.*]] to <8 x i16>*		; AVX-NEXT: store <16 x i16> [[TMP13]], <16 x i16>* [[TMP14]], align 2
; AVX-NEXT: store <8 x i16> [[TMP14]], <8 x i16>* [[TMP15]], align 2
; AVX-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i64 8
; AVX-NEXT: [[TMP17:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i64 8
; AVX-NEXT: [[TMP18:%.*]] = getelementptr inbounds i16, i16* [[TMP0]], i64 8
; AVX-NEXT: [[TMP19:%.*]] = bitcast i8* [[TMP16]] to <8 x i8>*
; AVX-NEXT: [[TMP20:%.*]] = load <8 x i8>, <8 x i8>* [[TMP19]], align 1
; AVX-NEXT: [[TMP21:%.*]] = zext <8 x i8> [[TMP20]] to <8 x i32>
; AVX-NEXT: [[TMP22:%.*]] = bitcast i8* [[TMP17]] to <8 x i8>*
; AVX-NEXT: [[TMP23:%.*]] = load <8 x i8>, <8 x i8>* [[TMP22]], align 1
; AVX-NEXT: [[TMP24:%.*]] = zext <8 x i8> [[TMP23]] to <8 x i32>
; AVX-NEXT: [[TMP25:%.*]] = add nuw nsw <8 x i32> [[TMP24]], [[TMP21]]
; AVX-NEXT: [[TMP26:%.*]] = lshr <8 x i32> [[TMP25]], <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
; AVX-NEXT: [[TMP27:%.*]] = add nuw nsw <8 x i32> [[TMP26]], [[TMP25]]
; AVX-NEXT: [[TMP28:%.*]] = lshr <8 x i32> [[TMP27]], <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
; AVX-NEXT: [[TMP29:%.*]] = trunc <8 x i32> [[TMP28]] to <8 x i16>
; AVX-NEXT: [[TMP30:%.*]] = bitcast i16* [[TMP18]] to <8 x i16>*
; AVX-NEXT: store <8 x i16> [[TMP29]], <8 x i16>* [[TMP30]], align 2
; AVX-NEXT: ret void		; AVX-NEXT: ret void
;		;
%4 = load i8, i8* %1, align 1		%4 = load i8, i8* %1, align 1
%5 = zext i8 %4 to i32		%5 = zext i8 %4 to i32
%6 = load i8, i8* %2, align 1		%6 = load i8, i8* %2, align 1
%7 = zext i8 %6 to i32		%7 = zext i8 %6 to i32
%8 = add nuw nsw i32 %7, %5		%8 = add nuw nsw i32 %7, %5
%9 = lshr i32 %8, 1		%9 = lshr i32 %8, 1
▲ Show 20 Lines • Show All 201 Lines • Show Last 20 Lines

Revision Contents

Diff 367300

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp

llvm/test/Transforms/AggressiveInstCombine/pr50555.ll

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll

llvm/test/Transforms/PhaseOrdering/X86/pr50555.ll