This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Allow common type conversions to i8/i16
Closed · Public

Authored by dmgreen on Jan 23 2018, 8:15 AM.

Details

Reviewers

spatel
efriedma
hfinkel
bogner
arsenm
t.p.northover
sdardis
asb

Commits

rG7174023f5730: [InstCombine] Allow common type conversions to i8/i16/i32
rGe11f0545db7a: [InstCombine] Allow common type conversions to i8/i16/i32
rL324174: [InstCombine] Allow common type conversions to i8/i16/i32
rL323951: [InstCombine] Allow common type conversions to i8/i16/i32

Summary

This allows conversions to i8/i16/i32 (very common cases) even if the
resulting type is not legal. This can often open up extra combine
opportunities.

Diff Detail

Repository: rL LLVM

Event Timeline

Let me know if I should add more tests/change the condition/whathaveyou.

Let me cross-ref back to the llvm-dev thread for the motivating example and perf results so far:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120522.html
...and add some reviewers who likely have better intuition about the potential backend effects on non-x86 targets.
Reminder: this should have no effect on x86 because i8/i16 are already data-layout-legal there.

lib/Transforms/InstCombine/InstructionCombining.cpp
156–157 ↗	(On Diff #131076)	See Alex Bradbury's comment in the llvm-dev thread about i32 too. Maybe we generalize this for ToLegal as: bool ToLegal = ToWidth == 1 || isPowerOf2_32(ToWidth) || DL.isLegalInteger(ToWidth);
test/Transforms/InstCombine/should-change-type.ll
3 ↗	(On Diff #131076)	Should be able to reduce that string to just: "n32" ?
5 ↗	(On Diff #131076)	trunk -> trunc
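For context on the data-layout strings mentioned in these comments: the `n` component of a datalayout string lists the target's native integer widths, which is what DL.isLegalInteger consults. The following is a minimal sketch, assuming a hypothetical helper `nativeIntegerWidths` (not an LLVM API), that shows how "n32" and "n8:16:32" differ.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not an LLVM API): collect the native integer widths
// from the "n<w1>:<w2>:..." component of a datalayout string.
std::set<unsigned> nativeIntegerWidths(const std::string &Layout) {
  std::set<unsigned> Widths;
  std::istringstream Specs(Layout);
  std::string Spec;
  // Datalayout components are separated by '-'.
  while (std::getline(Specs, Spec, '-')) {
    // Only accept 'n' followed by a digit; this skips specs such as "ni:...".
    if (Spec.size() < 2 || Spec[0] != 'n' ||
        !std::isdigit(static_cast<unsigned char>(Spec[1])))
      continue;
    std::istringstream Sizes(Spec.substr(1));
    std::string Size;
    // Widths inside the 'n' component are separated by ':'.
    while (std::getline(Sizes, Size, ':'))
      Widths.insert(static_cast<unsigned>(std::stoul(Size)));
  }
  return Widths;
}
```

With the layout used in phi-timeout.ll only 32 comes back, so i8 and i16 are not data-layout-legal there, which is the situation this patch targets.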

Herald added a subscriber: wdng. · Jan 23 2018, 9:18 AM

spatel added inline comments. · Jan 23 2018, 10:04 AM

lib/Transforms/InstCombine/InstructionCombining.cpp
156–157 ↗	(On Diff #131076)	pow2 isn't quite right; nobody wants i2 or i4. :)
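To make the objection concrete, here is a small sketch comparing a power-of-two test with an explicit width list. `isPowerOfTwo` is a stand-in written for this example (LLVM's own helper is llvm::isPowerOf2_32), and `isCommonWidth` is a hypothetical name for the i8/i16/i32 list the review ends up with.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a power-of-two test such as llvm::isPowerOf2_32.
constexpr bool isPowerOfTwo(uint32_t W) { return W != 0 && (W & (W - 1)) == 0; }

// Hypothetical name: the explicit list of widths the review settles on.
constexpr bool isCommonWidth(uint32_t W) { return W == 8 || W == 16 || W == 32; }

// The power-of-two test admits i2 and i4, which nobody wants to convert to.
static_assert(isPowerOfTwo(2) && isPowerOfTwo(4), "pow2 admits i2 and i4");
// The explicit list rejects them.
static_assert(!isCommonWidth(2) && !isCommonWidth(4), "list rejects i2 and i4");
// It also leaves out wider powers of two such as i64.
static_assert(isPowerOfTwo(64) && !isCommonWidth(64), "i64 is not in the list");
```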

dmgreen updated this revision to Diff 131212. · Jan 24 2018, 2:39 AM

dmgreen edited the summary of this revision.

dmgreen added inline comments. · Jan 24 2018, 2:44 AM

lib/Transforms/InstCombine/InstructionCombining.cpp
156–157 ↗	(On Diff #131076)	I took the simple route. Not sure what's really best here. ARM/AArch64 has LDRB/LDRH for loading bytes/halfwords, the same for stores, UXTB/UXTH for extends, etc. These types feel like they are almost "semi-legal".
149 ↗	(On Diff #131212)	I almost wrote this as a FIXME:
test/Transforms/InstCombine/select-bitext.ll
117 ↗	(On Diff #131212)	Although longer, this produced the same final assembly as the old function on any target I tried.

spatel added inline comments. · Jan 24 2018, 10:33 AM

test/Transforms/InstCombine/select-bitext.ll
117 ↗	(On Diff #131212)	We canonicalize trunc+sext to shifts, and we have a similar fold for:
// ashr (shl (zext X), C), C --> sext X
...so I think we can add a fold for this too (guarded by shouldChangeType).
https://rise4fun.com/Alive/NVU
I'll post that for review, so we can eliminate this sequence as a potential cause of regressions.

spatel added inline comments. · Jan 24 2018, 12:50 PM

test/Transforms/InstCombine/select-bitext.ll
117 ↗	(On Diff #131212)	Actually, adding a fold for this won't work because we transform the other way (to shifts) in InstCombineCasts:

DEBUG(dbgs() << "ICE: EvaluateInDifferentType converting expression type"
                " to avoid sign extend: " << CI << '\n');
// We need to emit a shl + ashr to do the sign extend.

So if we try to reduce to the form with sext, we'll infinite loop on a case like this:

target datalayout = "n8:16:32"
define i16 @trunc_ashr(i32 %x) {
  %t = trunc i32 %x to i16
  %s = shl i16 %t, 8
  %a = ashr i16 %s, 8
  ret i16 %a
}

It doesn't show up in the test here because this file doesn't specify a data-layout.
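The two forms in that comment compute the same value, which is why a fold in either direction is valid and why having both would ping-pong. A minimal C++ sketch of the identity follows; the function names are made up for this example, and the unsigned-to-signed casts assume the usual two's-complement conversion (guaranteed from C++20, universal in practice before that).

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the IR in @trunc_ashr: trunc i32 -> i16, shl 8, ashr 8.
int16_t truncShlAshr(uint32_t X) {
  uint16_t T = static_cast<uint16_t>(X);       // %t = trunc i32 %x to i16
  uint16_t S = static_cast<uint16_t>(T << 8);  // %s = shl i16 %t, 8
  // %a = ashr i16 %s, 8, emulated on unsigned values so the result does not
  // depend on how the compiler shifts negative signed integers.
  uint16_t Hi = (S & 0x8000u) ? 0xFF00u : 0x0000u;
  return static_cast<int16_t>(static_cast<uint16_t>((S >> 8) | Hi));
}

// The sext form: sext (trunc %x to i8) to i16.
int16_t truncSext(uint32_t X) {
  return static_cast<int16_t>(static_cast<int8_t>(static_cast<uint8_t>(X)));
}
```

Both functions depend only on the low 8 bits of the input, so checking a few hundred consecutive inputs covers every case.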

spatel mentioned this in rL323377: [InstCombine] fix datalayout in test file. · Jan 24 2018, 1:39 PM

spatel mentioned this in rL323437: [InstCombine] narrow masked zexted binops (PR35792). · Jan 25 2018, 8:36 AM

I added a different transform that is guarded by shouldChangeType() at rL323437.
The tests added in that patch will be affected by this patch, so please rebase. I don't have any other suggestions, so if there are no other comments, let's try it?

I think the difference in AArch64 code with this patch for the examples in PR35792 would be:

$ ./opt  35792.ll -S | ./llc -o - -mtriple=aarch64
julia_a_62828:                          // @julia_a_62828
	mov	w8, w0
	sub	w9, w0, #1              // =1
	and	x0, x9, x8
	ret
bad:                                    // @bad
	orr	w8, wzr, #0xffff
	add	w8, w0, w8
	and	w0, w0, w8
	ret

vs. after this patch:

$ ./opt -instcombine  35792.ll -S | ./llc -o - -mtriple=aarch64
julia_a_62828:                          // @julia_a_62828
	sub	w8, w0, #1              // =1
	and	w0, w8, w0
	ret
bad:                                    // @bad
	sub	w8, w0, #1              // =1
	and	w0, w8, w0
	ret

lib/Transforms/InstCombine/InstructionCombining.cpp
147 ↗	(On Diff #131212)	or -> of ?

Thanks for the info. Unfortunately it looks like I was previously building without all backends (or an intervening commit has changed things?), and a test in Hexagon (CodeGen/Hexagon/loop-idiom/pmpy-mod.ll) is failing due to this. It looks like the code entering into the hexagon loop recogniser is shorter, but is no longer recognised and transformed to a single intrinsic.

Does anyone know if there is a more sensible way to do this? Perhaps just for the ARM/AArch64 backends, not for all targets, no matter how odd they are. Or do we think this is the best way to do things?

The sensible thing to do would be to have a well-defined canonical form of the IR, instead of "the most strength-reduced". This is an example of why the lack of one only creates issues, and of how idiom recognition that goes beyond a trivial "memcpy" or "memset" is notoriously difficult to maintain. We could still have the "maximum combining", but after idiom recognition has run.

The code in the polynomial multiplication recognition has even evolved to have its own expression transformations to "undo" various changes that the combining had developed since the initial pattern matching was written. I could change that code again, but unless there is a plan in place to fix this ongoing issue, this is not going to be a high priority for me.

In D42424#989058, @dmgreen wrote:

Does anyone know if there is a more sensible way to do this? Perhaps just for the ARM/AArch64 backends, not for all targets, no matter how odd they are. Or do we think this is the best way to do things?

The target-specific hack would be to change the data-layout for those targets, so i8 and i16 are legal ints. I don't know if that would cause any conflicts with legalization though. You could try that locally and see if anything breaks?

Yes, making them legal types in the datalayout was one of the things I tried. It made things (a little) worse in the quick tests I ran, but didn't seem to fail on anything. It was only a quick set of tests, and the differences from this patch were quite small. It obviously has further-reaching implications than this change, which is confined to instcombine.

The purist in me is looking for something a little better, but I think what we have here is a sensible approach to take. We can always change it if not.

Krzysztof, thanks for the info. It sounds like having a canonical representation we all agree on would be a hard problem to solve :) I think that is how this all began, wanting a canonicalisation of selects vs cfg that played better with GVN/similar passes.

It appears that the loop idiom recogniser is not picking this up because it does not know how to handle the truncs that are now part of the CFG. There seem to be a number of changes in the IR going into the idiom recogniser, like doing icmp eq 1 vs icmp eq 0, doing and(xor()) vs xor(and()), etc. They are all being handled fine, except for the new trunc nodes, which it just doesn't know anything about.

In D42424#989980, @dmgreen wrote:

Krzysztof, thanks for the info. It sounds like having a canonical representation we all agree on would be a hard problem to solve :) I think that is how this all began, wanting a canonicalisation of selects vs cfg that played better with GVN/similar passes.

It appears that the loop idiom recogniser is not picking this up because it does not know how to handle the truncs that are now part of the CFG. There seem to be a number of changes in the IR going into the idiom recogniser, like doing icmp eq 1 vs icmp eq 0, doing and(xor()) vs xor(and()), etc. They are all being handled fine, except for the new trunc nodes, which it just doesn't know anything about.

I'm not familiar with LIR, but if this isn't matching for Hexagon now, then isn't it already a problem for other targets? I.e., aren't we failing to recognize the idiom for x86 because it must already be transformed into the form that is not recognized? Is it difficult/undesirable to make the matching more flexible?

In D42424#990022, @spatel wrote:

In D42424#989980, @dmgreen wrote:

Krzysztof, thanks for the info. It sounds like having a canonical representation we all agree on would be a hard problem to solve :) I think that is how this all began, wanting a canonicalisation of selects vs cfg that played better with GVN/similar passes.

It appears that the loop idiom recogniser is not picking this up because it does not know how to handle the truncs that are now part of the CFG. There seem to be a number of changes in the IR going into the idiom recogniser, like doing icmp eq 1 vs icmp eq 0, doing and(xor()) vs xor(and()), etc. They are all being handled fine, except for the new trunc nodes, which it just doesn't know anything about.

I'm not familiar with LIR, but if this isn't matching for Hexagon now, then isn't it already a problem for other targets? I.e., aren't we failing to recognize the idiom for x86 because it must already be transformed into the form that is not recognized? Is it difficult/undesirable to make the matching more flexible?

Oops - disregard. I see now that we're talking about a target-specific LIR that produces a target-specific intrinsic.

So let me adjust the question: if source code was written in a way that this new variant of IR with truncs was already created (independent of this patch), then we're already failing to match, right? Or is there something that guarantees that that pattern is not created from source in the first place?

In D42424#990024, @spatel wrote:

So let me adjust the question: if source code was written in a way that this new variant of IR with truncs was already created (independent of this patch), then we're already failing to match, right? Or is there something that guarantees that that pattern is not created from source in the first place?

The motivating source had short types (uint8_t and uint16_t) or equivalent, if I remember correctly, and the passes running prior to LIR "legalized" the types to i32. The pattern matching code was written to detect a loop that does polynomial multiplication and polynomial division (with a remainder). Originally it was matching the code that was generated from the sources at the time (and the operating type was i32 after extensions). Over time, changes in combining caused it to fail without us noticing (we didn't have that testcase, just something with a pre-generated IR going directly to the LIR pass). Due to the complexity of the LIR code, it was left alone, and a pre-processing step was added to it to convert the IR back to the form that it looked for, essentially undoing the combiner's changes. This was done on a "local copy" of the IR, so to speak, so it didn't really undo it for the rest of the passes. If the match succeeded, the loop would go away, otherwise, the IR would stay (and the "local copy" would be deleted).

In D42424#989980, @dmgreen wrote:

Krzysztof, thanks for the info. It sounds like having a canonical representation we all agree on would be a hard problem to solve :) I think that is how this all began, wanting a canonicalisation of selects vs cfg that played better with GVN/similar passes.

What I want out of the canonical form is predictability. I could not care less if it's suboptimal, I just want to have a form that I can match against, and only at a specified point in the optimization sequence. All the aggressive optimizations can still happen, but after idiom recognizers had their chance to take a look at the code. In other words, I'd advocate to split these transformations that make code look predictable from those that make it optimized.

Going back to this issue. I'm not opposed to the change itself, if it enables more things to happen then it sounds like a good idea. At the same time we (Hexagon) cannot give up that LIR pass. If it's only a question of truncates, it may be easy to change the LIR code, but I don't want to be finding myself in this situation over and over again. The current model simply does not support idiom recognition.

OK. For some technical details. This is what we used to have heading into the recogniser (slightly edited for clarity):

%v64 = phi i8 [ 0, %b2 ], [ %v45, %b9 ]
%v53 = phi i16 [ %a1, %b2 ], [ %v43, %b9 ]
%v42 = phi i8 [ %a0, %b2 ], [ %a, %b9 ]

%0 = and i8 %v42, 1
%v11 = zext i8 %0 to i32
%1 = and i16 %v53, 1
%v14 = zext i16 %1 to i32
%v15 = xor i32 %v14, %v11

%a = lshr i8 %v42, 1
%v21 = icmp eq i32 %v15, 1
%b = lshr i16 %v53, 1
%v36 = xor i16 %b, -24575
%v43 = select i1 %v21, i16 %v36, i16 %b
%v45 = add nuw nsw i8 %v64, 1
%v8 = icmp ult i8 %v45, 8
br i1 %v8, label %b9, label %b46

This is what it is now:

%v64 = phi i8 [ 0, %b2 ], [ %v45, %b9 ]
%v53 = phi i16 [ %a1, %b2 ], [ %v43, %b9 ]
%v42 = phi i8 [ %a0, %b2 ], [ %a, %b9 ]

%0 = trunc i16 %v53 to i8
%v111 = xor i8 %v42, %0
%v15 = and i8 %v111, 1

%a = lshr i8 %v42, 1
%v21 = icmp eq i8 %v15, 0
%b = lshr i16 %v53, 1
%v36 = xor i16 %b, -24575
%v43 = select i1 %v21, i16 %b, i16 %v36
%v45 = add nuw nsw i8 %v64, 1
%v8 = icmp ult i8 %v45, 8
br i1 %v8, label %b9, label %b46
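The two loop bodies above pick the same value for %v43 even though the compare, the select operand order and the types all changed. As a sanity check, here is a small C++ model of just the select computation; the function names `selectOld` and `selectNew` are invented for this sketch, and -24575 is written as its i16 bit pattern 0xA001.

```cpp
#include <cassert>
#include <cstdint>

// Old form: zero-extend both low bits to i32, xor them, compare against 1.
uint16_t selectOld(uint8_t V42, uint16_t V53) {
  uint32_t V11 = V42 & 1u;                              // and i8, zext to i32
  uint32_t V14 = V53 & 1u;                              // and i16, zext to i32
  uint32_t V15 = V14 ^ V11;                             // xor i32
  uint16_t B = static_cast<uint16_t>(V53 >> 1);         // lshr i16 %v53, 1
  uint16_t V36 = static_cast<uint16_t>(B ^ 0xA001u);    // xor i16 %b, -24575
  return V15 == 1 ? V36 : B;                            // select %v21, %v36, %b
}

// New form: truncate to i8, xor, mask the low bit, compare against 0.
uint16_t selectNew(uint8_t V42, uint16_t V53) {
  uint8_t T = static_cast<uint8_t>(V53);                // trunc i16 %v53 to i8
  uint8_t V15 = static_cast<uint8_t>((V42 ^ T) & 1u);   // xor i8, and i8
  uint16_t B = static_cast<uint16_t>(V53 >> 1);         // lshr i16 %v53, 1
  uint16_t V36 = static_cast<uint16_t>(B ^ 0xA001u);    // xor i16 %b, -24575
  return V15 == 0 ? B : V36;                            // select %v21, %b, %v36
}
```

Both versions depend only on whether the low bits of %v42 and %v53 differ, which is the kind of "is equivalent to" reasoning a pattern matcher has to encode by hand.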

I managed to hackily get things working by adding Trunc as cases to isPromotableTo, commutesWithShift and keepsHighBitsZero, and then doing something like "if(isa<TruncInst>(Var)) Var = Var->getOperand(0)" in scanSelect. I can't claim these changes are complete or properly thought through or tested at all :-/ but they did get at least this test case to work correctly again.
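The "look through the trunc" idea can be illustrated on a toy expression node. This is only a sketch: `Node`, `Op` and `lookThroughTrunc` are hypothetical names, not LLVM's Value hierarchy or the Hexagon pass's code, and the loop strips a whole chain of truncs where the quoted hack strips one.

```cpp
#include <cassert>

// Toy expression node; purely illustrative, not LLVM's Value hierarchy.
enum class Op { Arg, Trunc, And, Xor };

struct Node {
  Op Kind;
  const Node *Operand0;
  const Node *Operand1;
};

// Peel off any truncs so a pattern matcher sees the underlying value.
const Node *lookThroughTrunc(const Node *N) {
  while (N && N->Kind == Op::Trunc)
    N = N->Operand0;
  return N;
}
```

Stripping a trunc discards the fact that the high bits were dropped, so a real matcher would also have to check that ignoring the narrowing is safe for the idiom being recognised.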

I was impressed that the rest of the pass just worked correctly, even with the other knock-on changes. When I was looking through it, I felt like an "is equivalent to" matcher would have been useful. Might be very hard to make in practice though.

See https://reviews.llvm.org/rL323824.

Oh brilliant. What a champ! Looks like there were a few more changes than just my hacks. Thank you for the fixes, much appreciated.

I did a build and all tests now pass.

In D42424#993030, @dmgreen wrote:

I did a build and all tests now pass.

Let's try it then. LGTM.

This revision is now accepted and ready to land. · Jan 31 2018, 6:25 AM

Closed by commit rL323951: [InstCombine] Allow common type conversions to i8/i16/i32 (authored by dmgreen). · Feb 1 2018, 3:08 AM

This revision was automatically updated to reflect the committed changes.

Thanks. Let's see how it goes.

There may be some blow back from this one, let me know if anything comes up.

Alas, looks like one came up already

Causing timeouts on at least some ppc64le, with some other suspicious looking cases. Reverted in rL323959. I'll take a look into why.

Reopening the review since this was reverted.

This revision is now accepted and ready to land. · Feb 1 2018, 7:51 AM

spatel requested changes to this revision. · Feb 1 2018, 7:52 AM

This revision now requires changes to proceed. · Feb 1 2018, 7:52 AM

OK. So the reason it was looping is two parts of InstCombinePhi acting against one another.

FoldPHIArgsIntoPHI (guarded by shouldChangeType) would sink from phi i8(trunc a, trunc b) to trunc(phi i16(a,b)), and SliceUpIllegalIntegerPHI (guarded by !isLegalType) would do the opposite.

I've updated back to an older version here, where we only do the conversion if ToWidth < FromWidth, breaking the cycle. I've also added a test for the infinite recursion case and ran an arm bootstrap/testsuite/other testing, without finding any problems.
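A standalone sketch of the predicate may help show why the shrink-only condition breaks the cycle. Assumptions are labelled in the comments: `ToyLayout` is a stand-in for DataLayout, the trailing returns of `shouldChangeType` are not visible in the diff excerpt, and `shouldChangeTypeNoShrinkCheck` is only a guess at the shape of the reverted variant, not the committed code.

```cpp
#include <cassert>
#include <set>

// Stand-in for DataLayout::isLegalInteger: the set of native integer widths.
struct ToyLayout {
  std::set<unsigned> Native;
  bool isLegalInteger(unsigned W) const { return Native.count(W) != 0; }
};

// The predicate as it reads in Diff 132740. The final two returns are assumed
// from the surrounding comments; the excerpt stops before them.
bool shouldChangeType(const ToyLayout &DL, unsigned FromWidth, unsigned ToWidth) {
  bool FromLegal = FromWidth == 1 || DL.isLegalInteger(FromWidth);
  bool ToLegal = ToWidth == 1 || DL.isLegalInteger(ToWidth);
  // Shrinking to i8/i16/i32 is always allowed.
  if (ToWidth < FromWidth && (ToWidth == 8 || ToWidth == 16 || ToWidth == 32))
    return true;
  // Legal -> illegal is rejected.
  if (FromLegal && !ToLegal)
    return false;
  // Illegal -> larger illegal is rejected.
  if (!FromLegal && !ToLegal && ToWidth > FromWidth)
    return false;
  return true;
}

// Hypothetical approximation of a variant without the shrink-only check:
// i8/i16/i32 simply count as legal destination widths.
bool shouldChangeTypeNoShrinkCheck(const ToyLayout &DL, unsigned FromWidth,
                                   unsigned ToWidth) {
  bool FromLegal = FromWidth == 1 || DL.isLegalInteger(FromWidth);
  bool ToLegal = ToWidth == 1 || DL.isLegalInteger(ToWidth) || ToWidth == 8 ||
                 ToWidth == 16 || ToWidth == 32;
  if (FromLegal && !ToLegal)
    return false;
  if (!FromLegal && !ToLegal && ToWidth > FromWidth)
    return false;
  return true;
}
```

Under an "n32" layout, and assuming the phi fold asks about going from the narrow phi type (i8) to the wider cast source type (i16), the variant without the shrink check answers yes and widens the phi to i16, which the illegal-phi slicing then undoes. The shrink-only version answers no for 8 to 16, so the first step never happens.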

I also had a go at running a ppc64le bootstrap, but couldn't get it to cross-compile correctly, so just used a dummy linker. All the compiles did OK, no timeouts.

LGTM.

test/Transforms/InstCombine/phi-timeout.ll
2 ↗	(On Diff #132575)	I recommend just checking the expected (current) output on this test rather than debug output. That potentially gives us more bot coverage...if we inf-loop, we'll probably know sooner from a 'release' buildbot.

This revision is now accepted and ready to land. · Feb 2 2018, 7:23 AM

Update test. Thanks for the suggestion. I was copying one of the existing tests, but I think you are right, this way is better.

I'll try to run some more tests overnight to be sure this is OK.

Closed by commit rL324174: [InstCombine] Allow common type conversions to i8/i16/i32 (authored by dmgreen). · Feb 3 2018, 8:52 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/InstCombine/InstructionCombining.cpp	10 lines
llvm/trunk/test/Transforms/InstCombine/and-narrow.ll	180 lines
llvm/trunk/test/Transforms/InstCombine/phi-timeout.ll	47 lines
llvm/trunk/test/Transforms/InstCombine/should-change-type.ll	57 lines

Diff 132740

llvm/trunk/lib/Transforms/InstCombine/InstructionCombining.cpp

Value *InstCombiner::EmitGEPOffset(User *GEP) {		Value *InstCombiner::EmitGEPOffset(User *GEP) {
return llvm::EmitGEPOffset(&Builder, DL, GEP);		return llvm::EmitGEPOffset(&Builder, DL, GEP);
}		}

/// Return true if it is desirable to convert an integer computation from a		/// Return true if it is desirable to convert an integer computation from a
/// given bit width to a new bit width.		/// given bit width to a new bit width.
/// We don't want to convert from a legal to an illegal type or from a smaller		/// We don't want to convert from a legal to an illegal type or from a smaller
/// to a larger illegal type. A width of '1' is always treated as a legal type		/// to a larger illegal type. A width of '1' is always treated as a legal type
/// because i1 is a fundamental type in IR, and there are many specialized		/// because i1 is a fundamental type in IR, and there are many specialized
/// optimizations for i1 types.		/// optimizations for i1 types. Widths of 8, 16 or 32 are equally treated as
		/// legal to convert to, in order to open up more combining opportunities.
		/// NOTE: this treats i8, i16 and i32 specially, due to them being so common
		/// from frontend languages.
bool InstCombiner::shouldChangeType(unsigned FromWidth,		bool InstCombiner::shouldChangeType(unsigned FromWidth,
unsigned ToWidth) const {		unsigned ToWidth) const {
bool FromLegal = FromWidth == 1 || DL.isLegalInteger(FromWidth);		bool FromLegal = FromWidth == 1 || DL.isLegalInteger(FromWidth);
bool ToLegal = ToWidth == 1 || DL.isLegalInteger(ToWidth);		bool ToLegal = ToWidth == 1 || DL.isLegalInteger(ToWidth);

		// Convert to widths of 8, 16 or 32 even if they are not legal types. Only
		// shrink types, to prevent infinite loops.
		if (ToWidth < FromWidth && (ToWidth == 8 || ToWidth == 16 || ToWidth == 32))
		return true;

// If this is a legal integer from type, and the result would be an illegal		// If this is a legal integer from type, and the result would be an illegal
// type, don't do the transformation.		// type, don't do the transformation.
if (FromLegal && !ToLegal)		if (FromLegal && !ToLegal)
return false;		return false;

// Otherwise, if both are illegal, do not increase the size of the result. We		// Otherwise, if both are illegal, do not increase the size of the result. We
// do allow things like i160 -> i64, but not i64 -> i160.		// do allow things like i160 -> i64, but not i64 -> i160.
if (!FromLegal && !ToLegal && ToWidth > FromWidth)		if (!FromLegal && !ToLegal && ToWidth > FromWidth)

llvm/trunk/test/Transforms/InstCombine/and-narrow.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -data-layout="n8:16:32" -S | FileCheck %s --check-prefix=ALL --check-prefix=LEGAL8			; RUN: opt < %s -instcombine -data-layout="n8:16:32" -S | FileCheck %s
	; RUN: opt < %s -instcombine -data-layout="n16" -S | FileCheck %s --check-prefix=ALL --check-prefix=LEGAL16			; RUN: opt < %s -instcombine -data-layout="n16" -S | FileCheck %s

	; PR35792 - https://bugs.llvm.org/show_bug.cgi?id=35792			; PR35792 - https://bugs.llvm.org/show_bug.cgi?id=35792

	define i16 @zext_add(i8 %x) {			define i16 @zext_add(i8 %x) {
	; LEGAL8-LABEL: @zext_add(			; CHECK-LABEL: @zext_add(
	; LEGAL8-NEXT: [[TMP1:%.*]] = add i8 [[X:%.*]], 44			; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X:%.*]], 44
	; LEGAL8-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]
	; LEGAL8-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16			; CHECK-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16
	; LEGAL8-NEXT: ret i16 [[R]]			; CHECK-NEXT: ret i16 [[R]]
	;
	; LEGAL16-LABEL: @zext_add(
	; LEGAL16-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
	; LEGAL16-NEXT: [[B:%.*]] = add nuw nsw i16 [[Z]], 44
	; LEGAL16-NEXT: [[R:%.*]] = and i16 [[B]], [[Z]]
	; LEGAL16-NEXT: ret i16 [[R]]
	;			;
	%z = zext i8 %x to i16			%z = zext i8 %x to i16
	%b = add i16 %z, 44			%b = add i16 %z, 44
	%r = and i16 %b, %z			%r = and i16 %b, %z
	ret i16 %r			ret i16 %r
	}			}

	define i16 @zext_sub(i8 %x) {			define i16 @zext_sub(i8 %x) {
	; LEGAL8-LABEL: @zext_sub(			; CHECK-LABEL: @zext_sub(
	; LEGAL8-NEXT: [[TMP1:%.*]] = sub i8 -5, [[X:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = sub i8 -5, [[X:%.*]]
	; LEGAL8-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]
	; LEGAL8-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16			; CHECK-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16
	; LEGAL8-NEXT: ret i16 [[R]]			; CHECK-NEXT: ret i16 [[R]]
	;
	; LEGAL16-LABEL: @zext_sub(
	; LEGAL16-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
	; LEGAL16-NEXT: [[B:%.*]] = sub nsw i16 251, [[Z]]
	; LEGAL16-NEXT: [[R:%.*]] = and i16 [[B]], [[Z]]
	; LEGAL16-NEXT: ret i16 [[R]]
	;			;
	%z = zext i8 %x to i16			%z = zext i8 %x to i16
	%b = sub i16 -5, %z			%b = sub i16 -5, %z
	%r = and i16 %b, %z			%r = and i16 %b, %z
	ret i16 %r			ret i16 %r
	}			}

	define i16 @zext_mul(i8 %x) {			define i16 @zext_mul(i8 %x) {
	; LEGAL8-LABEL: @zext_mul(			; CHECK-LABEL: @zext_mul(
	; LEGAL8-NEXT: [[TMP1:%.*]] = mul i8 [[X:%.*]], 3			; CHECK-NEXT: [[TMP1:%.*]] = mul i8 [[X:%.*]], 3
	; LEGAL8-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]
	; LEGAL8-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16			; CHECK-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16
	; LEGAL8-NEXT: ret i16 [[R]]			; CHECK-NEXT: ret i16 [[R]]
	;
	; LEGAL16-LABEL: @zext_mul(
	; LEGAL16-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
	; LEGAL16-NEXT: [[B:%.*]] = mul nuw nsw i16 [[Z]], 3
	; LEGAL16-NEXT: [[R:%.*]] = and i16 [[B]], [[Z]]
	; LEGAL16-NEXT: ret i16 [[R]]
	;			;
	%z = zext i8 %x to i16			%z = zext i8 %x to i16
	%b = mul i16 %z, 3			%b = mul i16 %z, 3
	%r = and i16 %b, %z			%r = and i16 %b, %z
	ret i16 %r			ret i16 %r
	}			}

	define i16 @zext_lshr(i8 %x) {			define i16 @zext_lshr(i8 %x) {
	; LEGAL8-LABEL: @zext_lshr(			; CHECK-LABEL: @zext_lshr(
	; LEGAL8-NEXT: [[TMP1:%.*]] = lshr i8 [[X:%.*]], 4			; CHECK-NEXT: [[TMP1:%.*]] = lshr i8 [[X:%.*]], 4
	; LEGAL8-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]
	; LEGAL8-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16			; CHECK-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16
	; LEGAL8-NEXT: ret i16 [[R]]			; CHECK-NEXT: ret i16 [[R]]
	;
	; LEGAL16-LABEL: @zext_lshr(
	; LEGAL16-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
	; LEGAL16-NEXT: [[B:%.*]] = lshr i16 [[Z]], 4
	; LEGAL16-NEXT: [[R:%.*]] = and i16 [[B]], [[Z]]
	; LEGAL16-NEXT: ret i16 [[R]]
	;			;
	%z = zext i8 %x to i16			%z = zext i8 %x to i16
	%b = lshr i16 %z, 4			%b = lshr i16 %z, 4
	%r = and i16 %b, %z			%r = and i16 %b, %z
	ret i16 %r			ret i16 %r
	}			}

	define i16 @zext_ashr(i8 %x) {			define i16 @zext_ashr(i8 %x) {
	; LEGAL8-LABEL: @zext_ashr(			; CHECK-LABEL: @zext_ashr(
	; LEGAL8-NEXT: [[TMP1:%.*]] = lshr i8 [[X:%.*]], 2			; CHECK-NEXT: [[TMP1:%.*]] = lshr i8 [[X:%.*]], 2
	; LEGAL8-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]
	; LEGAL8-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16			; CHECK-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16
	; LEGAL8-NEXT: ret i16 [[R]]			; CHECK-NEXT: ret i16 [[R]]
	;
	; LEGAL16-LABEL: @zext_ashr(
	; LEGAL16-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
	; LEGAL16-NEXT: [[TMP1:%.*]] = lshr i16 [[Z]], 2
	; LEGAL16-NEXT: [[R:%.*]] = and i16 [[TMP1]], [[Z]]
	; LEGAL16-NEXT: ret i16 [[R]]
	;			;
	%z = zext i8 %x to i16			%z = zext i8 %x to i16
	%b = ashr i16 %z, 2			%b = ashr i16 %z, 2
	%r = and i16 %b, %z			%r = and i16 %b, %z
	ret i16 %r			ret i16 %r
	}			}

	define i16 @zext_shl(i8 %x) {			define i16 @zext_shl(i8 %x) {
	; LEGAL8-LABEL: @zext_shl(			; CHECK-LABEL: @zext_shl(
	; LEGAL8-NEXT: [[TMP1:%.*]] = shl i8 [[X:%.*]], 3			; CHECK-NEXT: [[TMP1:%.*]] = shl i8 [[X:%.*]], 3
	; LEGAL8-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and i8 [[TMP1]], [[X]]
	; LEGAL8-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16			; CHECK-NEXT: [[R:%.*]] = zext i8 [[TMP2]] to i16
	; LEGAL8-NEXT: ret i16 [[R]]			; CHECK-NEXT: ret i16 [[R]]
	;
	; LEGAL16-LABEL: @zext_shl(
	; LEGAL16-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
	; LEGAL16-NEXT: [[B:%.*]] = shl nuw nsw i16 [[Z]], 3
	; LEGAL16-NEXT: [[R:%.*]] = and i16 [[B]], [[Z]]
	; LEGAL16-NEXT: ret i16 [[R]]
	;			;
	%z = zext i8 %x to i16			%z = zext i8 %x to i16
	%b = shl i16 %z, 3			%b = shl i16 %z, 3
	%r = and i16 %b, %z			%r = and i16 %b, %z
	ret i16 %r			ret i16 %r
	}			}

	define <2 x i16> @zext_add_vec(<2 x i8> %x) {			define <2 x i16> @zext_add_vec(<2 x i8> %x) {
	; ALL-LABEL: @zext_add_vec(			; CHECK-LABEL: @zext_add_vec(
	; ALL-NEXT: [[TMP1:%.*]] = add <2 x i8> [[X:%.*]], <i8 44, i8 42>			; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i8> [[X:%.*]], <i8 44, i8 42>
	; ALL-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]
	; ALL-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>			; CHECK-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = add <2 x i16> %z, <i16 44, i16 42>			%b = add <2 x i16> %z, <i16 44, i16 42>
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	define <2 x i16> @zext_sub_vec(<2 x i8> %x) {			define <2 x i16> @zext_sub_vec(<2 x i8> %x) {
	; ALL-LABEL: @zext_sub_vec(			; CHECK-LABEL: @zext_sub_vec(
	; ALL-NEXT: [[TMP1:%.*]] = sub <2 x i8> <i8 -5, i8 -4>, [[X:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = sub <2 x i8> <i8 -5, i8 -4>, [[X:%.*]]
	; ALL-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]
	; ALL-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>			; CHECK-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = sub <2 x i16> <i16 -5, i16 -4>, %z			%b = sub <2 x i16> <i16 -5, i16 -4>, %z
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	define <2 x i16> @zext_mul_vec(<2 x i8> %x) {			define <2 x i16> @zext_mul_vec(<2 x i8> %x) {
	; ALL-LABEL: @zext_mul_vec(			; CHECK-LABEL: @zext_mul_vec(
	; ALL-NEXT: [[TMP1:%.*]] = mul <2 x i8> [[X:%.*]], <i8 3, i8 -2>			; CHECK-NEXT: [[TMP1:%.*]] = mul <2 x i8> [[X:%.*]], <i8 3, i8 -2>
	; ALL-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]
	; ALL-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>			; CHECK-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = mul <2 x i16> %z, <i16 3, i16 -2>			%b = mul <2 x i16> %z, <i16 3, i16 -2>
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	define <2 x i16> @zext_lshr_vec(<2 x i8> %x) {			define <2 x i16> @zext_lshr_vec(<2 x i8> %x) {
	; ALL-LABEL: @zext_lshr_vec(			; CHECK-LABEL: @zext_lshr_vec(
	; ALL-NEXT: [[TMP1:%.*]] = lshr <2 x i8> [[X:%.*]], <i8 4, i8 2>			; CHECK-NEXT: [[TMP1:%.*]] = lshr <2 x i8> [[X:%.*]], <i8 4, i8 2>
	; ALL-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]
	; ALL-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>			; CHECK-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = lshr <2 x i16> %z, <i16 4, i16 2>			%b = lshr <2 x i16> %z, <i16 4, i16 2>
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	define <2 x i16> @zext_ashr_vec(<2 x i8> %x) {			define <2 x i16> @zext_ashr_vec(<2 x i8> %x) {
	; ALL-LABEL: @zext_ashr_vec(			; CHECK-LABEL: @zext_ashr_vec(
	; ALL-NEXT: [[TMP1:%.*]] = lshr <2 x i8> [[X:%.*]], <i8 2, i8 3>			; CHECK-NEXT: [[TMP1:%.*]] = lshr <2 x i8> [[X:%.*]], <i8 2, i8 3>
	; ALL-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]
	; ALL-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>			; CHECK-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = ashr <2 x i16> %z, <i16 2, i16 3>			%b = ashr <2 x i16> %z, <i16 2, i16 3>
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	define <2 x i16> @zext_shl_vec(<2 x i8> %x) {			define <2 x i16> @zext_shl_vec(<2 x i8> %x) {
	; ALL-LABEL: @zext_shl_vec(			; CHECK-LABEL: @zext_shl_vec(
	; ALL-NEXT: [[TMP1:%.*]] = shl <2 x i8> [[X:%.*]], <i8 3, i8 2>			; CHECK-NEXT: [[TMP1:%.*]] = shl <2 x i8> [[X:%.*]], <i8 3, i8 2>
	; ALL-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]			; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i8> [[TMP1]], [[X]]
	; ALL-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>			; CHECK-NEXT: [[R:%.*]] = zext <2 x i8> [[TMP2]] to <2 x i16>
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = shl <2 x i16> %z, <i16 3, i16 2>			%b = shl <2 x i16> %z, <i16 3, i16 2>
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	; Don't create poison by narrowing a shift below the shift amount.			; Don't create poison by narrowing a shift below the shift amount.

	define <2 x i16> @zext_lshr_vec_overshift(<2 x i8> %x) {			define <2 x i16> @zext_lshr_vec_overshift(<2 x i8> %x) {
	; ALL-LABEL: @zext_lshr_vec_overshift(			; CHECK-LABEL: @zext_lshr_vec_overshift(
	; ALL-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>			; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>
	; ALL-NEXT: [[B:%.*]] = lshr <2 x i16> [[Z]], <i16 4, i16 8>			; CHECK-NEXT: [[B:%.*]] = lshr <2 x i16> [[Z]], <i16 4, i16 8>
	; ALL-NEXT: [[R:%.*]] = and <2 x i16> [[B]], [[Z]]			; CHECK-NEXT: [[R:%.*]] = and <2 x i16> [[B]], [[Z]]
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = lshr <2 x i16> %z, <i16 4, i16 8>			%b = lshr <2 x i16> %z, <i16 4, i16 8>
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	; Don't create poison by narrowing a shift below the shift amount.			; Don't create poison by narrowing a shift below the shift amount.

	define <2 x i16> @zext_shl_vec_overshift(<2 x i8> %x) {			define <2 x i16> @zext_shl_vec_overshift(<2 x i8> %x) {
	; ALL-LABEL: @zext_shl_vec_overshift(			; CHECK-LABEL: @zext_shl_vec_overshift(
	; ALL-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>			; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>
	; ALL-NEXT: [[B:%.*]] = shl <2 x i16> [[Z]], <i16 8, i16 2>			; CHECK-NEXT: [[B:%.*]] = shl <2 x i16> [[Z]], <i16 8, i16 2>
	; ALL-NEXT: [[R:%.*]] = and <2 x i16> [[B]], [[Z]]			; CHECK-NEXT: [[R:%.*]] = and <2 x i16> [[B]], [[Z]]
	; ALL-NEXT: ret <2 x i16> [[R]]			; CHECK-NEXT: ret <2 x i16> [[R]]
	;			;
	%z = zext <2 x i8> %x to <2 x i16>			%z = zext <2 x i8> %x to <2 x i16>
	%b = shl <2 x i16> %z, <i16 8, i16 2>			%b = shl <2 x i16> %z, <i16 8, i16 2>
	%r = and <2 x i16> %b, %z			%r = and <2 x i16> %b, %z
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

llvm/trunk/test/Transforms/InstCombine/phi-timeout.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instcombine -S -debug 2>&1 | FileCheck %s
				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				; We are really checking that this doesn't loop forever. We would never
				; actually get to the checks here if it did.

				define void @timeout(i16* nocapture readonly %cinfo) {
				; CHECK-LABEL: @timeout(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds i16, i16* [[CINFO:%.*]], i32 2
				; CHECK-NEXT: [[L:%.*]] = load i16, i16* [[ARRAYIDX15]], align 2
				; CHECK-NEXT: [[CMP17:%.*]] = icmp eq i16 [[L]], 0
				; CHECK-NEXT: [[EXTRACT_T1:%.*]] = trunc i16 [[L]] to i8
				; CHECK-NEXT: br i1 [[CMP17]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[DOTPRE:%.*]] = load i16, i16* [[ARRAYIDX15]], align 2
				; CHECK-NEXT: [[EXTRACT_T:%.*]] = trunc i16 [[DOTPRE]] to i8
				; CHECK-NEXT: br label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: [[P_OFF0:%.*]] = phi i8 [ [[EXTRACT_T]], [[IF_THEN]] ], [ [[EXTRACT_T1]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[SUB:%.*]] = add i8 [[P_OFF0]], -1
				; CHECK-NEXT: store i8 [[SUB]], i8* undef, align 1
				; CHECK-NEXT: br label [[FOR_BODY]]
				;
				entry:
				br label %for.body

				for.body:
				%arrayidx15 = getelementptr inbounds i16, i16* %cinfo, i32 2
				%l = load i16, i16* %arrayidx15, align 2
				%cmp17 = icmp eq i16 %l, 0
				br i1 %cmp17, label %if.then, label %if.end

				if.then:
				%.pre = load i16, i16* %arrayidx15, align 2
				br label %if.end

				if.end:
				%p = phi i16 [ %.pre, %if.then ], [ %l, %for.body ]
				%conv19 = trunc i16 %p to i8
				%sub = add i8 %conv19, -1
				store i8 %sub, i8* undef, align 1
				br label %for.body
				}

llvm/trunk/test/Transforms/InstCombine/should-change-type.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instcombine -S | FileCheck %s
				target datalayout = "n64"

				; Tests for removing zext/trunc from/to i8, i16 and i32, even if it is
				; not a legal type.

				define i8 @test1(i8 %x, i8 %y) {
				; CHECK-LABEL: @test1(
				; CHECK-NEXT: [[C:%.*]] = add i8 [[X:%.*]], [[Y:%.*]]
				; CHECK-NEXT: ret i8 [[C]]
				;
				%xz = zext i8 %x to i64
				%yz = zext i8 %y to i64
				%c = add i64 %xz, %yz
				%d = trunc i64 %c to i8
				ret i8 %d
				}

				define i16 @test2(i16 %x, i16 %y) {
				; CHECK-LABEL: @test2(
				; CHECK-NEXT: [[C:%.*]] = add i16 [[X:%.*]], [[Y:%.*]]
				; CHECK-NEXT: ret i16 [[C]]
				;
				%xz = zext i16 %x to i64
				%yz = zext i16 %y to i64
				%c = add i64 %xz, %yz
				%d = trunc i64 %c to i16
				ret i16 %d
				}

				define i32 @test3(i32 %x, i32 %y) {
				; CHECK-LABEL: @test3(
				; CHECK-NEXT: [[C:%.*]] = add i32 [[X:%.*]], [[Y:%.*]]
				; CHECK-NEXT: ret i32 [[C]]
				;
				%xz = zext i32 %x to i64
				%yz = zext i32 %y to i64
				%c = add i64 %xz, %yz
				%d = trunc i64 %c to i32
				ret i32 %d
				}

				define i9 @test4(i9 %x, i9 %y) {
				; CHECK-LABEL: @test4(
				; CHECK-NEXT: [[XZ:%.*]] = zext i9 [[X:%.*]] to i64
				; CHECK-NEXT: [[YZ:%.*]] = zext i9 [[Y:%.*]] to i64
				; CHECK-NEXT: [[C:%.*]] = add nuw nsw i64 [[XZ]], [[YZ]]
				; CHECK-NEXT: [[D:%.*]] = trunc i64 [[C]] to i9
				; CHECK-NEXT: ret i9 [[D]]
				;
				%xz = zext i9 %x to i64
				%yz = zext i9 %y to i64
				%c = add i64 %xz, %yz
				%d = trunc i64 %c to i9
				ret i9 %d
				}