This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)
ClosedPublic

Authored by lebedev.ri on Jul 10 2019, 10:08 AM.

Details

Summary

If we have some pattern that leaves only some low bits set, and then performs a
left-shift of those bits, and none of the bits that remain after the final
shift are modified by the mask, then we can omit the mask.

There are many variants to this pattern:
a. (x & ((1 << MaskShAmt) - 1)) << ShiftShAmt
All these patterns can be simplified to just:
x << ShiftShAmt
iff:
a. (MaskShAmt+ShiftShAmt) u>= bitwidth(x)

alive proof:
a: https://rise4fun.com/Alive/wi9

Indeed, not all of these patterns are canonical.
But since this fold will only ever produce a single instruction,
I'm really interested in handling even the non-canonical patterns:
I have this general kind of pattern in hotpaths,
and it is not totally outlandish for bit-twiddling code.

For now, let's start with the patterns where both shift amounts are variable,
with a trivial constant "offset" between them, since I believe this is
both the simplest to handle and the most common case.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.

https://bugs.llvm.org/show_bug.cgi?id=42563

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Jul 10 2019, 10:08 AM
lebedev.ri edited the summary of this revision. (Show Details)

Rebased over re-combed tests, NFC.

lebedev.ri marked 2 inline comments as done.Jul 12 2019, 2:28 PM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
73 ↗(On Diff #209047)

Been thinking about this: while the case where ShiftShAmt is a constant will already work,
the case where the mask is a constant is a separate case:
https://rise4fun.com/Alive/mc8
Will add that in a follow-up patch.

103–104 ↗(On Diff #209047)

It may be possible to use SimplifyICmpInst() here to catch more cases; I may look into that after this patchset.

Ping.
I'm perfectly happy to do trivial follow-up patches in post-commit review mode,
but I feel the base patch/approach should get a review first.

xbolva00 accepted this revision.Jul 16 2019, 5:41 AM

Looks fine

llvm/test/Transforms/InstCombine/redundant-left-shift-input-masking-variant-a.ll
343 ↗(On Diff #209047)

What about 'exact'?

This revision is now accepted and ready to land.Jul 16 2019, 5:41 AM
lebedev.ri marked an inline comment as done.Jul 16 2019, 6:32 AM

@spatel any comments?

llvm/test/Transforms/InstCombine/redundant-left-shift-input-masking-variant-a.ll
343 ↗(On Diff #209047)

In all these tests the last instruction is shl, not lshr; shl can only carry nuw, nsw, or both.
In any case, the new shift must not have any of these flags, because now that we have dropped the masking,
we no longer know that we won't shift out non-zero bits.
alive obviously agrees:
https://rise4fun.com/Alive/NDG
https://rise4fun.com/Alive/sliGT
https://rise4fun.com/Alive/TWI5
https://rise4fun.com/Alive/EiG
https://rise4fun.com/Alive/Fsly

xbolva00 added inline comments.Jul 16 2019, 6:37 AM
llvm/test/Transforms/InstCombine/redundant-left-shift-input-masking-variant-a.ll
343 ↗(On Diff #209047)

Thanks

spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
81–83 ↗(On Diff #209047)

assert on opcode and assign operands rather than using match - the input must be 'shl'?

103–104 ↗(On Diff #209047)

Using instsimplify like that (or as currently in this patch with SimplifyBinOp) is where we go into the InstCombine gray area of, "Is it worth the cost?"

With InstSimplify, we're using a potentially expensive/recursive analysis (similar to ValueTracking), and we know that's where InstCombine spends a lot of time.

cc'ing @efriedma for further thoughts.

108 ↗(On Diff #209047)

Add a code comment similar to what you explained here in the review:
"We no longer know that we won't shift-out non-0 bits."

lebedev.ri added inline comments.Jul 16 2019, 8:40 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
103–104 ↗(On Diff #209047)

I'm only seeing two big alternatives here:
a) not doing these kinds of optimizations at all. I sure hope this isn't the answer :)
b) pattern-match proliferation - basically reimplementing, right here, every interesting fold that we'd otherwise get via SimplifyBinOp. That's code bloat and wasted dev time; I'd rather not.

These patterns can come up in bit-stream manipulation codepaths (think llvm/lib/Bitcode),
and those may end up being the hotpath, so it's good not to have the redundant masking there.
I don't have exact numbers at hand, but as a ballpark, once all these shift optimizations are in
(including those I haven't done yet), I'm hoping to reclaim* -5%..-25% of runtime.

As a middle-ground, can the recursion depth be specified when calling SimplifyBinOp()?
I'm not seeing any cleverer fix here.

*reclaim - when changing this buffer abstraction from

return (cache >> (fillLevel - count)) & ((1U << count) - 1U);

to

return (cache >> (32 - count));

that is the perf regression I'm getting; and looking at the crude IR in the regressed benchmarks,
I think I will reclaim some of that loss.

spatel added inline comments.Jul 16 2019, 9:21 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
103–104 ↗(On Diff #209047)

I agree - I'd like to have the optimization. I'm only acknowledging our long-standing problem with instcombine.

There's 1 more possibility: move this to AggressiveInstCombine (which is currently only enabled at -O3...).

lebedev.ri added a comment.EditedJul 16 2019, 10:04 AM

To follow up on the inline comment - right now (llvm master vs. llvm master with this patchset; rawspeed develop branch with no patches on top) this fold happens once:

$ /repositories/llvm-test-suite/utils/compare.py -m instcombine.MasksDroped /builddirs/llvm-project/build-llvm-test-suite-{old,new}/results.json
/repositories/llvm-test-suite/utils/compare.py:109: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  d = pd.concat(datasets, axis=0, names=['run'], keys=datasetnames)
Tests: 198
Metric: instcombine.MasksDroped

Program                                        results results0 diff 
test-suite :: RawSpeed/RawSpeed.test           NaN      1.00    nan%
test-suite...AllocatorAdaptorBenchmark.test    NaN     NaN      nan%
test-suite...lateDecompressorBenchmark.test    NaN     NaN      nan%
test-suite...sRawInterpolatorBenchmark.test    NaN     NaN      nan%
test-suite...eed/io/BitStreamBenchmark.test    NaN     NaN      nan%
test-suite...a/CameraMetaDataBenchmark.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-ArwDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-Cr2Decoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-DcrDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-DcsDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-DngDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-ErfDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-IiqDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-KdcDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-MefDecoder.test    NaN     NaN      nan%
Geomean difference                                              nan%
       results  results0  diff
count  0.0      1.0       0.0 
mean  NaN       1.0      NaN  
std   NaN      NaN       NaN  
min   NaN       1.0      NaN  
25%   NaN       1.0      NaN  
50%   NaN       1.0      NaN  
75%   NaN       1.0      NaN  
max   NaN       1.0      NaN  
/builddirs/llvm-project/build-llvm-test-suite-new$ grep -r "instcombine.MasksDroped"
RawSpeed/build/CMakeFiles/rawspeed.dir/src/librawspeed/decompressors/SamsungV0Decompressor.stats:       "instcombine.MasksDroped": 1,
results.json:        "instcombine.MasksDroped": 1.0,

I'm expecting that this number will be better in the end, when all the bits are in place.

While we're here: the previous fold (+reassociateShiftAmtsOfTwoSameDirectionShifts(), D63812) fires more often:

$ /repositories/llvm-test-suite/utils/compare.py -m instcombine.ShiftsCombined /builddirs/llvm-project/build-llvm-test-suite-{old,new}/results.json
Tests: 198
Metric: instcombine.ShiftsCombined

Program                                        results results0 diff 
test-suite :: RawSpeed/RawSpeed.test           26.00   26.00    0.0%
test-suite...AllocatorAdaptorBenchmark.test    NaN     NaN      nan%
test-suite...lateDecompressorBenchmark.test    NaN     NaN      nan%
test-suite...sRawInterpolatorBenchmark.test    NaN     NaN      nan%
test-suite...eed/io/BitStreamBenchmark.test    NaN     NaN      nan%
test-suite...a/CameraMetaDataBenchmark.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-ArwDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-Cr2Decoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-DcrDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-DcsDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-DngDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-ErfDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-IiqDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-KdcDecoder.test    NaN     NaN      nan%
test-suite...fDecoderFuzzer-MefDecoder.test    NaN     NaN      nan%
Geomean difference                                              nan%
       results  results0  diff
count  1.0      1.0       1.0 
mean   26.0     26.0      0.0 
std   NaN      NaN       NaN  
min    26.0     26.0      0.0 
25%    26.0     26.0      0.0 
50%    26.0     26.0      0.0 
75%    26.0     26.0      0.0 
max    26.0     26.0      0.0 

The performance implications of this patchset are as I expected:

raw.pixls.us-unique/Samsung/NX30$ //usr/src/googlebenchmark/tools/compare.py -a benchmarks /builddirs/llvm-project/build-llvm-test-suite-{old,new}/RawSpeed/build/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_repetitions=128 --benchmark_min_time=0.00000001 2015-03-07-163604_sam_7204.srw 
RUNNING: /builddirs/llvm-project/build-llvm-test-suite-old/RawSpeed/build/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_repetitions=128 --benchmark_min_time=0.00000001 2015-03-07-163604_sam_7204.srw --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp3aOcOJ
2019-07-16 19:55:51
Running /builddirs/llvm-project/build-llvm-test-suite-old/RawSpeed/build/src/utilities/rsbench/rsbench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.88, 0.76, 1.53
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_mean          136 ms          136 ms          128   0.135973          0.99989   20.5978M       151.485M        151.468M      7.35441        7.3536   0.135988
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_median        136 ms          136 ms          128   0.135954                1   20.5978M       151.507M        151.507M      7.35546       7.35547   0.135953
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_stddev      0.212 ms        0.193 ms          128   193.542u         237.857u          0       215.294k        236.262k    0.0104522     0.0114703   212.466u
RUNNING: /builddirs/llvm-project/build-llvm-test-suite-new/RawSpeed/build/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_repetitions=128 --benchmark_min_time=0.00000001 2015-03-07-163604_sam_7204.srw --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWIQdvn
2019-07-16 19:56:10
Running /builddirs/llvm-project/build-llvm-test-suite-new/RawSpeed/build/src/utilities/rsbench/rsbench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.92, 0.78, 1.52
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_mean          131 ms          131 ms          128   0.131231         0.999992   20.5978M       156.959M        156.958M      7.62015        7.6201   0.131232
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_median        131 ms          131 ms          128   0.131175                1   20.5978M       157.026M        157.024M      7.62343        7.6233   0.131177
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_stddev      0.218 ms        0.218 ms          128   217.942u         33.6202u          0       259.861k        259.966k    0.0126159      0.012621   218.033u
Comparing /builddirs/llvm-project/build-llvm-test-suite-old/RawSpeed/build/src/utilities/rsbench/rsbench to /builddirs/llvm-project/build-llvm-test-suite-new/RawSpeed/build/src/utilities/rsbench/rsbench
Benchmark                                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_mean                  -0.0350         -0.0349           136           131           136           131
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_median                -0.0351         -0.0352           136           131           136           131
2015-03-07-163604_sam_7204.srw/threads:1/process_time/real_time_stddev                +0.0260         +0.1253             0             0             0             0

The -4% improvement is notable.

With InstSimplify, we're using a potentially expensive/recursive analysis (similar to ValueTracking), and we know that's where InstCombine spends a lot of time.

cc'ing @efriedma for further thoughts.

There's 1 more possibility: move this to AggressiveInstCombine (which is currently only enabled at -O3...).

Note that before we even get to the SimplifyBinOp() we first need to match 3 or 4 instructions,
which is stricter than the previously-added reassociateShiftAmtsOfTwoSameDirectionShifts() fold.
@efriedma If retargeting *this* fold into AggressiveInstCombine is the only acceptable way forward,
please let me know, to avoid stalling forward progress.

As discussed at http://lists.llvm.org/pipermail/llvm-dev/2017-March/111257.html, instcombine is probably doing too much work in general given the number of times it's run. But I don't think this specific use of instsimplify here is likely to be a problem in practice; given the pattern guard, it should trigger very rarely.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
96 ↗(On Diff #209047)

SimplifyAddInst?

lebedev.ri marked 3 inline comments as done.Jul 18 2019, 3:18 PM

@xbolva00, @spatel, @efriedma thank you for the review!

As discussed at http://lists.llvm.org/pipermail/llvm-dev/2017-March/111257.html, instcombine is probably doing too much work in general given the number of times it's run.

Yep, I agree. Compile time is important.

But I don't think this specific use of instsimplify here is likely to be a problem in practice; given the pattern guard, it should trigger very rarely.

Okay then, thank you for taking a look!
I will then proceed with this design (keeping it in instcombine).

Will rebase this, address the inline notes, and land this set - the follow-up patches are
trivial extensions that simply teach it about more similar patterns.
If anyone wants to review those - be my guest, but I'm not expecting any surprises there.

lebedev.ri marked 5 inline comments as done.

Rebased, addressed review notes.

This revision was automatically updated to reflect the committed changes.