This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/lib/Transforms/Vectorize/
  LoopVectorize.cpp (7/7)
  VPlan.h (2/3)
  VPlanTransforms.h (10/10)
  VPlanTransforms.cpp (172/178)
llvm/test/Transforms/LoopVectorize/
  AArch64/deterministic-type-shrinkage.ll (10/15)
  AArch64/loop-vectorization-factors.ll (6/14)
  AArch64/type-shrinkage-insertelt.ll
  scalable-trunc-min-bitwidth.ll
  trunc-shifts.ll (2/3)

Differential D149903

[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.
Closed, Public

Authored by fhahn on May 4 2023, 2:07 PM.


Details

Reviewers

Ayal
gilr
rengolin

Commits

rG70535f5e609f: [VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.

Summary

This patch replaces the IR-based truncateToMinimalBitwidths with a VPlan version. This has the following benefits:

- The VPlan-based version is simpler; unlike the IR-based one, it does not need special codegen for each supported instruction type.
- It removes a dependency on the cost model after VPlan execution.
- It removes a use of getVPValue that uses underlying values after VPlan execution (see the removed FIXME).

Depends on D149081.

Depends on D149079.
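The property that makes this shrinking legal can be illustrated outside LLVM. The following is a minimal standalone sketch in plain C++ (not LLVM or VPlan API; all helper names are hypothetical). For add and mul, the low 8 bits of the result depend only on the low 8 bits of the operands, so an i32 operation whose users demand only 8 bits can be computed on i8 and zero-extended back. A logical shift right moves higher bits down into the low byte, so it cannot be narrowed by opcode alone. This is why the minimal bit widths have to come from an analysis of the bits that are actually demanded.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every pair of low bytes (a, b), evaluating `op` on the
// full 32-bit operands and truncating gives the same byte as evaluating it on
// the zero-extended low bytes alone. Bits 8..31 of the wide operands are
// filled with arbitrary values to model bits that no user demands.
template <typename Op>
bool lowByteDependsOnlyOnLowBytes(Op op) {
  for (uint32_t a = 0; a < 256; ++a) {
    for (uint32_t b = 0; b < 256; ++b) {
      uint32_t wideA = a | 0xABCDEF00u;
      uint32_t wideB = b | 0x12345600u;
      auto wide = static_cast<uint8_t>(op(wideA, wideB));
      auto narrow = static_cast<uint8_t>(op(a, b));
      if (wide != narrow)
        return false;
    }
  }
  return true;
}

// Hypothetical example operations, all on 32-bit values.
uint32_t addOp(uint32_t x, uint32_t y) { return x + y; }
uint32_t mulOp(uint32_t x, uint32_t y) { return x * y; }
uint32_t lshr4Op(uint32_t x, uint32_t) { return x >> 4; }
```

With these definitions, `addOp` and `mulOp` pass the check, while `lshr4Op` fails it because bits 8..11 of the wide operand are shifted into the result byte.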

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

Ayal added inline comments. Oct 4 2023, 4:03 PM

llvm/lib/Transforms/Vectorize/VPlan.h
281	How/Is this removal related?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
886	(Future) Thought: wonder if instead of iterating over all live-ins looking to truncate any, it may be better to iterate over MinBWs and check if any are live-ins. Or lookup MinBWs upon construction of a live-in.
887	nit: use `LiveInInst` or something similar rather than `UI`?
893
900	Set once before the loop for all live-ins to be truncated.
909	Any order other than depth first would also do, right?
920	(Future) Thought: this is an awkward way of retrieving "the" recipe that corresponds to each member of MinBWs - look through all recipes for those having the desired "underlying" insn. Perhaps better lookup MinBWs upon construction of a recipe for an Instruction. Or migrate the analysis that builds MinBWs to run on VPlan.
921	nit: lookup.
926	Would be good to comment how memory and replicate cases are (not) processed.
932	Better assert than continue? Here ProcessedRecipes was already bumped, but should all MinBWs members correspond to Integer types, of distinct (smaller) size, whether live-in or not?
942	This deals only with ZExt/SExt, easier to check directly if Opcode is one or the other? OTOH, better handle Trunc here as well? Is it handled well below?
946	`// SExt/Zext is redundant - stick with its operand.` ?
953	Place assert earlier?
955–956
967	This means the size of all operands is equal to NewResSizeInBits, can this be?
971–972	nit: keep consistent with above.
980–982	nit: keep consistent with above.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
72	nit: a VPlan transform should fold redundant ZExt-Trunc pairs rather than leaving them ("as hints") to `InstCombine`. Being a public method, which does not need SE, should the caller of optimize() precede its call with a direct call to truncateToMinimalBitwidths(), rather than pass MinBWs to optimize()?

Address latest comments, apologies for the delay!

llvm/lib/Transforms/Vectorize/VPlan.h
281	The last user of this function has been removed in the patch.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
876–877	Code has been moved to D159202
879–881	code has been moved to D159202
885	Wrapped and added comment, thanks!
887	Renamed, thanks!
888	Updated, thanks!
893	Adjusted, thanks!
894	Turned into assert, thanks!
900	hoisted, thanks!
909	Yes, I think the order doesn't matter here.
921	Done, thanks!
926	Added a comment, thanks!
932	Turned `isIntegerTy` into an assert but retained the size check, as there are entries where the sizes are the same (e.g. for `truncs`).
942	Thanks, changed to `if`. I don't think Trunc is handled explicitly in the latest version.
946	this check has been moved up and is not needed any longer.
953	Moved up, thanks!
955–956	adjusted, thanks!
967	There are cases where a ZExt narrowed earlier is used as an operand here, so the type is already adjusted.
971–972	Adjusted, thanks!
980–982	reordered, thanks!

Harbormaster completed remote builds in B257842: Diff 557740. Oct 17 2023, 1:22 PM

Various comments, also trying to reason about how this patch changes tests.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3443	Retain a comment explaining why replicate recipes are not truncated?
3482	Retain this comment regarding dropping wrapping flags?
3497	A Trunc is handled by shrinking its operand.
3522	(If nothing is done to the operands, what is the result extended to?)
llvm/lib/Transforms/Vectorize/VPlan.h
281	Very well!
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
755	Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type. Effectively adding scalar type info to all VPValues? Might be good to investigate separately, although the current use-cases would probably be very limited. Very well.
780	Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on the associated underlying instruction? Might be a potential follow-up, but we would still potentially need to update MinBWs on each recipe replacement? Sure, like updating any other property of a recipe when replaced.
796	Agreed - MinBW should specify a consistent minimal bit width for all users, and for all operands, but there seems to be some discrepancy that is confusing:
A. Instructions whose operands and return value are all of a single type (excluding the condition operand of selects) are converted to operate on a narrower type by (a) shrinking their operands to the narrower type and (b) extending their result from the narrower type back to their original type. Instructions that feed values to such instructions or use their values continue to feed and use values of the original type. Where one such instruction feeds another, a zext-trunc pair is inserted between them, which is later folded.
B. Instructions that convert between two distinct types continue to digest the original source type but are updated to produce values of the new destination type. Their users, when reached subsequently, need to check whether any of their operands have been narrowed. But if this is the case, why bother extending results in (b) above? OTOH, the narrowed results of conversion instructions could also be extended (to be folded later), keeping the treatment consistent? This always expects the new type to be strictly smaller than the current one. Perhaps conversion instructions could be skipped now and handled by a subsequent folding pass, looking for trunc-trunc and sext-trunc pairs in addition to zext-trunc ones?
C. Loads are ignored - excluded from MinBWs? They could potentially be narrowed to load only the required bits, though it's unclear whether a strided narrow load is better than a unit-strided wider load and trunc, as in an interleave group(?)
D. Phis are ignored - excluded from MinBWs. Truncated header induction phis are handled separately. Other phis may deserve narrowing(?)
885	Suffice to ask `if (!NewResSizeInBits)`?
886	Thoughts about the above? Hopefully avoids exposing getLiveIns(), at the expense of holding a mapping between Values and LiveIns, as in LiveOuts.
889	assert "MinBW member must be integer" rather than continue - thereby skipping a MinBW member.
905	Can skip phi's, none are included in MinBWs.
906	Are any loads included in MinBWs, or is this dead code? Stores of course are irrelevant.
919	Suffice to ask `if (!NewResSizeInBits)`?
920	Thoughts about the above?
928	Should replicate recipes be handled next to handling widen memory recipes above?
932	nit: `ResTy` >> `OldResTy`, `ResSizeInBits` >> `OldResSizeInBits`
935	`assert(ResSizeInBits > NewResSizeInBits && "Nothing to shrink?");` here instead of below?
941	nit: `VPC` >> `OldExt`, `Opc` >> `OldOpc`?
942	Does Trunc (which can truncate to a smaller bitwidth) implicitly fall through and has its operand shrunk to the smaller bitwidth, effectively turning it into a ZExt?
945	Comment is obsolete here - dealt with new type being equal to operand type, which should result in replacing the SExt/ZExt with its operand, see below.
946	?
950	nit: `C` >> `NewCast`? If getTypeSizeInBits(Op) == NewResSizeInBits should C be set to Op (w/o inserting it) instead of creating a redundant cast?
967	Maybe worth a comment.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
72	Thoughts on the above? Better truncate to minimal bitwidth asap, as it relies on IR information? Conceptually a scalar transform. Does "as hints to InstCombine" below still hold?
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
41–42	hmm, we now spot the redundant duplicate zext of WIDE_LOAD from <16 x i8> to <16 x i16>, originally both TMP4 and TMP10.
68	Spotted and removed duplicate zext of WIDE_LOAD8.
159	This testcase stores the 2nd least significant byte of a 32b product (of two invariant values, one 16b and the other 32b) checking that computing 16b product suffices. But more optimizations should take place: the expansion of the multipliers to 32b should be eliminated (along with their truncation to 16b), and the invariant multiplication-lshr-trunc sequence should be hoisted out of the loop.
167	BROADCAST_SPLAT is (still) trunc'ed twice due to UF=2?
168	Both insertelement's now use poison.
176	BROADCAST_SPLAT2 is (still) trunc'ed twice due to UF=2?
llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
302	We now fold a trunc-zext of zext'ed WIDE_LOAD from <16 x i16> => <16 x i32> => <16 x i16>, but fail to fold a similar one following the add-2's?
330	We now get rid of a pair of <8 x i16> => <8 x i32> => <8 x i16> before the add-2's (so this is not an NFC patch), but still retain the pair of <8 x i16> => <8 x i32> => <8 x i16> after it - missed MinBW/trunc-zext opportunity?
474–475	Hmm, before we narrowed these two shufflevectors to operate on <16 x i8> and zext-trunc their result, now we let them operate on original <16 x i32> and truncate the result?
487	Many zext-trunc pairs left to collect.
513	Above trunc of TMP2 is redundant along with its zext in the ph.
520	Above trunc of TMP4 is redundant along with its zext in the ph.
llvm/test/Transforms/LoopVectorize/trunc-shifts.ll
334	We now get rid of a pair of <4 x i16> => <4 x i32> => <4 x i16> before the lshr (so this is not an NFC patch), but still retain the pair/triple of <4 x i16> => <4 x i32> => <4 x i16> => <4 x i8> after it - missed MinBW opportunity?
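The note on line 159 of deterministic-type-shrinkage.ll (storing the second least significant byte of a 32-bit product of a 16-bit and a 32-bit value, where a 16-bit product suffices) can be checked numerically. Below is a sketch in plain C++ with hypothetical function names. It models the i32 mul/lshr/trunc sequence and a narrowed i16 counterpart; it is not the test's IR.

```cpp
#include <cassert>
#include <cstdint>

// Original computation: 32-bit multiply, shift right by 8, truncate to i8.
uint8_t secondByteWide(uint16_t a, uint32_t b) {
  uint32_t prod = static_cast<uint32_t>(a) * b;
  return static_cast<uint8_t>(prod >> 8);
}

// Narrowed computation: both factors truncated to 16 bits and the product
// kept modulo 2^16. The multiply is done in 32-bit unsigned arithmetic (to
// avoid signed overflow from integer promotion) and then truncated.
uint8_t secondByteNarrow(uint16_t a, uint32_t b) {
  uint32_t wideProd = static_cast<uint32_t>(a) * static_cast<uint16_t>(b);
  auto prod16 = static_cast<uint16_t>(wideProd);
  return static_cast<uint8_t>(prod16 >> 8);
}
```

Only bits 8 to 15 of the product are demanded. The low 16 bits of a product depend only on the low 16 bits of each factor, so the two functions agree for all inputs.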

fhahn mentioned this in rG0c8e5be6fa08: [VPlan] Simplify redundant trunc (zext A) pairs to A. Oct 22 2023, 3:42 AM

fhahn mentioned this in rG6f3b88baa2ac: [VPlan] Move trunc ([s|z]ext A) simplifications to simplifyRecipe. Nov 16 2023, 1:17 PM
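The two commits mentioned above fold a truncate of an extend. The sketch below is a toy model only; the `Val` type and `simplifyTruncOfExt` function are hypothetical and do not mirror VPlan's classes. The first case, where `trunc (zext A)` back to the type of `A` becomes `A`, is the one named in the first commit title. The other two cases are the standard identities for when `A` is narrower or wider than the truncated type. They are included here as an assumption about what such a simplification would cover.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy value: a leaf of some integer width, or a cast of another value.
// Hypothetical stand-in for recipes; not the VPlan class hierarchy.
struct Val {
  enum Kind { Leaf, ZExt, SExt, Trunc };
  Kind kind;
  unsigned bits;            // width of the result, in bits
  std::shared_ptr<Val> op;  // operand of a cast; null for leaves
};
using ValPtr = std::shared_ptr<Val>;

ValPtr makeLeaf(unsigned bits) {
  return std::make_shared<Val>(Val{Val::Leaf, bits, nullptr});
}

ValPtr makeCast(Val::Kind kind, ValPtr operand, unsigned bits) {
  return std::make_shared<Val>(Val{kind, bits, std::move(operand)});
}

// Simplify trunc (zext A) and trunc (sext A); return v unchanged otherwise.
ValPtr simplifyTruncOfExt(const ValPtr &v) {
  if (v->kind != Val::Trunc)
    return v;
  const ValPtr &ext = v->op;
  if (ext->kind != Val::ZExt && ext->kind != Val::SExt)
    return v;
  const ValPtr &a = ext->op;
  if (a->bits == v->bits)
    return a;                                // back to A's own width: just A
  if (a->bits < v->bits)
    return makeCast(ext->kind, a, v->bits);  // extend A, but less far
  return makeCast(Val::Trunc, a, v->bits);   // truncate A directly
}
```

For example, `trunc (zext i16 A to i32) to i16` simplifies to `A` itself, while `trunc (sext i16 A to i64) to i32` becomes a single `sext` of `A` to i32.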

Address comments and major simplification after moving cast folding to simplifyRecipes.

Hopefully all comments are addressed; I hope I didn't miss any.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3443	Retained when skipping VPReplicateRecipe.
3482	Done, thanks!
3522	It stays the same, there's no extend in that case.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
755	This has been updated to now use VPTypeAnalysis.
796	The latest version doesn't have special treatment for casts, they remain unchanged and VPlan recipe simplification will take care of folding them if possible.
885	This code has now been removed; LiveIns are handled when truncating the other operands of an instruction; otherwise we leave the type info in an inconsistent state.
886	LiveIns are now handled directly when truncating other operands; getLiveIns has been removed.
889	Turned into an assert, thanks!
905	There's an early continue now that skips phis and other unsupported recipes.
906	Nope, looks like this is not needed in the latest version.
919	Simplified, thanks!
920	I think it would be best to have the analysis based on VPlan. Building MinBWs early would probably require extra work to update/invalidate it during transforms.
928	We still need to count them for verification

fhahn added inline comments. Nov 16 2023, 2:15 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
932	Renamed, thanks!
935	Done, and also removed continue
941	This code is now gone, handled by recipe simplification.
945	Code is gone now
946	Code now gone.
950	Code gone now.
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
159	still more work to do :) Arguably the invariant instructions are artificial, in the regular pipeline, no invariant instructions should remain.
167	The latest version avoids truncating the same value twice.
168	I think the use of undef is a leftover that wasn't updated; it should be poison.
176	The latest version avoids truncating the same value twice.
llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
302	Folding now all happens in simplifyRecipes, which should handle this now.
474–475	I think there's nothing we can do about that; we first need to splat the value when generating code, but InstCombine should take care of that.
487	Should be better cleaned up now
llvm/test/Transforms/LoopVectorize/trunc-shifts.ll
334	trunc/ext pairs should be better cleaned up in the latest version

Harbormaster completed remote builds in B258087: Diff 558116. Nov 16 2023, 6:49 PM

Looks much simpler! Minor last nits.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
874	nit: are these still hints to InstCombine, or to subsequent VPlan cleanups?
883–886	?
909	But a (more) expensive RPOT order is needed, to handle defs before uses?
933	Is it possible for MinBWs not to contain Op's live-in IR value in this case?
967–969	nit

Address latest comments, thanks!

fhahn added inline comments. Nov 23 2023, 4:10 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
874	Updated, thanks!
883–886	Simplified, thanks!
909	The latest version should not need RPO, as the bit width of the results does not change for any user (previously it might, due to early cast simplifications). Changed to depth first.
933	Yes, MinBWs only contains instructions, but not other values like arguments. Added a clarifying assert.
967–969	Done, thanks! This also limits the scope of TypeInfo to the range where it is valid. After `truncateToMinimalBitwidths`, we would otherwise need to invalidate the info for the modified recipes. This can be done in the future.

Harbormaster completed remote builds in B258119: Diff 558159. Nov 23 2023, 4:59 AM

ping :)

Ayal added inline comments. Nov 29 2023, 9:23 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
780	Just noting potential follow-up, possibly as a TODO somewhere: attach each MinBW to its recipe when the latter is created, supplementing its underlying inst.
880	nit: `ProcessedRecipesNum`?
880	`ProcessedTruncs` is used outside ifdef below, move its definition out of ifdef here? Or is it meant to ensure truncated operands are counted once by ProcessedRecipes for debugging only? If an operand is truncated multiple times, all its truncations must be to the same size, because "MinBW should specify a consistent minimal bit width for all users(, and for all operands)"? Worth explaining why processed truncs are recorded.
883	Should `PH` be skipped? Trying to shrink the (live-in) operands of recipes in PH will insert them at the end of PH...
886	Shrunk operands are placed before R, but its extension is placed after - and calls for this make_early_inc_range, right?
902	Just note that the counting of ProcessedRecipes may miss casts that fail to be processed later.
907	Does `OldResSizeInBits` equal the size of `OldResTy`, for the non-cast Widen or Select `R`?
924	`Ins`? Perhaps `ProcessedTrunc`?
925	Handle the simple if !ins.second /* Op already processed */ case first, potentially early-continuing? Clearer to check if ProcessedTruncs.lookup(Op) or if ProcessedTruncs.contains(Op) and if so use ProcessedTruncs[Op], otherwise insert it?
928	nit: place simpler if !isLiveIn case first?
930–933	nit
938	Note that truncations of live-ins could also be inserted before R, thereby leaving the treatment of live-ins to debugging only, and leaving their LICM and commoning to a subsequent VPlan cleanup pass, along with trunc-zext foldings.
967–969	Very well. Worth commenting that `TypeInfo` should not be used following truncateToMinimalBitwidths.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
72	WDYT on the above: should the caller of optimize() precede its call with a direct call to truncateToMinimalBitwidths(), rather than pass MinBWs to optimize()?

Rebase and address latest comments, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
880	Changed to `NumProcessedRecipes`
880	It's to re-use previously generated truncates. Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated). Moved out of ifdef.
883	Good point, there should be nothing to shrink in PH for now, as the analysis is for the loop body only, adjusted!
886	Yep
902	Do you mean updating the comment here or just a general note? We need to include the recipes in the count, otherwise the verification later will fail
907	Yes, I forgot to remove this use of IR `getType`. Updated to use `TypeInfo.inferScalarType(ResultVPV)` and then `getScalarSizeInBits` of the returned type.
924	Updated, thanks!
925	Early continue would mean duplicating the code to update the operands, so I left things as is for now, including using `insert`. `insert` means we only need to look up the insert position once, vs. 2 lookups with a separate `lookup` and then `[]`. WDYT?
928	Done, thanks!
938	Yep, for now it is simpler and results in a smaller test diff to do it directly there as it is not only LICM but also very simple CSE
967–969	Sunk further into truncateToMinimalBitwidths.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
72	Sounds good, updated, thanks!

Ayal added inline comments. Nov 29 2023, 2:23 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
880	Re "Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated)": very well, may deserve a comment.
902	I mean we count casts as if they are processed, expecting they will be later, w/o checking that they actually do.
907	Ah, ok, wondered if using the size of the type of `UI` directly would be simpler?
911	Should be the same `Ctx` passed in as parameter?
925	OK, WDYT of something like the following:

        auto [ProcessedIter, DidNotExist] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = DidNotExist ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!DidNotExist)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          NewOp->insertBefore(&R);
        } else {
          PH->appendRecipe(NewOp);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }
965–971	nit: redundant move of empty line?
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
84

Ayal added inline comments. Nov 30 2023, 12:05 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

925

Maybe IterIsEmpty would be a better name, to avoid double negation, as in:

        auto [ProcessedIter, IterIsEmpty] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = IterIsEmpty ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!IterIsEmpty)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          NewOp->insertBefore(&R);
        } else {
          PH->appendRecipe(NewOp);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }
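The single-lookup idiom discussed in this thread can be sketched with the standard library: insert a placeholder, then use the returned flag to decide whether a new truncate is needed or an earlier one can be reused. This sketch assumes `std::map` as a stand-in for the LLVM map used in the patch, and a hypothetical `TruncRecord` in place of `VPWidenCastRecipe`. The assertion encodes the assumption raised in the review that all truncates of one operand target the same width.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a truncate recipe: which operand it narrows and
// to how many bits.
struct TruncRecord {
  std::string operand;
  unsigned bits;
};

struct TruncCache {
  // Operand -> the truncate already created for it (owned by `created`).
  std::map<std::string, TruncRecord *> processedTruncs;
  std::vector<std::unique_ptr<TruncRecord>> created;

  TruncRecord *getOrCreateTrunc(const std::string &operand, unsigned bits) {
    // A single lookup both checks for an existing entry and reserves a slot.
    auto [it, inserted] = processedTruncs.insert({operand, nullptr});
    if (!inserted) {
      assert(it->second->bits == bits && "operand truncated to two widths");
      return it->second; // reuse the earlier truncate
    }
    created.push_back(std::make_unique<TruncRecord>(TruncRecord{operand, bits}));
    it->second = created.back().get();
    return it->second;
  }
};
```

Asking twice for the same operand returns the same record, so only one truncate exists per operand. Users are redirected to it one at a time rather than through a replace-all-uses step.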

Addressed latest comments, thanks!

Harbormaster completed remote builds in B258145: Diff 558195. Nov 30 2023, 4:04 AM

fhahn added inline comments. Nov 30 2023, 5:14 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
880	Added a comment to ProcessedTruncs definition.
902	They don't need handling explicitly, as redundant casts will be removed later. Expanded the comment slightly to: "Also skip casts which do not need to be handled explicitly here, as redundant casts will be removed during recipe simplification."
907	It might be slightly simpler, but would mean this may lead to a crash further down the line, once we support recipes without underlying values/instructions (and we forget to update this line) and/or if some other transform adjusted the type. Left as is for now
911	Yes, fixed!
965–971	changed back, thanks!
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
84	Fixed, thanks!

This looks good to me, thanks for accommodating!
Adding a minor redundancy spotted plus some test related notes.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
949	redundant - hoist above the early-continue.
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
167	Duplicated TMP0 and TMP1 still here?
176	Still seeing duplicate TMP2 and TMP3?
194–196	Trunc & insertelement LICM'd from vec.epilog.vector.body to vec.epilog.ph.
249–250	ditto.
llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
30–31	Fold zext-trunc pair, several such cases follow.
302	The one following the add-2's is also folded now.
330	Other pair also folded now.
474–475	Worth testing with a subsequent InstCombine, to ensure pessimization is avoided?
487	Indeed looks like it!
llvm/test/Transforms/LoopVectorize/trunc-shifts.ll
334	Indeed!

This revision is now accepted and ready to land. Nov 30 2023, 5:22 AM

Closed by commit rG70535f5e609f: [VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. (authored by fhahn). Dec 2 2023, 8:13 AM

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG70535f5e609f: [VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.

fhahn marked 2 inline comments as done. Dec 2 2023, 8:15 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
949	Fixed in the committed version, thanks!
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
167	They were due to redundant casts being added for Live-in values, fixed by checking in VPWidenCastRecipe::execute for now, with a FIXME to address this with explicit unrolling.

This triggers assertion failures; see https://github.com/llvm/llvm-project/issues/74231.

Revision Contents

Path                                              Size
llvm/lib/Transforms/Vectorize/
  LoopVectorize.cpp                               149 lines
  VPlan.h                                         4 lines
  VPlanTransforms.h                               3 lines
  VPlanTransforms.cpp                             97 lines
llvm/test/Transforms/LoopVectorize/
  AArch64/deterministic-type-shrinkage.ll         139 lines
  AArch64/loop-vectorization-factors.ll           388 lines
  AArch64/type-shrinkage-insertelt.ll             76 lines
  scalable-trunc-min-bitwidth.ll                  10 lines
  trunc-shifts.ll                                 14 lines

Diff 558116

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


...
 protected:
   /// Create code for the loop exit value of the reduction.
   void fixReduction(VPReductionPHIRecipe *Phi, VPTransformState &State);

   /// Iteratively sink the scalarized operands of a predicated instruction into
   /// the block that was created for it.
   void sinkScalarOperands(Instruction *PredInst);

-  /// Shrinks vector element sizes to the smallest bitwidth they can be legally
-  /// represented as.
-  void truncateToMinimalBitwidths(VPTransformState &State);

   /// Returns (and creates if needed) the trip count of the widened loop.
   Value *getOrCreateVectorTripCount(BasicBlock *InsertBlock);

   /// Returns a bitcasted value to the requested vector type.
   /// Also handles bitcasts of vector<float> <-> vector<pointer> types.
   Value *createBitOrPointerCast(Value *V, VectorType *DstVTy,
                                 const DataLayout &DL);

...
 }

 static Type *largestIntegerVectorType(Type *T1, Type *T2) {
   auto *I1 = cast<IntegerType>(cast<VectorType>(T1)->getElementType());
   auto *I2 = cast<IntegerType>(cast<VectorType>(T2)->getElementType());
   return I1->getBitWidth() > I2->getBitWidth() ? T1 : T2;
 }

-void InnerLoopVectorizer::truncateToMinimalBitwidths(VPTransformState &State) {
-  // For every instruction `I` in MinBWs, truncate the operands, create a
-  // truncated version of `I` and reextend its result. InstCombine runs
-  // later and will remove any ext/trunc pairs.
-  SmallPtrSet<Value *, 4> Erased;
-  for (const auto &KV : Cost->getMinimalBitwidths()) {
-    // If the value wasn't vectorized, we must maintain the original scalar
-    // type. The absence of the value from State indicates that it
-    // wasn't vectorized.
-    // FIXME: Should not rely on getVPValue at this point.
-    VPValue *Def = State.Plan->getVPValue(KV.first, true);
-    if (!State.hasAnyVectorValue(Def))
-      continue;
-    // If the instruction is defined outside the loop, only update the first
-    // part; the first part will be re-used for all other parts.
-    unsigned UFToUse = OrigLoop->contains(KV.first) ? UF : 1;
-    for (unsigned Part = 0; Part < UFToUse; ++Part) {
-      Value *I = State.get(Def, Part);
-      if (Erased.count(I) || I->use_empty() || !isa<Instruction>(I))
-        continue;
-      Type *OriginalTy = I->getType();
-      Type *ScalarTruncatedTy =
-          IntegerType::get(OriginalTy->getContext(), KV.second);
-      auto *TruncatedTy = VectorType::get(
-          ScalarTruncatedTy, cast<VectorType>(OriginalTy)->getElementCount());
-      if (TruncatedTy == OriginalTy)
-        continue;
-
-      IRBuilder<> B(cast<Instruction>(I));
-      auto ShrinkOperand = [&](Value *V) -> Value * {
-        if (auto *ZI = dyn_cast<ZExtInst>(V))
-          if (ZI->getSrcTy() == TruncatedTy)
-            return ZI->getOperand(0);
-        return B.CreateZExtOrTrunc(V, TruncatedTy);
-      };
-
-      // The actual instruction modification depends on the instruction type,
-      // unfortunately.
-      Value *NewI = nullptr;
-      if (auto *BO = dyn_cast<BinaryOperator>(I)) {
-        Value *Op0 = ShrinkOperand(BO->getOperand(0));
-        Value *Op1 = ShrinkOperand(BO->getOperand(1));
-        NewI = B.CreateBinOp(BO->getOpcode(), Op0, Op1);
-
-        // Any wrapping introduced by shrinking this operation shouldn't be
-        // considered undefined behavior. So, we can't unconditionally copy
-        // arithmetic wrapping flags to NewI.
-        cast<BinaryOperator>(NewI)->copyIRFlags(I, /*IncludeWrapFlags=*/false);
-      } else if (auto *CI = dyn_cast<ICmpInst>(I)) {
-        Value *Op0 = ShrinkOperand(BO->getOperand(0));
-        Value *Op1 = ShrinkOperand(BO->getOperand(1));
-        NewI = B.CreateICmp(CI->getPredicate(), Op0, Op1);
-      } else if (auto *SI = dyn_cast<SelectInst>(I)) {
-        Value *TV = ShrinkOperand(SI->getTrueValue());
-        Value *FV = ShrinkOperand(SI->getFalseValue());
-        NewI = B.CreateSelect(SI->getCondition(), TV, FV);
-      } else if (auto *CI = dyn_cast<CastInst>(I)) {
-        switch (CI->getOpcode()) {
-        default:
-          llvm_unreachable("Unhandled cast!");
-        case Instruction::Trunc:
-          NewI = ShrinkOperand(CI->getOperand(0));
-          break;
-        case Instruction::SExt:
-          NewI = B.CreateSExtOrTrunc(
-              CI->getOperand(0),
-              smallestIntegerVectorType(OriginalTy, TruncatedTy));
-          break;
-        case Instruction::ZExt:
-          NewI = B.CreateZExtOrTrunc(
-              CI->getOperand(0),
-              smallestIntegerVectorType(OriginalTy, TruncatedTy));
-          break;
-        }
-      } else if (auto *SI = dyn_cast<ShuffleVectorInst>(I)) {
-        auto Elements0 =
-            cast<VectorType>(SI->getOperand(0)->getType())->getElementCount();
-        auto *O0 = B.CreateZExtOrTrunc(
-            SI->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements0));
-        auto Elements1 =
-            cast<VectorType>(SI->getOperand(1)->getType())->getElementCount();
-        auto *O1 = B.CreateZExtOrTrunc(
-            SI->getOperand(1), VectorType::get(ScalarTruncatedTy, Elements1));
-
-        NewI = B.CreateShuffleVector(O0, O1, SI->getShuffleMask());
-      } else if (isa<LoadInst>(I) || isa<PHINode>(I)) {
-        // Don't do anything with the operands, just extend the result.
-        continue;
-      } else if (auto *IE = dyn_cast<InsertElementInst>(I)) {
-        auto Elements =
-            cast<VectorType>(IE->getOperand(0)->getType())->getElementCount();
-        auto *O0 = B.CreateZExtOrTrunc(
-            IE->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements));
-        auto *O1 = B.CreateZExtOrTrunc(IE->getOperand(1), ScalarTruncatedTy);
-        NewI = B.CreateInsertElement(O0, O1, IE->getOperand(2));
-      } else if (auto *EE = dyn_cast<ExtractElementInst>(I)) {
-        auto Elements =
-            cast<VectorType>(EE->getOperand(0)->getType())->getElementCount();
-        auto *O0 = B.CreateZExtOrTrunc(
-            EE->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements));
-        NewI = B.CreateExtractElement(O0, EE->getOperand(2));
-      } else {
-        // If we don't know what to do, be conservative and don't do anything.
-        continue;
-      }
-
-      // Lastly, extend the result.
-      NewI->takeName(cast<Instruction>(I));
-      Value *Res = B.CreateZExtOrTrunc(NewI, OriginalTy);
-      I->replaceAllUsesWith(Res);
-      cast<Instruction>(I)->eraseFromParent();
-      Erased.insert(I);
-      State.reset(Def, Res, Part);
-    }
-  }
-
-  // We'll have created a bunch of ZExts that are now parentless. Clean up.
-  for (const auto &KV : Cost->getMinimalBitwidths()) {
-    // If the value wasn't vectorized, we must maintain the original scalar
-    // type. The absence of the value from State indicates that it
-    // wasn't vectorized.
-    // FIXME: Should not rely on getVPValue at this point.
-    VPValue *Def = State.Plan->getVPValue(KV.first, true);
-    if (!State.hasAnyVectorValue(Def))
-      continue;
-    unsigned UFToUse = OrigLoop->contains(KV.first) ? UF : 1;
-    for (unsigned Part = 0; Part < UFToUse; ++Part) {
-      Value *I = State.get(Def, Part);
-      ZExtInst *Inst = dyn_cast<ZExtInst>(I);
-      if (Inst && Inst->use_empty()) {
-        Value *NewI = Inst->getOperand(0);
-        Inst->eraseFromParent();
-        State.reset(Def, NewI, Part);
-      }
-    }
-  }
-}

void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,		void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
VPlan &Plan) {		VPlan &Plan) {
// Insert truncates and extends for any truncated instructions as hints to
// InstCombine.
if (VF.isVector())
truncateToMinimalBitwidths(State);

// Fix widened non-induction PHIs by setting up the PHI operands.		// Fix widened non-induction PHIs by setting up the PHI operands.
if (EnableVPlanNativePath)		if (EnableVPlanNativePath)
fixNonInductionPHIs(Plan, State);		fixNonInductionPHIs(Plan, State);

// At this point every instruction in the original loop is widened to a		// At this point every instruction in the original loop is widened to a
// vector form. Now we need to fix the recurrences in the loop. These PHI		// vector form. Now we need to fix the recurrences in the loop. These PHI
// nodes are currently empty because we did not want to introduce cycles.		// nodes are currently empty because we did not want to introduce cycles.
// This is the second stage of vectorizing recurrences.		// This is the second stage of vectorizing recurrences.
[5,118 unchanged lines not shown]
void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,		void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
ElementCount MaxVF) {		ElementCount MaxVF) {
assert(OrigLoop->isInnermost() && "Inner loop expected.");		assert(OrigLoop->isInnermost() && "Inner loop expected.");

auto MaxVFTimes2 = MaxVF * 2;		auto MaxVFTimes2 = MaxVF * 2;
for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFTimes2);) {		for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFTimes2);) {
VFRange SubRange = {VF, MaxVFTimes2};		VFRange SubRange = {VF, MaxVFTimes2};
if (auto Plan = tryToBuildVPlanWithVPRecipes(SubRange)) {		if (auto Plan = tryToBuildVPlanWithVPRecipes(SubRange)) {
// Now optimize the initial VPlan.		// Now optimize the initial VPlan.
VPlanTransforms::optimize(Plan, PSE.getSE());		VPlanTransforms::optimize(Plan, PSE.getSE(), CM.getMinimalBitwidths());
assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");		assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");
VPlans.push_back(std::move(Plan));		VPlans.push_back(std::move(Plan));
}		}
VF = SubRange.End;		VF = SubRange.End;
}		}
}		}

// Add the necessary canonical IV and branch recipes required to control the		// Add the necessary canonical IV and branch recipes required to control the
[1,800 unchanged lines not shown]

llvm/lib/Transforms/Vectorize/VPlan.h

[269 unchanged lines not shown; enclosing scope: struct VPTransformState]
Value *get(VPValue *Def, const VPIteration &Instance);		Value *get(VPValue *Def, const VPIteration &Instance);

bool hasVectorValue(VPValue *Def, unsigned Part) {		bool hasVectorValue(VPValue *Def, unsigned Part) {
auto I = Data.PerPartOutput.find(Def);		auto I = Data.PerPartOutput.find(Def);
return I != Data.PerPartOutput.end() && Part < I->second.size() &&		return I != Data.PerPartOutput.end() && Part < I->second.size() &&
I->second[Part];		I->second[Part];
}		}

bool hasAnyVectorValue(VPValue *Def) const {
return Data.PerPartOutput.contains(Def);
}

Ayal (Done): How/Is this removal related?
fhahn (Done): The last user of this function has been removed in the patch.
Ayal (Not Done): Very well!
bool hasScalarValue(VPValue *Def, VPIteration Instance) {		bool hasScalarValue(VPValue *Def, VPIteration Instance) {
auto I = Data.PerPartScalars.find(Def);		auto I = Data.PerPartScalars.find(Def);
if (I == Data.PerPartScalars.end())		if (I == Data.PerPartScalars.end())
return false;		return false;
unsigned CacheIdx = Instance.Lane.mapToCacheIndex(VF);		unsigned CacheIdx = Instance.Lane.mapToCacheIndex(VF);
return Instance.Part < I->second.size() &&		return Instance.Part < I->second.size() &&
CacheIdx < I->second[Instance.Part].size() &&		CacheIdx < I->second[Instance.Part].size() &&
I->second[Instance.Part][CacheIdx];		I->second[Instance.Part][CacheIdx];
[2,763 unchanged lines not shown]

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

[53 unchanged lines not shown; enclosing scope: struct VPlanTransforms]

/// resulting plan to \p BestVF and \p BestUF. /// resulting plan to \p BestVF and \p BestUF.

static void optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF, static void optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF,

unsigned BestUF, unsigned BestUF,

PredicatedScalarEvolution &PSE); PredicatedScalarEvolution &PSE);

/// Apply VPlan-to-VPlan optimizations to \p Plan, including induction recipe /// Apply VPlan-to-VPlan optimizations to \p Plan, including induction recipe

/// optimizations, dead recipe removal, replicate region optimizations and /// optimizations, dead recipe removal, replicate region optimizations and

/// block merging. /// block merging.

static void optimize(VPlan &Plan, ScalarEvolution &SE); static void optimize(VPlan &Plan, ScalarEvolution &SE,

const MapVector<Instruction *, uint64_t> &MinBWs);

/// Wrap predicated VPReplicateRecipes with a mask operand in an if-then /// Wrap predicated VPReplicateRecipes with a mask operand in an if-then

/// region block and remove the mask operand. Optimize the created regions by /// region block and remove the mask operand. Optimize the created regions by

/// iteratively sinking scalar operands into the region, followed by merging /// iteratively sinking scalar operands into the region, followed by merging

/// regions until no improvements are remaining. /// regions until no improvements are remaining.

static void createAndOptimizeReplicateRegions(VPlan &Plan); static void createAndOptimizeReplicateRegions(VPlan &Plan);

/// Replace (ICMP_ULE, wide canonical IV, backedge-taken-count) checks with an /// Replace (ICMP_ULE, wide canonical IV, backedge-taken-count) checks with an

/// (active-lane-mask recipe, wide canonical IV, trip-count). If \p /// (active-lane-mask recipe, wide canonical IV, trip-count). If \p

Ayal (Done):

nit: a VPlan transform should fold redundant ZExt-Trunc pairs rather than leaving them ("as hints") to InstCombine.

As a public method that does not need SE, should truncateToMinimalBitwidths() be called directly by the caller of optimize(), right before that call, rather than passing MinBWs to optimize()?

Ayal (Done):

Thoughts on the above?
Better to truncate to minimal bitwidth ASAP, as it relies on IR information? Conceptually this is a scalar transform.
Does "as hints to InstCombine" below still hold?

Ayal (Done):

WDYT on the above: should the caller of optimize() precede its call with a direct call to truncateToMinimalBitwidths(), rather than pass MinBWs to optimize()?

fhahn (Done): Sounds good, updated, thanks!

/// UseActiveLaneMaskForControlFlow is true, introduce an /// UseActiveLaneMaskForControlFlow is true, introduce an

/// VPActiveLaneMaskPHIRecipe. If \p DataAndControlFlowWithoutRuntimeCheck is /// VPActiveLaneMaskPHIRecipe. If \p DataAndControlFlowWithoutRuntimeCheck is

/// true, no minimum-iteration runtime check will be created (during skeleton /// true, no minimum-iteration runtime check will be created (during skeleton

/// creation) and instead it is handled using active-lane-mask. \p /// creation) and instead it is handled using active-lane-mask. \p

/// DataAndControlFlowWithoutRuntimeCheck implies \p /// DataAndControlFlowWithoutRuntimeCheck implies \p

/// UseActiveLaneMaskForControlFlow. /// UseActiveLaneMaskForControlFlow.

static void addActiveLaneMask(VPlan &Plan, static void addActiveLaneMask(VPlan &Plan,

bool UseActiveLaneMaskForControlFlow, bool UseActiveLaneMaskForControlFlow,

bool DataAndControlFlowWithoutRuntimeCheck); bool DataAndControlFlowWithoutRuntimeCheck);

private: private:

/// Remove redundant VPBasicBlocks by merging them into their predecessor if /// Remove redundant VPBasicBlocks by merging them into their predecessor if

Ayal (Done):

/// Insert truncates and extends for any truncated recipe. Redundant casts

- /// will folded later.

+ /// will be folded later.

static void

fhahn (Done): Fixed, thanks!

/// the predecessor has a single successor. /// the predecessor has a single successor.

static bool mergeBlocksIntoPredecessors(VPlan &Plan); static bool mergeBlocksIntoPredecessors(VPlan &Plan);

/// Remove redundant casts of inductions. /// Remove redundant casts of inductions.

/// ///

/// Such redundant casts are casts of induction variables that can be ignored, /// Such redundant casts are casts of induction variables that can be ignored,

/// because we already proved that the casted phi is equal to the uncasted phi /// because we already proved that the casted phi is equal to the uncasted phi

/// in the vectorized loop. There is no need to vectorize the cast - the same /// in the vectorized loop. There is no need to vectorize the cast - the same

[12 unchanged lines not shown; enclosing scope: private section]

/// the needs of vector extracts. /// the needs of vector extracts.

static void optimizeInductions(VPlan &Plan, ScalarEvolution &SE); static void optimizeInductions(VPlan &Plan, ScalarEvolution &SE);

/// Remove redundant ExpandSCEVRecipes in \p Plan's entry block by replacing /// Remove redundant ExpandSCEVRecipes in \p Plan's entry block by replacing

/// them with already existing recipes expanding the same SCEV expression. /// them with already existing recipes expanding the same SCEV expression.

static void removeRedundantExpandSCEVRecipes(VPlan &Plan); static void removeRedundantExpandSCEVRecipes(VPlan &Plan);

}; };

Ayal (Done): Note: a VPlan-based InstCombine could take care of these "hints" by folding redundant extend-truncate pairs.

fhahn (Done): Agreed. I think we already have a few separate transforms that could fit into a general InstCombine-style transform.
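A rough, width-only model of the trunc(ext(A)) fold that such a transform (and the `Instruction::Trunc` case of `simplifyRecipe` further down) performs. All helper names are hypothetical, iN values are modelled as the low N bits of a `uint64_t`, and the branch for an operand wider than the truncated type is an assumption, since that part of the diff is collapsed in this view. The sketch only checks that each folded form produces the same bits as the original cast pair:

```cpp
#include <cassert>
#include <cstdint>

// Models an iN integer as the low N bits of a uint64_t (1 <= N <= 64).
uint64_t lowBits(uint64_t V, unsigned Bits) {
  return Bits >= 64 ? V : V & ((uint64_t(1) << Bits) - 1);
}

uint64_t zextFrom(uint64_t V, unsigned From) { return lowBits(V, From); }

uint64_t sextFrom(uint64_t V, unsigned From, unsigned To) {
  V = lowBits(V, From);
  if ((V >> (From - 1)) & 1) // sign bit set: fill bits [From, To) with ones
    V |= lowBits(~uint64_t(0), To) & ~lowBits(~uint64_t(0), From);
  return V;
}

uint64_t truncTo(uint64_t V, unsigned To) { return lowBits(V, To); }

// Original cast pair: trunc(ext(A : iABits to iExtBits) to iTruncBits).
// Assumes valid IR widths: ABits < ExtBits and TruncBits < ExtBits.
uint64_t unfolded(uint64_t A, unsigned ABits, unsigned ExtBits,
                  unsigned TruncBits, bool Signed) {
  uint64_t Ext = Signed ? sextFrom(A, ABits, ExtBits) : zextFrom(A, ABits);
  return truncTo(Ext, TruncBits);
}

// Folded form, chosen by comparing A's width with the trunc's result width.
uint64_t folded(uint64_t A, unsigned ABits, unsigned TruncBits, bool Signed) {
  if (ABits == TruncBits) // same type: the trunc is replaced by A itself
    return lowBits(A, ABits);
  if (ABits < TruncBits)  // a single, narrower extend of A (same ext opcode)
    return Signed ? sextFrom(A, ABits, TruncBits) : zextFrom(A, ABits);
  return truncTo(A, TruncBits); // assumed: A is wider, so truncate A directly
}
```

For example, `unfolded(0x80, 8, 32, 16, true)` sign-extends i8 -128 to i32 and truncates to i16, giving `0xFF80`, which is exactly what the single `sext` from i8 to i16 produces.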

Ayal (Done): The dead cast removal at the end of the current truncateToMinimalBitwidths() should already be taken care of by recipe DCE, right?

fhahn (Done): Yes, that should be taken care of.
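Recipe DCE subsumes the old clean-up loop because a ZExt with no users is dead like any other side-effect-free value. A minimal sketch of a backwards use-count sweep over a straight-line list; `Node` and `findDead` are hypothetical stand-ins and this is not VPlan's actual dead-recipe removal:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical straight-line list of recipes; operands refer to earlier nodes.
struct Node {
  std::vector<std::size_t> Operands; // indices of the defining nodes
  bool HasSideEffects = false;       // e.g. a store: never removed
};

// Returns, per node, whether it is dead. Walking backwards lets the removal
// of a dead user (say, an unused ZExt) expose its operands as dead as well.
std::vector<bool> findDead(const std::vector<Node> &Nodes) {
  std::vector<std::size_t> NumUsers(Nodes.size(), 0);
  for (const Node &N : Nodes)
    for (std::size_t Op : N.Operands)
      ++NumUsers[Op];
  std::vector<bool> Dead(Nodes.size(), false);
  for (std::size_t I = Nodes.size(); I-- > 0;) {
    if (Nodes[I].HasSideEffects || NumUsers[I] != 0)
      continue;
    Dead[I] = true;
    for (std::size_t Op : Nodes[I].Operands)
      --NumUsers[Op];
  }
  return Dead;
}
```

A chain such as value, trunc, zext with no final user is removed entirely in one sweep, while anything feeding a side-effecting node survives.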

} // namespace llvm } // namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_VPLANTRANSFORMS_H #endif // LLVM_TRANSFORMS_VECTORIZE_VPLANTRANSFORMS_H

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

[746 unchanged lines not shown; enclosing loop: for (VPFirstOrderRecurrencePHIRecipe *FOR : RecurrencePhis)]

// all users. // all users.

RecurSplice->setOperand(0, FOR); RecurSplice->setOperand(0, FOR);

} }

return true; return true;

} }

void VPlanTransforms::clearReductionWrapFlags(VPlan &Plan) { void VPlanTransforms::clearReductionWrapFlags(VPlan &Plan) {

for (VPRecipeBase &R : for (VPRecipeBase &R :

Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) { Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {

Ayal (Done):

nit: can return the type size in bits, as that is what is needed here. Op >> VPV?

Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type.

fhahn (Done):

Adjusted to return size in bits to simplify code, thanks!

> Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type.

Effectively adding scalar type info to all VPValues? Might be good to investigate separately, although the current use-cases would probably be very limited.

Ayal (Done):

>> Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type.

> Effectively adding scalar type info to all VPValues? Might be good to investigate separately, although the current use-cases would probably be very limited

Very well.

fhahn (Done): This has been updated to now use VPTypeAnalysis.

Ayal (Done): nit: VPValue *Op >> VPValue *VPV?

fhahn (Done): Updated, thanks!

auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&R); auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);

if (!PhiR) if (!PhiR)

continue; continue;

const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor(); const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();

RecurKind RK = RdxDesc.getRecurrenceKind(); RecurKind RK = RdxDesc.getRecurrenceKind();

if (RK != RecurKind::Add && RK != RecurKind::Mul) if (RK != RecurKind::Add && RK != RecurKind::Mul)

continue; continue;

Ayal (Done): nit: worth an empty line?

fhahn (Done): Added, thanks!

SmallSetVector<VPValue *, 8> Worklist; SmallSetVector<VPValue *, 8> Worklist;

Worklist.insert(PhiR); Worklist.insert(PhiR);

for (unsigned I = 0; I != Worklist.size(); ++I) { for (unsigned I = 0; I != Worklist.size(); ++I) {

VPValue *Cur = Worklist[I]; VPValue *Cur = Worklist[I];

if (auto *RecWithFlags = if (auto *RecWithFlags =

dyn_cast<VPRecipeWithIRFlags>(Cur->getDefiningRecipe())) { dyn_cast<VPRecipeWithIRFlags>(Cur->getDefiningRecipe())) {

RecWithFlags->dropPoisonGeneratingFlags(); RecWithFlags->dropPoisonGeneratingFlags();

Ayal (Done):

continue;

- auto *UI =

- cast_or_null<Instruction>(R.getVPSingleValue()->getUnderlyingValue());

+ VPValue *ResultVPV = R.getVPSingleValue();

+ auto *UI = cast_or_null<Instruction>(ResultVPV->getUnderlyingValue());

auto I = MinBWs.find(UI);

fhahn (Done): Updated, thanks!

} }

Ayal (Done): nit: is find() ok given a null UI?

fhahn (Done): Yes, I think so; the keys are pointers and they shouldn't be dereferenced.
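The answer rests on the fact that a map keyed by pointers only compares (or hashes) the pointer value during lookup. A minimal standard-library analogue, assuming `std::map` in place of LLVM's `MapVector`; `Instruction` here is a hypothetical empty stand-in and `lookupMinBW` an illustrative helper:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct Instruction {}; // hypothetical stand-in for llvm::Instruction

// Returns the recorded minimal bit width for I, or 0 when there is none.
// A null I is fine: std::map only compares the pointer keys, it never
// dereferences them.
uint64_t lookupMinBW(const std::map<const Instruction *, uint64_t> &MinBWs,
                     const Instruction *I) {
  auto It = MinBWs.find(I);
  return It == MinBWs.end() ? 0 : It->second;
}
```

A recipe without an underlying instruction therefore simply misses in the map and is skipped.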

for (VPUser *U : Cur->users()) { for (VPUser *U : Cur->users()) {

auto *UserRecipe = dyn_cast<VPRecipeBase>(U); auto *UserRecipe = dyn_cast<VPRecipeBase>(U);

Ayal (Done):

continue;

+ unsigned ResSizeInBits = GetSizeInBits(ResultVPV);

unsigned NewResSizeInBits = I->second;

fhahn (Done): Adjusted, thanks!

if (!UserRecipe) if (!UserRecipe)

continue; continue;

for (VPValue *V : UserRecipe->definedValues()) for (VPValue *V : UserRecipe->definedValues())

Worklist.insert(V); Worklist.insert(V);

Ayal (Done):

Type *ResTy = UI->getType();

- if (!ResTy->isIntegerTy() ||

- ResTy->getScalarSizeInBits() == NewResSizeInBits)

+ if (!ResTy->isIntegerTy() || ResSizeInBits == NewResSizeInBits)

continue;

fhahn (Done): Done, thanks!

} }

Ayal (Done):

nit: this can be checked first, instead of checking for single defined value.

Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on associated underlying instruction?

fhahn (Done):

Moved the check up, thanks!

> Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on associated underlying instruction?

Might be a potential follow-up, but we would still potentially need to update MinBWs on each recipe replacement?

Ayal (Not Done):

>> Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on associated underlying instruction?

> Might be a potential follow-up, but we would still potentially need to update MinBWs on each recipe replacement?

Sure, like updating any other property of a recipe when replaced.

Ayal (Not Done): Just noting a potential follow-up, possibly as a TODO somewhere: attach each MinBW to its recipe when the latter is created, supplementing its underlying inst.

} }

Ayal (Done):

nit: auto ResNewTyInBits = I->second;
nit: auto ResNewTy = IntegerType::get(ResTy->getContext(), ResNewTyInBits); ?

fhahn (Done): Added variables, thanks!

/// Returns true if \p V is constant one. /// Returns true if \p V is constant one.

static bool isConstantOne(VPValue *V) { static bool isConstantOne(VPValue *V) {

Ayal (Done): nit: does it suffice to check isa<> and continue to work with R instead of VPW?

fhahn (Done): Done, thanks!

if (!V->isLiveIn()) if (!V->isLiveIn())

Ayal (Done): UI is aka UV. Better call it UI from the start, as it's an Instruction* rather than Value*.

fhahn (Done): Renamed, thanks!

return false; return false;

auto *C = dyn_cast<ConstantInt>(V->getLiveInIRValue()); auto *C = dyn_cast<ConstantInt>(V->getLiveInIRValue());

return C && C->isOne(); return C && C->isOne();

} }

/// Returns the llvm::Instruction opcode for \p R. /// Returns the llvm::Instruction opcode for \p R.

Ayal (Done): UI->getType() is aka ResTy. Already early-continued if it was equal in size to I->second. Can it be smaller in size than I->second? If so worth early-continuing above, if not worth asserting?

fhahn (Done): Updated to use ResTy and replace check with assert, thanks!

static unsigned getOpcodeForRecipe(VPRecipeBase &R) { static unsigned getOpcodeForRecipe(VPRecipeBase &R) {

Ayal (Done): The operand of a SExt/ZExt must be smaller in size than its result, so if the result is at most I->second, so must its operand be?

fhahn (Done): Currently ResTy > NewResTy must hold, and I think the operand can also be >= NewResTy. There are also test cases exercising this path.

Ayal (Done):

case Instruction::ZExt: {

- assert(ResTy->getScalarSizeInBits() > NewResSizeInBits &&

- "Nothing to shrink?");

+ assert(ResSizeInBits > NewResSizeInBits && "Nothing to shrink?");

if (GetSizeInBits(R.getOperand(0)) >= NewResSizeInBits)

fhahn (Done): Updated, thanks!

Ayal (Done): nit: can set auto *Op = R.getOperand(0); for consistency with below.

Ayal (Done):

nit: can assert ResSizeInBits > NewResSizeInBits above, after early-continuing if they're equal.
Actually, they shouldn't even be equal (also compares?), assuming MinBWs is up-to-date and each insn is visited and optimized once. Current code also early-continues when equal, so replacing it with an assert can be done in a separate patch.

fhahn (Done): I think the assertion might not always hold, e.g., for truncate recipes.

if (auto *WidenR = dyn_cast<VPWidenRecipe>(&R)) if (auto *WidenR = dyn_cast<VPWidenRecipe>(&R))

return WidenR->getUnderlyingInstr()->getOpcode(); return WidenR->getUnderlyingInstr()->getOpcode();

Ayal (Done):

OK, operand < ResTy due to SExt/ZExt,
and NewResTy < ResTy due to MinBW.
NewResTy == ResTy cases should arguably be excluded from MinBWs? (independent of this patch)
Now if operand < NewResTy (< ResTy) then we SExt/ZExt the operand directly to NewResTy instead, and continue - why is the "Extend result to original width" part skipped in this case?
If OTOH operand > NewResTy a Trunc is needed rather than an Extend, and provided by subsequent code which is reached by break, followed by ZExt back to ResTy.
Otherwise if operand == NewResTy, the SExt/ZExt could be dropped, but we keep it and end up generating a redundant ZExt from R to ResTy - which have same sizes? It's probably ok because the knowledge that NewResTy bits suffice is already there, but would be good to clarify/clean up.

fhahn (Done):

> Now if operand < NewResTy (< ResTy) then we SExt/ZExt the operand directly to NewResTy instead, and continue - why is the "Extend result to original width" part skipped in this case?

In that case, the original (wider) cast is replaced by a new (narrower) cast and there's no need to truncate.

> If OTOH operand > NewResTy a Trunc is needed rather than an Extend, and provided by subsequent code which is reached by break, followed by ZExt back to ResTy.

Yep.

> Otherwise if operand == NewResTy, the SExt/ZExt could be dropped, but we keep it and end up generating a redundant ZExt from R to ResTy - which have same sizes? It's probably ok because the knowledge that NewResTy bits suffice is already there, but would be good to clarify/clean up.

Yes, at the moment we would generate redundant extend/trunc chains, which would indeed be good to clean up. I think we could fold those as a follow-up.
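The three operand-width cases discussed above for a shrunk SExt/ZExt can be summarised as a pure function over bit widths. This is a hypothetical sketch of the reasoning in this thread (`ExtShrinkAction` and `classifyExtShrink` are illustrative names), not the patch's code, which at this revision still emits the redundant cast in the equal-width case:

```cpp
#include <cassert>

// What to do with a SExt/ZExt (OpBits -> ResBits) whose result only needs
// NewResBits, following the cases discussed in this thread.
enum class ExtShrinkAction {
  NarrowerExtend,  // OpBits < NewResBits: extend the operand straight to NewResBits
  TruncateOperand, // OpBits > NewResBits: truncate the operand to NewResBits
  RedundantExtend  // OpBits == NewResBits: the extend is a no-op at the new width
};

ExtShrinkAction classifyExtShrink(unsigned OpBits, unsigned ResBits,
                                  unsigned NewResBits) {
  assert(ResBits > NewResBits && "Nothing to shrink?");
  assert(OpBits < ResBits && "an extend's operand is narrower than its result");
  if (OpBits < NewResBits)
    return ExtShrinkAction::NarrowerExtend;
  if (OpBits > NewResBits)
    return ExtShrinkAction::TruncateOperand;
  return ExtShrinkAction::RedundantExtend;
}
```

Per the thread, the `TruncateOperand` case is followed by a ZExt back to the original width, while `RedundantExtend` is where a later fold of extend/trunc chains would pay off.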

Ayal (Done):

>> Now if operand < NewResTy (< ResTy) then we SExt/ZExt the operand directly to NewResTy instead, and continue - why is the "Extend result to original width" part skipped in this case?

> In that case, the original (wider) cast is replaced by a new (narrower) cast and there's no need to truncate.

Yes, the extend-to-Res is replaced by a narrower extend-to-NewRes, but without another extend-back-to-Res to provide the original width, might it feed a user, say, a binary operation with mismatched operand sizes, where the other operand can also shrink to NewRes (as guaranteed by MinBWs) but was extended back to Res? I.e., should all shrunk values extend back to Res, or none of them? May need better test coverage.

fhahn (Done): Hm, I am not sure, but if MinBWs is set to a specific bit width, wouldn't this require all users to have the same minimal bit width for the value?

Ayal (Done):

Agreed, MinBW should specify a consistent minimal bit width for all users and for all operands, but there seems to be some discrepancy that is confusing:

A. Instructions whose operands and return value are all of a single type (excluding the condition operand of selects) are converted to operate on a narrower type by (a) shrinking their operands to the narrower type and (b) extending their result from the narrower type to their original type. Instructions that feed values to such instructions or use their values continue to feed and use values of the original type.
When one such instruction feeds another, a zext-trunc pair is added between them, which will later be folded.

B. Instructions that convert between two distinct types continue to digest the original source type but are updated to produce values of the new destination type. Their users, when reached subsequently, need to check if any of their operands have been narrowed. But if this is the case, why bother extending results in (b) above? OTOH, the narrowed results of conversion instructions could also be extended (to be folded later), keeping the treatment consistent and always expecting the new type to be strictly smaller than the current one. Perhaps conversion instructions could be skipped now and handled by a subsequent folding pass, looking for trunc-trunc and sext-trunc pairs in addition to zext-trunc ones?

C. Loads are ignored; are they excluded from MinBWs? They could potentially be narrowed to load only the required bits, though it's unclear whether a strided narrow load is better than a unit-strided wider load plus trunc, as in an interleave group(?)

D. Phis are ignored (excluded from MinBWs). Truncated header induction phis are handled separately. Other phis may deserve narrowing(?)
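Case A can be checked on a concrete example: if only the low 8 bits of a 32-bit add are demanded (which is what a MinBWs entry of 8 asserts), truncating the operands, adding in 8 bits and zero-extending back reproduces those bits. This is a plain C++ sketch with hypothetical function names, not the VPlan transform:

```cpp
#include <cassert>
#include <cstdint>

// Original: compute in 32 bits; only the low 8 bits are used downstream.
uint8_t wideThenTrunc(uint8_t A, uint8_t B) {
  uint32_t WA = A, WB = B;          // zext i8 -> i32
  uint32_t Sum = WA + WB;           // add i32
  return static_cast<uint8_t>(Sum); // trunc i32 -> i8 at the use
}

// Shrunk as in case A: trunc the operands, add in i8, zext the result back.
uint32_t shrunkThenExtend(uint8_t A, uint8_t B) {
  uint32_t WA = A, WB = B;                      // original i32 operands
  uint8_t NA = static_cast<uint8_t>(WA);        // (a) shrink operands to i8
  uint8_t NB = static_cast<uint8_t>(WB);
  uint8_t NSum = static_cast<uint8_t>(NA + NB); // add i8, wraps modulo 256
  return NSum;                                  // (b) zext i8 -> i32
}
```

Note that the high bits of the extended result differ from the wide sum (`shrunkThenExtend(200, 100)` is 44, not 300), which is why the rewrite is only valid when MinBWs guarantees those bits are unused.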

fhahn (Done): The latest version doesn't have special treatment for casts; they remain unchanged and VPlan recipe simplification will take care of folding them if possible.

if (auto *WidenC = dyn_cast<VPWidenCastRecipe>(&R)) if (auto *WidenC = dyn_cast<VPWidenCastRecipe>(&R))

Ayal (Done): nit: may look better to take R's opcode than UI's, but that requires casting it to VPWidenCastRecipe, so the isa above may be worth turning into a dyn_cast after all...

fhahn (Done): Updated, thanks!

return WidenC->getOpcode(); return WidenC->getOpcode();

if (auto *RepR = dyn_cast<VPReplicateRecipe>(&R)) if (auto *RepR = dyn_cast<VPReplicateRecipe>(&R))

return RepR->getUnderlyingInstr()->getOpcode(); return RepR->getUnderlyingInstr()->getOpcode();

if (auto *VPI = dyn_cast<VPInstruction>(&R)) if (auto *VPI = dyn_cast<VPInstruction>(&R))

return VPI->getOpcode(); return VPI->getOpcode();

return 0; return 0;

} }

/// Try to simplify recipe \p R. /// Try to simplify recipe \p R.

static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) { static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {

switch (getOpcodeForRecipe(R)) { switch (getOpcodeForRecipe(R)) {

case Instruction::Mul: { case Instruction::Mul: {

VPValue *A = R.getOperand(0); VPValue *A = R.getOperand(0);

Ayal (Done): assert Op > NewRes? What about the condition operand of select?

fhahn (Done):

Added assert, thanks!

Hmm, select would indeed be handled incorrectly, but I wasn't able to find a suitable test case. Removed VPWidenSelect for now, but will try to come up with a test case. Alternatively could leave select-handling in + assert to surface a test case, if one exists.

Ayal (Done): Current code seems to handle selects, and compares, as well as loads and phi's - extending only their result - although MinBWs seems to exclude them(?). So Blend and WidenMemory recipes need not be considered, neither should Replicate recipe - those are to retain their current BW (hence all should extend back to ResTy rather than shrinking all to NewResTy). Worth trying to check if all insns of MinBWs were considered somehow?

fhahn (Done):

Updated to also handle selects and replicate recipes. New tests should have been added a while ago.

I also added an assert checking if the number of processed instructions matches MinBWs.size().
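In LLVM IR the condition of a `select` is operand 0 and has type i1, so only the two value operands are candidates for truncation. A minimal sketch of that operand filter, including the `Op > NewRes` assertion requested above; `Opcode`, `RecipeModel` and `operandsToShrink` are hypothetical stand-ins for VPlan recipes:

```cpp
#include <cassert>
#include <vector>

enum class Opcode { Add, Select };

// Hypothetical stand-in for a widened recipe: its opcode and operand widths.
struct RecipeModel {
  Opcode Op;
  std::vector<unsigned> OperandBits;
};

// Indices of the operands that need a trunc to NewResBits.
std::vector<unsigned> operandsToShrink(const RecipeModel &R, unsigned NewResBits) {
  std::vector<unsigned> Idx;
  // Operand 0 of a select is its i1 condition, not a data operand.
  unsigned Start = R.Op == Opcode::Select ? 1 : 0;
  for (unsigned I = Start; I < R.OperandBits.size(); ++I) {
    if (R.OperandBits[I] == NewResBits)
      continue; // already at the narrow width
    assert(R.OperandBits[I] > NewResBits && "operand narrower than new result?");
    Idx.push_back(I);
  }
  return Idx;
}
```

Without the `Start` adjustment, the 1-bit condition of a select would trip the assertion, which is one way a missing select test case could surface.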

VPValue *B = R.getOperand(1); VPValue *B = R.getOperand(1);

if (isConstantOne(A)) if (isConstantOne(A))

Ayal (Done):

continue;

- auto *Shrunk = new VPWidenCastRecipe(

- Instruction::Trunc, Op, IntegerType::get(Ctx, NewResSizeInBits));

+ auto *Shrunk = new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy);

R.setOperand(Idx, Shrunk);

fhahn (Done): Updated, thanks!

return R.getVPSingleValue()->replaceAllUsesWith(B); return R.getVPSingleValue()->replaceAllUsesWith(B);

Ayal (Done): nit: first take care of creating and inserting Shrunk, then take care of R's flags drop and operand set?

fhahn (Done): Done, thanks!

if (isConstantOne(B)) if (isConstantOne(B))

return R.getVPSingleValue()->replaceAllUsesWith(A); return R.getVPSingleValue()->replaceAllUsesWith(A);

break; break;

Ayal (Done): R.getOperand(Idx) is aka Op.

fhahn (Done): Done, thanks!

} }

case Instruction::Trunc: { case Instruction::Trunc: {

VPRecipeBase *Ext = R.getOperand(0)->getDefiningRecipe(); VPRecipeBase *Ext = R.getOperand(0)->getDefiningRecipe();

if (!Ext) if (!Ext)

break; break;

Ayal (Done):

// Extend result to original width.

- auto *Ext =

- new VPWidenCastRecipe(Instruction::ZExt, R.getVPSingleValue(), ResTy);

+ auto *Ext = new VPWidenCastRecipe(Instruction::ZExt, ResultVPV, ResTy);

ResultVPV->replaceAllUsesWith(Ext);

fhahn (Done): Done, thanks!

unsigned ExtOpcode = getOpcodeForRecipe(*Ext); unsigned ExtOpcode = getOpcodeForRecipe(*Ext);

if (ExtOpcode != Instruction::ZExt && ExtOpcode != Instruction::SExt) if (ExtOpcode != Instruction::ZExt && ExtOpcode != Instruction::SExt)

Ayal (Done):

ResultVPV->replaceAllUsesWith(Ext);

- Ext->setOperand(0, R.getVPSingleValue());

+ Ext->setOperand(0, ResultVPV);

Ext->insertAfter(&R);

fhahn (Done): Updated, thanks!

break; break;

Ayal (Done):

nit: define auto *RVPValue = R.getVPSingleValue() above?

Would be good to have a common base class for all recipes having a single value, as this amounts to a cast.

fhahn (Done):

> nit: define auto *RVPValue = R.getVPSingleValue() above?

Done, thanks!

> Would be good to have a common base class for all recipes having a single value, as this amounts to a cast.

Yes, I think that came up in earlier patches as well.

VPValue *A = Ext->getOperand(0);

VPValue *Trunc = R.getVPSingleValue();

Type *TruncTy = TypeInfo.inferScalarType(Trunc);

Ayal: Other insertions of shrunk operands and smaller extends are placed before R; this one is placed after - and calls for make_early_inc_range, right?

fhahn: Yep.

Type *ATy = TypeInfo.inferScalarType(A);

if (TruncTy == ATy) {

Trunc->replaceAllUsesWith(A);

} else if (ATy->getScalarSizeInBits() < TruncTy->getScalarSizeInBits()) {

auto *VPC =

new VPWidenCastRecipe(Instruction::CastOps(ExtOpcode), A, TruncTy);

VPC->insertBefore(&R);

Trunc->replaceAllUsesWith(VPC);

static void simplifyRecipes(VPlan &Plan, LLVMContext &Ctx) {

VPTypeAnalysis TypeInfo(Ctx);

for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {

for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {

simplifyRecipe(R, TypeInfo);

}
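The Instruction::Trunc case of simplifyRecipe above folds trunc(ext(A)) purely by comparing scalar bit widths. A minimal sketch of the two cases visible in this diff, assuming plain bit widths stand in for VPlan types; CastOp, Folded, foldTruncOfExt and narrowerExtendMatches are hypothetical names, not VPlan API, and the case where A is wider than the truncated type is deliberately left unfolded:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model: widths replace VPlan types; this is not the VPlan API.
enum class CastOp { ZExt, SExt };

struct Folded {
  bool UseA;       // true: replace trunc(ext(A)) with A itself
  CastOp ExtOp;    // when !UseA: emit this (narrower) extend of A
  unsigned ToBits; // width of the replacement value
};

// Fold trunc(ext(A)) to TruncBits, where A has ABits and ext is ZExt/SExt.
std::optional<Folded> foldTruncOfExt(CastOp ExtOp, unsigned ABits,
                                     unsigned TruncBits) {
  if (TruncBits == ABits)
    return Folded{true, ExtOp, ABits};
  if (ABits < TruncBits)
    return Folded{false, ExtOp, TruncBits};
  return std::nullopt; // A wider than the trunc result: not modeled here
}

// Exhaustive check over all i8 values that trunc(ext(A) to i32) to i16
// equals ext(A) to i16, for both zero- and sign-extension.
bool narrowerExtendMatches() {
  for (int V = 0; V < 256; ++V) {
    uint8_t A = static_cast<uint8_t>(V);
    if (static_cast<uint16_t>(static_cast<uint32_t>(A)) !=
        static_cast<uint16_t>(A))
      return false;
    int8_t S = static_cast<int8_t>(A);
    if (static_cast<int16_t>(static_cast<int32_t>(S)) !=
        static_cast<int16_t>(S))
      return false;
  }
  return true;
}
```

The extend opcode is carried over unchanged in the narrower case, mirroring Instruction::CastOps(ExtOpcode) in the diff.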

- void VPlanTransforms::optimize(VPlan &Plan, ScalarEvolution &SE) {

/// Insert truncates and extends for any truncated instructions as hints to

/// InstCombine.

Ayal: nit: are these still hints to InstCombine, or to subsequent VPlan cleanups?

fhahn: Updated, thanks!

static void

truncateToMinimalBitwidths(VPlan &Plan,

const MapVector<Instruction *, uint64_t> &MinBWs,

Ayal:

auto GetSizeInBits = [](VPValue *VPV) {
- auto *UV = VPV->getUnderlyingValue();
- if (UV)
+ if (auto *UV = VPV->getUnderlyingValue())
return UV->getType()->getScalarSizeInBits();

nit

fhahn: Code has been moved to D159202.

VPTypeAnalysis &TypeInfo) {

#ifndef NDEBUG

unsigned ProcessedRecipes = 0;

Ayal: nit: ProcessedRecipesNum?

fhahn: Changed to NumProcessedRecipes.

Ayal: ProcessedTruncs is used outside the ifdef below; move its definition out of the ifdef here? Or is it meant to ensure truncated operands are counted only once by ProcessedRecipes, for debugging only? If an operand is truncated multiple times, all its truncations must be to the same size, because "MinBW should specify a consistent minimal bit width for all users(, and for all operands)"?

Worth explaining why processed truncs are recorded.

fhahn: It's to re-use previously generated truncates. Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated).

Moved out of ifdef.

Ayal:
> Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated)

Very well, may deserve a comment.

fhahn: Added a comment to ProcessedTruncs definition.
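The rationale for ProcessedTruncs (reuse one truncate per operand instead of RAUW-ing the operand, since other users may not be narrowed yet) can be sketched with a standard map. This is a hypothetical, self-contained model, not the VPlan classes: Value, TruncCast and TruncCache are made-up stand-ins, and std::unordered_map::try_emplace plays the role of the single-lookup insert({Op, nullptr}) discussed later in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for VPValue and a trunc VPWidenCastRecipe.
struct Value {
  unsigned Bits;
};
struct TruncCast : Value {
  const Value *Src;
  TruncCast(const Value *Src, unsigned Bits) : Value{Bits}, Src(Src) {}
};

// Creates at most one truncate per operand. The operand itself is never
// RAUW'd, so users that are not being narrowed keep seeing the wide value.
class TruncCache {
  std::unordered_map<const Value *, TruncCast *> Cache;
  std::vector<std::unique_ptr<TruncCast>> Owned;

public:
  TruncCast *getOrCreate(const Value *Op, unsigned NewBits) {
    // One hash lookup: inserts a null placeholder only if Op is new.
    auto [It, Inserted] = Cache.try_emplace(Op, nullptr);
    if (Inserted) {
      Owned.push_back(std::make_unique<TruncCast>(Op, NewBits));
      It->second = Owned.back().get();
    }
    // MinBWs is expected to give a consistent width for all users.
    assert(It->second->Bits == NewBits && "inconsistent minimal bit width");
    return It->second;
  }
  std::size_t size() const { return Cache.size(); }
};
```

A second request for the same operand returns the existing truncate, which is the reuse the comment on ProcessedTruncs describes.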

DenseMap<VPValue *, VPWidenCastRecipe *> ProcessedTruncs;

Ayal:

return UV->getType()->getScalarSizeInBits();
- if (auto *VPC = dyn_cast<VPWidenCastRecipe>(VPV)) {
+ if (auto *VPC = dyn_cast<VPWidenCastRecipe>(VPV))
return VPC->getResultType()->getScalarSizeInBits();
- }
llvm_unreachable("trying to get type of a VPValue without type info");

nit

fhahn: Code has been moved to D159202.

#endif

VPBasicBlock *PH =

Ayal: Should PH be skipped? Trying to shrink the (live-in) operands of recipes in PH will insert them at the end of PH...

fhahn: Good point, there should be nothing to shrink in PH for now, as the analysis is for the loop body only. Adjusted!

cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());

ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(

Ayal: Define ProcessedRecipes only for debug?

/// First truncate live-ins that represent relevant Instructions.

fhahn: Wrapped and added comment, thanks!

Ayal: Suffice to ask if (!NewResSizeInBits)?

fhahn: This code has now been removed; LiveIns are handled when truncating the other operands of an instruction; otherwise we leave the type info in an inconsistent state.

Plan.getEntry());

Ayal: (Future) Thought: wonder if instead of iterating over all live-ins looking to truncate any, it may be better to iterate over MinBWs and check if any are live-ins. Or lookup MinBWs upon construction of a live-in.

Ayal: Thoughts about the above? Hopefully this avoids exposing getLiveIns(), at the expense of holding a mapping between Values and LiveIns, as in LiveOuts.

fhahn: LiveIns are now handled directly when truncating other operands; getLiveIns has been removed.

Ayal:

#endif
- VPBasicBlock *PH =
- cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
- ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(
- Plan.getEntry());
+ VPBasicBlock *PH = Plan.getEntry();
+ ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(PH);
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {

?

fhahn: Simplified, thanks!

Ayal: Shrunk operands are placed before R, but its extension is placed after - and calls for this make_early_inc_range, right?

fhahn: Yep.

for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {

Ayal: nit: use LiveInInst or something similar rather than UI?

fhahn: Renamed, thanks!

for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {

Ayal: Would `MinBWs.lookup(UI)` look better? Returning zero clearly indicates not found.

fhahn: Updated, thanks!

if (!isa<VPWidenRecipe, VPWidenCastRecipe, VPReplicateRecipe,

Ayal: assert "MinBW member must be integer" rather than continue, which would skip a MinBW member.

fhahn: Turned into an assert, thanks!

VPWidenSelectRecipe>(&R))

continue;

VPValue *ResultVPV = R.getVPSingleValue();

Ayal:

unsigned NewResSizeInBits = I->second;
- Type *ResTy = VPV->getLiveInIRValue()->getType();
+ Type *ResTy = UI->getType();
if (!ResTy->isIntegerTy())

fhahn: Adjusted, thanks!

auto *UI = cast_or_null<Instruction>(ResultVPV->getUnderlyingValue());

Ayal: Can this happen? Continuing will lose a member of MinBWs; better assert instead?

fhahn: Turned into assert, thanks!

unsigned NewResSizeInBits = MinBWs.lookup(UI);

if (!NewResSizeInBits)

continue;

#ifndef NDEBUG

ProcessedRecipes++;

Ayal:

auto *Shrunk = new VPWidenCastRecipe(Instruction::Trunc, VPV, NewResTy);
- VPBasicBlock *PH = dyn_cast<VPBasicBlock>(
+ VPBasicBlock *PH = cast<VPBasicBlock>(
Plan.getVectorLoopRegion()->getSinglePredecessor());

Set once before the loop for all live-ins to be truncated.

fhahn: Hoisted, thanks!

#endif

// If the value wasn't vectorized, we must maintain the original scalar

Ayal: Just note that the counting of ProcessedRecipes may miss casts that fail to be processed later.

fhahn: Do you mean updating the comment here or just a general note? We need to include the recipes in the count, otherwise the verification later will fail.

Ayal: I mean we count casts as if they are processed, expecting they will be later, w/o checking that they actually are.

fhahn: They don't need handling explicitly, as redundant casts will be removed later. Expanded the comment slightly to:

> Also skip casts which do not need to be handled explicitly here, as redundant casts will be removed during recipe simplification.

// type. Skip those here, after incrementing ProcessedRecipes. Also skip

// casts, as redundant casts will be removed during recipe simplification.

if (isa<VPReplicateRecipe, VPWidenCastRecipe>(&R))

Ayal: Can skip phis, none are included in MinBWs.

fhahn: There's an early continue now that skips phis and other unsupported recipes.

continue;

Ayal: Are any loads included in MinBWs, or is this dead code? Stores of course are irrelevant.

fhahn: Nope, looks like this is not needed in the latest version.

Ayal: Is OldResSizeInBits equal to the size of OldResTy, for the non-cast Widen or Select R?

fhahn: Yes, I forgot to remove this use of IR getType. Updated to use TypeInfo.inferScalarType(ResultVPV) and then getScalarSizeInBits of the returned type.

Ayal: Ah, ok, wondered if using the size of the type of UI directly would be simpler?

fhahn: It might be slightly simpler, but it could lead to a crash further down the line once we support recipes without underlying values/instructions (and forget to update this line), or if some other transform adjusted the type. Left as is for now.

unsigned OldResSizeInBits =

TypeInfo.inferScalarType(ResultVPV)->getScalarSizeInBits();

Ayal: Any order other than depth first would also do, right?

fhahn: Yes, I think the order doesn't matter here.

Ayal: But a (more) expensive RPOT order is needed, to handle defs before uses?

fhahn: The latest version should not need RPO, as the bit width of the results does not change for any user (previously it might, due to early cast simplifications). Changed to depth first.

Type *OldResTy = UI->getType();

assert(OldResTy->isIntegerTy() && "only integer types supported");

Ayal: Should be the same Ctx passed in as parameter?

fhahn: Yes, fixed!

assert(OldResSizeInBits > NewResSizeInBits && "Nothing to shrink?");

LLVMContext &Ctx = OldResTy->getContext();

auto *NewResTy = IntegerType::get(Ctx, NewResSizeInBits);

// Shrink operands by introducing truncates as needed.

unsigned StartIdx = isa<VPWidenSelectRecipe>(&R) ? 1 : 0;

for (unsigned Idx = StartIdx; Idx != R.getNumOperands(); ++Idx) {

Ayal: Suffice to ask if (!NewResSizeInBits)?

fhahn: Simplified, thanks!

auto *Op = R.getOperand(Idx);

Ayal: (Future) Thought: this is an awkward way of retrieving "the" recipe that corresponds to each member of MinBWs - look through all recipes for those having the desired "underlying" insn. Perhaps better lookup MinBWs upon construction of a recipe for an Instruction. Or migrate the analysis that builds MinBWs to run on VPlan.

Ayal: Thoughts about the above?

fhahn: I think it would be best to have the analysis based on VPlan. Building MinBWs early would probably require extra work to update/invalidate it during transforms.

unsigned OpSizeInBits =

Ayal: nit: lookup.

fhahn: Done, thanks!

TypeInfo.inferScalarType(Op)->getScalarSizeInBits();

if (OpSizeInBits == NewResSizeInBits)

continue;

Ayal: Ins? Perhaps ProcessedTrunc?

fhahn: Updated, thanks!

assert(OpSizeInBits > NewResSizeInBits && "nothing to truncate");

Ayal: Handle the simple if !ins.second /* Op already processed */ case first, potentially early-continuing?

Clearer to check if ProcessedTruncs.lookup(Op) or if ProcessedTruncs.contains(Op) and if so use ProcessedTruncs[Op], otherwise insert it?

fhahn: Early continue would mean duplicating the code to update the operands, so I left things as is for now, including using insert. insert means we only need to look up the insert position once, vs. 2 lookups with a separate lookup followed by `[]`. WDYT?

Ayal: OK, WDYT of something like the following:

        auto [ProcessedIter, DidNotExist] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = DidNotExist ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!DidNotExist)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          NewOp->insertBefore(&R);
        } else {
          PH->appendRecipe(NewOp);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }

Ayal: Maybe IterIsEmpty would be a better name, to avoid double negation, as in:

        auto [ProcessedIter, IterIsEmpty] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = IterIsEmpty ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!IterIsEmpty)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          NewOp->insertBefore(&R);
        } else {
          PH->appendRecipe(NewOp);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }

auto Ins = ProcessedTruncs.insert({Op, nullptr});

Ayal: Would be good to comment how memory and replicate cases are (not) processed.

fhahn: Added a comment, thanks!

if (Ins.second) {

auto Shrunk = new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy);

Ayal: Should replicate recipes be handled next to handling widen memory recipes above?

fhahn: We still need to count them for verification.

Ayal: nit: place simpler if !isLiveIn case first?

fhahn: Done, thanks!

Ins.first->second = Shrunk;

if (Op->isLiveIn()) {

#ifndef NDEBUG

ProcessedRecipes +=

Ayal: Better assert than continue? Here ProcessedRecipes was already bumped, but should all MinBWs members correspond to Integer types, of distinct (smaller) size, whether live-in or not?

fhahn: Turned isIntegerTy into an assert but retained the size check, as there are entries where the sizes are the same (e.g., for truncs).

Ayal: nit: ResTy >> OldResTy, ResSizeInBits >> OldResSizeInBits

fhahn: Renamed, thanks!

MinBWs.contains(dyn_cast<Instruction>(Op->getLiveInIRValue()));

Ayal: Is it possible for MinBWs not to contain Op's live-in IR value in this case?

fhahn: Yes, MinBWs only contains instructions, but not other values like arguments. Added a clarifying assert.

Ayal:

#ifndef NDEBUG
- bool IsContained =
- MinBWs.contains(dyn_cast<Instruction>(Op->getLiveInIRValue()));
+ auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
+ bool IsContained = MinBWs.contains(OpInst);
+ assert((!OpInst || IsContained) && "...");
ProcessedRecipes += IsContained;
- assert((IsContained || !isa<Instruction>(Op->getLiveInIRValue())) &&
- "All processed instructions should be contained in MinBWs.");

nit

#endif

PH->appendRecipe(Shrunk);

Ayal: assert(ResSizeInBits > NewResSizeInBits && "Nothing to shrink?"); here instead of below?

fhahn: Done, and also removed continue.

} else {

Shrunk->insertBefore(&R);

}

Ayal: Note that truncations of live-ins could also be inserted before R, thereby leaving the treatment of live-ins to debugging only, and leaving their LICM and commoning to a subsequent VPlan cleanup pass, along with trunc-zext foldings.

fhahn: Yep. For now it is simpler, and results in a smaller test diff, to do it directly there, as it is not only LICM but also very simple CSE.

}

R.setOperand(Idx, Ins.first->second);

}

Ayal: nit: VPC >> OldExt, Opc >> OldOpc?

fhahn: This code is now gone, handled by recipe simplification.

Ayal: This deals only with ZExt/SExt, easier to check directly if Opcode is one or the other?

OTOH, better handle Trunc here as well? Is it handled well below?

fhahn: Thanks, changed to if. I don't think Trunc is handled explicitly in the latest version.

Ayal: Does Trunc (which can truncate to a smaller bitwidth) implicitly fall through and have its operand shrunk to the smaller bitwidth, effectively turning it into a ZExt?

// Any wrapping introduced by shrinking this operation shouldn't be

// considered undefined behavior. So, we can't unconditionally copy

// arithmetic wrapping flags to VPW.
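Why wrap flags cannot simply be copied once an operation is shrunk: the narrow operation may wrap even though the wide one did not, while the low bits that MinBWs says are needed still agree. A minimal sketch with plain unsigned arithmetic, using an add nuw narrowed from 32 to 8 bits as an illustrative case; lowBits, addWrapsUnsigned and addInWidth are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low Bits bits of V (1 <= Bits <= 64).
uint64_t lowBits(uint64_t V, unsigned Bits) {
  return Bits == 64 ? V : (V & ((uint64_t(1) << Bits) - 1));
}

// Would an unsigned add of A and B wrap at this width? If so, an `add nuw`
// of that width yields poison. Assumes A and B already fit in Bits bits.
bool addWrapsUnsigned(uint64_t A, uint64_t B, unsigned Bits) {
  uint64_t Max = lowBits(UINT64_MAX, Bits);
  return A > Max - B;
}

// The add performed in a Bits-wide integer type (modular arithmetic).
uint64_t addInWidth(uint64_t A, uint64_t B, unsigned Bits) {
  return lowBits(A + B, Bits);
}
```

For 200 + 100, the 32-bit add does not wrap but the 8-bit one does, so keeping nuw on the narrow add would produce poison; its low 8 bits (44) still equal the truncated wide result. This is why the shrunk recipe has its poison-generating flags dropped.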

Ayal: Comment is obsolete here: this deals with the new type being equal to the operand type, which should result in replacing the SExt/ZExt with its operand, see below.

fhahn: Code is gone now.

if (auto *VPW = dyn_cast<VPRecipeWithIRFlags>(&R))

Ayal:

// SExt/Zext is redundant - stick with its operand.

?

fhahn: This check has been moved up and is not needed any longer.

Ayal:

// SExt/Zext is redundant - stick with its operand.
- Instruction::CastOps Opcode = VPC->getOpcode();
+ Instruction::CastOps NewOpc = Opc;
VPValue *Op = R.getOperand(0);

?

fhahn: Code now gone.

VPW->dropPoisonGeneratingFlags();

// Extend result to original width.

Ayal:

#endif
}
- R.setOperand(Idx, ProcessedIter->second);
}
// Any wrapping introduced by shrinking this operation shouldn't be

redundant - hoist above the early-continue.

fhahn: Fixed in the committed version, thanks!

auto *Ext = new VPWidenCastRecipe(Instruction::ZExt, ResultVPV, OldResTy);

Ayal: nit: C >> NewCast?

If getTypeSizeInBits(Op) == NewResSizeInBits, should C be set to Op (w/o inserting it) instead of creating a redundant cast?

fhahn: Code gone now.

Ext->insertAfter(&R);

ResultVPV->replaceAllUsesWith(Ext);

Ext->setOperand(0, ResultVPV);

Ayal: Place assert earlier?

fhahn: Moved up, thanks!

}

Ayal:

auto *C = new VPWidenCastRecipe(Opcode, Op, NewResTy);
- C->insertBefore(&R);
- ResultVPV->replaceAllUsesWith(C);
+ C->insertBefore(VPC);
+ VPC->replaceAllUsesWith(C);
continue;

fhahn: Adjusted, thanks!

assert(MinBWs.size() == ProcessedRecipes &&

"some entries in MinBWs haven't been processed");

}
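The overall effect of the transform can be checked on the pattern in the deterministic-type-shrinkage.ll CHECK lines: i8 loads zero-extended, multiplied, shifted right by 8 and truncated back to i8, with the vector code computing in i16. A minimal scalar sketch comparing a 32-bit computation (an assumption about the scalar source, made for illustration) with the 16-bit one; mulHighWide, mulHighNarrow and narrowMatchesWide are hypothetical names. The mul nuw at 16 bits is valid because 255 * 255 = 65025 < 65536.

```cpp
#include <cassert>
#include <cstdint>

// Assumed scalar form: zext i8 -> i32, mul, lshr 8, trunc to i8.
uint8_t mulHighWide(uint8_t A, uint8_t B) {
  uint32_t Prod = uint32_t(A) * uint32_t(B);
  return static_cast<uint8_t>(Prod >> 8);
}

// Shrunk form: the same operations carried out in i16.
uint8_t mulHighNarrow(uint8_t A, uint8_t B) {
  uint16_t Prod = static_cast<uint16_t>(uint16_t(A) * uint16_t(B));
  return static_cast<uint8_t>(Prod >> 8);
}

// Exhaustively compare both forms over all pairs of i8 inputs.
bool narrowMatchesWide() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      if (mulHighWide(static_cast<uint8_t>(A), static_cast<uint8_t>(B)) !=
          mulHighNarrow(static_cast<uint8_t>(A), static_cast<uint8_t>(B)))
        return false;
  return true;
}
```

The exhaustive loop is cheap here (65536 pairs) and stands in for what the minimal-bitwidth analysis establishes statically.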

void VPlanTransforms::optimize(

VPlan &Plan, ScalarEvolution &SE,

const MapVector<Instruction *, uint64_t> &MinBWs) {

removeRedundantCanonicalIVs(Plan);

removeRedundantInductionCasts(Plan);

optimizeInductions(Plan, SE);

VPTypeAnalysis TypeInfo(SE.getContext());

Ayal: This means the size of all operands is equal to NewResSizeInBits; can this be?

fhahn: There are cases where a ZExt narrowed earlier is used as an operand here, so the type is already adjusted.

Ayal: Maybe worth a comment.
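The operand loop discussed here (StartIdx skips the select condition, operands whose width already equals NewResSizeInBits are left alone, e.g. an earlier narrowed ZExt, and the rest get a truncate) can be modeled on bit widths alone. A hypothetical sketch; operandsToTruncate is not part of VPlan:

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the operand-shrinking loop: given the scalar widths
// of a recipe's operands, return the operand indices that need a new trunc.
std::vector<unsigned> operandsToTruncate(const std::vector<unsigned> &OpBits,
                                         unsigned NewBits, bool IsSelect) {
  std::vector<unsigned> Result;
  // For a select, operand 0 is the i1 condition and is left untouched.
  unsigned StartIdx = IsSelect ? 1 : 0;
  for (unsigned Idx = StartIdx; Idx != OpBits.size(); ++Idx) {
    // Already at the new width (e.g. produced by an earlier narrowed zext).
    if (OpBits[Idx] == NewBits)
      continue;
    assert(OpBits[Idx] > NewBits && "nothing to truncate");
    Result.push_back(Idx);
  }
  return Result;
}
```

A recipe whose operands all already have the new width ends up with no new truncates at all, which is the situation the question above is about.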

if (!Plan.hasVF(ElementCount::getFixed(1)))

truncateToMinimalBitwidths(Plan, MinBWs, TypeInfo);

Ayal:

optimizeInductions(Plan, SE);
- VPTypeAnalysis TypeInfo(SE.getContext());
- if (!Plan.hasVF(ElementCount::getFixed(1)))
+ if (!Plan.hasVF(ElementCount::getFixed(1))) {
+ VPTypeAnalysis TypeInfo(SE.getContext());
truncateToMinimalBitwidths(Plan, MinBWs, TypeInfo);
+ }
simplifyRecipes(Plan, SE.getContext());

nit

fhahn: Done, thanks! This also limits the scope of TypeInfo to the range where it is valid. After `truncateToMinimalBitwidths`, we would otherwise need to invalidate the info for the modified recipes. This can be done in the future.

Ayal: Very well. Worth commenting that TypeInfo should not be used following truncateToMinimalBitwidths.

fhahn: Sunk further into truncateToMinimalBitwidths.
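The reason for confining TypeInfo to truncateToMinimalBitwidths is that a memoizing type analysis can return stale answers once recipes change width. A hypothetical sketch of that hazard; Node and WidthCache are illustrative stand-ins, not the interface of VPTypeAnalysis:

```cpp
#include <cassert>
#include <unordered_map>

struct Node {
  unsigned Bits;
};

// Memoizes each node's width on first query, like a cached type analysis.
class WidthCache {
  std::unordered_map<const Node *, unsigned> Cache;

public:
  unsigned infer(const Node *N) {
    // Only computes and stores N->Bits if N has not been queried before.
    return Cache.try_emplace(N, N->Bits).first->second;
  }
  void invalidate(const Node *N) { Cache.erase(N); }
};
```

After a node is narrowed, the cache keeps reporting the old width until the entry is invalidated; limiting the analysis's lifetime sidesteps that bookkeeping.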

simplifyRecipes(Plan, SE.getContext());

Ayal:

removeRedundantCanonicalIVs(Plan);
removeRedundantInductionCasts(Plan);
- optimizeInductions(Plan, SE);
+ optimizeInductions(Plan, SE);
simplifyRecipes(Plan, SE.getContext());

nit: redundant move of empty line?

fhahn: Changed back, thanks!

removeDeadRecipes(Plan);

Ayal:

auto *Shrunk = new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy);
- R.setOperand(Idx, Shrunk);
Shrunk->insertBefore(&R);
+ R.setOperand(Idx, Shrunk);
}
if (auto *VPW = dyn_cast<VPRecipeWithIRFlags>(&R))

nit: keep consistent with above.

fhahn: Adjusted, thanks!

createAndOptimizeReplicateRegions(Plan);

removeRedundantExpandSCEVRecipes(Plan);

mergeBlocksIntoPredecessors(Plan);

}

// Add a VPActiveLaneMaskPHIRecipe and related recipes to \p Plan and replace

// the loop terminator with a branch-on-cond recipe with the negated

// active-lane-mask as operand. Note that this turns the loop into an

Ayal:

auto *Ext = new VPWidenCastRecipe(Instruction::ZExt, ResultVPV, ResTy);
- ResultVPV->replaceAllUsesWith(Ext);
- Ext->setOperand(0, ResultVPV);
Ext->insertAfter(&R);
+ Ext->setOperand(0, ResultVPV);
+ ResultVPV->replaceAllUsesWith(Ext);
}

nit: keep consistent with above.

fhahn: Reordered, thanks!
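The relative order of the RAUW and the operand reset matters because the new extend is itself a user of ResultVPV: a replaceAllUsesWith performed after setOperand(0, ResultVPV) also rewrites the extend's own operand, making it use itself. The committed lines ResultVPV->replaceAllUsesWith(Ext); Ext->setOperand(0, ResultVPV); avoid this. A hypothetical def-use model (plain structs, not VPlan's VPUser/VPValue) showing both orders:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical def-use model: each node lists the nodes it uses.
struct Node {
  std::vector<Node *> Ops;
};

// Rewrite every use of Old among All to refer to New.
void replaceAllUsesWith(std::vector<Node *> &All, Node *Old, Node *New) {
  for (Node *N : All)
    std::replace(N->Ops.begin(), N->Ops.end(), Old, New);
}

// RAUW first, then point Ext back at R (Ext models zext R).
void rauwThenResetOperand(std::vector<Node *> &All, Node *R, Node *Ext) {
  replaceAllUsesWith(All, R, Ext);
  Ext->Ops[0] = R;
}

// Resetting the operand first: the later RAUW makes Ext use itself.
void resetOperandThenRauw(std::vector<Node *> &All, Node *R, Node *Ext) {
  Ext->Ops[0] = R;
  replaceAllUsesWith(All, R, Ext);
}
```

In both orders the other users end up reading the extend; only the RAUW-first order leaves the extend reading the narrow result.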

// uncountable one. Only the existing terminator is replaced, all other existing

// recipes/users remain unchanged, except for poison-generating flags being

// dropped from the canonical IV increment. Return the created

// VPActiveLaneMaskPHIRecipe.

//

// The function uses the following definitions:

//

// %TripCount = DataWithControlFlowWithoutRuntimeCheck ?

llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
; CHECK-NEXT: [[TMP3:%.*]] = zext <16 x i8> [[WIDE_LOAD2]] to <16 x i16>		; CHECK-NEXT: [[TMP3:%.*]] = zext <16 x i8> [[WIDE_LOAD2]] to <16 x i16>
; CHECK-NEXT: [[TMP4:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i16>		; CHECK-NEXT: [[TMP4:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i16>
; CHECK-NEXT: [[TMP5:%.*]] = mul nuw <16 x i16> [[TMP3]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = mul nuw <16 x i16> [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP5]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		; CHECK-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP5]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP7:%.*]] = trunc <16 x i16> [[TMP6]] to <16 x i8>		; CHECK-NEXT: [[TMP7:%.*]] = trunc <16 x i16> [[TMP6]] to <16 x i8>
; CHECK-NEXT: store <16 x i8> [[TMP7]], ptr [[TMP2]], align 1		; CHECK-NEXT: store <16 x i8> [[TMP7]], ptr [[TMP2]], align 1
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[INDEX]]		; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD3:%.*]] = load <16 x i8>, ptr [[TMP8]], align 1		; CHECK-NEXT: [[WIDE_LOAD3:%.*]] = load <16 x i8>, ptr [[TMP8]], align 1
; CHECK-NEXT: [[TMP9:%.*]] = zext <16 x i8> [[WIDE_LOAD3]] to <16 x i16>		; CHECK-NEXT: [[TMP9:%.*]] = zext <16 x i8> [[WIDE_LOAD3]] to <16 x i16>
; CHECK-NEXT: [[TMP10:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i16>		; CHECK-NEXT: [[TMP10:%.*]] = mul nuw <16 x i16> [[TMP9]], [[TMP4]]
		Ayal: Hmm, we now spot the redundant duplicate zext of WIDE_LOAD from <16 x i8> to <16 x i16>, originally both TMP4 and TMP10.
; CHECK-NEXT: [[TMP11:%.*]] = mul nuw <16 x i16> [[TMP9]], [[TMP10]]		; CHECK-NEXT: [[TMP11:%.*]] = lshr <16 x i16> [[TMP10]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP12:%.*]] = lshr <16 x i16> [[TMP11]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		; CHECK-NEXT: [[TMP12:%.*]] = trunc <16 x i16> [[TMP11]] to <16 x i8>
; CHECK-NEXT: [[TMP13:%.*]] = trunc <16 x i16> [[TMP12]] to <16 x i8>		; CHECK-NEXT: store <16 x i8> [[TMP12]], ptr [[TMP8]], align 1
; CHECK-NEXT: store <16 x i8> [[TMP13]], ptr [[TMP8]], align 1
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[TMP0]]		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[TMP0]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
; CHECK: vec.epilog.iter.check:		; CHECK: vec.epilog.iter.check:
; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = and i64 [[TMP0]], 8		; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = and i64 [[TMP0]], 8
; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK_NOT_NOT:%.*]] = icmp eq i64 [[N_VEC_REMAINING]], 0		; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK_NOT_NOT:%.*]] = icmp eq i64 [[N_VEC_REMAINING]], 0
; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK_NOT_NOT]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]		; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK_NOT_NOT]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
; CHECK: vec.epilog.ph:		; CHECK: vec.epilog.ph:
; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]		; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
; CHECK-NEXT: [[N_VEC5:%.*]] = and i64 [[TMP0]], 4294967288		; CHECK-NEXT: [[N_VEC5:%.*]] = and i64 [[TMP0]], 4294967288
; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
; CHECK: vec.epilog.vector.body:		; CHECK: vec.epilog.vector.body:
; CHECK-NEXT: [[INDEX7:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT11:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX7:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT11:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, ptr [[C]], i64 [[INDEX7]]		; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[C]], i64 [[INDEX7]]
; CHECK-NEXT: [[WIDE_LOAD8:%.*]] = load <8 x i8>, ptr [[TMP15]], align 1		; CHECK-NEXT: [[WIDE_LOAD8:%.*]] = load <8 x i8>, ptr [[TMP14]], align 1
; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[INDEX7]]		; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[INDEX7]]
; CHECK-NEXT: [[WIDE_LOAD9:%.*]] = load <8 x i8>, ptr [[TMP16]], align 1		; CHECK-NEXT: [[WIDE_LOAD9:%.*]] = load <8 x i8>, ptr [[TMP15]], align 1
; CHECK-NEXT: [[TMP17:%.*]] = zext <8 x i8> [[WIDE_LOAD9]] to <8 x i16>		; CHECK-NEXT: [[TMP16:%.*]] = zext <8 x i8> [[WIDE_LOAD9]] to <8 x i16>
; CHECK-NEXT: [[TMP18:%.*]] = zext <8 x i8> [[WIDE_LOAD8]] to <8 x i16>		; CHECK-NEXT: [[TMP17:%.*]] = zext <8 x i8> [[WIDE_LOAD8]] to <8 x i16>
; CHECK-NEXT: [[TMP19:%.*]] = mul nuw <8 x i16> [[TMP17]], [[TMP18]]		; CHECK-NEXT: [[TMP18:%.*]] = mul nuw <8 x i16> [[TMP16]], [[TMP17]]
		Ayal: Spotted and removed duplicate zext of WIDE_LOAD8.
; CHECK-NEXT: [[TMP20:%.*]] = lshr <8 x i16> [[TMP19]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		; CHECK-NEXT: [[TMP19:%.*]] = lshr <8 x i16> [[TMP18]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP21:%.*]] = trunc <8 x i16> [[TMP20]] to <8 x i8>		; CHECK-NEXT: [[TMP20:%.*]] = trunc <8 x i16> [[TMP19]] to <8 x i8>
; CHECK-NEXT: store <8 x i8> [[TMP21]], ptr [[TMP16]], align 1		; CHECK-NEXT: store <8 x i8> [[TMP20]], ptr [[TMP15]], align 1
; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[INDEX7]]		; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[INDEX7]]
; CHECK-NEXT: [[WIDE_LOAD10:%.*]] = load <8 x i8>, ptr [[TMP22]], align 1		; CHECK-NEXT: [[WIDE_LOAD10:%.*]] = load <8 x i8>, ptr [[TMP21]], align 1
; CHECK-NEXT: [[TMP23:%.*]] = zext <8 x i8> [[WIDE_LOAD10]] to <8 x i16>		; CHECK-NEXT: [[TMP22:%.*]] = zext <8 x i8> [[WIDE_LOAD10]] to <8 x i16>
; CHECK-NEXT: [[TMP24:%.*]] = zext <8 x i8> [[WIDE_LOAD8]] to <8 x i16>		; CHECK-NEXT: [[TMP23:%.*]] = mul nuw <8 x i16> [[TMP22]], [[TMP17]]
; CHECK-NEXT: [[TMP25:%.*]] = mul nuw <8 x i16> [[TMP23]], [[TMP24]]		; CHECK-NEXT: [[TMP24:%.*]] = lshr <8 x i16> [[TMP23]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP26:%.*]] = lshr <8 x i16> [[TMP25]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		; CHECK-NEXT: [[TMP25:%.*]] = trunc <8 x i16> [[TMP24]] to <8 x i8>
; CHECK-NEXT: [[TMP27:%.*]] = trunc <8 x i16> [[TMP26]] to <8 x i8>		; CHECK-NEXT: store <8 x i8> [[TMP25]], ptr [[TMP21]], align 1
; CHECK-NEXT: store <8 x i8> [[TMP27]], ptr [[TMP22]], align 1
; CHECK-NEXT: [[INDEX_NEXT11]] = add nuw i64 [[INDEX7]], 8		; CHECK-NEXT: [[INDEX_NEXT11]] = add nuw i64 [[INDEX7]], 8
; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT11]], [[N_VEC5]]		; CHECK-NEXT: [[TMP26:%.*]] = icmp eq i64 [[INDEX_NEXT11]], [[N_VEC5]]
; CHECK-NEXT: br i1 [[TMP28]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP26]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: vec.epilog.middle.block:		; CHECK: vec.epilog.middle.block:
; CHECK-NEXT: [[CMP_N6:%.*]] = icmp eq i64 [[N_VEC5]], [[TMP0]]		; CHECK-NEXT: [[CMP_N6:%.*]] = icmp eq i64 [[N_VEC5]], [[TMP0]]
; CHECK-NEXT: br i1 [[CMP_N6]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N6]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]
; CHECK: vec.epilog.scalar.ph:		; CHECK: vec.epilog.scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC5]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC5]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.cond.cleanup.loopexit:		; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]		; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:		; CHECK: for.cond.cleanup:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ]		; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ]
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[C]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[C]], i64 [[INDVARS_IV]]
; CHECK-NEXT: [[TMP29:%.*]] = load i8, ptr [[ARRAYIDX]], align 1		; CHECK-NEXT: [[TMP27:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP29]] to i32		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP27]] to i32
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[INDVARS_IV]]
; CHECK-NEXT: [[TMP30:%.*]] = load i8, ptr [[ARRAYIDX2]], align 1		; CHECK-NEXT: [[TMP28:%.*]] = load i8, ptr [[ARRAYIDX2]], align 1
; CHECK-NEXT: [[CONV3:%.*]] = zext i8 [[TMP30]] to i32		; CHECK-NEXT: [[CONV3:%.*]] = zext i8 [[TMP28]] to i32
; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[CONV3]], [[CONV]]		; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[CONV3]], [[CONV]]
; CHECK-NEXT: [[SHR_26:%.*]] = lshr i32 [[MUL]], 8		; CHECK-NEXT: [[SHR_26:%.*]] = lshr i32 [[MUL]], 8
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[SHR_26]] to i8		; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[SHR_26]] to i8
; CHECK-NEXT: store i8 [[CONV4]], ptr [[ARRAYIDX2]], align 1		; CHECK-NEXT: store i8 [[CONV4]], ptr [[ARRAYIDX2]], align 1
; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[INDVARS_IV]]		; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, ptr [[B]], i64 [[INDVARS_IV]]
; CHECK-NEXT: [[TMP31:%.*]] = load i8, ptr [[ARRAYIDX8]], align 1		; CHECK-NEXT: [[TMP29:%.*]] = load i8, ptr [[ARRAYIDX8]], align 1
; CHECK-NEXT: [[CONV9:%.*]] = zext i8 [[TMP31]] to i32		; CHECK-NEXT: [[CONV9:%.*]] = zext i8 [[TMP29]] to i32
; CHECK-NEXT: [[MUL10:%.*]] = mul nuw nsw i32 [[CONV9]], [[CONV]]		; CHECK-NEXT: [[MUL10:%.*]] = mul nuw nsw i32 [[CONV9]], [[CONV]]
; CHECK-NEXT: [[SHR11_27:%.*]] = lshr i32 [[MUL10]], 8		; CHECK-NEXT: [[SHR11_27:%.*]] = lshr i32 [[MUL10]], 8
; CHECK-NEXT: [[CONV12:%.*]] = trunc i32 [[SHR11_27]] to i8		; CHECK-NEXT: [[CONV12:%.*]] = trunc i32 [[SHR11_27]] to i8
; CHECK-NEXT: store i8 [[CONV12]], ptr [[ARRAYIDX8]], align 1		; CHECK-NEXT: store i8 [[CONV12]], ptr [[ARRAYIDX8]], align 1
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1		; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32		; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	for.body: ; preds = %for.body.preheader, %for.body
br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}		}
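The scalar loop in this test zero-extends two i8 loads to i32, multiplies, shifts right by 8 and truncates to i8, while the vector body performs the same computation in i16. The narrowing is exact because the product of two 8-bit values is at most 255 * 255 = 65025, which fits in 16 bits; that bound is also consistent with the `nuw` flag kept on the narrowed `mul`. A minimal exhaustive check in C++ (function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Scalar form from the CHECK lines: zext i8 -> i32, mul, lshr 8, trunc to i8.
uint8_t mulHighByteI32(uint8_t C, uint8_t A) {
  uint32_t Mul = uint32_t(C) * uint32_t(A);
  return uint8_t(Mul >> 8);
}

// Narrowed form, as in the vector body: the same computation in 16 bits.
uint8_t mulHighByteI16(uint8_t C, uint8_t A) {
  // At most 255 * 255 = 65025 < 2^16, so this truncation loses nothing.
  uint16_t Mul = uint16_t(uint32_t(C) * uint32_t(A));
  return uint8_t(Mul >> 8);
}

// Compares both forms over all 65536 input pairs.
bool narrowingIsExact() {
  for (unsigned C = 0; C < 256; ++C)
    for (unsigned A = 0; A < 256; ++A)
      if (mulHighByteI32(uint8_t(C), uint8_t(A)) !=
          mulHighByteI16(uint8_t(C), uint8_t(A)))
        return false;
  return true;
}
```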


define void @test_shrink_zext_in_preheader(ptr noalias %src, ptr noalias %dst, i32 %A, i16 %B) {		define void @test_shrink_zext_in_preheader(ptr noalias %src, ptr noalias %dst, i32 %A, i16 %B) {
; CHECK-LABEL: define void @test_shrink_zext_in_preheader		; CHECK-LABEL: define void @test_shrink_zext_in_preheader
; CHECK-SAME: (ptr noalias [[SRC:%.*]], ptr noalias [[DST:%.*]], i32 [[A:%.*]], i16 [[B:%.*]]) {		; CHECK-SAME: (ptr noalias [[SRC:%.*]], ptr noalias [[DST:%.*]], i32 [[A:%.*]], i16 [[B:%.*]]) {
; CHECK-NEXT: iter.check:		; CHECK-NEXT: iter.check:
		; CHECK-NEXT: [[CONV10:%.*]] = zext i16 [[B]] to i32
		Ayal: This test case stores the second least-significant byte of a 32-bit product (of two invariant values, one 16-bit and the other 32-bit), checking that computing a 16-bit product suffices. But more optimizations should take place: the extension of the multiplication operands to 32 bits should be eliminated (along with their truncation to 16 bits), and the invariant multiplication-lshr-trunc sequence should be hoisted out of the loop.
		fhahn: Still more work to do :) Arguably the invariant instructions are artificial; in the regular pipeline, no invariant instructions should remain.
; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]		; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
; CHECK: vector.main.loop.iter.check:		; CHECK: vector.main.loop.iter.check:
; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[A]], i64 0		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[A]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i16> undef, i16 [[B]], i64 0		; CHECK-NEXT: [[TMP0:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT]] to <16 x i16>
; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <16 x i16> [[TMP0]], <16 x i16> poison, <16 x i32> zeroinitializer		; CHECK-NEXT: [[TMP1:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT]] to <16 x i16>
		Ayal: BROADCAST_SPLAT is (still) trunc'ed twice due to UF=2?
		fhahn: The latest version avoids truncating the same value twice.
		Ayal: Duplicated TMP0 and TMP1 still here?
		fhahn: They were due to redundant casts being added for live-in values. Fixed by checking in VPWidenCastRecipe::execute for now, with a FIXME to address this with explicit unrolling.
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <16 x i32> poison, i32 [[CONV10]], i64 0
		Ayal: Both insertelements now use poison.
		fhahn: I think the use of undef is a leftover that wasn't updated; it should be poison.
		; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT1]], <16 x i32> poison, <16 x i32> zeroinitializer
		; CHECK-NEXT: [[TMP2:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT2]] to <16 x i16>
		; CHECK-NEXT: [[TMP3:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT2]] to <16 x i16>
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP1:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT]] to <16 x i16>		; CHECK-NEXT: [[TMP4:%.*]] = mul <16 x i16> [[TMP0]], [[TMP2]]
; CHECK-NEXT: [[TMP2:%.*]] = mul <16 x i16> [[BROADCAST_SPLAT2]], [[TMP1]]		; CHECK-NEXT: [[TMP5:%.*]] = mul <16 x i16> [[TMP1]], [[TMP3]]
		Ayal: BROADCAST_SPLAT2 is (still) trunc'ed twice due to UF=2?
		fhahn: The latest version avoids truncating the same value twice.
		Ayal: Still seeing duplicate TMP2 and TMP3?
; CHECK-NEXT: [[TMP3:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT]] to <16 x i16>
; CHECK-NEXT: [[TMP4:%.*]] = mul <16 x i16> [[BROADCAST_SPLAT2]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = lshr <16 x i16> [[TMP2]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP4]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		; CHECK-NEXT: [[TMP6:%.*]] = lshr <16 x i16> [[TMP4]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP7:%.*]] = trunc <16 x i16> [[TMP5]] to <16 x i8>		; CHECK-NEXT: [[TMP7:%.*]] = lshr <16 x i16> [[TMP5]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP8:%.*]] = trunc <16 x i16> [[TMP6]] to <16 x i8>		; CHECK-NEXT: [[TMP8:%.*]] = trunc <16 x i16> [[TMP6]] to <16 x i8>
; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[INDEX]] to i64		; CHECK-NEXT: [[TMP9:%.*]] = trunc <16 x i16> [[TMP7]] to <16 x i8>
; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP9]]		; CHECK-NEXT: [[TMP10:%.*]] = sext i32 [[INDEX]] to i64
; CHECK-NEXT: store <16 x i8> [[TMP7]], ptr [[TMP10]], align 1		; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP10]]
; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[TMP10]], i64 16
; CHECK-NEXT: store <16 x i8> [[TMP8]], ptr [[TMP11]], align 1		; CHECK-NEXT: store <16 x i8> [[TMP8]], ptr [[TMP11]], align 1
		; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[TMP11]], i64 16
		; CHECK-NEXT: store <16 x i8> [[TMP9]], ptr [[TMP12]], align 1
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 32		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 32
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992		; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992
; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: br i1 false, label [[EXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]		; CHECK-NEXT: br i1 false, label [[EXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
; CHECK: vec.epilog.iter.check:		; CHECK: vec.epilog.iter.check:
; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]		; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
; CHECK: vec.epilog.ph:		; CHECK: vec.epilog.ph:
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x i16> undef, i16 [[B]], i64 0
; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
; CHECK: vec.epilog.vector.body:
; CHECK-NEXT: [[INDEX4:%.*]] = phi i32 [ 992, [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT9:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP14:%.*]] = trunc i32 [[A]] to i16		; CHECK-NEXT: [[TMP14:%.*]] = trunc i32 [[A]] to i16
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x i16> undef, i16 [[TMP14]], i64 0		; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x i16> undef, i16 [[TMP14]], i64 0
; CHECK-NEXT: [[TMP16:%.*]] = mul <8 x i16> [[TMP15]], [[TMP13]]		; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x i16> undef, i16 [[B]], i64 0
		Ayal: Trunc & insertelement LICM'd from vec.epilog.vector.body to vec.epilog.ph.
; CHECK-NEXT: [[TMP17:%.*]] = lshr <8 x i16> [[TMP16]], <i16 8, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>		; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
; CHECK-NEXT: [[TMP18:%.*]] = trunc <8 x i16> [[TMP17]] to <8 x i8>		; CHECK: vec.epilog.vector.body:
; CHECK-NEXT: [[TMP19:%.*]] = shufflevector <8 x i8> [[TMP18]], <8 x i8> poison, <8 x i32> zeroinitializer		; CHECK-NEXT: [[INDEX7:%.*]] = phi i32 [ 992, [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT8:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP20:%.*]] = sext i32 [[INDEX4]] to i64		; CHECK-NEXT: [[TMP17:%.*]] = mul <8 x i16> [[TMP15]], [[TMP16]]
; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP20]]		; CHECK-NEXT: [[TMP18:%.*]] = lshr <8 x i16> [[TMP17]], <i16 8, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>
; CHECK-NEXT: store <8 x i8> [[TMP19]], ptr [[TMP21]], align 1		; CHECK-NEXT: [[TMP19:%.*]] = trunc <8 x i16> [[TMP18]] to <8 x i8>
; CHECK-NEXT: [[INDEX_NEXT9]] = add nuw i32 [[INDEX4]], 8		; CHECK-NEXT: [[TMP20:%.*]] = shufflevector <8 x i8> [[TMP19]], <8 x i8> poison, <8 x i32> zeroinitializer
; CHECK-NEXT: [[TMP22:%.*]] = icmp eq i32 [[INDEX_NEXT9]], 1000		; CHECK-NEXT: [[TMP21:%.*]] = sext i32 [[INDEX7]] to i64
; CHECK-NEXT: br i1 [[TMP22]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]		; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP21]]
		; CHECK-NEXT: store <8 x i8> [[TMP20]], ptr [[TMP22]], align 1
		; CHECK-NEXT: [[INDEX_NEXT8]] = add nuw i32 [[INDEX7]], 8
		; CHECK-NEXT: [[TMP23:%.*]] = icmp eq i32 [[INDEX_NEXT8]], 1000
		; CHECK-NEXT: br i1 [[TMP23]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: vec.epilog.middle.block:		; CHECK: vec.epilog.middle.block:
; CHECK-NEXT: br i1 true, label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]]		; CHECK-NEXT: br i1 true, label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]]
; CHECK: vec.epilog.scalar.ph:		; CHECK: vec.epilog.scalar.ph:
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: br i1 poison, label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP7:![0-9]+]]		; CHECK-NEXT: br i1 poison, label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP7:![0-9]+]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
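Ayal's note on `test_shrink_zext_in_preheader` says computing a 16-bit product suffices to produce the stored byte. The reason is that bits 8..15 of a product computed modulo 2^32 depend only on the low 16 bits of each operand. A minimal C++ check of that claim for an i32 value A and an i16 value B (function names are hypothetical; the pseudo-random inputs come from a simple LCG, not from the test):

```cpp
#include <cassert>
#include <cstdint>

// Wide form: A is i32, B is i16 zero-extended to i32; the product wraps
// modulo 2^32, like an IR mul without flags.
uint8_t byte1OfProductI32(uint32_t A, uint16_t B) {
  uint32_t Mul = A * uint32_t(B);
  return uint8_t(Mul >> 8);
}

// Narrow form: truncate A to 16 bits and keep only a 16-bit product.
uint8_t byte1OfProductI16(uint32_t A, uint16_t B) {
  uint16_t Mul = uint16_t(uint32_t(uint16_t(A)) * uint32_t(B));
  return uint8_t(Mul >> 8);
}

// Compares both forms on edge cases plus pseudo-random inputs.
bool narrowMatchesWide() {
  const uint32_t Edges[] = {0u, 1u, 99u, 0xFFFFu, 0x10000u, 0x12345678u,
                            0xFFFFFFFFu};
  for (uint32_t A : Edges)
    for (uint32_t B : Edges)
      if (byte1OfProductI32(A, uint16_t(B)) !=
          byte1OfProductI16(A, uint16_t(B)))
        return false;
  uint32_t Seed = 12345u;
  for (int I = 0; I < 100000; ++I) {
    Seed = Seed * 1664525u + 1013904223u; // LCG step, wraps mod 2^32
    uint32_t A = Seed;
    uint16_t B = uint16_t(Seed >> 7);
    if (byte1OfProductI32(A, B) != byte1OfProductI16(A, B))
      return false;
  }
  return true;
}
```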
define void @test_shrink_select(ptr noalias %src, ptr noalias %dst, i32 %A, i1 %c) {		define void @test_shrink_select(ptr noalias %src, ptr noalias %dst, i32 %A, i1 %c) {
; CHECK-LABEL: define void @test_shrink_select		; CHECK-LABEL: define void @test_shrink_select
; CHECK-SAME: (ptr noalias [[SRC:%.*]], ptr noalias [[DST:%.*]], i32 [[A:%.*]], i1 [[C:%.*]]) {		; CHECK-SAME: (ptr noalias [[SRC:%.*]], ptr noalias [[DST:%.*]], i32 [[A:%.*]], i1 [[C:%.*]]) {
; CHECK-NEXT: iter.check:		; CHECK-NEXT: iter.check:
; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]		; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
; CHECK: vector.main.loop.iter.check:		; CHECK: vector.main.loop.iter.check:
; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
		; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[A]] to i16
		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i16> undef, i16 [[TMP0]], i64 0
		Ayal: ditto.
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[A]] to i16
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i16> undef, i16 [[TMP0]], i64 0
; CHECK-NEXT: [[TMP2:%.*]] = mul <16 x i16> [[TMP1]], <i16 99, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison>		; CHECK-NEXT: [[TMP2:%.*]] = mul <16 x i16> [[TMP1]], <i16 99, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[TMP2]], <16 x i16> poison, <16 x i32> zeroinitializer		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[TMP2]], <16 x i16> poison, <16 x i32> zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = lshr <16 x i16> [[TMP3]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		; CHECK-NEXT: [[TMP4:%.*]] = lshr <16 x i16> [[TMP3]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[C]], <16 x i16> [[TMP4]], <16 x i16> [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[C]], <16 x i16> [[TMP4]], <16 x i16> [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = trunc <16 x i16> [[TMP5]] to <16 x i8>		; CHECK-NEXT: [[TMP6:%.*]] = trunc <16 x i16> [[TMP5]] to <16 x i8>
; CHECK-NEXT: [[TMP7:%.*]] = sext i32 [[INDEX]] to i64		; CHECK-NEXT: [[TMP7:%.*]] = sext i32 [[INDEX]] to i64
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP7]]		; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP7]]
; CHECK-NEXT: store <16 x i8> [[TMP6]], ptr [[TMP8]], align 1		; CHECK-NEXT: store <16 x i8> [[TMP6]], ptr [[TMP8]], align 1
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 16		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 16
; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992		; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], 992
; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: br i1 false, label [[EXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]		; CHECK-NEXT: br i1 false, label [[EXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
; CHECK: vec.epilog.iter.check:		; CHECK: vec.epilog.iter.check:
; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]		; CHECK-NEXT: br i1 false, label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
; CHECK: vec.epilog.ph:		; CHECK: vec.epilog.ph:
; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
; CHECK: vec.epilog.vector.body:
; CHECK-NEXT: [[INDEX2:%.*]] = phi i32 [ 992, [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT5:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP10:%.*]] = trunc i32 [[A]] to i16		; CHECK-NEXT: [[TMP10:%.*]] = trunc i32 [[A]] to i16
; CHECK-NEXT: [[TMP11:%.*]] = insertelement <8 x i16> undef, i16 [[TMP10]], i64 0		; CHECK-NEXT: [[TMP11:%.*]] = insertelement <8 x i16> undef, i16 [[TMP10]], i64 0
		; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
		; CHECK: vec.epilog.vector.body:
		; CHECK-NEXT: [[INDEX3:%.*]] = phi i32 [ 992, [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT4:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP11]], <i16 99, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison>		; CHECK-NEXT: [[TMP12:%.*]] = mul <8 x i16> [[TMP11]], <i16 99, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison, i16 poison>
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <8 x i16> [[TMP12]], <8 x i16> poison, <8 x i32> zeroinitializer		; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <8 x i16> [[TMP12]], <8 x i16> poison, <8 x i32> zeroinitializer
; CHECK-NEXT: [[TMP14:%.*]] = lshr <8 x i16> [[TMP13]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		; CHECK-NEXT: [[TMP14:%.*]] = lshr <8 x i16> [[TMP13]], <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[C]], <8 x i16> [[TMP14]], <8 x i16> [[TMP13]]		; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[C]], <8 x i16> [[TMP14]], <8 x i16> [[TMP13]]
; CHECK-NEXT: [[TMP16:%.*]] = trunc <8 x i16> [[TMP15]] to <8 x i8>		; CHECK-NEXT: [[TMP16:%.*]] = trunc <8 x i16> [[TMP15]] to <8 x i8>
; CHECK-NEXT: [[TMP17:%.*]] = sext i32 [[INDEX2]] to i64		; CHECK-NEXT: [[TMP17:%.*]] = sext i32 [[INDEX3]] to i64
; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP17]]		; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i8, ptr [[DST]], i64 [[TMP17]]
; CHECK-NEXT: store <8 x i8> [[TMP16]], ptr [[TMP18]], align 1		; CHECK-NEXT: store <8 x i8> [[TMP16]], ptr [[TMP18]], align 1
; CHECK-NEXT: [[INDEX_NEXT5]] = add nuw i32 [[INDEX2]], 8		; CHECK-NEXT: [[INDEX_NEXT4]] = add nuw i32 [[INDEX3]], 8
; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i32 [[INDEX_NEXT5]], 1000		; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i32 [[INDEX_NEXT4]], 1000
; CHECK-NEXT: br i1 [[TMP19]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP19]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: vec.epilog.middle.block:		; CHECK: vec.epilog.middle.block:
; CHECK-NEXT: br i1 true, label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]]		; CHECK-NEXT: br i1 true, label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]]
; CHECK: vec.epilog.scalar.ph:		; CHECK: vec.epilog.scalar.ph:
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: br i1 poison, label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP10:![0-9]+]]		; CHECK-NEXT: br i1 poison, label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: exit:		; CHECK: exit:
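The review thread on `test_shrink_zext_in_preheader` traced the duplicated truncates of BROADCAST_SPLAT and BROADCAST_SPLAT2 to redundant casts being added for live-in values once per unrolled part (UF=2). The patch addresses this with a check in VPWidenCastRecipe::execute. As a hedged sketch of the general idea only, memoizing one truncate per (live-in, destination width) makes repeated requests return the same cast; `Cast` and `LiveInTruncCache` are hypothetical types, not VPlan classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a truncate of a live-in to DestBits bits.
struct Cast {
  std::string Src;
  unsigned DestBits;
};

// Creates at most one truncate per (live-in, width) pair and hands the same
// one back on later requests, e.g. once per unrolled part.
class LiveInTruncCache {
  std::map<std::pair<std::string, unsigned>, Cast *> Cache;
  std::vector<std::unique_ptr<Cast>> Storage;

public:
  Cast *getOrCreate(const std::string &LiveIn, unsigned DestBits) {
    auto Key = std::make_pair(LiveIn, DestBits);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // reuse the existing truncate
    Storage.push_back(std::make_unique<Cast>(Cast{LiveIn, DestBits}));
    Cache[Key] = Storage.back().get();
    return Storage.back().get();
  }

  std::size_t numCasts() const { return Storage.size(); }
};
```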

llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll

	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[TMP3]], align 1			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[TMP3]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i8> [[WIDE_LOAD]], <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>			; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i8> [[WIDE_LOAD]], <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <16 x i8> [[TMP4]] to <16 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP1]]
				Ayal: Fold zext-trunc pair; several such cases follow.
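Each pair flagged here is a `zext` to i32 that is immediately undone by a `trunc` back to the original width. The sketch below shows the width arithmetic behind such a fold. It uses a hypothetical toy integer model, not the VPlan recipe classes; `lowMask`, `zextTo`, `truncTo`, `Fold`, `foldTruncOfZext` and `foldIsExact` are illustrative names only.

Assuming both the source and destination widths are below the intermediate width, `trunc(zext(x))` collapses as follows:
- to `x` when the destination width equals the source width;
- to a narrower `zext` when the destination is wider than the source;
- to a narrower `trunc` when the destination is smaller than the source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not an LLVM API): an integer of width Bits (1..32) is
// held in the low bits of a uint64_t, with all higher bits zero.
inline uint64_t lowMask(unsigned Bits) { return (uint64_t(1) << Bits) - 1; }

// zext to a wider type leaves the stored value unchanged in this model.
inline uint64_t zextTo(uint64_t V, unsigned /*ToBits*/) { return V; }

// trunc keeps only the low ToBits bits.
inline uint64_t truncTo(uint64_t V, unsigned ToBits) { return V & lowMask(ToBits); }

enum class Fold { Identity, NarrowerZext, NarrowerTrunc };

// Single cast replacing trunc(zext(x : SrcBits -> MidBits) -> DstBits),
// assuming SrcBits < MidBits and DstBits < MidBits.
inline Fold foldTruncOfZext(unsigned SrcBits, unsigned DstBits) {
  if (DstBits == SrcBits)
    return Fold::Identity;      // the pair cancels out entirely
  if (DstBits > SrcBits)
    return Fold::NarrowerZext;  // zext x directly to DstBits
  return Fold::NarrowerTrunc;   // trunc x directly to DstBits
}

inline uint64_t applyFold(Fold F, uint64_t X, unsigned DstBits) {
  switch (F) {
  case Fold::Identity:
    return X;
  case Fold::NarrowerZext:
    return zextTo(X, DstBits);
  case Fold::NarrowerTrunc:
    return truncTo(X, DstBits);
  }
  return X;
}

// Exhaustively compares the cast pair with its fold for every SrcBits-bit x.
inline bool foldIsExact(unsigned SrcBits, unsigned MidBits, unsigned DstBits) {
  Fold F = foldTruncOfZext(SrcBits, DstBits);
  for (uint64_t X = 0; X <= lowMask(SrcBits); ++X)
    if (truncTo(zextTo(X, MidBits), DstBits) != applyFold(F, X, DstBits))
      return false;
  return true;
}
```

For the <16 x i8> => <16 x i32> => <16 x i8> pairs in this test, `foldIsExact(8, 32, 8)` exercises the identity case. Calling it with `(16, 32, 8)` covers the narrower-trunc case.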
	; CHECK-NEXT: [[TMP6:%.*]] = trunc <16 x i32> [[TMP5]] to <16 x i8>			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP1]]			; CHECK-NEXT: store <16 x i8> [[TMP4]], ptr [[TMP6]], align 1
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i32 0
	; CHECK-NEXT: store <16 x i8> [[TMP6]], ptr [[TMP8]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
	; CHECK: vec.epilog.iter.check:			; CHECK: vec.epilog.iter.check:
	; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8			; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8
	; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]			; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
	; CHECK: vec.epilog.ph:			; CHECK: vec.epilog.ph:
	; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]			; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
	; CHECK-NEXT: [[N_MOD_VF2:%.*]] = urem i64 [[TMP0]], 8			; CHECK-NEXT: [[N_MOD_VF2:%.*]] = urem i64 [[TMP0]], 8
	; CHECK-NEXT: [[N_VEC3:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF2]]			; CHECK-NEXT: [[N_VEC3:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF2]]
	; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
	; CHECK: vec.epilog.vector.body:			; CHECK: vec.epilog.vector.body:
	; CHECK-NEXT: [[INDEX5:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT7:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX5:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT7:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP10:%.*]] = add i64 [[INDEX5]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX5]], 0
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP10]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[TMP11]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[TMP9]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <8 x i8>, ptr [[TMP12]], align 1			; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <8 x i8>, ptr [[TMP10]], align 1
	; CHECK-NEXT: [[TMP13:%.*]] = add <8 x i8> [[WIDE_LOAD6]], <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>			; CHECK-NEXT: [[TMP11:%.*]] = add <8 x i8> [[WIDE_LOAD6]], <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
	; CHECK-NEXT: [[TMP14:%.*]] = zext <8 x i8> [[TMP13]] to <8 x i32>			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP15:%.*]] = trunc <8 x i32> [[TMP14]] to <8 x i8>			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP10]]			; CHECK-NEXT: store <8 x i8> [[TMP11]], ptr [[TMP13]], align 1
	; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i8, ptr [[TMP16]], i32 0
	; CHECK-NEXT: store <8 x i8> [[TMP15]], ptr [[TMP17]], align 1
	; CHECK-NEXT: [[INDEX_NEXT7]] = add nuw i64 [[INDEX5]], 8			; CHECK-NEXT: [[INDEX_NEXT7]] = add nuw i64 [[INDEX5]], 8
	; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT7]], [[N_VEC3]]			; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT7]], [[N_VEC3]]
	; CHECK-NEXT: br i1 [[TMP18]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP14]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; CHECK: vec.epilog.middle.block:			; CHECK: vec.epilog.middle.block:
	; CHECK-NEXT: [[CMP_N4:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC3]]			; CHECK-NEXT: [[CMP_N4:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC3]]
	; CHECK-NEXT: br i1 [[CMP_N4]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N4]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]
	; CHECK: vec.epilog.scalar.ph:			; CHECK: vec.epilog.scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC3]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC3]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup.loopexit:			; CHECK: for.cond.cleanup.loopexit:
	; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]			; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP19:%.*]] = load i8, ptr [[ARRAYIDX]], align 1			; CHECK-NEXT: [[TMP15:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP19]] to i32			; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP15]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[CONV]], 2			; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[CONV]], 2
	; CHECK-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i8			; CHECK-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i8
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i8 [[CONV1]], ptr [[ARRAYIDX3]], align 1			; CHECK-NEXT: store i8 [[CONV1]], ptr [[ARRAYIDX3]], align 1
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[LEN]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[LEN]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, ptr [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i16, ptr [[TMP2]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i16>, ptr [[TMP3]], align 2			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x i16>, ptr [[TMP3]], align 2
	; CHECK-NEXT: [[TMP4:%.*]] = add <8 x i16> [[WIDE_LOAD]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>			; CHECK-NEXT: [[TMP4:%.*]] = add <8 x i16> [[WIDE_LOAD]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP6:%.*]] = trunc <8 x i32> [[TMP5]] to <8 x i16>			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, ptr [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[TMP1]]			; CHECK-NEXT: store <8 x i16> [[TMP4]], ptr [[TMP6]], align 2
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, ptr [[TMP7]], i32 0
	; CHECK-NEXT: store <8 x i16> [[TMP6]], ptr [[TMP8]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup.loopexit:			; CHECK: for.cond.cleanup.loopexit:
	; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]			; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP10:%.*]] = load i16, ptr [[ARRAYIDX]], align 2			; CHECK-NEXT: [[TMP8:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
	; CHECK-NEXT: [[CONV8:%.*]] = zext i16 [[TMP10]] to i32			; CHECK-NEXT: [[CONV8:%.*]] = zext i16 [[TMP8]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[CONV8]], 2			; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[CONV8]], 2
	; CHECK-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i16			; CHECK-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i16
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i16 [[CONV1]], ptr [[ARRAYIDX3]], align 2			; CHECK-NEXT: store i16 [[CONV1]], ptr [[ARRAYIDX3]], align 2
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[LEN]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[LEN]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 16			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 16
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[TMP3]], align 1			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[TMP3]], align 1
				Ayal: We now fold a trunc-zext of the zext'ed WIDE_LOAD from <16 x i16> => <16 x i32> => <16 x i16>, but fail to fold a similar one following the add-2's?
				fhahn: Folding now happens entirely in simplifyRecipes, so this should be handled now.
				Ayal: The one following the add-2's is also folded now.
	; CHECK-NEXT: [[TMP4:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i16>			; CHECK-NEXT: [[TMP4:%.*]] = zext <16 x i8> [[WIDE_LOAD]] to <16 x i16>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <16 x i16> [[TMP4]] to <16 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = add <16 x i16> [[TMP4]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
	; CHECK-NEXT: [[TMP6:%.*]] = trunc <16 x i32> [[TMP5]] to <16 x i16>			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP7:%.*]] = add <16 x i16> [[TMP6]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, ptr [[TMP6]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = zext <16 x i16> [[TMP7]] to <16 x i32>			; CHECK-NEXT: store <16 x i16> [[TMP5]], ptr [[TMP7]], align 2
	; CHECK-NEXT: [[TMP9:%.*]] = trunc <16 x i32> [[TMP8]] to <16 x i16>
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, ptr [[TMP10]], i32 0
	; CHECK-NEXT: store <16 x i16> [[TMP9]], ptr [[TMP11]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
	; CHECK: vec.epilog.iter.check:			; CHECK: vec.epilog.iter.check:
	; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8			; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8
	; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]			; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
	; CHECK: vec.epilog.ph:			; CHECK: vec.epilog.ph:
	; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]			; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
	; CHECK-NEXT: [[N_MOD_VF2:%.*]] = urem i64 [[TMP0]], 8			; CHECK-NEXT: [[N_MOD_VF2:%.*]] = urem i64 [[TMP0]], 8
	; CHECK-NEXT: [[N_VEC3:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF2]]			; CHECK-NEXT: [[N_VEC3:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF2]]
	; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
	; CHECK: vec.epilog.vector.body:			; CHECK: vec.epilog.vector.body:
	; CHECK-NEXT: [[INDEX5:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT7:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX5:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT7:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP13:%.*]] = add i64 [[INDEX5]], 0			; CHECK-NEXT: [[TMP9:%.*]] = add i64 [[INDEX5]], 0
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP13]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, ptr [[TMP14]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[TMP10]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <8 x i8>, ptr [[TMP15]], align 1			; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <8 x i8>, ptr [[TMP11]], align 1
	; CHECK-NEXT: [[TMP16:%.*]] = zext <8 x i8> [[WIDE_LOAD6]] to <8 x i16>			; CHECK-NEXT: [[TMP12:%.*]] = zext <8 x i8> [[WIDE_LOAD6]] to <8 x i16>
	; CHECK-NEXT: [[TMP17:%.*]] = zext <8 x i16> [[TMP16]] to <8 x i32>			; CHECK-NEXT: [[TMP13:%.*]] = add <8 x i16> [[TMP12]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>
				Ayal: We now get rid of a pair of <8 x i16> => <8 x i32> => <8 x i16> before the add-2's (so this is not an NFC patch), but still retain the pair of <8 x i16> => <8 x i32> => <8 x i16> after it; is this a missed MinBW/trunc-zext opportunity?
				Ayal: The other pair is also folded now.
	; CHECK-NEXT: [[TMP18:%.*]] = trunc <8 x i32> [[TMP17]] to <8 x i16>			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP19:%.*]] = add <8 x i16> [[TMP18]], <i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2, i16 2>			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i16, ptr [[TMP14]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = zext <8 x i16> [[TMP19]] to <8 x i32>			; CHECK-NEXT: store <8 x i16> [[TMP13]], ptr [[TMP15]], align 2
	; CHECK-NEXT: [[TMP21:%.*]] = trunc <8 x i32> [[TMP20]] to <8 x i16>
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[TMP13]]
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i16, ptr [[TMP22]], i32 0
	; CHECK-NEXT: store <8 x i16> [[TMP21]], ptr [[TMP23]], align 2
	; CHECK-NEXT: [[INDEX_NEXT7]] = add nuw i64 [[INDEX5]], 8			; CHECK-NEXT: [[INDEX_NEXT7]] = add nuw i64 [[INDEX5]], 8
	; CHECK-NEXT: [[TMP24:%.*]] = icmp eq i64 [[INDEX_NEXT7]], [[N_VEC3]]			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT7]], [[N_VEC3]]
	; CHECK-NEXT: br i1 [[TMP24]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP16]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
	; CHECK: vec.epilog.middle.block:			; CHECK: vec.epilog.middle.block:
	; CHECK-NEXT: [[CMP_N4:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC3]]			; CHECK-NEXT: [[CMP_N4:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC3]]
	; CHECK-NEXT: br i1 [[CMP_N4]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N4]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]
	; CHECK: vec.epilog.scalar.ph:			; CHECK: vec.epilog.scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC3]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC3]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup.loopexit:			; CHECK: for.cond.cleanup.loopexit:
	; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]			; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP25:%.*]] = load i8, ptr [[ARRAYIDX]], align 1			; CHECK-NEXT: [[TMP17:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP25]] to i32			; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP17]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[CONV]], 2			; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[CONV]], 2
	; CHECK-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i16			; CHECK-NEXT: [[CONV1:%.*]] = trunc i32 [[ADD]] to i16
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i16, ptr [[Q]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i16 [[CONV1]], ptr [[ARRAYIDX3]], align 2			; CHECK-NEXT: store i16 [[CONV1]], ptr [[ARRAYIDX3]], align 2
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[LEN]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[LEN]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
	; CHECK: vector.main.loop.iter.check:			; CHECK: vector.main.loop.iter.check:
	; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i64 [[TMP0]], 16			; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i64 [[TMP0]], 16
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK1]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK1]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 16			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 16
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[CONV13]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[CONV13]], i64 0
	; CHECK-NEXT: [[TMP1:%.*]] = trunc <16 x i32> [[BROADCAST_SPLATINSERT]] to <16 x i8>			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT]] to <16 x i8>
				Ayal: Hmm, before, we narrowed these two shufflevectors to operate on <16 x i8> and zext-trunc'ed their result; now we let them operate on the original <16 x i32> and truncate the result?
				fhahn: I think there's nothing we can do about that; we first need to splat the value when generating code, but InstCombine should take care of that.
				Ayal: Worth testing with a subsequent InstCombine run, to ensure the pessimization is avoided?
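`trunc` acts on each lane independently. Truncating a broadcast therefore yields the same vector as broadcasting the truncated scalar, so the new splat-then-trunc order in the preheader affects code quality, not correctness. This small sketch uses `std::array` to stand in for the <16 x i32> and <16 x i8> vectors. The helper names are hypothetical, and the sketch does not check whether InstCombine actually narrows the splat afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t VF = 16;  // lane count of the vectors in this test

// Broadcast a scalar into every lane, as insertelement followed by a
// zero-mask shufflevector does.
template <typename T>
std::array<T, VF> splat(T Scalar) {
  std::array<T, VF> V{};
  V.fill(Scalar);
  return V;
}

// Lane-wise trunc from i32 to i8: each lane keeps its low 8 bits.
inline std::array<uint8_t, VF> truncLanes(const std::array<uint32_t, VF> &V) {
  std::array<uint8_t, VF> R{};
  for (std::size_t I = 0; I < VF; ++I)
    R[I] = static_cast<uint8_t>(V[I]);
  return R;
}

// Order in the new output: splat the i32 value, then trunc the whole vector.
inline std::array<uint8_t, VF> splatThenTrunc(uint32_t X) {
  return truncLanes(splat(X));
}

// Order in the old output, modeled with a scalar trunc before the splat.
inline std::array<uint8_t, VF> truncThenSplat(uint32_t X) {
  return splat(static_cast<uint8_t>(X));
}
```

Both orders produce identical lanes for any input, for example 44 in every lane for `300u`. The remaining question raised above is only how many instructions survive after later cleanup.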
	; CHECK-NEXT: [[TMP2:%.*]] = zext <16 x i8> [[BROADCAST_SPLAT]] to <16 x i32>
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <16 x i32> poison, i32 [[CONV11]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <16 x i32> poison, i32 [[CONV11]], i64 0
	; CHECK-NEXT: [[TMP3:%.*]] = trunc <16 x i32> [[BROADCAST_SPLATINSERT2]] to <16 x i8>			; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT2]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT3]] to <16 x i8>
	; CHECK-NEXT: [[TMP4:%.*]] = zext <16 x i8> [[BROADCAST_SPLAT3]] to <16 x i32>
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP5]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = load <16 x i8>, ptr [[TMP7]], align 1			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[TMP5]], align 1
	; CHECK-NEXT: [[TMP9:%.*]] = shl <16 x i8> [[TMP8]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			; CHECK-NEXT: [[TMP6:%.*]] = shl <16 x i8> [[WIDE_LOAD]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	; CHECK-NEXT: [[TMP10:%.*]] = zext <16 x i8> [[TMP9]] to <16 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = add <16 x i8> [[TMP6]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>
				Ayal: Many zext-trunc pairs left to collect.
				fhahn: Should be better cleaned up now.
				Ayal: Indeed looks like it!
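Narrowing the whole chain to <16 x i8> is sound because the low 8 bits of `shl`, `add`, `or`, `mul`, `and` and `xor` depend only on the low 8 bits of their operands. The sketch below replays the operations from the new CHECK lines twice, once in 32-bit and once in 8-bit arithmetic, and compares the stored byte.

It rests on these assumptions:
- The scalar loop applies the same operations in i32 to zero-extended byte loads and to the i32 invariants `[[CONV13]]` and `[[CONV11]]`, modeled here as `Conv13` and `Conv11`.
- The loop then stores the low byte.

The scalar body of this function is not part of this excerpt, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: the chain on a zero-extended byte in 32-bit unsigned arithmetic
// (wraparound is well defined), truncated only before the store.
inline uint8_t chainWide(uint8_t Load, uint32_t Conv13, uint32_t Conv11) {
  uint32_t X = Load;                   // zext i8 -> i32
  uint32_t Shl = X << 4;
  uint32_t Add = Shl + 32;
  uint32_t Or = X | 51;
  uint32_t Mul = Or * 60;
  uint32_t And1 = Add & Conv13;
  uint32_t And2 = Mul & 0xFFFFFFFCu;   // and with -4
  uint32_t Xor = And2 ^ Conv11;
  uint32_t Res = Xor * And1;
  return static_cast<uint8_t>(Res);    // trunc i32 -> i8
}

// Narrow form: the same chain computed directly in 8 bits, with the two
// invariants truncated once up front (as the preheader truncs do).
inline uint8_t chainNarrow(uint8_t Load, uint32_t Conv13, uint32_t Conv11) {
  uint8_t C13 = static_cast<uint8_t>(Conv13);
  uint8_t C11 = static_cast<uint8_t>(Conv11);
  uint8_t Shl = static_cast<uint8_t>(Load << 4);
  uint8_t Add = static_cast<uint8_t>(Shl + 32);
  uint8_t Or = static_cast<uint8_t>(Load | 51);
  uint8_t Mul = static_cast<uint8_t>(Or * 60);
  uint8_t And1 = static_cast<uint8_t>(Add & C13);
  uint8_t And2 = static_cast<uint8_t>(Mul & 0xFC);  // -4 as i8
  uint8_t Xor = static_cast<uint8_t>(And2 ^ C11);
  return static_cast<uint8_t>(Xor * And1);
}

// Exhaustive check over all byte inputs for one pair of invariants.
inline bool narrowingIsExact(uint32_t Conv13, uint32_t Conv11) {
  for (unsigned L = 0; L < 256; ++L)
    if (chainWide(static_cast<uint8_t>(L), Conv13, Conv11) !=
        chainNarrow(static_cast<uint8_t>(L), Conv13, Conv11))
      return false;
  return true;
}
```

For example, `chainWide(1, 0xFF, 0)` and `chainNarrow(1, 0xFF, 0)` both yield 192. An operation such as `lshr` or `udiv` would break this property, because its low result bits depend on high input bits.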
	; CHECK-NEXT: [[TMP11:%.*]] = trunc <16 x i32> [[TMP10]] to <16 x i8>			; CHECK-NEXT: [[TMP8:%.*]] = or <16 x i8> [[WIDE_LOAD]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
	; CHECK-NEXT: [[TMP12:%.*]] = add <16 x i8> [[TMP11]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>			; CHECK-NEXT: [[TMP9:%.*]] = mul <16 x i8> [[TMP8]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>
	; CHECK-NEXT: [[TMP13:%.*]] = zext <16 x i8> [[TMP12]] to <16 x i32>			; CHECK-NEXT: [[TMP10:%.*]] = and <16 x i8> [[TMP7]], [[TMP1]]
	; CHECK-NEXT: [[TMP14:%.*]] = or <16 x i8> [[TMP8]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>			; CHECK-NEXT: [[TMP11:%.*]] = and <16 x i8> [[TMP9]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP15:%.*]] = zext <16 x i8> [[TMP14]] to <16 x i32>			; CHECK-NEXT: [[TMP12:%.*]] = xor <16 x i8> [[TMP11]], [[TMP2]]
	; CHECK-NEXT: [[TMP16:%.*]] = trunc <16 x i32> [[TMP15]] to <16 x i8>			; CHECK-NEXT: [[TMP13:%.*]] = mul <16 x i8> [[TMP12]], [[TMP10]]
	; CHECK-NEXT: [[TMP17:%.*]] = mul <16 x i8> [[TMP16]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP18:%.*]] = zext <16 x i8> [[TMP17]] to <16 x i32>			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, ptr [[TMP14]], i32 0
	; CHECK-NEXT: [[TMP19:%.*]] = trunc <16 x i32> [[TMP13]] to <16 x i8>			; CHECK-NEXT: store <16 x i8> [[TMP13]], ptr [[TMP15]], align 1
	; CHECK-NEXT: [[TMP20:%.*]] = trunc <16 x i32> [[TMP2]] to <16 x i8>
	Ayal: The trunc of TMP2 above is redundant, along with its zext in the ph.
	; CHECK-NEXT: [[TMP21:%.*]] = and <16 x i8> [[TMP19]], [[TMP20]]
	; CHECK-NEXT: [[TMP22:%.*]] = zext <16 x i8> [[TMP21]] to <16 x i32>
	; CHECK-NEXT: [[TMP23:%.*]] = trunc <16 x i32> [[TMP18]] to <16 x i8>
	; CHECK-NEXT: [[TMP24:%.*]] = and <16 x i8> [[TMP23]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP25:%.*]] = zext <16 x i8> [[TMP24]] to <16 x i32>
	; CHECK-NEXT: [[TMP26:%.*]] = trunc <16 x i32> [[TMP25]] to <16 x i8>
	; CHECK-NEXT: [[TMP27:%.*]] = trunc <16 x i32> [[TMP4]] to <16 x i8>
	Ayal: The trunc of TMP4 above is redundant, along with its zext in the ph.
	; CHECK-NEXT: [[TMP28:%.*]] = xor <16 x i8> [[TMP26]], [[TMP27]]
	; CHECK-NEXT: [[TMP29:%.*]] = zext <16 x i8> [[TMP28]] to <16 x i32>
	; CHECK-NEXT: [[TMP30:%.*]] = trunc <16 x i32> [[TMP29]] to <16 x i8>
	; CHECK-NEXT: [[TMP31:%.*]] = trunc <16 x i32> [[TMP22]] to <16 x i8>
	; CHECK-NEXT: [[TMP32:%.*]] = mul <16 x i8> [[TMP30]], [[TMP31]]
	; CHECK-NEXT: [[TMP33:%.*]] = zext <16 x i8> [[TMP32]] to <16 x i32>
	; CHECK-NEXT: [[TMP34:%.*]] = trunc <16 x i32> [[TMP33]] to <16 x i8>
	; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds i8, ptr [[TMP35]], i32 0
	; CHECK-NEXT: store <16 x i8> [[TMP34]], ptr [[TMP36]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP37:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP37]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
	; CHECK: vec.epilog.iter.check:			; CHECK: vec.epilog.iter.check:
	; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8			; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8
	; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]			; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
	; CHECK: vec.epilog.ph:			; CHECK: vec.epilog.ph:
	; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]			; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
	; CHECK-NEXT: [[N_MOD_VF4:%.*]] = urem i64 [[TMP0]], 8			; CHECK-NEXT: [[N_MOD_VF4:%.*]] = urem i64 [[TMP0]], 8
	; CHECK-NEXT: [[N_VEC5:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF4]]			; CHECK-NEXT: [[N_VEC5:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF4]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <8 x i32> poison, i32 [[CONV13]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <8 x i32> poison, i32 [[CONV13]], i64 0
	; CHECK-NEXT: [[TMP38:%.*]] = trunc <8 x i32> [[BROADCAST_SPLATINSERT8]] to <8 x i8>			; CHECK-NEXT: [[BROADCAST_SPLAT8:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT7]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <8 x i8> [[TMP38]], <8 x i8> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP17:%.*]] = trunc <8 x i32> [[BROADCAST_SPLAT8]] to <8 x i8>
	; CHECK-NEXT: [[TMP39:%.*]] = zext <8 x i8> [[BROADCAST_SPLAT9]] to <8 x i32>			; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <8 x i32> poison, i32 [[CONV11]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT10:%.*]] = insertelement <8 x i32> poison, i32 [[CONV11]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT9]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP40:%.*]] = trunc <8 x i32> [[BROADCAST_SPLATINSERT10]] to <8 x i8>			; CHECK-NEXT: [[TMP18:%.*]] = trunc <8 x i32> [[BROADCAST_SPLAT10]] to <8 x i8>
	; CHECK-NEXT: [[BROADCAST_SPLAT11:%.*]] = shufflevector <8 x i8> [[TMP40]], <8 x i8> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP41:%.*]] = zext <8 x i8> [[BROADCAST_SPLAT11]] to <8 x i32>
	; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
	; CHECK: vec.epilog.vector.body:			; CHECK: vec.epilog.vector.body:
	; CHECK-NEXT: [[INDEX7:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT12:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX11:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT13:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP42:%.*]] = add i64 [[INDEX7]], 0			; CHECK-NEXT: [[TMP19:%.*]] = add i64 [[INDEX11]], 0
	; CHECK-NEXT: [[TMP43:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP42]]			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[TMP19]]
	; CHECK-NEXT: [[TMP44:%.*]] = getelementptr inbounds i8, ptr [[TMP43]], i32 0			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i8, ptr [[TMP20]], i32 0
	; CHECK-NEXT: [[TMP45:%.*]] = load <8 x i8>, ptr [[TMP44]], align 1			; CHECK-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i8>, ptr [[TMP21]], align 1
	; CHECK-NEXT: [[TMP46:%.*]] = shl <8 x i8> [[TMP45]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			; CHECK-NEXT: [[TMP22:%.*]] = shl <8 x i8> [[WIDE_LOAD12]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	; CHECK-NEXT: [[TMP47:%.*]] = zext <8 x i8> [[TMP46]] to <8 x i32>			; CHECK-NEXT: [[TMP23:%.*]] = add <8 x i8> [[TMP22]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>
	; CHECK-NEXT: [[TMP48:%.*]] = trunc <8 x i32> [[TMP47]] to <8 x i8>			; CHECK-NEXT: [[TMP24:%.*]] = or <8 x i8> [[WIDE_LOAD12]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
	; CHECK-NEXT: [[TMP49:%.*]] = add <8 x i8> [[TMP48]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>			; CHECK-NEXT: [[TMP25:%.*]] = mul <8 x i8> [[TMP24]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>
	; CHECK-NEXT: [[TMP50:%.*]] = zext <8 x i8> [[TMP49]] to <8 x i32>			; CHECK-NEXT: [[TMP26:%.*]] = and <8 x i8> [[TMP23]], [[TMP17]]
	; CHECK-NEXT: [[TMP51:%.*]] = or <8 x i8> [[TMP45]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>			; CHECK-NEXT: [[TMP27:%.*]] = and <8 x i8> [[TMP25]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP52:%.*]] = zext <8 x i8> [[TMP51]] to <8 x i32>			; CHECK-NEXT: [[TMP28:%.*]] = xor <8 x i8> [[TMP27]], [[TMP18]]
	; CHECK-NEXT: [[TMP53:%.*]] = trunc <8 x i32> [[TMP52]] to <8 x i8>			; CHECK-NEXT: [[TMP29:%.*]] = mul <8 x i8> [[TMP28]], [[TMP26]]
	; CHECK-NEXT: [[TMP54:%.*]] = mul <8 x i8> [[TMP53]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>			; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP19]]
	; CHECK-NEXT: [[TMP55:%.*]] = zext <8 x i8> [[TMP54]] to <8 x i32>			; CHECK-NEXT: [[TMP31:%.*]] = getelementptr inbounds i8, ptr [[TMP30]], i32 0
	; CHECK-NEXT: [[TMP56:%.*]] = trunc <8 x i32> [[TMP50]] to <8 x i8>			; CHECK-NEXT: store <8 x i8> [[TMP29]], ptr [[TMP31]], align 1
	; CHECK-NEXT: [[TMP57:%.*]] = trunc <8 x i32> [[TMP39]] to <8 x i8>			; CHECK-NEXT: [[INDEX_NEXT13]] = add nuw i64 [[INDEX11]], 8
	; CHECK-NEXT: [[TMP58:%.*]] = and <8 x i8> [[TMP56]], [[TMP57]]			; CHECK-NEXT: [[TMP32:%.*]] = icmp eq i64 [[INDEX_NEXT13]], [[N_VEC5]]
	; CHECK-NEXT: [[TMP59:%.*]] = zext <8 x i8> [[TMP58]] to <8 x i32>			; CHECK-NEXT: br i1 [[TMP32]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; CHECK-NEXT: [[TMP60:%.*]] = trunc <8 x i32> [[TMP55]] to <8 x i8>
	; CHECK-NEXT: [[TMP61:%.*]] = and <8 x i8> [[TMP60]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP62:%.*]] = zext <8 x i8> [[TMP61]] to <8 x i32>
	; CHECK-NEXT: [[TMP63:%.*]] = trunc <8 x i32> [[TMP62]] to <8 x i8>
	; CHECK-NEXT: [[TMP64:%.*]] = trunc <8 x i32> [[TMP41]] to <8 x i8>
	; CHECK-NEXT: [[TMP65:%.*]] = xor <8 x i8> [[TMP63]], [[TMP64]]
	; CHECK-NEXT: [[TMP66:%.*]] = zext <8 x i8> [[TMP65]] to <8 x i32>
	; CHECK-NEXT: [[TMP67:%.*]] = trunc <8 x i32> [[TMP66]] to <8 x i8>
	; CHECK-NEXT: [[TMP68:%.*]] = trunc <8 x i32> [[TMP59]] to <8 x i8>
	; CHECK-NEXT: [[TMP69:%.*]] = mul <8 x i8> [[TMP67]], [[TMP68]]
	; CHECK-NEXT: [[TMP70:%.*]] = zext <8 x i8> [[TMP69]] to <8 x i32>
	; CHECK-NEXT: [[TMP71:%.*]] = trunc <8 x i32> [[TMP70]] to <8 x i8>
	; CHECK-NEXT: [[TMP72:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP42]]
	; CHECK-NEXT: [[TMP73:%.*]] = getelementptr inbounds i8, ptr [[TMP72]], i32 0
	; CHECK-NEXT: store <8 x i8> [[TMP71]], ptr [[TMP73]], align 1
	; CHECK-NEXT: [[INDEX_NEXT12]] = add nuw i64 [[INDEX7]], 8
	; CHECK-NEXT: [[TMP74:%.*]] = icmp eq i64 [[INDEX_NEXT12]], [[N_VEC5]]
	; CHECK-NEXT: br i1 [[TMP74]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; CHECK: vec.epilog.middle.block:			; CHECK: vec.epilog.middle.block:
	; CHECK-NEXT: [[CMP_N6:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC5]]			; CHECK-NEXT: [[CMP_N6:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC5]]
	; CHECK-NEXT: br i1 [[CMP_N6]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N6]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]
	; CHECK: vec.epilog.scalar.ph:			; CHECK: vec.epilog.scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC5]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC5]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup.loopexit:			; CHECK: for.cond.cleanup.loopexit:
	; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]			; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP75:%.*]] = load i8, ptr [[ARRAYIDX]], align 1			; CHECK-NEXT: [[TMP33:%.*]] = load i8, ptr [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP75]] to i32			; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP33]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = shl i32 [[CONV]], 4			; CHECK-NEXT: [[ADD:%.*]] = shl i32 [[CONV]], 4
	; CHECK-NEXT: [[CONV2:%.*]] = add nuw nsw i32 [[ADD]], 32			; CHECK-NEXT: [[CONV2:%.*]] = add nuw nsw i32 [[ADD]], 32
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[CONV]], 51			; CHECK-NEXT: [[OR:%.*]] = or i32 [[CONV]], 51
	; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[OR]], 60			; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[OR]], 60
	; CHECK-NEXT: [[AND:%.*]] = and i32 [[CONV2]], [[CONV13]]			; CHECK-NEXT: [[AND:%.*]] = and i32 [[CONV2]], [[CONV13]]
	; CHECK-NEXT: [[MUL_MASKED:%.*]] = and i32 [[MUL]], 252			; CHECK-NEXT: [[MUL_MASKED:%.*]] = and i32 [[MUL]], 252
	; CHECK-NEXT: [[CONV17:%.*]] = xor i32 [[MUL_MASKED]], [[CONV11]]			; CHECK-NEXT: [[CONV17:%.*]] = xor i32 [[MUL_MASKED]], [[CONV11]]
	; CHECK-NEXT: [[MUL18:%.*]] = mul nuw nsw i32 [[CONV17]], [[AND]]			; CHECK-NEXT: [[MUL18:%.*]] = mul nuw nsw i32 [[CONV17]], [[AND]]
	… 53 lines not shown …
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH:%.*]], label [[VECTOR_MAIN_LOOP_ITER_CHECK:%.*]]
	; CHECK: vector.main.loop.iter.check:			; CHECK: vector.main.loop.iter.check:
	; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i64 [[TMP0]], 16			; CHECK-NEXT: [[MIN_ITERS_CHECK1:%.*]] = icmp ult i64 [[TMP0]], 16
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK1]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK1]], label [[VEC_EPILOG_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 16			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 16
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[CONV13]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[CONV13]], i64 0
	; CHECK-NEXT: [[TMP1:%.*]] = trunc <16 x i32> [[BROADCAST_SPLATINSERT]] to <16 x i8>			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT]] to <16 x i8>
	; CHECK-NEXT: [[TMP2:%.*]] = zext <16 x i8> [[BROADCAST_SPLAT]] to <16 x i32>
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <16 x i32> poison, i32 [[CONV11]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <16 x i32> poison, i32 [[CONV11]], i64 0
	; CHECK-NEXT: [[TMP3:%.*]] = trunc <16 x i32> [[BROADCAST_SPLATINSERT2]] to <16 x i8>			; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT2]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = trunc <16 x i32> [[BROADCAST_SPLAT3]] to <16 x i8>
	; CHECK-NEXT: [[TMP4:%.*]] = zext <16 x i8> [[BROADCAST_SPLAT3]] to <16 x i32>
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[TMP5]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, ptr [[TMP6]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[TMP4]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i16>, ptr [[TMP7]], align 2			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i16>, ptr [[TMP5]], align 2
	; CHECK-NEXT: [[TMP8:%.*]] = trunc <16 x i16> [[WIDE_LOAD]] to <16 x i8>			; CHECK-NEXT: [[TMP6:%.*]] = trunc <16 x i16> [[WIDE_LOAD]] to <16 x i8>
	; CHECK-NEXT: [[TMP9:%.*]] = zext <16 x i8> [[TMP8]] to <16 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = shl <16 x i8> [[TMP6]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	; CHECK-NEXT: [[TMP10:%.*]] = trunc <16 x i32> [[TMP9]] to <16 x i8>			; CHECK-NEXT: [[TMP8:%.*]] = add <16 x i8> [[TMP7]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>
	; CHECK-NEXT: [[TMP11:%.*]] = shl <16 x i8> [[TMP10]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			; CHECK-NEXT: [[TMP9:%.*]] = and <16 x i8> [[TMP6]], <i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52>
	; CHECK-NEXT: [[TMP12:%.*]] = zext <16 x i8> [[TMP11]] to <16 x i32>			; CHECK-NEXT: [[TMP10:%.*]] = or <16 x i8> [[TMP9]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
	; CHECK-NEXT: [[TMP13:%.*]] = trunc <16 x i32> [[TMP12]] to <16 x i8>			; CHECK-NEXT: [[TMP11:%.*]] = mul <16 x i8> [[TMP10]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>
	; CHECK-NEXT: [[TMP14:%.*]] = add <16 x i8> [[TMP13]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>			; CHECK-NEXT: [[TMP12:%.*]] = and <16 x i8> [[TMP8]], [[TMP1]]
	; CHECK-NEXT: [[TMP15:%.*]] = zext <16 x i8> [[TMP14]] to <16 x i32>			; CHECK-NEXT: [[TMP13:%.*]] = and <16 x i8> [[TMP11]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP16:%.*]] = and <16 x i8> [[TMP8]], <i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52>			; CHECK-NEXT: [[TMP14:%.*]] = xor <16 x i8> [[TMP13]], [[TMP2]]
	; CHECK-NEXT: [[TMP17:%.*]] = zext <16 x i8> [[TMP16]] to <16 x i32>			; CHECK-NEXT: [[TMP15:%.*]] = mul <16 x i8> [[TMP14]], [[TMP12]]
	; CHECK-NEXT: [[TMP18:%.*]] = trunc <16 x i32> [[TMP17]] to <16 x i8>			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP19:%.*]] = or <16 x i8> [[TMP18]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>			; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i8, ptr [[TMP16]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = zext <16 x i8> [[TMP19]] to <16 x i32>			; CHECK-NEXT: store <16 x i8> [[TMP15]], ptr [[TMP17]], align 1
	; CHECK-NEXT: [[TMP21:%.*]] = trunc <16 x i32> [[TMP20]] to <16 x i8>
	; CHECK-NEXT: [[TMP22:%.*]] = mul <16 x i8> [[TMP21]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>
	; CHECK-NEXT: [[TMP23:%.*]] = zext <16 x i8> [[TMP22]] to <16 x i32>
	; CHECK-NEXT: [[TMP24:%.*]] = trunc <16 x i32> [[TMP15]] to <16 x i8>
	; CHECK-NEXT: [[TMP25:%.*]] = trunc <16 x i32> [[TMP2]] to <16 x i8>
	; CHECK-NEXT: [[TMP26:%.*]] = and <16 x i8> [[TMP24]], [[TMP25]]
	; CHECK-NEXT: [[TMP27:%.*]] = zext <16 x i8> [[TMP26]] to <16 x i32>
	; CHECK-NEXT: [[TMP28:%.*]] = trunc <16 x i32> [[TMP23]] to <16 x i8>
	; CHECK-NEXT: [[TMP29:%.*]] = and <16 x i8> [[TMP28]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP30:%.*]] = zext <16 x i8> [[TMP29]] to <16 x i32>
	; CHECK-NEXT: [[TMP31:%.*]] = trunc <16 x i32> [[TMP30]] to <16 x i8>
	; CHECK-NEXT: [[TMP32:%.*]] = trunc <16 x i32> [[TMP4]] to <16 x i8>
	; CHECK-NEXT: [[TMP33:%.*]] = xor <16 x i8> [[TMP31]], [[TMP32]]
	; CHECK-NEXT: [[TMP34:%.*]] = zext <16 x i8> [[TMP33]] to <16 x i32>
	; CHECK-NEXT: [[TMP35:%.*]] = trunc <16 x i32> [[TMP34]] to <16 x i8>
	; CHECK-NEXT: [[TMP36:%.*]] = trunc <16 x i32> [[TMP27]] to <16 x i8>
	; CHECK-NEXT: [[TMP37:%.*]] = mul <16 x i8> [[TMP35]], [[TMP36]]
	; CHECK-NEXT: [[TMP38:%.*]] = zext <16 x i8> [[TMP37]] to <16 x i32>
	; CHECK-NEXT: [[TMP39:%.*]] = trunc <16 x i32> [[TMP38]] to <16 x i8>
	; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP41:%.*]] = getelementptr inbounds i8, ptr [[TMP40]], i32 0
	; CHECK-NEXT: store <16 x i8> [[TMP39]], ptr [[TMP41]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP42:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP42]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[VEC_EPILOG_ITER_CHECK:%.*]]
	; CHECK: vec.epilog.iter.check:			; CHECK: vec.epilog.iter.check:
	; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]			; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
	; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8			; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 8
	; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]			; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label [[VEC_EPILOG_SCALAR_PH]], label [[VEC_EPILOG_PH]]
	; CHECK: vec.epilog.ph:			; CHECK: vec.epilog.ph:
	; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]			; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
	; CHECK-NEXT: [[N_MOD_VF4:%.*]] = urem i64 [[TMP0]], 8			; CHECK-NEXT: [[N_MOD_VF4:%.*]] = urem i64 [[TMP0]], 8
	; CHECK-NEXT: [[N_VEC5:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF4]]			; CHECK-NEXT: [[N_VEC5:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF4]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <8 x i32> poison, i32 [[CONV13]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <8 x i32> poison, i32 [[CONV13]], i64 0
	; CHECK-NEXT: [[TMP43:%.*]] = trunc <8 x i32> [[BROADCAST_SPLATINSERT9]] to <8 x i8>			; CHECK-NEXT: [[BROADCAST_SPLAT8:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT7]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <8 x i8> [[TMP43]], <8 x i8> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[TMP19:%.*]] = trunc <8 x i32> [[BROADCAST_SPLAT8]] to <8 x i8>
	; CHECK-NEXT: [[TMP44:%.*]] = zext <8 x i8> [[BROADCAST_SPLAT10]] to <8 x i32>			; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <8 x i32> poison, i32 [[CONV11]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT11:%.*]] = insertelement <8 x i32> poison, i32 [[CONV11]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <8 x i32> [[BROADCAST_SPLATINSERT9]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP45:%.*]] = trunc <8 x i32> [[BROADCAST_SPLATINSERT11]] to <8 x i8>			; CHECK-NEXT: [[TMP20:%.*]] = trunc <8 x i32> [[BROADCAST_SPLAT10]] to <8 x i8>
	; CHECK-NEXT: [[BROADCAST_SPLAT12:%.*]] = shufflevector <8 x i8> [[TMP45]], <8 x i8> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP46:%.*]] = zext <8 x i8> [[BROADCAST_SPLAT12]] to <8 x i32>
	; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VEC_EPILOG_VECTOR_BODY:%.*]]
	; CHECK: vec.epilog.vector.body:			; CHECK: vec.epilog.vector.body:
	; CHECK-NEXT: [[INDEX7:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT13:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX11:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT13:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP47:%.*]] = add i64 [[INDEX7]], 0			; CHECK-NEXT: [[TMP21:%.*]] = add i64 [[INDEX11]], 0
	; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[TMP47]]			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[TMP21]]
	; CHECK-NEXT: [[TMP49:%.*]] = getelementptr inbounds i16, ptr [[TMP48]], i32 0			; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i16, ptr [[TMP22]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD8:%.*]] = load <8 x i16>, ptr [[TMP49]], align 2			; CHECK-NEXT: [[WIDE_LOAD12:%.*]] = load <8 x i16>, ptr [[TMP23]], align 2
	; CHECK-NEXT: [[TMP50:%.*]] = trunc <8 x i16> [[WIDE_LOAD8]] to <8 x i8>			; CHECK-NEXT: [[TMP24:%.*]] = trunc <8 x i16> [[WIDE_LOAD12]] to <8 x i8>
	; CHECK-NEXT: [[TMP51:%.*]] = zext <8 x i8> [[TMP50]] to <8 x i32>			; CHECK-NEXT: [[TMP25:%.*]] = shl <8 x i8> [[TMP24]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>
	; CHECK-NEXT: [[TMP52:%.*]] = trunc <8 x i32> [[TMP51]] to <8 x i8>			; CHECK-NEXT: [[TMP26:%.*]] = add <8 x i8> [[TMP25]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>
	; CHECK-NEXT: [[TMP53:%.*]] = shl <8 x i8> [[TMP52]], <i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4, i8 4>			; CHECK-NEXT: [[TMP27:%.*]] = and <8 x i8> [[TMP24]], <i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52>
	; CHECK-NEXT: [[TMP54:%.*]] = zext <8 x i8> [[TMP53]] to <8 x i32>			; CHECK-NEXT: [[TMP28:%.*]] = or <8 x i8> [[TMP27]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>
	; CHECK-NEXT: [[TMP55:%.*]] = trunc <8 x i32> [[TMP54]] to <8 x i8>			; CHECK-NEXT: [[TMP29:%.*]] = mul <8 x i8> [[TMP28]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>
	; CHECK-NEXT: [[TMP56:%.*]] = add <8 x i8> [[TMP55]], <i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32, i8 32>			; CHECK-NEXT: [[TMP30:%.*]] = and <8 x i8> [[TMP26]], [[TMP19]]
	; CHECK-NEXT: [[TMP57:%.*]] = zext <8 x i8> [[TMP56]] to <8 x i32>			; CHECK-NEXT: [[TMP31:%.*]] = and <8 x i8> [[TMP29]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP58:%.*]] = and <8 x i8> [[TMP50]], <i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52, i8 -52>			; CHECK-NEXT: [[TMP32:%.*]] = xor <8 x i8> [[TMP31]], [[TMP20]]
	; CHECK-NEXT: [[TMP59:%.*]] = zext <8 x i8> [[TMP58]] to <8 x i32>			; CHECK-NEXT: [[TMP33:%.*]] = mul <8 x i8> [[TMP32]], [[TMP30]]
	; CHECK-NEXT: [[TMP60:%.*]] = trunc <8 x i32> [[TMP59]] to <8 x i8>			; CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP21]]
	; CHECK-NEXT: [[TMP61:%.*]] = or <8 x i8> [[TMP60]], <i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51, i8 51>			; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds i8, ptr [[TMP34]], i32 0
	; CHECK-NEXT: [[TMP62:%.*]] = zext <8 x i8> [[TMP61]] to <8 x i32>			; CHECK-NEXT: store <8 x i8> [[TMP33]], ptr [[TMP35]], align 1
	; CHECK-NEXT: [[TMP63:%.*]] = trunc <8 x i32> [[TMP62]] to <8 x i8>			; CHECK-NEXT: [[INDEX_NEXT13]] = add nuw i64 [[INDEX11]], 8
	; CHECK-NEXT: [[TMP64:%.*]] = mul <8 x i8> [[TMP63]], <i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60, i8 60>			; CHECK-NEXT: [[TMP36:%.*]] = icmp eq i64 [[INDEX_NEXT13]], [[N_VEC5]]
	; CHECK-NEXT: [[TMP65:%.*]] = zext <8 x i8> [[TMP64]] to <8 x i32>			; CHECK-NEXT: br i1 [[TMP36]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
	; CHECK-NEXT: [[TMP66:%.*]] = trunc <8 x i32> [[TMP57]] to <8 x i8>
	; CHECK-NEXT: [[TMP67:%.*]] = trunc <8 x i32> [[TMP44]] to <8 x i8>
	; CHECK-NEXT: [[TMP68:%.*]] = and <8 x i8> [[TMP66]], [[TMP67]]
	; CHECK-NEXT: [[TMP69:%.*]] = zext <8 x i8> [[TMP68]] to <8 x i32>
	; CHECK-NEXT: [[TMP70:%.*]] = trunc <8 x i32> [[TMP65]] to <8 x i8>
	; CHECK-NEXT: [[TMP71:%.*]] = and <8 x i8> [[TMP70]], <i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4, i8 -4>
	; CHECK-NEXT: [[TMP72:%.*]] = zext <8 x i8> [[TMP71]] to <8 x i32>
	; CHECK-NEXT: [[TMP73:%.*]] = trunc <8 x i32> [[TMP72]] to <8 x i8>
	; CHECK-NEXT: [[TMP74:%.*]] = trunc <8 x i32> [[TMP46]] to <8 x i8>
	; CHECK-NEXT: [[TMP75:%.*]] = xor <8 x i8> [[TMP73]], [[TMP74]]
	; CHECK-NEXT: [[TMP76:%.*]] = zext <8 x i8> [[TMP75]] to <8 x i32>
	; CHECK-NEXT: [[TMP77:%.*]] = trunc <8 x i32> [[TMP76]] to <8 x i8>
	; CHECK-NEXT: [[TMP78:%.*]] = trunc <8 x i32> [[TMP69]] to <8 x i8>
	; CHECK-NEXT: [[TMP79:%.*]] = mul <8 x i8> [[TMP77]], [[TMP78]]
	; CHECK-NEXT: [[TMP80:%.*]] = zext <8 x i8> [[TMP79]] to <8 x i32>
	; CHECK-NEXT: [[TMP81:%.*]] = trunc <8 x i32> [[TMP80]] to <8 x i8>
	; CHECK-NEXT: [[TMP82:%.*]] = getelementptr inbounds i8, ptr [[Q]], i64 [[TMP47]]
	; CHECK-NEXT: [[TMP83:%.*]] = getelementptr inbounds i8, ptr [[TMP82]], i32 0
	; CHECK-NEXT: store <8 x i8> [[TMP81]], ptr [[TMP83]], align 1
	; CHECK-NEXT: [[INDEX_NEXT13]] = add nuw i64 [[INDEX7]], 8
	; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT13]], [[N_VEC5]]
	; CHECK-NEXT: br i1 [[TMP84]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
	; CHECK: vec.epilog.middle.block:			; CHECK: vec.epilog.middle.block:
	; CHECK-NEXT: [[CMP_N6:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC5]]			; CHECK-NEXT: [[CMP_N6:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC5]]
	; CHECK-NEXT: br i1 [[CMP_N6]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N6]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[VEC_EPILOG_SCALAR_PH]]
	; CHECK: vec.epilog.scalar.ph:			; CHECK: vec.epilog.scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC5]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC5]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup.loopexit:			; CHECK: for.cond.cleanup.loopexit:
	; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]			; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i16, ptr [[P]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP85:%.*]] = load i16, ptr [[ARRAYIDX]], align 2			; CHECK-NEXT: [[TMP37:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
	; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP85]] to i32			; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP37]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = shl i32 [[CONV]], 4			; CHECK-NEXT: [[ADD:%.*]] = shl i32 [[CONV]], 4
	; CHECK-NEXT: [[CONV2:%.*]] = add nsw i32 [[ADD]], 32			; CHECK-NEXT: [[CONV2:%.*]] = add nsw i32 [[ADD]], 32
	; CHECK-NEXT: [[OR:%.*]] = and i32 [[CONV]], 204			; CHECK-NEXT: [[OR:%.*]] = and i32 [[CONV]], 204
	; CHECK-NEXT: [[CONV8:%.*]] = or i32 [[OR]], 51			; CHECK-NEXT: [[CONV8:%.*]] = or i32 [[OR]], 51
	; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[CONV8]], 60			; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[CONV8]], 60
	; CHECK-NEXT: [[AND:%.*]] = and i32 [[CONV2]], [[CONV13]]			; CHECK-NEXT: [[AND:%.*]] = and i32 [[CONV2]], [[CONV13]]
	; CHECK-NEXT: [[MUL_MASKED:%.*]] = and i32 [[MUL]], 252			; CHECK-NEXT: [[MUL_MASKED:%.*]] = and i32 [[MUL]], 252
	; CHECK-NEXT: [[CONV17:%.*]] = xor i32 [[MUL_MASKED]], [[CONV11]]			; CHECK-NEXT: [[CONV17:%.*]] = xor i32 [[MUL_MASKED]], [[CONV11]]

llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll

	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2			; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[TMP4]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, ptr [[TMP5]], align 2			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, ptr [[TMP5]], align 2
	; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i16> [[WIDE_LOAD]], <i16 10, i16 10, i16 10, i16 10>			; CHECK-NEXT: [[TMP6:%.*]] = add <4 x i16> [[WIDE_LOAD]], <i16 10, i16 10, i16 10, i16 10>
	; CHECK-NEXT: [[TMP7:%.*]] = zext <4 x i16> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP8:%.*]] = trunc <4 x i32> [[TMP7]] to <4 x i16>			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 8
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP12:%.*]] = load i64, ptr [[TMP8]], align 8
	; CHECK-NEXT: [[TMP13:%.*]] = load i64, ptr [[TMP9]], align 8			; CHECK-NEXT: [[TMP13:%.*]] = load i64, ptr [[TMP9]], align 8
	; CHECK-NEXT: [[TMP14:%.*]] = load i64, ptr [[TMP10]], align 8			; CHECK-NEXT: [[TMP14:%.*]] = load i64, ptr [[TMP10]], align 8
	; CHECK-NEXT: [[TMP15:%.*]] = load i64, ptr [[TMP11]], align 8			; CHECK-NEXT: [[TMP15:%.*]] = ashr exact i64 [[TMP11]], 32
	; CHECK-NEXT: [[TMP16:%.*]] = load i64, ptr [[TMP12]], align 8			; CHECK-NEXT: [[TMP16:%.*]] = ashr exact i64 [[TMP12]], 32
	; CHECK-NEXT: [[TMP17:%.*]] = ashr exact i64 [[TMP13]], 32			; CHECK-NEXT: [[TMP17:%.*]] = ashr exact i64 [[TMP13]], 32
	; CHECK-NEXT: [[TMP18:%.*]] = ashr exact i64 [[TMP14]], 32			; CHECK-NEXT: [[TMP18:%.*]] = ashr exact i64 [[TMP14]], 32
	; CHECK-NEXT: [[TMP19:%.*]] = ashr exact i64 [[TMP15]], 32			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP15]]
	; CHECK-NEXT: [[TMP20:%.*]] = ashr exact i64 [[TMP16]], 32			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP16]]
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP17]]			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP17]]
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP18]]			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP18]]
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP19]]			; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x i16> [[TMP6]], i32 0
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP20]]			; CHECK-NEXT: store i16 [[TMP23]], ptr [[TMP19]], align 2
	; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i16> [[TMP8]], i32 0			; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i16> [[TMP6]], i32 1
				; CHECK-NEXT: store i16 [[TMP24]], ptr [[TMP20]], align 2
				; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i16> [[TMP6]], i32 2
	; CHECK-NEXT: store i16 [[TMP25]], ptr [[TMP21]], align 2			; CHECK-NEXT: store i16 [[TMP25]], ptr [[TMP21]], align 2
	; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i16> [[TMP8]], i32 1			; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i16> [[TMP6]], i32 3
	; CHECK-NEXT: store i16 [[TMP26]], ptr [[TMP22]], align 2			; CHECK-NEXT: store i16 [[TMP26]], ptr [[TMP22]], align 2
	; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i16> [[TMP8]], i32 2
	; CHECK-NEXT: store i16 [[TMP27]], ptr [[TMP23]], align 2
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i16> [[TMP8]], i32 3
	; CHECK-NEXT: store i16 [[TMP28]], ptr [[TMP24]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_INC1286_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_INC1286_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[IF_THEN1165_US:%.*]]			; CHECK-NEXT: br label [[IF_THEN1165_US:%.*]]
	; CHECK: if.then1165.us:			; CHECK: if.then1165.us:
	; CHECK-NEXT: [[INDVARS_IV1783:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT1784:%.*]], [[IF_THEN1165_US]] ]			; CHECK-NEXT: [[INDVARS_IV1783:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT1784:%.*]], [[IF_THEN1165_US]] ]
	; CHECK-NEXT: [[GEP_A:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[INDVARS_IV1783]]			; CHECK-NEXT: [[GEP_A:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[INDVARS_IV1783]]
	; CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[C]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[C]], align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP4]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, ptr [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i16, ptr [[TMP5]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, ptr [[TMP6]], align 2			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i16>, ptr [[TMP6]], align 2
	; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i32> [[BROADCAST_SPLAT]] to <4 x i16>			; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i32> [[BROADCAST_SPLAT]] to <4 x i16>
	; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i16> [[WIDE_LOAD]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i16> [[WIDE_LOAD]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = zext <4 x i16> [[TMP8]] to <4 x i32>			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP10:%.*]] = trunc <4 x i32> [[TMP9]] to <4 x i16>			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP13:%.*]] = load i64, ptr [[TMP9]], align 8
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP14:%.*]] = load i64, ptr [[TMP10]], align 8
	; CHECK-NEXT: [[TMP15:%.*]] = load i64, ptr [[TMP11]], align 8			; CHECK-NEXT: [[TMP15:%.*]] = load i64, ptr [[TMP11]], align 8
	; CHECK-NEXT: [[TMP16:%.*]] = load i64, ptr [[TMP12]], align 8			; CHECK-NEXT: [[TMP16:%.*]] = load i64, ptr [[TMP12]], align 8
	; CHECK-NEXT: [[TMP17:%.*]] = load i64, ptr [[TMP13]], align 8			; CHECK-NEXT: [[TMP17:%.*]] = ashr exact i64 [[TMP13]], 32
	; CHECK-NEXT: [[TMP18:%.*]] = load i64, ptr [[TMP14]], align 8			; CHECK-NEXT: [[TMP18:%.*]] = ashr exact i64 [[TMP14]], 32
	; CHECK-NEXT: [[TMP19:%.*]] = ashr exact i64 [[TMP15]], 32			; CHECK-NEXT: [[TMP19:%.*]] = ashr exact i64 [[TMP15]], 32
	; CHECK-NEXT: [[TMP20:%.*]] = ashr exact i64 [[TMP16]], 32			; CHECK-NEXT: [[TMP20:%.*]] = ashr exact i64 [[TMP16]], 32
	; CHECK-NEXT: [[TMP21:%.*]] = ashr exact i64 [[TMP17]], 32			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP17]]
	; CHECK-NEXT: [[TMP22:%.*]] = ashr exact i64 [[TMP18]], 32			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP18]]
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP19]]			; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP19]]
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP20]]			; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP20]]
	; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP21]]			; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i16> [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds i16, ptr [[M3]], i64 [[TMP22]]			; CHECK-NEXT: store i16 [[TMP25]], ptr [[TMP21]], align 2
	; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i16> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i16> [[TMP8]], i32 1
				; CHECK-NEXT: store i16 [[TMP26]], ptr [[TMP22]], align 2
				; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i16> [[TMP8]], i32 2
	; CHECK-NEXT: store i16 [[TMP27]], ptr [[TMP23]], align 2			; CHECK-NEXT: store i16 [[TMP27]], ptr [[TMP23]], align 2
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i16> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i16> [[TMP8]], i32 3
	; CHECK-NEXT: store i16 [[TMP28]], ptr [[TMP24]], align 2			; CHECK-NEXT: store i16 [[TMP28]], ptr [[TMP24]], align 2
	; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i16> [[TMP10]], i32 2
	; CHECK-NEXT: store i16 [[TMP29]], ptr [[TMP25]], align 2
	; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x i16> [[TMP10]], i32 3
	; CHECK-NEXT: store i16 [[TMP30]], ptr [[TMP26]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP31:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP31]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_INC1286_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_INC1286_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[IF_THEN1165_US:%.*]]			; CHECK-NEXT: br label [[IF_THEN1165_US:%.*]]
	; CHECK: if.then1165.us:			; CHECK: if.then1165.us:
	; CHECK-NEXT: [[INDVARS_IV1783:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT1784:%.*]], [[IF_THEN1165_US]] ]			; CHECK-NEXT: [[INDVARS_IV1783:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT1784:%.*]], [[IF_THEN1165_US]] ]
	; CHECK-NEXT: [[FPTR:%.*]] = load i32, ptr [[C]], align 4			; CHECK-NEXT: [[FPTR:%.*]] = load i32, ptr [[C]], align 4

llvm/test/Transforms/LoopVectorize/scalable-trunc-min-bitwidth.ll

	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[LEN]], [[TMP3]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[LEN]], [[TMP3]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[LEN]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[LEN]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[ARG1:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[ARG1:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP4:%.*]] = trunc <vscale x 4 x i32> [[BROADCAST_SPLAT]] to <vscale x 4 x i8>
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i8>, ptr [[TMP4]], align 1			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i8>, ptr [[TMP5]], align 1
	; CHECK-NEXT: [[TMP5:%.*]] = trunc <vscale x 4 x i32> [[BROADCAST_SPLAT]] to <vscale x 4 x i8>			; CHECK-NEXT: [[TMP6:%.*]] = xor <vscale x 4 x i8> [[WIDE_LOAD]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = xor <vscale x 4 x i8> [[WIDE_LOAD]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i8> [[TMP6]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i8> [[TMP6]], [[WIDE_LOAD]]
	; CHECK-NEXT: store <vscale x 4 x i8> [[TMP7]], ptr [[TMP4]], align 1			; CHECK-NEXT: store <vscale x 4 x i8> [[TMP7]], ptr [[TMP5]], align 1
	; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP8]], 4			; CHECK-NEXT: [[TMP9:%.*]] = mul i64 [[TMP8]], 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[LEN]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[LEN]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
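In the updated scalable-trunc-min-bitwidth.ll output, the trunc of the loop-invariant [[BROADCAST_SPLAT]] moves from vector.body into vector.ph, so it is computed once instead of on every iteration. The sketch below illustrates that idea with a toy C++ model; it is not the VPlan API, and the names `Inst`, `ToyPlan` and `truncateLiveIns` are hypothetical. It assumes the caller passes only live-ins (values defined outside the loop) in `LiveInMinBWs`, each mapped to the width it may be narrowed to.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (hypothetical, not the VPlan API): an instruction is a name,
// an opcode and the names of its operands.
struct Inst {
  std::string Name;
  std::string Opcode;
  std::vector<std::string> Operands;
};

// A loop reduced to a preheader (runs once) and a body (runs per iteration).
struct ToyPlan {
  std::vector<Inst> Preheader;
  std::vector<Inst> Body;
};

// Assumes LiveInMinBWs only contains values defined outside the loop, mapped
// to the bit width they may be narrowed to. Each such live-in gets exactly one
// trunc in the preheader, and all of its uses in the body are redirected to it.
void truncateLiveIns(ToyPlan &P,
                     const std::map<std::string, unsigned> &LiveInMinBWs) {
  std::map<std::string, std::string> TruncOf; // live-in -> name of its trunc
  for (Inst &I : P.Body) {
    for (std::string &Op : I.Operands) {
      auto MinBW = LiveInMinBWs.find(Op);
      if (MinBW == LiveInMinBWs.end())
        continue; // not a live-in that should be narrowed
      assert(MinBW->second > 0 && "bit width must be positive");
      auto [It, Inserted] = TruncOf.try_emplace(
          Op, Op + ".trunc.i" + std::to_string(MinBW->second));
      if (Inserted) // first use: emit the trunc once, before the loop
        P.Preheader.push_back(Inst{It->second, "trunc", {Op}});
      Op = It->second; // redirect this use to the hoisted trunc
    }
  }
}
```

Caching the trunc per live-in is what keeps a live-in with several users in the body from being truncated more than once.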

llvm/test/Transforms/LoopVectorize/trunc-shifts.ll

	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = trunc i32 [[INDEX]] to i8			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = trunc i32 [[INDEX]] to i8
	; CHECK-NEXT: [[TMP0:%.*]] = add i8 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i8 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[TMP0]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[TMP0]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, ptr [[TMP3]], align 1			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i8>, ptr [[TMP3]], align 1
	; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[WIDE_LOAD]] to <4 x i16>			; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[WIDE_LOAD]] to <4 x i16>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i16> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = lshr <4 x i16> [[TMP4]], <i16 4, i16 4, i16 4, i16 4>
	; CHECK-NEXT: [[TMP6:%.*]] = trunc <4 x i32> [[TMP5]] to <4 x i16>			; CHECK-NEXT: [[TMP6:%.*]] = trunc <4 x i16> [[TMP5]] to <4 x i8>
	; CHECK-NEXT: [[TMP7:%.*]] = lshr <4 x i16> [[TMP6]], <i16 4, i16 4, i16 4, i16 4>			; CHECK-NEXT: store <4 x i8> [[TMP6]], ptr [[TMP3]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = zext <4 x i16> [[TMP7]] to <4 x i32>
	; CHECK-NEXT: [[TMP9:%.*]] = trunc <4 x i32> [[TMP8]] to <4 x i16>
	; CHECK-NEXT: [[TMP10:%.*]] = trunc <4 x i16> [[TMP9]] to <4 x i8>
	; CHECK-NEXT: store <4 x i8> [[TMP10]], ptr [[TMP3]], align 8
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
				Ayal: We now get rid of a pair of <4 x i16> => <4 x i32> => <4 x i16> before the lshr (so this is not an NFC patch), but still retain the pair/triple of <4 x i16> => <4 x i32> => <4 x i16> => <4 x i8> after it. Missed MinBW opportunity?
				fhahn (author): trunc/ext pairs should be cleaned up better in the latest version.
				Ayal: Indeed!
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], 100			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i32 [[INDEX_NEXT]], 100
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ 100, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8 [ 100, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i8 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i8 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[IV_EXT:%.*]] = zext i8 [[IV]] to i64			; CHECK-NEXT: [[IV_EXT:%.*]] = zext i8 [[IV]] to i64
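The review thread on the lshr hunk of trunc-shifts.ll is about redundant cast pairs: a zext immediately followed by a trunc back to the original width is a no-op. The sketch below shows one way such pairs can be folded bottom-up. It is a toy C++ model, not the VPlan API or the patch's actual cleanup; `Val`, `Kind`, `mkLeaf`, `mkZExt`, `mkTrunc` and `simplify` are hypothetical names, and values are assumed to be a small tree of casts over an opaque leaf.

```cpp
#include <cassert>
#include <memory>

// Toy model (hypothetical, not the VPlan API): an integer value is either an
// opaque leaf or a zext/trunc of another value, tagged with its bit width.
enum class Kind { Leaf, ZExt, Trunc };

struct Val;
using ValPtr = std::shared_ptr<Val>;

struct Val {
  Kind K;
  unsigned Bits; // bit width of this value's result
  ValPtr Op;     // cast operand; null for a leaf
};

ValPtr mkLeaf(unsigned Bits) {
  return std::make_shared<Val>(Val{Kind::Leaf, Bits, nullptr});
}
ValPtr mkZExt(ValPtr V, unsigned Bits) {
  assert(Bits > V->Bits && "zext must widen");
  return std::make_shared<Val>(Val{Kind::ZExt, Bits, V});
}
ValPtr mkTrunc(ValPtr V, unsigned Bits) {
  assert(Bits < V->Bits && "trunc must narrow");
  return std::make_shared<Val>(Val{Kind::Trunc, Bits, V});
}

// Bottom-up fold of trunc(zext(X)) into a single cast, or none at all:
//   result width == width of X  ->  X itself (the pair is a no-op)
//   result width <  width of X  ->  trunc(X)
//   result width >  width of X  ->  zext(X)
ValPtr simplify(const ValPtr &V) {
  if (V->K == Kind::Leaf)
    return V;
  ValPtr Op = simplify(V->Op);
  if (V->K == Kind::Trunc && Op->K == Kind::ZExt) {
    const ValPtr &X = Op->Op;
    if (V->Bits == X->Bits)
      return X;
    if (V->Bits < X->Bits)
      return simplify(mkTrunc(X, V->Bits)); // X may itself be a zext
    return mkZExt(X, V->Bits);
  }
  // No fold applies: rebuild this cast over the simplified operand.
  return std::make_shared<Val>(Val{V->K, V->Bits, Op});
}
```

Applied to the cast chain that the pre-patch output emits after the lshr (i16 widened to i32, truncated back to i16, then to i8), `simplify` leaves a single 16-to-8-bit trunc, the same shape as the updated CHECK lines. Real cleanups also have to handle sext and vector types, which this model ignores.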

Diff 558116
