This is an archive of the discontinued LLVM Phabricator instance.


Differential D149903

[VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.
ClosedPublic

Authored by fhahn on May 4 2023, 2:07 PM.


Details

Reviewers

Ayal
gilr
rengolin

Commits

rG70535f5e609f: [VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.

Summary

This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has three benefits:

- The VPlan-based version is simpler; we don't need to implement special codegen for each supported instruction type like the IR based one.
- It removes a dependency on the cost model after VPlan execution.
- It removes a use of getVPValue that uses underlying values after VPlan execution (see removed FIXME).

Depends on D149081.

Depends on D149079.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

Ayal added inline comments. Oct 4 2023, 4:03 PM

llvm/lib/Transforms/Vectorize/VPlan.h
280 ↗	(On Diff #553760)	How/Is this removal related?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
769	(Future) Thought: wonder if instead of iterating over all live-ins looking to truncate any, it may be better to iterate over MinBWs and check if any are live-ins. Or lookup MinBWs upon construction of a live-in.
770	nit: use `LiveInInst` or something similar rather than `UI`?
776
783	Set once before the loop for all live-ins to be truncated.
792	Any order other than depth first would also do, right?
803	(Future) Thought: this is an awkward way of retrieving "the" recipe that corresponds to each member of MinBWs - look through all recipes for those having the desired "underlying" insn. Perhaps better lookup MinBWs upon construction of a recipe for an Instruction. Or migrate the analysis that builds MinBWs to run on VPlan.
804	nit: lookup.
809	Would be good to comment how memory and replicate cases are (not) processed.
815	Better assert than continue? Here ProcessedRecipes was already bumped, but should all MinBWs members correspond to Integer types, of distinct (smaller) size, whether live-in or not?
825	This deals only with ZExt/SExt, easier to check directly if Opcode is one or the other? OTOH, better handle Trunc here as well? Is it handled well below?
829	`// SExt/Zext is redundant - stick with its operand.` ?
836	Place assert earlier?
838–839
850	This means the size of all operands is equal to NewResSizeInBits, can this be?
854–855	nit: keep consistent with above.
863–865	nit: keep consistent with above.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
47	nit: a VPlan transform should fold redundant ZExt-Trunc pairs rather than leaving them ("as hints") to `InstCombine`. Being a public method, which does not need SE, should the caller of optimize() precede its call with a direct call to truncateToMinimalBitwidths(), rather than pass MinBWs to optimize()?

Address latest comments, apologies for the delay!

llvm/lib/Transforms/Vectorize/VPlan.h
280 ↗	(On Diff #553760)	The last user of this function has been removed in the patch.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
759–760	Code has been moved to D159202
762–764	code has been moved to D159202
768	Wrapped and added comment, thanks!
770	Renamed, thanks!
771	Updated, thanks!
776	Adjusted, thanks!
777	Turned into assert, thanks!
783	hoisted, thanks!
792	Yes, I think the order doesn't matter here.
804	Done, thanks!
809	Added a comment, thanks!
815	Turned `isIntegerTy` into assert but retained size check as there are entries where the sizes are the same (e.g. for `truncs`).
825	Thanks, changed to `if`. I don't think Trunc is handled explicitly in the latest version.
829	this check has been moved up and is not needed any longer.
836	Moved up, thanks!
838–839	adjusted, thanks!
850	There are cases where a Zext narrowed earlier is used as operand here, so the type is already adjusted.
854–855	Adjusted, thanks!
863–865	reordered, thanks!

Harbormaster completed remote builds in B257842: Diff 557740. Oct 17 2023, 1:22 PM

Various comments, also trying to reason about how this patch changes tests.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3564	Retain a comment explaining why replicate recipes are not truncated?
3599	Retain this comment regarding dropping wrapping flags?
3614	A Trunc is handled by shrinking its operand.
3639	(If nothing is done to the operands, what is the result extended to?)
llvm/lib/Transforms/Vectorize/VPlan.h
280 ↗	(On Diff #553760)	Very well!
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
760	Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type. Effectively adding scalar type info to all VPValues? Might be good to investigate separately, although the current use-cases would probably be very limited. Very well.
768	Suffice to ask `if (!NewResSizeInBits)`?
769	Thoughts about the above? Hopefully avoids exposing getLiveIns(), at the expense of holding a mapping between Values and LiveIns, as in LiveOuts.
772	assert "MinBW member must be integer" rather than continue - thereby skipping a MinBW member.
785	Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on associated underlying instruction? Might be a potential follow-up, but we would still potentially update MinBWs on each recipe replacement? Sure, like updating any other property of a recipe when replaced.
788	Can skip phi's, none are included in MinBWs.
789	Are any loads included in MinBWs, or is this dead code? Stores of course are irrelevant.
801	Agreed - MinBW should specify a consistent minimal bit width for all users, and for all operands, but there seems to be some discrepancy that is confusing:
	A. Instructions whose operands and return value are all of a single type (excluding condition operand of selects) are converted to operate on a narrower type by (a) shrinking their operands to the narrower type and (b) extending their result from the narrower type to their original type. Instructions that feed values to such instructions or use their values continue to feed and use values of the original type. A pair of such instructions where one feeds the other will have a zext-trunc pair added between them, which will later be folded.
	B. Instructions that convert between two distinct types continue to digest the original source type but are updated to produce values of the new destination type. Their users, when reached subsequently, need to check if any of their operands have been narrowed. But if this is the case, why bother expanding results in (b) above? OTOH, the narrowed results of conversion instructions can also be expanded (to be folded later), keeping the treatment consistent? Always expecting the new type to be strictly smaller than the current one. Perhaps conversion instructions could be skipped now and handled by a subsequent folding pass - looking for trunc-trunc and sext-trunc pairs in addition to zext-trunc ones?
	C. Loads are ignored - excluded from MinBWs? They could potentially be narrowed to load only the required bits, though it's unclear if a strided narrow load is better than a unit-strided wider load and trunc - as in an interleave-group(?)
	D. Phis are ignored - excluded from MinBWs. Truncated header induction phi's are handled separately. Other phi's may deserve narrowing(?)
802	Suffice to ask `if (!NewResSizeInBits)`?
803	Thoughts about the above?
811	Should replicate recipes be handled next to handling widen memory recipes above?
815	nit: `ResTy` >> `OldResTy`, `ResSizeInBits` >> `OldResSizeInBits`
818	`assert(ResSizeInBits > NewResSizeInBits && "Nothing to shrink?");` here instead of below?
824	nit: `VPC` >> `OldExt`, `Opc` >> `OldOpc`?
825	Does Trunc (which can truncate to a smaller bitwidth) implicitly fall through and has its operand shrunk to the smaller bitwidth, effectively turning it into a ZExt?
828	Comment is obsolete here - dealt with new type being equal to operand type, which should result in replacing the SExt/ZExt with its operand, see below.
829	?
833	nit: `C` >> `NewCast`? If getTypeSizeInBits(Op) == NewResSizeInBits should C be set to Op (w/o inserting it) instead of creating a redundant cast?
850	Maybe worth a comment.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
47	Thoughts on the above? Better truncate to minimal bitwidth asap, as it relies on IR information? Conceptually a scalar transform. Does "as hints to InstCombine" below still hold?
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
9–10	hmm, we now spot the redundant duplicate zext of WIDE_LOAD from <16 x i8> to <16 x i16>, originally both TMP4 and TMP10.
15	Spotted and removed duplicate zext of WIDE_LOAD8.
55	BROADCAST_SPLAT is (still) trunc'ed twice due to UF=2?
55	BROADCAST_SPLAT2 is (still) trunc'ed twice due to UF=2?
55	This testcase stores the 2nd least significant byte of a 32b product (of two invariant values, one 16b and the other 32b) checking that computing 16b product suffices. But more optimizations should take place: the expansion of the multipliers to 32b should be eliminated (along with their truncation to 16b), and the invariant multiplication-lshr-trunc sequence should be hoisted out of the loop.
56	Both insertelement's now use poison.
llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
308 ↗	(On Diff #557740)	We now fold a trunc-zext of zext'ed WIDE_LOAD from <16 x i16> => <16 x i32> => <16 x i16>, but fail to fold a similar one following the add-2's?
338 ↗	(On Diff #557740)	We now get rid of a pair of <8 x i16> => <8 x i32> => <8 x i16> before the add-2's (so this is not an NFC patch), but still retain the pair of <8 x i16> => <8 x i32> => <8 x i16> after it - missed MinBW/trunc-zext opportunity?
484 ↗	(On Diff #557740)	Hmm, before we narrowed these two shufflevectors to operate on <16 x i8> and zext-trunc their result, now we let them operate on original <16 x i32> and truncate the result?
498 ↗	(On Diff #557740)	Many zext-trunc pairs left to collect.
513 ↗	(On Diff #557740)	Above trunc of TMP2 is redundant along with its zext in the ph.
520 ↗	(On Diff #557740)	Above trunc of TMP4 is redundant along with its zext in the ph.
llvm/test/Transforms/LoopVectorize/trunc-shifts.ll
334 ↗	(On Diff #557740)	We now get rid of a pair of <4 x i16> => <4 x i32> => <4 x i16> before the lshr (so this is not an NFC patch), but still retain the pair/triple of <4 x i16> => <4 x i32> => <4 x i16> => <4 x i8> after it - missed MinBW opportunity?
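
The narrowing scheme discussed for line 801 (shrink the operands, operate in the narrow type, extend the result) and the earlier note about dropping wrapping flags can be illustrated with plain integer arithmetic. The following is a minimal sketch, not LLVM code: the function names are hypothetical, it assumes only the low 8 bits of a 32-bit result are demanded, and it only covers operations such as add and mul whose low bits depend solely on the low bits of their operands.

```cpp
#include <cassert>
#include <cstdint>

// Original form: compute in 32 bits, then keep only the low 8 bits.
std::uint8_t wideThenTrunc(std::uint32_t X, std::uint32_t Y) {
  return static_cast<std::uint8_t>(X * Y + X);
}

// Narrowed form: truncate the operands first, then compute modulo 2^8.
std::uint8_t truncThenNarrow(std::uint32_t X, std::uint32_t Y) {
  std::uint8_t NX = static_cast<std::uint8_t>(X);
  std::uint8_t NY = static_cast<std::uint8_t>(Y);
  // NX * NY + NX is evaluated as int (at most 255 * 255 + 255), so it cannot
  // overflow; the cast keeps the low 8 bits.
  return static_cast<std::uint8_t>(NX * NY + NX);
}

// Returns true if an unsigned add of the low `Bits` bits of A and B does not
// fit in `Bits` bits, i.e. it would wrap at that width.
bool unsignedAddWraps(std::uint32_t A, std::uint32_t B, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 32 && "width out of range for this sketch");
  const std::uint64_t Mask = (std::uint64_t{1} << Bits) - 1;
  const std::uint64_t Sum = (A & Mask) + (B & Mask);
  return Sum > Mask;
}
```

Both forms agree on the low 8 bits for any inputs, which is why the narrowed operation is a valid replacement when only those bits are used. The same add can wrap at 8 bits while not wrapping at 32 bits (200 + 100, for instance), which is the reason no-wrap flags from the wide operation cannot be copied unconditionally to the narrow one.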

fhahn mentioned this in rG0c8e5be6fa08: [VPlan] Simplify redundant trunc (zext A) pairs to A. Oct 22 2023, 3:42 AM

fhahn mentioned this in rG6f3b88baa2ac: [VPlan] Move trunc ([s|z]ext A) simplifications to simplifyRecipe. Nov 16 2023, 1:17 PM
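
The fold named in the two commits above can be illustrated on a toy expression tree. This is a sketch under stated assumptions, not the VPlan implementation: `Expr`, `makeLeaf`, `makeCast` and `simplify` are hypothetical names, and only the simplest case is handled, where the truncate returns to the exact width of the value that was extended.

```cpp
#include <cassert>
#include <memory>
#include <utility>

enum class Kind { Leaf, ZExt, SExt, Trunc };

// Toy expression node: a leaf value or a cast of a single operand.
struct Expr {
  Kind K;
  unsigned Bits;            // width of the result in bits
  std::shared_ptr<Expr> Op; // operand for casts, null for leaves
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeLeaf(unsigned Bits) {
  return std::make_shared<Expr>(Expr{Kind::Leaf, Bits, nullptr});
}

ExprPtr makeCast(Kind K, ExprPtr Op, unsigned Bits) {
  assert(K != Kind::Leaf && Op && "a cast needs an operand");
  return std::make_shared<Expr>(Expr{K, Bits, std::move(Op)});
}

// Folds trunc(zext A) and trunc(sext A) to A when the truncate restores A's
// original width; every other expression is returned unchanged.
ExprPtr simplify(const ExprPtr &E) {
  if (E->K != Kind::Trunc)
    return E;
  const ExprPtr &Inner = E->Op;
  if (Inner->K != Kind::ZExt && Inner->K != Kind::SExt)
    return E;
  const ExprPtr &A = Inner->Op;
  return A->Bits == E->Bits ? A : E;
}
```

With this fold available as a separate cleanup, the narrowing transform itself can insert truncates and extends freely and leave redundant pairs to be removed afterwards.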

Address comments and major simplification after moving cast folding to simplifyRecipes.

All comments should be addressed; hope I didn't miss any.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3564	Retained when skipping VPReplicateRecipe.
3599	Done, thanks!
3639	It stays the same, there's no extend in that case.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
760	This has been updated to now use VPTypeAnalysis.
768	This code has now been removed; LiveIns are handled when truncating the other operands of an instruction; otherwise we leave the type info in an inconsistent state.
769	LiveIns are now handled directly when truncating other operands; getLiveIns has been removed.
772	Turned into an assert, thanks!
788	There's an early continue now that skips phis and other unsupported recipes.
789	Nope, looks like this is not needed in the latest version.
801	The latest version doesn't have special treatment for casts, they remain unchanged and VPlan recipe simplification will take care of folding them if possible.
802	Simplified, thanks!
803	I think it would be best to have the analysis based on VPlan. Building MinBWs early would probably require extra work to update/invalidate it during transforms.
811	We still need to count them for verification

fhahn added inline comments. Nov 16 2023, 2:15 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
815	Renamed, thanks!
818	Done, and also removed continue
824	This code is now gone, handled by recipe simplification.
828	Code is gone now
829	Code now gone.
833	Code gone now.
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
55	The latest version avoids truncating the same value twice.
55	The latest version avoids truncating the same value twice.
55	still more work to do :) Arguably the invariant instructions are artificial, in the regular pipeline, no invariant instructions should remain.
56	I think the use of undef is a leftover that wasn't updated; it should be poison.
llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
308 ↗	(On Diff #557740)	Folding now happens entirely in simplifyRecipes, which should handle this now.
484 ↗	(On Diff #557740)	I think there's nothing we can do about that; we first need to splat the value when generating code, but InstCombine should take care of that.
498 ↗	(On Diff #557740)	Should be better cleaned up now
llvm/test/Transforms/LoopVectorize/trunc-shifts.ll
334 ↗	(On Diff #557740)	trunc/ext pairs should be better cleaned up in the latest version

Harbormaster completed remote builds in B258087: Diff 558116. Nov 16 2023, 6:49 PM

Looks much simpler! Minor last nits.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
757	nit: are these still hints to InstCombine, or to subsequent VPlan cleanups?
757–759	nit
766–769	?
792	But a (more) expensive RPOT order is needed, to handle defs before uses?
816	Is it possible for MinBWs not to contain Op's live-in IR value in this case?

Address latest comments, thanks!

fhahn added inline comments. Nov 23 2023, 4:10 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
757	Updated, thanks!
757–759	Done, thanks! This also limits the scope of TypeInfo to the range where it is valid. After `truncateToMinimalBitwidths`, we would need to invalidate the info for the modified recipes otherwise. This can be done in the future.
766–769	Simplified, thanks!
792	The latest version should not need RPO, as the bit width of the results do not change for any user (previously they might due to early cast simplifications). Changed to depth first.
816	Yes, MinBWs only contains instructions, but not other values like arguments. Added a clarifying assert.

Harbormaster completed remote builds in B258119: Diff 558159. Nov 23 2023, 4:59 AM

ping :)

Ayal added inline comments. Nov 29 2023, 9:23 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
757–759	Very well. Worth commenting that `TypeInfo` should not be used following truncateToMinimalBitwidths.
763	nit: `ProcessedRecipesNum`?
763	`ProcessedTruncs` is used outside ifdef below, move its definition out of ifdef here? Or is it meant to ensure truncated operands are counted once by ProcessedRecipes for debugging only? If an operand is truncated multiple times, all its truncations must be to the same size, because "MinBW should specify a consistent minimal bit width for all users(, and for all operands)"? Worth explaining why processed truncs are recorded.
766	Should `PH` be skipped? Trying to shrink the (live-in) operands of recipes in PH will insert them at the end of PH...
769	Shrunk operands are placed before R, but its extension is placed after - and calls for this make_early_inc_range, right?
785	Just note that the counting of ProcessedRecipes may miss casts that fail to be processed later.
785	Just noting potential follow-up, possibly as a TODO somewhere: attach each MinBW to its recipe when the latter is created, supplementing its underlying inst.
790	Does `OldResSizeInBits` equal to the size of `OldResTy`, for the non-cast Widen or Select `R`?
807	`Ins`? Perhaps `ProcessedTrunc`?
808	Handle the simple if !ins.second /* Op already processed */ case first, potentially early-continuing? Clearer to check if ProcessedTruncs.lookup(Op) or if ProcessedTruncs.contains(Op) and if so use ProcessedTruncs[Op], otherwise insert it?
811	nit: place simpler if !isLiveIn case first?
813–816	nit
821	Note that truncations of live-ins could also be inserted before R, thereby leaving the treatment of live-ins to debugging only, and leaving their LICM and commoning to a subsequent VPlan cleanup pass, along with trunc-zext foldings.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
47	WDYT on the above: should the caller of optimize() precede its call with a direct call to truncateToMinimalBitwidths(), rather than pass MinBWs to optimize()?
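
The point raised for line 769, that inserting before and after the current recipe calls for early-increment iteration, can be sketched with a standard linked list. This is an illustration only: `wrapEach` and the string "recipes" are hypothetical stand-ins, and `std::list` is assumed here because its iterators stay valid across insertions.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>

// Surrounds every original element with a "trunc" before it and a "zext"
// after it, without ever visiting the newly inserted elements.
void wrapEach(std::list<std::string> &Recipes) {
  const std::size_t OriginalSize = Recipes.size();
  (void)OriginalSize; // only read by the assert below
  for (auto It = Recipes.begin(), End = Recipes.end(); It != End;) {
    auto Next = std::next(It);    // advance early, before mutating the list
    Recipes.insert(It, "trunc");  // lands before the current element
    Recipes.insert(Next, "zext"); // lands after it, in front of Next
    It = Next;                    // resume at the next original element
  }
  assert(Recipes.size() == 3 * OriginalSize &&
         "each original element gains two neighbours");
}
```

Had the loop advanced with a plain `++It` after inserting, it would step onto the freshly inserted "zext" and process it as well, so the traversal would never terminate.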

Rebase and address latest comments, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
757–759	Sunk further into truncateToMinimalBitwidths
763	Changed to `NumProcessedRecipes`
763	It's to re-use previously generated truncates. Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated). Moved out of ifdef.
766	Good point, there should be nothing to shrink in PH for now, as the analysis is for the loop body only, adjusted!
769	Yep
785	Do you mean updating the comment here or just a general note? We need to include the recipes in the count, otherwise the verification later will fail
790	Yes, I forgot to remove this use of IR `getType`. Updated to use `TypeInfo.inferScalarType(ResultVPV)` and then `getScalarSizeInBits` of the returned type.
807	Updated, thanks!
808	Early continue would mean duplicating the code to update the operands, so I left things as is for now, including using `insert`. `insert` means we only need to look up the insert position once, vs 2 lookups with separate `lookup` and then `[]`. WDYT?
811	Done, thanks!
821	Yep, for now it is simpler and results in a smaller test diff to do it directly there as it is not only LICM but also very simple CSE
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
47	Sounds good, updated, thanks!

Ayal added inline comments. Nov 29 2023, 2:23 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
756–835	nit: redundant move of empty line?
763	"Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated)." Very well, may deserve a comment.
785	I mean we count casts as if they are processed, expecting they will be later, w/o checking that they actually do.
790	Ah, ok, wondered if using the size of the type of `UI` directly would be simpler?
795	Should be the same `Ctx` passed in as parameter?
808	OK, WDYT of something as follows:

        auto [ProcessedIter, DidNotExist] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = DidNotExist ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!DidNotExist)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          Shrunk->insertBefore(&R);
        } else {
          PH->appendRecipe(Shrunk);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
47

Ayal added inline comments. Nov 30 2023, 12:05 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

808

Maybe IterIsEmpty would be a better name, to avoid double negation, as in:

        auto [ProcessedIter, IterIsEmpty] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = IterIsEmpty ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!IterIsEmpty)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          NewOp->insertBefore(&R);
        } else {
          PH->appendRecipe(NewOp);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }
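
The single-lookup `insert` idiom debated for line 808 can be sketched outside LLVM with standard containers. This is a minimal illustration under stated assumptions: `std::unordered_map` stands in for the map used in the patch, and `Trunc`, `TruncCache` and `getOrCreate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for a truncate created for one operand.
struct Trunc {
  std::string Source;
};

class TruncCache {
  std::unordered_map<std::string, Trunc *> Processed;
  std::vector<std::unique_ptr<Trunc>> Storage; // owns the created objects

public:
  // Returns the truncate for Op, creating it on first request. insert()
  // performs the only hash lookup: it either finds the existing entry or
  // reserves a slot that is filled in through the returned iterator.
  Trunc *getOrCreate(const std::string &Op) {
    auto [It, Inserted] = Processed.insert({Op, nullptr});
    if (Inserted) {
      Storage.push_back(std::make_unique<Trunc>(Trunc{Op}));
      It->second = Storage.back().get();
    }
    assert(It->second && "entry must be populated before it is returned");
    return It->second;
  }

  std::size_t numCreated() const { return Storage.size(); }
};
```

Requesting the same operand twice yields the same object, which mirrors the stated purpose of the map in the review: re-using previously generated truncates instead of creating one per use.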

Addressed latest comments, thanks!

Harbormaster completed remote builds in B258145: Diff 558195. Nov 30 2023, 4:04 AM

fhahn added inline comments. Nov 30 2023, 5:14 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
756–835	changed back, thanks!
763	Added a comment to ProcessedTruncs definition.
785	They don't need handling explicitly, as redundant casts will be removed later. Expanded the comment slightly to "Also skip casts which do not need to be handled explicitly here, as redundant casts will be removed during recipe simplification."
790	It might be slightly simpler, but would mean this may lead to a crash further down the line, once we support recipes without underlying values/instructions (and we forget to update this line) and/or if some other transform adjusted the type. Left as is for now
795	Yes, fixed!
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
47	Fixed, thanks!

This looks good to me, thanks for accommodating!
Adding a minor redundancy spotted plus some test related notes.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
833	redundant - hoist above the early-continue.
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
54	Trunc & insertelement LICM'd from vec.epilog.vector.body to vec.epilog.ph.
55	Duplicated TMP0 and TMP1 still here?
55	Still seeing duplicate TMP2 and TMP3?
55–56	ditto.
llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
308 ↗	(On Diff #557740)	The one following the add-2's is also folded now.
338 ↗	(On Diff #557740)	Other pair also folded now.
484 ↗	(On Diff #557740)	Worth testing with a subsequent instCombine, to ensure pessimization is avoided?
498 ↗	(On Diff #557740)	Indeed looks like it!
30 ↗	(On Diff #558195)	Fold zext-trunc pair, several such cases follow.
llvm/test/Transforms/LoopVectorize/trunc-shifts.ll
334 ↗	(On Diff #557740)	Indeed!

This revision is now accepted and ready to land. Nov 30 2023, 5:22 AM

Closed by commit rG70535f5e609f: [VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version. (authored by fhahn). Dec 2 2023, 8:13 AM

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG70535f5e609f: [VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version..

fhahn marked 2 inline comments as done. Dec 2 2023, 8:15 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
833	Fixed in the committed version, thanks!
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
55	They were due to redundant casts being added for Live-in values, fixed by checking in VPWidenCastRecipe::execute for now, with a FIXME to address this with explicit unrolling.

This triggers failed asserts, see https://github.com/llvm/llvm-project/issues/74231.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	143 lines
llvm/lib/Transforms/Vectorize/VPlanTransforms.h	5 lines
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp	79 lines
llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll	2 lines
llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll	12 lines

Diff 519651

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[... 603 lines not shown ...]
 protected:
   /// Clear NSW/NUW flags from reduction instructions if necessary.
   void clearReductionWrapFlags(VPReductionPHIRecipe *PhiR,
                                VPTransformState &State);

   /// Iteratively sink the scalarized operands of a predicated instruction into
   /// the block that was created for it.
   void sinkScalarOperands(Instruction *PredInst);

-  /// Shrinks vector element sizes to the smallest bitwidth they can be legally
-  /// represented as.
-  void truncateToMinimalBitwidths(VPTransformState &State);
-
   /// Returns (and creates if needed) the trip count of the widened loop.
   Value *getOrCreateVectorTripCount(BasicBlock *InsertBlock);

   /// Returns a bitcasted value to the requested vector type.
   /// Also handles bitcasts of vector<float> <-> vector<pointer> types.
   Value *createBitOrPointerCast(Value *V, VectorType *DstVTy,
                                 const DataLayout &DL);

[... 2,925 lines not shown ...]
 }

 static Type *largestIntegerVectorType(Type *T1, Type *T2) {
   auto *I1 = cast<IntegerType>(cast<VectorType>(T1)->getElementType());
   auto *I2 = cast<IntegerType>(cast<VectorType>(T2)->getElementType());
   return I1->getBitWidth() > I2->getBitWidth() ? T1 : T2;
 }

-void InnerLoopVectorizer::truncateToMinimalBitwidths(VPTransformState &State) {
-  // For every instruction `I` in MinBWs, truncate the operands, create a
-  // truncated version of `I` and reextend its result. InstCombine runs
-  // later and will remove any ext/trunc pairs.
-  SmallPtrSet<Value *, 4> Erased;
-  for (const auto &KV : Cost->getMinimalBitwidths()) {
-    // If the value wasn't vectorized, we must maintain the original scalar
-    // type. The absence of the value from State indicates that it
-    // wasn't vectorized.
-    // FIXME: Should not rely on getVPValue at this point.
-    VPValue *Def = State.Plan->getVPValue(KV.first, true);
-    if (!State.hasAnyVectorValue(Def))
-      continue;
-    for (unsigned Part = 0; Part < UF; ++Part) {
-      Value *I = State.get(Def, Part);
-      if (Erased.count(I) || I->use_empty() || !isa<Instruction>(I))
-        continue;
-      Type *OriginalTy = I->getType();
-      Type *ScalarTruncatedTy =
-          IntegerType::get(OriginalTy->getContext(), KV.second);
-      auto *TruncatedTy = VectorType::get(
-          ScalarTruncatedTy, cast<VectorType>(OriginalTy)->getElementCount());
-      if (TruncatedTy == OriginalTy)
-        continue;
-
-      IRBuilder<> B(cast<Instruction>(I));
-      auto ShrinkOperand = [&](Value *V) -> Value * {
-        if (auto *ZI = dyn_cast<ZExtInst>(V))
-          if (ZI->getSrcTy() == TruncatedTy)
-            return ZI->getOperand(0);
-        return B.CreateZExtOrTrunc(V, TruncatedTy);
-      };
-
-      // The actual instruction modification depends on the instruction type,
-      // unfortunately.
-      Value *NewI = nullptr;
-      if (auto *BO = dyn_cast<BinaryOperator>(I)) {
-        NewI = B.CreateBinOp(BO->getOpcode(), ShrinkOperand(BO->getOperand(0)),
-                             ShrinkOperand(BO->getOperand(1)));
-
-        // Any wrapping introduced by shrinking this operation shouldn't be
-        // considered undefined behavior. So, we can't unconditionally copy
-        // arithmetic wrapping flags to NewI.
-        cast<BinaryOperator>(NewI)->copyIRFlags(I, /*IncludeWrapFlags=*/false);
-      } else if (auto *CI = dyn_cast<ICmpInst>(I)) {
-        NewI =
-            B.CreateICmp(CI->getPredicate(), ShrinkOperand(CI->getOperand(0)),
-                         ShrinkOperand(CI->getOperand(1)));
-      } else if (auto *SI = dyn_cast<SelectInst>(I)) {
-        NewI = B.CreateSelect(SI->getCondition(),
-                              ShrinkOperand(SI->getTrueValue()),
-                              ShrinkOperand(SI->getFalseValue()));
-      } else if (auto *CI = dyn_cast<CastInst>(I)) {
-        switch (CI->getOpcode()) {
-        default:
-          llvm_unreachable("Unhandled cast!");
-        case Instruction::Trunc:
-          NewI = ShrinkOperand(CI->getOperand(0));
-          break;
-        case Instruction::SExt:
-          NewI = B.CreateSExtOrTrunc(
-              CI->getOperand(0),
-              smallestIntegerVectorType(OriginalTy, TruncatedTy));
-          break;
-        case Instruction::ZExt:
-          NewI = B.CreateZExtOrTrunc(
-              CI->getOperand(0),
-              smallestIntegerVectorType(OriginalTy, TruncatedTy));
-          break;
-        }
-      } else if (auto *SI = dyn_cast<ShuffleVectorInst>(I)) {
-        auto Elements0 =
-            cast<VectorType>(SI->getOperand(0)->getType())->getElementCount();
-        auto *O0 = B.CreateZExtOrTrunc(
-            SI->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements0));
-        auto Elements1 =
-            cast<VectorType>(SI->getOperand(1)->getType())->getElementCount();
-        auto *O1 = B.CreateZExtOrTrunc(
-            SI->getOperand(1), VectorType::get(ScalarTruncatedTy, Elements1));
-
-        NewI = B.CreateShuffleVector(O0, O1, SI->getShuffleMask());
-      } else if (isa<LoadInst>(I) || isa<PHINode>(I)) {
-        // Don't do anything with the operands, just extend the result.
-        continue;
-      } else if (auto *IE = dyn_cast<InsertElementInst>(I)) {
-        auto Elements =
-            cast<VectorType>(IE->getOperand(0)->getType())->getElementCount();
-        auto *O0 = B.CreateZExtOrTrunc(
-            IE->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements));
-        auto *O1 = B.CreateZExtOrTrunc(IE->getOperand(1), ScalarTruncatedTy);
-        NewI = B.CreateInsertElement(O0, O1, IE->getOperand(2));
-      } else if (auto *EE = dyn_cast<ExtractElementInst>(I)) {
-        auto Elements =
-            cast<VectorType>(EE->getOperand(0)->getType())->getElementCount();
-        auto *O0 = B.CreateZExtOrTrunc(
-            EE->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements));
-        NewI = B.CreateExtractElement(O0, EE->getOperand(2));
-      } else {
-        // If we don't know what to do, be conservative and don't do anything.
-        continue;
-      }
-
-      // Lastly, extend the result.
-      NewI->takeName(cast<Instruction>(I));
-      Value *Res = B.CreateZExtOrTrunc(NewI, OriginalTy);
-      I->replaceAllUsesWith(Res);
-      cast<Instruction>(I)->eraseFromParent();
-      Erased.insert(I);
-      State.reset(Def, Res, Part);
-    }
-  }
-
-  // We'll have created a bunch of ZExts that are now parentless. Clean up.
-  for (const auto &KV : Cost->getMinimalBitwidths()) {
-    // If the value wasn't vectorized, we must maintain the original scalar
-    // type. The absence of the value from State indicates that it
-    // wasn't vectorized.
-    // FIXME: Should not rely on getVPValue at this point.
-    VPValue *Def = State.Plan->getVPValue(KV.first, true);
-    if (!State.hasAnyVectorValue(Def))
-      continue;
-    for (unsigned Part = 0; Part < UF; ++Part) {
-      Value *I = State.get(Def, Part);
-      ZExtInst *Inst = dyn_cast<ZExtInst>(I);
-      if (Inst && Inst->use_empty()) {
-        Value *NewI = Inst->getOperand(0);
-        Inst->eraseFromParent();
-        State.reset(Def, NewI, Part);
-      }
-    }
-  }
-}

void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
VPlan &Plan) {
- // Insert truncates and extends for any truncated instructions as hints to
- // InstCombine.
- if (VF.isVector())
- truncateToMinimalBitwidths(State);

// Fix widened non-induction PHIs by setting up the PHI operands.
if (EnableVPlanNativePath)
fixNonInductionPHIs(Plan, State);

// At this point every instruction in the original loop is widened to a
// vector form. Now we need to fix the recurrences in the loop. These PHI
// nodes are currently empty because we did not want to introduce cycles.
// This is the second stage of vectorizing recurrences.
...
std::optional<VPlanPtr> LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
...

// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
// ---------------------------------------------------------------------------

VPlanTransforms::removeRedundantCanonicalIVs(*Plan);
VPlanTransforms::removeRedundantInductionCasts(*Plan);
+ VPlanTransforms::truncateToMinimalBitwidths(*Plan, CM.getMinimalBitwidths());

// Adjust the recipes for any inloop reductions.
adjustRecipesForReductions(cast<VPBasicBlock>(TopRegion->getExiting()), Plan,
RecipeBuilder, Range.Start);

// Sink users of fixed-order recurrence past the recipe defining the previous
// value and introduce FirstOrderRecurrenceSplice VPInstructions.
if (!VPlanTransforms::adjustFixedOrderRecurrences(*Plan, Builder))
...

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

struct VPlanTransforms {
...
/// Wrap predicated VPReplicateRecipes with a mask operand in an if-then
/// region block and remove the mask operand. Optimize the created regions by
/// iteratively sinking scalar operands into the region, followed by merging
/// regions until no improvements are remaining.
static void createAndOptimizeReplicateRegions(VPlan &Plan);

/// Remove redundant VPBasicBlocks by merging them into their predecessor if
/// the predecessor has a single successor.

AyalUnsubmitted

Done

nit: a VPlan transform should fold redundant ZExt-Trunc pairs rather than leaving them ("as hints") to InstCombine.

Being a public method that does not need SE, should the caller of optimize() precede its call with a direct call to truncateToMinimalBitwidths(), rather than pass MinBWs to optimize()?

AyalUnsubmitted

Done

Thoughts on the above?
Better truncate to minimal bitwidth asap, as it relies on IR information? Conceptually a scalar transform.
Does "as hints to InstCombine" below still hold?


AyalUnsubmitted

Done

WDYT on the above: should the caller of optimize() precede its call with a direct call to truncateToMinimalBitwidths(), rather than pass MinBWs to optimize()?

fhahnAuthorUnsubmitted

Done

Sounds good, updated, thanks!


AyalUnsubmitted

Done

/// Insert truncates and extends for any truncated recipe. Redundant casts

- /// will folded later.

+ /// will be folded later.

static void


fhahnAuthorUnsubmitted

Done

Fixed, thanks!


static bool mergeBlocksIntoPredecessors(VPlan &Plan);

/// Remove redundant casts of inductions.
///
/// Such redundant casts are casts of induction variables that can be ignored,
/// because we already proved that the casted phi is equal to the uncasted phi
/// in the vectorized loop. There is no need to vectorize the cast - the same
/// value can be used for both the phi and casts in the vector loop.
...
/// not valid.
static bool adjustFixedOrderRecurrences(VPlan &Plan, VPBuilder &Builder);

/// Optimize \p Plan based on \p BestVF and \p BestUF. This may restrict the
/// resulting plan to \p BestVF and \p BestUF.
static void optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF,
unsigned BestUF,
PredicatedScalarEvolution &PSE);

/// Insert truncates and extends for any truncated instructions as hints to

/// InstCombine.

AyalUnsubmitted

Done

Note: a VPlan-based InstCombine could take care of these "hints" by folding redundant extend-truncate pairs.


fhahnAuthorUnsubmitted

Done

Agreed, I think we already have a few separate transforms that could fit into a general instcombine transform
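
A minimal standalone sketch of why an extend-truncate pair is foldable, modelled on a single 8-bit lane with plain C++ integers rather than VPlan recipes. The helper names `zext8To32`, `trunc32To8` and `zextTruncPairIsRedundant` are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a zext and a trunc acting on one lane.
inline uint32_t zext8To32(uint8_t V) { return static_cast<uint32_t>(V); }
inline uint8_t trunc32To8(uint32_t V) { return static_cast<uint8_t>(V); }

// Returns true if trunc(zext(V)) == V for every 8-bit value, i.e. the pair
// is a no-op and a cleanup pass may replace it with V.
inline bool zextTruncPairIsRedundant() {
  for (unsigned I = 0; I <= 0xFF; ++I) {
    uint8_t V = static_cast<uint8_t>(I);
    if (trunc32To8(zext8To32(V)) != V)
      return false;
  }
  return true;
}
```

The exhaustive loop only covers the case where the truncated width equals the original narrow width; other width combinations would fold to a single narrower cast instead of disappearing.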


AyalUnsubmitted

Done

The dead casts removal at the end of current truncateToMinimalBitwidths() should already be taken care of by recipe dce, right?


fhahnAuthorUnsubmitted

Done

Yes that should be taken care of.


static void

truncateToMinimalBitwidths(VPlan &Plan,

const MapVector<Instruction *, uint64_t> &MinBWs);

};

} // namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_VPLANTRANSFORMS_H

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

...
auto *RecurSplice = cast<VPInstruction>(
...
{FOR, FOR->getBackedgeValue()}));
FOR->replaceAllUsesWith(RecurSplice);
// Set the first operand of RecurSplice to FOR again, after replacing
// all users.
RecurSplice->setOperand(0, FOR);
}
return true;
}

AyalUnsubmitted

Done

nit: are these still hints to InstCombine, or to subsequent VPlan cleanups?


fhahnAuthorUnsubmitted

Done

Updated, thanks!


void VPlanTransforms::truncateToMinimalBitwidths(

VPlan &Plan, const MapVector<Instruction *, uint64_t> &MinBWs) {

AyalUnsubmitted

Done

optimizeInductions(Plan, SE);

- VPTypeAnalysis TypeInfo(SE.getContext());

- if (!Plan.hasVF(ElementCount::getFixed(1)))

+ if (!Plan.hasVF(ElementCount::getFixed(1))) {

+ VPTypeAnalysis TypeInfo(SE.getContext());

truncateToMinimalBitwidths(Plan, MinBWs, TypeInfo);

+ }

simplifyRecipes(Plan, SE.getContext());

nit


fhahnAuthorUnsubmitted

Done

Done, thanks! This also limits the scope of TypeInfo to the range where it is valid. After truncateToMinimalBitwidths, we would otherwise need to invalidate the info for the modified recipes. This can be done in the future.

AyalUnsubmitted

Done

Very well. Worth commenting that TypeInfo should not be used following truncateToMinimalBitwidths.


fhahnAuthorUnsubmitted

Done

Sunk further into truncateToMinimalBitwidths.

auto GetType = [](VPValue *Op) {

AyalUnsubmitted

Done

nit: can return the type size in bits, as that is what is needed here. Op >> VPV?

Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type.


fhahnAuthorUnsubmitted

Done

Adjusted to return size in bits to simplify code, thanks!

> Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type.

Effectively adding scalar type info to all VPValues? Might be good to investigate separately, although the current use-cases would probably be very limited.

AyalUnsubmitted

Done

>> Thought: worth introducing as a member of VPValue, to be overridden by VPWidenCastRecipe? Note that this is Element/Scalar Type.

> Effectively adding scalar type info to all VPValues? Might be good to investigate separately, although the current use-cases would probably be very limited

Very well.

fhahnAuthorUnsubmitted

Done

This has been updated to now use VPTypeAnalysis.


AyalUnsubmitted

Done

nit: `VPValue *Op` >> `VPValue *VPV`?

fhahnAuthorUnsubmitted

Done

Updated, thanks!


AyalUnsubmitted

Done

auto GetSizeInBits = [](VPValue *VPV) {

- auto *UV = VPV->getUnderlyingValue();

- if (UV)

+ if (auto *UV = VPV->getUnderlyingValue())

return UV->getType()->getScalarSizeInBits();

nit


fhahnAuthorUnsubmitted

Done

Code has been moved to D159202


auto *UV = Op->getUnderlyingValue();

if (UV)

return UV->getType();

AyalUnsubmitted

Done

nit: `ProcessedRecipesNum`?

fhahnAuthorUnsubmitted

Done

Changed to `NumProcessedRecipes`

AyalUnsubmitted

Done

ProcessedTruncs is used outside ifdef below, move its definition out of ifdef here? Or is it meant to ensure truncated operands are counted once by ProcessedRecipes for debugging only? If an operand is truncated multiple times, all its truncations must be to the same size, because "MinBW should specify a consistent minimal bit width for all users(, and for all operands)"?

Worth explaining why processed truncs are recorded.


fhahnAuthorUnsubmitted

Done

It's to re-use previously generated truncates. Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated)

Moved out of ifdef


AyalUnsubmitted

Done

> Note that we cannot RAUW after creating the new truncate, as this may make other uses not well typed (until they are processed and all their operands are truncated)

Very well, may deserve a comment.

fhahnAuthorUnsubmitted

Done

Added a comment to ProcessedTruncs definition.
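
A minimal standalone sketch of the reuse idea behind ProcessedTruncs, assuming a plain map from an operand to the truncate already created for it. `SketchValue`, `TruncCache` and `getOrCreateTrunc` are hypothetical names for illustration and do not mirror the actual VPlan classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a value with a scalar bit width.
struct SketchValue {
  std::string Name;
  unsigned Bits;
};

// Remembers the truncate created for each operand so that a second user of
// the same operand reuses it instead of creating a duplicate. The original
// operand is left untouched, so users not yet processed stay well typed.
class TruncCache {
  std::map<const SketchValue *, SketchValue *> ProcessedTruncs;
  std::vector<std::unique_ptr<SketchValue>> Owned;

public:
  SketchValue *getOrCreateTrunc(const SketchValue *Op, unsigned NewBits) {
    auto It = ProcessedTruncs.find(Op);
    if (It != ProcessedTruncs.end())
      return It->second; // Reuse the previously generated truncate.
    Owned.push_back(std::make_unique<SketchValue>(
        SketchValue{"trunc." + Op->Name, NewBits}));
    SketchValue *Trunc = Owned.back().get();
    ProcessedTruncs[Op] = Trunc;
    return Trunc;
  }

  std::size_t numCreated() const { return Owned.size(); }
};
```

The sketch relies on the point made in the thread that every truncation of a given operand is to the same width, so the cache key does not need to include the width.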


if (auto *VPC = dyn_cast<VPWidenCastRecipe>(Op)) {

AyalUnsubmitted

Done

return UV->getType()->getScalarSizeInBits();

- if (auto *VPC = dyn_cast<VPWidenCastRecipe>(VPV)) {

+ if (auto *VPC = dyn_cast<VPWidenCastRecipe>(VPV))

return VPC->getResultType()->getScalarSizeInBits();

- }

llvm_unreachable("trying to get type of a VPValue without type info");

nit


fhahnAuthorUnsubmitted

Done

code has been moved to D159202


return VPC->getResultType();

}

AyalUnsubmitted

Done

Should PH be skipped? Trying to shrink the (live-in) operands of recipes in PH will insert them at the end of PH...


fhahnAuthorUnsubmitted

Done

Good point, there should be nothing to shrink in PH for now, as the analysis is for the loop body only, adjusted!


llvm_unreachable("trying to get type of a VPValue without type info");

};

AyalUnsubmitted

Done

nit: worth an empty line?


fhahnAuthorUnsubmitted

Done

added, thanks!


AyalUnsubmitted

Done

Define ProcessedRecipes only for debug?

/// First truncate live-ins that represent relevant Instructions.


fhahnAuthorUnsubmitted

Done

Wrapped and added comment, thanks!


AyalUnsubmitted

Done

Suffice to ask if (!NewResSizeInBits)?


fhahnAuthorUnsubmitted

Done

This code has now been removed; LiveIns are handled when truncating the other operands of an instruction; otherwise we leave the type info in an inconsistent state.


for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(

AyalUnsubmitted

Done

(Future) Thought: wonder if instead of iterating over all live-ins looking to truncate any, it may be better to iterate over MinBWs and check if any are live-ins. Or lookup MinBWs upon construction of a live-in.


AyalUnsubmitted

Done

Thoughts about the above? Hopefully avoids exposing getLiveIns(), at the expense of holding a mapping between Values and LiveIns, as in LiveOuts.


fhahnAuthorUnsubmitted

Done

LiveIns are now handled directly when truncating other operands; getLiveIns has been removed.


AyalUnsubmitted

Done

#endif

- VPBasicBlock *PH =

- cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());

- ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(

- Plan.getEntry());

+ VPBasicBlock *PH = Plan.getEntry();

+ ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(PH);

for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {


fhahnAuthorUnsubmitted

Done

Simplified, thanks!

AyalUnsubmitted

Done

Shrunk operands are placed before R, but its extension is placed after - and calls for this make_early_inc_range, right?


fhahnAuthorUnsubmitted

Done

Yep
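
A minimal standalone sketch of the iteration pattern being confirmed here, using `std::list` in place of a VPBasicBlock. The function `narrowAll` and the recipe names are hypothetical; the only point illustrated is that the iterator is advanced before the current element's neighbourhood is modified, which is what make_early_inc_range arranges.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical model: a block is a list of recipe names. For every recipe
// whose name starts with "wide", insert a "trunc" before it and a "zext"
// after it.
inline void narrowAll(std::list<std::string> &Block) {
  for (auto It = Block.begin(), End = Block.end(); It != End;) {
    auto Cur = It++; // Advance first, so It already points past Cur.
    if (Cur->rfind("wide", 0) != 0)
      continue;
    Block.insert(Cur, "trunc");           // Lands before Cur.
    Block.insert(std::next(Cur), "zext"); // Lands after Cur, before It.
  }
}
```

Because `It` was advanced before the insertions, the newly inserted "zext" sits between `Cur` and `It` and is never visited by the loop.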


vp_depth_first_deep(Plan.getEntry()))) {

AyalUnsubmitted

Done

nit: use LiveInInst or something similar rather than UI?


fhahnAuthorUnsubmitted

Done

Renamed, thanks!


for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {

AyalUnsubmitted

Done

Would `MinBWs.lookup(UI)` look better? Returning zero clearly indicates unfound.

fhahnAuthorUnsubmitted

Done

Updated, thanks!
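
A minimal standalone sketch of the lookup-returns-zero convention, using `std::map` instead of the MapVector in the patch. `lookupOrZero` is a hypothetical helper written for illustration; it models a lookup that yields a default-constructed value when the key is absent.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical helper: returns the recorded minimal bit width for Key, or 0
// if Key has no entry. Keys are only compared, never dereferenced, so a null
// key is safe and simply reports "not found".
inline uint64_t lookupOrZero(const std::map<const void *, uint64_t> &MinBWs,
                             const void *Key) {
  auto It = MinBWs.find(Key);
  return It == MinBWs.end() ? 0 : It->second;
}
```

A caller can then write a single `if (!NewResSizeInBits) continue;` style check, assuming zero is never a valid minimal bit width.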


if (R.getNumDefinedValues() != 1)

AyalUnsubmitted

Done

assert "MinBW member must be integer" rather than continue - thereby skipping a MinBW member.


fhahnAuthorUnsubmitted

Done

Turned into an assert, thanks!


continue;

auto *UV =

cast_or_null<Instruction>(R.getVPSingleValue()->getUnderlyingValue());

AyalUnsubmitted

Done

continue;

- auto *UI =

- cast_or_null<Instruction>(R.getVPSingleValue()->getUnderlyingValue());

+ VPValue *ResultVPV = R.getVPSingleValue();

+ auto *UI = cast_or_null<Instruction>(ResultVPV->getUnderlyingValue());

auto I = MinBWs.find(UI);


fhahnAuthorUnsubmitted

Done

Updated, thanks!


AyalUnsubmitted

Done

unsigned NewResSizeInBits = I->second;

- Type *ResTy = VPV->getLiveInIRValue()->getType();

+ Type *ResTy = UI->getType();

if (!ResTy->isIntegerTy())


fhahnAuthorUnsubmitted

Done

Adjusted, thanks!


auto I = MinBWs.find(UV);

AyalUnsubmitted

Done

nit: is find() ok given a null UI?


fhahnAuthorUnsubmitted

Done

Yes I think so, the keys are pointers and they shouldn't be dereferenced.


AyalUnsubmitted

Done

Can this happen - continuing will lose a member of MinBWs - better assert instead?


fhahnAuthorUnsubmitted

Done

Turned into assert, thanks!


if (!UV || I == MinBWs.end())

continue;

AyalUnsubmitted

Done

continue;

+ unsigned ResSizeInBits = GetSizeInBits(ResultVPV);

unsigned NewResSizeInBits = I->second;


fhahnAuthorUnsubmitted

Done

Adjusted, thanks!


Type *ResTy = UV->getType();

if (!ResTy->isIntegerTy() || ResTy->getScalarSizeInBits() == I->second)

continue;

AyalUnsubmitted

Done

auto *Shrunk = new VPWidenCastRecipe(Instruction::Trunc, VPV, NewResTy);

- VPBasicBlock *PH = dyn_cast<VPBasicBlock>(

+ VPBasicBlock *PH = cast<VPBasicBlock>(

Plan.getVectorLoopRegion()->getSinglePredecessor());

Set once before the loop for all live-ins to be truncated.


fhahnAuthorUnsubmitted

Done

hoisted, thanks!


AyalUnsubmitted

Done

Type *ResTy = UI->getType();

- if (!ResTy->isIntegerTy() ||

- ResTy->getScalarSizeInBits() == NewResSizeInBits)

+ if (!ResTy->isIntegerTy() || ResSizeInBits == NewResSizeInBits)

continue;


fhahnAuthorUnsubmitted

Done

Done, thanks!


if (!isa<VPWidenRecipe, VPWidenSelectRecipe, VPWidenCastRecipe>(&R))

AyalUnsubmitted

Done

nit: this can be checked first, instead of checking for single defined value.

Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on associated underlying instruction?


fhahnAuthorUnsubmitted

Done

Moved the check up, thanks!

> Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on associated underlying instruction?

Might be a potential follow-up, but we would still potentially need to update MinBWs on each recipe replacement?

AyalUnsubmitted

Not Done

>> Thought: could/should each MinBW be attached to its recipe asap - when the latter is created, considering it depends on associated underlying instruction?

> Might be a potential follow-up, but we would still potentially need to update MinBWs on each recipe replacement?

Sure, like updating any other property of a recipe when replaced.

AyalUnsubmitted

Not Done

Just noting potential follow-up, possibly as a TODO somewhere: attach each MinBW to its recipe when the latter is created, supplementing its underlying inst.


AyalUnsubmitted

Done

Just note that the counting of ProcessedRecipes may miss casts that fail to be processed later.


fhahnAuthorUnsubmitted

Done

Do you mean updating the comment here or just a general note? We need to include the recipes in the count, otherwise the verification later will fail


AyalUnsubmitted

Done

I mean we count casts as if they are processed, expecting they will be later, w/o checking that they actually do.


fhahnAuthorUnsubmitted

Done

They don't need handling explicitly, as redundant casts will be removed later. Expanded the comment slightly to

Also skip casts which do not need to be handled explicitly here, as redundant casts will be removed during recipe simplification.


continue;

LLVMContext &Ctx = ResTy->getContext();

AyalUnsubmitted

Done

nit: auto ResNewTyInBits = I->second;
nit: auto ResNewTy = IntegerType::get(ResTy->getContext(), ResNewTyInBits); ?


fhahnAuthorUnsubmitted

Done

Added variables, thanks!


AyalUnsubmitted

Done

Can skip phi's, none are included in MinBWs.


fhahnAuthorUnsubmitted

Done

There's an early continue now that skips phis and other unsupported recipes.


AyalUnsubmitted

Done

Are any loads included in MinBWs, or is this dead code? Stores of course are irrelevant.


fhahnAuthorUnsubmitted

Done

Nope, looks like this is not needed in the latest version.


// Try to replace wider SExt/ZExts with narrower ones if possible.

AyalUnsubmitted

Done

Does OldResSizeInBits equal the size of OldResTy, for the non-cast Widen or Select R?

fhahnAuthorUnsubmitted

Done

Yes, I forgot to remove this use of IR getType. Updated to use TypeInfo.inferScalarType(ResultVPV) and then getScalarSizeInBits of the returned type.


AyalUnsubmitted

Done

Ah, ok, wondered if using the size of the type of UI directly would be simpler?


fhahnAuthorUnsubmitted

Done

It might be slightly simpler, but would mean this may lead to a crash further down the line, once we support recipes without underlying values/instructions (and we forget to update this line) and/or if some other transform adjusted the type. Left as is for now


if (auto *VPW = dyn_cast<VPWidenCastRecipe>(&R)) {

AyalUnsubmitted

Done

nit: suffice to check isa<> and continue to work with R instead of VPW?


fhahnAuthorUnsubmitted

Done

Done, thanks!


Instruction *UI = VPW->getUnderlyingInstr();

AyalUnsubmitted

Done

UI is aka UV. Better call it UI from the start, as it's an Instruction* rather than Value*.


fhahnAuthorUnsubmitted

Done

Renamed, thanks


AyalUnsubmitted

Done

Any order other than depth first would also do, right?


fhahnAuthorUnsubmitted

Done

Yes, I think the order doesn't matter here.


AyalUnsubmitted

Done

But a (more) expensive RPOT order is needed, to handle defs before uses?


fhahnAuthorUnsubmitted

Done

The latest version should not need RPO, as the bit width of the results does not change for any user (previously it might, due to early cast simplifications). Changed to depth first.

switch (UI->getOpcode()) {

default:

break;

AyalUnsubmitted

Done

Should be the same Ctx passed in as parameter?


fhahnAuthorUnsubmitted

Done

Yes, fixed!


case Instruction::SExt:

case Instruction::ZExt: {

if (UI->getType()->getScalarSizeInBits() > I->second) {

AyalUnsubmitted

Done

UI->getType() is aka ResTy. Already early-continued if it was equal in size to I->second. Can it be smaller in size than I->second? If so worth early-continuing above, if not worth asserting?


fhahnAuthorUnsubmitted

Done

Updated to use ResTy and replace check with assert, thanks!


if (GetType(VPW->getOperand(0))->getScalarSizeInBits() >= I->second)

AyalUnsubmitted

Done

Operand of SExt/ZExt must be smaller in size than its result, so if result is at most I->second so must its operand be?


fhahnAuthorUnsubmitted

Done

Currently ResTy > NewResTy must hold, and the operand can also be >= NewResTy, I think. There are also test cases exercising the path.

AyalUnsubmitted

Done

case Instruction::ZExt: {

- assert(ResTy->getScalarSizeInBits() > NewResSizeInBits &&

- "Nothing to shrink?");

+ assert(ResSizeInBits > NewResSizeInBits && "Nothing to shrink?");

if (GetSizeInBits(R.getOperand(0)) >= NewResSizeInBits)


fhahnAuthorUnsubmitted

Done

Updated, thanks!


AyalUnsubmitted

Done

nit: can set auto *Op = R.getOperand(0); for consistency with below.


AyalUnsubmitted

Done

nit: can assert ResSizeInBits > NewResSizeInBits above, after early-continuing if they're equal.
Actually, they shouldn't even be equal (also compares?), assuming MinBWs is up-to-date and each insn is visited and optimized once. Current code also early-continues when equal, so replacing it with an assert can be done in a separate patch.


fhahnAuthorUnsubmitted

Done

I think the assertion might not always hold, e.g. for truncate recipes.

break;

auto *C = new VPWidenCastRecipe(cast<CastInst>(UI)->getOpcode(),

AyalUnsubmitted

Done

OK, operand < ResTy due to SExt/ZExt,
and NewResTy < ResTy due to MinBW.
NewResTy == ResTy cases should arguably be excluded from MinBWs? (independent of this patch)
Now if operand < NewResTy (< ResTy) then we SExt/ZExt the operand directly to NewResTy instead, and continue - why is the "Extend result to original width" part skipped in this case?
If OTOH operand > NewResTy a Trunc is needed rather than an Extend, and provided by subsequent code which is reached by break, followed by ZExt back to ResTy.
Otherwise if operand == NewResTy, the SExt/ZExt could be dropped, but we keep it and end up generating a redundant ZExt from R to ResTy - which have same sizes? It's probably ok because the knowledge that NewResTy bits suffice is already there, but would be good to clarify/clean up.


fhahnAuthorUnsubmitted

Done

> Now if operand < NewResTy (< ResTy) then we SExt/ZExt the operand directly to NewResTy instead, and continue - why is the "Extend result to original width" part skipped in this case?

In that case, the original (wider) cast is replaced by a new (narrower) cast and there's no need to truncate.

> If OTOH operand > NewResTy a Trunc is needed rather than an Extend, and provided by subsequent code which is reached by break, followed by ZExt back to ResTy.

Yep.

> Otherwise if operand == NewResTy, the SExt/ZExt could be dropped, but we keep it and end up generating a redundant ZExt from R to ResTy - which have same sizes? It's probably ok because the knowledge that NewResTy bits suffice is already there, but would be good to clarify/clean up.

Yes we would at the moment generate redundant extend/trunc chains, which would indeed be good to clean up. I think we could fold those as follow-up.
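
A minimal standalone sketch of the three-way case split discussed in this thread for SExt/ZExt, expressed over plain bit widths. The enum `ExtNarrowing` and the function `classifyExt` are hypothetical labels paraphrasing the review comments, not code from the patch.

```cpp
#include <cassert>

// Hypothetical labels for the three outcomes described in the thread.
enum class ExtNarrowing {
  ReplaceWithNarrowerExt, // operand < new width: extend directly to new width
  TruncOperandExtendBack, // operand > new width: trunc operand, zext result
  RedundantCastChain      // operand == new width: casts left for later folding
};

inline ExtNarrowing classifyExt(unsigned OpBits, unsigned ResBits,
                                unsigned NewResBits) {
  assert(OpBits < ResBits && "SExt/ZExt must widen its operand");
  assert(NewResBits < ResBits && "Nothing to shrink?");
  if (OpBits < NewResBits)
    return ExtNarrowing::ReplaceWithNarrowerExt;
  if (OpBits > NewResBits)
    return ExtNarrowing::TruncOperandExtendBack;
  return ExtNarrowing::RedundantCastChain;
}
```

For example, an i8 operand extended to i32 with a minimal width of 16 falls into the first case, while an i16 operand with a minimal width of 8 falls into the second.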


AyalUnsubmitted

Done

Now if operand < NewResTy (< ResTy) then we SExt/ZExt the operand directly to NewResTy instead, and continue - why is the "Extend result to original width" part skipped in this case?

In that case, the original (wider) cast is replaced by a new (narrower) cast and there's no need to truncate.

Yes, the extend-to-Res is replaced by a narrower extend-to-NewRes, but w/o another extend-back-to-Res to provide the original width, might it feed a user, say, a binary operation with mismatched size operands - where the other operand can also shrink to NewRes (as guaranteed by MinBWs) but was extended-back-to-Res? I.e., should all shrunks extend-back-to-Res, or none of them? May need better test coverage.


fhahnAuthorUnsubmitted

Done

Hm, I am not sure, but if MinBWs is set to a specific bit width, wouldn't this require all users to have the same minimal bit width for the value?


AyalUnsubmitted

Done

Agreed - MinBW should specify a consistent minimal bit width for all users, and for all operands, but there seems to be some discrepancy that is confusing:

A. Instructions whose operands and return value are all of a single type (excluding condition operand of selects) are converted to operate on a narrower type by (a) shrinking their operands to the narrower type and (b) extending their result from the narrower type to their original type. Instructions that feed values to such instructions or use their values, continue to feed and use values of the original type.
A pair of such instructions where one feeds the other will have a zext-trunc pair inserted between them, which will later be folded.

B. Instructions that convert between two distinct types, continue to digest the original source type but are updated to produce values of the new destination type. Their users, when reached subsequently, need to check if any of their operands have been narrowed. But if this is the case, why bother expanding results in (b) above? OTOH, the narrowed results of conversion instructions can also be expanded (to be folded later), keeping the treatment consistent? Always expecting the new type to be strictly smaller than the current one. Perhaps conversion instructions could be skipped now and handled by subsequent folding pass - looking for trunc-trunc and sext-trunc pairs in addition to zext-trunc ones?

C. Loads are ignored - excluded from MinBWs? They could potentially be narrowed to load only the required bits, though it's unclear if a strided narrow load is better than a unit-strided wider load and trunc - as in an interleave-group(?)

D. Phis are ignored - excluded from MinBWs. Truncated header induction phi's are handled separately. Other phi's may deserve narrowing(?)
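The scheme in (A) can be sketched outside of LLVM with plain integers. This is only an illustration under assumed fixed widths (32-bit original type, 8-bit minimal width); `truncTo8`, `zextTo32` and `narrowAdd` are hypothetical helpers, not VPlan APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for trunc/zext casts between i32 and i8.
static uint8_t truncTo8(uint32_t V) { return static_cast<uint8_t>(V); }
static uint32_t zextTo32(uint8_t V) { return V; }

// (a) shrink both operands and operate in the narrow type,
// (b) extend the narrow result back to the original type.
static uint32_t narrowAdd(uint32_t A, uint32_t B) {
  uint8_t R = static_cast<uint8_t>(truncTo8(A) + truncTo8(B));
  return zextTo32(R);
}

// When one narrowed operation feeds another, the value passes through
// zext followed by trunc, which yields the narrow value unchanged.
static uint8_t zextThenTrunc(uint8_t V) { return truncTo8(zextTo32(V)); }
```

As long as users only demand the low 8 bits, `narrowAdd` agrees with the wide add on those bits, and the zext-trunc pair between two narrowed operations is an identity that a later folding step can remove.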


fhahnAuthorUnsubmitted

Done

The latest version doesn't have special treatment for casts, they remain unchanged and VPlan recipe simplification will take care of folding them if possible.


VPW->getOperand(0),

AyalUnsubmitted

Done

nit: may look better to take R's opcode than UI's, but that requires casting it to VPWidenCastRecipe, so above isa maybe worth dyn_cast after all...


fhahnAuthorUnsubmitted

Done

Updated, thanks!

fhahn: Updated, thanks!

AyalUnsubmitted

Done

Suffice to ask if (!NewResSizeInBits)?

Ayal: Suffice to ask `if (!NewResSizeInBits)`?

fhahnAuthorUnsubmitted

Done

Simplified, thanks!

fhahn: Simplified, thanks!

IntegerType::get(Ctx, I->second));

AyalUnsubmitted

Done

(Future) Thought: this is an awkward way of retrieving "the" recipe that corresponds to each member of MinBWs - look through all recipes for those having the desired "underlying" insn. Perhaps better lookup MinBWs upon construction of a recipe for an Instruction.
Or migrate the analysis that builds MinBWs to run on VPlan.


AyalUnsubmitted

Done

Thoughts about the above?

Ayal: Thoughts about the above?

fhahnAuthorUnsubmitted

Done

I think it would be best to have the analysis based on VPlan. Building MinBWs early would probably require extra work to update/invalidate it during transforms.


C->insertBefore(VPW);

AyalUnsubmitted

Done

nit: lookup.

Ayal: nit: lookup.

fhahnAuthorUnsubmitted

Done

Done, thanks!

fhahn: Done, thanks!

VPW->replaceAllUsesWith(C);

continue;

}

AyalUnsubmitted

Done

Ins? Perhaps ProcessedTrunc?

Ayal: `Ins`? Perhaps `ProcessedTrunc`?

fhahnAuthorUnsubmitted

Done

Updated, thanks!

fhahn: Updated, thanks!

}

AyalUnsubmitted

Done

Handle the simple if !ins.second /* Op already processed */ case first, potentially early-continuing?

Clearer to check if ProcessedTruncs.lookup(Op) or if ProcessedTruncs.contains(Op) and if so use ProcessedTruncs[Op], otherwise insert it?


fhahnAuthorUnsubmitted

Done

Early continue would mean duplicating the code to update the operands, so I left things as is for now, including using insert. insert means we only need to look up the insert position once, vs. 2 lookups with a separate lookup followed by `[]`. WDYT?
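The single-lookup pattern can be sketched with the standard library, assuming `std::unordered_map` as a stand-in for the map used in the patch. `TruncCache` and `getOrCreateTrunc` are hypothetical names, and integer ids stand in for the truncate recipes.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical cache from an operand name to the id of its truncate.
using TruncCache = std::unordered_map<std::string, int>;

// Returns the cached truncate id for Op, creating one on first use.
// A single insert() both probes the map and reserves the slot, so no
// separate lookup followed by operator[] is needed.
static int getOrCreateTrunc(TruncCache &Cache, const std::string &Op,
                            int &NextId) {
  auto [Iter, Inserted] = Cache.insert({Op, -1});
  if (Inserted)
    Iter->second = NextId++; // First request: create the truncate.
  return Iter->second;
}
```

Requesting the same operand twice returns the same id, so each value is truncated only once.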


AyalUnsubmitted

Not Done

OK, WDYT of something as follows:

        auto [ProcessedIter, DidNotExist] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = DidNotExist ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!DidNotExist)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          Shrunk->insertBefore(&R);
        } else {
          PH->appendRecipe(Shrunk);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }


AyalUnsubmitted

Not Done

Maybe IterIsEmpty would be a better name, to avoid double negation, as in:

        auto [ProcessedIter, IterIsEmpty] = ProcessedTruncs.insert({Op, nullptr});
        VPWidenCastRecipe *NewOp = IterIsEmpty ? new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy)
                                               : ProcessedIter->second;
        R.setOperand(Idx, NewOp);
        if (!IterIsEmpty)
          continue;
        ProcessedIter->second = NewOp;
        if (!Op->isLiveIn()) {
          NewOp->insertBefore(&R);
        } else {
          PH->appendRecipe(NewOp);
#ifndef NDEBUG
          auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());
          bool IsContained = MinBWs.contains(OpInst);
          assert((!OpInst || IsContained) &&
                 "All processed instructions should be contained in MinBWs.");
          NumProcessedRecipes += IsContained;
#endif
        }


}

AyalUnsubmitted

Done

Would be good to comment how memory and replicate cases are (not) processed.

Ayal: Would be good to comment how memory and replicate cases are (not) processed.

fhahnAuthorUnsubmitted

Done

Added a comment, thanks!

fhahn: Added a comment, thanks!

}

AyalUnsubmitted

Done

Should replicate recipes be handled next to handling widen memory recipes above?

Ayal: Should replicate recipes be handled next to handling widen memory recipes above?

fhahnAuthorUnsubmitted

Done

We still need to count them for verification

fhahn: We still need to count them for verification

AyalUnsubmitted

Done

nit: place simpler if !isLiveIn case first?

Ayal: nit: place simpler if !isLiveIn case first?

fhahnAuthorUnsubmitted

Done

Done, thanks!

fhahn: Done, thanks!

// Shrink operands by introducing truncates as needed.

for (unsigned Idx = 0; Idx != R.getNumOperands(); ++Idx) {

auto *Op = R.getOperand(Idx);

if (GetType(Op)->getScalarSizeInBits() == I->second)

AyalUnsubmitted

Done

assert Op > NewRes? What about the condition operand of select?

Ayal: assert Op > NewRes? What about the condition operand of select?

fhahnAuthorUnsubmitted

Done

Added assert, thanks!

Hmm, select would indeed be handled incorrectly, but I wasn't able to find a suitable test case. Removed VPWidenSelect for now, but will try to come up with a test case. Alternatively could leave select-handling in + assert to surface a test case, if one exists.


AyalUnsubmitted

Done

Current code seems to handle selects, and compares, as well as loads and phi's - extending only their result - although MinBWs seems to exclude them(?). So Blend and WidenMemory recipes need not be considered, neither should Replicate recipe - those are to retain their current BW (hence all should extend back to ResTy rather than shrinking all to NewResTy). Worth trying to check if all insns of MinBWs were considered somehow?


fhahnAuthorUnsubmitted

Done

Updated to also handle selects and replicate recipes. New tests should have been added a while ago.

I also added an assert checking if the number of processed instructions matches MinBWs.size().


AyalUnsubmitted

Done

Better assert than continue? Here ProcessedRecipes was already bumped, but should all MinBWs members correspond to Integer types, of distinct (smaller) size, whether live-in or not?


fhahnAuthorUnsubmitted

Done

Turned isIntegerTy into an assert but retained the size check, as there are entries where the sizes are the same (e.g. for truncs).


AyalUnsubmitted

Done

nit: ResTy >> OldResTy, ResSizeInBits >> OldResSizeInBits

Ayal: nit: `ResTy` >> `OldResTy`, `ResSizeInBits` >> `OldResSizeInBits`

fhahnAuthorUnsubmitted

Done

Renamed, thanks!

fhahn: Renamed, thanks!

continue;

AyalUnsubmitted

Done

Is it possible for MinBWs not to contain Op's live-in IR value in this case?

Ayal: Is it possible for MinBWs not to contain Op's live-in IR value in this case?

fhahnAuthorUnsubmitted

Done

Yes, MinBWs only contains instructions, but not other values like arguments. Added a clarifying assert.


AyalUnsubmitted

Done

#ifndef NDEBUG

- bool IsContained =

- MinBWs.contains(dyn_cast<Instruction>(Op->getLiveInIRValue()));

+ auto *OpInst = dyn_cast<Instruction>(Op->getLiveInIRValue());

+ bool IsContained = MinBWs.contains(OpInst);

+ assert((!OpInst || IsContained) && "...");

ProcessedRecipes += IsContained;

- assert((IsContained || !isa<Instruction>(Op->getLiveInIRValue())) &&

"All processed instructions should be contained in MinBWs.");

nit

Ayal: nit

if (auto *VPW = dyn_cast<VPWidenRecipe>(&R))

AyalUnsubmitted

Done

continue;

- auto *Shrunk = new VPWidenCastRecipe(

- Instruction::Trunc, Op, IntegerType::get(Ctx, NewResSizeInBits));

+ auto *Shrunk = new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy);

R.setOperand(Idx, Shrunk);


fhahnAuthorUnsubmitted

Done

Updated, thanks!

fhahn: Updated, thanks!

VPW->dropPoisonGeneratingFlags();

AyalUnsubmitted

Done

nit: first take care of creating and inserting Shrunk, then take care of R's flags drop and operand set?


fhahnAuthorUnsubmitted

Done

Done, thanks!

fhahn: Done, thanks!

AyalUnsubmitted

Done

assert(ResSizeInBits > NewResSizeInBits && "Nothing to shrink?"); here instead of below?

Ayal: `assert(ResSizeInBits > NewResSizeInBits && "Nothing to shrink?");` here instead of below?

fhahnAuthorUnsubmitted

Done

Done, and also removed continue

fhahn: Done, and also removed continue

auto *Shrunk =

new VPWidenCastRecipe(Instruction::Trunc, R.getOperand(Idx),

AyalUnsubmitted

Done

R.getOperand(Idx) is aka Op.

Ayal: R.getOperand(Idx) is aka Op.

fhahnAuthorUnsubmitted

Done

Done, thanks!

fhahn: Done, thanks!

AyalUnsubmitted

Done

Note that truncations of live-ins could also be inserted before R, thereby leaving the treatment of live-ins to debugging only, and leaving their LICM and commoning to a subsequent VPlan cleanup pass, along with trunc-zext foldings.


fhahnAuthorUnsubmitted

Done

Yep, for now it is simpler and results in a smaller test diff to do it directly there as it is not only LICM but also very simple CSE


IntegerType::get(Ctx, I->second));

R.setOperand(Idx, Shrunk);

Shrunk->insertBefore(&R);

AyalUnsubmitted

Done

nit: VPC >> OldExt, Opc >> OldOpc?

Ayal: nit: `VPC` >> `OldExt`, `Opc` >> `OldOpc`?

fhahnAuthorUnsubmitted

Done

This code is now gone, handled by recipe simplification.

fhahn: This code is now gone, handled by recipe simplification.

}

AyalUnsubmitted

Done

This deals only with ZExt/SExt, easier to check directly if Opcode is one or the other?

OTOH, better handle Trunc here as well? Is it handled well below?


fhahnAuthorUnsubmitted

Done

Thanks, changed to if. I don't think Trunc is handled explicitly in the latest version.

fhahn: Thanks, changed to `if`. I don't think Trunc is handled explicitly in the latest version.

AyalUnsubmitted

Not Done

Does Trunc (which can truncate to a smaller bitwidth) implicitly fall through and has its operand shrunk to the smaller bitwidth, effectively turning it into a ZExt?


AyalUnsubmitted

Done

// Extend result to original width.

- auto *Ext =

- new VPWidenCastRecipe(Instruction::ZExt, R.getVPSingleValue(), ResTy);

+ auto *Ext = new VPWidenCastRecipe(Instruction::ZExt, ResultVPV, ResTy);

ResultVPV->replaceAllUsesWith(Ext);


fhahnAuthorUnsubmitted

Done

Done, thanks!

fhahn: Done, thanks!

// Extend result to original width.

auto *Ext =

AyalUnsubmitted

Done

ResultVPV->replaceAllUsesWith(Ext);

- Ext->setOperand(0, R.getVPSingleValue());

+ Ext->setOperand(0, ResultVPV);

Ext->insertAfter(&R);


fhahnAuthorUnsubmitted

Done

Updated, thanks!

fhahn: Updated, thanks!

AyalUnsubmitted

Done

Comment is obsolete here - dealt with new type being equal to operand type, which should result in replacing the SExt/ZExt with its operand, see below.


fhahnAuthorUnsubmitted

Done

Code is gone now

fhahn: Code is gone now

new VPWidenCastRecipe(Instruction::ZExt, R.getVPSingleValue(), ResTy);

AyalUnsubmitted

Done

nit: define auto *RVPValue = R.getVPSingleValue() above?

Would be good to have a common base class for all recipes having a single value, as this amounts to a cast.


fhahnAuthorUnsubmitted

Done

> nit: define auto *RVPValue = R.getVPSingleValue() above?

Done thanks!

> Would be good to have a common base class for all recipes having a single value, as this amounts to a cast.

Yes, I think that came up in earlier patches as well.


AyalUnsubmitted

Done

// SExt/Zext is redundant - stick with its operand.
?

Ayal: `// SExt/Zext is redundant - stick with its operand.` ?

fhahnAuthorUnsubmitted

Done

this check has been moved up and is not needed any longer.

fhahn: this check has been moved up and is not needed any longer.

AyalUnsubmitted

Done

// SExt/Zext is redundant - stick with its operand.

- Instruction::CastOps Opcode = VPC->getOpcode();

+ Instruction::CastOps NewOpc = Opc;

VPValue *Op = R.getOperand(0);

Ayal: ?

fhahnAuthorUnsubmitted

Done

Code now gone.

fhahn: Code now gone.

R.getVPSingleValue()->replaceAllUsesWith(Ext);

Ext->setOperand(0, R.getVPSingleValue());

Ext->insertAfter(&R);

AyalUnsubmitted

Done

Other insertions of shrunk operands and smaller extends are placed before R; this one is placed after - and calls for make_early_inc_range, right?


fhahnAuthorUnsubmitted

Done

Yep.
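The hazard behind this exchange can be sketched outside of LLVM: inserting after the element currently being visited would make a plain forward walk visit the newly inserted element too. Stepping the iterator before mutating, which is the idea behind an early-increment range, avoids that. The sketch assumes `std::list` in place of the recipe list, and `insertNegatedAfterEach` is a hypothetical helper.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Inserts -V right after every original element V. The iterator is advanced
// before the insertion, so newly inserted elements are never visited.
static void insertNegatedAfterEach(std::list<int> &L) {
  for (auto It = L.begin(); It != L.end();) {
    auto Cur = It++;     // Step first...
    L.insert(It, -*Cur); // ...then insert between Cur and It.
  }
}
```

Without stepping first, each inserted element would itself be visited and get another element inserted after it, so the walk would never terminate.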

fhahn: Yep.

}

AyalUnsubmitted

Done

nit: C >> NewCast?

If getTypeSizeInBits(Op) == NewResSizeInBits should C be set to Op (w/o inserting it) instead of creating a redundant cast?


fhahnAuthorUnsubmitted

Done

Code gone now.

fhahn: Code gone now.

AyalUnsubmitted

Not Done

#endif

}

- R.setOperand(Idx, ProcessedIter->second);

}

// Any wrapping introduced by shrinking this operation shouldn't be

redundant - hoist above the early-continue.

Ayal: redundant - hoist above the early-continue.

fhahnAuthorUnsubmitted

Done

Fixed in the committed version, thanks!

fhahn: Fixed in the committed version, thanks!

}

AyalUnsubmitted

Done

Place assert earlier?

Ayal: Place assert earlier?

fhahnAuthorUnsubmitted

Done

Moved up, thanks!

AyalUnsubmitted

Done

auto *C = new VPWidenCastRecipe(Opcode, Op, NewResTy);

- C->insertBefore(&R);

- ResultVPV->replaceAllUsesWith(C);

+ C->insertBefore(&VPC);

+ VPC->replaceAllUsesWith(C);

continue;


fhahnAuthorUnsubmitted

Done

adjusted, thanks!

fhahn: adjusted, thanks!

AyalUnsubmitted

Done

This means the size of all operands is equal to NewResSizeInBits, can this be?

Ayal: This means the size of all operands is equal to NewResSizeInBits, can this be?

fhahnAuthorUnsubmitted

Done

There are cases where a Zext narrowed earlier is used as operand here, so the type is already adjusted.

AyalUnsubmitted

Not Done

Maybe worth a comment.

Ayal: Maybe worth a comment.

AyalUnsubmitted

Done

auto *Shrunk = new VPWidenCastRecipe(Instruction::Trunc, Op, NewResTy);

- R.setOperand(Idx, Shrunk);

Shrunk->insertBefore(&R);

+ R.setOperand(Idx, Shrunk);

}

if (auto *VPW = dyn_cast<VPRecipeWithIRFlags>(&R))

nit: keep consistent with above.

Ayal: nit: keep consistent with above.

fhahnAuthorUnsubmitted

Done

Adjusted, thanks!

fhahn: Adjusted, thanks!

AyalUnsubmitted

Done

auto *Ext = new VPWidenCastRecipe(Instruction::ZExt, ResultVPV, ResTy);

- ResultVPV->replaceAllUsesWith(Ext);

- Ext->setOperand(0, ResultVPV);

Ext->insertAfter(&R);

+ Ext->setOperand(0, ResultVPV);

+ ResultVPV->replaceAllUsesWith(Ext);

}

nit: keep consistent with above.

Ayal: nit: keep consistent with above.

fhahnAuthorUnsubmitted

Done

reordered, thanks!

fhahn: reordered, thanks!

AyalUnsubmitted

Done

removeRedundantCanonicalIVs(Plan);

removeRedundantInductionCasts(Plan);

- optimizeInductions(Plan, SE);

+ optimizeInductions(Plan, SE);

simplifyRecipes(Plan, SE.getContext());

nit: redundant move of empty line?

Ayal: nit: redundant move of empty line?

fhahnAuthorUnsubmitted

Done

changed back, thanks!

fhahn: changed back, thanks!

llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll

; RUN: opt -opaque-pointers=0 -S < %s -passes=loop-vectorize,instcombine 2>&1 | FileCheck %s		; RUN: opt -opaque-pointers=0 -S < %s -passes=loop-vectorize,instcombine 2>&1 | FileCheck %s

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
target triple = "aarch64"		target triple = "aarch64"

;; See https://llvm.org/bugs/show_bug.cgi?id=25490		;; See https://llvm.org/bugs/show_bug.cgi?id=25490
;; Due to the data structures used, the LLVM IR was not determinisic.		;; Due to the data structures used, the LLVM IR was not determinisic.
;; This test comes from the PR.		;; This test comes from the PR.

;; CHECK-LABEL: @test(		;; CHECK-LABEL: @test(
AyalUnsubmitted

Not Done

hmm, we now spot the redundant duplicate zext of WIDE_LOAD from <16 x i8> to <16 x i16>, originally both TMP4 and TMP10.

; CHECK: load <16 x i8>		; CHECK: load <16 x i8>
		; CHECK-NEXT: zext <16 x i8>
; CHECK-NEXT: getelementptr		; CHECK-NEXT: getelementptr
; CHECK-NEXT: bitcast		; CHECK-NEXT: bitcast
; CHECK-NEXT: load <16 x i8>		; CHECK-NEXT: load <16 x i8>
AyalUnsubmitted

Not Done

Spotted and removed duplicate zext of WIDE_LOAD8.

; CHECK-NEXT: zext <16 x i8>		; CHECK-NEXT: zext <16 x i8>
; CHECK-NEXT: zext <16 x i8>
define void @test(i32 %n, i8* nocapture %a, i8* nocapture %b, i8* nocapture readonly %c) {		define void @test(i32 %n, i8* nocapture %a, i8* nocapture %b, i8* nocapture readonly %c) {
entry:		entry:
%cmp.28 = icmp eq i32 %n, 0		%cmp.28 = icmp eq i32 %n, 0
br i1 %cmp.28, label %for.cond.cleanup, label %for.body.preheader		br i1 %cmp.28, label %for.cond.cleanup, label %for.body.preheader

for.body.preheader: ; preds = %entry		for.body.preheader: ; preds = %entry
br label %for.body		br label %for.body

Show All 21 Lines	for.body: ; preds = %for.body.preheader, %for.body
%mul10 = mul nuw nsw i32 %conv9, %conv		%mul10 = mul nuw nsw i32 %conv9, %conv
%shr11.27 = lshr i32 %mul10, 8		%shr11.27 = lshr i32 %mul10, 8
%conv12 = trunc i32 %shr11.27 to i8		%conv12 = trunc i32 %shr11.27 to i8
store i8 %conv12, i8* %arrayidx8, align 1		store i8 %conv12, i8* %arrayidx8, align 1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32		%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, %n		%exitcond = icmp eq i32 %lftr.wideiv, %n
br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}		}

AyalUnsubmitted

Done

Both insertelement's now use poison.

fhahnAuthorUnsubmitted

Done

I think the use of undef is a leftover that wasn't updated; it should be poison.

AyalUnsubmitted

Done

BROADCAST_SPLAT is (still) trunc'ed twice due to UF=2?

fhahnAuthorUnsubmitted

Done

The latest version avoids truncating the same value twice.

AyalUnsubmitted

Not Done

Duplicated TMP0 and TMP1 still here?

fhahnAuthorUnsubmitted

Done

They were due to redundant casts being added for Live-in values, fixed by checking in VPWidenCastRecipe::execute for now, with a FIXME to address this with explicit unrolling.

AyalUnsubmitted

Done

BROADCAST_SPLAT2 is (still) trunc'ed twice due to UF=2?

fhahnAuthorUnsubmitted

Done

The latest version avoids truncating the same value twice.

AyalUnsubmitted

Not Done

Still seeing duplicate TMP2 and TMP3?

AyalUnsubmitted

Done

This testcase stores the 2nd least significant byte of a 32b product (of two invariant values, one 16b and the other 32b) checking that computing 16b product suffices. But more optimizations should take place: the expansion of the multipliers to 32b should be eliminated (along with their truncation to 16b), and the invariant multiplication-lshr-trunc sequence should be hoisted out of the loop.

fhahnAuthorUnsubmitted

Done

still more work to do :) Arguably the invariant instructions are artificial, in the regular pipeline, no invariant instructions should remain.

AyalUnsubmitted

Not Done

Trunc & insertelement LICM'd from vec.epilog.vector.body to vec.epilog.ph.

AyalUnsubmitted

Not Done

ditto.
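
The claim in the comment above, that a 16b product suffices when only the 2nd least significant byte of a 32b product is stored, can be checked with plain integer arithmetic. This is a standalone sketch, not code from the test; `secondByteWide` and `secondByteNarrow` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Byte 1 (bits 8..15) of the full 32-bit product: mul, lshr 8, trunc to i8.
static uint8_t secondByteWide(uint32_t A, uint32_t B) {
  return static_cast<uint8_t>((A * B) >> 8);
}

// Same byte computed from a product that only keeps 16 bits.
static uint8_t secondByteNarrow(uint32_t A, uint32_t B) {
  uint32_t LoA = static_cast<uint16_t>(A);       // trunc operand to 16 bits
  uint32_t LoB = static_cast<uint16_t>(B);       // trunc operand to 16 bits
  uint16_t P = static_cast<uint16_t>(LoA * LoB); // 16-bit product (wraps)
  return static_cast<uint8_t>(P >> 8);
}
```

Bits 8..15 of the product depend only on the low 16 bits of each operand, which is why both functions agree for all inputs.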

llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll

Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[A]], i32 [[TMP0]]		; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[A]], i32 [[TMP0]]
; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[INDEX]], 2		; CHECK-NEXT: [[TMP1:%.*]] = shl i32 [[INDEX]], 2
; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr i8, ptr [[B]], i32 [[TMP1]]		; CHECK-NEXT: [[NEXT_GEP4:%.*]] = getelementptr i8, ptr [[B]], i32 [[TMP1]]
; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, ptr [[NEXT_GEP]], align 4		; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i32>, ptr [[NEXT_GEP]], align 4
; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i32> [[WIDE_VEC]], <8 x i32> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[STRIDED_VEC]], [[BROADCAST_SPLAT]]		; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[STRIDED_VEC]], [[BROADCAST_SPLAT]]
; CHECK-NEXT: store <4 x i32> [[TMP2]], ptr [[NEXT_GEP4]], align 4		; CHECK-NEXT: store <4 x i32> [[TMP2]], ptr [[NEXT_GEP4]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 996		; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 996
; CHECK-NEXT: br i1 [[TMP4]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP3]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi ptr [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[A_ADDR_09:%.*]] = phi ptr [ [[ADD_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 996, [[VECTOR_BODY]] ]		; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 996, [[VECTOR_BODY]] ]
; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END2]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[B_ADDR_07:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ], [ [[IND_END2]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[A_ADDR_09]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load i32, ptr [[A_ADDR_09]], align 4
; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, ptr [[A_ADDR_09]], i32 2		; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, ptr [[A_ADDR_09]], i32 2
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP4]], [[Y]]		; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP4]], [[Y]]
; CHECK-NEXT: store i32 [[ADD]], ptr [[B_ADDR_07]], align 4		; CHECK-NEXT: store i32 [[ADD]], ptr [[B_ADDR_07]], align 4
▲ Show 20 Lines • Show All 801 Lines • ▼ Show 20 Lines	for.body:
%inc = add nuw nsw i32 %i.07, 1		%inc = add nuw nsw i32 %i.07, 1
%exitcond = icmp eq i32 %inc, 10000		%exitcond = icmp eq i32 %inc, 10000
br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !2		br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !2
}		}

define hidden void @mult_ptr_iv(ptr noalias nocapture readonly %x, ptr noalias nocapture %z) {		define hidden void @mult_ptr_iv(ptr noalias nocapture readonly %x, ptr noalias nocapture %z) {
; CHECK-LABEL: @mult_ptr_iv(		; CHECK-LABEL: @mult_ptr_iv(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, ptr [[Z:%.*]], i32 3000		; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[Z:%.*]], i32 3000
; CHECK-NEXT: [[UGLYGEP1:%.*]] = getelementptr i8, ptr [[X:%.*]], i32 3000		; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[X:%.*]], i32 3000
; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt ptr [[UGLYGEP1]], [[Z]]		; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt ptr [[SCEVGEP1]], [[Z]]
; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[UGLYGEP]], [[X]]		; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[SCEVGEP]], [[X]]

AyalUnsubmitted

Done

(These UGLY's don't belong to this patch, but probably worth cleaning up, independent of this patch).

fhahnAuthorUnsubmitted

Done

Yep, removed.

; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]		; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[FOR_BODY:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[X]], i32 3000		; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[X]], i32 3000
; CHECK-NEXT: [[IND_END2:%.*]] = getelementptr i8, ptr [[Z]], i32 3000		; CHECK-NEXT: [[IND_END2:%.*]] = getelementptr i8, ptr [[Z]], i32 3000
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi ptr [ [[X]], [[VECTOR_PH]] ], [ [[PTR_IND:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi ptr [ [[X]], [[VECTOR_PH]] ], [ [[PTR_IND:%.*]], [[VECTOR_BODY]] ]
▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines


Diff 519651
