This is an archive of the discontinued LLVM Phabricator instance.

Differential D109584

[VP] Implementing expansion pass for VP load and store.
Closed, Public

Authored by loralb on Sep 10 2021, 1:58 AM.

Details

Reviewers

bmahjour
nemanjai
simoll
frasercrmck
craig.topper
hussainjk

Commits

rGf390781cec5c: [VP] Implementing expansion pass for VP load and store.

Summary

Added function to the ExpandVectorPredication pass to handle VP loads and stores.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hussainjk created this revision. Sep 10 2021, 1:58 AM

Herald added subscribers: rogfer01, hiraditya. Sep 10 2021, 1:58 AM

hussainjk requested review of this revision. Sep 10 2021, 1:58 AM

Herald added a project: Restricted Project. Sep 10 2021, 1:58 AM

Herald added subscribers: llvm-commits, vkmr.

hussainjk edited the summary of this revision. Sep 10 2021, 2:00 AM

hussainjk added reviewers: bmahjour, nemanjai.

Harbormaster completed remote builds in B123391: Diff 371830. Sep 10 2021, 2:50 AM

Fixing errors that were introduced from rebasing.

Harbormaster completed remote builds in B123470: Diff 371956. Sep 10 2021, 11:05 AM

More fixes to the length-only scalarization, and added tests using the Power8 target for this scalarization.

Harbormaster completed remote builds in B123536: Diff 372046. Sep 10 2021, 7:33 PM

bmahjour mentioned this in D109416: getVPMemoryOpCost interface. Oct 19 2021, 8:51 AM

Adding some extra reviewers to help keep on top of this

This needs "generic" testing e.g. like those in test/CodeGen/Generic/expand-vp.ll

llvm/lib/CodeGen/ExpandVectorPredication.cpp
541	I wonder if the addition of this method should be in a follow-up patch? That is, first get the "basic" support in using `masked.*` intrinsics in and consider this an enhancement/optimization for certain targets like PowerPC?

simoll added inline comments. Jan 5 2022, 5:48 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
528	We should pick the expansion strategy based on the target's preferences. Either extend `VPLegalizationStrategy` to allow for strategy selection or add a new function to TTI that returns the expansion method (enum).
541	I agree. Better to add the default expansion strategy first, then extend TTI to allow for strategy selection and the cascading scheme in a followup.

bmahjour added inline comments. Jan 5 2022, 7:31 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
541	This scheme is meant for any target where masked load/stores are not supported in hardware and all the active lanes are packed on the left (ie EVL only). I have a hard time imagining a target that wouldn't benefit from this scheme. I think that's why Hussain added this as part of the default expansion.

simoll added inline comments. Jan 7 2022, 10:29 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
541	If there are two expansion schemes, targets should be able to choose between them through TTI. AVX.* supports masked load/store well - PPC may benefit more from the piecewise expansion scheme.

bmahjour added inline comments. Jan 7 2022, 11:07 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
541	The only other expansion scheme is a fail-safe that applies to legalization of masked load/stores in general (not just the EVL cases), so it's less optimal regardless of the architecture. I neither see the need nor strongly object to adding a TTI query for this, but if we add one I think the default should be the cascading scheme.

bmahjour added inline comments. Jan 7 2022, 11:08 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
541	(and again we are talking about targets where masked load/stores are not supported in hardware, so I guess AVX doesn't apply).

I'm still in favor of splitting up the patch into the default expansion (which can be the cascading loads) and a second one for masked.load expansion.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
485	I had overlooked this before. You are checking whether `masked.load` is supported, so my argument for selecting the expansion scheme with TTI is moot.

simoll added inline comments. Jan 11 2022, 2:10 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
492	`TTI.isLegalMaskedStore`?

simoll added inline comments. Jan 11 2022, 2:15 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
528	load/store unfolding does not work for scalable vectors this way. We should default to the memory intrinsic path for scalable types.
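To make the strategy discussion on lines 528 and 541 concrete, here is a rough sketch of the selection logic the reviewers seem to be converging on: scalable vectors always take the memory-intrinsic path, and fixed vectors take it whenever the target reports legal masked load/store. It is plain C++ rather than LLVM code; `MemExpansion`, `TargetInfo`, `VectorShape` and `chooseMemExpansion` are hypothetical names invented for illustration, and the real pass would query `TargetTransformInfo` instead.

```cpp
// Hypothetical model of the expansion choice discussed in the review.
// None of these names exist in LLVM.
enum class MemExpansion {
  MaskedIntrinsic,   // lower vp.load/vp.store to masked.load/masked.store
  UnfoldedLoadStore  // piecewise ("unfolded") load/store expansion
};

// Stand-in for the TTI legality queries mentioned in the review.
struct TargetInfo {
  bool hasLegalMaskedLoadStore;
};

// Stand-in for a vector type: lane count plus a scalable flag.
struct VectorShape {
  unsigned minLanes;
  bool scalable;
};

MemExpansion chooseMemExpansion(const TargetInfo &target,
                                const VectorShape &shape) {
  // Unfolding needs a compile-time lane count, so scalable types
  // default to the memory intrinsic path.
  if (shape.scalable)
    return MemExpansion::MaskedIntrinsic;
  // Targets with legal masked load/store use them directly.
  if (target.hasLegalMaskedLoadStore)
    return MemExpansion::MaskedIntrinsic;
  // Otherwise fall back to the unfolded scheme.
  return MemExpansion::UnfoldedLoadStore;
}
```

Under these assumptions, only a fixed-width vector on a target without masked load/store support would reach the unfolded path, which matches the case bmahjour describes.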

AaronLiu added a subscriber: AaronLiu. Jan 12 2022, 7:40 AM

craig.topper added inline comments. Jan 14 2022, 12:51 PM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
406	VPIntrinsic is a subclass of Instruction, we shouldn't need an explicit cast.
408	Don't use auto here.
418	Why not llvm_unreachable?
476	Doesn't IRBuilder's constructor also set the debug location?
479	Don't use auto.
483	Don't use auto
488	Drop else after return
495	Drop else after return
502	Doesn't IRBuilder's constructor set the debug location?
525	llvm_unreachable
530	Drop else after return and drop the curly braces for the if body
546	Shouldn't be needed
555	Use StringRef
559	llvm_unreachable
569	Capitalize
586	break should be inside the curly braces
592	Same here
605	Drop parentheses around the statement
611	Using StringRef for Prefix should eliminate the need for Twine constructor here

liaolucy added a subscriber: liaolucy. Feb 24 2022, 5:04 AM

loralb added a child revision: D120564: [VP] IR expansion pass for VP strided load and store. Feb 25 2022, 6:17 AM

hello @hussainjk @bmahjour, are you still working on this? Otherwise, I may take it over on your behalf.

In D109584#3345571, @loralb wrote:

hello @hussainjk @bmahjour, are you still working on this? Otherwise, I may take it over on your behalf.

Thanks @loralb. Please go ahead.

loralb commandeered this revision. Feb 28 2022, 8:39 AM

loralb added a reviewer: hussainjk.

Changelog:

Remove VP gather/scatter references (to be added in a follow-up patch)
Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)
Apply suggestions in comments
Add tests

loralb added inline comments. Feb 28 2022, 8:51 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
525	I changed the behaviour of the default case to reflect what was done before this patch, but I am not sure which one is the right approach: what do you think is best?

loralb added a child revision: D120664: [VP] IR expansion pass for VP gather and scatter. Feb 28 2022, 8:58 AM

Harbormaster completed remote builds in B151761: Diff 411818. Feb 28 2022, 9:32 AM

loralb mentioned this in D120564: [VP] IR expansion pass for VP strided load and store. Mar 1 2022, 3:03 AM

frasercrmck added inline comments. Mar 1 2022, 4:40 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
178	`unpredicated` seems misleading here, since we're using `masked.load` and `masked.store`: those are predicated in a sense.
416	nit, but I don't think you need these braces around the switch statements. `NewStore`/`NewLoad` are defined in their own scope.
418	I'd prefer something like `/*IsVolatile*/ false`
llvm/test/CodeGen/Generic/expand-vp-load-store.ll
16	We should be testing the `IsUnmasked` path here too.

Changelog:

Address comments

N.B.: the IsUnmasked == true path in expandPredicationInMemoryIntrinsic() is not reachable, as also shown by the tests. How should this be handled in the code? Should we still handle this case, or add something like assert(!isAllTrueMask(MaskParam)) instead of defining the IsUnmasked boolean?

loralb marked 4 inline comments as done. Mar 1 2022, 11:07 AM

Harbormaster completed remote builds in B152006: Diff 412168. Mar 1 2022, 12:15 PM

Herald added a project: Restricted Project. Mar 1 2022, 12:15 PM

In D109584#3352203, @loralb wrote:

Changelog:

Address comments

N.B.: the IsUnmasked == true path in expandPredicationInMemoryIntrinsic() is not reachable, as also shown by the tests. How should this be handled in the code? Should we still handle this case, or add something like assert(!isAllTrueMask(MaskParam)) instead of defining the IsUnmasked boolean?

Is that because you're using scalable vectors and the isAllTrueMask is expecting a ConstantVector? To my mind we should either:

use a better true-mask check (surely there's one we can reuse instead of implementing our own)
add tests for fixed vectors
use constantexpr all-ones masks in the scalable-vector tests

The last is a bit of a hack, but the other two sound reasonable. Adding tests for fixed vectors sounds like a good idea for this patch anyway, and it'd give us coverage of this code path. The first one should be done, but done in a separate patch. Just add a FIXME in the scalable-vector tests for now?
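If the cause is what the question above suggests (the check only recognises a fixed-width constant vector, so an all-true splat on a scalable type is missed), the difference can be modelled in a few lines of plain C++17. This is only a toy model: `FixedConstantMask`, `SplatMask`, `RuntimeMask` and both predicates are hypothetical stand-ins, not LLVM classes or functions.

```cpp
#include <algorithm>
#include <variant>
#include <vector>

// Toy stand-ins for the kinds of mask operand the pass may see.
struct FixedConstantMask { std::vector<bool> lanes; }; // e.g. a <4 x i1> constant
struct SplatMask { bool value; };                      // one bit broadcast to all lanes
struct RuntimeMask {};                                 // value not known at compile time
using MaskValue = std::variant<FixedConstantMask, SplatMask, RuntimeMask>;

// Models a check that only understands fixed-width constant vectors.
bool isAllTrueFixedOnly(const MaskValue &mask) {
  const auto *fixed = std::get_if<FixedConstantMask>(&mask);
  return fixed && !fixed->lanes.empty() &&
         std::all_of(fixed->lanes.begin(), fixed->lanes.end(),
                     [](bool lane) { return lane; });
}

// Models a splat-aware check: a broadcast "true" is all-true for any
// lane count, including a scalable one.
bool isAllTrueSplatAware(const MaskValue &mask) {
  if (const auto *splat = std::get_if<SplatMask>(&mask))
    return splat->value;
  return isAllTrueFixedOnly(mask);
}
```

With this model, `SplatMask{true}` is rejected by the first predicate and accepted by the second, which is the coverage gap the scalable-vector tests would run into.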

Changelog:

Improve isAllTrueMask() (for scalable vectors only)
Add and update tests

I did not manage to find any ready to use function in place of isAllTrueMask(), so I added this PatternMatch approach found in other places of the codebase. Maybe we can unify its behaviour in a later patch?

Herald added a subscriber: StephenFan. Mar 28 2022, 8:04 AM

Harbormaster completed remote builds in B156564: Diff 418582. Mar 29 2022, 12:07 PM

Changelog:

Use getSplatValue() instead of PatternMatch in isAllTrueMask()
Update tests

frasercrmck added inline comments. Mar 30 2022, 4:46 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
88	I don't know if it matters whether this is `isOneValue` or `isAllOnesValue`? In practice for `i1` masks it's the same, but somehow the latter sounds more appropriate to me.
llvm/test/CodeGen/Generic/expand-vp-load-store.ll
35	Shouldn't we see regular `load` here?
79	Same here: regular `store`?

loralb added inline comments. Mar 30 2022, 5:50 AM

llvm/test/CodeGen/Generic/expand-vp-load-store.ll
35	If I understand this correctly, since the `evl` value is unknown, using a regular `load` here means we may try to load elements that we should not, which is why we need the masked version
79	This should work like the `load` above
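The point about an unknown `evl` can be illustrated with a small lane-by-lane model. The sketch below is plain C++ and not LLVM code: `foldEVLIntoMask` and `maskedLoad` are simplified models named after the pass's helper and the masked-load intrinsic, with a fixed lane count of 4 and a `passthru` value standing in for the disabled lanes (both are assumptions made for illustration).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;
using Mask = std::array<bool, kLanes>;
using Vec = std::array<std::int32_t, kLanes>;

// Model of folding %evl into %mask: a lane stays active only if it was
// active in the original mask and its index is below the vector length.
Mask foldEVLIntoMask(const Mask &mask, std::size_t evl) {
  Mask folded{};
  for (std::size_t i = 0; i < kLanes; ++i)
    folded[i] = mask[i] && i < evl;
  return folded;
}

// Model of a masked load: inactive lanes never read memory and take
// their value from passthru instead.
Vec maskedLoad(const std::int32_t *ptr, const Mask &mask,
               const Vec &passthru) {
  Vec result = passthru;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      result[i] = ptr[i];
  return result;
}
```

With an all-true mask and `evl == 2`, the folded mask only enables lanes 0 and 1, so a two-element buffer is read safely. A regular four-lane load of the same pointer would read past the buffer, which is why the all-true mask alone does not justify a plain `load`.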

Changelog:

Use isAllOnesValue() instead of isOneValue()
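The remark that the two predicates agree for `i1` but differ in general can be checked with a toy model on fixed-width unsigned integers. The helpers below are illustrative plain C++ and are not LLVM's `Constant::isOneValue()` / `Constant::isAllOnesValue()`; `lowBits`, `isOne` and `isAllOnes` are hypothetical names, and `bits` is assumed to be between 1 and 64.

```cpp
#include <cstdint>

// Mask selecting the low `bits` bits of a 64-bit value (1 <= bits <= 64).
constexpr std::uint64_t lowBits(unsigned bits) {
  return bits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
}

// True if the `bits`-wide integer equals 1.
constexpr bool isOne(std::uint64_t value, unsigned bits) {
  return (value & lowBits(bits)) == 1;
}

// True if every bit of the `bits`-wide integer is set.
constexpr bool isAllOnes(std::uint64_t value, unsigned bits) {
  return (value & lowBits(bits)) == lowBits(bits);
}
```

For a 1-bit value the two checks coincide, since 1 is also the all-ones pattern; for an 8-bit value, 1 is "one" but not "all ones", and 0xFF is the reverse. That is consistent with the comment that either works for `i1` masks while the all-ones form states the intent more directly.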

Harbormaster completed remote builds in B156944: Diff 419124. Mar 30 2022, 5:04 PM

LGTM

This revision is now accepted and ready to land. Jun 2 2022, 2:24 AM

Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)

Is there a follow up patch for this yet?

In D109584#3553244, @bmahjour wrote:

Following the discussion in the comments, remove expandPredicationInUnfoldedLoadStore() function (to be added in a follow-up patch)

Is there a follow up patch for this yet?

hello @bmahjour, there is no follow-up patch yet and right now it is not in my short-term plan; I may be able to keep working on it later on. There wouldn't be any problem though if someone is interested in taking this over.

Changelog:

Rebase

Harbormaster completed remote builds in B175389: Diff 444639. Jul 14 2022, 9:09 AM

Closed by commit rGf390781cec5c: [VP] Implementing expansion pass for VP load and store. (authored by loralb, committed by simoll). Jul 17 2022, 11:48 PM

This revision was automatically updated to reflect the committed changes.

simoll added a commit: rGf390781cec5c: [VP] Implementing expansion pass for VP load and store..

Revision Contents

Path	Size
llvm/lib/CodeGen/ExpandVectorPredication.cpp	67 lines
llvm/test/CodeGen/Generic/expand-vp-load-store.ll	205 lines

Diff 445394

llvm/lib/CodeGen/ExpandVectorPredication.cpp

[... 9 lines not shown ...]
// targets to enable vector predication until just before codegen.		// targets to enable vector predication until just before codegen.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/ExpandVectorPredication.h"		#include "llvm/CodeGen/ExpandVectorPredication.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
[... 51 lines not shown ...]

STATISTIC(NumFoldedVL, "Number of folded vector length params");		STATISTIC(NumFoldedVL, "Number of folded vector length params");
STATISTIC(NumLoweredVPOps, "Number of folded vector predication operations");		STATISTIC(NumLoweredVPOps, "Number of folded vector predication operations");

///// Helpers {		///// Helpers {

/// \returns Whether the vector mask \p MaskVal has all lane bits set.		/// \returns Whether the vector mask \p MaskVal has all lane bits set.
static bool isAllTrueMask(Value *MaskVal) {		static bool isAllTrueMask(Value *MaskVal) {
auto *ConstVec = dyn_cast<ConstantVector>(MaskVal);		if (Value *SplattedVal = getSplatValue(MaskVal))
return ConstVec && ConstVec->isAllOnesValue();		if (auto *ConstValue = dyn_cast<Constant>(SplattedVal))
		return ConstValue->isAllOnesValue();

		return false;
}		}

/// \returns A non-excepting divisor constant for this type.		/// \returns A non-excepting divisor constant for this type.
static Constant *getSafeDivisor(Type *DivTy) {		static Constant *getSafeDivisor(Type *DivTy) {
assert(DivTy->isIntOrIntVectorTy() && "Unsupported divisor type");		assert(DivTy->isIntOrIntVectorTy() && "Unsupported divisor type");
return ConstantInt::get(DivTy, 1u, false);		return ConstantInt::get(DivTy, 1u, false);
}		}

[... 71 lines not shown ...]	struct CachingVPExpander {
Value *expandPredicationInBinaryOperator(IRBuilder<> &Builder,		Value *expandPredicationInBinaryOperator(IRBuilder<> &Builder,
VPIntrinsic &PI);		VPIntrinsic &PI);

/// \brief Lower this VP reduction to a call to an unpredicated reduction		/// \brief Lower this VP reduction to a call to an unpredicated reduction
/// intrinsic.		/// intrinsic.
Value *expandPredicationInReduction(IRBuilder<> &Builder,		Value *expandPredicationInReduction(IRBuilder<> &Builder,
VPReductionIntrinsic &PI);		VPReductionIntrinsic &PI);

		/// \brief Lower this VP memory operation to a non-VP intrinsic.
		Value *expandPredicationInMemoryIntrinsic(IRBuilder<> &Builder,
		VPIntrinsic &VPI);

/// \brief Query TTI and expand the vector predication in \p P accordingly.		/// \brief Query TTI and expand the vector predication in \p P accordingly.
Value *expandPredication(VPIntrinsic &PI);		Value *expandPredication(VPIntrinsic &PI);

/// \brief Determine how and whether the VPIntrinsic \p VPI shall be		/// \brief Determine how and whether the VPIntrinsic \p VPI shall be
/// expanded. This overrides TTI with the cl::opts listed at the top of this		/// expanded. This overrides TTI with the cl::opts listed at the top of this
/// file.		/// file.
VPLegalization getVPLegalizationStrategy(const VPIntrinsic &VPI) const;		VPLegalization getVPLegalizationStrategy(const VPIntrinsic &VPI) const;
bool UsingTTIOverrides;		bool UsingTTIOverrides;
[... 202 lines not shown ...]	case Intrinsic::vp_reduce_fmul:
Reduction = Builder.CreateFMulReduce(Start, RedOp);		Reduction = Builder.CreateFMulReduce(Start, RedOp);
break;		break;
}		}

replaceOperation(*Reduction, VPI);		replaceOperation(*Reduction, VPI);
return Reduction;		return Reduction;
}		}

		Value *
		CachingVPExpander::expandPredicationInMemoryIntrinsic(IRBuilder<> &Builder,
		VPIntrinsic &VPI) {
		assert(VPI.canIgnoreVectorLengthParam());

		Value *MaskParam = VPI.getMaskParam();
		Value *PtrParam = VPI.getMemoryPointerParam();
		Value *DataParam = VPI.getMemoryDataParam();
		bool IsUnmasked = isAllTrueMask(MaskParam);

		MaybeAlign AlignOpt = VPI.getPointerAlignment();

		Value *NewMemoryInst = nullptr;
		switch (VPI.getIntrinsicID()) {
		default:
		llvm_unreachable("Not a VP memory intrinsic");
		case Intrinsic::vp_store:
		if (IsUnmasked) {
		StoreInst *NewStore =
		Builder.CreateStore(DataParam, PtrParam, /*IsVolatile*/ false);
		if (AlignOpt.hasValue())
		NewStore->setAlignment(AlignOpt.getValue());
		NewMemoryInst = NewStore;
		} else
		NewMemoryInst = Builder.CreateMaskedStore(
		DataParam, PtrParam, AlignOpt.valueOrOne(), MaskParam);

		break;
		case Intrinsic::vp_load:
		if (IsUnmasked) {
		LoadInst *NewLoad =
		Builder.CreateLoad(VPI.getType(), PtrParam, /*IsVolatile*/ false);
		if (AlignOpt.hasValue())
		NewLoad->setAlignment(AlignOpt.getValue());
		NewMemoryInst = NewLoad;
		} else
		NewMemoryInst = Builder.CreateMaskedLoad(
		VPI.getType(), PtrParam, AlignOpt.valueOrOne(), MaskParam);

		break;
		}

		assert(NewMemoryInst);
		replaceOperation(*NewMemoryInst, VPI);
		return NewMemoryInst;
		}

void CachingVPExpander::discardEVLParameter(VPIntrinsic &VPI) {		void CachingVPExpander::discardEVLParameter(VPIntrinsic &VPI) {
LLVM_DEBUG(dbgs() << "Discard EVL parameter in " << VPI << "\n");		LLVM_DEBUG(dbgs() << "Discard EVL parameter in " << VPI << "\n");

if (VPI.canIgnoreVectorLengthParam())		if (VPI.canIgnoreVectorLengthParam())
return;		return;

Value *EVLParam = VPI.getVectorLengthParam();		Value *EVLParam = VPI.getVectorLengthParam();
if (!EVLParam)		if (!EVLParam)
[... 13 lines not shown ...]	if (StaticElemCount.isScalable()) {
MaxEVL = Builder.CreateMul(VScale, FactorConst, "scalable_size",		MaxEVL = Builder.CreateMul(VScale, FactorConst, "scalable_size",
/*NUW*/ true, /*NSW*/ false);		/*NUW*/ true, /*NSW*/ false);
} else {		} else {
MaxEVL = ConstantInt::get(Int32Ty, StaticElemCount.getFixedValue(), false);		MaxEVL = ConstantInt::get(Int32Ty, StaticElemCount.getFixedValue(), false);
}		}
VPI.setVectorLengthParam(MaxEVL);		VPI.setVectorLengthParam(MaxEVL);
}		}

Value *CachingVPExpander::foldEVLIntoMask(VPIntrinsic &VPI) {		Value *CachingVPExpander::foldEVLIntoMask(VPIntrinsic &VPI) {
LLVM_DEBUG(dbgs() << "Folding vlen for " << VPI << '\n');		LLVM_DEBUG(dbgs() << "Folding vlen for " << VPI << '\n');

IRBuilder<> Builder(&VPI);		IRBuilder<> Builder(&VPI);

// Ineffective %evl parameter and so nothing to do here.		// Ineffective %evl parameter and so nothing to do here.
if (VPI.canIgnoreVectorLengthParam())		if (VPI.canIgnoreVectorLengthParam())
return &VPI;		return &VPI;

// Only VP intrinsics can have an %evl parameter.		// Only VP intrinsics can have an %evl parameter.
Value *OldMaskParam = VPI.getMaskParam();		Value *OldMaskParam = VPI.getMaskParam();
Value *OldEVLParam = VPI.getVectorLengthParam();		Value *OldEVLParam = VPI.getVectorLengthParam();
assert(OldMaskParam && "no mask param to fold the vl param into");		assert(OldMaskParam && "no mask param to fold the vl param into");
assert(OldEVLParam && "no EVL param to fold away");		assert(OldEVLParam && "no EVL param to fold away");

LLVM_DEBUG(dbgs() << "OLD evl: " << *OldEVLParam << '\n');		LLVM_DEBUG(dbgs() << "OLD evl: " << *OldEVLParam << '\n');
LLVM_DEBUG(dbgs() << "OLD mask: " << *OldMaskParam << '\n');		LLVM_DEBUG(dbgs() << "OLD mask: " << *OldMaskParam << '\n');

// Convert the %evl predication into vector mask predication.		// Convert the %evl predication into vector mask predication.
ElementCount ElemCount = VPI.getStaticVectorLength();		ElementCount ElemCount = VPI.getStaticVectorLength();
Value *VLMask = convertEVLToMask(Builder, OldEVLParam, ElemCount);		Value *VLMask = convertEVLToMask(Builder, OldEVLParam, ElemCount);
Value *NewMaskParam = Builder.CreateAnd(VLMask, OldMaskParam);		Value *NewMaskParam = Builder.CreateAnd(VLMask, OldMaskParam);
VPI.setMaskParam(NewMaskParam);		VPI.setMaskParam(NewMaskParam);

// Drop the %evl parameter.		// Drop the %evl parameter.
discardEVLParameter(VPI);		discardEVLParameter(VPI);
assert(VPI.canIgnoreVectorLengthParam() &&		assert(VPI.canIgnoreVectorLengthParam() &&
"transformation did not render the evl param ineffective!");		"transformation did not render the evl param ineffective!");

// Reassess the modified instruction.		// Reassess the modified instruction.
return &VPI;		return &VPI;
}		}

Value *CachingVPExpander::expandPredication(VPIntrinsic &VPI) {		Value *CachingVPExpander::expandPredication(VPIntrinsic &VPI) {
LLVM_DEBUG(dbgs() << "Lowering to unpredicated op: " << VPI << '\n');		LLVM_DEBUG(dbgs() << "Lowering to unpredicated op: " << VPI << '\n');

IRBuilder<> Builder(&VPI);		IRBuilder<> Builder(&VPI);

// Try lowering to a LLVM instruction first.		// Try lowering to a LLVM instruction first.
auto OC = VPI.getFunctionalOpcode();		auto OC = VPI.getFunctionalOpcode();

if (OC && Instruction::isBinaryOp(*OC))		if (OC && Instruction::isBinaryOp(*OC))
return expandPredicationInBinaryOperator(Builder, VPI);		return expandPredicationInBinaryOperator(Builder, VPI);

if (auto *VPRI = dyn_cast<VPReductionIntrinsic>(&VPI))		if (auto *VPRI = dyn_cast<VPReductionIntrinsic>(&VPI))
return expandPredicationInReduction(Builder, *VPRI);		return expandPredicationInReduction(Builder, *VPRI);

		switch (VPI.getIntrinsicID()) {
		default:
		break;
		case Intrinsic::vp_load:
		case Intrinsic::vp_store:
		return expandPredicationInMemoryIntrinsic(Builder, VPI);
		}

return &VPI;		return &VPI;
}		}

//// } CachingVPExpander		//// } CachingVPExpander

struct TransformJob {		struct TransformJob {
VPIntrinsic *PI;		VPIntrinsic *PI;
TargetTransformInfo::VPLegalization Strategy;		TargetTransformInfo::VPLegalization Strategy;
TransformJob(VPIntrinsic *PI, TargetTransformInfo::VPLegalization InitStrat)		TransformJob(VPIntrinsic *PI, TargetTransformInfo::VPLegalization InitStrat)
: PI(PI), Strategy(InitStrat) {}		: PI(PI), Strategy(InitStrat) {}

bool isDone() const { return Strategy.shouldDoNothing(); }		bool isDone() const { return Strategy.shouldDoNothing(); }
};		};

void sanitizeStrategy(VPIntrinsic &VPI, VPLegalization &LegalizeStrat) {		void sanitizeStrategy(VPIntrinsic &VPI, VPLegalization &LegalizeStrat) {
// Operations with speculatable lanes do not strictly need predication.		// Operations with speculatable lanes do not strictly need predication.
if (maySpeculateLanes(VPI)) {		if (maySpeculateLanes(VPI)) {
// Converting a speculatable VP intrinsic means dropping %mask and %evl.		// Converting a speculatable VP intrinsic means dropping %mask and %evl.
// No need to expand %evl into the %mask only to ignore that code.		// No need to expand %evl into the %mask only to ignore that code.
if (LegalizeStrat.OpStrategy == VPLegalization::Convert)		if (LegalizeStrat.OpStrategy == VPLegalization::Convert)
LegalizeStrat.EVLParamStrategy = VPLegalization::Discard;		LegalizeStrat.EVLParamStrategy = VPLegalization::Discard;
return;		return;
}		}

// We have to preserve the predicating effect of %evl for this		// We have to preserve the predicating effect of %evl for this
// non-speculatable VP intrinsic.		// non-speculatable VP intrinsic.
// 1) Never discard %evl.		// 1) Never discard %evl.
// 2) If this VP intrinsic will be expanded to non-VP code, make sure that		// 2) If this VP intrinsic will be expanded to non-VP code, make sure that
// %evl gets folded into %mask.		// %evl gets folded into %mask.
if ((LegalizeStrat.EVLParamStrategy == VPLegalization::Discard) ||		if ((LegalizeStrat.EVLParamStrategy == VPLegalization::Discard) ||
(LegalizeStrat.OpStrategy == VPLegalization::Convert)) {		(LegalizeStrat.OpStrategy == VPLegalization::Convert)) {
LegalizeStrat.EVLParamStrategy = VPLegalization::Convert;		LegalizeStrat.EVLParamStrategy = VPLegalization::Convert;
}		}
}		}

VPLegalization		VPLegalization
CachingVPExpander::getVPLegalizationStrategy(const VPIntrinsic &VPI) const {		CachingVPExpander::getVPLegalizationStrategy(const VPIntrinsic &VPI) const {
auto VPStrat = TTI.getVPLegalizationStrategy(VPI);		auto VPStrat = TTI.getVPLegalizationStrategy(VPI);
if (LLVM_LIKELY(!UsingTTIOverrides)) {		if (LLVM_LIKELY(!UsingTTIOverrides)) {
// No overrides - we are in production.		// No overrides - we are in production.
return VPStrat;		return VPStrat;
}		}

// Overrides set - we are in testing, the following does not need to be		// Overrides set - we are in testing, the following does not need to be
// efficient.		// efficient.
VPStrat.EVLParamStrategy = parseOverrideOption(EVLTransformOverride);		VPStrat.EVLParamStrategy = parseOverrideOption(EVLTransformOverride);
VPStrat.OpStrategy = parseOverrideOption(MaskTransformOverride);		VPStrat.OpStrategy = parseOverrideOption(MaskTransformOverride);
return VPStrat;		return VPStrat;
}		}

/// \brief Expand llvm.vp.* intrinsics as requested by \p TTI.
bool CachingVPExpander::expandVectorPredication() {
  SmallVector<TransformJob, 16> Worklist;

  // Collect all VPIntrinsics that need expansion and determine their expansion
  // strategy.
craig.topper (Not Done): break should be inside the curly braces
  for (auto &I : instructions(F)) {
    auto *VPI = dyn_cast<VPIntrinsic>(&I);
    if (!VPI)
      continue;
    auto VPStrat = getVPLegalizationStrategy(*VPI);
    sanitizeStrategy(*VPI, VPStrat);
craig.topper (Not Done): Same here
    if (!VPStrat.shouldDoNothing())
      Worklist.emplace_back(VPI, VPStrat);
  }
  if (Worklist.empty())
    return false;

  // Transform all VPIntrinsics on the worklist.
  LLVM_DEBUG(dbgs() << "\n:::: Transforming " << Worklist.size()
                    << " instructions ::::\n");
  for (TransformJob Job : Worklist) {
    // Transform the EVL parameter.
    switch (Job.Strategy.EVLParamStrategy) {
    case VPLegalization::Legal:
craig.topper (Not Done): Drop parentheses around the statement
      break;
    case VPLegalization::Discard:
      discardEVLParameter(*Job.PI);
      break;
    case VPLegalization::Convert:
      if (foldEVLIntoMask(*Job.PI))
craig.topper (Not Done): Using StringRef for Prefix should eliminate the need for Twine constructor here
        ++NumFoldedVL;
      break;
    }
    Job.Strategy.EVLParamStrategy = VPLegalization::Legal;

    // Replace with a non-predicated operation.
    switch (Job.Strategy.OpStrategy) {
    case VPLegalization::Legal:
(57 lines hidden in this view)

llvm/test/CodeGen/Generic/expand-vp-load-store.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt --expandvp -S < %s | FileCheck %s
				; RUN: opt --expandvp --expandvp-override-evl-transform=Legal --expandvp-override-mask-transform=Convert -S < %s | FileCheck %s

				; Fixed vectors
				define <2 x i64> @vpload_v2i64(<2 x i64>* %ptr, <2 x i1> %m, i32 zeroext %evl) {
				; CHECK-LABEL: @vpload_v2i64(
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[EVL:%.*]], i32 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i32> [[DOTSPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> <i32 0, i32 1>, [[DOTSPLAT]]
				; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i1> [[TMP1]], [[M:%.*]]
				; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>* [[PTR:%.*]], i32 1, <2 x i1> [[TMP2]], <2 x i64> undef)
				; CHECK-NEXT: ret <2 x i64> [[TMP3]]
				;
				%load = call <2 x i64> @llvm.vp.load.v2i64.p0v2i64(<2 x i64>* %ptr, <2 x i1> %m, i32 %evl)
				ret <2 x i64> %load
				frasercrmck (Done): We should be testing the `IsUnmasked` path here too.
				}

				define <2 x i64> @vpload_v2i64_vlmax(<2 x i64>* %ptr, <2 x i1> %m) {
				; CHECK-LABEL: @vpload_v2i64_vlmax(
				; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>* [[PTR:%.*]], i32 1, <2 x i1> [[M:%.*]], <2 x i64> undef)
				; CHECK-NEXT: ret <2 x i64> [[TMP1]]
				;
				%load = call <2 x i64> @llvm.vp.load.v2i64.p0v2i64(<2 x i64>* %ptr, <2 x i1> %m, i32 2)
				ret <2 x i64> %load
				}

				define <2 x i64> @vpload_v2i64_allones_mask(<2 x i64>* %ptr, i32 zeroext %evl) {
				; CHECK-LABEL: @vpload_v2i64_allones_mask(
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[EVL:%.*]], i32 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i32> [[DOTSPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> <i32 0, i32 1>, [[DOTSPLAT]]
				; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i1> [[TMP1]], <i1 true, i1 true>
				; CHECK-NEXT: [[TMP3:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0v2i64(<2 x i64>* [[PTR:%.*]], i32 1, <2 x i1> [[TMP2]], <2 x i64> undef)
				; CHECK-NEXT: ret <2 x i64> [[TMP3]]
				frasercrmck (Not Done): Shouldn't we see a regular `load` here?
				loralb (Author, Done): If I understand this correctly, since the `evl` value is unknown, using a regular `load` here means we may try to load elements that we should not, which is why we need the masked version.
				;
				%load = call <2 x i64> @llvm.vp.load.v2i64.p0v2i64(<2 x i64>* %ptr, <2 x i1> <i1 1, i1 1>, i32 %evl)
				ret <2 x i64> %load
				}

				define <2 x i64> @vpload_v2i64_allones_mask_vlmax(<2 x i64>* %ptr) {
				; CHECK-LABEL: @vpload_v2i64_allones_mask_vlmax(
				; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[PTR:%.*]], align 16
				; CHECK-NEXT: ret <2 x i64> [[TMP1]]
				;
				%load = call <2 x i64> @llvm.vp.load.v2i64.p0v2i64(<2 x i64>* %ptr, <2 x i1> <i1 1, i1 1>, i32 2)
				ret <2 x i64> %load
				}
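The four `vpload_v2i64*` tests above differ only in whether the mask and `%evl` are known to cover every lane. As a rough model of the folded mask the CHECK lines compute (`icmp ult` of the lane index against the splatted `%evl`, then `and` with the incoming mask), here is a small sketch in plain C++. The helper names `foldEVLIntoMask` and `allLanesActive` are hypothetical and only illustrate the lane arithmetic; they are not the pass's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane I of the folded mask is active iff I < EVL (unsigned compare, like the
// `icmp ult` against the splatted %evl) and the incoming mask bit is set.
inline std::vector<bool> foldEVLIntoMask(const std::vector<bool> &Mask,
                                         std::uint32_t EVL) {
  std::vector<bool> Folded(Mask.size());
  for (std::size_t I = 0; I < Mask.size(); ++I)
    Folded[I] = (I < EVL) && Mask[I];
  return Folded;
}

// A plain, unmasked load is only equivalent when every lane stays active.
inline bool allLanesActive(const std::vector<bool> &Mask) {
  for (bool B : Mask)
    if (!B)
      return false;
  return true;
}
```

With an all-ones mask and `EVL == 1` on a two-lane vector, the second lane is inactive in this model, which matches the author's reply that an unknown `%evl` still requires the masked form. Only when the mask is all ones and `EVL` equals the lane count do all lanes stay active.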

				define void @vpstore_v2i64(<2 x i64> %val, <2 x i64>* %ptr, <2 x i1> %m, i32 zeroext %evl) {
				; CHECK-LABEL: @vpstore_v2i64(
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[EVL:%.*]], i32 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i32> [[DOTSPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> <i32 0, i32 1>, [[DOTSPLAT]]
				; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i1> [[TMP1]], [[M:%.*]]
				; CHECK-NEXT: call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> [[VAL:%.*]], <2 x i64>* [[PTR:%.*]], i32 1, <2 x i1> [[TMP2]])
				; CHECK-NEXT: ret void
				;
				call void @llvm.vp.store.v2i64.p0v2i64(<2 x i64> %val, <2 x i64>* %ptr, <2 x i1> %m, i32 %evl)
				ret void
				}

				define void @vpstore_v2i64_vlmax(<2 x i64> %val, <2 x i64>* %ptr, <2 x i1> %m) {
				; CHECK-LABEL: @vpstore_v2i64_vlmax(
				; CHECK-NEXT: call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> [[VAL:%.*]], <2 x i64>* [[PTR:%.*]], i32 1, <2 x i1> [[M:%.*]])
				; CHECK-NEXT: ret void
				;
				call void @llvm.vp.store.v2i64.p0v2i64(<2 x i64> %val, <2 x i64>* %ptr, <2 x i1> %m, i32 2)
				ret void
				}

				define void @vpstore_v2i64_allones_mask(<2 x i64> %val, <2 x i64>* %ptr, i32 zeroext %evl) {
				; CHECK-LABEL: @vpstore_v2i64_allones_mask(
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[EVL:%.*]], i32 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i32> [[DOTSPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> <i32 0, i32 1>, [[DOTSPLAT]]
				; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i1> [[TMP1]], <i1 true, i1 true>
				; CHECK-NEXT: call void @llvm.masked.store.v2i64.p0v2i64(<2 x i64> [[VAL:%.*]], <2 x i64>* [[PTR:%.*]], i32 1, <2 x i1> [[TMP2]])
				; CHECK-NEXT: ret void
				frasercrmck (Not Done): Same here: regular `store`?
				loralb (Author, Done): This should work like the `load` above.
				;
				call void @llvm.vp.store.v2i64.p0v2i64(<2 x i64> %val, <2 x i64>* %ptr, <2 x i1> <i1 1, i1 1>, i32 %evl)
				ret void
				}

				define void @vpstore_v2i64_allones_mask_vlmax(<2 x i64> %val, <2 x i64>* %ptr) {
				; CHECK-LABEL: @vpstore_v2i64_allones_mask_vlmax(
				; CHECK-NEXT: store <2 x i64> [[VAL:%.*]], <2 x i64>* [[PTR:%.*]], align 16
				; CHECK-NEXT: ret void
				;
				call void @llvm.vp.store.v2i64.p0v2i64(<2 x i64> %val, <2 x i64>* %ptr, <2 x i1> <i1 1, i1 1>, i32 2)
				ret void
				}

				; Scalable vectors
				define <vscale x 1 x i64> @vpload_nxv1i64(<vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m, i32 zeroext %evl) {
				; CHECK-LABEL: @vpload_nxv1i64(
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i32(i32 0, i32 [[EVL:%.*]])
				; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 1 x i1> [[TMP1]], [[M:%.*]]
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[SCALABLE_SIZE:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: [[TMP3:%.*]] = call <vscale x 1 x i64> @llvm.masked.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>* [[PTR:%.*]], i32 1, <vscale x 1 x i1> [[TMP2]], <vscale x 1 x i64> undef)
				; CHECK-NEXT: ret <vscale x 1 x i64> [[TMP3]]
				;
				%load = call <vscale x 1 x i64> @llvm.vp.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m, i32 %evl)
				ret <vscale x 1 x i64> %load
				}

				define <vscale x 1 x i64> @vpload_nxv1i64_vscale(<vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m) {
				; CHECK-LABEL: @vpload_nxv1i64_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[VLMAX:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 1 x i64> @llvm.masked.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>* [[PTR:%.*]], i32 1, <vscale x 1 x i1> [[M:%.*]], <vscale x 1 x i64> undef)
				; CHECK-NEXT: ret <vscale x 1 x i64> [[TMP1]]
				;
				%vscale = call i32 @llvm.vscale.i32()
				%vlmax = mul nuw i32 %vscale, 1
				%load = call <vscale x 1 x i64> @llvm.vp.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m, i32 %vlmax)
				ret <vscale x 1 x i64> %load
				}

				define <vscale x 1 x i64> @vpload_nxv1i64_allones_mask(<vscale x 1 x i64>* %ptr, i32 zeroext %evl) {
				; CHECK-LABEL: @vpload_nxv1i64_allones_mask(
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i32(i32 0, i32 [[EVL:%.*]])
				; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 1 x i1> [[TMP1]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[SCALABLE_SIZE:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: [[TMP3:%.*]] = call <vscale x 1 x i64> @llvm.masked.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>* [[PTR:%.*]], i32 1, <vscale x 1 x i1> [[TMP2]], <vscale x 1 x i64> undef)
				; CHECK-NEXT: ret <vscale x 1 x i64> [[TMP3]]
				;
				%load = call <vscale x 1 x i64> @llvm.vp.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>* %ptr, <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 %evl)
				ret <vscale x 1 x i64> %load
				}

				define <vscale x 1 x i64> @vpload_nxv1i64_allones_mask_vscale(<vscale x 1 x i64>* %ptr) {
				; CHECK-LABEL: @vpload_nxv1i64_allones_mask_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[VLMAX:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: [[TMP1:%.*]] = load <vscale x 1 x i64>, <vscale x 1 x i64>* [[PTR:%.*]], align 8
				; CHECK-NEXT: ret <vscale x 1 x i64> [[TMP1]]
				;
				%vscale = call i32 @llvm.vscale.i32()
				%vlmax = mul nuw i32 %vscale, 1
				%load = call <vscale x 1 x i64> @llvm.vp.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>* %ptr, <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 %vlmax)
				ret <vscale x 1 x i64> %load
				}

				define void @vpstore_nxv1i64(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m, i32 zeroext %evl) {
				; CHECK-LABEL: @vpstore_nxv1i64(
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i32(i32 0, i32 [[EVL:%.*]])
				; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 1 x i1> [[TMP1]], [[M:%.*]]
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[SCALABLE_SIZE:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64> [[VAL:%.*]], <vscale x 1 x i64>* [[PTR:%.*]], i32 1, <vscale x 1 x i1> [[TMP2]])
				; CHECK-NEXT: ret void
				;
				call void @llvm.vp.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m, i32 %evl)
				ret void
				}

				define void @vpstore_nxv1i64_vscale(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m, i32 zeroext %evl) {
				; CHECK-LABEL: @vpstore_nxv1i64_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[VLMAX:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64> [[VAL:%.*]], <vscale x 1 x i64>* [[PTR:%.*]], i32 1, <vscale x 1 x i1> [[M:%.*]])
				; CHECK-NEXT: ret void
				;
				%vscale = call i32 @llvm.vscale.i32()
				%vlmax = mul nuw i32 %vscale, 1
				call void @llvm.vp.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr, <vscale x 1 x i1> %m, i32 %vlmax)
				ret void
				}

				define void @vpstore_nxv1i64_allones_mask(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr, i32 zeroext %evl) {
				; CHECK-LABEL: @vpstore_nxv1i64_allones_mask(
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i32(i32 0, i32 [[EVL:%.*]])
				; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 1 x i1> [[TMP1]], shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer)
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[SCALABLE_SIZE:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64> [[VAL:%.*]], <vscale x 1 x i64>* [[PTR:%.*]], i32 1, <vscale x 1 x i1> [[TMP2]])
				; CHECK-NEXT: ret void
				;
				call void @llvm.vp.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr, <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 %evl)
				ret void
				}

				define void @vpstore_nxv1i64_allones_mask_vscale(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr) {
				; CHECK-LABEL: @vpstore_nxv1i64_allones_mask_vscale(
				; CHECK-NEXT: [[VSCALE:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: [[VLMAX:%.*]] = mul nuw i32 [[VSCALE]], 1
				; CHECK-NEXT: store <vscale x 1 x i64> [[VAL:%.*]], <vscale x 1 x i64>* [[PTR:%.*]], align 8
				; CHECK-NEXT: ret void
				;
				%vscale = call i32 @llvm.vscale.i32()
				%vlmax = mul nuw i32 %vscale, 1
				call void @llvm.vp.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64> %val, <vscale x 1 x i64>* %ptr, <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i32 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 %vlmax)
				ret void
				}

				declare i32 @llvm.vscale.i32()

				declare <2 x i64> @llvm.vp.load.v2i64.p0v2i64(<2 x i64>*, <2 x i1>, i32)
				declare void @llvm.vp.store.v2i64.p0v2i64(<2 x i64>, <2 x i64>*, <2 x i1>, i32)

				declare <vscale x 1 x i64> @llvm.vp.load.nxv1i64.p0nxv1i64(<vscale x 1 x i64>*, <vscale x 1 x i1>, i32)
				declare void @llvm.vp.store.nxv1i64.p0nxv1i64(<vscale x 1 x i64>, <vscale x 1 x i64>*, <vscale x 1 x i1>, i32)
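
Across the fixed and scalable tests above, the CHECK lines show three output shapes, selected by whether the mask is all ones and whether the EVL argument is the full vector length (the constant `2` for `<2 x i64>`, or `vscale * 1` for `<vscale x 1 x i64>`). The sketch below only tabulates what the tests expect; the `Expansion` enum and `expectedExpansion` function are hypothetical names and do not describe how the pass reaches its decision.

```cpp
// Hypothetical names for the three shapes seen in the CHECK lines.
enum class Expansion {
  PlainMemOp,         // ordinary `load` / `store`
  MaskedOriginalMask, // llvm.masked.* with the incoming mask unchanged
  MaskedFoldedMask    // llvm.masked.* with %evl folded into the mask
};

// Summarizes the test matrix: an EVL that is not known to be the full vector
// length always forces a folded mask, even when the incoming mask is all ones.
constexpr Expansion expectedExpansion(bool MaskIsAllOnes, bool EVLIsVLMax) {
  return !EVLIsVLMax
             ? Expansion::MaskedFoldedMask
             : (MaskIsAllOnes ? Expansion::PlainMemOp
                              : Expansion::MaskedOriginalMask);
}

// The `*_allones_mask_vlmax` / `*_allones_mask_vscale` tests.
static_assert(expectedExpansion(true, true) == Expansion::PlainMemOp,
              "all-ones mask with full EVL expects a plain memory op");
```

The `*_allones_mask` tests correspond to `expectedExpansion(true, false)`, which is why they still check for `llvm.masked.load` and `llvm.masked.store` rather than a plain memory operation.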

Revision Contents

Diff 445394

llvm/lib/CodeGen/ExpandVectorPredication.cpp

llvm/test/CodeGen/Generic/expand-vp-load-store.ll