This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] Use UnknownSize for masked load/store MMO size
ClosedPublic

Authored by dmgreen on Nov 15 2021, 5:37 AM.

Details

Summary

A masked load or store accesses a number of bytes from a memory location that is not generally known at compile time. They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector).

This makes sure that the MMO is given an unknown size to represent this, which is less accurate than "may load/store up to 16 bytes", but less incorrect than "will load/store 16 bytes".
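
Roughly speaking, the change is at the point where the DAG builder creates the MachineMemOperand for the masked operation. The sketch below is illustrative (PtrOperand, Alignment and AAInfo stand in for whatever is in scope at the call site), not the literal patch:

// Previously the MMO claimed the whole vector width was accessed:
//   ... MachineMemOperand::MOStore, VT.getStoreSize(), Alignment, AAInfo ...
// With this change, the size is reported as unknown, since a masked store
// may touch anywhere from zero bytes up to the full vector width.
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
    MachinePointerInfo(PtrOperand), MachineMemOperand::MOStore,
    MemoryLocation::UnknownSize, Alignment, AAInfo);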

Diff Detail

Event Timeline

dmgreen created this revision. Nov 15 2021, 5:37 AM
dmgreen requested review of this revision. Nov 15 2021, 5:37 AM
Herald added a project: Restricted Project. Nov 15 2021, 5:37 AM

That would be good! Unfortunately the size is really stored as an LLT, which means we shouldn't be passing MemoryLocation::UnknownSize, but we can't use a LocationSize either.
https://github.com/llvm/llvm-project/blob/95102b7dc3c1b5b3f1b688221d9aa28cb1e17974/llvm/include/llvm/CodeGen/MachineFunction.h#L937
This should be creating an invalid LLT; I'll change that. The LLT gets used a fair amount in the GlobalISel lowering, it seems.

dmgreen updated this revision to Diff 387257. Nov 15 2021, 7:36 AM

Create an invalid LLT using LLT().
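
For reference, a default-constructed LLT is the "invalid" value that now stands for an unknown memory size (a small illustrative snippet, not part of the patch):

LLT Unknown;                     // LLT() is invalid, i.e. size unknown
assert(!Unknown.isValid());
LLT Full = LLT::scalar(128);     // by contrast, a known 16-byte access
assert(Full.isValid() && Full.getSizeInBytes() == 16);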

They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector).

Does this mean any operation that's a mayload/maystore needs to have the size of the operand set to "unknown"? I suspect we need to do significantly more work than just this patch if we want memory operands to work this way (e.g. type legalization, if-conversion). What happens if we just say that the size of a memory location is supposed to represent an upper bound, at least for now?

we shouldn't be passing MemoryLocation::UnknownSize

If that's true, maybe we need to assert this? Apparently we are actually passing in MemoryLocation::UnknownSize in some places.

Does this mean any operation that's a mayload/maystore needs to have the size of the operand set to "unknown"? I suspect we need to do significantly more work than just this patch if we want memory operands to work this way (e.g. type legalization, if-conversion). What happens if we just say that the size of a memory location is supposed to represent an upper bound, at least for now?

Why would it be any operation, not just masked loads/stores? The problem is that in this case the underlying object only had 10 dereferenceable chars, whereas the entire vector width is 16 bytes. I don't think that would apply to any normal load/store - it would already be UB at the llvm-ir level.

It's the equivalent of this code, but there is unfortunately no current way of adding MemoryLocations directly; it goes via an LLT size. Maybe we do want to change that, but I didn't want to try to do it here. It looks quite involved.
https://github.com/llvm/llvm-project/blob/913d78c40c37c9c3428285d868ce454b058e40f3/llvm/lib/Analysis/MemoryLocation.cpp#L165
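
(For comparison, the IR-level handling linked above builds the MemoryLocation for a masked store with an upper-bound size rather than a precise one. This is paraphrased from memory, so treat the exact form as approximate:)

// Inside MemoryLocation::getForArgument, for the pointer argument of
// llvm.masked.store: the size is an upper bound, not a precise value.
return MemoryLocation(
    Arg,
    LocationSize::upperBound(
        DL.getTypeStoreSize(II->getArgOperand(0)->getType())),
    AATags);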

The exact thing going wrong here is that this code in the scheduler is adding chain dependencies:
https://github.com/llvm/llvm-project/blob/913d78c40c37c9c3428285d868ce454b058e40f3/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp#L942
which performs this AA check, which returns false (I think from BasicAA):
https://github.com/llvm/llvm-project/blob/913d78c40c37c9c3428285d868ce454b058e40f3/llvm/lib/CodeGen/ScheduleDAGInstrs.cpp#L543
So there are no scheduling dependencies between the instructions and we end up with them in the wrong order. This started happening after we moved the VPT block pass later, so predicated MVE instructions are no longer in bundles during the post-RA scheduler.
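
To illustrate the shape of the failing query (names here are made up for the example): with a precise 16-byte size on a 10-byte object, BasicAA is entitled to answer "no alias", so no ordering edge is added between the two stores.

// Illustrative only - what the scheduler effectively asks BasicAA when the
// masked store's MMO claims a precise 16-byte access to a 10-byte alloca:
bool NoAlias = AA->isNoAlias(
    MemoryLocation(MaskedStorePtr, LocationSize::precise(16)),
    MemoryLocation(OtherStorePtr, LocationSize::precise(1)));
// NoAlias comes back true, so no chain dependency is added and the
// instructions are free to be reordered.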

we shouldn't be passing MemoryLocation::UnknownSize

If that's true, maybe we need to assert this? Apparently we are actually passing in MemoryLocation::UnknownSize in some places.

Sure, that sounds good, I can take a look. It might be simpler to make sure the relevant methods work sensibly with MemoryLocation::UnknownSize.

Does this mean any operation that's a mayload/maystore needs to have the size of the operand set to "unknown"? I suspect we need to do significantly more work than just this patch if we want memory operands to work this way (e.g. type legalization, if-conversion). What happens if we just say that the size of a memory location is supposed to represent an upper bound, at least for now?

Why would it be any operation, not just masked loads/stores? The problem is that in this case the underlying object only had 10 dereferenceable chars, whereas the entire vector width is 16 bytes. I don't think that would apply to any normal load/store - it would already be UB at the llvm-ir level.

It's basically "masked" load/store operations. At the IR level, those are rare, sure. At the machine code level, there are operations other than llvm.masked.load that end up masked, though. For example, the ARM instruction "strne". Not sure how many places can end up with instructions like that off the top of my head.

The type legalization for llvm.masked.load also has the same issue as SelectionDAGBuilder.

It's basically "masked" load/store operations. At the IR level, those are rare, sure. At the machine code level, there are operations other than llvm.masked.load that end up masked, though. For example, the ARM instruction "strne". Not sure how many places can end up with instructions like that off the top of my head.

I don't believe that is how it works though, not exactly. In this case the backend is constructing a query of the form "does (%1, 16) alias (%arrayidx4, 1)?". And because the underlying object of %arrayidx4 is only 10 bytes large, the answer is no, as the alternative would be UB. But we are using the aliasing information at the IR level, not what the MIR has become. If the MIR has been altered to the point that the llvm-ir level aliasing info no longer applies then yes, that would be invalid, but I haven't seen that happen anywhere. A predicated store (strne) still uses the llvm-ir level aliasing info, which will be valid if the original store was in an if block (or however it was predicated). And so long as that aliasing query remains valid at the IR level, the size of the predicated store's memory location should be OK.

At least - I don't have an example of that going wrong. If you do know of a way to make it act incorrectly, let me know.

The type legalization for llvm.masked.load also has the same issue as SelectionDAGBuilder.

What do you mean by "type legalization" here exactly? I don't think global-isel handles masked loads/stores yet, and most of the illegal masked operation lowering is performed pre-isel. I wasn't sure what other type legalization you were referring to.

dmgreen updated this revision to Diff 388158. Nov 18 2021, 4:51 AM

Updated getMachineMemOperand overload to handle -1 sizes correctly.
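
In other words (an illustrative sketch of the mapping, not the exact code): a size of -1 / MemoryLocation::UnknownSize coming into the uint64_t-based getMachineMemOperand overload should be translated into an invalid LLT rather than into a gigantic scalar type.

// Hypothetical helper showing the intended mapping; the real change is in
// the MachineFunction::getMachineMemOperand overload itself.
static LLT memTypeForSize(uint64_t Size) {
  if (Size == MemoryLocation::UnknownSize)
    return LLT();                                       // invalid LLT: size unknown
  return LLT::scalar(8 * static_cast<unsigned>(Size));  // known byte size
}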

I believe Eli was referring to functions like DAGTypeLegalizer::SplitVecRes_MLOAD and DAGTypeLegalizer::SplitVecOp_MSTORE.

I believe Eli was referring to functions like DAGTypeLegalizer::SplitVecRes_MLOAD and DAGTypeLegalizer::SplitVecOp_MSTORE.

I see. Because it's recreating an MMO with the new width of the vector, not reusing the existing MMO. Thanks.

dmgreen updated this revision to Diff 388913. Nov 22 2021, 7:32 AM
dmgreen added a reviewer: kparzysz.

Update DAGTypeLegalizer::SplitVecRes_MLOAD, DAGTypeLegalizer::SplitVecOp_MSTORE and HexagonTargetLowering::SplitHvxMemOp, which now shows some differences in other tests. The other uses of getMaskedLoad/Store looked OK from what I could see.
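
The same idea applies in the splitting code (a sketch with illustrative names - Alignment, AAInfo and Ranges stand for whatever the surrounding function already has in hand): each half of a split masked load/store keeps an unknown-size MMO rather than claiming its full half-vector store size.

// MLD here is the MaskedLoadSDNode being split into lo/hi halves.
MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
    MLD->getPointerInfo(), MachineMemOperand::MOLoad,
    MemoryLocation::UnknownSize, Alignment, AAInfo, Ranges);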

This revision is now accepted and ready to land. Nov 22 2021, 12:57 PM
This revision was landed with ongoing or failed builds. Nov 23 2021, 1:48 AM
This revision was automatically updated to reflect the committed changes.
yubing added a subscriber: yubing. Nov 25 2021, 12:01 AM

Hi @dmgreen, with this patch the llc command crashes. Would you take a look?
llc -mcpu=core-avx2 main.ll

define void @main() unnamed_addr #0 {
entry:
  %P.i150.i.i = alloca [3 x [3 x double]], align 16
  %0 = bitcast [3 x [3 x double]]* %P.i150.i.i to <8 x double>*
  call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> zeroinitializer, <8 x double>* %0, i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>)
  ret void
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn writeonly
declare void @llvm.masked.store.v8f64.p0v8f64(<8 x double>, <8 x double>*, i32 immarg, <8 x i1>)
dmgreen added a comment (edited). Nov 25 2021, 1:54 AM

Hi @dmgreen, with this patch the llc command crashes. Would you take a look?
llc -mcpu=core-avx2 main.ll

define void @main() unnamed_addr #0 {
entry:
  %P.i150.i.i = alloca [3 x [3 x double]], align 16
  %0 = bitcast [3 x [3 x double]]* %P.i150.i.i to <8 x double>*
  call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> zeroinitializer, <8 x double>* %0, i32 8, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>)
  ret void
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn writeonly
declare void @llvm.masked.store.v8f64.p0v8f64(<8 x double>, <8 x double>*, i32 immarg, <8 x i1>)

Thanks. It seems that the masked store is turned back into an (unmasked) store, still with an unknown size. The assert seems superfluous, but I will also try to fix the size when we promote the masked store to a plain store. Thanks for the simple reproducer.
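
For what it's worth, the direction of that follow-up would presumably look something like the sketch below (hypothetical, names illustrative): when the mask allows the masked store to be replaced by a plain store, rebuild the memory operand with the real store size instead of carrying over the unknown one.

// MST is the MaskedStoreSDNode being replaced by a plain store.
MachineMemOperand *OldMMO = MST->getMemOperand();
MachineMemOperand *NewMMO = DAG.getMachineFunction().getMachineMemOperand(
    OldMMO, /*Offset=*/0, MST->getMemoryVT().getStoreSize());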