This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/include/llvm/Support/TypeSize.h
- llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
- llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
- llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll
- llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

Differential D84937

[SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorStores
ClosedPublic

Authored by david-arm on Jul 30 2020, 5:37 AM.


Details

Reviewers

sdesmalen
efriedma
kmclaughlin
rengolin

Commits

rG6af1677161fb: [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer…

Summary

In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.
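The summary's point about offsets can be modelled outside LLVM. The following is a minimal standalone sketch, not the patch's code. SizeSketch is a hypothetical stand-in for llvm::TypeSize, and a descending list of assumed-legal part widths stands in for FindMemType. It tracks the remaining width as a possibly scalable size and records each part's offset in known-minimum bytes, which must be multiplied by vscale at run time when the store is scalable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::TypeSize: a known-minimum size in bits,
// multiplied by the runtime vscale when Scalable is true.
struct SizeSketch {
  uint64_t MinBits;
  bool Scalable;

  // Like the operator-= added in the diff, refuse to mix fixed and scalable.
  SizeSketch &operator-=(SizeSketch RHS) {
    assert(Scalable == RHS.Scalable && "mixed scalable and fixed sizes");
    MinBits -= RHS.MinBits;
    return *this;
  }
  bool isNonZero() const { return MinBits != 0; }
};

// One emitted part store: its known-minimum width and its byte offset from the
// base pointer (multiply OffsetMinBytes by vscale when Scalable is true).
struct PartStoreSketch {
  uint64_t WidthBits;
  uint64_t OffsetMinBytes;
  bool Scalable;
};

// Covers StWidth with the widest assumed-legal part that still fits, in the
// spirit of the GenWidenVectorStores loop. LegalPartBits must be sorted from
// widest to narrowest; it is a stand-in for FindMemType.
std::vector<PartStoreSketch>
splitStoreSketch(SizeSketch StWidth, const std::vector<uint64_t> &LegalPartBits) {
  std::vector<PartStoreSketch> Parts;
  uint64_t OffsetMinBytes = 0; // running offset in known-minimum bytes
  while (StWidth.isNonZero()) {
    uint64_t PartBits = 0;
    for (uint64_t Bits : LegalPartBits) {
      if (Bits <= StWidth.MinBits) {
        PartBits = Bits;
        break;
      }
    }
    assert(PartBits != 0 && "no legal part fits the remaining width");
    if (PartBits == 0)
      break; // avoid looping forever when asserts are compiled out
    Parts.push_back({PartBits, OffsetMinBytes, StWidth.Scalable});
    StWidth -= SizeSketch{PartBits, StWidth.Scalable};
    OffsetMinBytes += PartBits / 8;
  }
  return Parts;
}
```

For <vscale x 6 x float> (192 known-minimum bits) with assumed legal parts of 128 and 64 bits, this yields a 128-bit part at offset 0 and a 64-bit part at 16 × vscale bytes. That matches the `vscale Constant:i64<16>` address increment in the isel dump quoted in the review. For <vscale x 6 x i64> with only 128-bit parts, the third part lands at 32 × vscale bytes, i.e. two SVE vector lengths, which is the `[x0, #2, mul vl]` addressing checked by the new store_i64_tuple3 test.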

Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
for the add instruction in the same way as the fixed width case does.

Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently emits a sequence of element-by-element scalar
stores, which cannot work for scalable vectors whose element count is not
known at compile time.

I've added new tests in

CodeGen/AArch64/sve-intrinsics-stores.ll
CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorStores.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Jul 30 2020, 5:37 AM

Herald added a reviewer: rengolin. Jul 30 2020, 5:37 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, psnobl, hiraditya and 2 others.

david-arm requested review of this revision. Jul 30 2020, 5:37 AM

david-arm edited the summary of this revision.

Harbormaster completed remote builds in B66376: Diff 281890. Jul 30 2020, 6:15 AM

The existing algorithm is too complex to adapt to take scalable vectors into account

Really? At first glance, I don't see any obvious reason the original algorithm wouldn't just work, assuming you treat all the offsets/sizes as scaled values.

Hi @efriedma. When I first tried fixing this issue I did try adapting the original loop, but the patch gradually grew bigger and bigger in order to support cases that I believe can never happen. The original loop conceptually allows fixed-width vector and scalar stores to be mixed together in any order. Adding scalable vector stores into the mix means we then have to support scalable vector stores, fixed-width vector stores and scalar stores occurring in any combination and any order. That even means changing the existing scalar and fixed-width stores to support polynomial offsets, since we could have mixtures of fixed and scaled offsets. In addition, I had to change all the variables such as StWidth, Offset, etc. to TypeSize, and then add new "-=" operators to the TypeSize class. So I began to think this was a much more invasive change than necessary, because in reality we can only ever use legal scalable vector stores to store out wider scalable vector values. However, if you think there is real value in adapting the original loop to allow scalable vectors, I am happy to take another look.
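To make the "polynomial offsets" concern concrete: once scalable and fixed-width parts can be interleaved, the running byte offset is no longer a single number but has the form fixed + scalable × vscale. The sketch below uses a hypothetical PolyOffsetSketch type (not an LLVM class) to show that such an offset can only be turned into a concrete value once vscale is known. In the diff that eventually landed, TypeSize's += and -= instead assert that both operands agree on scalability, so this mixed form is never built.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration, not an LLVM type: a byte offset of the form
// Fixed + Scalable * vscale, which is what a running offset becomes once
// fixed-width and scalable part stores may appear in any order.
struct PolyOffsetSketch {
  int64_t Fixed = 0;    // bytes contributed by fixed-width parts
  int64_t Scalable = 0; // known-minimum bytes contributed by scalable parts

  void addFixed(int64_t Bytes) { Fixed += Bytes; }
  void addScalable(int64_t MinBytes) { Scalable += MinBytes; }

  // A plain byte offset (as in MachinePointerInfo) can only describe this case.
  bool isPurelyFixed() const { return Scalable == 0; }

  // The concrete offset once the runtime vscale is known.
  int64_t evaluate(int64_t VScale) const { return Fixed + Scalable * VScale; }
};
```

For example, a scalable 16-byte part followed by an 8-byte fixed part gives an offset of 16 × vscale + 8, i.e. 24 bytes when vscale is 1 and 72 bytes when vscale is 4.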

Hi @efriedma, I'm now trying to see if I can adapt the original loop by changing all unsigned sizes and offsets to TypeSize, adding -= and += operators to TypeSize, and adding various asserts in the scalar stores and fixed width stores to ensure we haven't got any scaled offsets.

david-arm updated this revision to Diff 282580. Aug 3 2020, 4:37 AM

david-arm edited the summary of this revision.

It looks like this turned out pretty clean. I like the new += and -= operators.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5172–5175	TmpSt isn't a great name.

Also, can you add a test for storing something like <vscale x 6 x float>?

david-arm updated this revision to Diff 282808. Aug 4 2020, 12:05 AM

Hi @efriedma, yeah when I first started this patch I did try rewriting the original loop, but for some reason my first attempt didn't seem so clean. However, the second attempt does look a lot better!

Renamed TmpSt -> PartStore
Added tuple tests for floats.

I actually wanted <vscale x 6 x float>, not <vscale x 12 x float>. The difference being that it exercises the path where the two stores have different types. (I guess it might be a little tricky to construct one without tripping over other issues; I think you should be able to use a splat?)

Hi @efriedma, I tried adding a test as you suggested, but hit other errors. For this test:

define void @store_nxv6f32(<vscale x 6 x float>* %out) {
  %ins = insertelement <vscale x 6 x float> undef, float 1.0, i32 0
  %splat = shufflevector <vscale x 6 x float> %ins, <vscale x 6 x float> undef, <vscale x 6 x i32> zeroinitializer
  store <vscale x 6 x float> %splat, <vscale x 6 x float>* %out
  ret void
}

I hit this error:

WidenVectorResult #0: t8: nxv6f32 = splat_vector ConstantFP:f32<1.000000e+00>

Do not know how to widen the result of this operator!
UNREACHABLE executed at /home/davshe01/upstream/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2735!

Would you like me to try fixing this first in a separate patch, or as part of this patch?

I thought I fixed that in D84706; try rebasing on trunk?

Hi @efriedma, I tried rebasing and I do get a little further, but then fail at isel:

LLVM ERROR: Cannot select: t22: ch = store<(store unknown-size, align 32)> t0, t27, t19, undef:i64

t27: nxv2f32 = extract_subvector t28, Constant:i64<0>
  t28: nxv4f32 = AArch64ISD::DUP ConstantFP:f32<1.000000e+00>
    t4: f32 = ConstantFP<1.000000e+00>
  t14: i64 = Constant<0>
t19: i64 = add t2, t18
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t1: i64 = Register %0
  t18: i64 = vscale Constant:i64<16>
    t17: i64 = Constant<16>
t9: i64 = undef

In function: store_nxv6f32

I can try to fix this as part of this change if you want. Alternatively, this could be done in a parent patch that adds support for storing <vscale x 2 x float>, if you think that is worthwhile. The alignment also looks wrong for the store.

For the store itself, maybe worth doing as a parent patch? Probably a small change.

The alignment is a bug in this patch, I think. The usage of getOriginalAlign() assumes that the MachinePointerInfo has an offset. If it doesn't have one, we need to reduce the alignment.

david-arm added a parent revision: D85441: [SVE][CodeGen] Fix bug with store of unpacked FP scalable vectors. Aug 6 2020, 9:02 AM

david-arm added a parent revision: D85516: [SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors. Aug 7 2020, 4:54 AM

david-arm updated this revision to Diff 283871. Aug 7 2020, 4:59 AM

david-arm edited the summary of this revision.

Hi @efriedma, in this patch I've:

Added another parent patch that fixes extract_subvector issues, which are needed for the <vscale x 6 x float> tests (without them the tests hit isel lowering errors)
Added alignment info for the case where the MachinePointerInfo is null.

efriedma added inline comments. Aug 7 2020, 12:24 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5170	getPrefTypeAlign() isn't related to anything relevant. We need to compute the common alignment based on the original alignment, the offset of the original MachinePointerInfo, and the current offset.

efriedma added inline comments. Aug 7 2020, 12:28 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5170	Something like `commonAlignment(ST->getAlign(), Idx * ValEltWidth)`, I guess? Maybe there's some better way to compute the offset so far from the loop.

david-arm added inline comments. Aug 10 2020, 1:33 AM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5170	OK, I can try that, but I'm not sure what to do with the offset? The loop currently (in theory) permits arbitrary offsets containing any combination of scalable vectors, fixed vectors and scalar types. I could re-introduce the Offset variable that I tried to eliminate from the previous code and keep track of that around the loop? For the case of scaled (or indeed polynomial) offsets I'm not sure what I would pass to commonAlignment though as I can't simply pass in Offset.getKnownMinSize() in this case. I could do something like this perhaps: commonAlignment(ST->getOriginalAlign(), Offset.isScalable() ? <something> : Offset.getFixedSize()); where in place of <something> I could introduce a new target query function that asks the target what the alignment should be? Or do you think it's better to leave the alignment unchanged like in the original code, i.e. always pass in ST->getOriginalAlign()?

efriedma added inline comments. Aug 10 2020, 12:59 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5170	commonAlignment just computes the GCD (greatest common divisor) of the operands. Passing in Offset.getKnownMinSize() does the right thing: the actual offset has to be a multiple of the Offset.getKnownMinSize(), so the resulting alignment should be conservatively correct.
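The argument can be checked numerically. The sketch below reimplements the power-of-two GCD with the same bit trick that, as far as we know, LLVM's MinAlign (which commonAlignment builds on) uses; minAlignSketch and knownMinAlignIsConservative are hypothetical names. Because the real offset is vscale × the known-minimum offset, any power of two dividing the known-minimum offset also divides the real one, so the computed alignment never overstates the truth.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both A and B (for a power-of-two A this equals
// gcd(A, B)). Isolates the lowest set bit of A | B.
uint64_t minAlignSketch(uint64_t A, uint64_t B) {
  return (A | B) & (1 + ~(A | B));
}

// True if, for every vscale in [1, MaxVScale], a base aligned to BaseAlign plus
// the real offset KnownMinOffset * vscale is still aligned to the alignment
// computed from the known-minimum offset alone.
bool knownMinAlignIsConservative(uint64_t BaseAlign, uint64_t KnownMinOffset,
                                 uint64_t MaxVScale) {
  uint64_t Claimed = minAlignSketch(BaseAlign, KnownMinOffset);
  for (uint64_t VScale = 1; VScale <= MaxVScale; ++VScale) {
    uint64_t Actual = KnownMinOffset * VScale;
    // The base is a multiple of BaseAlign, hence of Claimed, so base + Actual
    // is aligned to Claimed exactly when Actual is a multiple of Claimed.
    if (Actual % Claimed != 0)
      return false;
  }
  return true;
}
```

For a 32-byte-aligned base and a known-minimum offset of 16 bytes, the computed alignment is 16, and 16 × vscale is a multiple of 16 for every vscale.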

david-arm updated this revision to Diff 284745. Aug 11 2020, 8:30 AM

efriedma added inline comments. Aug 11 2020, 1:58 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5168	getAlign(), not getOriginalAlign(), I think? Try writing a testcase where the store is at some offset inside a global variable, or something like that, and the difference should be clear. Also, ideally, we'd continue to use plain getOriginalAlign() for the non-scalable case. This dance with the alignment is only necessary because we can't specify a scalable offset in the MachinePointerInfo. Maybe IncrementPointer should be involved in this somehow?

Added optional ScaledOffset parameter to IncrementPointer to keep a running total of the increments.
Returned alignment for non-scalable vectors back to using ST->getOriginalAlign(). For scalable vectors we use the common alignment of ST->getAlign() and the ScaledOffset.
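The alignment scheme described in these two points can be modelled as follows. This is a hypothetical sketch, not the patch's code: OriginalAlign and BaseAlign stand in for ST->getOriginalAlign() and ST->getAlign(), the part sizes are known-minimum bytes of scalable parts, and minAlignSketch repeats the power-of-two GCD so the block is self-contained. The first part keeps the original alignment; every later part uses the common alignment of the base alignment and the running ScaledOffset, which IncrementPointer bumps after each part.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Largest power of two dividing both A and B (power-of-two GCD).
uint64_t minAlignSketch(uint64_t A, uint64_t B) {
  return (A | B) & (1 + ~(A | B));
}

// Returns the alignment chosen for each scalable part store, in order.
std::vector<uint64_t> partAlignsSketch(uint64_t OriginalAlign,
                                       uint64_t BaseAlign,
                                       const std::vector<uint64_t> &PartMinBytes) {
  std::vector<uint64_t> Aligns;
  uint64_t ScaledOffset = 0; // known-minimum bytes stored so far
  for (uint64_t Bytes : PartMinBytes) {
    Aligns.push_back(ScaledOffset == 0
                         ? OriginalAlign
                         : minAlignSketch(BaseAlign, ScaledOffset));
    ScaledOffset += Bytes; // models IncrementPointer updating *ScaledOffset
  }
  return Aligns;
}
```

Assuming a 32-byte-aligned <vscale x 6 x float> store split into 16- and 8-byte (known-minimum) parts, the second part gets alignment 16 rather than the align 32 seen in the earlier isel error. With vscale = 1 that part starts 16 bytes past a 32-byte boundary, so 32 would have overstated its alignment.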

Hi @efriedma, hopefully I've addressed your comments! I've tried to make use of IncrementPointer as you suggested to track the scaled offsets, then only do something different for the alignment for the scalable case.

LGTM

This revision is now accepted and ready to land. Aug 12 2020, 2:44 PM

Closed by commit rG6af1677161fb: [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer… (authored by david-arm). Aug 13 2020, 3:22 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG6af1677161fb: [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer….

Revision Contents

Path                                                            Size
llvm/include/llvm/Support/TypeSize.h                            14 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h                   4 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp           74 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll              72 lines
llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll    34 lines

Diff 285308

llvm/include/llvm/Support/TypeSize.h

@@ … @@ public:
   friend TypeSize operator*(const unsigned LHS, const TypeSize &RHS) {
     return { LHS * RHS.MinSize, RHS.IsScalable };
   }

   TypeSize operator/(unsigned RHS) const {
     return { MinSize / RHS, IsScalable };
   }

+  TypeSize &operator-=(TypeSize RHS) {
+    assert(IsScalable == RHS.IsScalable &&
+           "Subtraction using mixed scalable and fixed types");
+    MinSize -= RHS.MinSize;
+    return *this;
+  }
+
+  TypeSize &operator+=(TypeSize RHS) {
+    assert(IsScalable == RHS.IsScalable &&
+           "Addition using mixed scalable and fixed types");
+    MinSize += RHS.MinSize;
+    return *this;
+  }
+
   // Return the minimum size with the assumption that the size is exact.
   // Use in places where a scalable size doesn't make sense (e.g. non-vector
   // types, or vectors in backends which don't support scalable vectors).
   uint64_t getFixedSize() const {
     assert(!IsScalable && "Request for a fixed size on a scalable object");
     return MinSize;
   }

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

@@ … @@ private:
   /// For example, if Op is a v8i32 that was split into two v4i32's, then this
   /// method returns the two v4i32's, with Lo corresponding to the first 4
   /// elements of Op, and Hi to the last 4 elements.
   void GetSplitVector(SDValue Op, SDValue &Lo, SDValue &Hi);
   void SetSplitVector(SDValue Op, SDValue Lo, SDValue Hi);

   // Helper function for incrementing the pointer when splitting
   // memory operations
-  void IncrementPointer(MemSDNode *N, EVT MemVT,
-                        MachinePointerInfo &MPI, SDValue &Ptr);
+  void IncrementPointer(MemSDNode *N, EVT MemVT, MachinePointerInfo &MPI,
+                        SDValue &Ptr, uint64_t *ScaledOffset = nullptr);

   // Vector Result Splitting: <128 x ty> -> 2 x <64 x ty>.
   void SplitVectorResult(SDNode *N, unsigned ResNo);
   void SplitVecRes_BinOp(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_TernaryOp(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_UnaryOp(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_ExtendOp(SDNode *N, SDValue &Lo, SDValue &Hi);
   void SplitVecRes_InregOp(SDNode *N, SDValue &Lo, SDValue &Hi);

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

@@ … @@ #include "llvm/IR/ConstrainedOps.def"
   }

   // If Lo/Hi is null, the sub-method took care of registering results etc.
   if (Lo.getNode())
     SetSplitVector(SDValue(N, ResNo), Lo, Hi);
 }

 void DAGTypeLegalizer::IncrementPointer(MemSDNode *N, EVT MemVT,
-                                        MachinePointerInfo &MPI,
-                                        SDValue &Ptr) {
+                                        MachinePointerInfo &MPI, SDValue &Ptr,
+                                        uint64_t *ScaledOffset) {
   SDLoc DL(N);
   unsigned IncrementSize = MemVT.getSizeInBits().getKnownMinSize() / 8;

   if (MemVT.isScalableVector()) {
+    SDNodeFlags Flags;
     SDValue BytesIncrement = DAG.getVScale(
         DL, Ptr.getValueType(),
         APInt(Ptr.getValueSizeInBits().getFixedSize(), IncrementSize));
     MPI = MachinePointerInfo(N->getPointerInfo().getAddrSpace());
+    Flags.setNoUnsignedWrap(true);
+    if (ScaledOffset)
+      *ScaledOffset += IncrementSize;
     Ptr = DAG.getNode(ISD::ADD, DL, Ptr.getValueType(), Ptr, BytesIncrement);
   } else {
     MPI = N->getPointerInfo().getWithOffset(IncrementSize);
     // Increment the pointer to the other half.
     Ptr = DAG.getObjectPtrOffset(DL, Ptr, TypeSize::Fixed(IncrementSize));
   }
 }
@@ … @@ static EVT FindMemType(SelectionDAG& DAG, const TargetLowering &TLI,
   EVT WidenEltVT = WidenVT.getVectorElementType();
   const bool Scalable = WidenVT.isScalableVector();
   unsigned WidenWidth = WidenVT.getSizeInBits().getKnownMinSize();
   unsigned WidenEltWidth = WidenEltVT.getSizeInBits();
   unsigned AlignInBits = Align*8;

   // If we have one element to load/store, return it.
   EVT RetVT = WidenEltVT;
-  if (Width == WidenEltWidth)
+  if (!Scalable && Width == WidenEltWidth)
     return RetVT;

   // See if there is larger legal integer than the element type to load/store.
   unsigned VT;
   // Don't bother looking for an integer type if the vector is scalable, skip
   // to vector types.
   if (!Scalable) {
     for (VT = (unsigned)MVT::LAST_INTEGER_VALUETYPE;
@@ … @@ void DAGTypeLegalizer::GenWidenVectorStores(SmallVectorImpl<SDValue> &StChain,
   SDValue Chain = ST->getChain();
   SDValue BasePtr = ST->getBasePtr();
   MachineMemOperand::Flags MMOFlags = ST->getMemOperand()->getFlags();
   AAMDNodes AAInfo = ST->getAAInfo();
   SDValue ValOp = GetWidenedVector(ST->getValue());
   SDLoc dl(ST);

   EVT StVT = ST->getMemoryVT();
-  unsigned StWidth = StVT.getSizeInBits();
+  TypeSize StWidth = StVT.getSizeInBits();
   EVT ValVT = ValOp.getValueType();
-  unsigned ValWidth = ValVT.getSizeInBits();
+  TypeSize ValWidth = ValVT.getSizeInBits();
   EVT ValEltVT = ValVT.getVectorElementType();
-  unsigned ValEltWidth = ValEltVT.getSizeInBits();
+  unsigned ValEltWidth = ValEltVT.getSizeInBits().getFixedSize();
   assert(StVT.getVectorElementType() == ValEltVT);
+  assert(StVT.isScalableVector() == ValVT.isScalableVector() &&
+         "Mismatch between store and value types");

   int Idx = 0;          // current index to store
-  unsigned Offset = 0;  // offset from base to store
-  while (StWidth != 0) {
+  MachinePointerInfo MPI = ST->getPointerInfo();
+  uint64_t ScaledOffset = 0;
+  while (StWidth.isNonZero()) {
     // Find the largest vector type we can store with.
-    EVT NewVT = FindMemType(DAG, TLI, StWidth, ValVT);
-    unsigned NewVTWidth = NewVT.getSizeInBits();
-    unsigned Increment = NewVTWidth / 8;
+    EVT NewVT = FindMemType(DAG, TLI, StWidth.getKnownMinSize(), ValVT);
+    TypeSize NewVTWidth = NewVT.getSizeInBits();
     if (NewVT.isVector()) {
-      unsigned NumVTElts = NewVT.getVectorNumElements();
+      unsigned NumVTElts = NewVT.getVectorMinNumElements();
       do {
+        Align NewAlign = ScaledOffset == 0
+                             ? ST->getOriginalAlign()
+                             : commonAlignment(ST->getAlign(), ScaledOffset);
         SDValue EOp = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, NewVT, ValOp,
                                   DAG.getVectorIdxConstant(Idx, dl));
-        StChain.push_back(DAG.getStore(
-            Chain, dl, EOp, BasePtr, ST->getPointerInfo().getWithOffset(Offset),
-            ST->getOriginalAlign(), MMOFlags, AAInfo));
+        SDValue PartStore = DAG.getStore(Chain, dl, EOp, BasePtr, MPI, NewAlign,
+                                         MMOFlags, AAInfo);
+        StChain.push_back(PartStore);
+
         StWidth -= NewVTWidth;
-        Offset += Increment;
         Idx += NumVTElts;

-        BasePtr =
-            DAG.getObjectPtrOffset(dl, BasePtr, TypeSize::Fixed(Increment));
-      } while (StWidth != 0 && StWidth >= NewVTWidth);
+        IncrementPointer(cast<StoreSDNode>(PartStore), NewVT, MPI, BasePtr,
+                         &ScaledOffset);
+      } while (StWidth.isNonZero() && StWidth >= NewVTWidth);
     } else {
       // Cast the vector to the scalar type we can store.
-      unsigned NumElts = ValWidth / NewVTWidth;
+      unsigned NumElts = ValWidth.getFixedSize() / NewVTWidth.getFixedSize();
       EVT NewVecVT = EVT::getVectorVT(*DAG.getContext(), NewVT, NumElts);
       SDValue VecOp = DAG.getNode(ISD::BITCAST, dl, NewVecVT, ValOp);
       // Readjust index position based on new vector type.
-      Idx = Idx * ValEltWidth / NewVTWidth;
+      Idx = Idx * ValEltWidth / NewVTWidth.getFixedSize();
       do {
         SDValue EOp = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, NewVT, VecOp,
                                   DAG.getVectorIdxConstant(Idx++, dl));
-        StChain.push_back(DAG.getStore(
-            Chain, dl, EOp, BasePtr, ST->getPointerInfo().getWithOffset(Offset),
-            ST->getOriginalAlign(), MMOFlags, AAInfo));
+        SDValue PartStore =
+            DAG.getStore(Chain, dl, EOp, BasePtr, MPI, ST->getOriginalAlign(),
+                         MMOFlags, AAInfo);
+        StChain.push_back(PartStore);
+
         StWidth -= NewVTWidth;
-        Offset += Increment;
-        BasePtr =
-            DAG.getObjectPtrOffset(dl, BasePtr, TypeSize::Fixed(Increment));
-      } while (StWidth != 0 && StWidth >= NewVTWidth);
+        IncrementPointer(cast<StoreSDNode>(PartStore), NewVT, MPI, BasePtr);
+      } while (StWidth.isNonZero() && StWidth >= NewVTWidth);
       // Restore index back to be relative to the original widen element type.
-      Idx = Idx * NewVTWidth / ValEltWidth;
+      Idx = Idx * NewVTWidth.getFixedSize() / ValEltWidth;
     }
   }
 }

 void
 DAGTypeLegalizer::GenWidenVectorTruncStores(SmallVectorImpl<SDValue> &StChain,
                                             StoreSDNode *ST) {
   // For extension loads, it may not be more efficient to truncate the vector
   // and then store it. Instead, we extract each element and then store it.
   SDValue Chain = ST->getChain();
   SDValue BasePtr = ST->getBasePtr();
   MachineMemOperand::Flags MMOFlags = ST->getMemOperand()->getFlags();
   AAMDNodes AAInfo = ST->getAAInfo();
   SDValue ValOp = GetWidenedVector(ST->getValue());
   SDLoc dl(ST);

   EVT StVT = ST->getMemoryVT();
   EVT ValVT = ValOp.getValueType();

   // It must be true that the wide vector type is bigger than where we need to
   // store.
   assert(StVT.isVector() && ValOp.getValueType().isVector());
+  assert(StVT.isScalableVector() == ValOp.getValueType().isScalableVector());
   assert(StVT.bitsLT(ValOp.getValueType()));

+  if (StVT.isScalableVector())
+    report_fatal_error("Generating widen scalable vector truncating stores not "
+                       "yet supported");
+
   // For truncating stores, we can not play the tricks of chopping legal vector
   // types and bitcast it to the right type. Instead, we unroll the store.
   EVT StEltVT = StVT.getVectorElementType();
   EVT ValEltVT = ValVT.getVectorElementType();
   unsigned Increment = ValEltVT.getSizeInBits() / 8;
   unsigned NumElts = StVT.getVectorNumElements();
   SDValue EOp = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ValEltVT, ValOp,
                             DAG.getVectorIdxConstant(0, dl));
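One change in the LegalizeVectorTypes.cpp diff that is easy to miss is the new `!Scalable` guard in FindMemType. As we read it, FindMemType now receives StWidth.getKnownMinSize(), so for a scalable store the remaining known-minimum width can equal one element's width (for example, when only 64 known-minimum bits of a scalable f64 vector remain) while the real store still covers vscale elements. Returning the bare element type there would store only one of them. The hypothetical helpers below just check that arithmetic; none of this is LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Bytes written by a single scalar element store.
uint64_t scalarStoreBytes(uint64_t EltBits) { return EltBits / 8; }

// Bytes the store really has to cover: KnownMinBits, times vscale when the
// vector type is scalable.
uint64_t vectorStoreBytes(uint64_t KnownMinBits, bool Scalable,
                          uint64_t VScale) {
  return (Scalable ? KnownMinBits * VScale : KnownMinBits) / 8;
}

// Restatement of the guarded early exit: the element type alone is only a
// valid store type when one element really is the whole remaining width.
bool canUseSingleElementStore(uint64_t WidthMinBits, uint64_t EltBits,
                              bool Scalable) {
  return !Scalable && WidthMinBits == EltBits;
}
```

With vscale = 2, 64 known-minimum scalable bits are really 16 bytes, while a single f64 store writes only 8, which is why the early exit is now restricted to fixed-width types.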

llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll

	Show First 20 Lines • Show All 431 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	call void @llvm.aarch64.sve.stnt1.nxv2f64(<vscale x 2 x double> %data,			call void @llvm.aarch64.sve.stnt1.nxv2f64(<vscale x 2 x double> %data,
	<vscale x 2 x i1> %pred,			<vscale x 2 x i1> %pred,
	double* %addr)			double* %addr)
	ret void			ret void
	}			}


				; Stores (tuples)

				define void @store_i64_tuple3(<vscale x 6 x i64>* %out, <vscale x 2 x i64> %in1, <vscale x 2 x i64> %in2, <vscale x 2 x i64> %in3) {
				; CHECK-LABEL: store_i64_tuple3
				; CHECK: st1d { z2.d }, p0, [x0, #2, mul vl]
				; CHECK-NEXT: st1d { z1.d }, p0, [x0, #1, mul vl]
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				%tuple = tail call <vscale x 6 x i64> @llvm.aarch64.sve.tuple.create3.nxv6i64.nxv2i64(<vscale x 2 x i64> %in1, <vscale x 2 x i64> %in2, <vscale x 2 x i64> %in3)
				store <vscale x 6 x i64> %tuple, <vscale x 6 x i64>* %out
				ret void
				}

				define void @store_i64_tuple4(<vscale x 8 x i64>* %out, <vscale x 2 x i64> %in1, <vscale x 2 x i64> %in2, <vscale x 2 x i64> %in3, <vscale x 2 x i64> %in4) {
				; CHECK-LABEL: store_i64_tuple4
				; CHECK: st1d { z3.d }, p0, [x0, #3, mul vl]
				; CHECK-NEXT: st1d { z2.d }, p0, [x0, #2, mul vl]
				; CHECK-NEXT: st1d { z1.d }, p0, [x0, #1, mul vl]
				; CHECK-NEXT: st1d { z0.d }, p0, [x0]
				%tuple = tail call <vscale x 8 x i64> @llvm.aarch64.sve.tuple.create4.nxv8i64.nxv2i64(<vscale x 2 x i64> %in1, <vscale x 2 x i64> %in2, <vscale x 2 x i64> %in3, <vscale x 2 x i64> %in4)
				store <vscale x 8 x i64> %tuple, <vscale x 8 x i64>* %out
				ret void
				}

				define void @store_i16_tuple2(<vscale x 16 x i16>* %out, <vscale x 8 x i16> %in1, <vscale x 8 x i16> %in2) {
				; CHECK-LABEL: store_i16_tuple2
				; CHECK: st1h { z1.h }, p0, [x0, #1, mul vl]
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				%tuple = tail call <vscale x 16 x i16> @llvm.aarch64.sve.tuple.create2.nxv16i16.nxv8i16(<vscale x 8 x i16> %in1, <vscale x 8 x i16> %in2)
				store <vscale x 16 x i16> %tuple, <vscale x 16 x i16>* %out
				ret void
				}

				define void @store_i16_tuple3(<vscale x 24 x i16>* %out, <vscale x 8 x i16> %in1, <vscale x 8 x i16> %in2, <vscale x 8 x i16> %in3) {
				; CHECK-LABEL: store_i16_tuple3
				; CHECK: st1h { z2.h }, p0, [x0, #2, mul vl]
				; CHECK-NEXT: st1h { z1.h }, p0, [x0, #1, mul vl]
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				%tuple = tail call <vscale x 24 x i16> @llvm.aarch64.sve.tuple.create3.nxv24i16.nxv8i16(<vscale x 8 x i16> %in1, <vscale x 8 x i16> %in2, <vscale x 8 x i16> %in3)
				store <vscale x 24 x i16> %tuple, <vscale x 24 x i16>* %out
				ret void
				}

				define void @store_f32_tuple3(<vscale x 12 x float>* %out, <vscale x 4 x float> %in1, <vscale x 4 x float> %in2, <vscale x 4 x float> %in3) {
				; CHECK-LABEL: store_f32_tuple3
				; CHECK: st1w { z2.s }, p0, [x0, #2, mul vl]
				; CHECK-NEXT: st1w { z1.s }, p0, [x0, #1, mul vl]
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				%tuple = tail call <vscale x 12 x float> @llvm.aarch64.sve.tuple.create3.nxv12f32.nxv4f32(<vscale x 4 x float> %in1, <vscale x 4 x float> %in2, <vscale x 4 x float> %in3)
				store <vscale x 12 x float> %tuple, <vscale x 12 x float>* %out
				ret void
				}

				define void @store_f32_tuple4(<vscale x 16 x float>* %out, <vscale x 4 x float> %in1, <vscale x 4 x float> %in2, <vscale x 4 x float> %in3, <vscale x 4 x float> %in4) {
				; CHECK-LABEL: store_f32_tuple4
				; CHECK: st1w { z3.s }, p0, [x0, #3, mul vl]
				; CHECK-NEXT: st1w { z2.s }, p0, [x0, #2, mul vl]
				; CHECK-NEXT: st1w { z1.s }, p0, [x0, #1, mul vl]
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				%tuple = tail call <vscale x 16 x float> @llvm.aarch64.sve.tuple.create4.nxv16f32.nxv4f32(<vscale x 4 x float> %in1, <vscale x 4 x float> %in2, <vscale x 4 x float> %in3, <vscale x 4 x float> %in4)
				store <vscale x 16 x float> %tuple, <vscale x 16 x float>* %out
				ret void
				}

	declare void @llvm.aarch64.sve.st2.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i1>, i8*)
	declare void @llvm.aarch64.sve.st2.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i1>, i16*)
	declare void @llvm.aarch64.sve.st2.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32*)
	declare void @llvm.aarch64.sve.st2.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i1>, i64*)
	declare void @llvm.aarch64.sve.st2.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, <vscale x 8 x i1>, half*)
	declare void @llvm.aarch64.sve.st2.nxv8bf16(<vscale x 8 x bfloat>, <vscale x 8 x bfloat>, <vscale x 8 x i1>, bfloat*)
	declare void @llvm.aarch64.sve.st2.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x i1>, float*)
	declare void @llvm.aarch64.sve.st2.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x i1>, double*)
	declare void @llvm.aarch64.sve.stnt1.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i1>, i16*)
	declare void @llvm.aarch64.sve.stnt1.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i1>, i32*)
	declare void @llvm.aarch64.sve.stnt1.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i1>, i64*)
	declare void @llvm.aarch64.sve.stnt1.nxv8f16(<vscale x 8 x half>, <vscale x 8 x i1>, half*)
	declare void @llvm.aarch64.sve.stnt1.nxv8bf16(<vscale x 8 x bfloat>, <vscale x 8 x i1>, bfloat*)
	declare void @llvm.aarch64.sve.stnt1.nxv4f32(<vscale x 4 x float>, <vscale x 4 x i1>, float*)
	declare void @llvm.aarch64.sve.stnt1.nxv2f64(<vscale x 2 x double>, <vscale x 2 x i1>, double*)

				declare <vscale x 6 x i64> @llvm.aarch64.sve.tuple.create3.nxv6i64.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>)
				declare <vscale x 8 x i64> @llvm.aarch64.sve.tuple.create4.nxv8i64.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>)

				declare <vscale x 16 x i16> @llvm.aarch64.sve.tuple.create2.nxv16i16.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>)
				declare <vscale x 24 x i16> @llvm.aarch64.sve.tuple.create3.nxv24i16.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)

				declare <vscale x 12 x float> @llvm.aarch64.sve.tuple.create3.nxv12f32.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 16 x float> @llvm.aarch64.sve.tuple.create4.nxv16f32.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x float>)

	; +bf16 is required for the bfloat version.
	attributes #0 = { "target-features"="+sve,+bf16" }
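The new tests in this file store scalable types such as `<vscale x 12 x float>` and `<vscale x 6 x i64>`, which `GenWidenVectorStores` has to break into legal pieces. The C++ below is a minimal, simplified model of that splitting, not the LLVM implementation: `ScalableBytes` is a hypothetical stand-in for `llvm::TypeSize`, and `splitScalableStore` (also hypothetical) greedily picks the largest chunk from an assumed list of legal per-vscale store sizes. It illustrates why the running offset must stay scalable: every piece starts at a multiple of vscale, which is what the `#N, mul vl` immediates in the CHECK lines encode.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for llvm::TypeSize: a byte count that is
// either fixed or a known minimum multiplied by the runtime vscale.
struct ScalableBytes {
  uint64_t KnownMin;
  bool Scalable;
  // Concrete byte count once vscale is known.
  uint64_t resolve(uint64_t VScale) const {
    return Scalable ? KnownMin * VScale : KnownMin;
  }
};

// One legal store produced by the split: where it writes and how much.
struct StorePiece {
  ScalableBytes Offset;
  ScalableBytes Size;
};

// Greedily split a scalable store of TotalKnownMin bytes (per vscale) into the
// largest chunks from LegalKnownMin that still fit. LegalKnownMin is an assumed
// list of legal per-vscale store sizes, sorted largest first.
std::vector<StorePiece> splitScalableStore(uint64_t TotalKnownMin,
                                           const std::vector<uint64_t> &LegalKnownMin) {
  std::vector<StorePiece> Pieces;
  uint64_t Offset = 0;
  uint64_t Remaining = TotalKnownMin;
  while (Remaining != 0) {
    uint64_t Chunk = 0;
    for (uint64_t C : LegalKnownMin) {
      if (C <= Remaining) {
        Chunk = C;
        break;
      }
    }
    assert(Chunk != 0 && "no legal chunk fits the remaining bytes");
    if (Chunk == 0)
      break; // avoid looping forever when asserts are disabled
    // Offset and size are both multiples of vscale; treating the offset as a
    // plain fixed byte count is the kind of error the summary describes.
    Pieces.push_back({{Offset, true}, {Chunk, true}});
    Offset += Chunk;
    Remaining -= Chunk;
  }
  return Pieces;
}
```

For `<vscale x 12 x float>` (48 bytes per vscale) with the assumed legal sizes `{16, 8}`, this yields three 16-byte pieces at per-vscale offsets 0, 16 and 32, matching `#0`, `#1` and `#2, mul vl` for packed `st1w`; at vscale = 2 the third piece starts at byte 64. A 24-byte-per-vscale type such as `<vscale x 6 x float>` gives a 16-byte piece followed by an 8-byte one.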

llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: st1h { z0.s }, p0, [x0]
	; CHECK-NEXT: ret
	%ins = insertelement <vscale x 4 x half> undef, half 1.0, i32 0
	%splat = shufflevector <vscale x 4 x half> %ins, <vscale x 4 x half> undef, <vscale x 4 x i32> zeroinitializer
	store <vscale x 4 x half> %splat, <vscale x 4 x half>* %out
	ret void
	}

				; Splat stores of unusual FP scalable vector types

				define void @store_nxv6f32(<vscale x 6 x float>* %out) {
				; CHECK-LABEL: store_nxv6f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fmov z0.s, #1.00000000
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: uunpklo z0.d, z0.s
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: st1w { z0.d }, p0, [x0, #2, mul vl]
				; CHECK-NEXT: ret
				%ins = insertelement <vscale x 6 x float> undef, float 1.0, i32 0
				%splat = shufflevector <vscale x 6 x float> %ins, <vscale x 6 x float> undef, <vscale x 6 x i32> zeroinitializer
				store <vscale x 6 x float> %splat, <vscale x 6 x float>* %out
				ret void
				}

				define void @store_nxv12f16(<vscale x 12 x half>* %out) {
				; CHECK-LABEL: store_nxv12f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fmov z0.h, #1.00000000
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: st1h { z0.h }, p0, [x0]
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1h { z0.s }, p0, [x0, #2, mul vl]
				; CHECK-NEXT: ret
				%ins = insertelement <vscale x 12 x half> undef, half 1.0, i32 0
				%splat = shufflevector <vscale x 12 x half> %ins, <vscale x 12 x half> undef, <vscale x 12 x i32> zeroinitializer
				store <vscale x 12 x half> %splat, <vscale x 12 x half>* %out
				ret void
				}
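In the two splat tests above, the second store writes an unpacked vector (`st1w { z0.d }`, `st1h { z0.s }`), yet its offset is still written as `#2, mul vl`. The sketch below is a hedged illustration, not LLVM code: it assumes the Arm SVE rule that the scalar-plus-immediate `st1` immediate (range -8 to 7) is multiplied by the number of bytes the store writes to memory rather than by the full register width. `st1MulVlByteOffset` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte offset addressed by "[xN, #Imm, mul vl]" for a
// contiguous SVE st1 (scalar plus immediate). Assumes the immediate is scaled
// by the store's in-memory size: (VL / ContainerBytes) elements, each
// MemBytes wide. For packed stores (MemBytes == ContainerBytes) this is Imm * VL.
int64_t st1MulVlByteOffset(int64_t Imm, uint64_t VScale, unsigned MemBytes,
                           unsigned ContainerBytes) {
  assert(Imm >= -8 && Imm <= 7 && "signed 4-bit immediate");
  assert((ContainerBytes == 1 || ContainerBytes == 2 || ContainerBytes == 4 ||
          ContainerBytes == 8) && "SVE element containers are 8..64 bits");
  assert(MemBytes <= ContainerBytes && "element wider than its container");
  const uint64_t VLBytes = 16 * VScale;              // 128 bits per vscale
  const uint64_t NumElts = VLBytes / ContainerBytes; // lanes in the register
  const uint64_t InMemoryBytes = NumElts * MemBytes; // bytes actually written
  return Imm * static_cast<int64_t>(InMemoryBytes);
}
```

With vscale = 1, `st1h { z0.s }, p0, [x0, #2, mul vl]` writes four halves per register, so `#2` becomes 2 × 8 = 16 bytes: exactly the size of the packed `<vscale x 8 x half>` piece stored at `[x0]`. The `st1w { z0.d }` store in `store_nxv6f32` works out the same way.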

Revision Contents

Diff 285308

llvm/include/llvm/Support/TypeSize.h

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

llvm/test/CodeGen/AArch64/sve-intrinsics-stores.ll

llvm/test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
