Details

Reviewers

foad
arsenm
bjope

Commits

rG53a4adc0deb2: [AMDGPU] Fix crash with 160-bit p7's by manually defining getPointerTy

Summary

While pointers in address space 7 (128 bit rsrc + 32 bit offset)
should be rewritten out of the code before IR translation on AMDGPU,
higher-level analyses may still call MVT getPointerTy() and the like
on the target machine. Currently, since there is no MVT::i160, this
operation ends up causing crashes.

The data layout changes that caused such crashes were made in D149776.

This patch causes getPointerTy() to return the type MVT::v5i32
and getPointerMemTy() to be MVT::v8i32. These are accurate types,
but mean that we can't use vectors of address space 7 pointers during
codegen. This is mostly OK, since vectors of buffers aren't supported
in LPC anyway, but it's a noticeable limitation.

Potential alternative solutions include adjusting getPointerTy() to return
an EVT or adding MVT::i160 and MVT::i256, both of which are rather
disruptive to the rest of the compiler.
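
The lane counts behind the two types can be checked with a little arithmetic. The following is a minimal sketch, not code from the patch: `lanesOf32` and the two constants are hypothetical names, and the sizes are read off the `p7:160:256:256:32` entry of the data layout string used in this revision's IndVarSimplify test (160-bit pointer, 256-bit alignment).

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): number of 32-bit lanes needed to
// hold `bits` bits, rounding up.
constexpr unsigned lanesOf32(unsigned bits) { return (bits + 31) / 32; }

// Assumed from the data layout entry "p7:160:256:256:32".
constexpr unsigned P7SizeInBits = 160;      // 128-bit rsrc + 32-bit offset
constexpr unsigned P7StoreSizeInBits = 256; // size once padded to 256-bit alignment

// Register form: 160 / 32 = 5 lanes, matching MVT::v5i32.
static_assert(lanesOf32(P7SizeInBits) == 5, "expected 5 x i32");
// Memory form: 256 / 32 = 8 lanes, matching MVT::v8i32.
static_assert(lanesOf32(P7StoreSizeInBits) == 8, "expected 8 x i32");

// Runtime variant of the same check, usable from a test driver.
inline void checkP7LaneCounts() {
  assert(lanesOf32(P7SizeInBits) == 5);
  assert(lanesOf32(P7StoreSizeInBits) == 8);
}
```

The checks are compile-time, so the file building at all confirms the 5-lane and 8-lane figures.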

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. May 5 2023, 3:14 PM

Herald added a project: Restricted Project. May 5 2023, 3:14 PM

Herald added subscribers: kosarev, foad, kerbowa and 8 others.

krzysz00 requested review of this revision. May 5 2023, 3:14 PM

Herald added a project: Restricted Project. May 5 2023, 3:14 PM

Herald added subscribers: llvm-commits, wdng.

krzysz00 added reviewers: foad, arsenm, bjope. May 5 2023, 3:18 PM

Herald added a subscriber: StephenFan. May 5 2023, 3:18 PM

Harbormaster completed remote builds in B230341: Diff 519989. May 5 2023, 3:47 PM

Can you just use v5i32?

llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
26	Better to have a test with a non foldable branch

Move to MVT::v5i32, sacrificing the ability to handle vectors of p7 in codegen but being more correct otherwise.

The problem with using v5i32 as the MVT is that you get an inevitable crash when you getValueType() on a vector of p7's per https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/TargetLowering.h#L1520

Which ... having looked, could well be fine
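
The concern in this comment can be illustrated with a toy model. This is a minimal sketch and not LLVM code: `ToyVT`, `toyPointerTy`, `canFormVectorOfPointers`, and `toyVectorOfPointers` are hypothetical names, and treating every non-p7 pointer as a 64-bit scalar is a simplification. It assumes, as the comment describes, that a vector value type is built from the pointer's own value type as the element, so a pointer type that is itself a vector cannot serve as that element.

```cpp
#include <cassert>

// Toy stand-in for a value type: `lanes` elements of `bits` bits each.
// lanes == 1 means a scalar.
struct ToyVT {
  unsigned bits;
  unsigned lanes;
  constexpr bool isVector() const { return lanes > 1; }
};

// Toy pointer-type query: address space 7 maps to 5 x 32 bits (like v5i32);
// every other address space is simplified to a 64-bit scalar.
constexpr ToyVT toyPointerTy(unsigned addrSpace) {
  return addrSpace == 7 ? ToyVT{32, 5} : ToyVT{64, 1};
}

// A vector of pointers can only be formed when the pointer type is scalar.
constexpr bool canFormVectorOfPointers(unsigned addrSpace) {
  return !toyPointerTy(addrSpace).isVector();
}

// Models the failure mode: the assertion fires for address space 7 because
// its pointer type is already a vector.
inline ToyVT toyVectorOfPointers(unsigned addrSpace, unsigned count) {
  ToyVT elt = toyPointerTy(addrSpace);
  assert(!elt.isVector() && "element type of a vector must be scalar");
  return ToyVT{elt.bits, count};
}

static_assert(!canFormVectorOfPointers(7), "vector of p7 is rejected");
static_assert(canFormVectorOfPointers(1), "vector of scalar pointers is fine");
```

In an assertions-enabled build of this sketch, `toyVectorOfPointers(7, 2)` would abort.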

Harbormaster completed remote builds in B230661: Diff 520400. May 8 2023, 9:51 AM

It also would not be difficult to just define i160. It should be easier than ever now that MVT is generated

@arsenm Given that, from what I can tell, MVT is the type for "things some architecture somewhere needed for codegen", I was somewhat concerned that adding MVT::i160 might break legalization or something.

But I would prefer to go that route if it's something you think everyone'll be OK with

In D150002#4329912, @krzysz00 wrote:

@arsenm Given that, from what I can tell, MVT is the type for "things some architecture somewhere needed for codegen", I was somewhat concerned that adding MVT::i160 might break legalization or something.

I doubt it will break anything. Most everything looks for next-power-of-2 and wouldn't see it

I went looking and ran into, for example, TargetLoweringBase::computeRegisterProperties, which assumes the MVT integers all double in size.

We could probably special-case around it, but ... for temporarily un-breaking higher-level analyses that end up looking at TargetTransformInfo, we can probably either go for the vector type or lie.

... though, we could also hack in support for 2x5xi32 -> 10xi32 type transformations.
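
The doubling assumption and the flattening idea raised in this comment can both be sketched with plain arithmetic. This is a hypothetical model, not the logic of `TargetLoweringBase::computeRegisterProperties`: the function names, the starting width of 8 bits, and the 512-bit cap are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical model: walk integer widths by doubling from 8 bits and report
// whether `bits` is ever visited.
constexpr bool reachedByDoubling(unsigned bits, unsigned maxBits = 512) {
  for (unsigned w = 8; w <= maxBits; w *= 2)
    if (w == bits)
      return true;
  return false;
}

// Smallest power-of-two width that can hold `bits` bits.
constexpr unsigned nextPowerOf2Width(unsigned bits) {
  unsigned w = 1;
  while (w < bits)
    w *= 2;
  return w;
}

// Lane count after flattening `count` pointers of `lanesPerPointer` lanes
// each into one vector, e.g. 2 x (5 x i32) -> 10 x i32.
inline unsigned flattenedLanes(unsigned count, unsigned lanesPerPointer) {
  assert(count > 0 && lanesPerPointer > 0 && "both factors must be non-zero");
  return count * lanesPerPointer;
}

// A 160-bit integer sits between two steps of the doubling walk.
static_assert(!reachedByDoubling(160), "160 is not visited");
static_assert(reachedByDoubling(128) && reachedByDoubling(256), "neighbours are");
static_assert(nextPowerOf2Width(160) == 256, "160 rounds up to 256");
```

Under these assumptions a 160-bit width is skipped by a doubling walk and rounds up to 256 bits, while two flattened p7 pointers would need 10 lanes of 32 bits.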

piotr added a subscriber: piotr. May 10 2023, 3:57 AM

Ping to land this because there's active breakage? @foad @arsenm

arichardson added inline comments. May 11 2023, 9:05 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
982	Comment needs updating for the v5 change

Update comment for the new approach

Harbormaster completed remote builds in B231368: Diff 521349. May 11 2023, 11:13 AM

Confirmed that this fixes the issues we were observing.

In D150002#4335104, @krzysz00 wrote:

Ping to land this because there's active breakage? @foad @arsenm

Yes please land this.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
990	Nit: why do you need the getPointerSizeInBits test?

This revision is now accepted and ready to land. May 12 2023, 2:22 AM

foad added inline comments. May 12 2023, 2:23 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces-vectors.ll
4	Nit: you're not actually checking anything.

Update checks, comments, rebase

This revision was landed with ongoing or failed builds. May 12 2023, 8:58 AM

Closed by commit rG53a4adc0deb2: [AMDGPU] Fix crash with 160-bit p7's by manually defining getPointerTy (authored by krzysz00).

This revision was automatically updated to reflect the committed changes.

krzysz00 added a commit: rG53a4adc0deb2: [AMDGPU] Fix crash with 160-bit p7's by manually defining getPointerTy.

foad added inline comments. May 12 2023, 9:36 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
990	I mean why do you need it at all? Don't we know statically that the size of BUFFER_FAT_POINTER is 160 bits?

krzysz00 added inline comments. May 12 2023, 9:49 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
990	Paranoia around making sure this isn't an old data layout or the like. I might even want to make the check more precise.
990	I was concerned that, for example, auto-upgrade might not fire or that you'd otherwise end up in a position where `p7` doesn't have its usual definition and so thought to go ahead and check.

Harbormaster completed remote builds in B231616: Diff 521671. May 12 2023, 9:51 AM

In D150002#4330056, @krzysz00 wrote:

I went looking and ran into, for example, TargetLoweringBase::computeRegisterProperties, which assumes the MVT integers all double in size.

Just add it and see if anything breaks

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
990	Just drop it, there's no reason to try defending against that here

dyung mentioned this in rG97b73e35eb0f: Add 'REQUIRES: asserts' to test added in D150002 (53a4adc) because it tests for…. May 12 2023, 7:03 PM

Just a heads up that we noticed in our internal testing that llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces-vectors.ll failed in a release build without assertions because it is looking for a crash that is caused by an assertion failure. I've added that requirement to the test in 97b73e3.

Diff 521672

llvm/lib/Target/AMDGPU/SIISelLowering.h

public:
bool isFPExtFoldable(const SelectionDAG &DAG, unsigned Opcode, EVT DestVT,
EVT SrcVT) const override;

bool isFPExtFoldable(const MachineInstr &MI, unsigned Opcode, LLT DestTy,
LLT SrcTy) const override;

bool isShuffleMaskLegal(ArrayRef<int> /*Mask*/, EVT /*VT*/) const override;

		// While address space 7 should never make it to codegen, it still needs to
		// have a MVT to prevent some analyses that query this function from breaking,
		// so, to work around the lack of i160, map it to v5i32.
		MVT getPointerTy(const DataLayout &DL, unsigned AS) const override;
		MVT getPointerMemTy(const DataLayout &DL, unsigned AS) const override;

bool getTgtMemIntrinsic(IntrinsicInfo &, const CallInst &,
MachineFunction &MF,
unsigned IntrinsicID) const override;

bool getAddrModeArguments(IntrinsicInst * /*I*/,
SmallVectorImpl<Value*> &/*Ops*/,
Type *&/*AccessTy*/) const override;

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

if (!ST)
return memVTFromLoadIntrData(Ty, MaxNumLanes);

// TFE intrinsics return an aggregate type.
assert(ST->getNumContainedTypes() == 2 &&
ST->getContainedType(1)->isIntegerTy(32));
return memVTFromLoadIntrData(ST->getContainedType(0), MaxNumLanes);
}

		/// Map address space 7 to MVT::v5i32 because that's its in-memory
		/// representation. This return value is vector-typed because there is no
		/// MVT::i160 and it is not clear if one can be added. While this could
		/// cause issues during codegen, these address space 7 pointers will be
		/// rewritten away by then. Therefore, we can return MVT::v5i32 in order
		/// to allow pre-codegen passes that query TargetTransformInfo, often for cost
		/// modeling, to work.
		MVT SITargetLowering::getPointerTy(const DataLayout &DL, unsigned AS) const {
		if (AMDGPUAS::BUFFER_FAT_POINTER == AS && DL.getPointerSizeInBits(AS) == 160)
		return MVT::v5i32;
		return AMDGPUTargetLowering::getPointerTy(DL, AS);
		}
		/// Similarly, the in-memory representation of a p7 is {p8, i32}, aka
		/// v8i32 when padding is added.
		MVT SITargetLowering::getPointerMemTy(const DataLayout &DL, unsigned AS) const {
		if (AMDGPUAS::BUFFER_FAT_POINTER == AS && DL.getPointerSizeInBits(AS) == 160)
		return MVT::v8i32;
		return AMDGPUTargetLowering::getPointerMemTy(DL, AS);
		}

bool SITargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &CI,
MachineFunction &MF,
unsigned IntrID) const {
Info.flags = MachineMemOperand::MONone;
if (CI.hasMetadata(LLVMContext::MD_invariant_load))
Info.flags |= MachineMemOperand::MOInvariant;

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces-vectors.ll

This file was added.

				; RUN: not --crash llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - -stop-after=irtranslator < %s

				; Confirm that no one's gotten vectors of addrspace(7) pointers to go through the
				; IR translater incidentally.

				define <2 x ptr addrspace(7)> @no_auto_constfold_gep_vector() {
				%gep = getelementptr i8, <2 x ptr addrspace(7)> zeroinitializer, <2 x i32> <i32 123, i32 123>
				ret <2 x ptr addrspace(7)> %gep
				}

				define <2 x ptr addrspace(7)> @gep_vector_splat(<2 x ptr addrspace(7)> %ptrs, i64 %idx) {
				%gep = getelementptr i8, <2 x ptr addrspace(7)> %ptrs, i64 %idx
				ret <2 x ptr addrspace(7)> %gep
				}

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - -stop-after=irtranslator %s | FileCheck %s

				; Check that the CSEMIRBuilder doesn't fold away the getelementptr during IRTranslator
				define ptr addrspace(7) @no_auto_constfold_gep() {
				; CHECK-LABEL: name: no_auto_constfold_gep
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: [[C:%[0-9]+]]:_(p7) = G_CONSTANT i160 0
				; CHECK-NEXT: [[C1:%[0-9]+]]:_(s160) = G_CONSTANT i160 123
				; CHECK-NEXT: [[PTR_ADD:%[0-9]+]]:_(p7) = G_PTR_ADD [[C]], [[C1]](s160)
				; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32), [[UV4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[PTR_ADD]](p7)
				; CHECK-NEXT: $vgpr0 = COPY [[UV]](s32)
				; CHECK-NEXT: $vgpr1 = COPY [[UV1]](s32)
				; CHECK-NEXT: $vgpr2 = COPY [[UV2]](s32)
				; CHECK-NEXT: $vgpr3 = COPY [[UV3]](s32)
				; CHECK-NEXT: $vgpr4 = COPY [[UV4]](s32)
				; CHECK-NEXT: SI_RETURN implicit $vgpr0, implicit $vgpr1, implicit $vgpr2, implicit $vgpr3, implicit $vgpr4
				%gep = getelementptr i8, ptr addrspace(7) null, i32 123
				ret ptr addrspace(7) %gep
				}

llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
				; RUN: opt -passes=indvars -S < %s | FileCheck %s

				target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8"
				target triple = "amdgcn--amdpal"

				define void @f(ptr addrspace(7) %arg) {
				; CHECK-LABEL: define void @f
				; CHECK-SAME: (ptr addrspace(7) [[ARG:%.*]]) {
				; CHECK-NEXT: bb:
				; CHECK-NEXT: br label [[BB1:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: br i1 false, label [[BB2:%.*]], label [[BB1]]
				; CHECK: bb2:
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(7) [[ARG]], i32 8
				; CHECK-NEXT: br label [[BB3:%.*]]
				; CHECK: bb3:
				; CHECK-NEXT: [[I4:%.*]] = load i32, ptr addrspace(7) [[SCEVGEP]], align 4
				; CHECK-NEXT: br label [[BB3]]
				;
				bb:
				br label %bb1
				bb1:
				%i = getelementptr i32, ptr addrspace(7) %arg, i32 2
				br i1 false, label %bb2, label %bb1
				bb2:
				br label %bb3
				bb3:
				%i4 = load i32, ptr addrspace(7) %i, align 4
				br label %bb3
				}

				define void @f2(<2 x ptr addrspace(7)> %arg) {
				; CHECK-LABEL: define void @f2
				; CHECK-SAME: (<2 x ptr addrspace(7)> [[ARG:%.*]]) {
				; CHECK-NEXT: bb:
				; CHECK-NEXT: br label [[BB1:%.*]]
				; CHECK: bb1:
				; CHECK-NEXT: [[P:%.*]] = extractelement <2 x ptr addrspace(7)> [[ARG]], i32 0
				; CHECK-NEXT: [[I:%.*]] = getelementptr i32, ptr addrspace(7) [[P]], i32 2
				; CHECK-NEXT: br i1 false, label [[BB2:%.*]], label [[BB1]]
				; CHECK: bb2:
				; CHECK-NEXT: [[I_LCSSA:%.*]] = phi ptr addrspace(7) [ [[I]], [[BB1]] ]
				; CHECK-NEXT: br label [[BB3:%.*]]
				; CHECK: bb3:
				; CHECK-NEXT: [[I4:%.*]] = load i32, ptr addrspace(7) [[I_LCSSA]], align 4
				; CHECK-NEXT: br label [[BB3]]
				;
				bb:
				br label %bb1
				bb1:
				%p = extractelement <2 x ptr addrspace(7)> %arg, i32 0
				%i = getelementptr i32, ptr addrspace(7) %p, i32 2
				br i1 false, label %bb2, label %bb1
				bb2:
				br label %bb3
				bb3:
				%i4 = load i32, ptr addrspace(7) %i, align 4
				br label %bb3
				}

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix crash with 160-bit p7's by manually defining getPointerTy
ClosedPublic