Details

Reviewers

arsenm
t-tye
kzhuravl

Commits

rGa126a13bb3eb: AMDGPU : Widen extending scalar loads to 32-bits.
rL309178: AMDGPU : Widen extending scalar loads to 32-bits.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision. Jul 7 2017, 2:40 PM

Herald added subscribers: tpr, dstuttard, yaxunl, nhaehnle. Jul 7 2017, 2:40 PM

arsenm added a subscriber: llvm-commits. Jul 10 2017, 9:05 AM

Needs tests

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
448 ↗	(On Diff #105700)	There's no point to having this since you never set it to anything else
453 ↗	(On Diff #105700)	Ideally this would handle vectors like <2 x i8>
454 ↗	(On Diff #105700)	The address space check should be first
458 ↗	(On Diff #105700)	The builder already has a getInt32Ty
460 ↗	(On Diff #105700)	I.getPointerOperand
468 ↗	(On Diff #105700)	This should also skip volatile loads

Addressed review comments. Adding an "isDereferenceableAndAlignedPointer" check appears to be too strict: it prevents the expected code transformations.
Modified the related LIT tests.

wdng marked an inline comment as done. Jul 17 2017, 10:09 AM

This needs a dedicated test

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
452 ↗	(On Diff #106894)	This should be able to handle vectors. This should also use the DataLayout so it works for pointers
456 ↗	(On Diff #106894)	getI32Ty is the wrong thing to use here

wdng added inline comments. Jul 17 2017, 2:51 PM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
452 ↗	(On Diff #106894)	This one (VT && VT->getBitWidth() < 32) is able to handle vectors with bitwidth < 32 or scalar (!VT). Are you saying to use DataLayout to handle the pointer dereferenceable issue?
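The point under discussion here is that a sub-32-bit vector such as <2 x i8> can be treated as an integer of the same total bit width: load 32 bits, truncate to N bits, then reinterpret the result as the original type. The following is a minimal standalone sketch of that equivalence, assuming little-endian memory. The names loadWideLE, truncToBits, V2I8 and bitcastToV2I8 are hypothetical helpers for illustration only and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Models a little-endian 32-bit load from four consecutive bytes.
// (Hypothetical helper; stands in for the widened "load i32".)
std::uint32_t loadWideLE(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Models "trunc i32 to iN": keeps only the low `bits` bits.
std::uint32_t truncToBits(std::uint32_t wide, unsigned bits) {
  assert(bits >= 1 && bits < 32 && "only sub-dword sizes are widened");
  return wide & ((std::uint32_t{1} << bits) - 1u);
}

// Models "bitcast i16 to <2 x i8>" on a little-endian target:
// lane 0 is the low byte, lane 1 is the high byte.
struct V2I8 {
  std::uint8_t lane0;
  std::uint8_t lane1;
};

V2I8 bitcastToV2I8(std::uint32_t v16) {
  return {static_cast<std::uint8_t>(v16 & 0xFFu),
          static_cast<std::uint8_t>((v16 >> 8) & 0xFFu)};
}
```

With memory bytes {0x11, 0x22, 0x33, 0x44}, the widened value truncated to 16 bits and split into lanes gives back the first two bytes, which is what the narrow <2 x i8> load would have produced.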

Address code reviews.

wdng marked an inline comment as done. Jul 21 2017, 9:21 AM

arsenm added inline comments. Jul 21 2017, 9:55 AM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
449 ↗	(On Diff #107684)	No space after *
451 ↗	(On Diff #107684)	M->getDataLayout()
462–475 ↗	(On Diff #107684)	You don't need an entire block of code that is mostly the same for the vector case. The non-vector case requires a bitcast as well.
test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
3 ↗	(On Diff #107684)	These should be running just this pass with opt
12 ↗	(On Diff #107684)	This should not be converted because you don't know if it's 4 byte aligned
48 ↗	(On Diff #107684)	Needs tests with half and <2 x half>, i1, and maybe another exotic size. I'm pretty sure this will assert for half as is now. Also need tests with various alignments and volatile

arsenm added inline comments. Jul 21 2017, 9:56 AM

test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
48 ↗	(On Diff #107684)	Also would be good to have a test specifically loading from the dispatch packet like happens in the workitem ID calculation
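The alignment objection above can be illustrated with a small standalone model. It sketches two things: how an unannotated load falls back to the ABI alignment of its type (as the committed canWidenScalarExtLoad does), and why a 4-byte read from an address that is not 4-byte aligned can touch bytes past a boundary that the narrow load never reached. The function names and the 4096-byte boundary are assumptions chosen for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Effective alignment of a load: the explicit alignment if one was
// given (non-zero), otherwise the ABI alignment of the loaded type.
unsigned effectiveAlign(unsigned explicitAlign, unsigned abiAlign) {
  return explicitAlign != 0 ? explicitAlign : abiAlign;
}

// The widening is only attempted for loads known to be at least
// 4-byte aligned.
bool meetsWidenAlignment(unsigned align) { return align >= 4; }

// Returns true if a 4-byte read starting at `addr` spans two different
// `boundary`-sized regions (for example two pages). A 4-byte aligned
// address can never do this when `boundary` is a multiple of 4.
bool wideReadCrossesBoundary(std::uint64_t addr, std::uint64_t boundary) {
  assert(boundary != 0 && boundary % 4 == 0);
  return (addr / boundary) != ((addr + 3) / boundary);
}
```

Under this model an i8 load with no alignment annotation has an effective alignment of 1 (assuming an ABI alignment of 1 for i8) and is left alone, an align 2 load is also left alone, and only the align 4 variant qualifies, which matches the OPT check lines in widen_extending_scalar_loads.ll.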

kzhuravl added inline comments. Jul 21 2017, 10:10 AM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
42 ↗	(On Diff #107684)	Included twice.

Address code reviews.

wdng marked 2 inline comments as done. Jul 21 2017, 3:42 PM

arsenm added inline comments. Jul 24 2017, 10:14 AM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
455 ↗	(On Diff #107736)	Move align check before DA. It would also be better to move the datalayout alignment check into a helper function checked here
test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
5 ↗	(On Diff #107736)	Don't use FUNC, you don't have this as a check prefix
24–27 ↗	(On Diff #107736)	This isn't checking the relevant parts
133 ↗	(On Diff #107736)	_f16 is the naming convention
145 ↗	(On Diff #107736)	_v2f16
161 ↗	(On Diff #107736)	Needs to check the type

Address code reviews.

wdng marked an inline comment as done. Jul 24 2017, 1:53 PM

Fixed a mistake for align2_i8 lit test.

Upload correct diff file.

Ping.

arsenm added inline comments. Jul 25 2017, 9:42 AM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
130–135 ↗	(On Diff #107982)	The comment explains too specifically what it is doing, it should be describing intent and why.
test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll
1 ↗	(On Diff #107982)	These are IR check lines, so shouldn't use GCN. You also don't need or use the HSA check prefix
88–90 ↗	(On Diff #107982)	This should check the integer type/operands
172 ↗	(On Diff #107982)	Spelling _addrespace1

Address code reviews.

Ping.

arsenm added inline comments. Jul 26 2017, 1:29 PM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
238 ↗	(On Diff #108113)	This doesn't actually do the widening, so the name should be something like canWidenScalarExtLoad

Modified function name.

LGTM

This revision is now accepted and ready to land. Jul 26 2017, 1:52 PM

arsenm added inline comments. Jul 26 2017, 1:53 PM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
245 ↗	(On Diff #108351)	This should really be I.isSimple(), in case it's atomic.
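For context on this comment: in LLVM a load is "simple" when it is neither volatile nor atomic, so checking isSimple() rejects both kinds at once. The sketch below is a standalone model of the final legality check as it appears in the committed diff (constant address space, simple, narrower than 32 bits, at least 4-byte aligned, uniform). LoadModel, kConstantAddressSpace and canWiden are hypothetical names for illustration; the real code queries LoadInst, DataLayout and DivergenceAnalysis instead, and the value 2 for the constant address space is taken from the addrspace(2) loads in the tests.

```cpp
#include <cassert>

// Plain-data stand-in for the properties the pass reads off a LoadInst.
struct LoadModel {
  bool isVolatile;
  bool isAtomic;
  unsigned sizeInBits; // size of the loaded type in bits
  unsigned align;      // effective alignment in bytes
  unsigned addrSpace;  // pointer address space
  bool isUniform;      // result of divergence analysis
};

// Assumption: constant address space is 2, as in the tests' addrspace(2).
constexpr unsigned kConstantAddressSpace = 2;

// Mirrors the meaning of LoadInst::isSimple(): not volatile, not atomic.
bool isSimple(const LoadModel &L) { return !L.isVolatile && !L.isAtomic; }

// Model of the combined check: address space first, then the cheap
// structural checks, then uniformity.
bool canWiden(const LoadModel &L) {
  return L.addrSpace == kConstantAddressSpace && isSimple(L) &&
         L.sizeInBits < 32 && L.align >= 4 && L.isUniform;
}
```

A volatile-only check would have let an atomic sub-dword load through; with the isSimple-style check both are rejected, as are loads from other address spaces, under-aligned loads, and loads that are already 32 bits wide.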

wdng marked an inline comment as done. Jul 26 2017, 2:08 PM

Closed by commit rL309178: AMDGPU : Widen extending scalar loads to 32-bits. (authored by wdng). Jul 26 2017, 2:09 PM

This revision was automatically updated to reflect the committed changes.

Diff 108357

llvm/trunk/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

(12 lines not shown)
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/DivergenceAnalysis.h"		#include "llvm/Analysis/DivergenceAnalysis.h"
		#include "llvm/Analysis/Loads.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
(19 lines not shown)
namespace {		namespace {

class AMDGPUCodeGenPrepare : public FunctionPass,		class AMDGPUCodeGenPrepare : public FunctionPass,
public InstVisitor<AMDGPUCodeGenPrepare, bool> {		public InstVisitor<AMDGPUCodeGenPrepare, bool> {
const SISubtarget *ST = nullptr;		const SISubtarget *ST = nullptr;
DivergenceAnalysis *DA = nullptr;		DivergenceAnalysis *DA = nullptr;
Module *Mod = nullptr;		Module *Mod = nullptr;
bool HasUnsafeFPMath = false;		bool HasUnsafeFPMath = false;
		AMDGPUAS AMDGPUASI;

/// \brief Copies exact/nsw/nuw flags (if any) from binary operation \p I to		/// \brief Copies exact/nsw/nuw flags (if any) from binary operation \p I to
/// binary operation \p V.		/// binary operation \p V.
///		///
/// \returns Binary operation \p V.		/// \returns Binary operation \p V.
/// \returns \p T's base element bit width.		/// \returns \p T's base element bit width.
unsigned getBaseElementBitWidth(const Type *T) const;		unsigned getBaseElementBitWidth(const Type *T) const;

(54 lines not shown; context: class AMDGPUCodeGenPrepare : public FunctionPass,)
/// than or equal 16. Promotion is done by zero extending the operand to 32		/// than or equal 16. Promotion is done by zero extending the operand to 32
/// bits, replacing \p I with 32 bit 'bitreverse' intrinsic, shifting the		/// bits, replacing \p I with 32 bit 'bitreverse' intrinsic, shifting the
/// result of 32 bit 'bitreverse' intrinsic to the right with zero fill (the		/// result of 32 bit 'bitreverse' intrinsic to the right with zero fill (the
/// shift amount is 32 minus \p I's base element bit width), and truncating		/// shift amount is 32 minus \p I's base element bit width), and truncating
/// the result of the shift operation back to \p I's original type.		/// the result of the shift operation back to \p I's original type.
///		///
/// \returns True.		/// \returns True.
bool promoteUniformBitreverseToI32(IntrinsicInst &I) const;		bool promoteUniformBitreverseToI32(IntrinsicInst &I) const;
		/// \brief Widen a scalar load.
		///
		/// \details \p Widen scalar load for uniform, small type loads from constant
		// memory / to a full 32-bits and then truncate the input to allow a scalar
		// load instead of a vector load.
		//
		/// \returns True.

		bool canWidenScalarExtLoad(LoadInst &I) const;

public:		public:
static char ID;		static char ID;

AMDGPUCodeGenPrepare() : FunctionPass(ID) {}		AMDGPUCodeGenPrepare() : FunctionPass(ID) {}

bool visitFDiv(BinaryOperator &I);		bool visitFDiv(BinaryOperator &I);

bool visitInstruction(Instruction &I) { return false; }		bool visitInstruction(Instruction &I) { return false; }
bool visitBinaryOperator(BinaryOperator &I);		bool visitBinaryOperator(BinaryOperator &I);
		bool visitLoadInst(LoadInst &I);
bool visitICmpInst(ICmpInst &I);		bool visitICmpInst(ICmpInst &I);
bool visitSelectInst(SelectInst &I);		bool visitSelectInst(SelectInst &I);

bool visitIntrinsicInst(IntrinsicInst &I);		bool visitIntrinsicInst(IntrinsicInst &I);
bool visitBitreverseIntrinsicInst(IntrinsicInst &I);		bool visitBitreverseIntrinsicInst(IntrinsicInst &I);

bool doInitialization(Module &M) override;		bool doInitialization(Module &M) override;
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;
(74 lines not shown; context: case Instruction::Mul:)
return true;		return true;
case Instruction::Sub:		case Instruction::Sub:
return I.hasNoUnsignedWrap();		return I.hasNoUnsignedWrap();
default:		default:
return false;		return false;
}		}
}		}

		bool AMDGPUCodeGenPrepare::canWidenScalarExtLoad(LoadInst &I) const {
		Type *Ty = I.getType();
		const DataLayout &DL = Mod->getDataLayout();
		int TySize = DL.getTypeSizeInBits(Ty);
		unsigned Align = I.getAlignment() ?
		I.getAlignment() : DL.getABITypeAlignment(Ty);

		return I.isSimple() && TySize < 32 && Align >= 4 && DA->isUniform(&I);
		}

bool AMDGPUCodeGenPrepare::promoteUniformOpToI32(BinaryOperator &I) const {		bool AMDGPUCodeGenPrepare::promoteUniformOpToI32(BinaryOperator &I) const {
assert(needsPromotionToI32(I.getType()) &&		assert(needsPromotionToI32(I.getType()) &&
"I does not need promotion to i32");		"I does not need promotion to i32");

if (I.getOpcode() == Instruction::SDiv ||		if (I.getOpcode() == Instruction::SDiv ||
I.getOpcode() == Instruction::UDiv)		I.getOpcode() == Instruction::UDiv)
return false;		return false;

(204 lines not shown; context: bool AMDGPUCodeGenPrepare::visitBinaryOperator(BinaryOperator &I) {)

if (ST->has16BitInsts() && needsPromotionToI32(I.getType()) &&		if (ST->has16BitInsts() && needsPromotionToI32(I.getType()) &&
DA->isUniform(&I))		DA->isUniform(&I))
Changed |= promoteUniformOpToI32(I);		Changed |= promoteUniformOpToI32(I);

return Changed;		return Changed;
}		}

		bool AMDGPUCodeGenPrepare::visitLoadInst(LoadInst &I) {
		if (I.getPointerAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS &&
		canWidenScalarExtLoad(I)) {
		IRBuilder<> Builder(&I);
		Builder.SetCurrentDebugLocation(I.getDebugLoc());

		Type *I32Ty = Builder.getInt32Ty();
		Type *PT = PointerType::get(I32Ty, I.getPointerAddressSpace());
		Value *BitCast= Builder.CreateBitCast(I.getPointerOperand(), PT);
		Value *WidenLoad = Builder.CreateLoad(BitCast);

		int TySize = Mod->getDataLayout().getTypeSizeInBits(I.getType());
		Type *IntNTy = Builder.getIntNTy(TySize);
		Value *ValTrunc = Builder.CreateTrunc(WidenLoad, IntNTy);
		Value *ValOrig = Builder.CreateBitCast(ValTrunc, I.getType());
		I.replaceAllUsesWith(ValOrig);
		I.eraseFromParent();
		return true;
		}

		return false;
		}

bool AMDGPUCodeGenPrepare::visitICmpInst(ICmpInst &I) {		bool AMDGPUCodeGenPrepare::visitICmpInst(ICmpInst &I) {
bool Changed = false;		bool Changed = false;

if (ST->has16BitInsts() && needsPromotionToI32(I.getOperand(0)->getType()) &&		if (ST->has16BitInsts() && needsPromotionToI32(I.getOperand(0)->getType()) &&
DA->isUniform(&I))		DA->isUniform(&I))
Changed |= promoteUniformOpToI32(I);		Changed |= promoteUniformOpToI32(I);

return Changed;		return Changed;
(73 lines not shown)

llvm/trunk/test/CodeGen/AMDGPU/unaligned-load-store.ll

	(513 lines not shown)
	; SI: buffer_store_dwordx4			; SI: buffer_store_dwordx4
	define amdgpu_kernel void @constant_unaligned_load_v4i32(<4 x i32> addrspace(2)* %p, <4 x i32> addrspace(1)* %r) #0 {			define amdgpu_kernel void @constant_unaligned_load_v4i32(<4 x i32> addrspace(2)* %p, <4 x i32> addrspace(1)* %r) #0 {
	%v = load <4 x i32>, <4 x i32> addrspace(2)* %p, align 1			%v = load <4 x i32>, <4 x i32> addrspace(2)* %p, align 1
	store <4 x i32> %v, <4 x i32> addrspace(1)* %r, align 4			store <4 x i32> %v, <4 x i32> addrspace(1)* %r, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}constant_align4_load_i8:			; SI-LABEL: {{^}}constant_align4_load_i8:
	; SI: buffer_load_ubyte			; SI: s_load_dword
	; SI: buffer_store_byte			; SI: buffer_store_byte
	define amdgpu_kernel void @constant_align4_load_i8(i8 addrspace(2)* %p, i8 addrspace(1)* %r) #0 {			define amdgpu_kernel void @constant_align4_load_i8(i8 addrspace(2)* %p, i8 addrspace(1)* %r) #0 {
	%v = load i8, i8 addrspace(2)* %p, align 4			%v = load i8, i8 addrspace(2)* %p, align 4
	store i8 %v, i8 addrspace(1)* %r, align 4			store i8 %v, i8 addrspace(1)* %r, align 4
	ret void			ret void
	}			}

	; SI-LABEL: {{^}}constant_align2_load_i8:			; SI-LABEL: {{^}}constant_align2_load_i8:
	(74 lines not shown)

llvm/trunk/test/CodeGen/AMDGPU/widen_extending_scalar_loads.ll

				; RUN: opt -S -mtriple=amdgcn-- -amdgpu-codegenprepare < %s | FileCheck -check-prefix=OPT %s

				declare i8 addrspace(2)* @llvm.amdgcn.dispatch.ptr() #0

				; OPT-LABEL: @constant_load_i1
				; OPT: load i1
				; OPT-NEXT: store i1
				define amdgpu_kernel void @constant_load_i1(i1 addrspace(1)* %out, i1 addrspace(2)* %in) #0 {
				%val = load i1, i1 addrspace(2)* %in
				store i1 %val, i1 addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_i1_align2
				; OPT: load i1
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_i1_align2(i1 addrspace(1)* %out, i1 addrspace(2)* %in) #0 {
				%val = load i1, i1 addrspace(2)* %in, align 2
				store i1 %val, i1 addrspace(1)* %out, align 2
				ret void
				}

				; OPT-LABEL: @constant_load_i1_align4
				; OPT: bitcast
				; OPT-NEXT: load i32
				; OPT-NEXT: trunc
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_i1_align4(i1 addrspace(1)* %out, i1 addrspace(2)* %in) #0 {
				%val = load i1, i1 addrspace(2)* %in, align 4
				store i1 %val, i1 addrspace(1)* %out, align 4
				ret void
				}

				; OPT-LABEL: @constant_load_i8
				; OPT: load i8
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_i8(i8 addrspace(1)* %out, i8 addrspace(2)* %in) #0 {
				%val = load i8, i8 addrspace(2)* %in
				store i8 %val, i8 addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_i8_align2
				; OPT: load i8
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_i8_align2(i8 addrspace(1)* %out, i8 addrspace(2)* %in) #0 {
				%val = load i8, i8 addrspace(2)* %in, align 2
				store i8 %val, i8 addrspace(1)* %out, align 2
				ret void
				}

				; OPT-LABEL: @constant_load_i8align4
				; OPT: bitcast
				; OPT-NEXT: load i32
				; OPT-NEXT: trunc
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_i8align4(i8 addrspace(1)* %out, i8 addrspace(2)* %in) #0 {
				%val = load i8, i8 addrspace(2)* %in, align 4
				store i8 %val, i8 addrspace(1)* %out, align 4
				ret void
				}


				; OPT-LABEL: @constant_load_v2i8
				; OPT: load <2 x i8>
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_v2i8(<2 x i8> addrspace(1)* %out, <2 x i8> addrspace(2)* %in) #0 {
				%ld = load <2 x i8>, <2 x i8> addrspace(2)* %in
				store <2 x i8> %ld, <2 x i8> addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_v2i8_align4
				; OPT: bitcast
				; OPT-NEXT: load i32
				; OPT-NEXT: trunc
				; OPT-NEXT: bitcast
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_v2i8_align4(<2 x i8> addrspace(1)* %out, <2 x i8> addrspace(2)* %in) #0 {
				%ld = load <2 x i8>, <2 x i8> addrspace(2)* %in, align 4
				store <2 x i8> %ld, <2 x i8> addrspace(1)* %out, align 4
				ret void
				}

				; OPT-LABEL: @constant_load_v3i8
				; OPT: bitcast <3 x i8>
				; OPT-NEXT: load i32, i32 addrspace(2)
				; OPT-NEXT: trunc i32
				; OPT-NEXT: bitcast i24
				; OPT-NEXT: store <3 x i8>
				define amdgpu_kernel void @constant_load_v3i8(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(2)* %in) #0 {
				%ld = load <3 x i8>, <3 x i8> addrspace(2)* %in
				store <3 x i8> %ld, <3 x i8> addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_v3i8_align4
				; OPT: bitcast <3 x i8>
				; OPT-NEXT: load i32, i32 addrspace(2)
				; OPT-NEXT: trunc i32
				; OPT-NEXT: bitcast i24
				; OPT-NEXT: store <3 x i8>
				define amdgpu_kernel void @constant_load_v3i8_align4(<3 x i8> addrspace(1)* %out, <3 x i8> addrspace(2)* %in) #0 {
				%ld = load <3 x i8>, <3 x i8> addrspace(2)* %in, align 4
				store <3 x i8> %ld, <3 x i8> addrspace(1)* %out, align 4
				ret void
				}

				; OPT-LABEL: @constant_load_i16
				; OPT: load i16
				; OPT: sext
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_i16(i32 addrspace(1)* %out, i16 addrspace(2)* %in) #0 {
				%ld = load i16, i16 addrspace(2)* %in
				%ext = sext i16 %ld to i32
				store i32 %ext, i32 addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_i16_align4
				; OPT: bitcast
				; OPT-NEXT: load i32
				; OPT-NEXT: trunc
				; OPT-NEXT: sext
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_i16_align4(i32 addrspace(1)* %out, i16 addrspace(2)* %in) #0 {
				%ld = load i16, i16 addrspace(2)* %in, align 4
				%ext = sext i16 %ld to i32
				store i32 %ext, i32 addrspace(1)* %out, align 4
				ret void
				}

				; OPT-LABEL: @constant_load_f16
				; OPT: load half
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_f16(half addrspace(1)* %out, half addrspace(2)* %in) #0 {
				%ld = load half, half addrspace(2)* %in
				store half %ld, half addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_v2f16
				; OPT: load <2 x half>
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_v2f16(<2 x half> addrspace(1)* %out, <2 x half> addrspace(2)* %in) #0 {
				%ld = load <2 x half>, <2 x half> addrspace(2)* %in
				store <2 x half> %ld, <2 x half> addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @load_volatile
				; OPT: load volatile i16
				; OPT-NEXT: store
				define amdgpu_kernel void @load_volatile(i16 addrspace(1)* %out, i16 addrspace(2)* %in) {
				%a = load volatile i16, i16 addrspace(2)* %in
				store i16 %a, i16 addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_v2i8_volatile
				; OPT: load volatile <2 x i8>
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_v2i8_volatile(<2 x i8> addrspace(1)* %out, <2 x i8> addrspace(2)* %in) #0 {
				%ld = load volatile <2 x i8>, <2 x i8> addrspace(2)* %in
				store <2 x i8> %ld, <2 x i8> addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @constant_load_v2i8_addrspace1
				; OPT: load <2 x i8>
				; OPT-NEXT: store
				define amdgpu_kernel void @constant_load_v2i8_addrspace1(<2 x i8> addrspace(1)* %out, <2 x i8> addrspace(1)* %in) #0 {
				%ld = load <2 x i8>, <2 x i8> addrspace(1)* %in
				store <2 x i8> %ld, <2 x i8> addrspace(1)* %out
				ret void
				}

				; OPT-LABEL: @use_dispatch_ptr
				; OPT: bitcast
				; OPT-NEXT: load i32
				; OPT-NEXT: trunc
				; OPT-NEXT: zext
				; OPT-NEXT: store
				define amdgpu_kernel void @use_dispatch_ptr(i32 addrspace(1)* %ptr) #1 {
				%dispatch.ptr = call i8 addrspace(2)* @llvm.amdgcn.dispatch.ptr()
				%val = load i8, i8 addrspace(2)* %dispatch.ptr, align 4
				%ld = zext i8 %val to i32
				store i32 %ld, i32 addrspace(1)* %ptr
				ret void
				}

				attributes #0 = { nounwind }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Widen extending scalar loads to 32-bits
ClosedPublic
