Details

Reviewers

• tstellarAMD
arsenm
wdng

Commits

rGe14df4b23659: [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit…
rL282624: [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit…

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 69978. · Sep 1 2016, 3:46 AM

kzhuravl retitled this revision from to [AMDGPU] Promote uniform i16 ops to i32 ops.

kzhuravl updated this object.

kzhuravl added reviewers: • tstellarAMD, arsenm, wdng, Restricted Project.

Herald added subscribers: wdng, arsenm. · Sep 1 2016, 3:46 AM

kzhuravl added a subscriber: llvm-commits. · Sep 1 2016, 3:49 AM

Needs a dedicated test which just runs the pass with opt for all of the operations

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
179	You can't use zext to promote any operation, you must use sext for the signed operations.
179	This also won't get selects for the min/max pattern
183	This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16
test/CodeGen/AMDGPU/mul_uint24.ll
28	I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to
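
The zext/sext remark on line 179 can be illustrated outside of LLVM. The sketch below is only a model: `sdivViaZExt` and `sdivViaSExt` are hypothetical helpers, not LLVM APIs, and they mimic "extend both i16 operands to i32, divide, truncate back to i16" for a signed division. It assumes C++14 or later.

```cpp
#include <cstdint>

// Illustrative model only: these helpers are hypothetical, not LLVM APIs.

// Promotion with zero extension, as in the first revision of the patch.
constexpr int16_t sdivViaZExt(int16_t A, int16_t B) {
  int32_t WideA = static_cast<uint16_t>(A);   // zext i16 -> i32
  int32_t WideB = static_cast<uint16_t>(B);   // zext i16 -> i32
  return static_cast<int16_t>(WideA / WideB); // trunc i32 -> i16
}

// Promotion with sign extension.
constexpr int16_t sdivViaSExt(int16_t A, int16_t B) {
  int32_t WideA = A;                          // sext i16 -> i32
  int32_t WideB = B;                          // sext i16 -> i32
  return static_cast<int16_t>(WideA / WideB); // trunc i32 -> i16
}

// sdiv i16 -4, 2 is -2, and sign extension reproduces that.
static_assert(sdivViaSExt(-4, 2) == -2, "sext keeps the signed result");
// With zero extension -4 becomes 65532, and 65532 / 2 == 32766.
static_assert(sdivViaZExt(-4, 2) == 32766, "zext changes the signed result");
// For non-negative operands both extensions agree.
static_assert(sdivViaSExt(100, 7) == sdivViaZExt(100, 7), "same when positive");
```

The model covers only division; it says nothing about which opcodes the final patch treats as signed.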

Address review feedback

Herald added a subscriber: nhaehnle. · Sep 12 2016, 3:47 PM

kzhuravl added a subscriber: Restricted Project. · Sep 12 2016, 3:48 PM

• tstellarAMD added inline comments. · Sep 13 2016, 9:08 AM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
51	CmpInst has an isSigned() member function that you can use instead.
62	Same thing here.
test/CodeGen/AMDGPU/ctlz.ll
247 ↗	(On Diff #71067)	Did you mean to change this?

There are various integer intrinsics that can also be handled but that can be a later patch

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
83	I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive
105	You shouldn't need this
114	Ditto
134	ditto
197–199	I think all of these should be skipped if the target doesn't have i16 instructions
test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll
234–243 ↗	(On Diff #71067)	I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling
241 ↗	(On Diff #71067)	There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct)
267–276 ↗	(On Diff #71067)	I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code
340 ↗	(On Diff #71067)	There should be a test that shows exact is preserved
521–525 ↗	(On Diff #71067)	There should also be tests that this happens with vectors
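
The nsw remark on line 241 ("I think that should be correct") can be sanity-checked with plain range arithmetic. The sketch below assumes the operands are sign-extended from i16 and covers only add, sub and mul. `fitsInI32`, `I16Min` and `I16Max` are hypothetical names, and nothing here reflects how the patch itself handles flags.

```cpp
#include <cstdint>

// Hypothetical helper, not an LLVM API: does V fit in a signed 32-bit integer?
constexpr bool fitsInI32(int64_t V) {
  return V >= INT32_MIN && V <= INT32_MAX;
}

// Range of a sign-extended i16 operand, held in 64 bits so that the
// checks below cannot overflow themselves.
constexpr int64_t I16Min = INT16_MIN; // -32768
constexpr int64_t I16Max = INT16_MAX; //  32767

// add: the extreme sums are -65536 and 65534.
static_assert(fitsInI32(I16Min + I16Min), "add lower bound fits");
static_assert(fitsInI32(I16Max + I16Max), "add upper bound fits");
// sub: the extreme differences are -65535 and 65535.
static_assert(fitsInI32(I16Min - I16Max), "sub lower bound fits");
static_assert(fitsInI32(I16Max - I16Min), "sub upper bound fits");
// mul: the extreme products are -32768 * 32767 and (-32768)^2 == 2^30.
static_assert(fitsInI32(I16Min * I16Max), "mul lower bound fits");
static_assert(fitsInI32(I16Min * I16Min), "mul upper bound fits");
```

Under these assumptions the widened signed operation cannot overflow 32 bits, which is consistent with the reviewer's expectation for nsw. The nuw and exact flags would need separate arguments.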

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

In D24125#545266, @arsenm wrote:

There are various integer intrinsics that can also be handled but that can be a later patch

In D24125#545277, @arsenm wrote:

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

Yes

Address review feedback

Herald added subscribers: tony-tye, yaxunl, kzhuravl. · Sep 23 2016, 11:16 AM

In D24125#550972, @kzhuravl wrote:

In D24125#545266, @arsenm wrote:

There are various integer intrinsics that can also be handled but that can be a later patch

Ok

In D24125#545277, @arsenm wrote:

Does the min/max pattern actually get matched for i16 after the promotions are inserted?

Yes

Does this patch work for umed3 instruction? Just want to check.

• tstellarAMD added inline comments. · Sep 23 2016, 12:37 PM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215	I think you can always zero extend for select, since you will be discarding the high-bits with the truncate.
test/CodeGen/AMDGPU/mul_uint24.ll
36–39	Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRs:

In D24125#550988, @wdng wrote:

Does this patch work for umed3 instruction? Just want to check.

Yes

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215	min/max tests fail when always zero extending

Address review feedback

Ping

Just one small comment about the sext/zext for selects. With that change, this LGTM.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215	Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better.

kzhuravl added inline comments. · Sep 28 2016, 10:23 AM

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
209–215	some min patterns do not get matched, which causes min lit test to fail
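
On the value-level part of the select discussion (lines 209–215), a small model suggests that the choice of extension does not change the truncated result of a select. The failures reported in this thread are about min/max pattern matching, which this sketch does not model. All function names below are hypothetical, and the code assumes C++14 or later with two's-complement narrowing.

```cpp
#include <cstdint>

// Illustrative model only: these helpers are hypothetical, not LLVM APIs.

// Reference: a plain 16-bit select.
constexpr uint16_t selectI16(bool Cond, uint16_t A, uint16_t B) {
  return Cond ? A : B;
}

// Select promoted to 32 bits with zero-extended operands, then truncated.
constexpr uint16_t selectViaZExt(bool Cond, uint16_t A, uint16_t B) {
  uint32_t WideA = A; // zext i16 -> i32
  uint32_t WideB = B; // zext i16 -> i32
  uint32_t WideRes = Cond ? WideA : WideB;
  return static_cast<uint16_t>(WideRes); // trunc drops the high 16 bits
}

// Select promoted to 32 bits with sign-extended operands, then truncated.
constexpr uint16_t selectViaSExt(bool Cond, uint16_t A, uint16_t B) {
  int32_t WideA = static_cast<int16_t>(A); // sext i16 -> i32
  int32_t WideB = static_cast<int16_t>(B); // sext i16 -> i32
  int32_t WideRes = Cond ? WideA : WideB;
  return static_cast<uint16_t>(WideRes); // trunc drops the high 16 bits
}

// The truncated result matches the 16-bit select for either extension,
// including operands with the sign bit set.
static_assert(selectViaZExt(true, 0xFFFF, 1) == selectI16(true, 0xFFFF, 1), "");
static_assert(selectViaSExt(true, 0xFFFF, 1) == selectI16(true, 0xFFFF, 1), "");
static_assert(selectViaZExt(false, 1, 0x8000) == selectI16(false, 1, 0x8000), "");
static_assert(selectViaSExt(false, 1, 0x8000) == selectI16(false, 1, 0x8000), "");
```

Value equivalence is therefore not the obstacle. According to the replies above, what breaks is the matching of some min/max patterns once the extensions are inserted.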

• tstellarAMD requested changes to this revision. · Sep 28 2016, 10:32 AM

• tstellarAMD edited edge metadata.

• tstellarAMD added inline comments.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
198	I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations, not when it does not support them.
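
A minimal sketch of the guard this comment asks for, combined with the earlier request to test the cheap type check before the uniformity query. `SubtargetModel`, `OpModel` and `shouldPromoteToI32` are hypothetical stand-ins for the real subtarget and instruction queries, which are not reproduced here.

```cpp
// Hypothetical stand-ins; the real pass would query the subtarget, the
// instruction type and the divergence analysis instead of plain structs.
struct SubtargetModel {
  bool Has16BitInsts; // target has 16-bit instructions
};

struct OpModel {
  unsigned BitWidth; // integer width of the operation's type
  bool IsUniform;    // result of the (more expensive) uniformity query
};

// Promote only on targets that have 16-bit instructions, and test the
// cheap conditions first so the uniformity query is reached last.
constexpr bool shouldPromoteToI32(SubtargetModel ST, OpModel Op) {
  return ST.Has16BitInsts && Op.BitWidth == 16u && Op.IsUniform;
}

static_assert(shouldPromoteToI32(SubtargetModel{true}, OpModel{16u, true}),
              "uniform i16 op on a target with 16-bit instructions");
static_assert(!shouldPromoteToI32(SubtargetModel{false}, OpModel{16u, true}),
              "no promotion without 16-bit instructions");
static_assert(!shouldPromoteToI32(SubtargetModel{true}, OpModel{16u, false}),
              "divergent ops are left alone");
static_assert(!shouldPromoteToI32(SubtargetModel{true}, OpModel{32u, true}),
              "only i16 ops are candidates");
```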

This revision now requires changes to proceed. · Sep 28 2016, 10:32 AM

Address review feedback

LGTM.

This revision is now accepted and ready to land. · Sep 28 2016, 12:26 PM

kzhuravl retitled this revision from [AMDGPU] Promote uniform i16 ops to i32 ops to [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions. · Sep 28 2016, 1:10 PM

kzhuravl edited edge metadata.

Closed by commit rL282624: [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit… (authored by kzhuravl). · Sep 28 2016, 1:14 PM

This revision was automatically updated to reflect the committed changes.

Diff 69978

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

[42 lines not shown]
public:
static char ID;		static char ID;
AMDGPUCodeGenPrepare(const TargetMachine *TM = nullptr) :		AMDGPUCodeGenPrepare(const TargetMachine *TM = nullptr) :
FunctionPass(ID),		FunctionPass(ID),
TM(static_cast<const GCNTargetMachine *>(TM)),		TM(static_cast<const GCNTargetMachine *>(TM)),
ST(nullptr),		ST(nullptr),
DA(nullptr),		DA(nullptr),
Mod(nullptr),		Mod(nullptr),
HasUnsafeFPMath(false) { }		HasUnsafeFPMath(false) { }

		tstellarAMD (Done): CmpInst has an isSigned() member function that you can use instead.
		/// \brief Promotes uniform 16 bit operation to equivalent 32 bit operation by
		/// zero extending operands to 32 bits, replacing 16 bit operation with
		/// equivalent 32 bit operation, and truncating the result of 32 bit operation
		/// back to 16 bits. Always returns true.
		bool promoteUniformI16OpToI32Op(BinaryOperator &I) const;

bool visitFDiv(BinaryOperator &I);		bool visitFDiv(BinaryOperator &I);

		bool visitBinaryOperator(BinaryOperator &I);

bool visitInstruction(Instruction &I) {		bool visitInstruction(Instruction &I) {
		tstellarAMD (Done): Same thing here.
return false;		return false;
}		}

bool doInitialization(Module &M) override;		bool doInitialization(Module &M) override;
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

const char *getPassName() const override {		const char *getPassName() const override {
return "AMDGPU IR optimizations";		return "AMDGPU IR optimizations";
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<DivergenceAnalysis>();		AU.addRequired<DivergenceAnalysis>();
AU.setPreservesAll();		AU.setPreservesAll();
}		}
};		};

} // End anonymous namespace		} // End anonymous namespace

static bool shouldKeepFDivF32(Value *Num, bool UnsafeDiv) {		static bool shouldKeepFDivF32(Value *Num, bool UnsafeDiv) {
const ConstantFP *CNum = dyn_cast<ConstantFP>(Num);		const ConstantFP *CNum = dyn_cast<ConstantFP>(Num);
if (!CNum)		if (!CNum)
		arsenm (Done): I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive
return false;		return false;

// Reciprocal f32 is handled separately without denormals.		// Reciprocal f32 is handled separately without denormals.
return UnsafeDiv || CNum->isExactlyValue(+1.0);		return UnsafeDiv || CNum->isExactlyValue(+1.0);
}		}

		bool AMDGPUCodeGenPrepare::promoteUniformI16OpToI32Op(BinaryOperator &I) const {
		assert(DA->isUniform(&I) && "Op must be uniform");
		assert(I.getType()->isIntegerTy(16) && "Op must be 16 bits");

		IRBuilder<> Builder(&I);
		Builder.SetCurrentDebugLocation(I.getDebugLoc());

		Value *ZExtOp0 = Builder.CreateZExt(I.getOperand(0), Builder.getInt32Ty());
		Value *ZExtOp1 = Builder.CreateZExt(I.getOperand(1), Builder.getInt32Ty());
		Value *ZExtRes = Builder.CreateBinOp(I.getOpcode(), ZExtOp0, ZExtOp1);
		Value *TruncRes = Builder.CreateTrunc(ZExtRes, Builder.getInt16Ty());

		I.replaceAllUsesWith(TruncRes);
		I.dropAllReferences();
		I.eraseFromParent();

		arsenm (Done): You shouldn't need this
		return true;
		}

// Insert an intrinsic for fast fdiv for safe math situations where we can		// Insert an intrinsic for fast fdiv for safe math situations where we can
// reduce precision. Leave fdiv for situations where the generic node is		// reduce precision. Leave fdiv for situations where the generic node is
// expected to be optimized.		// expected to be optimized.
bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {		bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
Type *Ty = FDiv.getType();		Type *Ty = FDiv.getType();

		arsenm (Done): Ditto
// TODO: Handle half		// TODO: Handle half
if (!Ty->getScalarType()->isFloatTy())		if (!Ty->getScalarType()->isFloatTy())
return false;		return false;

MDNode *FPMath = FDiv.getMetadata(LLVMContext::MD_fpmath);		MDNode *FPMath = FDiv.getMetadata(LLVMContext::MD_fpmath);
if (!FPMath)		if (!FPMath)
return false;		return false;

const FPMathOperator *FPOp = cast<const FPMathOperator>(&FDiv);		const FPMathOperator *FPOp = cast<const FPMathOperator>(&FDiv);
float ULP = FPOp->getFPAccuracy();		float ULP = FPOp->getFPAccuracy();
if (ULP < 2.5f)		if (ULP < 2.5f)
return false;		return false;

FastMathFlags FMF = FPOp->getFastMathFlags();		FastMathFlags FMF = FPOp->getFastMathFlags();
bool UnsafeDiv = HasUnsafeFPMath || FMF.unsafeAlgebra() ||		bool UnsafeDiv = HasUnsafeFPMath || FMF.unsafeAlgebra() ||
FMF.allowReciprocal();		FMF.allowReciprocal();
if (ST->hasFP32Denormals() && !UnsafeDiv)		if (ST->hasFP32Denormals() && !UnsafeDiv)
return false;		return false;

IRBuilder<> Builder(FDiv.getParent(), std::next(FDiv.getIterator()), FPMath);		IRBuilder<> Builder(FDiv.getParent(), std::next(FDiv.getIterator()), FPMath);
		arsenm (Done): ditto
Builder.setFastMathFlags(FMF);		Builder.setFastMathFlags(FMF);
Builder.SetCurrentDebugLocation(FDiv.getDebugLoc());		Builder.SetCurrentDebugLocation(FDiv.getDebugLoc());

const AMDGPUIntrinsicInfo *II = TM->getIntrinsicInfo();		const AMDGPUIntrinsicInfo *II = TM->getIntrinsicInfo();
Function *Decl		Function *Decl
= II->getDeclaration(Mod, AMDGPUIntrinsic::amdgcn_fdiv_fast, {});		= II->getDeclaration(Mod, AMDGPUIntrinsic::amdgcn_fdiv_fast, {});

Value *Num = FDiv.getOperand(0);		Value *Num = FDiv.getOperand(0);
[28 lines not shown]
		if (NewFDiv) {
FDiv.replaceAllUsesWith(NewFDiv);		FDiv.replaceAllUsesWith(NewFDiv);
NewFDiv->takeName(&FDiv);		NewFDiv->takeName(&FDiv);
FDiv.eraseFromParent();		FDiv.eraseFromParent();
}		}

return true;		return true;
}		}

		bool AMDGPUCodeGenPrepare::visitBinaryOperator(BinaryOperator &I) {
		arsenm (Done): You can't use zext to promote any operation, you must use sext for the signed operations.
		arsenm (Done): This also won't get selects for the min/max pattern
		bool Changed = false;

		// Promote uniform 16 bit operation to equivalent 32 bit operation.
		if (DA->isUniform(&I) && I.getType()->isIntegerTy(16))
		arsenm (Done): This should check the type first since it is cheaper. We also should investigate whether this should be done for smaller types that will be legalized to i16
		Changed |= promoteUniformI16OpToI32Op(I);

		return Changed;
		}

static bool hasUnsafeFPMath(const Function &F) {		static bool hasUnsafeFPMath(const Function &F) {
Attribute Attr = F.getFnAttribute("unsafe-fp-math");		Attribute Attr = F.getFnAttribute("unsafe-fp-math");
return Attr.getValueAsString() == "true";		return Attr.getValueAsString() == "true";
}		}

bool AMDGPUCodeGenPrepare::doInitialization(Module &M) {		bool AMDGPUCodeGenPrepare::doInitialization(Module &M) {
Mod = &M;		Mod = &M;
return false;		return false;
}		}

		tstellarAMD (Done): I just noticed that this condition is wrong. We should only be doing the promotion when the target supports 16-bit operations not when it does not support them.
bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {		bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
		arsenm (Done): I think all of these should be skipped if the target doesn't have i16 instructions
if (!TM || skipFunction(F))		if (!TM || skipFunction(F))
return false;		return false;

ST = &TM->getSubtarget<SISubtarget>(F);		ST = &TM->getSubtarget<SISubtarget>(F);
DA = &getAnalysis<DivergenceAnalysis>();		DA = &getAnalysis<DivergenceAnalysis>();
HasUnsafeFPMath = hasUnsafeFPMath(F);		HasUnsafeFPMath = hasUnsafeFPMath(F);

bool MadeChange = false;		bool MadeChange = false;

for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
BasicBlock::iterator Next;		BasicBlock::iterator Next;
for (BasicBlock::iterator I = BB.begin(), E = BB.end(); I != E; I = Next) {		for (BasicBlock::iterator I = BB.begin(), E = BB.end(); I != E; I = Next) {
Next = std::next(I);		Next = std::next(I);
MadeChange |= visit(*I);		MadeChange |= visit(*I);
}		}
}		}
		tstellarAMD (Not Done): I think you can always zero extend for select, since you will be discarding the high-bits with the truncate.
		kzhuravl (Author, Not Done): min/max tests fail when always zero extending
		tstellarAMD (Not Done): Ok, I just noticed Matt's comment below. What happens if you always sign extend? If you get no regressions from that I think that would be better.
		kzhuravl (Author, Not Done): some min patterns do not get matched, which causes min lit test to fail

return MadeChange;		return MadeChange;
}		}

INITIALIZE_TM_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,		INITIALIZE_TM_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,
"AMDGPU IR optimizations", false, false)		"AMDGPU IR optimizations", false, false)
INITIALIZE_PASS_DEPENDENCY(DivergenceAnalysis)		INITIALIZE_PASS_DEPENDENCY(DivergenceAnalysis)
INITIALIZE_TM_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE,		INITIALIZE_TM_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE,
"AMDGPU IR optimizations", false, false)		"AMDGPU IR optimizations", false, false)

char AMDGPUCodeGenPrepare::ID = 0;		char AMDGPUCodeGenPrepare::ID = 0;

FunctionPass *llvm::createAMDGPUCodeGenPreparePass(const GCNTargetMachine *TM) {		FunctionPass *llvm::createAMDGPUCodeGenPreparePass(const GCNTargetMachine *TM) {
return new AMDGPUCodeGenPrepare(TM);		return new AMDGPUCodeGenPrepare(TM);
}		}

lib/Target/AMDGPU/SIISelLowering.cpp

	[534 lines not shown]
	bool SITargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,			bool SITargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,
	Type *Ty) const {			Type *Ty) const {
	// FIXME: Could be smarter if called for vector constants.			// FIXME: Could be smarter if called for vector constants.
	return true;			return true;
	}			}

	bool SITargetLowering::isTypeDesirableForOp(unsigned Op, EVT VT) const {			bool SITargetLowering::isTypeDesirableForOp(unsigned Op, EVT VT) const {

				// i16 is not desirable unless it is a load or a store.
				if (VT == MVT::i16 && Op != ISD::LOAD && Op != ISD::STORE)
				return false;

	// SimplifySetCC uses this function to determine whether or not it should			// SimplifySetCC uses this function to determine whether or not it should
	// create setcc with i1 operands. We don't have instructions for i1 setcc.			// create setcc with i1 operands. We don't have instructions for i1 setcc.
	if (VT == MVT::i1 && Op == ISD::SETCC)			if (VT == MVT::i1 && Op == ISD::SETCC)
	return false;			return false;

	return TargetLowering::isTypeDesirableForOp(Op, VT);			return TargetLowering::isTypeDesirableForOp(Op, VT);
	}			}

	[3,262 lines not shown]

test/CodeGen/AMDGPU/mul_uint24.ll

	[17 lines not shown]
	}			}

	; FUNC-LABEL: {{^}}test_umul24_i16_sext:			; FUNC-LABEL: {{^}}test_umul24_i16_sext:
	; EG: MUL_UINT24 {{[* ]*}}T{{[0-9]}}.[[MUL_CHAN:[XYZW]]]			; EG: MUL_UINT24 {{[* ]*}}T{{[0-9]}}.[[MUL_CHAN:[XYZW]]]
	; The result must be sign-extended			; The result must be sign-extended
	; EG: BFE_INT {{[* ]*}}T{{[0-9]}}.{{[XYZW]}}, PV.[[MUL_CHAN]], 0.0, literal.x			; EG: BFE_INT {{[* ]*}}T{{[0-9]}}.{{[XYZW]}}, PV.[[MUL_CHAN]], 0.0, literal.x
	; EG: 16			; EG: 16

	; SI: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}			; SI: s_mul_i32
	; SI: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 16			; SI: s_sext_i32_i16
	define void @test_umul24_i16_sext(i32 addrspace(1)* %out, i16 %a, i16 %b) {			define void @test_umul24_i16_sext(i32 addrspace(1)* %out, i16 %a, i16 %b) {
				arsenm (Done): I'm surprised this test broke from this. These changed cases should then be duplicated into a version with VGPRs to keep checking the pattern this was meant to
	entry:			entry:
	%mul = mul i16 %a, %b			%mul = mul i16 %a, %b
	%ext = sext i16 %mul to i32			%ext = sext i16 %mul to i32
	store i32 %ext, i32 addrspace(1)* %out			store i32 %ext, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}test_umul24_i16:			; FUNC-LABEL: {{^}}test_umul24_i16:
				; SI: s_mul_i32
	; SI: s_and_b32			; SI: s_and_b32
	; SI: v_mul_u32_u24_e32			; SI: v_mov_b32_e32
				tstellarAMD (Done): Was this meant to be the duplicate of the above test? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument to guarantee the operands would be in VGPRS:
	; SI: v_and_b32_e32
	define void @test_umul24_i16(i32 addrspace(1)* %out, i16 %a, i16 %b) {			define void @test_umul24_i16(i32 addrspace(1)* %out, i16 %a, i16 %b) {
	entry:			entry:
	%mul = mul i16 %a, %b			%mul = mul i16 %a, %b
	%ext = zext i16 %mul to i32			%ext = zext i16 %mul to i32
	store i32 %ext, i32 addrspace(1)* %out			store i32 %ext, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	[150 lines not shown]

test/CodeGen/AMDGPU/sdivrem24.ll

[16 lines not shown]
define void @sdiv24_i8(i8 addrspace(1)* %out, i8 addrspace(1)* %in) {
%num = load i8, i8 addrspace(1) * %in		%num = load i8, i8 addrspace(1) * %in
%den = load i8, i8 addrspace(1) * %den_ptr		%den = load i8, i8 addrspace(1) * %den_ptr
%result = sdiv i8 %num, %den		%result = sdiv i8 %num, %den
store i8 %result, i8 addrspace(1)* %out		store i8 %result, i8 addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}sdiv24_i16:		; FUNC-LABEL: {{^}}sdiv24_i16:
; SI: v_cvt_f32_i32		; SI: v_cvt_f32_u32_e32
; SI: v_cvt_f32_i32		; SI: v_cvt_f32_u32_e32
; SI: v_rcp_f32		; SI: v_rcp_f32_e32
; SI: v_cvt_i32_f32		; SI: v_cvt_u32_f32_e32

; EG: INT_TO_FLT		; EG: INT_TO_FLT
; EG-DAG: INT_TO_FLT		; EG-DAG: INT_TO_FLT
; EG-DAG: RECIP_IEEE		; EG-DAG: RECIP_IEEE
; EG: FLT_TO_INT		; EG: FLT_TO_INT
define void @sdiv24_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {		define void @sdiv24_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {
%den_ptr = getelementptr i16, i16 addrspace(1)* %in, i16 1		%den_ptr = getelementptr i16, i16 addrspace(1)* %in, i16 1
%num = load i16, i16 addrspace(1) * %in, align 2		%num = load i16, i16 addrspace(1) * %in, align 2
[98 lines not shown]
define void @srem24_i8(i8 addrspace(1)* %out, i8 addrspace(1)* %in) {
%num = load i8, i8 addrspace(1) * %in		%num = load i8, i8 addrspace(1) * %in
%den = load i8, i8 addrspace(1) * %den_ptr		%den = load i8, i8 addrspace(1) * %den_ptr
%result = srem i8 %num, %den		%result = srem i8 %num, %den
store i8 %result, i8 addrspace(1)* %out		store i8 %result, i8 addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}srem24_i16:		; FUNC-LABEL: {{^}}srem24_i16:
; SI: v_cvt_f32_i32		; SI: v_cvt_f32_u32_e32
; SI: v_cvt_f32_i32		; SI: v_cvt_f32_u32_e32
; SI: v_rcp_f32		; SI: v_rcp_f32_e32
; SI: v_cvt_i32_f32		; SI: v_cvt_u32_f32_e32

; EG: INT_TO_FLT		; EG: INT_TO_FLT
; EG-DAG: INT_TO_FLT		; EG-DAG: INT_TO_FLT
; EG-DAG: RECIP_IEEE		; EG-DAG: RECIP_IEEE
; EG: FLT_TO_INT		; EG: FLT_TO_INT
define void @srem24_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {		define void @srem24_i16(i16 addrspace(1)* %out, i16 addrspace(1)* %in) {
%den_ptr = getelementptr i16, i16 addrspace(1)* %in, i16 1		%den_ptr = getelementptr i16, i16 addrspace(1)* %in, i16 1
%num = load i16, i16 addrspace(1) * %in, align 2		%num = load i16, i16 addrspace(1) * %in, align 2
[183 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Closed, Public

