Details

Reviewers

arsenm
thegameg
cfang
foad

Commits

rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask

Summary

Implemented the following fold-negate transformation at the very beginning of AMDGPUCodeGenPrepare.cpp:

xor (llvm.amdgcn.class x, mask), -1 --> llvm.amdgcn.class(x, ~mask)

Added regression tests
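
To illustrate why the fold in the summary preserves behaviour, here is a standalone sketch. It is not LLVM code: it assumes a simplified model in which every floating-point value falls into exactly one of 10 categories, each category owns one low bit of the test mask, and the class test returns true when that bit is set. The names classTest, foldedMask and foldIsEquivalent are hypothetical helpers introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the class test mask has 10 meaningful low-order bits.
constexpr uint32_t kAllClassBits = 0x3ff;

// Simplified model of the class test: true when the bit that belongs to the
// value's category is set in the mask.
bool classTest(unsigned category, uint32_t mask) {
  assert(category < 10 && "the model has only 10 categories");
  return ((mask >> category) & 1u) != 0;
}

// The fold: instead of negating the result, flip the 10 mask bits.
uint32_t foldedMask(uint32_t mask) { return mask ^ kAllClassBits; }

// Checks that not(class(x, mask)) equals class(x, foldedMask(mask)) for
// every category in the model.
bool foldIsEquivalent(uint32_t mask) {
  for (unsigned c = 0; c < 10; ++c)
    if (!classTest(c, mask) != classTest(c, foldedMask(mask)))
      return false;
  return true;
}
```

In this model a mask of 5 folds to 1018, which matches the constant expected by the regression tests in this revision.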

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	210 ms	x64 debian > LLVM.CodeGen/AMDGPU::amdgpu-codegenprepare-i16-to-i32.ll
	500 ms	x64 debian > LLVM.CodeGen/AMDGPU/GlobalISel::xnor.ll

Event Timeline

gandhi21299 created this revision. Jun 10 2021, 11:01 AM

Herald added subscribers: foad, kerbowa, hiraditya and 7 others. Jun 10 2021, 11:01 AM

gandhi21299 requested review of this revision. Jun 10 2021, 11:01 AM

Herald added a project: Restricted Project. Jun 10 2021, 11:01 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B108654: Diff 351217. Jun 10 2021, 11:58 AM

arsenm added inline comments. Jun 10 2021, 2:27 PM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1393	This should not be a separate pass over the function. This should be a visitXor function
1444	You don't need to create a constant to check the value
1454	Should check hasOneUse
1458–1469	I don't know why you are looking at all of these extensions. The xor should directly consume the call
1473–1475	You can just dyn_cast<IntrinsicInst>
1480	Since you know it's a constant, you can also 0 the irrelevant high bits
1485	You don't need to collect dead instructions, the xor should always be dead
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
19	Need a negative test with a variable mask. Also a negative test for multiple uses. Plus also could use tests for all of the FP types
27	You shouldn't be trying to look through this zext, this IR should have been optimized to xor on i1
37–39	This didn't do anything

arsenm requested changes to this revision. Jun 10 2021, 2:27 PM

This revision now requires changes to proceed. Jun 10 2021, 2:27 PM

gandhi21299 marked 6 inline comments as done. Jun 11 2021, 9:09 AM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1458–1469	I am considering the case where the result from the class intrinsic is extended/truncated for some reason. Is it possible to get the IntrinsicInst directly from the xor operand?

gandhi21299 marked 3 inline comments as done. Jun 11 2021, 12:46 PM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1480	Do I use IRBuilder to do this or is there a simpler way?

gandhi21299 marked 2 inline comments as done. Jun 11 2021, 1:21 PM

changes requested by @arsenm

Harbormaster completed remote builds in B108891: Diff 351554. Jun 11 2021, 2:10 PM

check if one of the operands of the xor instruction is a ConstantInt, fixes two of the tests

Harbormaster completed remote builds in B109045: Diff 351770. Jun 13 2021, 10:26 PM

arsenm added inline comments. Jun 15 2021, 10:32 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
21	Definitely should not include this here
30	Shouldn't need this
819–827	I believe PatternMatch has a nicer way to check for a not
835	I'd prefer to just use the dyn_cast below and return on that rather than checking isa first
842	You don't need this intermediate APInt (plus this is required to be an i32)
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
55	It would be a better test to have a non-identical use (i.e. just add a store of the value)

arsenm added inline comments. Jun 15 2021, 10:46 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
819–827	Plus constants are canonicalized to the RHS, so you don't need to check both
828–829	Invert condition and exit early to reduce indentation

arsenm added inline comments. Jun 15 2021, 11:08 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
845	Arg->getType()

changes requested by @arsenm

changes as requested by Matt

gandhi21299 marked an inline comment as done. Jun 15 2021, 1:09 PM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
819–827	After removing the else-if condition, this seems clean enough to me. Perhaps, PatternMatch won't be required any more.

Harbormaster completed remote builds in B109351: Diff 352206. Jun 15 2021, 1:39 PM

refreshing diff, only 2 tests should be failing instead of 4

arsenm added inline comments. Jun 15 2021, 5:41 PM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
817	No else after return
821–822	Using m_Not would still be clearer
821–823	Using dyn_cast and isa is redundant, just use dyn_cast and check the result
841	There are fewer bits in the test mask than this
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2	You didn't actually generate these checks. The operand tests are mostly missing

Harbormaster completed remote builds in B109389: Diff 352262. Jun 15 2021, 7:23 PM

Please run all of check-llvm-codegen-amdgpu. I tried your patch and it looks like a couple more tests need updating.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
17–39	I don't think you need any of these changes to the #includes.
815	This check is wrong. It's OK for there to be multiple uses of the Xor, since you replace all of them. It is not OK for there to be other uses of the intrinsic call, since you modify it in-place.
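
The point about uses can be sketched with a small standalone model. This is not LLVM code: ClassCall and tryFoldNot are hypothetical stand-ins, and numUses is an assumed counter playing the role of the use list of the intrinsic call. Because the fold rewrites the mask of the call in place, any other user of that call (for example a store of its result) would silently observe the inverted test, so the sketch refuses to fold unless the call has exactly one use.

```cpp
#include <cstdint>

// Hypothetical stand-in for the intrinsic call: its constant test mask and
// the number of instructions that use its result.
struct ClassCall {
  uint32_t mask;
  unsigned numUses;
};

// Folds a "not" into the call by flipping the 10 mask bits in place.
// Only safe when the "not" is the single user of the call, since every
// other user would see the changed result.
bool tryFoldNot(ClassCall &call) {
  if (call.numUses != 1)
    return false; // another user exists: leave the call untouched
  call.mask ^= 0x3ff;
  return true;
}
```

How many users the xor itself has does not matter in this model, because all of them are redirected to the rewritten call.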

requested changes
@foad two tests fail after running ninja check-llvm-codegen-amdgpu

gandhi21299 added inline comments. Jun 16 2021, 2:50 PM

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2	Can you elaborate on the kind of tests we need?

Harbormaster completed remote builds in B109604: Diff 352563. Jun 17 2021, 12:44 AM

foad added inline comments. Jun 17 2021, 1:38 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
23	Don't need this because you don't use the IRBuilder for anything
25	Don't need this unless you actually do some pattern matching
28	Don't need this, it's already included by default
826	Don't need this, you don't use it for anything
830–832	This is wrong. If the value was 4 you will xor it with 7 giving 3, but you need to flip all the bits that amdgcn_class cares about, i.e. 10 low order bits. You should either xor with a fixed value of 0x3ff, or perhaps move this enum from AMDGPUInstCombineIntrinsic.cpp to a common header (maybe SIDefines.h?) and add an "ALL" value to it:
	case Intrinsic::amdgcn_class: {
	  enum {
	    S_NAN = 1 << 0,       // Signaling NaN
	    Q_NAN = 1 << 1,       // Quiet NaN
	    N_INFINITY = 1 << 2,  // Negative infinity
	    N_NORMAL = 1 << 3,    // Negative normal
	    N_SUBNORMAL = 1 << 4, // Negative subnormal
	    N_ZERO = 1 << 5,      // Negative zero
	    P_ZERO = 1 << 6,      // Positive zero
	    P_SUBNORMAL = 1 << 7, // Positive subnormal
	    P_NORMAL = 1 << 8,    // Positive normal
	    P_INFINITY = 1 << 9   // Positive infinity
	  };
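
The arithmetic in this comment can be reproduced with a short standalone sketch. The enum values are copied from the comment; the ALL enumerator is the suggested addition and did not exist at the time, and both helper functions are hypothetical. invertUsingValueWidth is only one plausible way to arrive at the "4 xor 7 = 3" result described above (flipping just as many bits as the value occupies); the earlier diff may have computed it differently.

```cpp
#include <cstdint>

enum ClassMask : uint32_t {
  S_NAN = 1 << 0,       // Signaling NaN
  Q_NAN = 1 << 1,       // Quiet NaN
  N_INFINITY = 1 << 2,  // Negative infinity
  N_NORMAL = 1 << 3,    // Negative normal
  N_SUBNORMAL = 1 << 4, // Negative subnormal
  N_ZERO = 1 << 5,      // Negative zero
  P_ZERO = 1 << 6,      // Positive zero
  P_SUBNORMAL = 1 << 7, // Positive subnormal
  P_NORMAL = 1 << 8,    // Positive normal
  P_INFINITY = 1 << 9,  // Positive infinity
  ALL = (1 << 10) - 1   // Hypothetical "ALL" value: 0x3ff
};

// Flips only as many bits as the value itself occupies. For 4 (binary 100)
// that is three bits, so the result is 4 ^ 7 = 3 and the upper class bits
// are never set.
uint32_t invertUsingValueWidth(uint32_t mask) {
  uint32_t width = 0;
  for (uint32_t v = mask; v != 0; v >>= 1)
    ++width;
  return mask ^ ((1u << width) - 1);
}

// Flips all 10 class bits regardless of the value, as the review asks.
uint32_t invertClassMask(uint32_t mask) { return mask ^ ALL; }
```

With a mask of N_INFINITY (4), the first helper yields 3 (signaling or quiet NaN only), while the second yields 1019, i.e. every class except negative infinity.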

gandhi21299 marked 4 inline comments as done. Jun 17 2021, 9:10 AM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
23	Turns out, it is used in other places. This pass does not compile without including this header.
830–832	Flipping only the low 10 bits fails several other tests, thoughts? @arsenm @foad

foad added inline comments. Jun 17 2021, 9:13 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
830–832	You need to look at why they fail. Probably your patch changes (improves) the generated code, so the tests need to be updated to expect the improved code sequence.

get lower 10 bits of the test mask by xor-ing with 0x3ff
updated tests accordingly
refactoring
2 tests still fail from the AMDGPU codegen test suite

I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening. @arsenm

Harbormaster completed remote builds in B109782: Diff 352811. Jun 18 2021, 2:39 AM

> I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening.

Your visitXor function overrides the handling of 16-bit xors which were previously handled by visitBinaryOperator.

For the cases you can't handle, instead of return false you need something like return visitBinaryOperator(I).
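
The dispatch issue can be sketched with a small standalone model. This is not the LLVM InstVisitor: Inst, Opcode and Visitor are hypothetical types, and the only assumption carried over is that a visitor which defines a handler for xor receives every xor in place of the generic binary-operator handler. If that handler gives up with a plain "return false", 16-bit xors never reach the widening logic; delegating to the generic handler restores it.

```cpp
#include <cstdint>

enum class Opcode { Xor, Add };

// Hypothetical instruction model.
struct Inst {
  Opcode op;
  unsigned bits;         // operand width
  bool negatesClassCall; // true when this is "xor (class call), -1"
};

struct Visitor {
  // Generic handler: stands in for the existing 16-bit to 32-bit widening.
  bool visitBinaryOperator(Inst &I) {
    if (I.bits != 16)
      return false;
    I.bits = 32;
    return true;
  }

  // Specific handler: takes every xor away from the generic handler.
  bool visitXor(Inst &I) {
    if (!I.negatesClassCall)
      return visitBinaryOperator(I); // fall back instead of "return false"
    I.negatesClassCall = false;      // stands in for the actual fold
    return true;
  }

  // Dispatch: the most specific handler wins.
  bool visit(Inst &I) {
    return I.op == Opcode::Xor ? visitXor(I) : visitBinaryOperator(I);
  }
};
```

In this model a 16-bit xor that does not negate a class call is still widened, which corresponds to the behaviour the failing i16-to-i32 test expects.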

now passes all tests, thanks @foad for the fix

Looks OK to me, but please wait a day in case other reviewers still have comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
816	All ConstantInts have IntegerType so you don't need to check that.

eliminated RHS IntegerTy check since it is already declared as a ConstantInt

arsenm added inline comments. Jun 18 2021, 11:02 AM

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
47	Should check the operands
61	Should check the operands

added checks on operands for the class intrinsic

arsenm accepted this revision. Jun 18 2021, 11:49 AM

This revision is now accepted and ready to land. Jun 18 2021, 11:49 AM

Thanks for the review process, I will merge this patch in

This revision was landed with ongoing or failed builds. Jun 18 2021, 12:04 PM

Closed by commit rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask (authored by gandhi21299).

This revision was automatically updated to reflect the committed changes.

gandhi21299 added a commit: rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask.

Harbormaster completed remote builds in B109976: Diff 353065. Jun 19 2021, 8:12 AM

Diff 352811

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

//===-- AMDGPUCodeGenPrepare.cpp ------------------------------------------===//		//===-- AMDGPUCodeGenPrepare.cpp ------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// This pass does misc. AMDGPU optimizations on IR before instruction		/// This pass does misc. AMDGPU optimizations on IR before instruction
/// selection.		/// selection.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/LegacyDivergenceAnalysis.h"		#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstVisitor.h"		#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Transforms/Utils/IntegerDivision.h"		#include "llvm/Transforms/Utils/IntegerDivision.h"

#define DEBUG_TYPE "amdgpu-codegenprepare"		#define DEBUG_TYPE "amdgpu-codegenprepare"

using namespace llvm;		using namespace llvm;

namespace {		namespace {

static cl::opt<bool> WidenLoads(		static cl::opt<bool> WidenLoads(
"amdgpu-codegenprepare-widen-constant-loads",		"amdgpu-codegenprepare-widen-constant-loads",
cl::desc("Widen sub-dword constant address space loads in AMDGPUCodeGenPrepare"),		cl::desc("Widen sub-dword constant address space loads in AMDGPUCodeGenPrepare"),
cl::ReallyHidden,		cl::ReallyHidden,
cl::init(false));		cl::init(false));

static cl::opt<bool> Widen16BitOps(		static cl::opt<bool> Widen16BitOps(
"amdgpu-codegenprepare-widen-16-bit-ops",		"amdgpu-codegenprepare-widen-16-bit-ops",
cl::desc("Widen uniform 16-bit instructions to 32-bit in AMDGPUCodeGenPrepare"),		cl::desc("Widen uniform 16-bit instructions to 32-bit in AMDGPUCodeGenPrepare"),
cl::ReallyHidden,		cl::ReallyHidden,
cl::init(true));		cl::init(true));
[148 lines not shown]	class AMDGPUCodeGenPrepare : public FunctionPass,
bool canWidenScalarExtLoad(LoadInst &I) const;		bool canWidenScalarExtLoad(LoadInst &I) const;

public:		public:
static char ID;		static char ID;

AMDGPUCodeGenPrepare() : FunctionPass(ID) {}		AMDGPUCodeGenPrepare() : FunctionPass(ID) {}

bool visitFDiv(BinaryOperator &I);		bool visitFDiv(BinaryOperator &I);
		bool visitXor(BinaryOperator &I);

bool visitInstruction(Instruction &I) { return false; }		bool visitInstruction(Instruction &I) { return false; }
bool visitBinaryOperator(BinaryOperator &I);		bool visitBinaryOperator(BinaryOperator &I);
bool visitLoadInst(LoadInst &I);		bool visitLoadInst(LoadInst &I);
bool visitICmpInst(ICmpInst &I);		bool visitICmpInst(ICmpInst &I);
bool visitSelectInst(SelectInst &I);		bool visitSelectInst(SelectInst &I);

bool visitIntrinsicInst(IntrinsicInst &I);		bool visitIntrinsicInst(IntrinsicInst &I);
[591 lines not shown]	if (NewFDiv) {
FDiv.replaceAllUsesWith(NewFDiv);		FDiv.replaceAllUsesWith(NewFDiv);
NewFDiv->takeName(&FDiv);		NewFDiv->takeName(&FDiv);
FDiv.eraseFromParent();		FDiv.eraseFromParent();
}		}

return !!NewFDiv;		return !!NewFDiv;
}		}

		bool AMDGPUCodeGenPrepare::visitXor(BinaryOperator &I) {
		// Match the Xor instruction, its type and its operands
		IntrinsicInst *IntrinsicCall = dyn_cast<IntrinsicInst>(I.getOperand(0));
		ConstantInt *RHS = dyn_cast<ConstantInt>(I.getOperand(1));
		if (!RHS || !IntrinsicCall || !RHS->getType()->isIntegerTy() ||
		RHS->getSExtValue() != -1)
		return false;

		// Check if the Call is an intrinsic intruction to amdgcn_class intrinsic
		// has only one use
		if (IntrinsicCall->getIntrinsicID() != Intrinsic::amdgcn_class ||
		!IntrinsicCall->hasOneUse())
		return false;

		// "Not" the second argument of the intrinsic call
		ConstantInt *Arg = dyn_cast<ConstantInt>(IntrinsicCall->getOperand(1));
		if (!Arg)
		return false;

		IntrinsicCall->setOperand(
		1, ConstantInt::get(Arg->getType(), Arg->getZExtValue() ^ 0x3ff));
		I.replaceAllUsesWith(IntrinsicCall);
		I.eraseFromParent();
		return true;
		}

static bool hasUnsafeFPMath(const Function &F) {		static bool hasUnsafeFPMath(const Function &F) {
Attribute Attr = F.getFnAttribute("unsafe-fp-math");		Attribute Attr = F.getFnAttribute("unsafe-fp-math");
return Attr.getValueAsBool();		return Attr.getValueAsBool();
}		}

static std::pair<Value*, Value*> getMul64(IRBuilder<> &Builder,		static std::pair<Value*, Value*> getMul64(IRBuilder<> &Builder,
Value *LHS, Value *RHS) {		Value *LHS, Value *RHS) {
Type *I32Ty = Builder.getInt32Ty();		Type *I32Ty = Builder.getInt32Ty();
Type *I64Ty = Builder.getInt64Ty();		Type *I64Ty = Builder.getInt64Ty();

Value *LHS_EXT64 = Builder.CreateZExt(LHS, I64Ty);		Value *LHS_EXT64 = Builder.CreateZExt(LHS, I64Ty);
Value *RHS_EXT64 = Builder.CreateZExt(RHS, I64Ty);		Value *RHS_EXT64 = Builder.CreateZExt(RHS, I64Ty);
Value *MUL64 = Builder.CreateMul(LHS_EXT64, RHS_EXT64);		Value *MUL64 = Builder.CreateMul(LHS_EXT64, RHS_EXT64);
Value *Lo = Builder.CreateTrunc(MUL64, I32Ty);		Value *Lo = Builder.CreateTrunc(MUL64, I32Ty);
Value *Hi = Builder.CreateLShr(MUL64, Builder.getInt64(32));		Value *Hi = Builder.CreateLShr(MUL64, Builder.getInt64(32));
Hi = Builder.CreateTrunc(Hi, I32Ty);		Hi = Builder.CreateTrunc(Hi, I32Ty);
[531 lines not shown]	bool AMDGPUCodeGenPrepare::doInitialization(Module &M) {
DL = &Mod->getDataLayout();		DL = &Mod->getDataLayout();
return false;		return false;
}		}

bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {		bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();		auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)		if (!TPC)
return false;		return false;

const AMDGPUTargetMachine &TM = TPC->getTM<AMDGPUTargetMachine>();		const AMDGPUTargetMachine &TM = TPC->getTM<AMDGPUTargetMachine>();
ST = &TM.getSubtarget<GCNSubtarget>(F);		ST = &TM.getSubtarget<GCNSubtarget>(F);
AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
DA = &getAnalysis<LegacyDivergenceAnalysis>();		DA = &getAnalysis<LegacyDivergenceAnalysis>();

[34 lines not shown]

INITIALIZE_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,
"AMDGPU IR optimizations", false, false)		"AMDGPU IR optimizations", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(LegacyDivergenceAnalysis)		INITIALIZE_PASS_DEPENDENCY(LegacyDivergenceAnalysis)
INITIALIZE_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE, "AMDGPU IR optimizations",		INITIALIZE_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE, "AMDGPU IR optimizations",
false, false)		false, false)

char AMDGPUCodeGenPrepare::ID = 0;		char AMDGPUCodeGenPrepare::ID = 0;

FunctionPass *llvm::createAMDGPUCodeGenPreparePass() {		FunctionPass *llvm::createAMDGPUCodeGenPreparePass() {
return new AMDGPUCodeGenPrepare();		return new AMDGPUCodeGenPrepare();
}		}

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-amd-amdhsa -amdgpu-codegenprepare -verify -S %s -o - | FileCheck %s

				declare i1 @llvm.amdgcn.class.f32(float, i32) nounwind readnone
				declare i1 @llvm.amdgcn.class.f64(double, i32) nounwind readnone

				; Trivial case, xor instruction should be removed and
				; the second argument of the intrinsic call should be
				; bitwise-negated
				; CHECK: @fold_negate_intrinsic_test_mask
				; CHECK: %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 1018)
				define i1 @fold_negate_intrinsic_test_mask(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
				%2 = xor i1 %1, -1
				ret i1 %2
				}

				; Trivial case, xor instruction should be removed and
				; the second argument of the intrinsic call should be
				; bitwise-negated
				; CHECK: @fold_negate_intrinsic_test_mask_dbl
				; CHECK: %1 = call i1 @llvm.amdgcn.class.f64(double %x, i32 1018)
				define i1 @fold_negate_intrinsic_test_mask_dbl(double %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f64(double %x, i32 5)
				%2 = xor i1 %1, -1
				ret i1 %2
				}

				; Negative test: should not transform for variable test masks
				; CHECK: @fold_negate_intrinsic_test_mask_neg_var
				; CHECK: %[[X0:.*]] = alloca i32
				; CHECK: %[[X1:.*]] = load i32, i32* %[[X0]]
				; CHECK: call i1 @llvm.amdgcn.class.f32(float %x, i32 %[[X1]])
				; CHECK: xor
				define i1 @fold_negate_intrinsic_test_mask_neg_var(float %x) nounwind {
				%1 = alloca i32
				store i32 7, i32* %1
				%2 = load i32, i32* %1
				%3 = call i1 @llvm.amdgcn.class.f32(float %x, i32 %2)
				%4 = xor i1 %3, -1
				ret i1 %4
				}

				; Negative test: should not transform for multiple uses of the
				; intrinsic returned value
				; CHECK: @fold_negate_intrinsic_test_mask_neg_multiple_uses
				; CHECK: %[[X1:.*]] = call i1 @llvm.amdgcn.class.f32
				; CHECK: store i1 %[[X1]]
				; CHECK: %[[X2:.*]] = xor i1 %[[X1]]
				define i1 @fold_negate_intrinsic_test_mask_neg_multiple_uses(float %x) nounwind {
				%y = alloca i1
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 7)
				%2 = xor i1 %1, -1
				store i1 %1, i1* %y
				%3 = xor i1 %1, -1
				ret i1 %2
				}

				; Negative test: should not transform for a xor with no operand equal to -1
				; CHECK: @fold_negate_intrinsic_test_mask_neg_one
				; CHECK: %[[X0:.*]] = call i1 @llvm.amdgcn.class.f32
				; CHECK: xor i1 %[[X0]], false
				define i1 @fold_negate_intrinsic_test_mask_neg_one(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 7)
				%2 = xor i1 %1, false
				ret i1 %2
				}

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
ClosedPublic
