This is an archive of the discontinued LLVM Phabricator instance.

Differential D104049

[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
Closed, Public

Authored by gandhi21299 on Jun 10 2021, 11:01 AM.

Details

Reviewers

arsenm
thegameg
cfang
foad

Commits

rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask

Summary

Implemented the following fold-negate transformation at the very beginning of AMDGPUCodeGenPrepare.cpp:

xor (llvm.amdgcn.class x, mask), -1 --> llvm.amdgcn.class(x, ~mask)

Added regression tests
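
The fold works because the class test succeeds when the operand's floating-point class is selected by the mask, so negating the result is the same as selecting the complementary set of classes. A minimal sketch in plain C++, assuming a simplified model in which every value belongs to exactly one of ten classes; `classTest`, `kAllClasses` and `foldHoldsForAllMasks` are hypothetical names for illustration, not LLVM API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of llvm.amdgcn.class: `classBit` is the single bit
// (within the low 10 bits) identifying the class of the operand, and the
// test succeeds when the mask selects that class.
constexpr uint32_t kAllClasses = 0x3ff; // the 10 class bits

bool classTest(uint32_t classBit, uint32_t mask) {
  return (classBit & mask) != 0;
}

// Check that negating the result equals testing against the flipped mask,
// for every class and every 10-bit mask.
bool foldHoldsForAllMasks() {
  for (uint32_t bit = 0; bit < 10; ++bit)
    for (uint32_t mask = 0; mask <= kAllClasses; ++mask)
      if (!classTest(1u << bit, mask) !=
          classTest(1u << bit, mask ^ kAllClasses))
        return false;
  return true;
}
```

In this model only the low 10 bits matter, so complementing the whole mask and xor-ing it with 0x3ff select the same classes.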

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	210 ms	x64 debian > LLVM.CodeGen/AMDGPU::amdgpu-codegenprepare-i16-to-i32.ll
	300 ms	x64 debian > LLVM.CodeGen/AMDGPU::constant-fold-mi-operands.ll
	310 ms	x64 debian > LLVM.CodeGen/AMDGPU::xnor.ll
	510 ms	x64 debian > LLVM.CodeGen/AMDGPU/GlobalISel::xnor.ll
	1,140 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-finalstats.test

Event Timeline

gandhi21299 created this revision. Jun 10 2021, 11:01 AM

Herald added subscribers: foad, kerbowa, hiraditya and 7 others. Jun 10 2021, 11:01 AM

gandhi21299 requested review of this revision. Jun 10 2021, 11:01 AM

Herald added a project: Restricted Project. Jun 10 2021, 11:01 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B108654: Diff 351217. Jun 10 2021, 11:58 AM

arsenm added inline comments. Jun 10 2021, 2:27 PM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1414	This should not be a separate pass over the function. This should be a visitXor function
1465	You don't need to create a constant to check the value
1475	Should check hasOneUse
1479–1490	I don't know why you are looking at all of these extensions. The xor should directly consume the call
1494–1496	You can just dyn_cast<IntrinsicInst>
1501	Since you know it's a constant, you can also 0 the irrelevant high bits
1506	You don't need to collect dead instructions, the xor should always be dead
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
19	Need a negative test with a variable mask. Also a negative test for multiple uses. Plus also could use tests for all of the FP types
27	You shouldn't be trying to look through this zext, this IR should have been optimized to xor on i1
37–39	This didn't do anything

arsenm requested changes to this revision. Jun 10 2021, 2:27 PM

This revision now requires changes to proceed. Jun 10 2021, 2:27 PM

gandhi21299 marked 6 inline comments as done. Jun 11 2021, 9:09 AM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1479–1490	I am considering the case where the result from the class intrinsic is extended/truncated for some reason. Is it possible to get the IntrinsicInst directly from the xor operand?

gandhi21299 marked 3 inline comments as done. Jun 11 2021, 12:46 PM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1501	Do I use IRBuilder to do this or is there a simpler way?

gandhi21299 marked 2 inline comments as done. Jun 11 2021, 1:21 PM

changes requested by @arsenm

Harbormaster completed remote builds in B108891: Diff 351554. Jun 11 2021, 2:10 PM

check if one of the operands of the xor instruction is a ConstantInt, fixes two of the tests

Harbormaster completed remote builds in B109045: Diff 351770. Jun 13 2021, 10:26 PM

arsenm added inline comments. Jun 15 2021, 10:32 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
23	Definitely should not include this here
41	Shouldn't need this
831–839	I believe PatternMatch has a nicer way to check for a not
847	I'd prefer to just use the dyn_cast below and return on that rather than checking isa first
854	You don't need this intermediate APInt (plus this is required to be an i32)
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
55	It would be a better test to have a non-identical use (i.e. just add a store of the value)

arsenm added inline comments. Jun 15 2021, 10:46 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
831–839	Plus constants are canonicalized to the RHS, so you don't need to check both
840–841	Invert condition and exit early to reduce indentation

arsenm added inline comments. Jun 15 2021, 11:08 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
857	Arg->getType()

changes requested by @arsenm

changes as requested by Matt

gandhi21299 marked an inline comment as done. Jun 15 2021, 1:09 PM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
831–839	After removing the else-if condition, this seems clean enough to me. Perhaps, PatternMatch won't be required any more.

Harbormaster completed remote builds in B109351: Diff 352206. Jun 15 2021, 1:39 PM

refreshing diff, only 2 tests should be failing instead of 4

arsenm added inline comments. Jun 15 2021, 5:41 PM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
829	No else after return
833–834	Using m_Not would still be clearer
833–835	Using dyn_cast and isa is redundant, just use dyn_cast and check the result
853	There are fewer bits in the test mask than this
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2	You didn't actually generate these checks. The operand tests are mostly missing

Harbormaster completed remote builds in B109389: Diff 352262. Jun 15 2021, 7:23 PM

Please run all of check-llvm-codegen-amdgpu. I tried your patch and it looks like a couple more tests need updating.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
17–39	I don't think you need any of these changes to the #includes.
827	This check is wrong. It's OK for there to be multiple uses of the Xor, since you replace all of them. It is not OK for there to be other uses of the intrinsic call, since you modify it in-place.
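
The use-count condition in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions: `ClassCall` and `foldNotIntoCall` are hypothetical stand-ins for the intrinsic call and the fold, and the use count is tracked by hand rather than through LLVM's use lists.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the intrinsic call: a test mask plus the
// number of instructions that read the call's result.
struct ClassCall {
  uint32_t mask;
  unsigned numUses;
};

// Fold `not(call)` into the call by flipping the mask in place. This is
// only safe when the xor is the call's sole user; any other user would
// otherwise silently observe the negated result.
bool foldNotIntoCall(ClassCall &call) {
  if (call.numUses != 1)
    return false; // another user still needs the original mask
  call.mask = ~call.mask;
  return true;
}
```

The number of uses of the xor itself does not appear in the sketch, since every use of the xor is redirected to the modified call.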

requested changes
@foad two tests fail after running ninja check-llvm-codegen-amdgpu

gandhi21299 added inline comments. Jun 16 2021, 2:50 PM

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2	Can you elaborate on the kind of tests we need?

Harbormaster completed remote builds in B109604: Diff 352563. Jun 17 2021, 12:44 AM

foad added inline comments. Jun 17 2021, 1:38 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
28	Don't need this because you don't use the IRBuilder for anything
33–34	Don't need this unless you actually do some pattern matching
36	Don't need this, it's already included by default
838	Don't need this, you don't use it for anything
842–844	This is wrong. If the value was 4 you will xor it with 7 giving 3, but you need to flip all the bits that amdgcn_class cares about, i.e. 10 low order bits. You should either xor with a fixed value of 0x3ff, or perhaps move this enum from AMDGPUInstCombineIntrinsic.cpp to a common header (maybe SIDefines.h?) and add an "ALL" value to it:

	case Intrinsic::amdgcn_class: {
	  enum {
	    S_NAN = 1 << 0,       // Signaling NaN
	    Q_NAN = 1 << 1,       // Quiet NaN
	    N_INFINITY = 1 << 2,  // Negative infinity
	    N_NORMAL = 1 << 3,    // Negative normal
	    N_SUBNORMAL = 1 << 4, // Negative subnormal
	    N_ZERO = 1 << 5,      // Negative zero
	    P_ZERO = 1 << 6,      // Positive zero
	    P_SUBNORMAL = 1 << 7, // Positive subnormal
	    P_NORMAL = 1 << 8,    // Positive normal
	    P_INFINITY = 1 << 9   // Positive infinity
	  };
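
The arithmetic in this comment can be checked with a small standalone sketch. It assumes the enum is given an `ALL` value as suggested; the names `ClassMask` and `invertClassMask` are illustrative and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Class bits as listed in the review comment, plus an ALL value covering
// the 10 low-order bits. The name ALL is an illustrative assumption.
enum ClassMask : uint32_t {
  S_NAN = 1 << 0,       // Signaling NaN
  Q_NAN = 1 << 1,       // Quiet NaN
  N_INFINITY = 1 << 2,  // Negative infinity
  N_NORMAL = 1 << 3,    // Negative normal
  N_SUBNORMAL = 1 << 4, // Negative subnormal
  N_ZERO = 1 << 5,      // Negative zero
  P_ZERO = 1 << 6,      // Positive zero
  P_SUBNORMAL = 1 << 7, // Positive subnormal
  P_NORMAL = 1 << 8,    // Positive normal
  P_INFINITY = 1 << 9,  // Positive infinity
  ALL = (1 << 10) - 1   // 0x3ff
};

// Flip exactly the bits that the class test looks at.
constexpr uint32_t invertClassMask(uint32_t mask) { return mask ^ ALL; }
```

For a mask of 4 (N_INFINITY), flipping only the three bits the value occupies gives 3, which selects just the two NaN classes. Flipping all ten bits gives 0x3fb, which selects every class except negative infinity.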

gandhi21299 marked 4 inline comments as done. Jun 17 2021, 9:10 AM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
28	Turns out, it is used in other places. This pass does not compile without including this header.
842–844	Flipping only the low 10 bits fails several other tests, thoughts? @arsenm @foad

foad added inline comments. Jun 17 2021, 9:13 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
842–844	You need to look at why they fail. Probably your patch changes (improves) the generated code, so the tests need to be updated to expect the improved code sequence.

get lower 10 bits of the test mask by xor-ing with 0x3ff
updated tests accordingly
refactoring
2 tests still fail from the AMDGPU codegen test suite

I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening. @arsenm

Harbormaster completed remote builds in B109782: Diff 352811. Jun 18 2021, 2:39 AM

I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening.

Your visitXor function overrides the handling of 16-bit xors which were previously handled by visitBinaryOperator.

For the cases you can't handle, instead of return false you need something like return visitBinaryOperator(I).
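
The dispatch problem described here can be sketched with a toy visitor. This is an illustration only: `Inst`, `ToyVisitor` and the `isFoldableNot` flag are hypothetical, and the generic handler simply reports whether it would act on a 16-bit operation.

```cpp
#include <cassert>
#include <string>

// Toy instruction: an opcode name, a bit width, and a hypothetical flag
// saying whether it is a "not" of a class call that can be folded.
struct Inst {
  std::string opcode;
  unsigned bits;
  bool isFoldableNot;
};

struct ToyVisitor {
  // Generic handling, standing in for the 16-bit widening logic.
  bool visitBinaryOperator(Inst &I) { return I.bits == 16; }

  // Opcode-specific handler. It takes over every xor, so cases it cannot
  // fold must be forwarded to the generic handler explicitly.
  bool visitXor(Inst &I) {
    if (!I.isFoldableNot)
      return visitBinaryOperator(I); // rather than `return false`
    return true; // fold performed
  }

  // Dispatch to the most specific handler, as InstVisitor does.
  bool visit(Inst &I) {
    return I.opcode == "xor" ? visitXor(I) : visitBinaryOperator(I);
  }
};
```

With the fallback in place, a 16-bit xor that is not a foldable negate still reaches the generic handler, which matches the fix suggested for the failing amdgpu-codegenprepare-i16-to-i32.ll test.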

now passes all tests, thanks @foad for the fix

Looks OK to me, but please wait a day in case other reviewers still have comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
828	All ConstantInts have IntegerType so you don't need to check that.

eliminated RHS IntegerTy check since it is already declared as a ConstantInt

arsenm added inline comments. Jun 18 2021, 11:02 AM

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
47	Should check the operands
61	Should check the operands

added checks on operands for the class intrinsic

arsenm accepted this revision. Jun 18 2021, 11:49 AM

This revision is now accepted and ready to land. Jun 18 2021, 11:49 AM

Thanks for the review process, I will merge this patch in

This revision was landed with ongoing or failed builds. Jun 18 2021, 12:04 PM

Closed by commit rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask (authored by gandhi21299).

This revision was automatically updated to reflect the committed changes.

gandhi21299 added a commit: rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask.

Harbormaster completed remote builds in B109976: Diff 353065. Jun 19 2021, 8:12 AM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp	50 lines
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll	67 lines

Diff 351554

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

//===-- AMDGPUCodeGenPrepare.cpp ------------------------------------------===//		//===-- AMDGPUCodeGenPrepare.cpp ------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// This pass does misc. AMDGPU optimizations on IR before instruction		/// This pass does misc. AMDGPU optimizations on IR before instruction
/// selection.		/// selection.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
		#include "llvm-c/Core.h"
		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/LegacyDivergenceAnalysis.h"		#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/CodeGen/GlobalISel/InstructionSelector.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
		#include "llvm/IR/BasicBlock.h"
		#include "llvm/IR/Constants.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
		#include "llvm/IR/GlobalValue.h"
		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstVisitor.h"		#include "llvm/IR/InstVisitor.h"
		#include "llvm/IR/Instruction.h"
		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
		#include "llvm/Support/Casting.h"
		#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/IntegerDivision.h"		#include "llvm/Transforms/Utils/IntegerDivision.h"
		#include <bits/stdint-uintn.h>

#define DEBUG_TYPE "amdgpu-codegenprepare"		#define DEBUG_TYPE "amdgpu-codegenprepare"

using namespace llvm;		using namespace llvm;

namespace {		namespace {

static cl::opt<bool> WidenLoads(		static cl::opt<bool> WidenLoads(
[158 lines not shown]	class AMDGPUCodeGenPrepare : public FunctionPass,
bool canWidenScalarExtLoad(LoadInst &I) const;		bool canWidenScalarExtLoad(LoadInst &I) const;

public:		public:
static char ID;		static char ID;

AMDGPUCodeGenPrepare() : FunctionPass(ID) {}		AMDGPUCodeGenPrepare() : FunctionPass(ID) {}

bool visitFDiv(BinaryOperator &I);		bool visitFDiv(BinaryOperator &I);
		bool visitXor(BinaryOperator &I);

bool visitInstruction(Instruction &I) { return false; }		bool visitInstruction(Instruction &I) { return false; }
bool visitBinaryOperator(BinaryOperator &I);		bool visitBinaryOperator(BinaryOperator &I);
bool visitLoadInst(LoadInst &I);		bool visitLoadInst(LoadInst &I);
bool visitICmpInst(ICmpInst &I);		bool visitICmpInst(ICmpInst &I);
bool visitSelectInst(SelectInst &I);		bool visitSelectInst(SelectInst &I);

bool visitIntrinsicInst(IntrinsicInst &I);		bool visitIntrinsicInst(IntrinsicInst &I);
[591 lines not shown]	if (NewFDiv) {
FDiv.replaceAllUsesWith(NewFDiv);		FDiv.replaceAllUsesWith(NewFDiv);
NewFDiv->takeName(&FDiv);		NewFDiv->takeName(&FDiv);
FDiv.eraseFromParent();		FDiv.eraseFromParent();
}		}

return !!NewFDiv;		return !!NewFDiv;
}		}

		bool AMDGPUCodeGenPrepare::visitXor(BinaryOperator &I) {
		  // Match the Xor instruction and its operands
		  CallInst *Call = nullptr;
		  if (I.hasOneUse()) {
		    if (isa<CallInst>(I.getOperand(0)) &&
		        dyn_cast<ConstantInt>(I.getOperand(1))->getSExtValue() == -1)
		      Call = cast<CallInst>(I.getOperand(0));
		    else if (isa<CallInst>(I.getOperand(1)) &&
		             dyn_cast<ConstantInt>(I.getOperand(0))->getSExtValue() == -1)
		      Call = cast<CallInst>(I.getOperand(1));
		    else
		      return false;
		  } else
		    return false;

		  // Check if the Call is an intrinsic intruction to amdgcn_class intrinsic
		  IntrinsicInst *IntrinsicCall = dyn_cast<IntrinsicInst>(Call);
		  if (!IntrinsicCall ||
		      IntrinsicCall->getIntrinsicID() != Intrinsic::amdgcn_class ||
		      !isa<Constant>(IntrinsicCall->getOperand(1)))
		    return false;

		  // "Not" the second argument of the intrinsic call
		  IRBuilder<> Builder(IntrinsicCall);
		  ConstantInt *Arg = dyn_cast<ConstantInt>(IntrinsicCall->getOperand(1));
		  int64_t BitWidth = Arg->getBitWidth();
		  APInt NewArg(BitWidth, ~Arg->getZExtValue());

		  IntrinsicCall->setOperand(
		      1, ConstantInt::get(IntrinsicCall->getOperand(1)->getType(), NewArg));
		  I.replaceAllUsesWith(IntrinsicCall);
		  I.eraseFromParent();
		  return true;
		}

static bool hasUnsafeFPMath(const Function &F) {		static bool hasUnsafeFPMath(const Function &F) {
Attribute Attr = F.getFnAttribute("unsafe-fp-math");		Attribute Attr = F.getFnAttribute("unsafe-fp-math");
return Attr.getValueAsBool();		return Attr.getValueAsBool();
}		}

static std::pair<Value*, Value*> getMul64(IRBuilder<> &Builder,		static std::pair<Value*, Value*> getMul64(IRBuilder<> &Builder,
Value *LHS, Value *RHS) {		Value *LHS, Value *RHS) {
Type *I32Ty = Builder.getInt32Ty();		Type *I32Ty = Builder.getInt32Ty();
[539 lines not shown]	bool AMDGPUCodeGenPrepare::doInitialization(Module &M) {
DL = &Mod->getDataLayout();		DL = &Mod->getDataLayout();
return false;		return false;
}		}

bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {		bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();		auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)		if (!TPC)
return false;		return false;

const AMDGPUTargetMachine &TM = TPC->getTM<AMDGPUTargetMachine>();		const AMDGPUTargetMachine &TM = TPC->getTM<AMDGPUTargetMachine>();
ST = &TM.getSubtarget<GCNSubtarget>(F);		ST = &TM.getSubtarget<GCNSubtarget>(F);
AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
DA = &getAnalysis<LegacyDivergenceAnalysis>();		DA = &getAnalysis<LegacyDivergenceAnalysis>();

[34 lines not shown]

INITIALIZE_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,
"AMDGPU IR optimizations", false, false)		"AMDGPU IR optimizations", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(LegacyDivergenceAnalysis)		INITIALIZE_PASS_DEPENDENCY(LegacyDivergenceAnalysis)
INITIALIZE_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE, "AMDGPU IR optimizations",		INITIALIZE_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE, "AMDGPU IR optimizations",
false, false)		false, false)

char AMDGPUCodeGenPrepare::ID = 0;		char AMDGPUCodeGenPrepare::ID = 0;

FunctionPass *llvm::createAMDGPUCodeGenPreparePass() {		FunctionPass *llvm::createAMDGPUCodeGenPreparePass() {
return new AMDGPUCodeGenPrepare();		return new AMDGPUCodeGenPrepare();
}		}

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -mtriple=amdgcn-amd-amdhsa -amdgpu-codegenprepare -verify -S %s -o - | FileCheck %s

				declare i1 @llvm.amdgcn.class.f32(float, i32) nounwind readnone
				declare i1 @llvm.amdgcn.class.f64(double, i32) nounwind readnone

				; Trivial case, xor instruction should be removed and
				; the second argument of the intrinsic call should be
				; bitwise-negated
				; CHECK: @fold_negate_intrinsic_test_mask
				; CHECK: %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 -6)
				define i1 @fold_negate_intrinsic_test_mask(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
				%2 = xor i1 %1, -1
				ret i1 %2
				}

				; Trivial case, xor instruction should be removed and
				; the second argument of the intrinsic call should be
				; bitwise-negated
				; CHECK: @fold_negate_intrinsic_test_mask_dbl
				; CHECK: %1 = call i1 @llvm.amdgcn.class.f64(double %x, i32 -6)
				define i1 @fold_negate_intrinsic_test_mask_dbl(double %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f64(double %x, i32 5)
				%2 = xor i1 %1, -1
				ret i1 %2
				}

				; Negative test: should not transform for variable test masks
				; CHECK: @fold_negate_intrinsic_test_mask_neg_var
				; CHECK: %[[X0:.*]] = alloca i32
				; CHECK: %[[X1:.*]] = load i32, i32* %[[X0]]
				; CHECK: call i1 @llvm.amdgcn.class.f32(float %x, i32 %[[X1]])
				; CHECK: xor
				define i1 @fold_negate_intrinsic_test_mask_neg_var(float %x) nounwind {
				%1 = alloca i32
				store i32 7, i32* %1
				%2 = load i32, i32* %1
				%3 = call i1 @llvm.amdgcn.class.f32(float %x, i32 %2)
				%4 = xor i1 %3, -1
				ret i1 %4
				}

				; Negative test: should not transform for multiple uses of the
				; intrinsic returned value
				; CHECK: @fold_negate_intrinsic_test_mask_neg_multiple_uses
				; CHECK: %[[X0:.*]] = call i1 @llvm.amdgcn.class.f32
				; CHECK: %[[X1:.*]] = xor i1 %[[X0]]
				; CHECK: xor i1 %[[X1]]
				; CHECK: xor i1 %[[X1]]
				define i1 @fold_negate_intrinsic_test_mask_neg_multiple_uses(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 7)
				%2 = xor i1 %1, -1
				%3 = xor i1 %2, -1
				%4 = xor i1 %2, -1
				ret i1 %3
				}

				; Negative test: should not transform for a xor with no operand equal to -1
				; CHECK: @fold_negate_intrinsic_test_mask_neg_one
				; CHECK: %[[X0:.*]] = call i1 @llvm.amdgcn.class.f32
				; CHECK: xor i1 %[[X0]], false
				define i1 @fold_negate_intrinsic_test_mask_neg_one(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 7)
				%2 = xor i1 %1, false
				ret i1 %2
				}
