
Details

Reviewers

arsenm
thegameg
cfang
foad

Commits

rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask

Summary

Implemented the following fold-negate transformation at the very beginning of AMDGPUCodeGenPrepare.cpp:

xor (llvm.amdgcn.class x, mask), -1 --> llvm.amdgcn.class(x, ~mask)

Added regression tests
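
To see why the fold is sound, here is a small self-contained C++ model of a class test. It is not LLVM code: `classifyBit`, `classTest`, `foldedNot` and `checkFold` are hypothetical helper names, and the bit layout is taken from the enum quoted later in this review (bit 0 = signaling NaN up to bit 9 = positive infinity). The sketch assumes IEEE-754 binary32 floats with the usual quiet-NaN encoding. Every input falls into exactly one of the ten classes, so negating the test result is the same as testing against the complementary mask restricted to the ten meaningful bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical model: returns the single class bit that describes X.
// Bit layout: 0 sNaN, 1 qNaN, 2 -inf, 3 -normal, 4 -subnormal,
// 5 -0, 6 +0, 7 +subnormal, 8 +normal, 9 +inf.
inline uint32_t classifyBit(float X) {
  if (std::isnan(X)) {
    uint32_t Bits;
    std::memcpy(&Bits, &X, sizeof(Bits));
    // Assumption: the top mantissa bit set marks a quiet NaN.
    return (Bits & 0x00400000u) ? (1u << 1) : (1u << 0);
  }
  const bool Neg = std::signbit(X);
  switch (std::fpclassify(X)) {
  case FP_INFINITE:
    return Neg ? (1u << 2) : (1u << 9);
  case FP_NORMAL:
    return Neg ? (1u << 3) : (1u << 8);
  case FP_SUBNORMAL:
    return Neg ? (1u << 4) : (1u << 7);
  default: // FP_ZERO
    return Neg ? (1u << 5) : (1u << 6);
  }
}

// Model of the class test: true if X is in any class selected by Mask.
inline bool classTest(float X, uint32_t Mask) {
  return (classifyBit(X) & Mask) != 0;
}

// The folded form: the negation is absorbed into the mask.
inline bool foldedNot(float X, uint32_t Mask) {
  return classTest(X, Mask ^ 0x3ffu);
}

// Checks the identity for every 10-bit mask on one input.
inline void checkFold(float X) {
  for (uint32_t Mask = 0; Mask <= 0x3ffu; ++Mask)
    assert(!classTest(X, Mask) == foldedNot(X, Mask));
}
```

Calling `checkFold` on a handful of values (zeros, normals, subnormals, infinities, a NaN) exercises each class; the identity holds because `classifyBit` always returns exactly one of the ten bits.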

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
6,020 ms	x64 debian > libarcher.races::lock-unrelated.c

Event Timeline

gandhi21299 created this revision. Jun 10 2021, 11:01 AM

Herald added subscribers: foad, kerbowa, hiraditya and 7 others. Jun 10 2021, 11:01 AM

gandhi21299 requested review of this revision. Jun 10 2021, 11:01 AM

Herald added a project: Restricted Project. Jun 10 2021, 11:01 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B108654: Diff 351217. Jun 10 2021, 11:58 AM

arsenm added inline comments. Jun 10 2021, 2:27 PM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1387	This should not be a separate pass over the function. This should be a visitXor function
1438	You don't need to create a constant to check the value
1448	Should check hasOneUse
1452–1463	I don't know why you are looking at all of these extensions. The xor should directly consume the call
1467–1469	You can just dyn_cast<IntrinsicInst>
1474	Since you know it's a constant, you can also 0 the irrelevant high bits
1479	You don't need to collect dead instructions, the xor should always be dead
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
18	Need a negative test with a variable mask. Also a negative test for multiple uses. Plus also could use tests for all of the FP types
26	You shouldn't be trying to look through this zext, this IR should have been optimized to xor on i1
36–38	This didn't do anything

arsenm requested changes to this revision. Jun 10 2021, 2:27 PM

This revision now requires changes to proceed. Jun 10 2021, 2:27 PM

gandhi21299 marked 6 inline comments as done. Jun 11 2021, 9:09 AM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1452–1463	I am considering the case where the result from the class intrinsic is extended/truncated for some reason. Is it possible to get the IntrinsicInst directly from the xor operand?

gandhi21299 marked 3 inline comments as done. Jun 11 2021, 12:46 PM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1474	Do I use IRBuilder to do this or is there a simpler way?

gandhi21299 marked 2 inline comments as done. Jun 11 2021, 1:21 PM

changes requested by @arsenm

Harbormaster completed remote builds in B108891: Diff 351554. Jun 11 2021, 2:10 PM

check if one of the operands of the xor instruction is a ConstantInt, fixes two of the tests

Harbormaster completed remote builds in B109045: Diff 351770. Jun 13 2021, 10:26 PM

arsenm added inline comments. Jun 15 2021, 10:32 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
23	Definitely should not include this here
41	Shouldn't need this
839–847	I believe PatternMatch has a nicer way to check for a not
855	I'd prefer to just use the dyn_cast below and return on that rather than checking isa first
862	You don't need this intermediate APInt (plus this is required to be an i32)
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
55	It would be a better test to have a non-identical use (i.e. just add a store of the value)

arsenm added inline comments. Jun 15 2021, 10:46 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
839–847	Plus constants are canonicalized to the RHS, so you don't need to check both
848–849	Invert condition and exit early to reduce indentation

arsenm added inline comments. Jun 15 2021, 11:08 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
865	Arg->getType()

changes requested by @arsenm

changes as requested by Matt

gandhi21299 marked an inline comment as done. Jun 15 2021, 1:09 PM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
839–847	After removing the else-if condition, this seems clean enough to me. Perhaps, PatternMatch won't be required any more.

Harbormaster completed remote builds in B109351: Diff 352206. Jun 15 2021, 1:39 PM

refreshing diff, only 2 tests should be failing instead of 4

arsenm added inline comments. Jun 15 2021, 5:41 PM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
837	No else after return
841–842	Using m_Not would still be clearer
841–843	Using dyn_cast and isa is redundant, just use dyn_cast and check the result
861	There are fewer bits in the test mask than this
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2	You didn't actually generate these checks. The operand tests are mostly missing

Harbormaster completed remote builds in B109389: Diff 352262. Jun 15 2021, 7:23 PM

Please run all of check-llvm-codegen-amdgpu. I tried your patch and it looks like a couple more tests need updating.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
17–39	I don't think you need any of these changes to the #includes.
835	This check is wrong. It's OK for there to be multiple uses of the Xor, since you replace all of them. It is not OK for there to be other uses of the intrinsic call, since you modify it in-place.

requested changes
@foad two tests fail after running ninja check-llvm-codegen-amdgpu

gandhi21299 added inline comments. Jun 16 2021, 2:50 PM

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2	Can you elaborate on the kind of tests we need?

Harbormaster completed remote builds in B109604: Diff 352563. Jun 17 2021, 12:44 AM

foad added inline comments. Jun 17 2021, 1:38 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
28	Don't need this because you don't use the IRBuilder for anything
33–34	Don't need this unless you actually do some pattern matching
36	Don't need this, it's already included by default
846	Don't need this, you don't use it for anything
850–852	This is wrong. If the value was 4 you will xor it with 7 giving 3, but you need to flip all the bits that amdgcn_class cares about, i.e. 10 low order bits. You should either xor with a fixed value of 0x3ff, or perhaps move this enum from AMDGPUInstCombineIntrinsic.cpp to a common header (maybe SIDefines.h?) and add an "ALL" value to it:

    case Intrinsic::amdgcn_class: {
      enum {
        S_NAN = 1 << 0,       // Signaling NaN
        Q_NAN = 1 << 1,       // Quiet NaN
        N_INFINITY = 1 << 2,  // Negative infinity
        N_NORMAL = 1 << 3,    // Negative normal
        N_SUBNORMAL = 1 << 4, // Negative subnormal
        N_ZERO = 1 << 5,      // Negative zero
        P_ZERO = 1 << 6,      // Positive zero
        P_SUBNORMAL = 1 << 7, // Positive subnormal
        P_NORMAL = 1 << 8,    // Positive normal
        P_INFINITY = 1 << 9   // Positive infinity
      };
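
The mask arithmetic in the comment above can be checked at compile time. The sketch below is a standalone illustration, not code from the patch: `ClassBits`, `ALL` and `invertClassMask` are hypothetical names (the comment only suggests adding an "ALL" value somewhere). It flips the ten meaningful bits and clears anything above them, which matches xor-ing with 0x3ff whenever the input mask fits in ten bits.

```cpp
#include <cstdint>

// Class-test bits as listed in the comment above. ALL is the hypothetical
// extra enumerator the comment proposes; it is not an existing LLVM name.
enum ClassBits : uint32_t {
  S_NAN = 1 << 0,       // Signaling NaN
  Q_NAN = 1 << 1,       // Quiet NaN
  N_INFINITY = 1 << 2,  // Negative infinity
  N_NORMAL = 1 << 3,    // Negative normal
  N_SUBNORMAL = 1 << 4, // Negative subnormal
  N_ZERO = 1 << 5,      // Negative zero
  P_ZERO = 1 << 6,      // Positive zero
  P_SUBNORMAL = 1 << 7, // Positive subnormal
  P_NORMAL = 1 << 8,    // Positive normal
  P_INFINITY = 1 << 9,  // Positive infinity
  ALL = (1 << 10) - 1   // 0x3ff, all ten class bits
};

// Flip the ten meaningful bits and clear everything above them.
constexpr uint32_t invertClassMask(uint32_t Mask) { return ~Mask & ALL; }

// The bug described above: xor-ing mask 4 with 7 only flips three bits.
static_assert((4u ^ 7u) == 3u, "flipping three bits is not a class negation");
// Flipping all ten bits gives the complement within the class space.
static_assert(invertClassMask(N_INFINITY) == 0x3fbu, "mask 4 inverted");
static_assert(invertClassMask(5) == 0x3fau, "mask 5 inverted");
```

For mask 5, a full 32-bit bitwise not gives -6 (the value the first diff's regression test expects), while the ten-bit inversion gives 0x3fa. Both select the same classes, since bits above bit 9 carry no meaning in the enum.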

gandhi21299 marked 4 inline comments as done. Jun 17 2021, 9:10 AM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
28	Turns out, it is used in other places. This pass does not compile without including this header.
850–852	Flipping only the low 10 bits fails several other tests, thoughts? @arsenm @foad

foad added inline comments. Jun 17 2021, 9:13 AM

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
850–852	You need to look at why they fail. Probably your patch changes (improves) the generated code, so the tests need to be updated to expect the improved code sequence.

get lower 10 bits of the test mask by xor-ing with 0x3ff
updated tests accordingly
refactoring
2 tests still fail from the AMDGPU codegen test suite

I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening. @arsenm

Harbormaster completed remote builds in B109782: Diff 352811. Jun 18 2021, 2:39 AM

I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening.

Your visitXor function overrides the handling of 16-bit xors which were previously handled by visitBinaryOperator.

For the cases you can't handle, instead of return false you need something like return visitBinaryOperator(I).
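
The dispatch problem described above can be reproduced with a toy visitor. This is a simplified model rather than LLVM's `InstVisitor`: `Opcode`, `Inst` and `ToyVisitor` are hypothetical stand-ins, and "widening" a 16-bit operation stands in for whatever the generic handler does. The point is that once a more specific handler exists, the generic one runs only if the specific handler delegates to it.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; these are not LLVM classes.
enum class Opcode { Add, Xor };

struct Inst {
  Opcode Op;
  unsigned Bits;     // integer width of the operation
  bool IsNotOfClass; // models `xor (llvm.amdgcn.class x, mask), -1`
};

struct ToyVisitor {
  bool FallBack; // true: delegate unhandled xors to the generic handler

  // Generic handler; in this toy it "widens" 16-bit operations to 32 bits.
  bool visitBinaryOperator(Inst &I) {
    assert(I.Bits > 0 && "toy instruction must have a width");
    if (I.Bits != 16)
      return false;
    I.Bits = 32;
    return true;
  }

  // Specific handler; once defined, every xor is dispatched here instead.
  bool visitXor(Inst &I) {
    if (I.IsNotOfClass)
      return true; // pretend the fold happened
    // Returning false here would silently skip the generic handling.
    return FallBack ? visitBinaryOperator(I) : false;
  }

  // Dispatcher: the most specific handler wins.
  bool visit(Inst &I) {
    return I.Op == Opcode::Xor ? visitXor(I) : visitBinaryOperator(I);
  }
};
```

With `FallBack` set to false, a 16-bit xor that is not a class negation is left untouched, which mirrors the unexpected test failure; with `FallBack` set to true it is widened as before.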

now passes all tests, thanks @foad for the fix

Looks OK to me, but please wait a day in case other reviewers still have comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
836	All ConstantInts have IntegerType so you don't need to check that.

eliminated RHS IntegerTy check since it is already declared as a ConstantInt

arsenm added inline comments. Jun 18 2021, 11:02 AM

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
47	Should check the operands
61	Should check the operands

added checks on operands for the class intrinsic

arsenm accepted this revision. Jun 18 2021, 11:49 AM

This revision is now accepted and ready to land. Jun 18 2021, 11:49 AM

Thanks for the review process, I will merge this patch in

This revision was landed with ongoing or failed builds. Jun 18 2021, 12:04 PM

Closed by commit rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask (authored by gandhi21299).

This revision was automatically updated to reflect the committed changes.

gandhi21299 added a commit: rG2e5dc4a1efe1: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask.

Harbormaster completed remote builds in B109976: Diff 353065. Jun 19 2021, 8:12 AM

Diff 351217

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

//===-- AMDGPUCodeGenPrepare.cpp ------------------------------------------===//		//===-- AMDGPUCodeGenPrepare.cpp ------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// This pass does misc. AMDGPU optimizations on IR before instruction		/// This pass does misc. AMDGPU optimizations on IR before instruction
/// selection.		/// selection.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
		#include "llvm-c/Core.h"
		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/LegacyDivergenceAnalysis.h"		#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
		#include "llvm/CodeGen/GlobalISel/InstructionSelector.h"
		arsenm: Definitely should not include this here
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
		#include "llvm/IR/BasicBlock.h"
		#include "llvm/IR/Constants.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
		#include "llvm/IR/GlobalValue.h"
		foad: Don't need this because you don't use the IRBuilder for anything
		gandhi21299: Turns out, it is used in other places. This pass does not compile without including this header.
		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstVisitor.h"		#include "llvm/IR/InstVisitor.h"
		#include "llvm/IR/Instruction.h"
		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
		foad: Don't need this unless you actually do some pattern matching
#include "llvm/Pass.h"		#include "llvm/Pass.h"
		#include "llvm/Support/Casting.h"
		foad: Don't need this, it's already included by default
		#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
		foad: I don't think you need any of these changes to the #includes.
#include "llvm/Transforms/Utils/IntegerDivision.h"		#include "llvm/Transforms/Utils/IntegerDivision.h"

		arsenm: Shouldn't need this
#define DEBUG_TYPE "amdgpu-codegenprepare"		#define DEBUG_TYPE "amdgpu-codegenprepare"

using namespace llvm;		using namespace llvm;

namespace {		namespace {

static cl::opt<bool> WidenLoads(		static cl::opt<bool> WidenLoads(
"amdgpu-codegenprepare-widen-constant-loads",		"amdgpu-codegenprepare-widen-constant-loads",
[... 151 lines not shown ...]	class AMDGPUCodeGenPrepare : public FunctionPass,
/// \details \p Widen scalar load for uniform, small type loads from constant		/// \details \p Widen scalar load for uniform, small type loads from constant
// memory / to a full 32-bits and then truncate the input to allow a scalar		// memory / to a full 32-bits and then truncate the input to allow a scalar
// load instead of a vector load.		// load instead of a vector load.
//		//
/// \returns True.		/// \returns True.

bool canWidenScalarExtLoad(LoadInst &I) const;		bool canWidenScalarExtLoad(LoadInst &I) const;

		/// Fold negate a builtin intrinsic into a test mask
		///
		/// \details \p Perform the following transformation from
		// xor (llvm.amdgcn.class x, mask), -1 into
		// llvm.amdgcn.class(x, ~mask).
		//
		/// \returns True if at least one transformation was performed,
		// false otherwise.
		bool foldNegateIntrinsic(Function &F);

public:		public:
static char ID;		static char ID;

AMDGPUCodeGenPrepare() : FunctionPass(ID) {}		AMDGPUCodeGenPrepare() : FunctionPass(ID) {}

bool visitFDiv(BinaryOperator &I);		bool visitFDiv(BinaryOperator &I);

bool visitInstruction(Instruction &I) { return false; }		bool visitInstruction(Instruction &I) { return false; }
[... 600 lines not shown ...]	bool AMDGPUCodeGenPrepare::visitFDiv(BinaryOperator &FDiv) {
}		}

return !!NewFDiv;		return !!NewFDiv;
}		}

static bool hasUnsafeFPMath(const Function &F) {		static bool hasUnsafeFPMath(const Function &F) {
Attribute Attr = F.getFnAttribute("unsafe-fp-math");		Attribute Attr = F.getFnAttribute("unsafe-fp-math");
return Attr.getValueAsBool();		return Attr.getValueAsBool();
}		}
		foad: This check is wrong. It's OK for there to be multiple uses of the Xor, since you replace all of them. It is not OK for there to be other uses of the intrinsic call, since you modify it in-place.

		foad: All ConstantInts have IntegerType so you don't need to check that.
static std::pair<Value*, Value*> getMul64(IRBuilder<> &Builder,		static std::pair<Value*, Value*> getMul64(IRBuilder<> &Builder,
		arsenm: No else after return
Value *LHS, Value *RHS) {		Value *LHS, Value *RHS) {
Type *I32Ty = Builder.getInt32Ty();		Type *I32Ty = Builder.getInt32Ty();
Type *I64Ty = Builder.getInt64Ty();		Type *I64Ty = Builder.getInt64Ty();

Value *LHS_EXT64 = Builder.CreateZExt(LHS, I64Ty);		Value *LHS_EXT64 = Builder.CreateZExt(LHS, I64Ty);
		arsenm: Using m_Not would still be clearer
Value *RHS_EXT64 = Builder.CreateZExt(RHS, I64Ty);		Value *RHS_EXT64 = Builder.CreateZExt(RHS, I64Ty);
		arsenm: Using dyn_cast and isa is redundant, just use dyn_cast and check the result
Value *MUL64 = Builder.CreateMul(LHS_EXT64, RHS_EXT64);		Value *MUL64 = Builder.CreateMul(LHS_EXT64, RHS_EXT64);
Value *Lo = Builder.CreateTrunc(MUL64, I32Ty);		Value *Lo = Builder.CreateTrunc(MUL64, I32Ty);
Value *Hi = Builder.CreateLShr(MUL64, Builder.getInt64(32));		Value *Hi = Builder.CreateLShr(MUL64, Builder.getInt64(32));
		foad: Don't need this, you don't use it for anything
Hi = Builder.CreateTrunc(Hi, I32Ty);		Hi = Builder.CreateTrunc(Hi, I32Ty);
		arsenm: I believe PatternMatch has a nicer way to check for a not
		arsenm: Plus constants are canonicalized to the RHS, so you don't need to check both
		gandhi21299: After removing the else-if condition, this seems clean enough to me. Perhaps, PatternMatch won't be required any more.
return std::make_pair(Lo, Hi);		return std::make_pair(Lo, Hi);
}		}
		arsenm: Invert condition and exit early to reduce indentation

static Value* getMulHu(IRBuilder<> &Builder, Value *LHS, Value *RHS) {		static Value* getMulHu(IRBuilder<> &Builder, Value *LHS, Value *RHS) {
return getMul64(Builder, LHS, RHS).second;		return getMul64(Builder, LHS, RHS).second;
		foad: This is wrong. If the value was 4 you will xor it with 7 giving 3, but you need to flip all the bits that amdgcn_class cares about, i.e. 10 low order bits. You should either xor with a fixed value of 0x3ff, or perhaps move this enum from AMDGPUInstCombineIntrinsic.cpp to a common header (maybe SIDefines.h?) and add an "ALL" value to it: enum { S_NAN = 1 << 0, Q_NAN = 1 << 1, N_INFINITY = 1 << 2, N_NORMAL = 1 << 3, N_SUBNORMAL = 1 << 4, N_ZERO = 1 << 5, P_ZERO = 1 << 6, P_SUBNORMAL = 1 << 7, P_NORMAL = 1 << 8, P_INFINITY = 1 << 9 };
		gandhi21299: Flipping only the low 10 bits fails several other tests, thoughts? @arsenm @foad
		foad: You need to look at why they fail. Probably your patch changes (improves) the generated code, so the tests need to be updated to expect the improved code sequence.
}		}

/// Figure out how many bits are really needed for this ddivision. \p AtLeast is		/// Figure out how many bits are really needed for this ddivision. \p AtLeast is
		arsenm: I'd prefer to just use the dyn_cast below and return on that rather than checking isa first
/// an optimization hint to bypass the second ComputeNumSignBits call if we the		/// an optimization hint to bypass the second ComputeNumSignBits call if we the
/// first one is insufficient. Returns -1 on failure.		/// first one is insufficient. Returns -1 on failure.
int AMDGPUCodeGenPrepare::getDivNumBits(BinaryOperator &I,		int AMDGPUCodeGenPrepare::getDivNumBits(BinaryOperator &I,
Value *Num, Value *Den,		Value *Num, Value *Den,
unsigned AtLeast, bool IsSigned) const {		unsigned AtLeast, bool IsSigned) const {
const DataLayout &DL = Mod->getDataLayout();		const DataLayout &DL = Mod->getDataLayout();
		arsenm: There are fewer bits in the test mask than this
unsigned LHSSignBits = ComputeNumSignBits(Num, DL, 0, AC, &I);		unsigned LHSSignBits = ComputeNumSignBits(Num, DL, 0, AC, &I);
		arsenm: You don't need this intermediate APInt (plus this is required to be an i32)
if (LHSSignBits < AtLeast)		if (LHSSignBits < AtLeast)
return -1;		return -1;

		arsenm: Arg->getType()
unsigned RHSSignBits = ComputeNumSignBits(Den, DL, 0, AC, &I);		unsigned RHSSignBits = ComputeNumSignBits(Den, DL, 0, AC, &I);
if (RHSSignBits < AtLeast)		if (RHSSignBits < AtLeast)
return -1;		return -1;

unsigned SignBits = std::min(LHSSignBits, RHSSignBits);		unsigned SignBits = std::min(LHSSignBits, RHSSignBits);
unsigned DivBits = Num->getType()->getScalarSizeInBits() - SignBits;		unsigned DivBits = Num->getType()->getScalarSizeInBits() - SignBits;
if (IsSigned)		if (IsSigned)
++DivBits;		++DivBits;
[... 505 lines not shown ...]	bool AMDGPUCodeGenPrepare::doInitialization(Module &M) {
DL = &Mod->getDataLayout();		DL = &Mod->getDataLayout();
return false;		return false;
}		}

bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {		bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

		bool MadeChange = foldNegateIntrinsic(F);
		arsenm: This should not be a separate pass over the function. This should be a visitXor function

auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();		auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)		if (!TPC)
return false;		return false;

const AMDGPUTargetMachine &TM = TPC->getTM<AMDGPUTargetMachine>();		const AMDGPUTargetMachine &TM = TPC->getTM<AMDGPUTargetMachine>();
ST = &TM.getSubtarget<GCNSubtarget>(F);		ST = &TM.getSubtarget<GCNSubtarget>(F);
AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
DA = &getAnalysis<LegacyDivergenceAnalysis>();		DA = &getAnalysis<LegacyDivergenceAnalysis>();

auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();		auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();
DT = DTWP ? &DTWP->getDomTree() : nullptr;		DT = DTWP ? &DTWP->getDomTree() : nullptr;

HasUnsafeFPMath = hasUnsafeFPMath(F);		HasUnsafeFPMath = hasUnsafeFPMath(F);

AMDGPU::SIModeRegisterDefaults Mode(F);		AMDGPU::SIModeRegisterDefaults Mode(F);
HasFP32Denormals = Mode.allFP32Denormals();		HasFP32Denormals = Mode.allFP32Denormals();

bool MadeChange = false;

Function::iterator NextBB;		Function::iterator NextBB;
for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; FI = NextBB) {		for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE; FI = NextBB) {
BasicBlock *BB = &*FI;		BasicBlock *BB = &*FI;
NextBB = std::next(FI);		NextBB = std::next(FI);

BasicBlock::iterator Next;		BasicBlock::iterator Next;
for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; I = Next) {		for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; I = Next) {
Next = std::next(I);		Next = std::next(I);
[... 9 lines not shown ...]	for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; I = Next) {
}		}
}		}
}		}
}		}

return MadeChange;		return MadeChange;
}		}

		bool AMDGPUCodeGenPrepare::foldNegateIntrinsic(Function &F) {
		SmallVector<Instruction *, 8> DeadInstr;
		for (auto BB = F.begin(); BB != F.end(); ++BB) {
		for (auto Inst = BB->begin(); Inst != BB->end(); ++Inst) {
		if (Inst->getOpcode() != Instruction::Xor)
		continue;

		Value *NegOne = ConstantInt::get(Inst->getOperand(0)->getType(), -1);
		arsenm: You don't need to create a constant to check the value

		// Match call instruction and constant -1
		Value *ExtCall;
		if (Inst->getOperand(1) == NegOne)
		ExtCall = Inst->getOperand(0);
		else if (Inst->getOperand(0) == NegOne)
		ExtCall = Inst->getOperand(1);
		else
		continue;

		arsenm: Should check hasOneUse
		// Check if either the parent or the grandparent of the other
		// operand is a function call
		CallInst *IntrinsicCall;
		if (isa<TruncInst>(ExtCall))
		IntrinsicCall =
		dyn_cast<CallInst>(cast<TruncInst>(ExtCall)->getOperand(0));
		else if (isa<SExtInst>(ExtCall))
		IntrinsicCall =
		dyn_cast<CallInst>(cast<SExtInst>(ExtCall)->getOperand(0));
		else if (isa<ZExtInst>(ExtCall))
		IntrinsicCall =
		dyn_cast<CallInst>(cast<ZExtInst>(ExtCall)->getOperand(0));
		else if (isa<CallInst>(ExtCall)) {
		IntrinsicCall = cast<CallInst>(ExtCall);
		} else
		arsenm: I don't know why you are looking at all of these extensions. The xor should directly consume the call
		gandhi21299: I am considering the case where the result from the class intrinsic is extended/truncated for some reason. Is it possible to get the IntrinsicInst directly from the xor operand?
		IntrinsicCall = nullptr;

		// Now check if the CallInst calls llvm.amdgcn.class
		if (!IntrinsicCall || (IntrinsicCall && IntrinsicCall->getIntrinsicID() !=
		Intrinsic::amdgcn_class))
		continue;
		arsenm: You can just dyn_cast<IntrinsicInst>

		// "Not" the second argument of the intrinsic call
		IRBuilder<> Builder(IntrinsicCall);
		IntrinsicCall->setArgOperand(
		1, Builder.CreateNot(IntrinsicCall->getOperand(1)));
		arsenm: Since you know it's a constant, you can also 0 the irrelevant high bits
		gandhi21299: Do I use IRBuilder to do this or is there a simpler way?
		if (isa<CallInst>(ExtCall))
		Inst->replaceAllUsesWith(IntrinsicCall);
		else
		Inst->replaceAllUsesWith(ExtCall);
		DeadInstr.push_back(cast<Instruction>(Inst));
		arsenm: You don't need to collect dead instructions, the xor should always be dead
		}
		}

		// remove all dead instructions
		for (auto *Inst : DeadInstr)
		Inst->eraseFromParent();
		return !DeadInstr.empty();
		}

INITIALIZE_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(AMDGPUCodeGenPrepare, DEBUG_TYPE,
"AMDGPU IR optimizations", false, false)		"AMDGPU IR optimizations", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(LegacyDivergenceAnalysis)		INITIALIZE_PASS_DEPENDENCY(LegacyDivergenceAnalysis)
INITIALIZE_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE, "AMDGPU IR optimizations",		INITIALIZE_PASS_END(AMDGPUCodeGenPrepare, DEBUG_TYPE, "AMDGPU IR optimizations",
false, false)		false, false)

char AMDGPUCodeGenPrepare::ID = 0;		char AMDGPUCodeGenPrepare::ID = 0;

FunctionPass *llvm::createAMDGPUCodeGenPreparePass() {		FunctionPass *llvm::createAMDGPUCodeGenPreparePass() {
return new AMDGPUCodeGenPrepare();		return new AMDGPUCodeGenPrepare();
}		}

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -amdgpu-codegenprepare -verify %s -o - | FileCheck %s
				arsenm: You didn't actually generate these checks. The operand tests are mostly missing
				gandhi21299: Can you elaborate on the kind of tests we need?

				declare i1 @llvm.amdgcn.class.f32(float, i32) nounwind readnone
				declare i1 @llvm.amdgcn.class.f64(double, i32) nounwind readnone

				; Trivial case, xor instruction should be removed and
				; the second argument of the intrinsic call should be
				; bitwise-negated
				; CHECK: @fold_negate_intrinsic_test_mask
				; CHECK: %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 -6)
				; CHECK-NOT: xor
				define i1 @fold_negate_intrinsic_test_mask(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
				%2 = xor i1 %1, -1
				ret i1 %2
				}

				arsenm: Need a negative test with a variable mask. Also a negative test for multiple uses. Plus also could use tests for all of the FP types
				; CHECK: @fold_negate_intrinsic_test_mask_zext
				; CHECK: %[[X:.*]] = call i1 @llvm.amdgcn.class.f32(float %x, i32 -6)
				; CHECK: %[[X1:.*]] = zext i1 %[[X]] to i32
				; CHECK-NOT: xor i32 %[[X1]], -1
				; CHECK: xor i32 %{{.*}}, 42
				define i32 @fold_negate_intrinsic_test_mask_zext(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
				%2 = zext i1 %1 to i32
				arsenm: You shouldn't be trying to look through this zext, this IR should have been optimized to xor on i1
				%3 = xor i32 %2, -1
				%4 = xor i32 %3, 42
				ret i32 %3
				}

				; CHECK: @fold_negate_intrinsic_test_mask_sext
				; CHECK: %1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
				; CHECK: sext
				define i32 @fold_negate_intrinsic_test_mask_sext(float %x) nounwind {
				%1 = call i1 @llvm.amdgcn.class.f32(float %x, i32 5)
				%2 = sext i1 %1 to i32
				ret i32 %2
				arsenm: This didn't do anything
				}
				arsenm: It would be a better test to have a non-identical use (i.e. just add a store of the value)
				arsenm: Should check the operands
				arsenm: Should check the operands

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
ClosedPublic
