
Details

Reviewers

• tstellarAMD
• arsenm

Commits

rGcc7067a66895: AMDGPU: Insert two S_NOP instructions for every high level source statement.
rL262579: AMDGPU: Insert two S_NOP instructions for every high level source statement.

Summary

Tools such as a debugger need to pause execution based on user input (e.g., a breakpoint). To support this, two S_NOP instructions are inserted for each high-level source statement: one before the first ISA instruction of the statement, and one after its last ISA instruction. The debugger may then replace these S_NOP instructions with S_TRAP instructions based on user input.
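A toy model of the scheme makes the placement rule concrete. It is a sketch under simplifying assumptions, not the pass itself: the hypothetical `Inst` struct stands in for a machine instruction, line 0 stands in for "no debug location", and the input is treated as a single basic block. As in the pass in Diff 48721, statements are keyed by line number alone and no S_NOP is placed after the final instruction of a block; the extra S_NOP that the pass adds at the start of the function is omitted.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction: an opcode name and the
// source line it came from (0 plays the role of "no debug location").
struct Inst {
  std::string Op;
  unsigned Line;
};

// Toy model of the two-S_NOP scheme over a single basic block: one S_NOP
// before the first instruction of each source line and one after the last.
std::vector<Inst> insertStatementNops(const std::vector<Inst> &Block) {
  std::map<unsigned, std::size_t> FirstIdx, LastIdx; // line -> index in Block
  for (std::size_t I = 0; I < Block.size(); ++I) {
    unsigned Line = Block[I].Line;
    if (Line == 0)
      continue;                // no source location: ignore
    FirstIdx.emplace(Line, I); // emplace never overwrites: first index wins
    LastIdx[Line] = I;         // overwritten until the last index remains
  }

  std::vector<int> NopsBefore(Block.size(), 0), NopsAfter(Block.size(), 0);
  for (const auto &Entry : FirstIdx)
    ++NopsBefore[Entry.second];
  for (const auto &Entry : LastIdx)
    if (Entry.second + 1 < Block.size()) // like the pass: no S_NOP after the
      ++NopsAfter[Entry.second];         // block's final instruction

  std::vector<Inst> Out;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    for (int N = 0; N < NopsBefore[I]; ++N)
      Out.push_back({"S_NOP", Block[I].Line});
    Out.push_back(Block[I]);
    for (int N = 0; N < NopsAfter[I]; ++N)
      Out.push_back({"S_NOP", Block[I].Line});
  }
  return Out;
}
```

For a block whose instructions a, b, c, d, e, ret come from lines 1, 1, 2, (none), 2, 3, the result is S_NOP, a, b, S_NOP, S_NOP, c, d, e, S_NOP, S_NOP, ret: lines 1 and 2 are each bracketed by S_NOPs, and line 3 gets only the leading one because its instruction ends the block.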

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 48502. (Feb 19 2016, 9:12 AM)

kzhuravl retitled this revision to "Insert nop for each high level source statement".

kzhuravl updated this object.

kzhuravl added reviewers: tstellarAMD, arsenm.

kzhuravl added a subscriber: llvm-commits.

Herald added a subscriber: arsenm. (Feb 19 2016, 9:12 AM)

tstellarAMD added inline comments. (Feb 19 2016, 10:45 AM)

lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
1	(On Diff #48502)	The file name in the header is wrong, but I think the file and the class names should be renamed to use the 'SI' prefix rather than the 'AMDGPU' prefix, to be consistent with other GCN-only passes.
11–19	(On Diff #48502)	Why do we need two passes? Can't we just insert the S_NOP instructions in the first pass?
122	(On Diff #48502)	Style comment: we usually use MBB as the variable name when iterating over blocks and MI when iterating over machine instructions. This makes it more obvious what the code is doing.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269	We should also be running this pass if the user specifies -g.
270	This should be added to GCNPassConfig since it is GCN only.

One other question: is there some reason why the compiler can't add the s_trap instructions itself, instead of having the debugger add them?

The tools want to selectively replace S_NOPs with S_TRAPs based on user input.
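A toy picture of that debugger-side substitution follows. Everything in it is hypothetical (the `Inst` struct, `armBreakpoints`, and the policy of patching the S_NOP that precedes a statement); the review does not describe how the tools actually locate or patch instructions.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction in the compiled kernel.
struct Inst {
  std::string Op;
  unsigned Line;
};

// Toy debugger-side patching: for each requested breakpoint line, turn the
// first S_NOP tagged with that line (the one placed before the statement)
// into an S_TRAP. Statements without breakpoints keep their S_NOPs.
// Returns the number of breakpoints that were armed.
std::size_t armBreakpoints(std::vector<Inst> &Code,
                           const std::set<unsigned> &BreakLines) {
  std::set<unsigned> Armed;
  for (Inst &I : Code) {
    if (I.Op != "S_NOP" || BreakLines.count(I.Line) == 0 ||
        Armed.count(I.Line) != 0)
      continue;
    I.Op = "S_TRAP"; // patched in place; no other instruction moves
    Armed.insert(I.Line);
  }
  return Armed.size();
}
```

Patching in place keeps every other instruction at the same address. On GCN, S_NOP and S_TRAP are both single 32-bit SOPP instructions, which is what makes such a swap plausible without re-laying out the code.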

lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
1	(On Diff #48502)	ok
11–19	(On Diff #48502)	This should work at different optimization levels. At -O0, one pass works fine. At other optimization levels, instructions are reordered at different compilation stages. The first pass inserts DEBUG_NOP pseudo instructions before register allocation. The DEBUG_NOP pseudo instruction has the isTerminator attribute, which prevents reordering across DEBUG_NOPs. The second pass lowers DEBUG_NOPs to S_NOPs right before machine code is emitted.
122	(On Diff #48502)	ok
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269	The tools team specifically asked not to do this. For example, for profiling they want to run with -g, but without inserting nops.
270	ok
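The two-pass design in the reply above depends on DEBUG_NOP acting as a scheduling boundary. The sketch is a generic toy of region-bounded reordering, not LLVM's machine scheduler; the `Inst` struct, its priority field, and `scheduleWithinRegions` are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical toy instruction: a name, a priority a scheduler might sort by,
// and a flag marking a scheduling barrier (standing in for DEBUG_NOP).
struct Inst {
  std::string Name;
  int Priority;
  bool IsBarrier;
};

// Reorder by priority, but only within the regions between barriers.
// Barriers stay where they are and no instruction crosses one.
void scheduleWithinRegions(std::vector<Inst> &Code) {
  auto RegionBegin = Code.begin();
  for (auto It = Code.begin();; ++It) {
    bool AtEnd = (It == Code.end());
    if (AtEnd || It->IsBarrier) {
      // Sort only the region [RegionBegin, It); the barrier itself is excluded.
      std::stable_sort(RegionBegin, It, [](const Inst &A, const Inst &B) {
        return A.Priority < B.Priority;
      });
      if (AtEnd)
        break;
      RegionBegin = std::next(It); // next region starts after the barrier
    }
  }
}
```

Because each barrier splits the block into independent regions, instructions can be reordered among themselves but never across the marker; a barrier with the lowest priority still stays in place.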

tstellarAMD added inline comments. (Feb 19 2016, 1:08 PM)

lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
11–19	(On Diff #48502)	By the time we get to running the first pass, the code will already have been reordered by the LLVM IR passes as well as by SelectionDAG. We also can't insert terminator instructions in the middle of blocks, because this will break other passes (and the verifier). Can we start with one pass, and if the result isn't good enough, then look for other solutions?
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269	Ok, so the nops are required for both debugging and profiling? Is the command line flag you added accessible from clang?
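Tom's point about terminators reflects a basic-block invariant: terminators may only form the tail of a block, and LLVM's machine verifier flags a non-terminator that follows the first terminator. A simplified model of that check, with a hypothetical function name and a plain `std::vector<bool>` in place of a MachineBasicBlock:

```cpp
#include <vector>

// Simplified model of the verifier rule: IsTerminator[i] says whether the
// i-th instruction of a basic block is a terminator. Once the first
// terminator appears, every later instruction must also be a terminator.
bool terminatorsOnlyAtEnd(const std::vector<bool> &IsTerminator) {
  bool SeenTerminator = false;
  for (bool Term : IsTerminator) {
    // A non-terminator after a terminator: this is what a terminator
    // pseudo such as DEBUG_NOP placed mid-block would cause.
    if (SeenTerminator && !Term)
      return false;
    SeenTerminator = SeenTerminator || Term;
  }
  return true;
}
```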

After discussion with Tools, it was decided to insert two S_NOPs for each high level source statement; this way we do not have to disable any optimizations at optimization levels other than -O0.

kzhuravl marked 12 inline comments as done. (Feb 21 2016, 9:29 AM)

kzhuravl added inline comments.

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
269	Nops are only required for debugging. For profiling, they want debug info but no nops. The command line flag is accessible from clang.
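The agreed behavior, NOP insertion controlled separately from -g, matches the `if (InsertNops)` guard in `GCNPassConfig::addPreEmitPass` in the diff. As a toy model only (the `Options` struct and the pass-name strings are hypothetical labels, not LLVM APIs):

```cpp
#include <string>
#include <vector>

// Hypothetical compile options: -g (debug info) and the NOP-insertion switch
// are independent, so profiling builds can have debug info without NOPs.
struct Options {
  bool DebugInfo = false;
  bool InsertNops = false;
};

// Toy version of the pre-emit stage from the diff: the NOP pass is appended
// last, and only when InsertNops is set. DebugInfo alone does not add it.
std::vector<std::string> preEmitPasses(const Options &Opts) {
  std::vector<std::string> Passes = {"SIInsertWaits", "SILowerControlFlow"};
  if (Opts.InsertNops)
    Passes.push_back("SIInsertNops");
  return Passes;
}
```

From clang, backend cl::opt flags such as -amdgpu-insert-nops are usually passed with -mllvm; the review itself only states that the flag is reachable from clang.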

Tom's feedback

lib/Target/AMDGPU/AMDGPUInsertNopsPass.cpp
11–19	(On Diff #48502)	After discussion with Tools, it was decided to insert two S_NOPs for each high level source statement; this way we do not have to disable any optimizations at optimization levels other than -O0. One S_NOP is inserted before the first ISA instruction of each high level source statement, and one after its last ISA instruction. Updated the diff, which now uses a single pass.

Minor fixes

tstellarAMD added subscribers: dblaikie, echristo. (Feb 22 2016, 10:04 AM)

This needs some tests.

lib/Target/AMDGPU/SIInsertNopsPass.cpp
24	I don't think this is needed
49–50	If you don't need to initialize dependencies, you can use INITIALIZE_PASS without _BEGIN/_END
63	I think this needs a better name. LineToInst?
66	Can you invert the condition and continue to reduce indentation
71	addImm should be on the next line
79–82	I think the use of auto and ->second here is hard to follow. ->second should be assigned to a named variable, without auto
82	++MI should be on a new line
87	I would prefer having an explicit MBB variable instead of chaining multiple front() calls here

Matt's feedback

LGTM.

Closed by commit rL262579: AMDGPU: Insert two S_NOP instructions for every high level source statement. (authored by tstellar). (Mar 2 2016, 7:58 PM)

This revision was automatically updated to reflect the committed changes.

Diff 48721

lib/Target/AMDGPU/AMDGPU.h

@@ 43 unchanged lines hidden @@
 FunctionPass *createSILowerI1CopiesPass();
 FunctionPass *createSIShrinkInstructionsPass();
 FunctionPass *createSILoadStoreOptimizerPass(TargetMachine &tm);
 FunctionPass *createSILowerControlFlowPass();
 FunctionPass *createSIFixControlFlowLiveIntervalsPass();
 FunctionPass *createSIFixSGPRCopiesPass();
 FunctionPass *createSIFixSGPRLiveRangesPass();
 FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
+FunctionPass *createSIInsertNopsPass();
 FunctionPass *createSIInsertWaitsPass();

 ScheduleDAGInstrs *createSIMachineScheduler(MachineSchedContext *C);

 ModulePass *createAMDGPUAnnotateKernelFeaturesPass();
 void initializeAMDGPUAnnotateKernelFeaturesPass(PassRegistry &);
 extern char &AMDGPUAnnotateKernelFeaturesID;

@@ 31 unchanged lines hidden @@
 extern char &SIFixSGPRLiveRangesID;

 void initializeAMDGPUAnnotateUniformValuesPass(PassRegistry&);
 extern char &AMDGPUAnnotateUniformValuesPassID;

 void initializeSIAnnotateControlFlowPass(PassRegistry&);
 extern char &SIAnnotateControlFlowPassID;

+void initializeSIInsertNopsPass(PassRegistry&);
+extern char &SIInsertNopsID;
+
 void initializeSIInsertWaitsPass(PassRegistry&);
 extern char &SIInsertWaitsID;

 extern Target TheAMDGPUTarget;
 extern Target TheGCNTarget;

 namespace AMDGPU {
 enum TargetIndex {
@@ 67 unchanged lines hidden @@

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

@@ 24 unchanged lines hidden @@
 #include "llvm/Analysis/Passes.h"
 #include "llvm/CodeGen/MachineFunctionAnalysis.h"
 #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/IR/Verifier.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/IR/LegacyPassManager.h"
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/TargetRegistry.h"
 #include "llvm/Support/raw_os_ostream.h"
 #include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/Scalar.h"
 #include <llvm/CodeGen/Passes.h>

 using namespace llvm;

 extern "C" void LLVMInitializeAMDGPUTarget() {
   // Register the target
   RegisterTargetMachine<R600TargetMachine> X(TheAMDGPUTarget);
   RegisterTargetMachine<GCNTargetMachine> Y(TheGCNTarget);

   PassRegistry *PR = PassRegistry::getPassRegistry();
   initializeSILowerI1CopiesPass(*PR);
   initializeSIFixSGPRCopiesPass(*PR);
   initializeSIFoldOperandsPass(*PR);
   initializeSIFixSGPRLiveRangesPass(*PR);
   initializeSIFixControlFlowLiveIntervalsPass(*PR);
   initializeSILoadStoreOptimizerPass(*PR);
   initializeAMDGPUAnnotateKernelFeaturesPass(*PR);
   initializeAMDGPUAnnotateUniformValuesPass(*PR);
   initializeAMDGPUPromoteAllocaPass(*PR);
   initializeSIAnnotateControlFlowPass(*PR);
+  initializeSIInsertNopsPass(*PR);
   initializeSIInsertWaitsPass(*PR);
   initializeSILowerControlFlowPass(*PR);
 }

 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
   if (TT.getOS() == Triple::AMDHSA)
     return make_unique<AMDGPUHSATargetObjectFile>();

@@ 75 unchanged lines hidden @@ GCNTargetMachine::GCNTargetMachine(const Target &T, const Triple &TT,
                                    CodeModel::Model CM, CodeGenOpt::Level OL)
     : AMDGPUTargetMachine(T, TT, CPU, FS, Options, RM, CM, OL) {}

 //===----------------------------------------------------------------------===//
 // AMDGPU Pass Setup
 //===----------------------------------------------------------------------===//

 namespace {

+cl::opt<bool> InsertNops(
+  "amdgpu-insert-nops",
+  cl::desc("Insert two nop instructions for each high level source statement"),
+  cl::init(false));
+
 class AMDGPUPassConfig : public TargetPassConfig {
 public:
   AMDGPUPassConfig(TargetMachine *TM, PassManagerBase &PM)
     : TargetPassConfig(TM, PM) {

     // Exceptions and StackMaps are not supported, so these passes will never do
     // anything.
     disablePass(&StackMapLivenessID);
@@ 97 unchanged lines hidden @@
 }

 bool AMDGPUPassConfig::addGCPasses() {
   // Do nothing. GC is not supported.
   return false;
 }

 //===----------------------------------------------------------------------===//
 // R600 Pass Setup
 //===----------------------------------------------------------------------===//

 bool R600PassConfig::addPreISel() {
   AMDGPUPassConfig::addPreISel();
   addPass(createR600TextureIntrinsicsReplacer());
   return false;
 }

 void R600PassConfig::addPreRegAlloc() {
@@ 88 unchanged lines hidden @@
 }

 void GCNPassConfig::addPreSched2() {
 }

 void GCNPassConfig::addPreEmitPass() {
   addPass(createSIInsertWaitsPass(), false);
   addPass(createSILowerControlFlowPass(), false);
+  if (InsertNops) {
+    addPass(createSIInsertNopsPass(), false);
+  }
 }

 TargetPassConfig *GCNTargetMachine::createPassConfig(PassManagerBase &PM) {
   return new GCNPassConfig(this, PM);
 }

lib/Target/AMDGPU/CMakeLists.txt

@@ 44 unchanged lines hidden @@ add_llvm_target(AMDGPUCodeGen
   R600RegisterInfo.cpp
   R600TextureIntrinsicsReplacer.cpp
   SIAnnotateControlFlow.cpp
   SIFixControlFlowLiveIntervals.cpp
   SIFixSGPRCopies.cpp
   SIFixSGPRLiveRanges.cpp
   SIFoldOperands.cpp
   SIFrameLowering.cpp
+  SIInsertNopsPass.cpp
   SIInsertWaits.cpp
   SIInstrInfo.cpp
   SIISelLowering.cpp
   SILoadStoreOptimizer.cpp
   SILowerControlFlow.cpp
   SILowerI1Copies.cpp
   SIMachineFunctionInfo.cpp
   SIMachineScheduler.cpp
@@ 10 unchanged lines hidden @@

lib/Target/AMDGPU/SIInsertNopsPass.cpp

This file was added.

//===--- SIInsertNopsPass.cpp - Use predicates for control flow -----------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
/// \file
/// \brief Insert two S_NOP instructions for every high level source statement.
///
/// Tools, such as debugger, need to pause execution based on user input (i.e.
/// breakpoint). In order to do this, two S_NOP instructions are inserted for
/// each high level source statement: one before first isa instruction of high
/// level source statement, and one after last isa instruction of high level
/// source statement. Further, debugger may replace S_NOP instructions with
/// S_TRAP instructions based on user input.
//
//===----------------------------------------------------------------------===//

#include "SIInstrInfo.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
using namespace llvm;

#define DEBUG_TYPE "si-insert-nops"
#define PASS_NAME "SI Insert Nops"

namespace {

class SIInsertNops : public MachineFunctionPass {
public:
  static char ID;

  SIInsertNops() : MachineFunctionPass(ID) { }
  const char *getPassName() const override { return PASS_NAME; }

  bool runOnMachineFunction(MachineFunction &MF) override;
};

} // anonymous namespace

INITIALIZE_PASS(SIInsertNops, DEBUG_TYPE, PASS_NAME, false, false)

char SIInsertNops::ID = 0;
char &llvm::SIInsertNopsID = SIInsertNops::ID;

FunctionPass *llvm::createSIInsertNopsPass() {
  return new SIInsertNops();
}

bool SIInsertNops::runOnMachineFunction(MachineFunction &MF) {
  const SIInstrInfo *TII =
    static_cast<const SIInstrInfo*>(MF.getSubtarget().getInstrInfo());

  DenseMap<unsigned, MachineBasicBlock::iterator> LineToInst;
  for (auto MBB = MF.begin(); MBB != MF.end(); ++MBB) {
    for (auto MI = MBB->begin(); MI != MBB->end(); ++MI) {
      if (MI->isDebugValue() || !MI->getDebugLoc()) {
        continue;
      }
      auto DL = MI->getDebugLoc();
      auto CL = DL.getLine();
      auto LineToInstEntry = LineToInst.find(CL);
      if (LineToInstEntry == LineToInst.end()) {
        BuildMI(*MBB, *MI, DL, TII->get(AMDGPU::S_NOP))
          .addImm(0);
        LineToInst.insert(std::make_pair(CL, MI));
      } else {
        LineToInstEntry->second = MI;
      }
    }
  }
  for (auto LineToInstEntry = LineToInst.begin();
       LineToInstEntry != LineToInst.end(); ++LineToInstEntry) {
    auto MBB = LineToInstEntry->second->getParent();
    auto DL = LineToInstEntry->second->getDebugLoc();
    MachineBasicBlock::iterator MI = LineToInstEntry->second;
    ++MI;
    if (MI != MBB->end()) {
      BuildMI(*MBB, *MI, DL, TII->get(AMDGPU::S_NOP))
        .addImm(0);
    }
  }
  MachineBasicBlock &MBB = MF.front();
  MachineInstr &MI = MBB.front();
  BuildMI(MBB, MI, DebugLoc(), TII->get(AMDGPU::S_NOP))
    .addImm(0);

  return true;
}

This is an archive of the discontinued LLVM Phabricator instance.

Insert two S_NOP instructions for every high level source statement.
Closed, Public
