This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics
Closed · Public

Authored by hsmhsm on Feb 16 2020, 3:30 AM.

Details

Reviewers

arsenm
nhaehnle
kerbowa
cdevadas
t-tye
kzhuravl

Commits

rG3fda1fde8f7b: AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics

Summary

Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hsmhsm created this revision.Feb 16 2020, 3:30 AM

Herald added a project: Restricted Project.Feb 16 2020, 3:30 AM

Herald added subscribers: llvm-commits, hiraditya, tpr and 6 others.

Harbormaster completed remote builds in B46596: Diff 244864.Feb 16 2020, 3:33 AM

arsenm added inline comments.Feb 17 2020, 6:37 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3604–3605	This is directly copying the physical register into another physical register. We don't want that. This needs to go through an intermediate virtual register copy
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.debugtrap.ll
3–14 ↗	(On Diff #244864)	Shouldn't run the DAG tests in the GlobalISel test directory
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Can you add another test that uses the stack? Ideally we would also have a test in a non-kernel function, but I know that won't work right now since we don't handle the special argument inputs yet

arsenm added inline comments.Feb 17 2020, 6:39 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3604–3605	Can you add another assert in loadInputValue? It already asserts assert(Arg->getRegister().isPhysical()), but should also assert the result is virtual

Can you also add a MIR test to make sure the intermediate virtual copy is added? You'll need to explicitly add the queueptr in the MIR MachineFunctionInfo

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Actually, never mind. A non-kernel function should work. Calling it will not
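
The copy rule from the comments on lines 3604–3605 (no direct physical-to-physical copy, and an assert that the result is virtual) can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: Reg, Copy, ToyFunction, and copyThroughVirtual are hypothetical stand-ins invented for illustration, not LLVM types and not the code in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a register handle. In this toy encoding, ids with
// the top bit set are virtual and every other non-zero id is physical.
struct Reg {
  std::uint32_t Id;
  explicit Reg(std::uint32_t Id = 0) : Id(Id) {}
  bool isVirtual() const { return (Id & 0x80000000u) != 0; }
  bool isPhysical() const { return Id != 0 && !isVirtual(); }
};

// A recorded "Dst = COPY Src".
struct Copy {
  Reg Dst;
  Reg Src;
};

// Hypothetical container standing in for a function's instruction stream.
struct ToyFunction {
  std::vector<Copy> Copies;
  std::uint32_t NextVirt = 0;

  Reg createVirtualRegister() { return Reg(0x80000000u | NextVirt++); }
};

// Copies Src into Dst through a fresh virtual register instead of emitting a
// direct physical-to-physical copy. Returns the intermediate register.
Reg copyThroughVirtual(ToyFunction &F, Reg Dst, Reg Src) {
  assert(Src.isPhysical() && "Physical register expected");
  assert(Dst.isPhysical() && "Physical register expected");

  Reg Tmp = F.createVirtualRegister();
  assert(Tmp.isVirtual() && "Virtual register expected");

  F.Copies.push_back(Copy{Tmp, Src}); // physical -> virtual
  F.Copies.push_back(Copy{Dst, Tmp}); // virtual -> physical
  return Tmp;
}
```

Calling copyThroughVirtual(F, Reg(2), Reg(1)) on an empty ToyFunction records two copies, with the virtual register as the only link between the two physical ones.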

Take care of review comments by arsenm.

Introduce intermediate virtual register copy
Remove DAG tests in the GlobalISel test directory
Add LLVM IR tests that use the stack
Assert that destination register is virtual within loadInputValue()

Harbormaster completed remote builds in B46658: Diff 245020.Feb 17 2020, 11:53 AM

Regarding sharing the original ISelDAG test with GlobalISel: it does not seem to work, since the ISelDAG path has checks like the ones below. Hence, let's keep the two tests separate for now.

; MESA-TRAP: .section .AMDGPU.config
; MESA-TRAP: .long 47180
; MESA-TRAP-NEXT: .long 208

In D74688#1879573, @hsmhsm wrote:

Regarding sharing the original ISelDAG test with GlobalISel: it does not seem to work, since the ISelDAG path has checks like the ones below. Hence, let's keep the two tests separate for now.

; MESA-TRAP: .section .AMDGPU.config
; MESA-TRAP: .long 47180
; MESA-TRAP-NEXT: .long 208

I would expect these to be the same in both?

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3606	loadInputValue calls getLiveInRegister, you shouldn't need it here
3607	Why do you need to set the class? It shouldn't be necessary at this point
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Should still add a function test. Also one that has a separate, explicit use of the queue.ptr intrinsic wouldn't hurt either

Share the original DAG test with GlobalISel instead of re-inventing a new one here.

Harbormaster completed remote builds in B46686: Diff 245088.Feb 17 2020, 11:48 PM

hsmhsm marked 3 inline comments as done.Feb 18 2020, 12:08 AM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3606	My understanding of your earlier review comments is that loadInputValue() should expect the source register to be physical and the destination register to be virtual, with assertions for both. The assertion for the latter was missing, so I added it as you suggested. Given that, it is the responsibility of the caller of loadInputValue() to pass a virtual destination register. Hence, I created a destination virtual register here by calling getLiveInRegister(). The other call, within loadInputValue(), is for the source physical register. I do not see what I am missing here.
3607	We need to set a register class for the destination virtual register; otherwise, we get an assertion failure during compilation saying that the register type is not valid. If it is not supposed to be set here, where else should I set it? Or do I need to handle it in a different way? Please explain in more detail.
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Actually, I am a bit confused about writing test cases here, so let's start from scratch. As you suggested, I now share the original DAG test with GlobalISel instead of re-inventing it. I was making some mistakes earlier, which I have corrected, and sharing the original DAG test works fine now. From this point, please suggest how I should continue. What additional tests need to be covered here, and why? Do I need to add them to the original DAG test and make sure both the ISelDAG path and the GlobalISel path work for them? Or are they specific to the GlobalISel path for some reason?

arsenm added inline comments.Feb 18 2020, 9:55 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3606	loadInputValue loads a physical register copy into a virtual register. This was written for use by the intrinsics, which already have a virtual register result to write to. In this case you don't have or really need one. What you really want is a version of getLiveInRegister that ensures the copy is inserted, as the second half of loadInputValue does. What you have now I think happens to work, but confusingly, since the loadInputValue call will insert it. You may want to split this part into a helper function:

    // Insert the argument copy if it doesn't already exist.
    // FIXME: It seems EmitLiveInCopies isn't called anywhere?
    if (!MRI.getVRegDef(LiveIn)) {
      // FIXME: Should have scoped insert pt
      MachineBasicBlock &OrigInsBB = B.getMBB();
      auto OrigInsPt = B.getInsertPt();

      MachineBasicBlock &EntryMBB = B.getMF().front();
      EntryMBB.addLiveIn(Arg->getRegister());
      B.setInsertPt(EntryMBB, EntryMBB.begin());
      B.buildCopy(LiveIn, Arg->getRegister());

      B.setInsertPt(OrigInsBB, OrigInsPt);
    }
3607	The register class is only tangentially related to the type. I think you somehow ended up creating a virtual register without setting the type on it. Did something already add the queue ptr as a live in before this lowering? I also would not call this VDstReg, as that makes it sound like a VGPR.

Another possible strategy would be to emit a call to the queue.ptr intrinsic, and allow that to be legalized
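
The helper suggested for line 3606 saves and restores the builder's insert point by hand, and its FIXME asks for a scoped insert point. One possible shape for that, as a sketch only: the model below uses a std::list of strings as a block and an RAII guard to restore the insert point. ToyBuilder, ScopedInsertPoint, and insertEntryCopyOnce are hypothetical names invented for illustration and are not part of MachineIRBuilder or of this patch.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// A toy basic block: an ordered list of instruction strings. List iterators
// stay valid across insertions, which is what makes saving one safe here.
using Block = std::list<std::string>;

// Hypothetical builder that inserts before its current insert point.
struct ToyBuilder {
  Block *MBB = nullptr;
  Block::iterator Pt{};

  void setInsertPt(Block &BB, Block::iterator It) {
    MBB = &BB;
    Pt = It;
  }

  void build(const std::string &Inst) {
    assert(MBB && "Insert point not set");
    MBB->insert(Pt, Inst);
  }
};

// RAII guard: remembers the insert point on construction and restores it on
// destruction, so callers cannot forget the restore step.
class ScopedInsertPoint {
  ToyBuilder &B;
  Block *SavedMBB;
  Block::iterator SavedPt;

public:
  explicit ScopedInsertPoint(ToyBuilder &Builder)
      : B(Builder), SavedMBB(Builder.MBB), SavedPt(Builder.Pt) {}
  ~ScopedInsertPoint() {
    B.MBB = SavedMBB;
    B.Pt = SavedPt;
  }
  ScopedInsertPoint(const ScopedInsertPoint &) = delete;
  ScopedInsertPoint &operator=(const ScopedInsertPoint &) = delete;
};

// Inserts the live-in copy at the top of the entry block unless the register
// already has a definition. Returns true if a copy was inserted.
bool insertEntryCopyOnce(ToyBuilder &B, Block &Entry, bool &HasDef) {
  if (HasDef)
    return false;

  ScopedInsertPoint Guard(B);
  B.setInsertPt(Entry, Entry.begin());
  B.build("%livein = COPY $physreg");
  HasDef = true;
  return true;
}
```

With the guard, an early return added later to insertEntryCopyOnce would still restore the caller's insert point, which is the main reason to prefer it over a manual save and restore.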

Take care of further review comments by arsenm.

Add another version of getLiveInRegister() which works with TargetRegisterClass
While copying to the destination virtual register, make sure that it is defined
A bit of code refactoring around the related part of the source.

Harbormaster completed remote builds in B46789: Diff 245346.Feb 19 2020, 1:08 AM

hsmhsm marked 2 inline comments as done.Feb 19 2020, 1:12 AM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3606	I have taken care of it. Now, before copying to the destination virtual register, we ensure that it is defined.
3607	I have taken care of it. I just needed another version of getLiveInRegister() which works with TargetRegisterClass.

arsenm added inline comments.Feb 19 2020, 2:23 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2315–2316	I still don't think you should need the register class set. This should still have the type set on the virtual register
2321–2324	I think it would be better to have this only take the physical register input, and return the livein virtreg. This wouldn't have the // Destination virtual register is already defined, just insert copy case. The intrinsic lowering cases would then be responsible for inserting the extra copy to the expected result
2348	This shouldn't be called twice
2357–2359	Move the assignment up to avoid the weird !()
3602	Ditto
3609	Remove DstReg and just directly refer to the physical register
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
95	Should use MCRegister for Reg to make it clear it's physical.

Changes as per further review comments:

First, I would like to make sure that all the *necessary* foundational issues are handled before discussing the crux of the assertion failure that we hit if we do not assign a register class to the virtual live-in register for SGPR01.
So, I have taken care of all your previous review comments here, and I have deliberately put assertions about physical registers where they are expected.
Let's make sure these changes are to your satisfaction before moving to the next point of discussion. Please let me know if there are any remaining issues I need to take care of before we discuss the assertion failure.

Harbormaster completed remote builds in B46885: Diff 245576.Feb 19 2020, 10:00 PM

hsmhsm marked 6 inline comments as done.Feb 19 2020, 10:04 PM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2315–2316	I have taken care of it. However, we hit an assertion if we do not set the register class. Let's assume for now that this is what is expected here, and discuss the assertion issue later, once we are fine with all the foundational changes.
2321–2324	taken care
2348	taken care
2357–2359	taken care
3602	taken care
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
95	buildCopy does not seem to work with MCRegister, hence I added an assertion to ensure that PhyReg is indeed a physical register.

arsenm added inline comments.Feb 20 2020, 4:26 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2321–2324	This is halfway there. You're still passing in both the live in physical register and the corresponding virtual register. I was thinking you would call getLiveInRegister with just the physical register, and it would be responsible for finding out the virtual register, and inserting the entry block copy by calling insertLiveInCopy. loadInputValue then wouldn't need to worry about ensuring the entry block copy was inserted like it does now.
3607	Physical registers don't have types. This will just assert. Here you know the type is just LLT::pointer(CONSTANT_ADDRESS, 64)
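
The interface described for lines 2321–2324 (pass in only the physical register, get back the live-in virtual register, and have the entry-block copy inserted at most once) can be modelled in a few lines. This is a hedged sketch, not the patch itself: ToyRegInfo and getOrCreateLiveIn are hypothetical names, registers are plain integers, and instructions are strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of per-function register state; not MachineRegisterInfo.
struct ToyRegInfo {
  std::map<unsigned, unsigned> LiveIns; // physical id -> virtual id
  std::set<unsigned> Defined;           // virtual ids that already have a def
  std::vector<std::string> EntryBlock;  // entry-block instructions, in order
  unsigned NextVirt = 1000;             // toy numbering for virtual ids
};

// Takes only the physical register. Finds or creates the matching live-in
// virtual register and makes sure the entry block defines it exactly once.
unsigned getOrCreateLiveIn(ToyRegInfo &MRI, unsigned PhysReg) {
  unsigned LiveIn;
  auto It = MRI.LiveIns.find(PhysReg);
  if (It != MRI.LiveIns.end()) {
    LiveIn = It->second;
  } else {
    LiveIn = MRI.NextVirt++;
    MRI.LiveIns[PhysReg] = LiveIn;
  }

  // Insert the copy whenever the virtual register has no def yet, whether or
  // not it was created by this call.
  if (MRI.Defined.insert(LiveIn).second) {
    MRI.EntryBlock.insert(MRI.EntryBlock.begin(),
                          "%" + std::to_string(LiveIn) + " = COPY $" +
                              std::to_string(PhysReg));
  }
  return LiveIn;
}
```

Because the copy check is keyed on whether the virtual register has a definition rather than on whether it was just created, a live-in that was registered earlier without a copy still gets one on first use.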

Take care of further review comments by arsenm.

hsmhsm marked an inline comment as done.Feb 22 2020, 10:28 PM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2321–2324	Taken care

Harbormaster completed remote builds in B47099: Diff 246097.Feb 22 2020, 10:34 PM

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in, as it internally computes it from the doorbell ID available from a GETREG together with a doorbell-to-queue mapping maintained by the ROCm Runtime that is accessible through the TMB register.

So should that change be reflected in this patch to simplify things? This does change the ABI, so the HSA ABI number needs to be incremented in the ELF header and the AMDGPUUsage documentation updated. However, from the compiler's point of view it is not ABI breaking, as old code will still work: it just generates code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

t-tye added a reviewer: kzhuravl.Feb 23 2020, 7:09 AM

In D74688#1888262, @t-tye wrote:

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in, as it internally computes it from the doorbell ID available from a GETREG together with a doorbell-to-queue mapping maintained by the ROCm Runtime that is accessible through the TMB register.

So should that change be reflected in this patch to simplify things? This does change the ABI, so the HSA ABI number needs to be incremented in the ELF header and the AMDGPUUsage documentation updated. However, from the compiler's point of view it is not ABI breaking, as old code will still work: it just generates code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

Hi Tony,

If you are asking specifically about the llvm.debugtrap() intrinsic, we have already taken care of it: we do not add queue_ptr in that case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr discussion here is specific to the llvm.trap() intrinsic.

LGTM

This revision is now accepted and ready to land.Feb 26 2020, 7:07 PM

Couple of additional required changes

[1] SGPR01 is a new register created within legalizeTrapIntrinsic(), and we actually need to get its type as LLT::scalar(64).
[2] Call to insertLiveInCopy() within getLiveInRegister() is only required if the copy is required from physical to virtual register.
[3] When call to insertLiveInCopy() within getLiveInRegister() is required, make sure that it is called irrespective of whether the virtual register is newly created within getLiveInRegister() or not.

Harbormaster completed remote builds in B47689: Diff 247447.Feb 29 2020, 9:05 AM

In D74688#1888777, @hsmhsm wrote:

Hi Tony,

If you are asking specifically about the llvm.debugtrap() intrinsic, we have already taken care of it: we do not add queue_ptr in that case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr discussion here is specific to the llvm.trap() intrinsic.

Looking at AMDGPULegalizerInfo.cpp:3602 it appears the queue_ptr is still being set up so not sure what you mean that it is already being taken care of. What I describe is true for both llvm.trap() and llvm.debugtrap(). Should the setup of the queue_ptr be removed, since it is no longer needed to support either llvm.trap() or llvm.debugtrap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.

In D74688#1900248, @t-tye wrote:

Looking at AMDGPULegalizerInfo.cpp:3602 it appears the queue_ptr is still being set up so not sure what you mean that it is already being taken care of.

The line number has changed with the latest changes. I was referring to the function AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic(), assuming that you were referring only to the debugtrap() intrinsic.

What I describe is true for both llvm.trap() and llvm.debugtrap(). Should the setup of the queue_ptr be removed, since it is no longer needed to support either llvm.trap() or llvm.debugtrap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.

I suggest we check in this change so that trap and debugtrap support look similar in both the ISelDAG and GlobalISel paths (by setting up queue_ptr) for now. We should then take up what you mention above (removing queue_ptr) as a separate activity, first in the ISelDAG path and then in the GlobalISel path.

In D74688#1900255, @hsmhsm wrote:

The line number has changed with the latest changes. I was referring to the function AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic(), assuming that you were referring only to the debugtrap() intrinsic.

OK I see that now. Thanks.

I suggest we check in this change so that trap and debugtrap support look similar in both the ISelDAG and GlobalISel paths (by setting up queue_ptr) for now. We should then take up what you mention above (removing queue_ptr) as a separate activity, first in the ISelDAG path and then in the GlobalISel path.

OK that sounds reasonable. Thanks.

arsenm requested changes to this revision.Mar 2 2020, 12:03 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3606	I would just use the literal register value below rather than have a variable for it. At least make this a const Register
3607	The type here should really be LLT::pointer(CONSTANT_ADDRESS, 64)

This revision now requires changes to proceed.Mar 2 2020, 12:03 PM

Take care of latest review comments by matt about type setting of virtual register.

Harbormaster failed remote builds in B47822: Diff 247712!Mar 2 2020, 1:19 PM

arsenm accepted this revision.Mar 4 2020, 12:23 PM

This revision is now accepted and ready to land.Mar 4 2020, 12:23 PM

Closed by commit rG3fda1fde8f7b: AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics (authored by hsmhsm).Mar 4 2020, 7:03 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h	16 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	169 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll	16 lines

Diff 248372

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h

... (83 lines not shown) ...	public:
bool legalizeFExp(MachineInstr &MI, MachineIRBuilder &B) const;		bool legalizeFExp(MachineInstr &MI, MachineIRBuilder &B) const;
bool legalizeFPow(MachineInstr &MI, MachineIRBuilder &B) const;		bool legalizeFPow(MachineInstr &MI, MachineIRBuilder &B) const;
bool legalizeFFloor(MachineInstr &MI, MachineRegisterInfo &MRI,		bool legalizeFFloor(MachineInstr &MI, MachineRegisterInfo &MRI,
MachineIRBuilder &B) const;		MachineIRBuilder &B) const;

bool legalizeBuildVector(MachineInstr &MI, MachineRegisterInfo &MRI,		bool legalizeBuildVector(MachineInstr &MI, MachineRegisterInfo &MRI,
MachineIRBuilder &B) const;		MachineIRBuilder &B) const;

Register getLiveInRegister(MachineRegisterInfo &MRI,		Register getLiveInRegister(MachineIRBuilder &B, MachineRegisterInfo &MRI,
Register Reg, LLT Ty) const;		Register PhyReg, LLT Ty,
		bool InsertLiveInCopy = true) const;
		Register insertLiveInCopy(MachineIRBuilder &B, MachineRegisterInfo &MRI,
		Register LiveIn, Register PhyReg) const;
		const ArgDescriptor *
		getArgDescriptor(MachineIRBuilder &B,
		AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;
bool loadInputValue(Register DstReg, MachineIRBuilder &B,		bool loadInputValue(Register DstReg, MachineIRBuilder &B,
const ArgDescriptor *Arg) const;		const ArgDescriptor *Arg) const;
bool legalizePreloadedArgIntrin(		bool legalizePreloadedArgIntrin(
MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B,		MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B,
AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;		AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;

bool legalizeUDIV_UREM(MachineInstr &MI, MachineRegisterInfo &MRI,		bool legalizeUDIV_UREM(MachineInstr &MI, MachineRegisterInfo &MRI,
MachineIRBuilder &B) const;		MachineIRBuilder &B) const;
... (55 lines not shown) ...	public:

bool legalizeSBufferLoad(		bool legalizeSBufferLoad(
MachineInstr &MI, MachineIRBuilder &B,		MachineInstr &MI, MachineIRBuilder &B,
GISelChangeObserver &Observer) const;		GISelChangeObserver &Observer) const;

bool legalizeAtomicIncDec(MachineInstr &MI, MachineIRBuilder &B,		bool legalizeAtomicIncDec(MachineInstr &MI, MachineIRBuilder &B,
bool IsInc) const;		bool IsInc) const;

		bool legalizeTrapIntrinsic(MachineInstr &MI, MachineRegisterInfo &MRI,
		MachineIRBuilder &B) const;
		bool legalizeDebugTrapIntrinsic(MachineInstr &MI, MachineRegisterInfo &MRI,
		MachineIRBuilder &B) const;

bool legalizeIntrinsic(MachineInstr &MI, MachineIRBuilder &B,		bool legalizeIntrinsic(MachineInstr &MI, MachineIRBuilder &B,
GISelChangeObserver &Observer) const override;		GISelChangeObserver &Observer) const override;
};		};
} // End llvm namespace.		} // End llvm namespace.
#endif		#endif

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

... (2,249 lines not shown) ...	if (Next != MI.getParent()->end()) {
if (Next->getOpcode() != AMDGPU::G_BR)		if (Next->getOpcode() != AMDGPU::G_BR)
return nullptr;		return nullptr;
Br = &*Next;		Br = &*Next;
}		}

return &UseMI;		return &UseMI;
}		}

Register AMDGPULegalizerInfo::getLiveInRegister(MachineRegisterInfo &MRI,		Register AMDGPULegalizerInfo::insertLiveInCopy(MachineIRBuilder &B,
Register Reg, LLT Ty) const {		MachineRegisterInfo &MRI,
Register LiveIn = MRI.getLiveInVirtReg(Reg);		Register LiveIn,
if (LiveIn)		Register PhyReg) const {
		assert(PhyReg.isPhysical() && "Physical register expected");

		// Insert the live-in copy, if required, by defining destination virtual
		// register.
		// FIXME: It seems EmitLiveInCopies isn't called anywhere?
		if (!MRI.getVRegDef(LiveIn)) {
		// FIXME: Should have scoped insert pt
		MachineBasicBlock &OrigInsBB = B.getMBB();
		auto OrigInsPt = B.getInsertPt();

		MachineBasicBlock &EntryMBB = B.getMF().front();
		EntryMBB.addLiveIn(PhyReg);
		B.setInsertPt(EntryMBB, EntryMBB.begin());
		B.buildCopy(LiveIn, PhyReg);

		B.setInsertPt(OrigInsBB, OrigInsPt);
		}

		return LiveIn;
		}

		Register AMDGPULegalizerInfo::getLiveInRegister(MachineIRBuilder &B,
		MachineRegisterInfo &MRI,
		Register PhyReg, LLT Ty,
		bool InsertLiveInCopy) const {
		assert(PhyReg.isPhysical() && "Physical register expected");

		// Get or create virtual live-in regester
		Register LiveIn = MRI.getLiveInVirtReg(PhyReg);
		if (!LiveIn) {
		LiveIn = MRI.createGenericVirtualRegister(Ty);
		MRI.addLiveIn(PhyReg, LiveIn);
		}

		// When the actual true copy required is from virtual register to physical
		// register (to be inserted later), live-in copy insertion from physical
		// to register virtual register is not required
		if (!InsertLiveInCopy)
return LiveIn;		return LiveIn;

Register NewReg = MRI.createGenericVirtualRegister(Ty);		return insertLiveInCopy(B, MRI, LiveIn, PhyReg);
MRI.addLiveIn(Reg, NewReg);		}
return NewReg;
		const ArgDescriptor *AMDGPULegalizerInfo::getArgDescriptor(
		MachineIRBuilder &B, AMDGPUFunctionArgInfo::PreloadedValue ArgType) const {
		const SIMachineFunctionInfo *MFI = B.getMF().getInfo<SIMachineFunctionInfo>();
		const ArgDescriptor *Arg;
		const TargetRegisterClass *RC;
		std::tie(Arg, RC) = MFI->getPreloadedValue(ArgType);
		if (!Arg) {
		LLVM_DEBUG(dbgs() << "Required arg register missing\n");
		return nullptr;
		}
		return Arg;
}		}

bool AMDGPULegalizerInfo::loadInputValue(Register DstReg, MachineIRBuilder &B,		bool AMDGPULegalizerInfo::loadInputValue(Register DstReg, MachineIRBuilder &B,
const ArgDescriptor *Arg) const {		const ArgDescriptor *Arg) const {
if (!Arg->isRegister() \|\| !Arg->getRegister().isValid())		if (!Arg->isRegister() \|\| !Arg->getRegister().isValid())
return false; // TODO: Handle these		return false; // TODO: Handle these

assert(Arg->getRegister().isPhysical());		Register SrcReg = Arg->getRegister();
		assert(SrcReg.isPhysical() && "Physical register expected");
		assert(DstReg.isVirtual() && "Virtual register expected");

MachineRegisterInfo &MRI = *B.getMRI();		MachineRegisterInfo &MRI = *B.getMRI();

LLT Ty = MRI.getType(DstReg);		LLT Ty = MRI.getType(DstReg);
Register LiveIn = getLiveInRegister(MRI, Arg->getRegister(), Ty);		Register LiveIn = getLiveInRegister(B, MRI, SrcReg, Ty);

if (Arg->isMasked()) {		if (Arg->isMasked()) {
// TODO: Should we try to emit this once in the entry block?		// TODO: Should we try to emit this once in the entry block?
const LLT S32 = LLT::scalar(32);		const LLT S32 = LLT::scalar(32);
const unsigned Mask = Arg->getMask();		const unsigned Mask = Arg->getMask();
const unsigned Shift = countTrailingZeros<unsigned>(Mask);		const unsigned Shift = countTrailingZeros<unsigned>(Mask);

Register AndMaskSrc = LiveIn;		Register AndMaskSrc = LiveIn;

if (Shift != 0) {		if (Shift != 0) {
auto ShiftAmt = B.buildConstant(S32, Shift);		auto ShiftAmt = B.buildConstant(S32, Shift);
AndMaskSrc = B.buildLShr(S32, LiveIn, ShiftAmt).getReg(0);		AndMaskSrc = B.buildLShr(S32, LiveIn, ShiftAmt).getReg(0);
}		}

B.buildAnd(DstReg, AndMaskSrc, B.buildConstant(S32, Mask >> Shift));		B.buildAnd(DstReg, AndMaskSrc, B.buildConstant(S32, Mask >> Shift));
} else		} else {
B.buildCopy(DstReg, LiveIn);		B.buildCopy(DstReg, LiveIn);

// Insert the argument copy if it doens't already exist.
// FIXME: It seems EmitLiveInCopies isn't called anywhere?
if (!MRI.getVRegDef(LiveIn)) {
// FIXME: Should have scoped insert pt
MachineBasicBlock &OrigInsBB = B.getMBB();
auto OrigInsPt = B.getInsertPt();

MachineBasicBlock &EntryMBB = B.getMF().front();
EntryMBB.addLiveIn(Arg->getRegister());
B.setInsertPt(EntryMBB, EntryMBB.begin());
B.buildCopy(LiveIn, Arg->getRegister());

B.setInsertPt(OrigInsBB, OrigInsPt);
}		}

return true;		return true;
}		}

bool AMDGPULegalizerInfo::legalizePreloadedArgIntrin(		bool AMDGPULegalizerInfo::legalizePreloadedArgIntrin(
MachineInstr &MI,		MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B,
MachineRegisterInfo &MRI,
MachineIRBuilder &B,
AMDGPUFunctionArgInfo::PreloadedValue ArgType) const {		AMDGPUFunctionArgInfo::PreloadedValue ArgType) const {
B.setInstr(MI);		B.setInstr(MI);

const SIMachineFunctionInfo *MFI = B.getMF().getInfo<SIMachineFunctionInfo>();		const ArgDescriptor *Arg = getArgDescriptor(B, ArgType);
		if (!Arg)
		return false;

const ArgDescriptor *Arg;		if (!loadInputValue(MI.getOperand(0).getReg(), B, Arg))
const TargetRegisterClass *RC;
std::tie(Arg, RC) = MFI->getPreloadedValue(ArgType);
if (!Arg) {
LLVM_DEBUG(dbgs() << "Required arg register missing\n");
return false;		return false;
}

if (loadInputValue(MI.getOperand(0).getReg(), B, Arg)) {
MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}

return false;
}

bool AMDGPULegalizerInfo::legalizeFDIV(MachineInstr &MI,		bool AMDGPULegalizerInfo::legalizeFDIV(MachineInstr &MI,
MachineRegisterInfo &MRI,		MachineRegisterInfo &MRI,
MachineIRBuilder &B) const {		MachineIRBuilder &B) const {
B.setInstr(MI);		B.setInstr(MI);
Register Dst = MI.getOperand(0).getReg();		Register Dst = MI.getOperand(0).getReg();
LLT DstTy = MRI.getType(Dst);		LLT DstTy = MRI.getType(Dst);
LLT S16 = LLT::scalar(16);		LLT S16 = LLT::scalar(16);
LLT S32 = LLT::scalar(32);		LLT S32 = LLT::scalar(32);
[1,204 lines not shown]	if (!isPowerOf2_32(Size)) {
else		else
Helper.widenScalarDst(MI, getPow2ScalarType(Ty), 0);		Helper.widenScalarDst(MI, getPow2ScalarType(Ty), 0);
}		}

Observer.changedInstr(MI);		Observer.changedInstr(MI);
return true;		return true;
}		}

		bool AMDGPULegalizerInfo::legalizeTrapIntrinsic(MachineInstr &MI,
		MachineRegisterInfo &MRI,
		MachineIRBuilder &B) const {
		B.setInstr(MI);

		// Is non-HSA path or trap-handler disabled? then, insert s_endpgm instruction
		if (ST.getTrapHandlerAbi() != GCNSubtarget::TrapHandlerAbiHsa ||
		!ST.isTrapHandlerEnabled()) {
		B.buildInstr(AMDGPU::S_ENDPGM).addImm(0);
		} else {
		// Pass queue pointer to trap handler as input, and insert trap instruction
		// Reference: https://llvm.org/docs/AMDGPUUsage.html#trap-handler-abi
		const ArgDescriptor *Arg =
		getArgDescriptor(B, AMDGPUFunctionArgInfo::QUEUE_PTR);
		arsenm (Not Done): Ditto
		hsmhsm (Author, Done): Taken care of.
		if (!Arg)
		return false;
		MachineRegisterInfo &MRI = *B.getMRI();
		arsenm (Done): This is directly copying the physical register into another physical register. We don't want that. This needs to go through an intermediate virtual register copy.
		arsenm (Done): Can you add another assert in loadInputValue? It already asserts assert(Arg->getRegister().isPhysical()), but should also assert the result is virtual.
		Register SGPR01(AMDGPU::SGPR0_SGPR1);
		arsenm (Not Done): loadInputValue calls getLiveInRegister, you shouldn't need it here
		hsmhsm (Author, Done): My understanding of your earlier review comments is: (1) loadInputValue() should expect the source register to be a physical register and the destination register to be a virtual register, and we should have assertions for both. The assertion for the latter was missing, so I added it as you suggested. (2) Given that, it is the responsibility of the caller of loadInputValue() to pass a virtual register as the destination. Hence, I created a destination virtual register here by calling getLiveInRegister(). The other call, inside loadInputValue(), is for the source physical register. I don't see what I am missing here.
		arsenm (Not Done): loadInputValue loads a physical register copy into a virtual register. It was written for use by the intrinsics, which already have a virtual register result to write to. In this case you don't have or really need one. What you really want is a version of getLiveInRegister that ensures the copy is inserted, as the second half of loadInputValue does. What you have now happens to work, I think, but confusingly, since the loadInputValue call will insert it. You may want to split this part into a helper function:
		    // Insert the argument copy if it doens't already exist.
		    // FIXME: It seems EmitLiveInCopies isn't called anywhere?
		    if (!MRI.getVRegDef(LiveIn)) {
		      // FIXME: Should have scoped insert pt
		      MachineBasicBlock &OrigInsBB = B.getMBB();
		      auto OrigInsPt = B.getInsertPt();
		      MachineBasicBlock &EntryMBB = B.getMF().front();
		      EntryMBB.addLiveIn(Arg->getRegister());
		      B.setInsertPt(EntryMBB, EntryMBB.begin());
		      B.buildCopy(LiveIn, Arg->getRegister());
		      B.setInsertPt(OrigInsBB, OrigInsPt);
		    }
		hsmhsm (Author, Done): I have taken care of it. Now, before making the copy to the destination virtual register, we ensure that it is defined.
		arsenm (Not Done): I would just use the literal register value below rather than have a variable for it. At least make this a const Register.
		Register LiveIn = getLiveInRegister(
		arsenm (Not Done): Why do you need to set the class? It shouldn't be necessary at this point
		hsmhsm (Author, Done): We need to set a register class for the destination virtual register; otherwise, we get an assertion failure during compilation saying that the register type is not valid. If it is not supposed to be set here, where else should I set it? Or do I need to handle it in a different way? Please explain in more detail.
		arsenm (Not Done): The register class is only tangentially related to the type. I think you somehow ended up creating a virtual register without setting the type on it. Did something already add the queue ptr as a live in before this lowering? I also would not call this VDstReg, as that makes it sound like a VGPR.
		hsmhsm (Author, Done): I have taken care of it. I just needed another version of getLiveInRegister() which works with TargetRegisterClass.
		arsenm (Not Done): Physical registers don't have types. This will just assert. Here you know the type is just LLT::pointer(CONSTANT_ADDRESS, 64)
		arsenm (Not Done): The type here should really be LLT::pointer(CONSTANT_ADDRESS, 64)
		B, MRI, SGPR01, LLT::pointer(AMDGPUAS::CONSTANT_ADDRESS, 64),
		/*InsertLiveInCopy=*/false);
		arsenm (Not Done): Remove DstReg and just directly refer to the physical register
		if (!loadInputValue(LiveIn, B, Arg))
		return false;
		B.buildCopy(SGPR01, LiveIn);
		B.buildInstr(AMDGPU::S_TRAP)
		.addImm(GCNSubtarget::TrapIDLLVMTrap)
		.addReg(SGPR01, RegState::Implicit);
		}

		MI.eraseFromParent();
		return true;
		}
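
The inline comments on legalizeTrapIntrinsic ask for a helper that guarantees the live-in copy is inserted in the entry block exactly once, and the quoted code carries a FIXME about a scoped insert point. A minimal, self-contained sketch of that idea follows. It is a toy model only: ToyBlock, ToyBuilder, ToyFunction, ScopedInsertPt and getOrInsertLiveInCopy are hypothetical names invented for illustration and are not LLVM classes or the code that was committed.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>

// Toy stand-ins for a basic block and an instruction builder (not LLVM types).
struct ToyBlock {
  std::list<std::string> Insts; // std::list keeps iterators stable on insert
};

struct ToyBuilder {
  ToyBlock *Block = nullptr;
  std::list<std::string>::iterator Pos{};

  void setInsertPt(ToyBlock &BB, std::list<std::string>::iterator It) {
    Block = &BB;
    Pos = It;
  }

  // Inserts before Pos; Pos itself is unchanged, so later emits follow in order.
  void emit(std::string Text) {
    assert(Block && "no insertion point set");
    Block->Insts.insert(Pos, std::move(Text));
  }
};

// RAII guard: saves the builder's insertion point and restores it on scope exit.
class ScopedInsertPt {
  ToyBuilder &Builder;
  ToyBlock *SavedBlock;
  std::list<std::string>::iterator SavedPos;

public:
  explicit ScopedInsertPt(ToyBuilder &TheBuilder)
      : Builder(TheBuilder), SavedBlock(TheBuilder.Block),
        SavedPos(TheBuilder.Pos) {}
  ScopedInsertPt(const ScopedInsertPt &) = delete;
  ScopedInsertPt &operator=(const ScopedInsertPt &) = delete;
  ~ScopedInsertPt() {
    Builder.Block = SavedBlock;
    Builder.Pos = SavedPos;
  }
};

struct ToyFunction {
  ToyBlock Entry;
  std::map<std::string, std::string> LiveIns; // physical reg -> virtual reg
  unsigned NextVReg = 0;
};

// Returns the virtual register holding PhysReg on entry. The copy is emitted
// at the top of the entry block the first time only; later calls reuse it.
std::string getOrInsertLiveInCopy(ToyFunction &F, ToyBuilder &B,
                                  const std::string &PhysReg) {
  auto It = F.LiveIns.find(PhysReg);
  if (It != F.LiveIns.end())
    return It->second;

  std::string VReg = "%" + std::to_string(F.NextVReg++);
  F.LiveIns.emplace(PhysReg, VReg);

  ScopedInsertPt Guard(B); // caller's insertion point is restored on return
  B.setInsertPt(F.Entry, F.Entry.Insts.begin());
  B.emit(VReg + " = COPY " + PhysReg);
  return VReg;
}
```

With a helper of this shape, the trap lowering would only need to ask for the live-in virtual register and copy it into the fixed physical register expected by the trap handler; calling the helper a second time for the same physical register does not add a second entry-block copy.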

		bool AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic(
		MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B) const {
		B.setInstr(MI);

		// Is non-HSA path or trap-handler disabled? then, report a warning
		// accordingly
		if (ST.getTrapHandlerAbi() != GCNSubtarget::TrapHandlerAbiHsa ||
		!ST.isTrapHandlerEnabled()) {
		DiagnosticInfoUnsupported NoTrap(B.getMF().getFunction(),
		"debugtrap handler not supported",
		MI.getDebugLoc(), DS_Warning);
		LLVMContext &Ctx = B.getMF().getFunction().getContext();
		Ctx.diagnose(NoTrap);
		} else {
		// Insert debug-trap instruction
		B.buildInstr(AMDGPU::S_TRAP).addImm(GCNSubtarget::TrapIDLLVMDebugTrap);
		}

		MI.eraseFromParent();
		return true;
		}
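
The two lowering functions above share one condition: the trap handler is used only when the subtarget has the HSA trap handler ABI and the handler is enabled. The decision can be summarised as a small pure function. This is a hedged sketch for illustration: ToySubtarget, TrapKind, TrapLowering and selectTrapLowering are hypothetical names, not part of the patch or of LLVM.

```cpp
#include <cassert>

// Toy summary of the subtarget queries used by the patch (not GCNSubtarget).
struct ToySubtarget {
  bool IsHsaTrapAbi;       // models getTrapHandlerAbi() == TrapHandlerAbiHsa
  bool TrapHandlerEnabled; // models isTrapHandlerEnabled()
};

enum class TrapKind { Trap, DebugTrap };

enum class TrapLowering {
  EndProgram,       // llvm.trap without a usable handler: S_ENDPGM
  TrapWithQueuePtr, // llvm.trap with handler: queue pointer in SGPR0_SGPR1, S_TRAP
  DebugTrapInstr,   // llvm.debugtrap with handler: S_TRAP with the debug trap ID
  WarnOnly          // llvm.debugtrap without handler: warning, no instruction
};

TrapLowering selectTrapLowering(const ToySubtarget &ST, TrapKind Kind) {
  const bool HandlerUsable = ST.IsHsaTrapAbi && ST.TrapHandlerEnabled;
  if (Kind == TrapKind::Trap)
    return HandlerUsable ? TrapLowering::TrapWithQueuePtr
                         : TrapLowering::EndProgram;
  assert(Kind == TrapKind::DebugTrap && "unexpected trap kind");
  return HandlerUsable ? TrapLowering::DebugTrapInstr : TrapLowering::WarnOnly;
}
```

In both functions the original intrinsic instruction is erased afterwards regardless of which branch was taken, so the WarnOnly case leaves no instruction behind.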

bool AMDGPULegalizerInfo::legalizeIntrinsic(MachineInstr &MI,		bool AMDGPULegalizerInfo::legalizeIntrinsic(MachineInstr &MI,
MachineIRBuilder &B,		MachineIRBuilder &B,
GISelChangeObserver &Observer) const {		GISelChangeObserver &Observer) const {
MachineRegisterInfo &MRI = *B.getMRI();		MachineRegisterInfo &MRI = *B.getMRI();

// Replace the use G_BRCOND with the exec manipulate and branch pseudos.		// Replace the use G_BRCOND with the exec manipulate and branch pseudos.
auto IntrID = MI.getIntrinsicID();		auto IntrID = MI.getIntrinsicID();
switch (IntrID) {		switch (IntrID) {
[158 lines not shown]	bool AMDGPULegalizerInfo::legalizeIntrinsic(MachineInstr &MI,
case Intrinsic::amdgcn_struct_buffer_atomic_dec:		case Intrinsic::amdgcn_struct_buffer_atomic_dec:
case Intrinsic::amdgcn_raw_buffer_atomic_cmpswap:		case Intrinsic::amdgcn_raw_buffer_atomic_cmpswap:
case Intrinsic::amdgcn_struct_buffer_atomic_cmpswap:		case Intrinsic::amdgcn_struct_buffer_atomic_cmpswap:
return legalizeBufferAtomic(MI, B, IntrID);		return legalizeBufferAtomic(MI, B, IntrID);
case Intrinsic::amdgcn_atomic_inc:		case Intrinsic::amdgcn_atomic_inc:
return legalizeAtomicIncDec(MI, B, true);		return legalizeAtomicIncDec(MI, B, true);
case Intrinsic::amdgcn_atomic_dec:		case Intrinsic::amdgcn_atomic_dec:
return legalizeAtomicIncDec(MI, B, false);		return legalizeAtomicIncDec(MI, B, false);
		case Intrinsic::trap:
		return legalizeTrapIntrinsic(MI, MRI, B);
		case Intrinsic::debugtrap:
		return legalizeDebugTrapIntrinsic(MI, MRI, B);
default: {		default: {
if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
AMDGPU::getImageDimIntrinsicInfo(IntrID))		AMDGPU::getImageDimIntrinsicInfo(IntrID))
return legalizeImageIntrinsic(MI, B, Observer, ImageDimIntr);		return legalizeImageIntrinsic(MI, B, Observer, ImageDimIntr);
return true;		return true;
}		}
}		}

return true;		return true;
}		}

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll

This file was added.

				; Runs original SDAG test with -global-isel
				; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -verify-machineinstrs < %S/../trap.ll | FileCheck -check-prefix=GCN -check-prefix=HSA-TRAP -enable-var-scope %S/../trap.ll

				; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mattr=+trap-handler -verify-machineinstrs < %S/../trap.ll | FileCheck -check-prefix=GCN -check-prefix=HSA-TRAP -enable-var-scope %S/../trap.ll
				; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %S/../trap.ll | FileCheck -check-prefix=GCN -check-prefix=NO-HSA-TRAP -enable-var-scope %S/../trap.ll
				; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mattr=-trap-handler -verify-machineinstrs < %S/../trap.ll 2>&1 | FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING -enable-var-scope %S/../trap.ll

				; enable trap handler feature
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -mattr=+trap-handler -verify-machineinstrs < %S/../trap.ll | FileCheck -check-prefix=GCN -check-prefix=NO-MESA-TRAP -check-prefix=TRAP-BIT -check-prefix=MESA-TRAP -enable-var-scope %S/../trap.ll
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -mattr=+trap-handler -verify-machineinstrs < %S/../trap.ll 2>&1 | FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING -check-prefix=TRAP-BIT -enable-var-scope %S/../trap.ll

				; disable trap handler feature
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -mattr=-trap-handler -verify-machineinstrs < %S/../trap.ll | FileCheck -check-prefix=GCN -check-prefix=NO-MESA-TRAP -check-prefix=NO-TRAP-BIT -check-prefix=NOMESA-TRAP -enable-var-scope %S/../trap.ll
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -mattr=-trap-handler -verify-machineinstrs < %S/../trap.ll 2>&1 | FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING -check-prefix=NO-TRAP-BIT -enable-var-scope %S/../trap.ll

				; RUN: llc -global-isel -march=amdgcn -verify-machineinstrs < %S/../trap.ll 2>&1 | FileCheck -check-prefix=GCN -check-prefix=GCN-WARNING -enable-var-scope %S/../trap.ll
				arsenm (Done): Can you add another test that uses the stack? Ideally we would also have a test in a non-kernel function, but I know that won't work right now since we don't handle the special argument inputs yet.
				arsenm (Not Done): Actually, never mind. A non-kernel function should work. Calling it will not.
				arsenm (Not Done): Should still add a function test. Also, one that has a separate, explicit use of the queue.ptr intrinsic wouldn't hurt either.
				hsmhsm (Author, Done): Actually, I am a bit confused about writing test cases here. Let's start from scratch. (1) As you suggested, I have shared the original DAG test with GlobalISel instead of re-inventing it for GlobalISel. I was making some mistakes earlier, which I have since understood and corrected, and sharing the original DAG test works fine now. (2) Starting from this point, please suggest how I should continue. What additional tests need to be covered here, and why? Do I need to add them to the original DAG test and make sure that both the SelectionDAG path and the GlobalISel path work for them? Or are they specific to the GlobalISel path for some reason?

Revision Contents

Diff 248372

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll