This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3497–3498	This is directly copying the physical register into another physical register. We don't want that. This needs to go through an intermediate virtual register copy
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.debugtrap.ll
4–15	Shouldn't run the DAG tests in the GlobalISel test directory
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Can you add another test that uses the stack? Ideally we would also have a test in a non-kernel function, but I know that won't work right now since we don't handle the special argument inputs yet

arsenm added inline comments. Feb 17 2020, 6:39 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3497–3498	Can you add another assert in loadInputValue? It already asserts assert(Arg->getRegister().isPhysical()), but should also assert the result is virtual

Can you also add a MIR test to make sure the intermediate virtual copy is added? You'll need to explicitly add the queueptr in the MIR MachineFunctionInfo
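The physical/virtual contract requested for loadInputValue can be illustrated with a small stand-alone sketch. ToyRegister and satisfiesLoadInputContract are hypothetical names, not LLVM API; the encoding (0 means "no register", virtual registers carry the top bit) is an assumption modelled loosely on llvm::Register, and stack-slot encodings are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for llvm::Register (hypothetical, not the LLVM class).
// Assumption: 0 means "no register" and virtual registers carry the top bit.
// Stack-slot encodings, which the real class also handles, are ignored here.
class ToyRegister {
  std::uint32_t Id = 0;

public:
  static constexpr std::uint32_t VirtualFlag = 1u << 31;

  ToyRegister() = default;
  explicit ToyRegister(std::uint32_t Raw) : Id(Raw) {}

  static ToyRegister physical(std::uint32_t N) { return ToyRegister(N); }
  static ToyRegister virt(std::uint32_t Index) {
    return ToyRegister(Index | VirtualFlag);
  }

  bool isValid() const { return Id != 0; }
  bool isVirtual() const { return (Id & VirtualFlag) != 0; }
  bool isPhysical() const { return isValid() && !isVirtual(); }
};

// The contract discussed for loadInputValue: the source must be a physical
// register and the destination a virtual one. Returning a bool rather than
// asserting keeps the check easy to exercise in isolation.
bool satisfiesLoadInputContract(ToyRegister Src, ToyRegister Dst) {
  return Src.isPhysical() && Dst.isVirtual();
}
```

In the real function the two conditions would be separate assert() calls, so a physical-to-physical copy such as the one flagged above fails early in assertion-enabled builds.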

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Actually, never mind. A non-kernel function should work. Calling it will not

Take care of review comments by arsenm.

Introduce intermediate virtual register copy
Remove DAG tests in the GlobalISel test directory
Add LLVM IR that uses the stack
Assert that destination register is virtual within loadInputValue()

Harbormaster completed remote builds in B46658: Diff 245020. Feb 17 2020, 11:53 AM

Regarding sharing the original ISelDAG test with GlobalISel: it does not seem to work, since the ISelDAG path has checks like the ones below. Hence, let's keep both tests separate for now.

; MESA-TRAP: .section .AMDGPU.config
; MESA-TRAP: .long 47180
; MESA-TRAP-NEXT: .long 208

I would expect these to be the same in both?

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499	loadInputValue calls getLiveInRegister, you shouldn't need it here
3500	Why do you need to set the class? It shouldn't be necessary at this point
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Should still add a function test. Also one that has a separate, explicit use of the queue.ptr intrinsic wouldn't hurt either

Share the original DAG test with GlobalISel instead of reinventing a new one here.

Harbormaster completed remote builds in B46686: Diff 245088. Feb 17 2020, 11:48 PM

hsmhsm marked 3 inline comments as done. Feb 18 2020, 12:08 AM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499	My understanding of your earlier review comments is that loadInputValue() should expect the source register to be physical and the destination register to be virtual, and that we should have assertions for both. The assertion for the latter was missing, so I added it as you suggested. Given that, it is the responsibility of the caller of loadInputValue() to pass a virtual destination register. Hence, I created a destination virtual register here by calling getLiveInRegister(). The other call, within loadInputValue(), is for the source physical register. I don't see what I am missing here.
3500	We need to set a register class for the destination virtual register; otherwise, we get an assertion failure during compilation saying that the register type is not valid. If it is not supposed to be set here, where else should I set it? Or do I need to handle it in a different way? Please explain in more detail.
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
96	Actually, I am a bit confused about writing test cases here, so let's start from scratch. As you suggested, I now share the original DAG test with GlobalISel instead of reinventing it. I was making some mistakes earlier, which I have corrected, and sharing the original DAG test works fine now. From this point, please suggest how I should continue. What additional tests need to be covered here, and why? Do I need to add them to the original DAG test and make sure that both the ISelDAG path and the GlobalISel path work for them? Or are they specific to the GlobalISel path for some reason?

arsenm added inline comments. Feb 18 2020, 9:55 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499	loadInputValue loads a physical register copy into a virtual register. This was written for use by the intrinsics, which already have a virtual register result to write to. In this case you don't have or really need one. What you really want is a version of getLiveInRegister that ensures the copy is inserted, as the second half of loadInputValue does. What you have now happens to work, I think, but confusingly, since the loadInputValue call will insert it. You may want to split this part into a helper function:

	// Insert the argument copy if it doesn't already exist.
	// FIXME: It seems EmitLiveInCopies isn't called anywhere?
	if (!MRI.getVRegDef(LiveIn)) {
	  // FIXME: Should have scoped insert pt
	  MachineBasicBlock &OrigInsBB = B.getMBB();
	  auto OrigInsPt = B.getInsertPt();

	  MachineBasicBlock &EntryMBB = B.getMF().front();
	  EntryMBB.addLiveIn(Arg->getRegister());
	  B.setInsertPt(EntryMBB, EntryMBB.begin());
	  B.buildCopy(LiveIn, Arg->getRegister());

	  B.setInsertPt(OrigInsBB, OrigInsPt);
	}

3500	The register class is only tangentially related to the type. I think you somehow ended up creating a virtual register without setting the type on it. Did something already add the queue ptr as a live in before this lowering? I also would not call this VDstReg, as that makes it sound like a VGPR.
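The "scoped insert pt" FIXME in the suggested helper refers to saving and restoring the builder's insert point around the entry-block copy. One way this might look is an RAII guard, sketched below on a toy builder. ToyBuilder, ToyBlock, ScopedInsertPt and insertEntryCopy are hypothetical names for illustration only, not MachineIRBuilder API; a std::list stands in for a basic block because its iterators stay valid across insertions.

```cpp
#include <cassert>
#include <list>
#include <string>

// Toy model: a block is a list of instruction strings, and the insert point
// is a (block, iterator) pair. New instructions go in front of the iterator.
using ToyBlock = std::list<std::string>;

struct ToyBuilder {
  ToyBlock *Block = nullptr;
  ToyBlock::iterator Pos;

  void setInsertPt(ToyBlock &B, ToyBlock::iterator P) {
    Block = &B;
    Pos = P;
  }
  void build(const std::string &Inst) { Block->insert(Pos, Inst); }
};

// RAII guard: remembers the builder's insert point on construction and puts
// it back on scope exit, so callers cannot forget the restore step.
class ScopedInsertPt {
  ToyBuilder &B;
  ToyBlock *SavedBlock;
  ToyBlock::iterator SavedPos;

public:
  explicit ScopedInsertPt(ToyBuilder &Builder)
      : B(Builder), SavedBlock(Builder.Block), SavedPos(Builder.Pos) {}
  ~ScopedInsertPt() {
    B.Block = SavedBlock;
    B.Pos = SavedPos;
  }
  ScopedInsertPt(const ScopedInsertPt &) = delete;
  ScopedInsertPt &operator=(const ScopedInsertPt &) = delete;
};

// Emits a copy at the top of the entry block and leaves the builder's
// insert point exactly where the caller had it.
void insertEntryCopy(ToyBuilder &B, ToyBlock &Entry, const std::string &Copy) {
  ScopedInsertPt Guard(B);
  B.setInsertPt(Entry, Entry.begin());
  B.build(Copy);
}
```

The guard replaces the manual OrigInsBB/OrigInsPt bookkeeping, which matters once the helper grows early returns.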

Another possible strategy would be to emit a call to the queue.ptr intrinsic, and allow that to be legalized

Take care of further review comments by arsenm.

Add another version of getLiveInRegister() which works with TargetRegisterClass
While copying to the destination virtual register, make sure that it is defined
A bit of refactoring around the related code

Harbormaster completed remote builds in B46789: Diff 245346. Feb 19 2020, 1:08 AM

hsmhsm marked 2 inline comments as done. Feb 19 2020, 1:12 AM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499	I have taken care of it. Now, before making the copy to the destination virtual register, we ensure that it is defined.
3500	I have taken care of it. I just needed another version of getLiveInRegister() which works with TargetRegisterClass.

arsenm added inline comments. Feb 19 2020, 2:23 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2237–2238	I still don't think you should need the register class set. This should still have the type set on the virtual register
2243–2246	I think it would be better to have this only take the physical register input, and return the livein virtreg. This wouldn't have the // Destination virtual register is already defined, just insert copy case. The intrinsic lowering cases would then be responsible for inserting the extra copy to the expected result
2283	This shouldn't be called twice
2297–2301	Move the assignment up to avoid the weird !()
3495	Ditto
3502	Remove DstReg and just directly refer to the physical register
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
96	Should use MCRegister for Reg to make it clear it's physical.

Changes as per further review comments:

First, I would like to make sure that all the *necessary* foundational issues are handled before discussing the crux of the assertion failure that we hit if we do not assign a register class to the virtual live-in register for SGPR01.
So, I have taken care of all your previous review comments here, and I have deliberately put assertions about physical registers where they are expected.
Let's make sure these changes are to your satisfaction before moving to the next point of discussion. Please let me know if I need to take care of any remaining issues before we discuss the assertion failure.

Harbormaster completed remote builds in B46885: Diff 245576. Feb 19 2020, 10:00 PM

hsmhsm marked 6 inline comments as done. Feb 19 2020, 10:04 PM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2237–2238	I have taken care of it. However, we hit an assertion if we do not set the register class. Let's assume for now that this is what is expected here, and discuss the assertion issue later, once we are fine with all the foundational changes.
2243–2246	taken care
2283	taken care
2297–2301	taken care
3495	taken care
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
96	buildCopy does not seem to work with MCRegister, hence I added an assertion to ensure that PhyReg is indeed a physical register.

arsenm added inline comments. Feb 20 2020, 4:26 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2243–2246	This is halfway there. You're still passing in both the live in physical register and the corresponding virtual register. I was thinking you would call getLiveInRegister with just the physical register, and it would be responsible for finding out the virtual register, and inserting the entry block copy by calling insertLiveInCopy. loadInputValue then wouldn't need to worry about ensuring the entry block copy was inserted like it does now.
3500	Physical registers don't have types. This will just assert. Here you know the type is just LLT::pointer(CONSTANT_ADDRESS, 64)
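The getLiveInRegister shape described above (take only the physical register, look up or create the matching virtual register, and make sure the entry-block copy exists) can be sketched on a toy model. ToyFunction and getLiveInVReg are hypothetical names, registers are plain integers, and the "instructions" are strings; none of this is LLVM API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy function state (hypothetical): the live-in table maps a physical
// register to its virtual register, Defined records which virtual registers
// already have a defining copy, and EntryBlock holds textual instructions.
struct ToyFunction {
  std::map<unsigned, unsigned> LiveIns;
  std::set<unsigned> Defined;
  std::vector<std::string> EntryBlock;
  unsigned NextVReg = 0;
};

// Takes only the physical register. Finds or creates the virtual register,
// then inserts the entry-block copy if that virtual register has no def yet,
// whether or not it was created by this call.
unsigned getLiveInVReg(ToyFunction &F, unsigned PhysReg) {
  unsigned VReg;
  auto It = F.LiveIns.find(PhysReg);
  if (It != F.LiveIns.end()) {
    VReg = It->second;
  } else {
    VReg = F.NextVReg++;
    F.LiveIns.emplace(PhysReg, VReg);
  }

  // insert() reports whether VReg was newly marked as defined.
  if (F.Defined.insert(VReg).second) {
    F.EntryBlock.insert(F.EntryBlock.begin(),
                        "%" + std::to_string(VReg) + " = COPY $p" +
                            std::to_string(PhysReg));
  }
  return VReg;
}
```

Keying the copy on "has no def" rather than on "was just created" covers the case where something else registered the live-in earlier without emitting the copy, and it keeps repeated calls from emitting duplicates.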

Take care of further review comments by arsenm.

hsmhsm marked an inline comment as done. Feb 22 2020, 10:28 PM

hsmhsm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2243–2246	Taken care

Harbormaster completed remote builds in B47099: Diff 246097. Feb 22 2020, 10:34 PM

I believe that the debug_trap trap handler support no longer needs the queue_ptr to be passed in, as it internally computes it from the doorbell ID available from a GETREG, together with a doorbell-to-queue mapping maintained by the ROCm Runtime that is accessible through the TMB register.

So should that change be reflected in this to simplify things? This does change the ABI, so it needs the HSA ABI number to be incremented in the ELF header and the AMDGPUUsage documentation updated. However, from the compiler's point of view it is not ABI breaking, as old code will still work: it just generates code to set the queue_ptr that is unnecessary.

Adding @kzhuravl for ELF ABI version help.

t-tye added a reviewer: kzhuravl. Feb 23 2020, 7:09 AM

Hi Tony,

If you are asking specifically about the llvm.debugtrap() intrinsic, then we have already taken care of it: we are not adding queue_ptr in that case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr discussions here are specific to the llvm.trap() intrinsic.

LGTM

This revision is now accepted and ready to land. Feb 26 2020, 7:07 PM

Couple of additional required changes

[1] SGPR01 is a new register created within legalizeTrapIntrinsic(), and we actually need to get its type as LLT::scalar(64).
[2] Call to insertLiveInCopy() within getLiveInRegister() is only required if the copy is required from physical to virtual register.
[3] When call to insertLiveInCopy() within getLiveInRegister() is required, make sure that it is called irrespective of whether the virtual register is newly created within getLiveInRegister() or not.

Harbormaster completed remote builds in B47689: Diff 247447. Feb 29 2020, 9:05 AM

In D74688#1888777, @hsmhsm wrote:

Hi Tony,

If you are asking specifically about the llvm.debugtrap() intrinsic, then we have already taken care of it: we are not adding queue_ptr in that case, as you can see from the code changes at AMDGPULegalizerInfo.cpp:3602. The queue_ptr discussions here are specific to the llvm.trap() intrinsic.

Looking at AMDGPULegalizerInfo.cpp:3602, it appears the queue_ptr is still being set up, so I am not sure what you mean by it already being taken care of. What I describe is true for both llvm.trap() and llvm.debugtrap(). Should the setup of the queue_ptr be removed, since it is no longer needed to support either llvm.trap() or llvm.debugtrap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.

In D74688#1900248, @t-tye wrote:

Looking at AMDGPULegalizerInfo.cpp:3602 it appears the queue_ptr is still being set up so not sure what you mean that it is already being taken care of.

The line number has changed with the latest changes. I was referring to the function AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic(), assuming that you were referring only to the debugtrap() intrinsic.

What I describe is true for both llvm.trap() and llvm.debugtrap(). Should the setup of the queue_ptr be removed, since it is no longer needed to support either llvm.trap() or llvm.debugtrap()? Doing so would need a change in AMDGPUUsage, a change to the ELF HSA ABI number, and corresponding ROCm loader changes. There may also need to be a plan to support older ROCm releases.

I suggest that we check in this change so that trap and debug trap support look similar in both the ISelDAG and GlobalISel paths (by setting up queue_ptr) for now. We should then take up what you mention above (removing queue_ptr) as a separate activity, first in the ISelDAG path and then in the GlobalISel path.

In D74688#1900255, @hsmhsm wrote:

The line number has changed with the latest changes. I was referring to the function AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic(), assuming that you were referring only to the debugtrap() intrinsic.

OK I see that now. Thanks.

I suggest that we check in this change so that trap and debug trap support look similar in both the ISelDAG and GlobalISel paths (by setting up queue_ptr) for now. We should then take up what you mention above (removing queue_ptr) as a separate activity, first in the ISelDAG path and then in the GlobalISel path.

OK that sounds reasonable. Thanks.

arsenm requested changes to this revision. Mar 2 2020, 12:03 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
3499	I would just use the literal register value below rather than have a variable for it. At least make this a const Register
3500	The type here should really be LLT::pointer(CONSTANT_ADDRESS, 64)

This revision now requires changes to proceed. Mar 2 2020, 12:03 PM

Take care of latest review comments by matt about type setting of virtual register.

Harbormaster failed remote builds in B47822: Diff 247712! Mar 2 2020, 1:19 PM

arsenm accepted this revision. Mar 4 2020, 12:23 PM

This revision is now accepted and ready to land. Mar 4 2020, 12:23 PM

Closed by commit rG3fda1fde8f7b: AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics (authored by hsmhsm). Mar 4 2020, 7:03 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h	9 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	71 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.debugtrap.ll	40 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll	85 lines

Diff 245020

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h

[85 lines hidden] bool legalizeFFloor(MachineInstr &MI, MachineRegisterInfo &MRI,
                       MachineIRBuilder &B) const;

   bool legalizeBuildVector(MachineInstr &MI, MachineRegisterInfo &MRI,
                            MachineIRBuilder &B) const;

   Register getLiveInRegister(MachineRegisterInfo &MRI,
                              Register Reg, LLT Ty) const;

+  const ArgDescriptor *
+  getArgDescriptor(MachineIRBuilder &B,
+                   AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;

   bool loadInputValue(Register DstReg, MachineIRBuilder &B,
                       const ArgDescriptor *Arg) const;
   bool legalizePreloadedArgIntrin(
     MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B,
     AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;

   bool legalizeUDIV_UREM(MachineInstr &MI, MachineRegisterInfo &MRI,
                          MachineIRBuilder &B) const;
[46 lines hidden] public:

   bool legalizeSBufferLoad(
     MachineInstr &MI, MachineIRBuilder &B,
     GISelChangeObserver &Observer) const;

   bool legalizeAtomicIncDec(MachineInstr &MI, MachineIRBuilder &B,
                             bool IsInc) const;

+  bool legalizeTrapIntrinsic(MachineInstr &MI, MachineRegisterInfo &MRI,
+                             MachineIRBuilder &B) const;
+  bool legalizeDebugTrapIntrinsic(MachineInstr &MI, MachineRegisterInfo &MRI,
+                                  MachineIRBuilder &B) const;

   bool legalizeIntrinsic(MachineInstr &MI, MachineIRBuilder &B,
                          GISelChangeObserver &Observer) const override;
 };
 } // End llvm namespace.
 #endif

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

Show First 20 Lines • Show All 2,218 Lines • ▼ Show 20 Lines	Register AMDGPULegalizerInfo::getLiveInRegister(MachineRegisterInfo &MRI,
if (LiveIn)		if (LiveIn)
return LiveIn;		return LiveIn;

Register NewReg = MRI.createGenericVirtualRegister(Ty);		Register NewReg = MRI.createGenericVirtualRegister(Ty);
MRI.addLiveIn(Reg, NewReg);		MRI.addLiveIn(Reg, NewReg);
return NewReg;		return NewReg;
}		}

		const ArgDescriptor *AMDGPULegalizerInfo::getArgDescriptor(
		MachineIRBuilder &B, AMDGPUFunctionArgInfo::PreloadedValue ArgType) const {
		const SIMachineFunctionInfo *MFI = B.getMF().getInfo<SIMachineFunctionInfo>();
		const ArgDescriptor *Arg;
		const TargetRegisterClass *RC;
		std::tie(Arg, RC) = MFI->getPreloadedValue(ArgType);
		if (!Arg) {
		LLVM_DEBUG(dbgs() << "Required arg register missing\n");
		return nullptr;
		}
		return Arg;
		}
		arsenmUnsubmitted Not Done Reply Inline Actions I still don't think you should need the register class set. This should still have the type set on the virtual register arsenm: I still don't think you should need the register class set. This should still have the type set…
		hsmhsmAuthorUnsubmitted Done Reply Inline Actions I have taken care of it. But, we hit with an assertion, if we do not set register class. However, let's assume for now that this is what expected here, and later discuss this assertion issue, once we are fine with all the foundational changes. hsmhsm: I have taken care of it. But, we hit with an assertion, if we do not set register class.

bool AMDGPULegalizerInfo::loadInputValue(Register DstReg, MachineIRBuilder &B,		bool AMDGPULegalizerInfo::loadInputValue(Register DstReg, MachineIRBuilder &B,
const ArgDescriptor *Arg) const {		const ArgDescriptor *Arg) const {
if (!Arg->isRegister() \|\| !Arg->getRegister().isValid())		if (!Arg->isRegister() \|\| !Arg->getRegister().isValid())
return false; // TODO: Handle these		return false; // TODO: Handle these

assert(Arg->getRegister().isPhysical());		assert(Arg->getRegister().isPhysical());
		assert(DstReg.isVirtual());
		arsenmUnsubmitted Not Done Reply Inline Actions I think it would be better to have this only take the physical register input, and return the livein virtreg. This wouldn't have the // Destination virtual register is already defined, just insert copy case. The intrinsic lowering cases would then be responsible for inserting the extra copy to the expected result arsenm: I think it would be better to have this only take the physical register input, and return the…
		hsmhsmAuthorUnsubmitted Done Reply Inline Actions taken care hsmhsm: taken care
		arsenmUnsubmitted Not Done Reply Inline Actions This is halfway there. You're still passing in both the live in physical register and the corresponding virtual register. I was thinking you would call getLiveInRegister with just the physical register, and it would be responsible for finding out the virtual register, and inserting the entry block copy by calling insertLiveInCopy. loadInputValue then wouldn't need to worry about ensuring the entry block copy was inserted like it does now. arsenm: This is halfway there. You're still passing in both the live in physical register and the…
		hsmhsmAuthorUnsubmitted Done Reply Inline Actions Taken care hsmhsm: Taken care

MachineRegisterInfo &MRI = *B.getMRI();		MachineRegisterInfo &MRI = *B.getMRI();

LLT Ty = MRI.getType(DstReg);		LLT Ty = MRI.getType(DstReg);
Register LiveIn = getLiveInRegister(MRI, Arg->getRegister(), Ty);		Register LiveIn = getLiveInRegister(MRI, Arg->getRegister(), Ty);

if (Arg->isMasked()) {		if (Arg->isMasked()) {
// TODO: Should we try to emit this once in the entry block?		// TODO: Should we try to emit this once in the entry block?
Show All 20 Lines	if (!MRI.getVRegDef(LiveIn)) {
auto OrigInsPt = B.getInsertPt();		auto OrigInsPt = B.getInsertPt();

MachineBasicBlock &EntryMBB = B.getMF().front();		MachineBasicBlock &EntryMBB = B.getMF().front();
EntryMBB.addLiveIn(Arg->getRegister());		EntryMBB.addLiveIn(Arg->getRegister());
B.setInsertPt(EntryMBB, EntryMBB.begin());		B.setInsertPt(EntryMBB, EntryMBB.begin());
B.buildCopy(LiveIn, Arg->getRegister());		B.buildCopy(LiveIn, Arg->getRegister());

B.setInsertPt(OrigInsBB, OrigInsPt);		B.setInsertPt(OrigInsBB, OrigInsPt);
}		}
		arsenmUnsubmitted Not Done Reply Inline Actions This shouldn't be called twice arsenm: This shouldn't be called twice
		hsmhsmAuthorUnsubmitted Done Reply Inline Actions taken care hsmhsm: taken care

return true;		return true;
}		}

bool AMDGPULegalizerInfo::legalizePreloadedArgIntrin(		bool AMDGPULegalizerInfo::legalizePreloadedArgIntrin(
MachineInstr &MI,		MachineInstr &MI,
MachineRegisterInfo &MRI,		MachineRegisterInfo &MRI,
MachineIRBuilder &B,		MachineIRBuilder &B,
AMDGPUFunctionArgInfo::PreloadedValue ArgType) const {		AMDGPUFunctionArgInfo::PreloadedValue ArgType) const {
B.setInstr(MI);		B.setInstr(MI);

const SIMachineFunctionInfo *MFI = B.getMF().getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = B.getMF().getInfo<SIMachineFunctionInfo>();

const ArgDescriptor *Arg;		const ArgDescriptor *Arg;
const TargetRegisterClass *RC;		const TargetRegisterClass *RC;
std::tie(Arg, RC) = MFI->getPreloadedValue(ArgType);		std::tie(Arg, RC) = MFI->getPreloadedValue(ArgType);
if (!Arg) {		if (!Arg) {
LLVM_DEBUG(dbgs() << "Required arg register missing\n");		LLVM_DEBUG(dbgs() << "Required arg register missing\n");
		arsenmUnsubmitted Not Done Reply Inline Actions Move the assignment up to avoid the weird !() arsenm: Move the assignment up to avoid the weird !()
		hsmhsmAuthorUnsubmitted Done Reply Inline Actions taken care hsmhsm: taken care
return false;		return false;
}		}

if (loadInputValue(MI.getOperand(0).getReg(), B, Arg)) {		if (loadInputValue(MI.getOperand(0).getReg(), B, Arg)) {
MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}

▲ Show 20 Lines • Show All 1,164 Lines • ▼ Show 20 Lines	if (!isPowerOf2_32(Size)) {
else		else
Helper.widenScalarDst(MI, getPow2ScalarType(Ty), 0);		Helper.widenScalarDst(MI, getPow2ScalarType(Ty), 0);
}		}

Observer.changedInstr(MI);		Observer.changedInstr(MI);
return true;		return true;
}		}

		bool AMDGPULegalizerInfo::legalizeTrapIntrinsic(MachineInstr &MI,
		MachineRegisterInfo &MRI,
		MachineIRBuilder &B) const {
		B.setInstr(MI);

		// Is non-HSA path or trap-handler disabled? then, insert s_endpgm instruction
		if (ST.getTrapHandlerAbi() != GCNSubtarget::TrapHandlerAbiHsa \|\|
		!ST.isTrapHandlerEnabled()) {
		B.buildInstr(AMDGPU::S_ENDPGM).addImm(0);
		} else {
		// Pass queue pointer to trap handler as input, and insert trap instruction
		// Reference: https://llvm.org/docs/AMDGPUUsage.html#trap-handler-abi
		const ArgDescriptor *Arg;
		if (!(Arg = getArgDescriptor(B, AMDGPUFunctionArgInfo::QUEUE_PTR)))
		arsenmUnsubmitted Not Done Reply Inline Actions Ditto arsenm: Ditto
		hsmhsmAuthorUnsubmitted Done Reply Inline Actions taken care hsmhsm: taken care
		return false;
		MachineRegisterInfo &MRI = *B.getMRI();
		Register DstReg(AMDGPU::SGPR0_SGPR1);
		arsenmUnsubmitted Done Reply Inline Actions This is directly copying the physical register into another physical register. We don't want that. This needs to go through an intermediate virtual register copy arsenm: This is directly copying the physical register into another physical register. We don't want…
		arsenmUnsubmitted Done Reply Inline Actions Can you add another assert in loadInputValue? It already asserts assert(Arg->getRegister().isPhysical()), but should also assert the result is virtual arsenm: Can you add another assert in loadInputValue? It already asserts assert(Arg->getRegister().
		Register VDstReg = getLiveInRegister(MRI, DstReg, MRI.getType(DstReg));
		arsenmUnsubmitted Not Done Reply Inline Actions loadInputValue calls getLiveInRegister, you shouldn't need it here arsenm: loadInputValue calls getLiveInRegister, you shouldn't need it here
		hsmhsmAuthorUnsubmitted Done Reply Inline Actions My understanding of your earlier review comments is: loadInputValue() should expect, source register as physical register, and destination register as virtual register, and we should have assertions for these accordingly. The assertion for later was missing, that I added as per your review suggestion. Now, going by above, it is the responsibility of the caller of loadInputValue() to make sure that it passes destination register as virtual register. Hence, I created a destination virtual register here by calling getLiveInRegister(). The another call within loadInputValue() is for source physical register. I am not getting what am I missing here. hsmhsm: My understanding of your earlier review comments is: 1. loadInputValue() should expect…
		arsenmUnsubmitted Not Done Reply Inline Actions loadInputValue loads a physical register copy into a virtual register. This was written for the use by the intrinsics, which have a virtual register result write to already. In this case you don't have or really need one. What you really want is a version of getLiveInRegister that ensures the copy is inserted, as the second half of loadInputValue does. What you have now I think happens to work, but confusingly since the loadInputValue call will insert it You may want to split this part into a helper function: // Insert the argument copy if it doens't already exist. // FIXME: It seems EmitLiveInCopies isn't called anywhere? if (!MRI.getVRegDef(LiveIn)) { // FIXME: Should have scoped insert pt MachineBasicBlock &OrigInsBB = B.getMBB(); auto OrigInsPt = B.getInsertPt(); MachineBasicBlock &EntryMBB = B.getMF().front(); EntryMBB.addLiveIn(Arg->getRegister()); B.setInsertPt(EntryMBB, EntryMBB.begin()); B.buildCopy(LiveIn, Arg->getRegister()); B.setInsertPt(OrigInsBB, OrigInsPt); } arsenm: loadInputValue loads a physical register copy into a virtual register. This was written for the…
		hsmhsm: I have taken care of it. Now, before copying to the destination virtual register, we ensure that it is defined.
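The helper discussed in this thread (insert the live-in copy in the entry block only if the virtual register has no definition yet, then put the builder back where it was) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyBlock`, `ToyBuilder`, `ScopedInsertPt`, and `ensureLiveInCopy` are hypothetical stand-ins invented for illustration and are not LLVM classes or functions. Registers are modelled as strings, with `$` marking a physical register and `%` a virtual one, following MIR notation.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>

// Toy stand-in for a basic block: a list of textual instructions plus the
// set of physical registers that are live on entry. NOT the LLVM API.
struct ToyBlock {
  std::list<std::string> Insts;
  std::set<std::string> LiveIns;
};

// Toy stand-in for an instruction builder with a movable insertion point.
struct ToyBuilder {
  ToyBlock *MBB = nullptr;
  std::list<std::string>::iterator InsertPt;

  void setInsertPt(ToyBlock &Block, std::list<std::string>::iterator Pt) {
    MBB = &Block;
    InsertPt = Pt;
  }

  // Inserts the copy immediately before the current insertion point.
  void buildCopy(const std::string &Dst, const std::string &Src) {
    MBB->Insts.insert(InsertPt, Dst + " = COPY " + Src);
  }
};

// RAII guard that restores the builder's insertion point on scope exit,
// which is what the "scoped insert pt" FIXME in the review asks for.
class ScopedInsertPt {
  ToyBuilder &B;
  ToyBlock *SavedMBB;
  std::list<std::string>::iterator SavedPt;

public:
  explicit ScopedInsertPt(ToyBuilder &Builder)
      : B(Builder), SavedMBB(Builder.MBB), SavedPt(Builder.InsertPt) {
    assert(SavedMBB && "builder must already have an insertion point");
  }
  ~ScopedInsertPt() { B.setInsertPt(*SavedMBB, SavedPt); }
  ScopedInsertPt(const ScopedInsertPt &) = delete;
  ScopedInsertPt &operator=(const ScopedInsertPt &) = delete;
};

// Ensures "LiveInVReg = COPY PhysReg" exists at the top of the entry block.
// Returns true if a copy was inserted, false if the vreg was already defined.
bool ensureLiveInCopy(ToyBuilder &B, ToyBlock &Entry,
                      std::set<std::string> &DefinedVRegs,
                      const std::string &LiveInVReg,
                      const std::string &PhysReg) {
  assert(!PhysReg.empty() && PhysReg[0] == '$' && "source must be physical");
  assert(!LiveInVReg.empty() && LiveInVReg[0] == '%' &&
         "destination must be virtual");

  if (DefinedVRegs.count(LiveInVReg))
    return false; // The argument copy already exists.

  ScopedInsertPt Guard(B); // Restores the insertion point on return.
  Entry.LiveIns.insert(PhysReg);
  B.setInsertPt(Entry, Entry.Insts.begin());
  B.buildCopy(LiveInVReg, PhysReg);
  DefinedVRegs.insert(LiveInVReg);
  return true;
}
```

Calling `ensureLiveInCopy` twice for the same virtual register inserts the copy only once, and the builder's insertion point is unchanged after each call. A `std::list` is used in the model because its iterators stay valid across insertions, so the saved insertion point remains usable even when it lies in the entry block itself.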
		arsenm: I would just use the literal register value below rather than have a variable for it. At least make this a const Register.
		MRI.setRegClass(VDstReg, &AMDGPU::SGPR_64RegClass);
		arsenm: Why do you need to set the class? It shouldn't be necessary at this point.
		hsmhsm: We need to set a register class for the destination virtual register; otherwise we get an assertion failure during compilation saying that the register type is not valid. If it is not supposed to be set here, where should I set it? Or do I need to handle it a different way? Please explain in more detail.
		arsenm: The register class is only tangentially related to the type. I think you somehow ended up creating a virtual register without setting the type on it. Did something already add the queue ptr as a live-in before this lowering? I also would not call this VDstReg, as that makes it sound like a VGPR.
		hsmhsm: I have taken care of it. I just needed another version of getLiveInRegister() that works with a TargetRegisterClass.
		arsenm: Physical registers don't have types. This will just assert. Here you know the type is just LLT::pointer(CONSTANT_ADDRESS, 64).
		arsenm: The type here should really be LLT::pointer(CONSTANT_ADDRESS, 64).
		if (!loadInputValue(VDstReg, B, Arg))
		return false;
		arsenm: Remove DstReg and just directly refer to the physical register.
		B.buildCopy(DstReg, VDstReg);
		B.buildInstr(AMDGPU::S_TRAP)
		.addImm(GCNSubtarget::TrapIDLLVMTrap)
		.addReg(DstReg, RegState::Implicit);
		}

		MI.eraseFromParent();
		return true;
		}

		bool AMDGPULegalizerInfo::legalizeDebugTrapIntrinsic(
		MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B) const {
		B.setInstr(MI);

		// Is non-HSA path or trap-handler disabled? then, report a warning
		// accordingly
		if (ST.getTrapHandlerAbi() != GCNSubtarget::TrapHandlerAbiHsa ||
		!ST.isTrapHandlerEnabled()) {
		DiagnosticInfoUnsupported NoTrap(B.getMF().getFunction(),
		"debugtrap handler not supported",
		MI.getDebugLoc(), DS_Warning);
		LLVMContext &Ctx = B.getMF().getFunction().getContext();
		Ctx.diagnose(NoTrap);
		} else {
		// Insert debug-trap instruction
		B.buildInstr(AMDGPU::S_TRAP).addImm(GCNSubtarget::TrapIDLLVMDebugTrap);
		}

		MI.eraseFromParent();
		return true;
		}

		bool AMDGPULegalizerInfo::legalizeIntrinsic(MachineInstr &MI,
		MachineIRBuilder &B,
		GISelChangeObserver &Observer) const {
		MachineRegisterInfo &MRI = *B.getMRI();

		// Replace the use G_BRCOND with the exec manipulate and branch pseudos.
		auto IntrID = MI.getIntrinsicID();
		switch (IntrID) {
		// [151 unchanged lines of legalizeIntrinsic not shown]
		case Intrinsic::amdgcn_struct_buffer_atomic_dec:
		case Intrinsic::amdgcn_raw_buffer_atomic_cmpswap:
		case Intrinsic::amdgcn_struct_buffer_atomic_cmpswap:
		return legalizeBufferAtomic(MI, B, IntrID);
		case Intrinsic::amdgcn_atomic_inc:
		return legalizeAtomicIncDec(MI, B, true);
		case Intrinsic::amdgcn_atomic_dec:
		return legalizeAtomicIncDec(MI, B, false);
		case Intrinsic::trap:
		return legalizeTrapIntrinsic(MI, MRI, B);
		case Intrinsic::debugtrap:
		return legalizeDebugTrapIntrinsic(MI, MRI, B);
		default: {
		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
		AMDGPU::getImageDimIntrinsicInfo(IntrID))
		return legalizeImageIntrinsic(MI, B, Observer, ImageDimIntr);
		return true;
		}
		}

		return true;
		}

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.debugtrap.ll

This file was added.

				; hsa-path: trap handler enabled
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck -check-prefix=ENTRY-LABLE -check-prefix=HSA-DEBUG-TRAP %s
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -mattr=+trap-handler < %s | FileCheck -check-prefix=ENTRY-LABLE -check-prefix=HSA-DEBUG-TRAP %s

				; hsa-path: trap handler disabled
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -mattr=-trap-handler < %s 2>&1 | FileCheck -check-prefix=WARNING -check-prefix=ENTRY-LABLE -check-prefix=NO-HSA-DEBUG-TRAP %s

				; non-hsa-path: trap handler enabled
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -verify-machineinstrs -mattr=+trap-handler < %s 2>&1 | FileCheck -check-prefix=WARNING -check-prefix=ENTRY-LABLE -check-prefix=MESA-DEBUG-TRAP %s

				; non-hsa-path: trap handler disabled
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -verify-machineinstrs -mattr=-trap-handler < %s 2>&1 | FileCheck -check-prefix=WARNING -check-prefix=ENTRY-LABLE -check-prefix=NO-MESA-DEBUG-TRAP %s

				declare void @llvm.debugtrap() #0

				arsenm: Shouldn't run the DAG tests in the GlobalISel test directory.
				; WARNING: warning: <unknown>:0:0: in function debug_trap void (i32 addrspace(1)*): debugtrap handler not supported

				; ENTRY-LABLE: {{^}}debug_trap:

				; HSA-DEBUG-TRAP: enable_trap_handler = 0
				; HSA-DEBUG-TRAP: s_trap 3
				; HSA-DEBUG-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0

				; NO-HSA-DEBUG-TRAP: enable_trap_handler = 0
				; NO-HSA-DEBUG-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0

				; MESA-DEBUG-TRAP: enable_trap_handler = 1
				; MESA-DEBUG-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 1

				; NO-MESA-DEBUG-TRAP: enable_trap_handler = 0
				; NO-MESA-DEBUG-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0
				define amdgpu_kernel void @debug_trap(i32 addrspace(1)* nocapture readonly %arg0) {
				%alloca = alloca i32, addrspace(5)
				store volatile i32 2, i32 addrspace(5)* %alloca
				store volatile i32 1, i32 addrspace(1)* %arg0
				call void @llvm.debugtrap()
				ret void
				}

				attributes #0 = { nounwind noreturn }

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll

This file was added.

				; hsa-path: trap handler enabled
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck -check-prefix=ENTRY-LABLE -check-prefix=HSA-TRAP %s
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -mattr=+trap-handler < %s | FileCheck -check-prefix=ENTRY-LABLE -check-prefix=HSA-TRAP %s

				; hsa-path: trap handler disabled
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -mattr=-trap-handler < %s | FileCheck -check-prefix=ENTRY-LABLE -check-prefix=NO-HSA-TRAP %s

				; non-hsa-path: trap handler enabled
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -verify-machineinstrs -mattr=+trap-handler < %s | FileCheck -check-prefix=ENTRY-LABLE -check-prefix=MESA-TRAP %s

				; non-hsa-path: trap handler disabled
				; RUN: llc -global-isel -mtriple=amdgcn-unknown-mesa3d -verify-machineinstrs -mattr=-trap-handler < %s | FileCheck -check-prefix=ENTRY-LABLE -check-prefix=NO-MESA-TRAP %s

				declare void @llvm.trap() #0

				; ENTRY-LABLE: {{^}}trap:

				; HSA-TRAP: enable_trap_handler = 0
				; HSA-TRAP: s_mov_b64 s[0:1], s[4:5]
				; HSA-TRAP: s_trap 2
				; HSA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0

				; NO-HSA-TRAP: enable_trap_handler = 0
				; NO-HSA-TRAP: s_endpgm
				; NO-HSA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0

				; MESA-TRAP: enable_trap_handler = 1
				; MESA-TRAP: s_endpgm
				; MESA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 1

				; NO-MESA-TRAP: enable_trap_handler = 0
				; NO-MESA-TRAP: s_endpgm
				; NO-MESA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0
				define amdgpu_kernel void @trap(i32 addrspace(1)* nocapture readonly %arg0) {
				%alloca = alloca i32, addrspace(5)
				store volatile i32 2, i32 addrspace(5)* %alloca
				store volatile i32 1, i32 addrspace(1)* %arg0
				call void @llvm.trap()
				ret void
				}

				; ENTRY-LABLE: {{^}}non_entry_trap:

				; HSA-TRAP: enable_trap_handler = 0
				; HSA-TRAP: s_endpgm
				; HSA-TRAP: BB{{[0-9]_[0-9]+}}: ; %trap
				; HSA-TRAP: s_mov_b64 s[0:1], s[4:5]
				; HSA-TRAP: s_trap 2
				; HSA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0

				; NO-HSA-TRAP: enable_trap_handler = 0
				; NO-HSA-TRAP: s_endpgm
				; NO-HSA-TRAP: BB{{[0-9]_[0-9]+}}: ; %trap
				; NO-HSA-TRAP: s_endpgm
				; NO-HSA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0

				; MESA-TRAP: enable_trap_handler = 1
				; MESA-TRAP: s_endpgm
				; MESA-TRAP: BB{{[0-9]_[0-9]+}}: ; %trap
				; MESA-TRAP: s_endpgm
				; MESA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 1

				; NO-MESA-TRAP: enable_trap_handler = 0
				; NO-MESA-TRAP: s_endpgm
				; NO-MESA-TRAP: BB{{[0-9]_[0-9]+}}: ; %trap
				; NO-MESA-TRAP: s_endpgm
				; NO-MESA-TRAP: COMPUTE_PGM_RSRC2:TRAP_HANDLER: 0
				define amdgpu_kernel void @non_entry_trap(i32 addrspace(1)* nocapture readonly %arg0) local_unnamed_addr {
				entry:
				%alloca = alloca i32, addrspace(5)
				store volatile i32 2, i32 addrspace(5)* %alloca
				%tmp29 = load volatile i32, i32 addrspace(1)* %arg0
				%cmp = icmp eq i32 %tmp29, -1
				br i1 %cmp, label %ret, label %trap

				trap:
				call void @llvm.trap()
				unreachable

				ret:
				store volatile i32 3, i32 addrspace(1)* %arg0
				ret void
				}

				attributes #0 = { nounwind noreturn }
				arsenm: Can you add another test that uses the stack? Ideally we would also have a test in a non-kernel function, but I know that won't work right now since we don't handle the special argument inputs yet.
				arsenm: Actually, never mind. A non-kernel function should work. Calling it will not.
				arsenm: You should still add a function test. One that has a separate, explicit use of the queue.ptr intrinsic wouldn't hurt either.
				hsmhsm: Actually, I am a bit confused about writing test cases here, so let's start from scratch. As you suggested, I now share the original DAG test with GlobalISel instead of reinventing it. I was making some mistakes earlier, which I have since understood and corrected, and sharing the original DAG test now works fine. From this point, please suggest how I should continue. What additional tests need to be covered here, and why? Do I need to add them to the original DAG test and make sure both the ISelDAG path and the GlobalISel path work for them? Or are they specific to the GlobalISel path for some reason?

AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics
Closed, Public

Revision Contents

Diff 245020

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.debugtrap.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.trap.ll
