Page MenuHomePhabricator

[AMDGPU] Split GCN subtarget features for unaligned access
AbandonedPublic

Authored by mbrkusanin on Mar 12 2021, 4:08 AM.

Details

Summary

Split UnalignedAccessMode into UnalignedDSAccess and UnalignedBufferAccess so
that only UnalignedBufferAccess can be enabled by default for AMDHSA

Diff Detail

Event Timeline

mbrkusanin created this revision.Mar 12 2021, 4:08 AM
mbrkusanin requested review of this revision.Mar 12 2021, 4:08 AM
piotr edited reviewers, added: arsenm; removed: arsen.Mar 12 2021, 4:33 AM
rampitec added inline comments.Mar 12 2021, 8:21 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
96

HSA wants unaligned DS access as well. That is only underaligned ds_read/write_b128 shall not be produced for performance reasons.

I thought these are still the same hardware control, so aren't truly different features

@mbrkusanin

I see some comments from @rampitec and @arsenm, and @rampitec had also made below comments in the related internal JIRA ticket.

Comments from @rampitec in JIRA ticket:
Disabling unaligned dword access is not acceptable for HSA.
My proposal was simple: prefer ds_read2_b64 (and write) over b128 if alignment < 16. I never heard of b64 performance issues though, so the rest is the same as now: create a widest load/store possible with that one exception for b128.

From the JIRA comments, I also see @foad is in agreement with above comments. So, it looks like, we should avoid accessing ds_read_128/ds_write_128 when the alignment is < 16. I am not sure, what you think here, but, your comments are very valuable to make further progress here.

If we all agree with above, then looks like the the file DSInstructions.td has to go below changes.

The code

let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSReadPat_mc <DS_READ_B96, vt, "load_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSReadPat_mc <DS_READ_B128, vt, "load_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess

should change to

let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSReadPat_mc <DS_READ_B96, vt, "load_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSReadPat_mc <DS_READ_B128, vt, "load_align16_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess

and the code

let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B96, vt, "store_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B128, vt, "store_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess

should change to

let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B96, vt, "store_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B128, vt, "store_align16_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess

@mbrkusanin

I see some comments from @rampitec and @arsenm, and @rampitec had also made below comments in the related internal JIRA ticket.

Comments from @rampitec in JIRA ticket:
Disabling unaligned dword access is not acceptable for HSA.
My proposal was simple: prefer ds_read2_b64 (and write) over b128 if alignment < 16. I never heard of b64 performance issues though, so the rest is the same as now: create a widest load/store possible with that one exception for b128.

From the JIRA comments, I also see @foad is in agreement with above comments. So, it looks like, we should avoid accessing ds_read_128/ds_write_128 when the alignment is < 16. I am not sure, what you think here, but, your comments are very valuable to make further progress here.

If we all agree with above, then looks like the the file DSInstructions.td has to go below changes.

I strongly disagree with splitting features in a way that does not correspond with the hardware controls to match a desired output.

@mbrkusanin
My proposal was simple: prefer ds_read2_b64 (and write) over b128 if alignment < 16. I never heard of b64 performance issues though, so the rest is the same as now: create a widest load/store possible with that one exception for b128.

We have since then confirmed that ds_read_b64 has the same performance hit on memory not aligned to 64 bit, so 64 bit operations too need an alignment check.

@mbrkusanin
My proposal was simple: prefer ds_read2_b64 (and write) over b128 if alignment < 16. I never heard of b64 performance issues though, so the rest is the same as now: create a widest load/store possible with that one exception for b128.

We have since then confirmed that ds_read_b64 has the same performance hit on memory not aligned to 64 bit, so 64 bit operations too need an alignment check.

But I see the below ISel pattern for - DS_READ_B64 indeed checks alignment requirment, though not sure if it is over written some where else.

DSInstructions.td:717-719

foreach vt = VReg_64.RegTypes in {
defm : DSReadPat_mc <DS_READ_B64, vt, "load_align8_local">;
}

@mbrkusanin
My proposal was simple: prefer ds_read2_b64 (and write) over b128 if alignment < 16. I never heard of b64 performance issues though, so the rest is the same as now: create a widest load/store possible with that one exception for b128.

We have since then confirmed that ds_read_b64 has the same performance hit on memory not aligned to 64 bit, so 64 bit operations too need an alignment check.

But I see the below ISel pattern for - DS_READ_B64 indeed checks alignment requirment, though not sure if it is over written some where else.

DSInstructions.td:717-719

foreach vt = VReg_64.RegTypes in {
defm : DSReadPat_mc <DS_READ_B64, vt, "load_align8_local">;
}

And further based on a closer look at the code, I think, in general, we better revert the commit 0654ff703d4e99423133165db63083b831efb9b6 in order to avoid above performance issues because of unligned access. But, I am not aware of the consequences of reverting this patch.

@mbrkusanin / @foad What is your openion on reverting the patch 0654ff703d4e99423133165db63083b831efb9b6?

foad added a comment.Mar 23 2021, 10:24 AM

@mbrkusanin / @foad What is your openion on reverting the patch 0654ff703d4e99423133165db63083b831efb9b6?

0654ff703d4e99423133165db63083b831efb9b6 (aka D84403) had good effects like selecting ds_write_b96 when the alignment was 16. It would be a shame to lose that.

arsenm requested changes to this revision.Mar 30 2021, 5:36 PM
This revision now requires changes to proceed.Mar 30 2021, 5:36 PM
mbrkusanin abandoned this revision.Apr 8 2021, 2:50 AM