This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.
ClosedPublic

Authored by kosarev on Jul 8 2022, 9:58 AM.

Diff Detail

Event Timeline

kosarev created this revision.Jul 8 2022, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2022, 9:58 AM
kosarev requested review of this revision.Jul 8 2022, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2022, 9:58 AM

The stats is rather interesting. Of all our test shaders, counting all the duplicates, 134 are affected by this change, but of them it's actually less than 19% of those that changed their code size. The reason seems to be that we often need the resulting offset in a register pair anyway, so our good old s_add/s_adc just keep living along with the shiny new 'offset:' loads.

Where the size of the code has changed, it is now smaller, except for one Wolfenstein II shader which got 140 extra bytes of code with this change applied. I'm looking into what's going on there.

arsenm added inline comments.Jul 8 2022, 10:24 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
1–3

You shouldn't use -stop-after=amdgpu-isel. Use finalize-isel for MIR tests (although I'm not sure why you're testing MIR here)

60

Also test some other load sizes?

I'm looking into what's going on there.

The reason seems to be that with this change in place we generate more spills, which in turn is caused by reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage() -- 102 registers originally vs 96 with the change. I'm not sure how precise the analysis is expected to be and whether this needs a ticket. Would appreciate any advice on this.

kosarev added inline comments.Jul 11 2022, 7:42 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
1–3

This seems to be where we already have checks for a similar case? Is there a better place for that?

kosarev added inline comments.Jul 11 2022, 9:22 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
1–3

You shouldn't use -stop-after=amdgpu-isel.

Can you expand a bit more on why this is preferable, please? I see literally 4 tests utilising finalize-isel, none of them look like purely isel tests, and the other 35 all seem to use amdgpu-isel. The description of finalize-isel as it comes from rG9cac4e6d14035 again doesn't suggest anything obviously useful for this test?

kosarev marked an inline comment as done.Jul 11 2022, 10:02 AM

I'm looking into what's going on there.

The reason seems to be that with this change in place we generate more spills, which in turn is caused by reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage() -- 102 registers originally vs 96 with the change. I'm not sure how precise the analysis is expected to be and whether this needs a ticket. Would appreciate any advice on this.

That should be precise except in the presence of indirect/external/unanalyzeable calls. I'm a bit surprised there would be any big difference from this, but this is certainly going to be from secondary effects

llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
1–3

amdgpu-isel is specifically the SDAG pass. Your GISEL check didn't stop anywhere near the same place, and -stop-pass ran off the end and completed register allocation

kosarev added inline comments.Jul 11 2022, 10:41 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
1–3

What if just remove GISEL checks from this test completely then?

arsenm added inline comments.Jul 11 2022, 10:48 AM
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll
1–3

I'd rather just use -stop-after finalize-isel

foad added a comment.Jul 12 2022, 3:14 AM

The reason seems to be that with this change in place we generate more spills, which in turn is caused by reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage()

I don't understand this. I thought AMDGPUResourceUsageAnalysis ran after regalloc so I don't see how it can affect spilling.

I don't understand this. I thought AMDGPUResourceUsageAnalysis ran after regalloc so I don't see how it can affect spilling.

You are right, I was too quick thinking that that reduced number of registers is where the higher pressure comes from, sorry for misguiding. Further analysis showed that despite we indeed seem to use less registers in the end, at the register allocation phase an additional spill slot is now introduced due to interference of live ranges and a failure to assign a value to a physical register, which I think is not very surprising as I see lots of moved code in other shaders as well, and for that slot we generate 4 v_writelane_b32s and 16 additional v_readlane_b32s replacing what previously was 4 s_mov_b64s and a s_waitcnt. So unfortunately doesn't look like something easy to avoid.

kosarev updated this revision to Diff 443967.Jul 12 2022, 8:47 AM

Changed the .ll test to use -stop-after=finalize-isel.

arsenm added inline comments.Jul 12 2022, 8:55 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1978

I don't follow how/why B is being discarded

kosarev added inline comments.Jul 12 2022, 9:10 AM
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1978

It's being split into SBase and SOffset, which we both return.

arsenm accepted this revision.Jul 12 2022, 9:25 AM
This revision is now accepted and ready to land.Jul 12 2022, 9:25 AM
This revision was landed with ongoing or failed builds.Jul 18 2022, 3:31 AM
This revision was automatically updated to reflect the committed changes.