Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The stats is rather interesting. Of all our test shaders, counting all the duplicates, 134 are affected by this change, but of them it's actually less than 19% of those that changed their code size. The reason seems to be that we often need the resulting offset in a register pair anyway, so our good old s_add/s_adc just keep living along with the shiny new 'offset:' loads.
Where the size of the code has changed, it is now smaller, except for one Wolfenstein II shader which got 140 extra bytes of code with this change applied. I'm looking into what's going on there.
The reason seems to be that with this change in place we generate more spills, which in turn is caused by reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage() -- 102 registers originally vs 96 with the change. I'm not sure how precise the analysis is expected to be and whether this needs a ticket. Would appreciate any advice on this.
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll | ||
---|---|---|
1–2 | This seems to be where we already have checks for a similar case? Is there a better place for that? |
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll | ||
---|---|---|
1–2 |
Can you expand a bit more on why this is preferable, please? I see literally 4 tests utilising finalize-isel, none of them look like purely isel tests, and the other 35 all seem to use amdgpu-isel. The description of finalize-isel as it comes from rG9cac4e6d14035 again doesn't suggest anything obviously useful for this test? |
That should be precise except in the presence of indirect/external/unanalyzeable calls. I'm a bit surprised there would be any big difference from this, but this is certainly going to be from secondary effects
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll | ||
---|---|---|
1–2 | amdgpu-isel is specifically the SDAG pass. Your GISEL check didn't stop anywhere near the same place, and -stop-pass ran off the end and completed register allocation |
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll | ||
---|---|---|
1–2 | What if just remove GISEL checks from this test completely then? |
llvm/test/CodeGen/AMDGPU/amdgcn-load-offset-from-reg.ll | ||
---|---|---|
1–2 | I'd rather just use -stop-after finalize-isel |
The reason seems to be that with this change in place we generate more spills, which in turn is caused by reduced number of used registers as determined by AMDGPUResourceUsageAnalysis::analyzeResourceUsage()
I don't understand this. I thought AMDGPUResourceUsageAnalysis ran after regalloc so I don't see how it can affect spilling.
You are right, I was too quick thinking that that reduced number of registers is where the higher pressure comes from, sorry for misguiding. Further analysis showed that despite we indeed seem to use less registers in the end, at the register allocation phase an additional spill slot is now introduced due to interference of live ranges and a failure to assign a value to a physical register, which I think is not very surprising as I see lots of moved code in other shaders as well, and for that slot we generate 4 v_writelane_b32s and 16 additional v_readlane_b32s replacing what previously was 4 s_mov_b64s and a s_waitcnt. So unfortunately doesn't look like something easy to avoid.
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
1978 | I don't follow how/why B is being discarded |
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
---|---|---|
1978 | It's being split into SBase and SOffset, which we both return. |
I don't follow how/why B is being discarded