An address can be a uniform sum of two i64 bit values.
That regularly happens in a loop where index is an induction
variable promoted to 64 bit by the LSR. We can materialize
zero in a VGPR and still use SADDR form of the load.
Details
- Reviewers
arsenm foad Joe_Nash - Commits
- rG909a5ccf3be7: [AMDGPU] Improve global SADDR selection
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
80 ms | x64 debian > Clang.Driver::debug-pass-structure.c |
Event Timeline
This looks like it's regressing the case where the offset is known constant, but just doesn't fit.
Can you also make the corresponding globalisel change?
llvm/test/CodeGen/AMDGPU/global-saddr-load.ll | ||
---|---|---|
88–91 | This is more instructions |
Do you mean atomic_add_i32_huge_offset() test? Will check it.
Can you also make the corresponding globalisel change?
Yes, will do.
llvm/test/CodeGen/AMDGPU/global-saddr-load.ll | ||
---|---|---|
88–91 | It is one materialized zero, I think it is more or less degenarate case. In a real world there is usually a vgpr with zero. On the other hand this is one VGPR less. |
That's one case, but it looks like all of the cases that add an extra v_mov_b32 (e.g. global_load_saddr_i8_offset_neg4096 which I commented on).
Can you also make the corresponding globalisel change?
Yes, will do.
llvm/test/CodeGen/AMDGPU/global-saddr-load.ll | ||
---|---|---|
88–91 | This is the same pattern in all of the cases with a large immediate offset split. I think this is reasonably common and I would lean towards fewer instructions here |
llvm/test/CodeGen/AMDGPU/global-saddr-load.ll | ||
---|---|---|
88–91 | We still need to materialize zero, but I got the idea, will check what is possible to do here. One caveat with non zero vaddr is it will be impossible to switch to vaddr form if calculations will end up in sgprs (and they likely will in these cases) but for any reason we would need to move instruction to VALU which in turn will result in readfirstlane instructions for the address, like it is now. |
I am not sure if that can be simplified. It seems we have already checked for base with constant offset, maybe I can move the new checks higher where we have constant offset. I will try to simplify it.
llvm/test/CodeGen/AMDGPU/global-saddr-load.ll | ||
---|---|---|
2395 | Could you pre-commit this bit so we can see the diff please? |
Any chance that this causes this test breakage http://45.33.8.238/linux/45940/step_12.txt ?
Ah no, looks like this is due to @arsenm's https://github.com/llvm/llvm-project/commit/ef5f0adecd02d92cbb1a713ac7316f6768269412 which doesn't have a phab link :/
Could someone revert that? (On phone ATM, so can't do it myself)
Yes, it is this one: https://reviews.llvm.org/rGef5f0adecd02d92cbb1a713ac7316f6768269412
@arsenm you have missing colon in the pattern there.
clang-format: please reformat the code