Legalize the soffset operand of buffer instructions. If a VGPR is assigned to soffset, use readlaneVGPRToSGPR to copy the value from the VGPR to an SGPR.
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
60,040 ms | x64 debian > libFuzzer.libFuzzer::minimize_crash.test |
Event Timeline
Looks good.
llvm/test/CodeGen/AMDGPU/legalize-soffset-mbuf.ll | ||
---|---|---|
4 | Remove inreg throughout? I don't think it does anything unless you use one of the graphics calling conventions like amdgpu_ps. Also, you are using a non-uniform %rsrc, so the compiler is (correctly) generating a waterfall loop, but that's not what you're testing for here. Could you change the test to use a uniform %rsrc? The generated code would then be much simpler, and you could use update_mir_test_checks.py. |
llvm/test/CodeGen/AMDGPU/legalize-soffset-mbuf.ll | ||
---|---|---|
14 | I still think this is just a flat-out wrong lowering that needs to use a waterfall loop. |
llvm/test/CodeGen/AMDGPU/legalize-soffset-mbuf.ll | ||
---|---|---|
14 | What's the use case for that? My understanding is that isel will never put a divergent value in soffset, and for the intrinsics we can define that soffset has to be uniform. |
llvm/test/CodeGen/AMDGPU/legalize-soffset-mbuf.ll | ||
---|---|---|
14 | You cannot define that a value has to be uniform. Every transform would need to be aware of that concept and apply it. Load constant 32-bit is similarly broken in pretending to provide a uniformity guarantee. |
llvm/test/CodeGen/AMDGPU/legalize-soffset-mbuf.ll | ||
---|---|---|
14 | Hmm. I guess you could define that an input had to be uniform if you marked the intrinsic as the-opposite-of-convergent, to say it cannot be hoisted out of an "if"? |
llvm/test/CodeGen/AMDGPU/legalize-soffset-mbuf.ll | ||
---|---|---|
14 | I don't think we would want to do that. I think we just need to be able to handle waterfall loops for anything and have optimizations to avoid this kind of problem. We don't need a hard constraint. |
llvm/test/CodeGen/AMDGPU/legalize-soffset-mbuf.ll | ||
---|---|---|
14 | OK, I have no objections to someone working on that approach. But it does feel like the optimization you would need to implement, to avoid hurting performance by introducing a waterfall loop where you don't really need one, would be very similar to the hard constraint imposed by marking the intrinsic as the-opposite-of-convergent. |
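
To make the disagreement above concrete, here is a minimal lane-level sketch in Python of the two lowerings under discussion. It is a simulation, not compiler code: a wave is modelled as a list of per-lane soffset values plus an exec mask, and the function names (`readfirstlane`, `lower_with_single_readlane`, `lower_with_waterfall`) are hypothetical. It assumes that the readlane-based copy takes the value from the first active lane, which is how a single VGPR-to-SGPR copy is usually described; the patch itself may differ in detail.

```python
from typing import List, Optional, Tuple


def readfirstlane(values: List[int], mask: List[bool]) -> int:
    """Return the value held by the lowest-numbered lane enabled in mask."""
    for lane, active in enumerate(mask):
        if active:
            return values[lane]
    raise ValueError("no active lanes")


def lower_with_single_readlane(
    soffset: List[int], exec_mask: List[bool]
) -> List[Optional[int]]:
    """One scalar copy: every active lane uses the first active lane's value."""
    s = readfirstlane(soffset, exec_mask)
    return [s if active else None for active in exec_mask]


def lower_with_waterfall(
    soffset: List[int], exec_mask: List[bool]
) -> Tuple[List[Optional[int]], int]:
    """Waterfall loop: one iteration per distinct soffset among active lanes."""
    used: List[Optional[int]] = [None] * len(exec_mask)
    remaining = list(exec_mask)
    iterations = 0
    while any(remaining):
        s = readfirstlane(soffset, remaining)
        # Run the instruction only for lanes whose soffset matches s,
        # then retire those lanes from the remaining set.
        for lane in range(len(remaining)):
            if remaining[lane] and soffset[lane] == s:
                used[lane] = s
                remaining[lane] = False
        iterations += 1
    return used, iterations


uniform = [16, 16, 16, 16]
divergent = [0, 0, 16, 16]
all_lanes = [True, True, True, True]

uniform_single = lower_with_single_readlane(uniform, all_lanes)
uniform_waterfall, uniform_iters = lower_with_waterfall(uniform, all_lanes)
divergent_single = lower_with_single_readlane(divergent, all_lanes)
divergent_waterfall, divergent_iters = lower_with_waterfall(divergent, all_lanes)
```

In this model the two lowerings agree when soffset is uniform, and the waterfall loop runs exactly once. With a divergent soffset, the single copy silently gives lanes 2 and 3 the offset of lane 0, while the waterfall loop preserves each lane's own value at the cost of one iteration per distinct value. That cost is why the thread turns to either an optimization that detects uniformity or a constraint that guarantees it.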