No need to handle invariant loads when avoiding WAR conflicts, as
there cannot be a vector store to the same memory location.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
987 | Doesn't this normally use a register value based dependency? Why does it need to be conservative? |
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
1489–1490 | If we don't do anything here when !Ptr, then we have lost the information that there is an outstanding scalar load, which does not seem safe to me. Stepping back a bit, I don't think the original SLoadAddresses implementation in D71934 was safe. Surely a scalar load and a vector store can alias even if they don't have an identical Ptr value? |
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
1489–1490 | Agreed that my change is not correct either. Looks D71934 was a best effort fix and did not intend to be a final one ("Proper fix should include creating the alias analysis on the machine IR that is obviously too big hummer at the moment."). The reason I started looking at this is that the map gets corrupted if there are multiple elements with the same key and different values. Possibly, I should be looking at places where the memop data is dropped and fix that at source. |
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
1492 | Might be worth asserting that Ptr is not null here, since that's what caused the problem in the first place? |
Added the assert. The assert seems in order here - no hits in the lit tests or Vulkan CTS. There would have been hits in 194 lit tests if the assert had been placed here without the isInvariant check, which somewhat proves usefulness of the patch.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
1492 | Not having the value set is *not* an error. This should not be an assert. This is a situation that needs to work. If we had two loads if converted into a select on the pointer, it would have been correct to strip out the IR value |
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
1492 | OK, but just removing the assert is no good, because then we'll be back in the same situation of adding nullptr as a map key and getting non-deterministic compiles. |
Yes, based on Matt's last comment, there is still a potential problem even though my patch significantly reduces the likelihood of it occurring.
Doesn't this normally use a register value based dependency? Why does it need to be conservative?