This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Skip invariant loads when avoiding WAR conflicts
ClosedPublic

Authored by piotr on Apr 23 2021, 9:21 AM.

Details

Summary

No need to handle invariant loads when avoiding WAR conflicts, as
there cannot be a vector store to the same memory location.

Diff Detail

Event Timeline

piotr created this revision.Apr 23 2021, 9:21 AM
piotr requested review of this revision.Apr 23 2021, 9:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2021, 9:21 AM
arsenm added inline comments.Apr 23 2021, 9:36 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
987

Doesn't this normally use a register value based dependency? Why does it need to be conservative?

foad added inline comments.Apr 26 2021, 2:10 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
987

This isn't handling a register dependency, it's D71934 "need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict".

foad added inline comments.Apr 26 2021, 2:14 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1489–1490

If we don't do anything here when !Ptr, then we have lost the information that there is an outstanding scalar load, which does not seem safe to me.

Stepping back a bit, I don't think the original SLoadAddresses implementation in D71934 was safe. Surely a scalar load and a vector store can alias even if they don't have an identical Ptr value?

piotr added inline comments.Apr 26 2021, 2:39 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1489–1490

Agreed that my change is not correct either.

Looks D71934 was a best effort fix and did not intend to be a final one ("Proper fix should include creating the alias analysis on the machine IR that is obviously too big hummer at the moment.").

The reason I started looking at this is that the map gets corrupted if there are multiple elements with the same key and different values.

Possibly, I should be looking at places where the memop data is dropped and fix that at source.

piotr updated this revision to Diff 343757.May 7 2021, 2:03 PM

Re-purposing patch to not add invariant loads to the map.

piotr retitled this revision from [AMDGPU] Avoid adding nullptr keys to hash table to [AMDGPU] Skip invariant loads when avoiding WAR conflicts.May 7 2021, 2:04 PM
piotr edited the summary of this revision. (Show Details)
arsenm added a comment.May 7 2021, 2:26 PM

Can you also add a mir test

piotr updated this revision to Diff 344007.May 10 2021, 2:08 AM

Added mir test.

foad accepted this revision.May 11 2021, 5:02 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1492

Might be worth asserting that Ptr is not null here, since that's what caused the problem in the first place?

This revision is now accepted and ready to land.May 11 2021, 5:02 AM
piotr updated this revision to Diff 344710.May 12 2021, 1:22 AM

Added the assert. The assert seems in order here - no hits in the lit tests or Vulkan CTS. There would have been hits in 194 lit tests if the assert had been placed here without the isInvariant check, which somewhat proves usefulness of the patch.

foad added a comment.May 12 2021, 1:24 AM

Added the assert. The assert seems in order here - no hits in the lit tests or Vulkan CTS. There would have been hits in 194 lit tests if the assert had been placed here without the isInvariant check, which somewhat proves usefulness of the patch.

Thanks!

This revision was landed with ongoing or failed builds.May 12 2021, 1:58 AM
This revision was automatically updated to reflect the committed changes.
arsenm added inline comments.May 12 2021, 5:39 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1492

Not having the value set is *not* an error. This should not be an assert. This is a situation that needs to work. If we had two loads if converted into a select on the pointer, it would have been correct to strip out the IR value

foad added inline comments.May 12 2021, 6:01 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1492

OK, but just removing the assert is no good, because then we'll be back in the same situation of adding nullptr as a map key and getting non-deterministic compiles.

piotr added a comment.May 12 2021, 6:11 AM

Yes, based on Matt's last comment, there is still a potential problem even though my patch significantly reduces the likelihood of it occurring.