SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.
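As a rough illustration of the change described above, the following toy model lists the waits emitted at a function boundary (entry or return) with and without the conservative vscnt wait. It is a sketch under stated assumptions, not the SIInsertWaitcnts implementation: the function `boundaryWaits` and its parameters are hypothetical, and the strings are GFX10-style mnemonics used only for illustration.

```cpp
#include <string>
#include <vector>

// Toy model only: the names here are hypothetical and are not part of
// the SIInsertWaitcnts API.
//   hasVscnt:     the target has a separate VMEM store counter (GFX10+).
//   waitForVscnt: true models the old conservative behaviour,
//                 false models the behaviour after this change.
std::vector<std::string> boundaryWaits(bool hasVscnt, bool waitForVscnt) {
  // Counters that can carry data dependencies are still drained.
  std::vector<std::string> waits = {"s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)"};

  // vscnt only tracks memory dependencies, which the summary assigns to
  // SIMemoryLegalizer, so the extra wait is emitted only in the old model.
  if (hasVscnt && waitForVscnt)
    waits.push_back("s_waitcnt_vscnt null, 0x0");

  return waits;
}
```

In this model, `boundaryWaits(true, true)` yields both wait instructions, while `boundaryWaits(true, false)` yields only the first one, which corresponds to the instructions this change removes.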
Details
- Reviewers: nhaehnle, mareko, rampitec, kerbowa, stepthomas, arsenm
- Group Reviewers: Restricted Project
- Commits: rGf2c164c81505: [AMDGPU] Do not wait for vscnt on function entry and return
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll
Line | Comment |
---|---|
70 | These were super annoying |
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
Line | Comment |
---|---|
1231 | Technically it should not matter on HW with vscnt, since all such hardware can back off barriers, so this 'if' should never be true on Navi. There is currently an exception with gfx11 because of the memory model description bug with cumode, but that should be temporary. |
Ping!
This does not change the ABI. It just removes a bunch of wait instructions that were never required in the first place.
It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer.
The idea was that stores on the caller side don't need to wait to finish before the call, because the wait in the callee's prolog will take care of them after the latency of the jump. Is the memory legalizer not taking advantage of this?
I don't see how it can be taking advantage of that, because it only considers each load or store in isolation.
Would that just be an implementation deficiency? With something like https://godbolt.org/z/55YEn473W where does the wait happen?
No wait is required. The store and the load stay in order. SIMemoryLegalizer does not insert one, irrespective of whether the store and load are in the same function or not.
@kerbowa unlike the rest of SIInsertWaitcnts, I assume this part does want to wait for vscnt==0 since it is handling memory dependencies?