- new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it
- fix handling of PERMUTE ops
- fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass )
Details
Diff Detail
Event Timeline
Can you hold off on merging this until people have had a chance to test with radv and mesa.
Yes, sure. What is estimated time to test? We tested GL mesa internally and did not find any issues.
No regressions in piglit gpu.py on Kaveri.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
test/CodeGen/AMDGPU/ret_jump.ll | ||
---|---|---|
75 | Trailing whitespace. |
Any update on the radv testing/timeline? Assuming all is well, I will check in this change today.
We still got a regression with radv, using the Sascha Willems distancefieldfonts demo from
https://github.com/SaschaWillems/Vulkan
image showing issue: https://people.freedesktop.org/~bnieuwenhuizen/radv_regression.png
full log regressed: https://people.freedesktop.org/~bnieuwenhuizen/waitcnt_regression.log
full log correct: https://people.freedesktop.org/~bnieuwenhuizen/waitcnt_correct.log
shader that probably gets miscompiled: https://people.freedesktop.org/~bnieuwenhuizen/waitcnt_shader.llvm
Commandeering this revision; I have inherited it from the original author (kanarayan).
Updates since last set of diffs:
- Fix conditional in SIInsertWaitCnts::updateEventWaitCntAfter() , which was always returning true
- Tweak how DS_PERMUTE/DS_SWIZZLE ops are modeled to cover issue found during RADV testing
- Tweak how new pass inserts waitcnts at the beginning/end of functions to match what old pass does
- Adjust existing tests
- Add new test ( test/CodeGen/AMDGPU/waitcnt-permute.mir ) to cover issue found during RADV testing
lib/Target/AMDGPU/DSInstructions.td | ||
---|---|---|
442 ↗ | (On Diff #100486) | The LGKM_CNT flag indicates that the instruction updates the count so you shouldn't need to rely on isLoad. Look for this comment in the pass: // TODO: Use the (TSFlags & SIInstrFlags::LGKM_CNT) property everywhere. |
- Updating to git diff -U999999
- Two reviewers have questioned the use of mayLoad, so I will rework that aspect of the fix.
lib/Target/AMDGPU/DSInstructions.td | ||
---|---|---|
442 ↗ | (On Diff #100486) | I will revise. |
lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
1865 | Code is triggered by 3 LIT tests, which were recently added by Matt Arsenault: I'm porting the fixes he put inplace within the old wait pass into the new to keep them in step with one another. |
Don't use mayLoad, so revert changes to DSInstructions.td; use (TSFlags & SIInstrFlags::LGKM_CNT) instead.
I’m splitting this review into two reviews:
(1) This review will be repurposed solely for the fixes to the new waitcnt pass and a new test; no changes to those fixes; so I’m going to consider those fixes accepted.
(2) The second review will be to turn the new waitcnt pass on by default. I'll give folks a chance to weigh in on that before moving forward with it.
We do not have call support, how this code is triggered?