This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixes for the new waitcnt insertion pass. Add test.
ClosedPublic

Authored by msearles on May 11 2017, 7:47 PM.

Details

Summary
  • new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it
  • fix handling of PERMUTE ops
  • fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass )

Diff Detail

Event Timeline

kanarayan created this revision.May 11 2017, 7:47 PM
This revision is now accepted and ready to land.May 11 2017, 8:28 PM

Can you hold off on merging this until people have had a chance to test with radv and mesa.

Can you hold off on merging this until people have had a chance to test with radv and mesa.

Yes, sure. What is estimated time to test? We tested GL mesa internally and did not find any issues.

No regressions in piglit gpu.py on Kaveri.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>

test/CodeGen/AMDGPU/ret_jump.ll
75

Trailing whitespace.

Any update on the radv testing/timeline? Assuming all is well, I will check in this change today.

Any update on the radv testing/timeline? Assuming all is well, I will check in this change today.

We still got a regression with radv, using the Sascha Willems distancefieldfonts demo from

https://github.com/SaschaWillems/Vulkan

image showing issue: https://people.freedesktop.org/~bnieuwenhuizen/radv_regression.png

full log regressed: https://people.freedesktop.org/~bnieuwenhuizen/waitcnt_regression.log
full log correct: https://people.freedesktop.org/~bnieuwenhuizen/waitcnt_correct.log

shader that probably gets miscompiled: https://people.freedesktop.org/~bnieuwenhuizen/waitcnt_shader.llvm

radv regression looks related do ds_bpermute interactions if I had to guess.

msearles commandeered this revision.May 26 2017, 3:37 PM
msearles edited reviewers, added: kanarayan; removed: msearles.
msearles edited edge metadata.

Commandeering this revision; I have inherited it from the original author (kanarayan).

msearles updated this revision to Diff 100486.May 26 2017, 3:47 PM

Updates since last set of diffs:

  • Fix conditional in SIInsertWaitCnts::updateEventWaitCntAfter() , which was always returning true
  • Tweak how DS_PERMUTE/DS_SWIZZLE ops are modeled to cover issue found during RADV testing
  • Tweak how new pass inserts waitcnts at the beginning/end of functions to match what old pass does
  • Adjust existing tests
  • Add new test ( test/CodeGen/AMDGPU/waitcnt-permute.mir ) to cover issue found during RADV testing

Marc, please upload full context diff (with -U999999).

rampitec added inline comments.May 26 2017, 4:35 PM
lib/Target/AMDGPU/DSInstructions.td
442 ↗(On Diff #100486)

I'm not really sure mayLoad is a correct flag here.

lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1865

We do not have call support, how this code is triggered?

tstellar added inline comments.May 26 2017, 4:55 PM
lib/Target/AMDGPU/DSInstructions.td
442 ↗(On Diff #100486)

The LGKM_CNT flag indicates that the instruction updates the count so you shouldn't need to rely on isLoad. Look for this comment in the pass:

// TODO: Use the (TSFlags & SIInstrFlags::LGKM_CNT) property everywhere.
msearles updated this revision to Diff 100509.May 26 2017, 5:34 PM
  • Updating to git diff -U999999
  • Two reviewers have questioned the use of mayLoad, so I will rework that aspect of the fix.
msearles added inline comments.May 26 2017, 5:43 PM
lib/Target/AMDGPU/DSInstructions.td
442 ↗(On Diff #100486)

I will revise.

lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1865

Code is triggered by 3 LIT tests, which were recently added by Matt Arsenault:
LLVM :: CodeGen/AMDGPU/frame-index-elimination.ll
LLVM :: CodeGen/AMDGPU/function-args.ll
LLVM :: CodeGen/AMDGPU/function-returns.ll

I'm porting the fixes he put inplace within the old wait pass into the new to keep them in step with one another.

msearles updated this revision to Diff 100516.May 26 2017, 7:05 PM

Don't use mayLoad, so revert changes to DSInstructions.td; use (TSFlags & SIInstrFlags::LGKM_CNT) instead.

I’m splitting this review into two reviews:
(1) This review will be repurposed solely for the fixes to the new waitcnt pass and a new test; no changes to those fixes; so I’m going to consider those fixes accepted.
(2) The second review will be to turn the new waitcnt pass on by default. I'll give folks a chance to weigh in on that before moving forward with it.

msearles updated this revision to Diff 100873.May 31 2017, 9:06 AM
msearles retitled this revision from [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests. to [AMDGPU] Fixes for the new waitcnt insertion pass. Add test..
msearles edited the summary of this revision. (Show Details)

Limit this revision to the fixes for the waitcnt pass and the new test.

msearles closed this revision.Jun 1 2017, 8:36 AM

r304311