This is an archive of the discontinued LLVM Phabricator instance.

Revert "[AMDGPU] Mark mbcnt as convergent"
ClosedPublic

Authored by sameerds on Jun 28 2023, 1:31 AM.

Details

Summary

This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb.

The output of mbcnt does not depend on other active lanes, and hence it is not
convergent. The original change was made as a possible fix for

https://github.com/ROCm-Developer-Tools/HIP/issues/3172

But changing mbcnt does not fix that issue.

Diff Detail

Event Timeline

sameerds created this revision.Jun 28 2023, 1:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2023, 1:31 AM
sameerds requested review of this revision.Jun 28 2023, 1:31 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 28 2023, 1:31 AM
foad accepted this revision.Jun 28 2023, 4:59 AM
This revision is now accepted and ready to land.Jun 28 2023, 4:59 AM
yaxunl requested changes to this revision.Jun 28 2023, 5:42 AM

Marking mbcnt as convergent, together with https://reviews.llvm.org/D144756 prevent mbcnt to be merged, which fixed the reported issue.

Do you have an alternative fix for the issue?

This revision now requires changes to proceed.Jun 28 2023, 5:42 AM

Marking mbcnt as convergent, together with https://reviews.llvm.org/D144756 prevent mbcnt to be merged, which fixed the reported issue.

Do you have an alternative fix for the issue?

I completely disagree with this line of thought. The change to mbcnt is fundamentally incorrect and not related to the issue. There is no ground to ask for changes to this revision. Why was the change to mbcnt committed if was not an actual fix for anything?

Also, please note that the issue itself is invalid. I have put a comment in github explaining the same. The original program itself is invalid.

yaxunl accepted this revision.Jun 28 2023, 12:12 PM

I checked the ISA manual about mbcnt and I agree that its value should only depend on thread position in warp and not on CFG.

It is possible that the side effect of preventing it from merging fixed something by accident, but marking mbcnt as convergent seems not right. Therefore I agree to revert that patch.

This revision is now accepted and ready to land.Jun 28 2023, 12:12 PM
sameerds updated this revision to Diff 535606.Jun 28 2023, 7:29 PM
  • rebase
  • fix failing lit tests for atomic optimization
sameerds requested review of this revision.Jun 28 2023, 7:30 PM
sameerds added a reviewer: pravinjagtap.

@pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that were added for atomic optimizations. In particular, the mbcnt is now being moved across/into/out of control flow because it is no longer convergent. I eyeballed one example and it seemed okay to me, but a more thorough check will be useful.

@pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that were added for atomic optimizations. In particular, the mbcnt is now being moved across/into/out of control flow because it is no longer convergent. I eyeballed one example and it seemed okay to me, but a more thorough check will be useful.

Hello @sameerds,

The existing logic (before D147408) in atomic optimizer uses mbcnt(exec) to setup the control flow to allow only one lane to update the reduced value [Please refer: https://github.com/llvm/llvm-project/blob/80155cbf0be1744953edf68b9729c24bd0de79c4/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L671]. The instructions related to that seems to be moved to other basic block (ComputeEnd). In my opinion it should not affect the logic. @ruiling and @foad can you please confirm this ?

While working on D147408, I introduced cttz(exec) [Please refer: https://github.com/llvm/llvm-project/blob/80155cbf0be1744953edf68b9729c24bd0de79c4/llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp#L538C1-L539C72] which is not being affected here. The EXEC is being moved to sgpr before the ComputeLoop (which we want) and then used it inside the ComputeLoop and modified inside it to set up the logic for ComputeLoop. This is not being affected because of this change.

I could be completely wrong here. Lets see what others have to say.

foad added a comment.Jun 28 2023, 11:30 PM

@pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that were added for atomic optimizations. In particular, the mbcnt is now being moved across/into/out of control flow because it is no longer convergent. I eyeballed one example and it seemed okay to me, but a more thorough check will be useful.

They are just being moved from before the loop to after the loop. This is fine. It is even a bit weird that the atomic optimizer pass emits them before the loop in the first place.

@pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that were added for atomic optimizations. In particular, the mbcnt is now being moved across/into/out of control flow because it is no longer convergent. I eyeballed one example and it seemed okay to me, but a more thorough check will be useful.

They are just being moved from before the loop to after the loop. This is fine. It is even a bit weird that the atomic optimizer pass emits them before the loop in the first place.

@foad just to respect the process, are you okay with approving the review again? I had set it back to "needs review".

ruiling accepted this revision.Jun 29 2023, 9:38 PM
This revision is now accepted and ready to land.Jun 29 2023, 9:38 PM
foad accepted this revision.Jun 29 2023, 11:19 PM
This revision was landed with ongoing or failed builds.Jun 30 2023, 12:54 AM
This revision was automatically updated to reflect the committed changes.