This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GFX11] Mitigate VALU mask write hazard
ClosedPublic

Authored by critson on Sep 18 2022, 5:26 PM.

Details

Summary

VALU use of an SGPR (pair) as mask followed by SALU write to the
same SGPR can cause incorrect execution of subsequent SALU reads
of the SGPR.

Diff Detail

Event Timeline

critson created this revision.Sep 18 2022, 5:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2022, 5:26 PM
critson requested review of this revision.Sep 18 2022, 5:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2022, 5:26 PM
rampitec added inline comments.Sep 19 2022, 11:09 AM
llvm/lib/Target/AMDGPU/AMDGPU.td
707 ↗(On Diff #461106)

You do not really need a feature bit for this, just a subtarget check.

llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2722

You seem to start from the instruction in the middle and have no check for the 3rd instruction. This makes the w/a overrestrictive as far as I understand. This maybe fine, but probably deserves a comment.

2728

A mask can only be a 64 bit register in wave64. You could check the size and bail rather then listing 32 bit registers.

2735

Isn't that less error prone to just check for an implicit vcc use? I assume it shall not be true unless it reads a mask.

2744

As far as I understand this can only be a VCC too.

2807

Call runOnInstruction() on this new instruction.

critson marked 6 inline comments as done.Sep 20 2022, 7:13 PM
critson added inline comments.
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2722

Yes, this intentionally does not do a forward check to look for hazard expiry.
From my analysis we rarely reach expiry, <10% of the time in graphics.
So I considered the forward check a waste of compile time.
I will add a comment to explain this.

2728

Mask can only be 64 bit, but SALU write to either 32 bit half is a hazard.

2735

I am not sure I understand?
These instructions always implicitly read VCC.

2744

As above, my understanding is partial overlap is also a hazard.

critson updated this revision to Diff 461777.Sep 20 2022, 7:13 PM
critson marked 4 inline comments as done.
  • Address reviewer comments
rampitec accepted this revision.Sep 21 2022, 12:49 PM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2722

Thanks for adding the comment. In fact forward search would not work, we do not have such mechanism. There are cases of 3 instruction hazards where we always start from the last instruction, but that would be an overkill here.

2735

I mean instead of listing instruction opcodes it can use a general check is there is an implicit $vcc operand. But OK, if you think this list is comprehensive.

This revision is now accepted and ready to land.Sep 21 2022, 12:49 PM
foad added inline comments.Sep 22 2022, 2:35 AM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
2735

The _e32 versions have an implicit $vcc, but the _e64 versions do not.

There is still a small problem that the list of instructions here is missing exotic forms like V_CNDMASK_B32_dpp.

critson updated this revision to Diff 463763.EditedSep 28 2022, 11:52 PM
  • Add DPP forms and associated tests

I was hanging on to this while I waited for some performance data.
After reviewing this closely, I think I need to switch the workaround mechanism.

foad accepted this revision.Sep 29 2022, 1:25 AM

Sure, although...

There is still a small problem that the list of instructions here is missing exotic forms like V_CNDMASK_B32_dpp.

... this still bothers me.

critson marked 2 inline comments as done.Sep 29 2022, 1:31 AM

Sure, although...

There is still a small problem that the list of instructions here is missing exotic forms like V_CNDMASK_B32_dpp.

... this still bothers me.

Beyond DPP which other exotic forms are you looking for?

critson updated this revision to Diff 463796.Sep 29 2022, 1:37 AM
  • Change mitigation strategy to s_waitcnt_depctr after SALU
foad added a comment.Sep 29 2022, 1:58 AM

Sure, although...

There is still a small problem that the list of instructions here is missing exotic forms like V_CNDMASK_B32_dpp.

... this still bothers me.

Beyond DPP which other exotic forms are you looking for?

The ones I can find at the moment are:

V_ADDC_U32_sdwa
V_CNDMASK_B16_sdwa
V_CNDMASK_B32_sdwa
V_SUBBREV_U32_sdwa
V_SUBB_U32_sdwa

Beyond DPP which other exotic forms are you looking for?

The ones I can find at the moment are:

V_ADDC_U32_sdwa
V_CNDMASK_B16_sdwa
V_CNDMASK_B32_sdwa
V_SUBBREV_U32_sdwa
V_SUBB_U32_sdwa

GFX11 doesn't have SDWA which is why I did not add them.

foad added a comment.Sep 30 2022, 2:25 AM

Beyond DPP which other exotic forms are you looking for?

The ones I can find at the moment are:

V_ADDC_U32_sdwa
V_CNDMASK_B16_sdwa
V_CNDMASK_B32_sdwa
V_SUBBREV_U32_sdwa
V_SUBB_U32_sdwa

GFX11 doesn't have SDWA which is why I did not add them.

OK, good point!

This revision was landed with ongoing or failed builds.Oct 1 2022, 12:49 AM
This revision was automatically updated to reflect the committed changes.