This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Prevent aliasing of SrcC and Dst in MAI
ClosedPublic

Authored by rampitec on Jan 20 2022, 3:54 PM.

Details

Summary

Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs
or Src_C and vDst are completely separated. The case that Src_C and vDst
are overlapping should be avoid as new value could be written to accumulator
input before it gets read.

Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.

Fixes: SWDEV-318900

Diff Detail

Event Timeline

rampitec created this revision.Jan 20 2022, 3:54 PM
rampitec requested review of this revision.Jan 20 2022, 3:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2022, 3:54 PM
Herald added a subscriber: wdng. · View Herald Transcript

Theoretically last mfma in chain may use tied operands because SRC shall be killed. I am not sure though how to tell it is last. I could use kill flag, but it is going away and unreliable. In general it shall be better to check for the kill alone, but only if it is reliable.

Maybe move it even lower in RA and use LIS?

foad added a comment.Jan 21 2022, 2:28 AM

I don't understand the need for the new GCNPreRaFixups pass. Isn't this exactly what TwoAddressInstruction does, when it calls the target's convertToThreeAddress?

arsenm added inline comments.Jan 21 2022, 8:20 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8295–8299

This is all static information, so can't you just set earlyclobber on the operand definition?

I don't understand the need for the new GCNPreRaFixups pass. Isn't this exactly what TwoAddressInstruction does, when it calls the target's convertToThreeAddress?

A call to convertToThreeAddress is an optimization and not guaranteed as far as I understand.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8295–8299

Making it always earlyclobber will result in massive regressions. This patch either ties operands or uses earlyclobber depending on uses, checking if source is still needed or not.

foad added a comment.Jan 21 2022, 9:39 AM

I don't understand the need for the new GCNPreRaFixups pass. Isn't this exactly what TwoAddressInstruction does, when it calls the target's convertToThreeAddress?

A call to convertToThreeAddress is an optimization and not guaranteed as far as I understand.

I would suggest instruction selecting to the version with tied operands, and then letting TwoAddressInstruction decide whether to call your convertToThreeAddress, which would convert it to the earlyclobber form.

I don't understand the need for the new GCNPreRaFixups pass. Isn't this exactly what TwoAddressInstruction does, when it calls the target's convertToThreeAddress?

A call to convertToThreeAddress is an optimization and not guaranteed as far as I understand.

I would suggest instruction selecting to the version with tied operands, and then letting TwoAddressInstruction decide whether to call your convertToThreeAddress, which would convert it to the earlyclobber form.

So that would mean 4 pseudos for each instruction? 2 different versions for accvgpr vs vccvgpr by 2 variants with tied operands vs earlyclobber?
I cannot really do this trick with tied operand at/after selection, it got malformed after operand folding.

rampitec planned changes to this revision.Jan 21 2022, 1:44 PM

I would suggest instruction selecting to the version with tied operands, and then letting TwoAddressInstruction decide whether to call your convertToThreeAddress, which would convert it to the earlyclobber form.

So that would mean 4 pseudos for each instruction? 2 different versions for accvgpr vs vccvgpr by 2 variants with tied operands vs earlyclobber?
I cannot really do this trick with tied operand at/after selection, it got malformed after operand folding.

OK, that seems to be working in principle.

rampitec updated this revision to Diff 402125.Jan 21 2022, 3:44 PM

Changed to duplicate instruction definitions and use two-address pass.

A followup patch to validate operand constraints in asm also needed.

rampitec marked an inline comment as done.Jan 21 2022, 4:55 PM
foad added a comment.Jan 24 2022, 1:03 AM

Thanks. No further comments from me!

arsenm accepted this revision.Jan 26 2022, 2:40 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/VOP3PInstructions.td
391

Probably should add a comment explaining how the real restriction is less strict than earlyclobber/tied imply

This revision is now accepted and ready to land.Jan 26 2022, 2:40 PM
rampitec updated this revision to Diff 403414.Jan 26 2022, 2:49 PM
rampitec marked an inline comment as done.

Expanded NoDstOverlap comment.

This revision was landed with ongoing or failed builds.Jan 26 2022, 2:51 PM
This revision was automatically updated to reflect the committed changes.