This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used
ClosedPublic

Authored by foad on Apr 19 2021, 6:55 AM.

Details

Summary

Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.

Diff Detail

Event Timeline

foad created this revision.Apr 19 2021, 6:55 AM
foad requested review of this revision.Apr 19 2021, 6:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2021, 6:55 AM
dstuttard accepted this revision.EditedApr 19 2021, 8:14 AM

LGTM

You might want to wait for Valery or Stas to comment too.

Only minor suggestion - maybe put in a TODO (or equiv) that we could handle this scenario as well (even with the shrinking. (I think).

This revision is now accepted and ready to land.Apr 19 2021, 8:14 AM
foad added a comment.Apr 19 2021, 8:17 AM

Only minor suggestion - maybe put in a TODO (or equiv) that we could handle this scenario as well (even with the shrinking. (I think).

I'm reluctant to do that because I don't know how to handle it.

Only minor suggestion - maybe put in a TODO (or equiv) that we could handle this scenario as well (even with the shrinking. (I think).

I'm reluctant to do that because I don't know how to handle it.

Ok - that's reasonable (I don't either)

Is that a functional or performance problem if vcc is used?

foad added a comment.Apr 19 2021, 12:44 PM

Is that a functional or performance problem if vcc is used?

It's a correctness problem because when we shrink an instruction like %4:vgpr_32, %5:sreg_64_xexec = V_ADD_CO_U32_e64 %3, %1, 0, implicit $exec the def of %5 turns into an implicit def of vcc, but we don't touch the uses of %5, so they become uses with no def which fails MIR verification.