
[AMDGPU] Make bfi patterns divergence-aware
Closed, Public

Authored by foad on Sep 24 2020, 10:03 AM.

Details

Summary

This tends to increase code size, but more importantly it reduces VGPR
usage and could avoid costly readfirstlanes if the result needs to be
in an SGPR.
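
For readers unfamiliar with the operation, here is a minimal sketch of what a 32-bit bitfield insert computes and why doing it with scalar bitwise steps takes more instructions than the single vector op. The function names are hypothetical, and the mnemonics in the comments are only an illustrative assumption; this review does not state the exact sequence the compiler emits.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def v_bfi_b32(s0: int, s1: int, s2: int) -> int:
    """One vector op: take bits from s1 where s0 is set, otherwise from s2."""
    return ((s0 & s1) | (~s0 & s2)) & MASK32


def scalar_bfi(s0: int, s1: int, s2: int) -> int:
    """Same result built from three separate bitwise steps (hypothetical helper)."""
    kept = s0 & s1              # comparable to s_and_b32
    rest = s2 & ~s0 & MASK32    # comparable to s_andn2_b32
    return kept | rest          # comparable to s_or_b32


if __name__ == "__main__":
    mask, a, b = 0x0000FFFF, 0x12345678, 0xABCDEF01
    print(hex(v_bfi_b32(mask, a, b)))
    print(hex(scalar_bfi(mask, a, b)))
```

Both functions return the same value for any 32-bit inputs; the second simply spends three steps instead of one, which is the code-size cost the summary mentions.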

Event Timeline

foad created this revision. Sep 24 2020, 10:03 AM
foad requested review of this revision. Sep 24 2020, 10:03 AM
arsenm accepted this revision. Sep 24 2020, 10:07 AM

LGTM (although I think readfirstlane is the same cost as a regular copy)

This revision is now accepted and ready to land. Sep 24 2020, 10:07 AM
foad added a comment. Fri, Sep 25, 8:41 AM

> LGTM (although I think readfirstlane is the same cost as a regular copy)

readfirstlane is bad news because there's a pretty big stall between a VALU instruction that writes an SGPR and any following SALU instruction.

This revision was landed with ongoing or failed builds. Mon, Sep 28, 2:17 AM
This revision was automatically updated to reflect the committed changes.