This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] PHI node cost should not be counted for the size and latency.
ClosedPublic

Authored by alex-t on Jun 29 2021, 5:27 AM.

Details

Summary
Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes
for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the
result of the PHI lowering are inserted into the basic block predecessors - not into the block itself.
As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop
body size/cost estimation.

Diff Detail

Event Timeline

alex-t created this revision.Jun 29 2021, 5:27 AM
alex-t requested review of this revision.Jun 29 2021, 5:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2021, 5:27 AM
Herald added a subscriber: wdng. · View Herald Transcript
dfukalov added inline comments.Jun 29 2021, 7:26 AM
llvm/test/Analysis/CostModel/AMDGPU/control-flow.ll
7

Please update the sizes reported after your change instead of just removing the test lines here and below.

alex-t updated this revision to Diff 355345.Jun 29 2021, 1:17 PM

Test corrected

rampitec accepted this revision.Jun 29 2021, 1:41 PM
This revision is now accepted and ready to land.Jun 29 2021, 1:41 PM
dfukalov added inline comments.Jun 29 2021, 3:51 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
841

Nit: leave this or similar related todo comments somewhere, it wasn't done.

alex-t added inline comments.Jun 30 2021, 5:11 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
841

Such a prediction is unlikely possible. The number of copies that survived after the register coalescing too much dependent on the passes that run in between.

This revision was landed with ongoing or failed builds.Jun 30 2021, 6:11 AM
This revision was automatically updated to reflect the committed changes.