This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] PHI node cost should not be counted for the size and latency.
Abandoned · Public

Authored by alex-t on Jun 30 2021, 5:51 AM.

Details

Reviewers
None
Summary
  https://reviews.llvm.org/D96805 changed GCNTTIImpl::getCFInstrCost to return 1 for PHI nodes under the
  TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency cost kinds. This is incorrect: the value moves produced by
  PHI lowering are inserted into the predecessor basic blocks, not into the block that contains the PHI.
  As a result, LoopRotate and LoopUnroll were broken by incorrect size/cost estimates for the loop header
  and loop body.

  Fixes SWDEV-289429: 10-11% performance drop observed with ROC_OCL_Perf_Linpack_DGEMM_W32.

Differential Revision: https://reviews.llvm.org/D105104

Event Timeline

alex-t created this revision. Jun 30 2021, 5:51 AM
alex-t requested review of this revision. Jun 30 2021, 5:51 AM
Herald added a project: Restricted Project. Jun 30 2021, 5:51 AM
xgupta added a subscriber: xgupta. Jun 30 2021, 6:09 AM

It seems this is a duplicate of D105104?
Instead of committing that patch, you created a new revision :)

alex-t abandoned this revision. Jun 30 2021, 6:10 AM