Page MenuHomePhabricator

[CostModel] Unify Intrinsic Costs.
ClosedPublic

Authored by samparker on May 15 2020, 7:51 AM.

Details

Summary

With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size orientated getIntrinsicCost. This involved sinking cost of the TTIImpl into the base implementation, as it performs no target checks. The opcodes remaining were memcpy, cttz and ctlz which now have special handling in the BasicTTI implementation. getInstructionThroughput can now directly return the result of getUserCost.
This has required a change in the AMDGPU backend for fabs and it as the tests suggest that they should always be 'free'. I've also changed the X86 backend to return '1' for any intrinsic when the CostKind isn't RecipThroughput.

Diff Detail

Event Timeline

samparker created this revision.May 15 2020, 7:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2020, 7:51 AM

Having fabs free on AMDGPU LGTM.

arsenm added inline comments.May 15 2020, 9:51 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563

Previously this would have been reported from TLI.isFAbsFree, but I don't see that check getting dropped here?

samparker marked an inline comment as done.May 18 2020, 12:27 AM
samparker added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563

I don't recall seeing a check like that... but it makes sense. Having the base implementation call it should work.

samparker marked an inline comment as done.May 18 2020, 4:56 AM
samparker added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563

Okay, so now I see it and it would seem that the logic has changed because of the split and merge with this and D79941. AMDGPUISelLowering doesn't report that fabs vectors are free, so which is true?

dfukalov accepted this revision.May 18 2020, 11:45 AM
dfukalov added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563

This estimation is good in average. I'm going to add tests and improve this place after your commit. LGTM

This revision is now accepted and ready to land.May 18 2020, 11:45 AM
arsenm added inline comments.May 18 2020, 1:25 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563

Fabs is always free. Eventually vectors break down into scalars that have free fabs uses

This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.May 21 2020, 2:53 AM

This change has caused some large text size changes: http://llvm-compile-time-tracker.com/compare.php?from=7606a54363d3d90802977c9f5fb9046d4d780835&to=de71def3f59dc9f12f67141b5040d8e15c84d08a&stat=size-text There's a 5% increase on tramp3d-v4 and some large decreases (up to 8%) on debuginfo builds.

As the commit message tagged this as no function change intended and there are no test changes, I'm assuming this impact wasn't intended?

Er, wow, some of those are huge. Are you able to characterise those benchmarks and what optimisations they are affected by? I generally come across inlining, vectorization and unrolling changes and those all sound plausible candidates! Is a reproducer possible?

Any runtime improvements of these benchmarks?

I'm suspecting that the debug changes maybe caused by the base implementation treating the debug intrinsics as free. I'll revert this and break it down into three separate patches:

  1. Sink all the trivially free intrinsics into the bottom-most implementation.
  2. Combine getIntrinsicCost and getIntrinsicInstrCost.
  3. Have getInstructionThroughput use getUserCost.

First part has gone in as: rGb263fee4d2c9: [CostModel] Sink intrinsic costs to base TTI.