Page MenuHomePhabricator

[CostModel] Replace getUserCost with getInstructionCost.
ClosedPublic

Authored by RKSimon on May 6 2020, 5:27 AM.

Details

Summary
  • Replace getUserCost with getInstructionCost, covering all cost kinds.
  • Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Merge two of the intrinsic cost APIs: D79941

Unify intrinsic costs: D80012

samparker updated this revision to Diff 272956.Jun 24 2020, 3:37 AM
samparker retitled this revision from [RFC][CostModel] Remove getInstructionCost. to [CostModel] Remove getInstructionCost..
samparker edited the summary of this revision. (Show Details)
samparker added a reviewer: reames.

Updated with final patch now that all the other pieces are in.

hfinkel added a subscriber: hfinkel.Jul 1 2020, 7:41 AM

Is getUserCost a better name than getInstructionCost? I wonder if we're moving in the wrong direction.

FWIW, when someone gets around to wanting to fix the vectorizer's interleaving heuristic, we'll need instruction latencies.

Is getUserCost a better name than getInstructionCost? I wonder if we're moving in the wrong direction.

I'd prefer to use getInstructionCost too.

samparker updated this revision to Diff 275053.Jul 2 2020, 4:05 AM
samparker retitled this revision from [CostModel] Remove getInstructionCost. to [CostModel] Replace getUserCost with getInstructionCost..
samparker edited the summary of this revision. (Show Details)

Renamed getUserCost to getInstructionCost.

@samparker Are you intending to look at this again?

@RKSimon I'm not really doing much work on LLVM currently, so no.... But I'd be happy to help out with reviews if someone else wanted to pick it up.

reames resigned from this revision.Nov 30 2021, 9:55 AM
RKSimon commandeered this revision.Nov 30 2021, 10:09 AM
RKSimon edited reviewers, added: samparker; removed: RKSimon.

Commandeering from @samparker - this has been on my backlog for ages, but you never know, I might get around to it soon...

RKSimon planned changes to this revision.Dec 10 2021, 2:19 AM
RKSimon updated this revision to Diff 451541.Aug 10 2022, 10:17 AM
RKSimon edited the summary of this revision. (Show Details)

rebase - (WIP) still plenty of cost cleanup todo - it looks like some targets aren't correctly filtering costs by cost kind

RKSimon updated this revision to Diff 451787.Aug 11 2022, 2:50 AM

looks like there's a codegen regression as well that needs addressing

Please shout when you're ready for review.

RKSimon updated this revision to Diff 452438.Aug 13 2022, 10:35 AM

Cleaned up default fp arithmetic / select costs - still investigating the remaining regressions......

RKSimon retitled this revision from [CostModel] Replace getUserCost with getInstructionCost. to [CostModel] Replace getUserCost with getInstructionCost. WIP.Aug 13 2022, 10:36 AM
RKSimon added a subscriber: apostolakis.
RKSimon added inline comments.
llvm/test/CodeGen/X86/select-optimize.ll
395 ↗(On Diff #452438)

@apostolakis Please would take a look at this - we're about to start work on improving the cost numbers for latency/size and just this initial cleanup is causing this test to fail - which suggests you're relying on some very inaccurate costs.

RKSimon edited the summary of this revision. (Show Details)Aug 17 2022, 4:12 AM
RKSimon edited the summary of this revision. (Show Details)Aug 17 2022, 5:48 AM
apostolakis added inline comments.Aug 17 2022, 7:17 AM
llvm/test/CodeGen/X86/select-optimize.ll
395 ↗(On Diff #452438)

I will remove this test. This particular test was a bit flaky to begin with given that it depended on the instruction latency of the instructions. At least it served the purpose of notifying me of the changes in the underlying cost modeling.

Just to clarify the use in the select-optimize pass. The goal was to approximate TargetSchedModel::computeInstrLatency and compute the length of dependence chains. getInstructionCost seemed the best approximation based on the documentation. getUserCost had an unclear purpose to me, did not account for latency queries (I guess this will change now) and takes the operands as input which is worrisome since I just wanted the cost of the instructions on their own regardless of their operands (in practice the operands are mostly ignored so that might not be an issue). The biggest change in cost modeling at least from switching from getInstructionCost to getUserCost seem to be the function calls which might be for the better.

RKSimon added inline comments.Aug 17 2022, 7:30 AM
llvm/test/CodeGen/X86/select-optimize.ll
395 ↗(On Diff #452438)

Thanks - if you're wanting to perform this with IR then getUserCost (-> getInstructionCost) will be your best bet - but it does rely on us getting all the costs to be more accurate, which will take time.

I'm intending to fix the bitrot (we changed the way cost-model values are reported) on the script from D103695 which allows us to compare TTI costs vs various scheduler models (via llvm-mca), so eventually most instructions/intrinsics should have values similar to TargetSchedModel::computeInstrLatency. The script just helped check for throughput mismatches, but I did have the other cost kinds in mind as well.

I'd recommend that you do take operands into account where possible, as at least on X86 there are some attempts to use them to match likely codegen (e.g. sign/zero-extended integers that can use smaller arithmetic ops).

apostolakis added inline comments.Aug 17 2022, 7:34 AM
llvm/test/CodeGen/X86/select-optimize.ll
395 ↗(On Diff #452438)

Sounds great! Thanks for improving this cost modeling. It is much needed.

For the operands, yes it seems that the purpose is to better predict how they will be lowered which indeed serves to improve cost modeling accuracy.

apostolakis added inline comments.Aug 17 2022, 7:44 AM
llvm/test/CodeGen/X86/select-optimize.ll
395 ↗(On Diff #452438)

Test removed in D132029.

RKSimon updated this revision to Diff 453330.Aug 17 2022, 10:20 AM
RKSimon retitled this revision from [CostModel] Replace getUserCost with getInstructionCost. WIP to [CostModel] Replace getUserCost with getInstructionCost..

rebase - we can now make Instruction::Select latency cost override to be x86-only.

I think this is ready for general review now.

This is mostly expected to have minimal impact on the decisions the compiler makes, right? Most of the changes are in latency, which are not used very much in heuristics.

I ran some benchmarks and (after fixing what I got wrong the first time), it looks OK.

llvm/lib/Transforms/Scalar/LICM.cpp
1324–1325

getUserCost -> getInstructionCost

This is mostly expected to have minimal impact on the decisions the compiler makes, right? Most of the changes are in latency, which are not used very much in heuristics.

That's correct - the select-optimize pass is the only one that really tries to use it so far, and that's still disabled by default until the costs have been improved.

I ran some benchmarks and (after fixing what I got wrong the first time), it looks OK.

👍

RKSimon edited the summary of this revision. (Show Details)Aug 18 2022, 2:20 AM
samparker added inline comments.Aug 18 2022, 2:46 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
303–304

nit: three-argument

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
505

Would it be better, for now, to keep this in the X86 backend, if you need it there?

RKSimon added inline comments.Aug 18 2022, 2:58 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
505

getInstructionLatency did have this as the default for all fp results

As well as x86 it causes one aarch64 sve fadd cost test to change from latency cost = 3 to 1 - not sure if thats enough of a reason to keep it generic or not?

RKSimon updated this revision to Diff 453604.Aug 18 2022, 3:09 AM

Make the default fp latency = 3 to be x86 only

samparker accepted this revision.Aug 18 2022, 3:10 AM

LGTM, and thanks for doing this.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
505

getInstructionLatency did have this as the default for all fp results

Okay, fine then.

This revision is now accepted and ready to land.Aug 18 2022, 3:10 AM
RKSimon added inline comments.Aug 18 2022, 3:19 AM
llvm/test/Analysis/CostModel/AArch64/sve-math.ll
15

@samparker Just to be clear - you're happy for me to make the latency=3 default x86-only?

dmgreen added inline comments.Aug 18 2022, 3:31 AM
llvm/test/Analysis/CostModel/AArch64/sve-math.ll
15

I think 3 sounds pretty sensible as a first approximation - it might not be precise but I've commonly seem fp operations in that ballpark. If was used for all targets before then I would keep as all targets now.

RKSimon updated this revision to Diff 453613.Aug 18 2022, 3:41 AM

Default fp arithmmetic latency = 3

This revision was landed with ongoing or failed builds.Aug 18 2022, 3:55 AM
This revision was automatically updated to reflect the committed changes.
Matt added a subscriber: Matt.Aug 24 2022, 11:32 AM