Page MenuHomePhabricator

[AMDGPU] Improve code size cost model
ClosedPublic

Authored by dfukalov on Oct 11 2019, 11:48 AM.

Details

Summary

Added estimation for zero size insertelement, extractelement
and llvm.fabs operators.
Updated inline/unroll parameters default values.

Diff Detail

Event Timeline

dfukalov created this revision.Oct 11 2019, 11:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2019, 11:48 AM
arsenm added inline comments.Oct 11 2019, 12:04 PM
llvm/lib/Target/AMDGPU/AMDGPUInline.cpp
54

This is a separate change

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
698

We already report vector insert/extract as free. Why does this need to look at these specifically? What is the purpose of Operands which seems to be ignored?

What uses this version? I thought the set of cost model function with specific value contexts were only used by the vectorizers

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
207

This is a separate change

dfukalov marked 3 inline comments as done.Oct 14 2019, 7:46 AM
dfukalov added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInline.cpp
54

The parameter' default value should be updated to correspond changed cost model, to avoid performance regressions.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
698

CostModel has three estimation modes: RecipThroughput, Latency and CodeSize. Vectorizer uses the first one but inliner and unroller use code size estimations.
Insert/extract and other estimations were implemented for RecipThroughput path only so e.g. inliner got wrong code size costs estimations for such instructions. The change introduces the same estimations for some trivial cases by overloading getUserCost().

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
207

The parameter' default value should be updated to correspond changed cost model, to avoid performance regressions.

arsenm added inline comments.Oct 15 2019, 1:02 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
698

Nothing here looks target specific though? It's just forwarding the calls. Why doesn't the base implementation do this?

dfukalov marked an inline comment as done.Oct 16 2019, 5:47 AM
dfukalov added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
698

The parameters preparation and calls forwarding scheme is from base implementation of getInstructionThroughput() but we cannot say that for all targets zero cost in terms of throughtput means zero code size.
Moreover, getVectorInstrCost() is target specific here and I'm going to add estimation with overloaded getShuffleCost() too.

arsenm accepted this revision.Oct 16 2019, 12:51 PM

LGTM but I still find TTI's set of cost functions incomprehensible

This revision is now accepted and ready to land.Oct 16 2019, 12:51 PM
This revision was automatically updated to reflect the committed changes.