This is an archive of the discontinued LLVM Phabricator instance.

Cost for Gather and Scatter operations.
ClosedPublic

Authored by delena on Dec 20 2015, 2:53 AM.

Details

Summary

Implemented cost calculation for Gather/Scatter operations on X86.
Added a common interface.

Gather / scatter are expensive instructions on the current AVX-512 targets.
I calculate 2 scenarios - using GS machine instructions and full scalarization. The best cost is returned.
In order to give the right cost for intrinsic I should use argument values, argument types are not enough.
I added interface for intrinsic cost calculation based on values.
Added tests for intrinsics.

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 43320.Dec 20 2015, 2:53 AM
delena retitled this revision from to Cost for Gather and Scatter operations..
delena updated this object.
delena added reviewers: Ayal, gilr, hfinkel, nadav.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
Ayal added inline comments.Dec 22 2015, 12:18 PM
../include/llvm/Analysis/TargetTransformInfo.h
464 ↗(On Diff #43320)

poin[t]er

833 ↗(On Diff #43320)

keep order of arguments "ID, RetTy, Args" similar to "ID, RetTy, Tys"?

../include/llvm/CodeGen/BasicTTIImpl.h
605 ↗(On Diff #43320)

Maybe better call it ConstMasked or VarMasked instead of MaskedOp (throughout)

../lib/Target/X86/X86TargetTransformInfo.cpp
1294 ↗(On Diff #43320)

Document the patterns that fold into 32bit indexed GS.

1296 ↗(On Diff #43320)

May be better to first check if IndexSize < 64 or dyn_cast is null and return 32, to reduce subsequent indentation.

1300 ↗(On Diff #43320)

check isVectorTy() instead?

1316 ↗(On Diff #43320)

check this when bumping NumOfVarIndices. E.g., if (++NumOfVarIndices > 1)

1329–1330 ↗(On Diff #43320)

Divide VF by SplitFactor instead of IdxsLT.first?

Else assert IdxsLT.first > 1? Otherwise recursion may spin.

1333–1335 ↗(On Diff #43320)

Some explanation for the formula and overheads?

1337 ↗(On Diff #43320)

space after *

1339 ↗(On Diff #43320)

DataLT = SrcLT?

assert MaskLT.first == 1?

1392 ↗(On Diff #43320)

Typos. Suggest: // Add the cost of inserting each scalar load into the vector.

1397 ↗(On Diff #43320)

Suggest: // Add the cost of extracting each element out to a scalar store.

1412 ↗(On Diff #43320)

Is 'cast' needed? See pattern used below in getGatherScatterOpCost, or use dyn_cast instead?

delena marked 11 inline comments as done.Dec 26 2015, 11:35 PM
delena added inline comments.
../lib/Target/X86/X86TargetTransformInfo.cpp
1412 ↗(On Diff #43320)

I removed this function

delena updated this revision to Diff 43650.Dec 27 2015, 12:41 AM

Fixes according to Ayal's comments.

Ayal added inline comments.Dec 27 2015, 9:59 AM
../include/llvm/Analysis/TargetTransformInfo.h
465 ↗(On Diff #43650)

... with a mask that is not a compile-time constant

../lib/Target/X86/X86TargetTransformInfo.cpp
572 ↗(On Diff #43650)

this is independent of the rest of the patch, right?

1310 ↗(On Diff #43650)

// operation will use 16 x 64 indices which do not fit in a zmm and needs to split. Also check that the base pointer is the same for all lanes, and that there's at most one variable index.

1314 ↗(On Diff #43650)

this should be "if (IndexSize < 64 || !GEP)" to reverse your original condition.

1338–1339 ↗(On Diff #43650)

Alternatively, pass VF to getIndexSizeInBits and fold this logic of checking for VF < 16 in it?

1353–1354 ↗(On Diff #43650)

in[s]truction at a time

1358 ↗(On Diff #43650)

no need for empty line

1365 ↗(On Diff #43650)

VariableMask

1425 ↗(On Diff #43650)

Mark as TODO?

../lib/Target/X86/X86TargetTransformInfo.h
80 ↗(On Diff #43650)

VariableMask

delena marked 6 inline comments as done.Dec 28 2015, 12:15 AM
delena added inline comments.
../lib/Target/X86/X86TargetTransformInfo.cpp
572 ↗(On Diff #43650)

removed, it does not help with the current cost model algirithm

1353–1354 ↗(On Diff #43650)

I don't know ho to add spell checker plug-in to kdevelop

1425 ↗(On Diff #43650)

no

delena updated this revision to Diff 43672.Dec 28 2015, 12:24 AM
delena marked an inline comment as done.

Thanks Ayal,

The code is fixed according to your comments.

Ayal accepted this revision.Dec 28 2015, 12:58 AM
Ayal edited edge metadata.

This looks good to me. Thanks.

This revision is now accepted and ready to land.Dec 28 2015, 12:58 AM
This revision was automatically updated to reflect the committed changes.