This is an archive of the discontinued LLVM Phabricator instance.

Cost for Gather and Scatter operations.
ClosedPublic

Authored by delena on Dec 20 2015, 2:53 AM.

Details

Summary

Implemented cost calculation for Gather/Scatter operations on X86.
Added a common interface.

Gather / scatter are expensive instructions on the current AVX-512 targets.
I calculate 2 scenarios - using GS machine instructions and full scalarization. The best cost is returned.
In order to give the right cost for intrinsic I should use argument values, argument types are not enough.
I added interface for intrinsic cost calculation based on values.
Added tests for intrinsics.

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 43320.Dec 20 2015, 2:53 AM
delena retitled this revision from to Cost for Gather and Scatter operations..
delena updated this object.
delena added reviewers: Ayal, gilr, hfinkel, nadav.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
Ayal added inline comments.Dec 22 2015, 12:18 PM
../include/llvm/Analysis/TargetTransformInfo.h
464

poin[t]er

835

keep order of arguments "ID, RetTy, Args" similar to "ID, RetTy, Tys"?

../include/llvm/CodeGen/BasicTTIImpl.h
605

Maybe better call it ConstMasked or VarMasked instead of MaskedOp (throughout)

../lib/Target/X86/X86TargetTransformInfo.cpp
1306

Document the patterns that fold into 32bit indexed GS.

1308

May be better to first check if IndexSize < 64 or dyn_cast is null and return 32, to reduce subsequent indentation.

1312

check isVectorTy() instead?

1328

check this when bumping NumOfVarIndices. E.g., if (++NumOfVarIndices > 1)

1341–1342

Divide VF by SplitFactor instead of IdxsLT.first?

Else assert IdxsLT.first > 1? Otherwise recursion may spin.

1345–1347

Some explanation for the formula and overheads?

1349

space after *

1351

DataLT = SrcLT?

assert MaskLT.first == 1?

1404

Typos. Suggest: // Add the cost of inserting each scalar load into the vector.

1409

Suggest: // Add the cost of extracting each element out to a scalar store.

1424

Is 'cast' needed? See pattern used below in getGatherScatterOpCost, or use dyn_cast instead?

delena marked 11 inline comments as done.Dec 26 2015, 11:35 PM
delena added inline comments.
../lib/Target/X86/X86TargetTransformInfo.cpp
1424

I removed this function

delena updated this revision to Diff 43650.Dec 27 2015, 12:41 AM

Fixes according to Ayal's comments.

Ayal added inline comments.Dec 27 2015, 9:59 AM
../include/llvm/Analysis/TargetTransformInfo.h
465

... with a mask that is not a compile-time constant

../lib/Target/X86/X86TargetTransformInfo.cpp
572

this is independent of the rest of the patch, right?

1309

// operation will use 16 x 64 indices which do not fit in a zmm and needs to split. Also check that the base pointer is the same for all lanes, and that there's at most one variable index.

1313

this should be "if (IndexSize < 64 || !GEP)" to reverse your original condition.

1337–1338

Alternatively, pass VF to getIndexSizeInBits and fold this logic of checking for VF < 16 in it?

1352–1353

in[s]truction at a time

1357

no need for empty line

1364

VariableMask

1424

Mark as TODO?

../lib/Target/X86/X86TargetTransformInfo.h
80

VariableMask

delena marked 6 inline comments as done.Dec 28 2015, 12:15 AM
delena added inline comments.
../lib/Target/X86/X86TargetTransformInfo.cpp
572

removed, it does not help with the current cost model algirithm

1352–1353

I don't know ho to add spell checker plug-in to kdevelop

1424

no

delena updated this revision to Diff 43672.Dec 28 2015, 12:24 AM
delena marked an inline comment as done.

Thanks Ayal,

The code is fixed according to your comments.

Ayal accepted this revision.Dec 28 2015, 12:58 AM
Ayal edited edge metadata.

This looks good to me. Thanks.

This revision is now accepted and ready to land.Dec 28 2015, 12:58 AM
This revision was automatically updated to reflect the committed changes.