This is an archive of the discontinued LLVM Phabricator instance.

Cost for Gather and Scatter operations.
ClosedPublic

Authored by delena on Dec 20 2015, 2:53 AM.

Details

Summary

Implemented cost calculation for Gather/Scatter operations on X86.
Added a common interface.

Gather / scatter are expensive instructions on the current AVX-512 targets.
I calculate 2 scenarios - using GS machine instructions and full scalarization. The best cost is returned.
In order to give the right cost for intrinsic I should use argument values, argument types are not enough.
I added interface for intrinsic cost calculation based on values.
Added tests for intrinsics.

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 43320.Dec 20 2015, 2:53 AM
delena retitled this revision from to Cost for Gather and Scatter operations..
delena updated this object.
delena added reviewers: Ayal, gilr, hfinkel, nadav.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
Ayal added inline comments.Dec 22 2015, 12:18 PM
../include/llvm/Analysis/TargetTransformInfo.h
464

poin[t]er

833

keep order of arguments "ID, RetTy, Args" similar to "ID, RetTy, Tys"?

../include/llvm/CodeGen/BasicTTIImpl.h
605

Maybe better call it ConstMasked or VarMasked instead of MaskedOp (throughout)

../lib/Target/X86/X86TargetTransformInfo.cpp
1294

Document the patterns that fold into 32bit indexed GS.

1296

May be better to first check if IndexSize < 64 or dyn_cast is null and return 32, to reduce subsequent indentation.

1300

check isVectorTy() instead?

1316

check this when bumping NumOfVarIndices. E.g., if (++NumOfVarIndices > 1)

1329–1330

Divide VF by SplitFactor instead of IdxsLT.first?

Else assert IdxsLT.first > 1? Otherwise recursion may spin.

1333–1335

Some explanation for the formula and overheads?

1337

space after *

1339

DataLT = SrcLT?

assert MaskLT.first == 1?

1392

Typos. Suggest: // Add the cost of inserting each scalar load into the vector.

1397

Suggest: // Add the cost of extracting each element out to a scalar store.

1412

Is 'cast' needed? See pattern used below in getGatherScatterOpCost, or use dyn_cast instead?

delena marked 11 inline comments as done.Dec 26 2015, 11:35 PM
delena added inline comments.
../lib/Target/X86/X86TargetTransformInfo.cpp
1412

I removed this function

delena updated this revision to Diff 43650.Dec 27 2015, 12:41 AM

Fixes according to Ayal's comments.

Ayal added inline comments.Dec 27 2015, 9:59 AM
../include/llvm/Analysis/TargetTransformInfo.h
465

... with a mask that is not a compile-time constant

../lib/Target/X86/X86TargetTransformInfo.cpp
572

this is independent of the rest of the patch, right?

1297

// operation will use 16 x 64 indices which do not fit in a zmm and needs to split. Also check that the base pointer is the same for all lanes, and that there's at most one variable index.

1301

this should be "if (IndexSize < 64 || !GEP)" to reverse your original condition.

1325–1326

Alternatively, pass VF to getIndexSizeInBits and fold this logic of checking for VF < 16 in it?

1340–1341

in[s]truction at a time

1345

no need for empty line

1352

VariableMask

1412

Mark as TODO?

../lib/Target/X86/X86TargetTransformInfo.h
80

VariableMask

delena marked 6 inline comments as done.Dec 28 2015, 12:15 AM
delena added inline comments.
../lib/Target/X86/X86TargetTransformInfo.cpp
572

removed, it does not help with the current cost model algirithm

1340–1341

I don't know ho to add spell checker plug-in to kdevelop

1412

no

delena updated this revision to Diff 43672.Dec 28 2015, 12:24 AM
delena marked an inline comment as done.

Thanks Ayal,

The code is fixed according to your comments.

Ayal accepted this revision.Dec 28 2015, 12:58 AM
Ayal edited edge metadata.

This looks good to me. Thanks.

This revision is now accepted and ready to land.Dec 28 2015, 12:58 AM
This revision was automatically updated to reflect the committed changes.