This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Move scalar/vector costs to helper functions (NFCI).
AbandonedPublic

Authored by RKSimon on Jul 12 2018, 2:40 AM.

Details

Summary

As detailed on D49135, this patch moves most of the opcode scalar/vector cost calculations into helper functions. This is primarily to avoid repetition in the shufflevector case for alternate opcodes, but it also demonstrates how many of the opcode groups share the same cost calculation code.

I haven't touched the ExtractValue/ExtractElement costs - these might require more work before we can use them correctly for PR30787/D28907 'copyable' cases.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Jul 12 2018, 2:40 AM
ABataev added inline comments.Jul 16 2018, 10:55 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610

Can we just use TTI->getInstructionCost(I, TargetTransformInfo::TCK_RecipThroughput); instead of this function?

612–615

The same question here

RKSimon added inline comments.Jul 16 2018, 1:28 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610

I've kept more closely to the original code than might be necessary - more of the instructions could use TTI->getInstructionCost directly in the switch statements - but enums like GEP and Cast have minor diffs that we seem to be relying on....

612–615

Again, there are some diffs in the calls but we might be able to reuse more than we do - the TTI->getInstructionCost isn't really designed to take a scalar Instruction and a vector Type.

ABataev added inline comments.Jul 16 2018, 1:35 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610

Can we use this for load, store and calls?

RKSimon added inline comments.Jul 17 2018, 3:51 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
609–610

Load/Store could use TTI->getInstructionCost but it will mean we're not hard coding the address space to 0 anymore (we'll use the StoreInst/LoadInst getPointerAddressSpace())

Call should be fine - we might want to keep an assert to check that its an IntrinsicInst type.

I think the use of TTI->getInstructionCost in getScalarCost would be quite straightforward - it is getVectorCost that will need most of the custom handling as we're manipulating scalar Instructions to query equivalent vector costs.

IMO we're better off using the getScalarCost/getVectorCost abstractions instead of embedding TTI->getInstructionCost calls directly inside BoUpSLP::getEntryCost - it notably helps simplify the code and improves readability.

I think the use of TTI->getInstructionCost in getScalarCost would be quite straightforward - it is getVectorCost that will need most of the custom handling as we're manipulating scalar Instructions to query equivalent vector costs.

IMO we're better off using the getScalarCost/getVectorCost abstractions instead of embedding TTI->getInstructionCost calls directly inside BoUpSLP::getEntryCost - it notably helps simplify the code and improves readability.

I tried to do something similar some time ago, but I did not like it. Instead of one switch for opcode we have 2. But we already know the opcode in many cases and, actually, the second switch is required only for the shuffles. Maybe it is worth it to outline several standalone functions for PHIs, CmpInsts, BinOps etc. and use them directly where we know the opcode and use these getScalarCost/getVectorCost only for shuffles? Of course, these 2 functions also should call these outlined cost functions for each particular opcode.

I think the use of TTI->getInstructionCost in getScalarCost would be quite straightforward - it is getVectorCost that will need most of the custom handling as we're manipulating scalar Instructions to query equivalent vector costs.

IMO we're better off using the getScalarCost/getVectorCost abstractions instead of embedding TTI->getInstructionCost calls directly inside BoUpSLP::getEntryCost - it notably helps simplify the code and improves readability.

I tried to do something similar some time ago, but I did not like it. Instead of one switch for opcode we have 2. But we already know the opcode in many cases and, actually, the second switch is required only for the shuffles. Maybe it is worth it to outline several standalone functions for PHIs, CmpInsts, BinOps etc. and use them directly where we know the opcode and use these getScalarCost/getVectorCost only for shuffles? Of course, these 2 functions also should call these outlined cost functions for each particular opcode.

I can investigate ways to return std::pair<int, int> if you think that might be better? That would avoid the scalar/vector switch statement duplication, TTI->getInstructionCost will be doing something very similar.

An alternative would be to handle shufflevector/alternate separately (like we do for gather stages) and split getEntryCost into 2 functions - the special cases (gathers/alternates etc.) and the costs for particular opcodes.

I think the use of TTI->getInstructionCost in getScalarCost would be quite straightforward - it is getVectorCost that will need most of the custom handling as we're manipulating scalar Instructions to query equivalent vector costs.

IMO we're better off using the getScalarCost/getVectorCost abstractions instead of embedding TTI->getInstructionCost calls directly inside BoUpSLP::getEntryCost - it notably helps simplify the code and improves readability.

I tried to do something similar some time ago, but I did not like it. Instead of one switch for opcode we have 2. But we already know the opcode in many cases and, actually, the second switch is required only for the shuffles. Maybe it is worth it to outline several standalone functions for PHIs, CmpInsts, BinOps etc. and use them directly where we know the opcode and use these getScalarCost/getVectorCost only for shuffles? Of course, these 2 functions also should call these outlined cost functions for each particular opcode.

I can investigate ways to return std::pair<int, int> if you think that might be better? That would avoid the scalar/vector switch statement duplication, TTI->getInstructionCost will be doing something very similar.

Why do we need to return pair? You want to reuse the same function for scalar/vector? I meant, that we have the first switch when we looking through the opcodes in getEntryCost and then we have 2 additional switches when we calling getScalarCost and getVectorCost.
BTW, do we really need the separate cost for scalar/vector ops or we can calculate the final cost immediately? I mean, instead of getSclarCost/getVectorCost we may have just getPHICost, getBinOpCOst, etc. + getCost that will call all these functions for the shuffle? These functions will calculate the difference between the vector/scalar cost.

An alternative would be to handle shufflevector/alternate separately (like we do for gather stages) and split getEntryCost into 2 functions - the special cases (gathers/alternates etc.) and the costs for particular opcodes.

Yes, probably it would be better and that's what I actually suggested.

RKSimon abandoned this revision.Jul 18 2018, 6:57 AM