Stores for constant floats must be vectorized, improve analysis in SLP
vectorizer for stores.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5819 | I'm not sure I understand why we may need this extra argument. Here is how the code could look: const auto *I0 = cast<Instruction>(VL[0]); Type *OpTy = I0->getOperand(OpIdx)->getType(); if (OpTy->isFloatingPointTy()) { for (Value *V : VL) { const auto *Inst = cast<Instruction>(V); assert(Inst->getOpcode() == I0->getOpcode() && "Expected same opcode"); if (!isa<ConstantFP>(Inst->getOperand(OpIdx))) return {TTI::OK_AnyValue, TTI::OP_None}; } return {TTI::OK_NonUniformConstantValue, TTI::OP_None}; } // If all operands are exactly the same ConstantInt then set the // operand kind to OK_UniformConstantValue. // If instead not all operands are constants, then set the operand kind // to OK_AnyValue. If all operands are constants but not the same, // then set the operand kind to OK_NonUniformConstantValue. ConstantInt *CInt0 = dyn_cast<ConstantInt>(I0->getOperand(OpIdx)); if (!CInt0) return {TTI::OK_AnyValue, TTI::OP_None}; TTI::OperandValueKind VK = TTI::OK_UniformConstantValue; TTI::OperandValueProperties VP = TTI::OP_PowerOf2; for (Value *V : VL.drop_front()) { const auto *Inst = cast<Instruction>(V); assert(Inst->getOpcode() == I0->getOpcode() && "Expected same opcode"); auto *CInt = dyn_cast<ConstantInt>(Inst->getOperand(OpIdx)); if (!CInt) return {TTI::OK_AnyValue, TTI::OP_None}; if (VP == TTI::OP_PowerOf2 && !CInt->getValue().isPowerOf2()) VP = TTI::OP_None; if (CInt0 != CInt) VK = TTI::OK_NonUniformConstantValue; } return {VK, VP}; |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5819 | Minor correction for the above (VP Initialization): TTI::OperandValueProperties VP = CInt0->getValue().isPowerOf2() ? TTI::OP_PowerOf2 : TTI::OP_None; |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5819 | Just we need to perform this abalysis only for stores but not for fadd, fneg, etc. I can add a check fir the kind of the instruction instead but I thought that it would be better to provide a flag explicitly and pass it explicitly for the required instructions. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5819 |
Is that because you are targeting stores only in this patch? How does it affect other operations? |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5819 | Yes, I just don't have enough time/resources for other ops analysis. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5819 | That's okay for incremental step. Then please don't declare the default value for the argument and |
Looks acceptable.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5835 | As I said earlier we should never have a situation of mixed types. So I do not quite understand why you are still sticking with packing the two flows into the single loop. This does not save anything but makes the code to be error prone. Just look: the condition at 5840 should probably be if (EnableFP && CFlt). You also do not use CFlt to access the object but only for null pointer check. |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
5835 | Sure, just missed it. |
Adding this only for the store case seems undesirable both long term as it's clearly not the right long term goal and short term as it adds complexity for no good reason. I'm not actively requesting you don't land this, but I do strongly want to encourage you to either a) do the analysis for the non-stores and refresh this patch, or b) commit to doing a follow up in the near future.
I can try to add OK_FPConstantValue value instead and change analysis using this value for now.
Huh? Why do this?
I meant that we clearly want uniform constant handling to include float vectors in all cases long term. That's what I was asking you to follow up with.
I see. I can send a followup patch with uniform floats analysis but I cannot test it thoroughly.
I'm not sure I understand why we may need this extra argument.
Is it for restraining purpose?
We can tell whether we dealing with FP or integer from type of the operand we are evaluating.
Besides, type of operation across VL have to be homogeneous so it makes sense to multi-version the method for better clarity.
And you don't need to enumerate VL if you process first element early.
Here is how the code could look: