COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when
<4 x i64> is an illegal result type that requires splitting
of the operation.
Details
Diff Detail
Unit Tests
Event Timeline
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
795 | This is that the type legalization costs are the same? I'm a little surprised that isLoadExtLegal returns true if they are not legal. |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
795 |
Yes that's right, if only one of the types require splitting/scalarizing and the other does not, we can't assume the operation can be done freely as part of the the load itself, because the operation needs splitting. Looking at this condition now, perhaps it should use the legalized types instead of ExtVT and LoadVT, as in: if (DstLT.first == SrcLT.first && TLI->isLoadExtLegal(LType, DstLT.second, SrcLT.second) I'll change that.
isLoadExtLegal checks that getLoadExtAction returns Legal, where LegalizeAction has the following options: enum LegalizeAction : uint8_t { Legal, // The target natively supports this operation. Promote, // This operation should be executed in a larger type. Expand, // Try to expand this to other ops, otherwise use a libcall. LibCall, // Don't try to expand this to other ops, always use a libcall. Custom // Use the LowerOperation hook to implement custom lowering. }; Splitting is not one of the options, which probably makes sense as this would already be part of the type legalization. |
Pass legalized types to isLoadExtLegal (This also makes certain operations free again)
The v8-main/base costs are a little odd, as it's not an architecture that has vectors. But the extends of the loads should probably be free once it's scalarized, and the loads are already more expensive.
LGTM. Thanks
Ah that explains the scalar code I saw when I compiled some of the examples with llc. The zero/sign extends were indeed folded into the scalar loads.
Thanks for the review!
Reverted back to previous revision of this patch, because I found some failing tests for X86 which happened because one of the legalized types was widened where the other was not, which led to asking TLI the question if isLoadExtLegal(v2i64, 16i8), which returns false.
Ah I see. I guess it should be working on the maximum of the two type legalization costs, and the same number of vector lanes for each type? But that looks like a larger change throughout getCastInstrCost.
I think this splitting the same number of times still makes sense as a heuristic.
The cost of legalization is basically the cost of splitting the type. When splitting, either both (Src/Dst) have a legalization cost of 1, or the extended-type has a higher legalization cost than the source type. For that the code is now doing the right thing, either the extend is free (both cost of 1), or it continues down in this function to calculate a cost. When the destination type is legal and the source vector is widened, both have a legalization cost of 1 (no splitting occurred). Having the extend still be free makes sense because the widening doesn't actually happen, the extend operation is directly folded into the load, which results in v2i64.
I think this splitting the same number of times still makes sense as a heuristic.
Thanks for giving it another look!
This is that the type legalization costs are the same? I'm a little surprised that isLoadExtLegal returns true if they are not legal.