This is an archive of the discontinued LLVM Phabricator instance.

[CostModel] An extending load to illegal type is not free.
ClosedPublic

Authored by sdesmalen on Feb 8 2021, 1:59 AM.

Details

Summary

COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when
<4 x i64> is an illegal result type that requires splitting
of the operation.

Diff Detail

Event Timeline

sdesmalen requested review of this revision.Feb 8 2021, 1:59 AM
sdesmalen created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2021, 1:59 AM
dmgreen added inline comments.Feb 8 2021, 3:11 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
795

This is that the type legalization costs are the same? I'm a little surprised that isLoadExtLegal returns true if they are not legal.

sdesmalen added inline comments.Feb 8 2021, 3:31 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
795

This is that the type legalization costs are the same?

Yes that's right, if only one of the types require splitting/scalarizing and the other does not, we can't assume the operation can be done freely as part of the the load itself, because the operation needs splitting. Looking at this condition now, perhaps it should use the legalized types instead of ExtVT and LoadVT, as in:

if (DstLT.first == SrcLT.first && TLI->isLoadExtLegal(LType, DstLT.second, SrcLT.second)

I'll change that.

I'm a little surprised that isLoadExtLegal returns true if they are not legal.

isLoadExtLegal checks that getLoadExtAction returns Legal, where LegalizeAction has the following options:

enum LegalizeAction : uint8_t {
  Legal,      // The target natively supports this operation.
  Promote,    // This operation should be executed in a larger type.
  Expand,     // Try to expand this to other ops, otherwise use a libcall.
  LibCall,    // Don't try to expand this to other ops, always use a libcall.
  Custom      // Use the LowerOperation hook to implement custom lowering.
};

Splitting is not one of the options, which probably makes sense as this would already be part of the type legalization.

sdesmalen updated this revision to Diff 322079.Feb 8 2021, 4:22 AM

Pass legalized types to isLoadExtLegal (This also makes certain operations free again)

dmgreen accepted this revision.Feb 8 2021, 5:50 AM

The v8-main/base costs are a little odd, as it's not an architecture that has vectors. But the extends of the loads should probably be free once it's scalarized, and the loads are already more expensive.

LGTM. Thanks

This revision is now accepted and ready to land.Feb 8 2021, 5:50 AM

The v8-main/base costs are a little odd, as it's not an architecture that has vectors. But the extends of the loads should probably be free once it's scalarized, and the loads are already more expensive.

Ah that explains the scalar code I saw when I compiled some of the examples with llc. The zero/sign extends were indeed folded into the scalar loads.

Thanks for the review!

sdesmalen updated this revision to Diff 322480.Feb 9 2021, 1:09 PM

Reverted back to previous revision of this patch, because I found some failing tests for X86 which happened because one of the legalized types was widened where the other was not, which led to asking TLI the question if isLoadExtLegal(v2i64, 16i8), which returns false.

Ah I see. I guess it should be working on the maximum of the two type legalization costs, and the same number of vector lanes for each type? But that looks like a larger change throughout getCastInstrCost.

I think this splitting the same number of times still makes sense as a heuristic.

Ah I see. I guess it should be working on the maximum of the two type legalization costs, and the same number of vector lanes for each type? But that looks like a larger change throughout getCastInstrCost.

The cost of legalization is basically the cost of splitting the type. When splitting, either both (Src/Dst) have a legalization cost of 1, or the extended-type has a higher legalization cost than the source type. For that the code is now doing the right thing, either the extend is free (both cost of 1), or it continues down in this function to calculate a cost. When the destination type is legal and the source vector is widened, both have a legalization cost of 1 (no splitting occurred). Having the extend still be free makes sense because the widening doesn't actually happen, the extend operation is directly folded into the load, which results in v2i64.

I think this splitting the same number of times still makes sense as a heuristic.

Thanks for giving it another look!