This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add folds for sign and zero extends of vscale
ClosedPublic

Authored by DylanFleming-arm on Jul 14 2021, 8:50 AM.

Diff Detail

Event Timeline

DylanFleming-arm requested review of this revision.Jul 14 2021, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2021, 8:50 AM
Matt added a subscriber: Matt.Jul 14 2021, 9:19 AM
Matt added inline comments.
llvm/test/Transforms/InstCombine/vscale_sext_and_zext.ll
43

Nit: Missing space.

I think you might need to be a bit more careful to justify these transforms.

LangRef says "If the result value does not fit in the result type, then the result is a poison value." But does that mean the sign bit is guaranteed to be zero?

We could use the max-vscale attribute to determine if the runtime value fits the type. If this cannot be determined, the optimization would be skipped, otherwise we change the type of llvm.vscale to the extended type.
Would that be a suitable approach?

I assume the max-vscale attribute will usually be present? In that case, checking it seems best.

(We might actually want to consider relaxing the LangRef rule; if we can easily determine whether the result fits into a given bitwidth, there isn't really any reason to make llvm.vscale return poison.)

There's instances where max-vscale won't be known at compile time, but this route should allow for the optimizations wherever possible.

I believe in terms of the LangRef, it's still possible for llvm.vscale to return poision, and it could still happen in cases where max-vscale is unknown until runtime. So personally I don't think any adjustment is required.

I assume the max-vscale attribute will usually be present? In that case, checking it seems best.

(We might actually want to consider relaxing the LangRef rule; if we can easily determine whether the result fits into a given bitwidth, there isn't really any reason to make llvm.vscale return poison.)

Just to make sure that I understand the suggestion, do you mean changing the wording in the LangRef that the result is undefined instead of poison? If so, does that mean we could replace llvm.vscale.i8() with undef if max(vscale_range) doesn't fit the i8?

I assume the max-vscale attribute will usually be present? In that case, checking it seems best.

(We might actually want to consider relaxing the LangRef rule; if we can easily determine whether the result fits into a given bitwidth, there isn't really any reason to make llvm.vscale return poison.)

Just to make sure that I understand the suggestion, do you mean changing the wording in the LangRef that the result is undefined instead of poison? If so, does that mean we could replace llvm.vscale.i8() with undef if max(vscale_range) doesn't fit the i8?

My suggestion is that, for example, @llvm.vscale.i8() returns "vscale & 0xFF". Not really sure if it's worthwhile, but it seems like an unnecessary source of poison.

Just to make sure that I understand the suggestion, do you mean changing the wording in the LangRef that the result is undefined instead of poison? If so, does that mean we could replace llvm.vscale.i8() with undef if max(vscale_range) doesn't fit the i8?

My suggestion is that, for example, @llvm.vscale.i8() returns "vscale & 0xFF". Not really sure if it's worthwhile, but it seems like an unnecessary source of poison.

What problem would that solve?

Just to make sure that I understand the suggestion, do you mean changing the wording in the LangRef that the result is undefined instead of poison? If so, does that mean we could replace llvm.vscale.i8() with undef if max(vscale_range) doesn't fit the i8?

My suggestion is that, for example, @llvm.vscale.i8() returns "vscale & 0xFF". Not really sure if it's worthwhile, but it seems like an unnecessary source of poison.

What problem would that solve?

It would allow folding "trunc vscale". And it might be a little simpler to understand. Either way, not a big deal. :)

Added check for vscale_range attribute before optimisation
If the attribute isn't present, or if the maximum value exceeds the bitwidth of the original instrinsic, the optimization is skipped

Updated .ll test to test the extra logic

efriedma added inline comments.Jul 21 2021, 12:34 PM
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
1371

This shift can overflow.

I'd suggest using Log2_32(MaxVScale) instead.

Changed bitshift to Log2_32

This revision is now accepted and ready to land.Jul 26 2021, 9:53 AM
This revision was landed with ongoing or failed builds.Jul 30 2021, 8:03 AM
This revision was automatically updated to reflect the committed changes.