This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Promote expression tree with @llvm.vscale when zero-extending result.
ClosedPublic

Authored by sdesmalen on Jan 31 2023, 2:05 PM.

Details

Summary

The LoopVectorizer emits the (scaled) element count as i32, which for
scalable VFs results in calls to @llvm.vscale.i32(). This value is scaled
and further zero-extended to i64.

The zero-extend can be folded away by executing the whole expression in i64
type using @llvm.vscale.i64(). Any logical and that would needed to mask
the result can be further folded away by KnownBits analysis when
vscale_range is set.

Diff Detail

Event Timeline

sdesmalen created this revision.Jan 31 2023, 2:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2023, 2:05 PM
sdesmalen requested review of this revision.Jan 31 2023, 2:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2023, 2:05 PM
goldstein.w.n added inline comments.Jan 31 2023, 2:12 PM
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
260

Since Res is conditionally set here (i.e we can have Call w.o it being vscale think this may segfault b.c Res is set nullptr above and unconditionally derefenced below.

llvm/test/Transforms/InstCombine/AArch64/vscale.ll
1 ↗(On Diff #493729)

Can the tests the split into a separate patch so we can see the diff?

fhahn added a subscriber: fhahn.Jan 31 2023, 2:12 PM
fhahn added inline comments.
llvm/test/Transforms/InstCombine/AArch64/vscale.ll
1 ↗(On Diff #493729)

Does the test need to be AArch64 specific? IIUC this is a target-independent fold of @llvm.vscale, so it might be better to have a generic test?

sdesmalen updated this revision to Diff 493733.Jan 31 2023, 2:27 PM
sdesmalen marked 3 inline comments as done.
  • Fixed issue where Res was conditionally set but still used.
  • Made test agnostic of target.
  • Patch now only shows the diff for the test.

Thanks for the quick feedback!

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
260

Good spot!

goldstein.w.n added inline comments.Jan 31 2023, 3:16 PM
llvm/test/Transforms/InstCombine/vscale.ll
18

Is this semantically equivilent? Unfortunately alive2 doesn't support @llvm.vscale but is it possible for anything in vscale[29:31] to be set? If so the result will be different.

craig.topper added inline comments.
llvm/test/Transforms/InstCombine/vscale.ll
18

The vscale_range attribute is placing a limit on the upper bound. I think without that there would be an AND instruction at the end to clear the upper bits.

sdesmalen updated this revision to Diff 493862.Feb 1 2023, 1:00 AM
  • Added a test without vscale_range, to show it results in explicit anding of upper bits.
  • Replaced LLVM_FALLTHROUGH with explicit 'llvm_unreachable' and added a switch-statement for the intrinsic ID.
nikic accepted this revision.Feb 1 2023, 1:40 AM

LGTM

This revision is now accepted and ready to land.Feb 1 2023, 1:40 AM
This revision was landed with ongoing or failed builds.Feb 2 2023, 3:19 AM
This revision was automatically updated to reflect the committed changes.