This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics
Closed · Public

Authored by bsmith on May 25 2021, 5:07 AM.

Details

Summary

Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.

Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.
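
To make the idea of a "fixed predicate pattern" concrete, here is a minimal standalone sketch. It is not the code from this patch: the function name and the 16-lane representation are assumptions made for illustration. It assumes each lane stands for one byte of a 128-bit segment, so an all-true predicate for 1-, 2-, 4- or 8-byte elements sets every 1st, 2nd, 4th or 8th lane.

```cpp
#include <array>
#include <cstdint>
#include <initializer_list>
#include <optional>

// Hypothetical helper, not taken from the patch. Each of the 16 lanes stands
// for one byte of a 128-bit segment. Returns the element size in bytes
// (1, 2, 4 or 8) whose all-true predicate sets exactly these lanes, 0 for an
// all-false predicate, or nullopt when the lanes form no such pattern.
std::optional<unsigned> matchFixedPredicate(const std::array<bool, 16> &lanes) {
  // Pack the lanes into a bitmask, lane 0 in the lowest bit.
  std::uint16_t bits = 0;
  for (unsigned i = 0; i < 16; ++i)
    if (lanes[i])
      bits = static_cast<std::uint16_t>(bits | (1u << i));

  if (bits == 0)
    return 0u;

  // Compare against the mask that has one bit set per element of each size.
  for (unsigned size : {1u, 2u, 4u, 8u}) {
    std::uint16_t expected = 0;
    for (unsigned i = 0; i < 16; i += size)
      expected = static_cast<std::uint16_t>(expected | (1u << i));
    if (bits == expected)
      return size;
  }
  return std::nullopt;
}
```

Under these assumptions, a call whose constant operands alternate true and false would match the 2-byte case, while a pattern with only lane 1 set matches nothing and would have to be materialised the general way.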

Event Timeline

bsmith created this revision. May 25 2021, 5:07 AM
bsmith requested review of this revision. May 25 2021, 5:07 AM
Herald added projects: Restricted Project, Restricted Project. May 25 2021, 5:07 AM
Matt added a subscriber: Matt. May 25 2021, 8:08 AM

Do we really need a dedicated LLVM intrinsic to make the pattern-matching work here? It would be better if we could leverage @llvm.experimental.vector.insert.nxv16i8.v16i8 or something like that. Something along the lines of https://godbolt.org/z/Wz4azzKrP seems straightforward enough to match, and generates decent code even without any special patterns.

bsmith updated this revision to Diff 349241. Jun 2 2021, 5:25 AM
bsmith retitled this revision from [AArch64][SVE] Optimize svbool dupq ACLE intrinsic to fixed predicate patterns to [AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics.
bsmith edited the summary of this revision.
  • Rework approach to use llvm.experimental.vector.insert instead of introducing a new LLVM intrinsic
  • Also apply changes to all non-svbool variants.

Can we add a few end-to-end tests of bool svdupq with constant operands to acle_sve_dupq.c? The pattern matching to create ptrue seems a bit fragile, so I want to make sure we don't break it by accident.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
415

Use isZero(). (Every time getZExtValue() is used somewhere, I have to think about whether the value might be wider than 64 bits, so please avoid it where possible.)
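
The concern can be illustrated with a toy model. The type below is a hypothetical stand-in and not llvm::APInt: it only shows why a zero test that looks at every bit is safe at any width, while narrowing to 64 bits first needs an extra check. To my understanding, the real getZExtValue() asserts rather than returning an empty optional when the value does not fit.

```cpp
#include <cstdint>
#include <optional>

// Toy 128-bit unsigned value; a stand-in for illustration, not llvm::APInt.
struct Wide128 {
  std::uint64_t lo;
  std::uint64_t hi;

  // Zero test that inspects every bit, so it is valid at any width.
  bool isZero() const { return lo == 0 && hi == 0; }

  // Narrowing to 64 bits only succeeds when the upper half is empty.
  std::optional<std::uint64_t> tryZExt64() const {
    if (hi != 0)
      return std::nullopt;
    return lo;
  }
};
```

For a value with only its upper half set, isZero() answers false directly, whereas the narrowing path has no valid 64-bit result to compare against 0.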

435

isa<UndefValue>

bsmith updated this revision to Diff 349525. Jun 3 2021, 5:25 AM
  • Use !isZero() in place of getZExtValue() != 0
  • Add end-to-end tests for ptrue transformation
bsmith marked 2 inline comments as done. Jun 3 2021, 5:25 AM
efriedma added inline comments. Jun 3 2021, 11:51 AM
clang/lib/CodeGen/CGBuiltin.cpp
9070

Constant doesn't imply ConstantInt. (For example, it could be the address of a global variable.)

Not sure you need to explicitly check for constants here, anyway; Builder.CreateZExt does constant folding.
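
A small sketch of the first point, using toy classes that are hypothetical stand-ins rather than the LLVM types: a value can be a constant without being an integer constant, so any code that needs the integer has to check for that case and cope with a null result.

```cpp
#include <cstdint>

// Toy stand-ins for illustration only; these are not the LLVM classes.
struct ToyConstant {
  virtual ~ToyConstant() = default;
};

struct ToyConstantInt : ToyConstant {
  explicit ToyConstantInt(std::uint64_t v) : value(v) {}
  std::uint64_t value;
};

// A constant with no integer value, like the address of a global variable.
struct ToyGlobalAddress : ToyConstant {};

// Returns the integer view of c, or nullptr when c is some other constant.
const ToyConstantInt *asConstantInt(const ToyConstant &c) {
  return dynamic_cast<const ToyConstantInt *>(&c);
}
```

In this model, a caller that only checks "is it a ToyConstant?" and then reads an integer value would be wrong for ToyGlobalAddress.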

bsmith updated this revision to Diff 349827. Jun 4 2021, 5:07 AM
bsmith marked an inline comment as done.
  • Remove unnecessary complexity when zero-extending dupq operands into a vector.
This revision is now accepted and ready to land. Jun 4 2021, 12:15 PM
This revision was landed with ongoing or failed builds. Jun 7 2021, 4:21 AM
This revision was automatically updated to reflect the committed changes.