This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Combine bitcasts to predicate types with vector inserts of loads
ClosedPublic

Authored by bsmith on Jul 22 2021, 7:12 AM.

Details

Summary

An insert subvector that is inserting the result of a vector predicate
sized load into undef at index 0, whose result is casted to a predicate
type, can be combined into a direct predicate load.

The purpose of this optimization is to clean up cases that will be
introduced in a later patch where casts to/from predicate types from i8
types will use insert subvector, rather than going through memory early.

This optimization is done in SVEIntrinsicOpts rather than InstCombine to
re-introduce scalable loads as late as possible, to give other
optimizations the best chance possible to do a good job.

Diff Detail

Event Timeline

bsmith created this revision.Jul 22 2021, 7:12 AM
bsmith requested review of this revision.Jul 22 2021, 7:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2021, 7:12 AM
Matt added a subscriber: Matt.Jul 22 2021, 11:02 AM
efriedma added inline comments.Jul 22 2021, 11:21 AM
llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
293

Do you need to check getMaxSVEVectorSizeInBits() somewhere?

This code could probably use some comments explaining why it's checking various conditions.

junparser added inline comments.Jul 22 2021, 8:45 PM
llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
442

shall we need to handle vector_extract as well?

bsmith updated this revision to Diff 361206.Jul 23 2021, 7:44 AM
bsmith marked 2 inline comments as done.
  • Add equivalent optimization for vector extract
  • Add comment to clarify what is being checked
  • Ensure max vector length is also exactly the size we want
  • Check the attribute rather than the subtarget options

The patch looks good to me. However, It seems that we can also handle SVE vector extract/insert here as well rather than just predicate vector.

junparser added inline comments.Jul 26 2021, 7:47 PM
llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
296

Why do we disable differing vscale min/max? I donot see any difference between vscale_range(4,0) and vscale_range(4,6).

llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
364

Nit: s/loads/stores/

The patch looks good to me. However, It seems that we can also handle SVE vector extract/insert here as well rather than just predicate vector.

In principal we could, however we haven't actually seen a point where this would be useful. Clang doesn't generate the same kinds of bitcast for other types as it does for predicates. It's something we can add later if it does turn out to be useful.

bsmith updated this revision to Diff 362394.Jul 28 2021, 8:07 AM
bsmith marked 2 inline comments as done.
  • Don't allow a maximum vector size of 0
  • Fixed clang format issues
  • Fix copy-paste issue in comment
paulwalker-arm added inline comments.Jul 29 2021, 5:49 AM
llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
285

Is this name representative? I think optimizePredicateStore is more in keeping with what is going on.

294

These are really MinVScale/MaxVScale.

296

I think this needs || MinSVEVectorSize == 0 to cover the vscale_range(0,0) case. Perhaps worth adding a comment along the lines of "The transform needs to know the exact runtime length of scalable vectors".

To bring this point home perhaps it's worth introducing unsigned SVEVectorSize = MinVScale*128;, but I'll leave you to decide if it's worth it.

314–315

I've got a feeling this needs to be more specific otherwise we'll introduce endianness issues. Specially we know predicate load/store instructions are byte based and so we should only allow the transform when the fixed length load/store instructions are also byte based.

Given this is all very specific I imagine you'll end up with something like if (Store->getOperand(0)->getType() == PrecalculatedTy)

322

Is this restriction strictly necessary? I'm thinking we might have multiple stores of the same value so unless there is a real affect on code quality I'd rather not artificially restrict the transform.

326–327

cast<ConstantInt>(IntrI->getOperand(1))->isZero()?

358

I don't see any test to ensure the old bitcast is dead? When combined with my previous comment you'll likely want:

if (IntrI->getNumUses() == 0)
  IntrI->eraseFromParent();
if (BitCast->getNumUses() == 0)
  BitCast->eraseFromParent();
368

I haven't looked but I imagine this'll have similar issues as raise for optimizePredicateVectorExtract.

bsmith updated this revision to Diff 363041.Jul 30 2021, 5:28 AM
bsmith marked 8 inline comments as done.
  • Avoid iterating over the instructions twice
  • Rename function to better reflect what they are doing
  • Disallow Vscale == 0 case
  • Only allow optimization when the fixed type involved is of i8 type
  • Relax only one use requirements
  • Only remove instructions when there are no more uses
paulwalker-arm added inline comments.Aug 2 2021, 9:54 AM
llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
286

Not that bothered but you could pass this into the function given optimizeInstructions already knows it. If you do keep the lookup then can you use I->getFunction().

304

To keep things simple (pun intended) I suggest adding || !Store->isSimple()) so that we only transform ordinary stores.

308

Is the dyn_cast necessary?

352

As with optimizePredicateStore.

368–369

Although possible, is this worth caring about?

389

As with the store case, we should only allow ordinary loads.

393

Is the dyn_cast necessary?

397

The insertion point needs to be where the original load is to ensure the new load maintains the same load/store ordering.

llvm/test/CodeGen/AArch64/sve-insert-vector-to-predicate-load.ll
11

Given the new load placement issue I mentioned above it's worth having the load in a separate basic block in order to validate that fix.

bsmith updated this revision to Diff 363687.Aug 3 2021, 4:18 AM
bsmith marked 9 inline comments as done.
  • Only perform optimization on simple loads/stores
  • Remove unnecessary dyn_casts
  • Move insertion point for load case to the load
  • Add testcase for load insertion point
paulwalker-arm accepted this revision.Aug 3 2021, 6:53 AM
paulwalker-arm added inline comments.
llvm/lib/Target/AArch64/SVEIntrinsicOpts.cpp
338

Not sure why I've only just spotted this but does this do anything? Can there be any uses of a store?

llvm/test/CodeGen/AArch64/sve-insert-vector-to-predicate-load.ll
62–65

Not really relevant for this patch but it occurs to me that this IR is provably bogus if we wanted to add the necessary hooks into the verifier.

This revision is now accepted and ready to land.Aug 3 2021, 6:53 AM