This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Sink op into loop if it's used by PTEST and known to zero inactive lanes.
ClosedPublic

Authored by sdesmalen on Jul 15 2022, 6:10 AM.

Details

Summary

This helps fold away the ptest instructions, which needs the knowledge on whether
the general predicate is known to zero the inactive lanes.

This fixes some PTEST regressions introduced by D129282.

Diff Detail

Event Timeline

sdesmalen created this revision.Jul 15 2022, 6:10 AM
sdesmalen requested review of this revision.Jul 15 2022, 6:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2022, 6:10 AM

Thanks @sdesmalen for fixing some of my regressions!

llvm/test/CodeGen/AArch64/sve-ptest-removal-sink.ll
9

Looks like %a and %b aren't used anywhere?

Can you provide a C/C++ example for when this is a problem? I'm not saying there's no problem to solve but rather that I'm starting to wonder if early decisions about removing reinterpret intrinsics are now starting to bite us.

Can you provide a C/C++ example for when this is a problem? I'm not saying there's no problem to solve but rather that I'm starting to wonder if early decisions about removing reinterpret intrinsics are now starting to bite us.

https://godbolt.org/z/q6hKMsxKK

In this case it doesn't have anything to do with the reinterpret intrinsics, but rather the ptrue that gets hoisted out which is then no longer visible to (non-Global) ISel when it lowers the ptest.

Matt added a subscriber: Matt.Jul 15 2022, 10:53 AM

Can you provide a C/C++ example for when this is a problem? I'm not saying there's no problem to solve but rather that I'm starting to wonder if early decisions about removing reinterpret intrinsics are now starting to bite us.

https://godbolt.org/z/q6hKMsxKK

In this case it doesn't have anything to do with the reinterpret intrinsics, but rather the ptrue that gets hoisted out which is then no longer visible to (non-Global) ISel when it lowers the ptest.

Just to expand on this a little bit, I experimented with disabling the removal of the reinterpret intrinsics, but found that the convert.to.svbool call is still outside the loop at the point of reaching codegen, meaning that it still can't fold away the ptest instruction unless it actively tries to sink the operand.

Can you provide a C/C++ example for when this is a problem? I'm not saying there's no problem to solve but rather that I'm starting to wonder if early decisions about removing reinterpret intrinsics are now starting to bite us.

https://godbolt.org/z/q6hKMsxKK

In this case it doesn't have anything to do with the reinterpret intrinsics, but rather the ptrue that gets hoisted out which is then no longer visible to (non-Global) ISel when it lowers the ptest.

Just to expand on this a little bit, I experimented with disabling the removal of the reinterpret intrinsics, but found that the convert.to.svbool call is still outside the loop at the point of reaching codegen, meaning that it still can't fold away the ptest instruction unless it actively tries to sink the operand.

I can believe that because removing those optimisations just replace the implicit zeroing with explicit zeroing. Whereas, I'm trying to figure out if we need to tighten up the intrinsics across the board to stop hiding the fact that predicate setting intrinsics always set all the bits. There's nothing here for you to worry about because this patch is generally restoring previous behaviour. I'm just worried we'll see an ever increase list of places were we need to restore the zeroing information that we've deliberately hidden.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12761

My worry here is pulling something into a loop that's more expensive than what we're trying to remove. Would it be wrong to limit this to simple constant splats and ptrue intrinsics?

sdesmalen updated this revision to Diff 447654.Jul 26 2022, 5:34 AM

Limit opcode to ptrue intrinsic.

paulwalker-arm accepted this revision.Jul 26 2022, 5:56 AM

Please don't forget @RosieSumpter's unused parameters comment.

This revision is now accepted and ready to land.Jul 26 2022, 5:56 AM
This revision was landed with ongoing or failed builds.Jul 26 2022, 7:09 AM
This revision was automatically updated to reflect the committed changes.