Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also, look into the exit block to see if there's any duplicates of the predicates that we've replaced.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Now cloning VCTP in the exit block if needed. The reason being that if we create a real TP loop, then VPR will not hold the predicate.
Added 'hasSideEffects' to VCTP because machine cse did a good job of removing the duplicate vctp in the exit block. I think always executing the 'extra' instruction outweighs the risk of having to spill/reload the vpr or not being able to perform the proper tail predication.
Just curious, if the input is already a tail-predicated loop (e.g. coming from intrinsics), would it be possible we incorrectly start removing/inserting things?
lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
459 ↗ | (On Diff #220989) | I wouldn't mind if this is done in a helper function as TryConvert is becoming a pretty big function. |
462 ↗ | (On Diff #220989) | Do we need the extra MaybeDead bookkeeping? Can we just use Predicates? |
So the pass shouldn't attempt to convert a loop that contains any vector intrinsics, other than masked.load and masked.store. So, we will actually need to do extra work for this to operate with MVE intrinsics. Even once we've done that, the pass only tries to remove the old icmp predicates, which we pattern match. If the user defined predicates match those that the vectorizer outputs, then there's no reason why we won't perform this transform. I would say that I'd add a test, but we don't have intrinsic support yet...
lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
462 ↗ | (On Diff #220989) | I'll take a look. |
So the pass shouldn't attempt to convert a loop that contains any vector intrinsics, other than masked.load and masked.store. So, we will actually need to do extra work for this to operate with MVE intrinsics. Even once we've done that, the pass only tries to remove the old icmp predicates, which we pattern match. If the user defined predicates match those that the vectorizer outputs, then there's no reason why we won't perform this transform. I would say that I'd add a test, but we don't have intrinsic support yet...
Excellent, thanks for the reminder/clarification.
Thanks, looks good to me.
Nitpick/bikeshedding the title:
Cleanup tail-predicated loop
This is accurate when you know what this is doing, but one could get the wrong impression, like this is a NFC patch. Perhaps something like:
"Finalise tail-predicated loop, remove duplicate predicates"
Or something along those lines.
lib/Target/ARM/MVETailPredication.cpp | ||
---|---|---|
430 ↗ | (On Diff #221026) | nit: don't need the brackets in this case (I just noticed because I started looking to see which blocks it closes as it is a bit difficult to see sometimes in phabricator). |