This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Enable multiple icmp when tail folding
Needs ReviewPublic

Authored by samparker on Sep 17 2020, 5:30 AM.

Details

Summary

Now that we are handling VCMP and VPT during the tail predication, we should allow more than a single icmp when deciding when to tail fold or not.

Diff Detail

Event Timeline

samparker created this revision.Sep 17 2020, 5:30 AM
samparker requested review of this revision.Sep 17 2020, 5:30 AM

I have the feeling that this was protecting us from more than just things that would become vpt blocks. I think we have a check that the original was only a single loop? That would keep things simpler at least, to mostly icmp + select (and maybe zext(icmp), but as far as I understand that should be fine). Things like min/max/abs should all work OK, and select on their own. But what about the saturating intrinsics we have? vqmovn for example.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1592

Fcmp too, I would expect could be treated the same way as icmp.

llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll
572

Apparently we have two files for these, and they should now maybe belong in tail-folding-allowed.ll

original was only a single loop?

Do you mean a single block loop? If so, I thought the the vectorizer would perform if-conversion after calling this? And for saturation, does the vectorizer not yet consume saturating intrinsics? Are all of ours ones that merge/retain previous values, maybe their idiom could be checked for?

original was only a single loop?

Do you mean a single block loop? If so, I thought the the vectorizer would perform if-conversion after calling this? And for saturation, does the vectorizer not yet consume saturating intrinsics? Are all of ours ones that merge/retain previous values, maybe their idiom could be checked for?

Yeah sorry. Single block loops. So there should be no ifconversion, and we will only be dealing with icmp+select. That will keep things simpler - less complex loop structures to worry about.

There are many forms of saturating instructions. In this case I was just talking about something that does min(max(x, -128), 127) or an i16, for example. We turn that into a vqmovn in the backend, with some bitcasts for the types. I had presumed that the "n" in there would be a problem, but they look like they are marked as valid of tail pred, even if they do change size.

It is still probably worth checking some examples, to make sure they are still doing as well with this change.