Now that we are handling VCMP and VPT during the tail predication, we should allow more than a single icmp when deciding when to tail fold or not.
Details
Diff Detail
Event Timeline
I have the feeling that this was protecting us from more than just things that would become vpt blocks. I think we have a check that the original was only a single loop? That would keep things simpler at least, to mostly icmp + select (and maybe zext(icmp), but as far as I understand that should be fine). Things like min/max/abs should all work OK, and select on their own. But what about the saturating intrinsics we have? vqmovn for example.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
1592 | Fcmp too, I would expect could be treated the same way as icmp. | |
llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll | ||
572 | Apparently we have two files for these, and they should now maybe belong in tail-folding-allowed.ll |
original was only a single loop?
Do you mean a single block loop? If so, I thought the the vectorizer would perform if-conversion after calling this? And for saturation, does the vectorizer not yet consume saturating intrinsics? Are all of ours ones that merge/retain previous values, maybe their idiom could be checked for?
Yeah sorry. Single block loops. So there should be no ifconversion, and we will only be dealing with icmp+select. That will keep things simpler - less complex loop structures to worry about.
There are many forms of saturating instructions. In this case I was just talking about something that does min(max(x, -128), 127) or an i16, for example. We turn that into a vqmovn in the backend, with some bitcasts for the types. I had presumed that the "n" in there would be a problem, but they look like they are marked as valid of tail pred, even if they do change size.
It is still probably worth checking some examples, to make sure they are still doing as well with this change.
Fcmp too, I would expect could be treated the same way as icmp.