This is an archive of the discontinued LLVM Phabricator instance.

[SVE][DAGCombiner] Enable ISD::ABDS and ISD::ABDU for SVE.
ClosedPublic

Authored by paulwalker-arm on Dec 14 2021, 8:58 AM.

Details

Summary

Add the typical custom lowering and isel patterns to enable ABD
for scalable vectors.

The existing ABD combine doesn't quite work because for SVE only
a single scalable vector per scalar integer type it legal. (i.e.
for i32, <vscale x 4 x i32> is the only legal scalable vector type)

To account for this I've extended the combine for the case when the
extension of the input operands cannot be folded into the ABD. The
accompanying tests use legal and twice-the-size of legal types to
exercise both combines.

Diff Detail

Event Timeline

paulwalker-arm created this revision.Dec 14 2021, 8:58 AM
paulwalker-arm requested review of this revision.Dec 14 2021, 8:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2021, 8:58 AM
efriedma added inline comments.Dec 14 2021, 2:00 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9526

If I'm understanding correctly, this is just saying that if you know some number of leading sign bits of the operands, abs(a-b) is equivalent to abds(a,b) (and the equivalent for unsigned).

This would be more clear if you explicitly checked for leading zero/sign bits, instead of implying them from a SIGN_EXTEND/ZERO_EXTEND opcode.

I'd like to see non-SVE testcases for this.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9526

@efriedma : I've confused myself several times when creating this patch so perhaps I'm still misunderstanding something but I don't think the checks relate to knowing if leading sign bits exists but rather looking for an indication as to how the operands should be interpreted. By this I mean knowing that a value is positive or negative is not enough to determine which of ISD::ABDS or ISD::ABDU will produce the same result as a plain abs(sub()) sequence.

I couldn't see a clear way to write non-SVE test cases. The nodes have very limited uses and from what I can see they don't have any of the typical legalisation plumbing either. The existing AArch64 uses don't suffer the same problem with SVE because there's typically a shorter legal vector type available that uses the same element type.

Matt added a subscriber: Matt.Jan 4 2022, 9:22 AM

The, lets say, "standard" pattern for a uabd is trunc(abs(sub(zext(a), zext(b)))). We start the transform at the abs though, as it allows for more folding, and convert the whole thing to trunc(zext(abd(a, b))), which is folded to plain abd(a, b). Starting at the abd helps us fold more patterns, like those where (say) vecreduce(abs(sub(zext(a), zext(b)))) can be folded to vecreduce(zext(abd(a, b))), which becomes a udot(abd(a, b)) on aarch64 (for example, with the right types).

I would suspect that you could include the trunc in the SVE tests, which would remove the need for the target independent parts of this patch, and it could test the legal types. (The existing ones might be good to keep around too if they would show inefficient codegen). And if you wanted to continue with the target indepedent part it could be done in a separate patch? Where it is probably worth adding some NEON AArch64 tests too.

Thanks @dmgreen. I've broken the ISel part out into D117873. I'll revisit this patch once the ISel is available. I took another look at the NEON side of things and I looks like I can use 32bit vectors to exercise the new combine so I'll add those tests when I rebase.

Rebase now that the isel side of things in on main.

sdesmalen added inline comments.Feb 8 2022, 6:25 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9523–9526

Is there a test for the case where VT1 != VT2 ?

There is potential alternative for the "Smaller than legal" Neon types in D119075 (It's an MVE patch, but the AArch64 version I have looks very similar). I'm not sure it will handle all the types here (like i1's), but I would hope SVE could be handled in the same way, and some of the assembly will be more optimal as it will use the smaller type with a single extend. I haven't had enough time to update them since the weekend yet.

There is also D119072, which too is a little contentious, but it uses ComputeNumSignBits and computeKnownBits/countMinLeadingZeros for detecting the number of sign bits. (The contentious part is about it running from DemandedBits, as far as I understand, not the calls to computeKnownBits). This patch it kind of doing that but just with zext/sext, and not creating the smaller type if possible.

Rebase and adds extra tests requested by Sander.

I don't know what you're trying to tell me here @dmgreen. As I see it, this is a generic issue that I've solved in a target independent and simple way. Does it cover all bases? No, but then nor does the original combine, hence this addition. If it turns out we need to support more cases then we'd just extend the combine and add the necessary tests.

No strong objection from me!

But like Eli said, the code is a little odd, and it doesn't seem to be producing optimal results. If you don't want to wait for me to get back to D119075, this sounds fine too.

dmgreen accepted this revision.Feb 15 2022, 10:07 AM

It looks like the other patch still needs more work. Feel free to go with this in the meantime. LGTM

This revision is now accepted and ready to land.Feb 15 2022, 10:07 AM
This revision was landed with ongoing or failed builds.Feb 17 2022, 5:34 AM
This revision was automatically updated to reflect the committed changes.