This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Lowering sve.dot to DOT node
ClosedPublic

Authored by junparser on Mar 31 2021, 9:13 PM.

Details

Summary

As is the title, this patch also takes care about dup node in performAddDotCombine which tries to support scalable vector type.

TestPlan: check-llvm

Diff Detail

Event Timeline

junparser created this revision.Mar 31 2021, 9:13 PM
junparser requested review of this revision.Mar 31 2021, 9:13 PM
Herald added a project: Restricted Project. ยท View Herald TranscriptMar 31 2021, 9:13 PM
Herald added a subscriber: llvm-commits. ยท View Herald Transcript

address pre-check comments.

dmgreen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13365

Can we add a isZerosVector function that encapsulates isBuildVectorAllZeros and the splats/dups logic? It sounds generally useful to have a function that checks for the various ways a vector can be all zeros.

isConstantSplatVectorAllZeros may also be better, as it may already handle the ISD::SPLAT_VECTOR part.

junparser added inline comments.Apr 1 2021, 3:18 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13365

sound good to me, I'll update this.

junparser updated this revision to Diff 334637.Apr 1 2021, 3:28 AM

address the comments.

Thanks for the patch @junparser! Just a few minor comments from me. ๐Ÿ™‚

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2161โ€“2163

Might be better to do this _before_ the above if(ISD::isConstantSplatVectorAllZeroes(N)), in my opinion, to avoid running this loop twice (once in this function, once in ISD::isConstantSplatVectorAllZeros).

2165

A cursory glance at ISD::isConstantSplatVectorAllZeros suggests that the SPLAT_VECTOR case is entirely captured earlier on, I think. I reckon you can get away with just:

if (N->getOpcode() != AArch64ISD::DUP)
  return false;
2171โ€“2173

nit:

return (CINT && CINT->isNullValue()) || (CFP && CFP->isZero());
13361

nit: superfluous newline.

13365

Following on from this, it might be a good idea to write isZerosVector as:

static bool isZerosVector(const SDNode *N) {
  return ISD::isConstantSplatVectorAllZeros(N) || isConstantDupVectorAllZeros(N);
}

Might be a nicer separation of the logic here.

Your call, though!

llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
41

The new CHECK: lines introduced look simple enough to write out by hand, so I would prefer to not use utils/update_llc_checks.py in this case because it has made the diff a lot noisier (e.g. all of the tests have been changed syntactically in this patch, but AFAICT none of them have changed functionally).

In this case, I think it should be sufficient to write your new tests like this:

define <vscale x 2 x i64> @test_sdot_i64_zero(<vscale x 2 x i64> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
; CHECK-LABEL: test_sdot_i64_zero:
; CHECK:         sdot z0.d, z1.h, z2.h
; CHECK-NEXT:    ret
entry:
  %vdot1.i = call <vscale x 2 x i64> @llvm.aarch64.sve.sdot.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c)
  %ret = add <vscale x 2 x i64> %vdot1.i, %a
  ret <vscale x 2 x i64> %ret
}

define <vscale x 4 x i32> @test_sdot_i32_zero(<vscale x 4 x i32> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
; CHECK-LABEL: test_sdot_i32_zero:
; CHECK:         sdot z0.s, z1.b, z2.b
; CHECK-NEXT:    ret
entry:
  %vdot1.i = call <vscale x 4 x i32> @llvm.aarch64.sve.sdot.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c)
  %ret = add <vscale x 4 x i32> %vdot1.i, %a
  ret <vscale x 4 x i32> %ret
}

... and:

define <vscale x 2 x i64> @test_udot_i64_zero(<vscale x 2 x i64> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
; CHECK-LABEL: test_udot_i64_zero:
; CHECK:         udot z0.d, z1.h, z2.h
; CHECK-NEXT:    ret
entry:
  %vdot1.i = call <vscale x 2 x i64> @llvm.aarch64.sve.udot.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c)
  %ret = add <vscale x 2 x i64> %vdot1.i, %a
  ret <vscale x 2 x i64> %ret
}

define <vscale x 4 x i32> @test_udot_i32_zero(<vscale x 4 x i32> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
; CHECK-LABEL: test_udot_i32_zero:
; CHECK:         udot z0.s, z1.b, z2.b
; CHECK-NEXT:    ret
entry:
  %vdot1.i = call <vscale x 4 x i32> @llvm.aarch64.sve.udot.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c)
  %ret = add <vscale x 4 x i32> %vdot1.i, %a
  ret <vscale x 4 x i32> %ret
}

... and then just revert all other changes in the file. ๐Ÿ˜„

paulwalker-arm added inline comments.Apr 1 2021, 4:00 AM
llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
41

I think not using update_llc_checks.py was a mistake when landing the original file. Although using it now makes the patch look bigger it will make it easier to update in the future, which is a good thing.

david-arm added inline comments.Apr 1 2021, 4:06 AM
llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
41

I think the LLVM community prefers using utils/update_llc_test_checks.py where possible, in particular for small functions like those here.

junparser added inline comments.Apr 1 2021, 4:24 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2165

the ISD::isConstantSplatVectorAllZeros does not check ConstantFPSDNode for SPLAT_VECTOR, that's I added it here.

2165

I will add another patch to check check ConstantFPSDNode for SPLAT_VECTOR in ISD::isConstantSplatVectorAllZeros and then remove the code here? is that ok?

llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
41

I also prefer to use update_llc_checks.py here.

dmgreen added inline comments.Apr 1 2021, 4:33 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2165

I had not realized that isConstantSplatVectorAllZeros didn't already handle SPLAT_VECTOR with FP constants. Adding it there sounds like a good idea, either here or in a separate patch.

llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
41

Yeah, using the update script has benefits in terms of maintainability and consistency between test. As well as not missing things in the generated assembly.

Feel free regenerate the existing test and commit that separately, so just the changes from the patch are shown here. I often to so far as to commit the new tests with the current trunk codegen, so that in the review it is obvious what is changing in the codegen.

junparser updated this revision to Diff 334651.Apr 1 2021, 4:51 AM

address comments.

joechrisellis added inline comments.Apr 1 2021, 5:09 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
2165

ACK -- I would have expected it to already handle SPLAT_VECTOR with float constants. @dmgreen's suggestion sounds good to me, hopefully that's a straightforward change! ๐Ÿ™‚

llvm/test/CodeGen/AArch64/sve-intrinsics-int-arith.ll
41

ACK -- any which way if fine by me. I like @dmgreen's idea of regenerating the test as a separate commit, but no complaints from me if you just leave it as-is. ๐Ÿ™‚

junparser updated this revision to Diff 334656.Apr 1 2021, 5:30 AM

address comments.

@joechrisellis @david-arm Add fp constants handle in ISD::isConstantSplatVector , also split the testcase.

dmgreen accepted this revision.Apr 2 2021, 4:40 AM

Thanks. LGTM

This revision is now accepted and ready to land.Apr 2 2021, 4:40 AM
This revision was automatically updated to reflect the committed changes.
paulwalker-arm added inline comments.Apr 6 2021, 4:48 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
149

@junparser This looks a little wired to me. Specifically the truncOrSelf part which to my mind is never something that is likely safe to do with a floating point constant (even after bitcasting to an APInt). Looking at the equivalent code in BuildVectorSDNode::isConstantSplat I can see that it stops after the bitcastToAPInt() stage. To be consistent, can this function follow the same pattern?