This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add a DAG combiner fold to visitADD for vscale with truncate
AbandonedPublic

Authored by Allen on May 27 2022, 5:40 AM.

Details

Summary

Fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1+c2))
As the vscale Constant is legalized with type i64 DAG, so it inserts a truncate
in function LowerVSCALE, which cause the mismatch in D82792.

Diff Detail

Event Timeline

Allen created this revision.May 27 2022, 5:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 5:40 AM
Allen requested review of this revision.May 27 2022, 5:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 5:40 AM
efriedma added inline comments.May 27 2022, 11:47 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2622

You can't assert that VSCALE returns an i64. Probably not even on Arm, but definitely not in target-independent code.

2630

Do you really need to explicitly use the number "64" to make this work?

Allen updated this revision to Diff 432681.May 27 2022, 6:33 PM
reames added a subscriber: reames.May 28 2022, 12:14 PM

I don't claim to fully understand this, so my comment here might be off base.

I suspect your fold can be generalized as: fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1)+vscale(c2))

The vscale(c1)+vscale(c2) to vscale(C1 + C2) is handled separately above already.

If this is true, that your transform reduces to proving that it's legal to common the truncate. However, as a far as I known trunc(x) + trunc(y) is always equal to trunc(x+y). So why do we need this transform at all? Shouldn't this be covered by generic trunc folds and the existing rule?

Anyways, I'm clearly missing something here. Any idea what?

Allen marked 2 inline comments as done.May 28 2022, 3:00 PM

I don't claim to fully understand this, so my comment here might be off base.

I suspect your fold can be generalized as: fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1)+vscale(c2))

The vscale(c1)+vscale(c2) to vscale(C1 + C2) is handled separately above already.

If this is true, that your transform reduces to proving that it's legal to common the truncate. However, as a far as I known trunc(x) + trunc(y) is always equal to trunc(x+y). So why do we need this transform at all? Shouldn't this be covered by generic trunc folds and the existing rule?

Anyways, I'm clearly missing something here. Any idea what?

Thanks for your attention. With my debug, I find vscale is a little especial. As the vscale(C1) is not a const type node before LowerVSCALE, so we missing the transform of trunc(x) + trunc(y) --> trunc(x+y) with instcombine.
And in the dagcombine, it match the DAG IR from bottom to top, so we can capture the former a+truncate(vscale(c1))+truncate(vscale(c2)), but not truncate(vscale(c1))+truncate(vscale(c2)).

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2622

Thanks, apply your commit

2630

Yes, it can be deleted

craig.topper added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2634

Do you need to check that both vscales have the same types?

Allen marked 2 inline comments as done.May 28 2022, 3:59 PM
Allen added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2634

Thanks for your attention.
As only the same type node can be on the both side of ISD::ADD, so I think it can be ignored. Am I missing something?

craig.topper added inline comments.May 28 2022, 4:09 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2634

There's no guarantee the input types of the two truncates is the same. You could have something like (i16 trunc (i32 vscale)) and (i16 trunc (i64 vscale)).

Allen updated this revision to Diff 432759.May 28 2022, 4:56 PM
Allen added a reviewer: craig.topper.

Add new condtion to check that both vscales have the same types

Allen marked 2 inline comments as done.May 29 2022, 8:48 AM
Allen added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2634

Thanks for detail example. Apply your comment

Allen marked an inline comment as done.Jun 3 2022, 6:28 PM

ping ?

paulwalker-arm added inline comments.Jun 6 2022, 5:58 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2628–2629

Would the following combine be safe?

ty1 truncate(ty2 vscale(c1)) -> ty1 vscale(c1)

I ask because then the combine just above the new one would just work?

I guess the problem is that operation legalisation might be the thing introducing the truncate but then we can just limit the combine to before then. I wouldn't expect the combine to be all that useful after legalisation anyway, although am happy to be proven wrong if you've a test case.

2630

Is the VT.isScalarInteger() check necessary? I figure the later ISD::VSCALE requirement will guaranteed such, plus I don't see anything in the if block that actually cares.

Allen updated this revision to Diff 434666.Jun 6 2022, 6:15 PM
Matt added a subscriber: Matt.Jun 10 2022, 3:11 PM
Allen updated this revision to Diff 436266.Jun 12 2022, 7:39 PM

Add the check Level > AfterLegalizeVectorOps

Allen marked 2 inline comments as done.Jun 18 2022, 10:42 PM
Allen added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2618

With Further debuging , I found this pattern doen't match during llvm::BeforeLegalizeTypes because when the node t26 is updated, it only add its user node t9 to the Worklist, without its recursive user node t10
https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L1607

Surely, It's unreasonable to recursively add nodes t10 to the worklist, as it may lead to a significant increase in compile time.

SelectionDAG has 16 nodes:
  t0: ch = EntryToken
        t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t26: i32 = vscale Constant:i32<8>  --- update last
      t9: i32 = add nuw t2, t26
      t19: i32 = vscale Constant:i32<2>
    t10: i32 = add nuw t9, t19
  t12: ch,glue = CopyToReg t0, Register:i32 $w0, t10
    t4: i32 = vscale Constant:i32<1>
  t24: i32 = shl t4, Constant:i64<3>
  t13: ch = AArch64ISD::RET_FLAG t12, Register:i32 $w0, t12:1
2628–2629

thanks @paulwalker-arm for your idea, I thinks ty1 truncate(ty2 vscale(c1)) -> ty1 vscale(c1) is safe as C1 is a const.
But If we combine this, the ISel may crash as we don't defined such pattern.

Yes, the truncate is introduced in AArch64TargetLowering::LowerVSCALE, where addressing the legalisation for the new added case combine_add_vscale_C_i32.

BTW: Added the check Level > AfterLegalizeVectorOps according the comment

2630

Thanks, apply your comment, and deleted the unnecessary check.

Allen abandoned this revision.Jul 15 2022, 9:15 PM
Allen marked 2 inline comments as done.

adopt as I don't work on it now