Fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1+c2))
As the vscale Constant is legalized with type i64 DAG, so it inserts a truncate
in function LowerVSCALE, which cause the mismatch in D82792.
Details
Diff Detail
Event Timeline
I don't claim to fully understand this, so my comment here might be off base.
I suspect your fold can be generalized as: fold a+truncate(vscale(c1))+truncate(vscale(c2)) to a+truncate(vscale(c1)+vscale(c2))
The vscale(c1)+vscale(c2) to vscale(C1 + C2) is handled separately above already.
If this is true, that your transform reduces to proving that it's legal to common the truncate. However, as a far as I known trunc(x) + trunc(y) is always equal to trunc(x+y). So why do we need this transform at all? Shouldn't this be covered by generic trunc folds and the existing rule?
Anyways, I'm clearly missing something here. Any idea what?
Thanks for your attention. With my debug, I find vscale is a little especial. As the vscale(C1) is not a const type node before LowerVSCALE, so we missing the transform of trunc(x) + trunc(y) --> trunc(x+y) with instcombine.
And in the dagcombine, it match the DAG IR from bottom to top, so we can capture the former a+truncate(vscale(c1))+truncate(vscale(c2)), but not truncate(vscale(c1))+truncate(vscale(c2)).
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2622 | Thanks, apply your commit | |
2630 | Yes, it can be deleted |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2634 | Do you need to check that both vscales have the same types? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2634 | Thanks for your attention. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2634 | There's no guarantee the input types of the two truncates is the same. You could have something like (i16 trunc (i32 vscale)) and (i16 trunc (i64 vscale)). |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2634 | Thanks for detail example. Apply your comment |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2628–2629 | Would the following combine be safe? ty1 truncate(ty2 vscale(c1)) -> ty1 vscale(c1) I ask because then the combine just above the new one would just work? I guess the problem is that operation legalisation might be the thing introducing the truncate but then we can just limit the combine to before then. I wouldn't expect the combine to be all that useful after legalisation anyway, although am happy to be proven wrong if you've a test case. | |
2630 | Is the VT.isScalarInteger() check necessary? I figure the later ISD::VSCALE requirement will guaranteed such, plus I don't see anything in the if block that actually cares. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
2618 | With Further debuging , I found this pattern doen't match during llvm::BeforeLegalizeTypes because when the node t26 is updated, it only add its user node t9 to the Worklist, without its recursive user node t10 Surely, It's unreasonable to recursively add nodes t10 to the worklist, as it may lead to a significant increase in compile time. SelectionDAG has 16 nodes: t0: ch = EntryToken t2: i32,ch = CopyFromReg t0, Register:i32 %0 t26: i32 = vscale Constant:i32<8> --- update last t9: i32 = add nuw t2, t26 t19: i32 = vscale Constant:i32<2> t10: i32 = add nuw t9, t19 t12: ch,glue = CopyToReg t0, Register:i32 $w0, t10 t4: i32 = vscale Constant:i32<1> t24: i32 = shl t4, Constant:i64<3> t13: ch = AArch64ISD::RET_FLAG t12, Register:i32 $w0, t12:1 | |
2628–2629 | thanks @paulwalker-arm for your idea, I thinks ty1 truncate(ty2 vscale(c1)) -> ty1 vscale(c1) is safe as C1 is a const. Yes, the truncate is introduced in AArch64TargetLowering::LowerVSCALE, where addressing the legalisation for the new added case combine_add_vscale_C_i32. BTW: Added the check Level > AfterLegalizeVectorOps according the comment | |
2630 | Thanks, apply your comment, and deleted the unnecessary check. |
With Further debuging , I found this pattern doen't match during llvm::BeforeLegalizeTypes because when the node t26 is updated, it only add its user node t9 to the Worklist, without its recursive user node t10
https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L1607
Surely, It's unreasonable to recursively add nodes t10 to the worklist, as it may lead to a significant increase in compile time.