This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SME] Fix an infinite loop in DAGCombine related to adding -force-streaming-compatible-sve flag.
ClosedPublic

Authored by dtemirbulatov on Mar 8 2023, 5:35 AM.

Details

Summary

Compiler hits infinite loop in DAGCombine. For force-streaming-compatible-sve mode we have custom lowering for 128-bit vector splats and later in DAGCombiner::SimplifyVCastOp() we scalarized SPLAT because we have custom lowering for SME. Later, we restored SPLAT opertion via performMulCombine().

Diff Detail

Event Timeline

dtemirbulatov created this revision.Mar 8 2023, 5:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 5:35 AM
dtemirbulatov requested review of this revision.Mar 8 2023, 5:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 5:35 AM
Matt added a subscriber: Matt.Mar 8 2023, 11:05 AM
sdesmalen added inline comments.Mar 10 2023, 3:58 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24422–24424

The only reason this currently works for NEON is because DAGCombiner::SimplifyVCastOp for no good reason singles out ISD::SPLAT_VECTOR, which is only used for scalable vectors and SVE.
When I add || N0.getOpcode() == ISD::BUILD_VECTOR to the condition in SimplifyVCastOp, various AArch64 tests fail for the same reason as the test you've added here.

I think the better solution would be to pass the whole Node* to preferScalarizeSplat, and then to look at its uses. If the opcode is a zero/sign extend and any of the uses is a mul, then we don't want to do this transformation, because that's when a umull/smull instruction can be used.

I looked into whether it was possible to make LoweMUL smarter and have it recognise the dup(ext(x)) pattern so that we don't need to do the combine for dup(ext(x)) -> ext(dup(x) in the first place, but this seems rather non-trivial.

llvm/test/CodeGen/AArch64/aarch64-force-streaming-compatible-sve.ll
4 ↗(On Diff #503331)

Just a general comment (and not a suggestion for this patch): It would be nice if we could use SVE's [us]mull[bt] instructions for this.

david-arm added inline comments.Mar 15 2023, 9:08 AM
llvm/test/CodeGen/AArch64/aarch64-force-streaming-compatible-sve.ll
1 ↗(On Diff #503331)

I think it makes sense to move this test into one of the existing files, perhaps something like CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll?

4 ↗(On Diff #503331)

I think it would be good to rename this function to something that is more meaningful, perhaps extend_cmp_of_mul? Also, I think it would be nice to add a comment that describes what we're testing here, i.e. that the DAG combiner does not get stuck in an infinite loop.

david-arm added a subscriber: hassnaa-arm.

Adding @hassnaa-arm as a reviewer too as she has been doing a lot of work on the streaming compatible code generation.

dtemirbulatov marked 3 inline comments as done.

Addressed remarks.

Removed double not in AArch64TargetLowering::preferScalarizeSplat() function.

sdesmalen added inline comments.Mar 21 2023, 6:52 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
24423

I guess this condition can be removed now?

24424

Should this code also check for ISD::ANY_EXTEND ?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
894

Can you also add a test where the result of the extend is not used by a mul?

dtemirbulatov marked 3 inline comments as done.

Rebased, Resolved comments.

sdesmalen added inline comments.Mar 23 2023, 7:27 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
930–932

This is testing the wrong thing, because it is testing splat(sext(scalar)) rather than sext(splat(scalar)) which is what SimplifyVCastOp tries to change.

931

It's better to use poison here and below.

955

No vectors are used in this loop, so the code you've modified does not apply to this test at all. I'm expecting a similar test to the above, but then using a different operator instead of a mul.

Changed the second test case with suggestions from comment.

dtemirbulatov marked 2 inline comments as done.Mar 27 2023, 1:58 AM
dtemirbulatov added inline comments.
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
894

ok, I added function with the setcc node instead of mul.

sdesmalen added inline comments.Mar 27 2023, 3:51 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
932

This could be poison as well.

934–935

I'm not sure why the icmp and zext are relevant to this test.

953

Can you just store %1 directly without the add?

dtemirbulatov marked 2 inline comments as done.Mar 27 2023, 5:34 AM

Changing tests.

sdesmalen added inline comments.Mar 27 2023, 5:43 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
919

This test is different from the one above in ways other than the mul. Can you make the tests otherwise identical?

Can you also verify that it still exercises the code you've added to your patch?

sdesmalen added inline comments.Mar 27 2023, 5:45 AM
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
911

nit: If you replace the 8 here by a smaller number like 2, then we don't need to observe the effect of legalisation and we'd only get 1 mul instead of 4.

dtemirbulatov marked an inline comment as done.

Updated tests.

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
911

I could not reproduce the issue with <2 x > type.

919

I can reproduce the error with updated test.

dtemirbulatov marked an inline comment as done.

Moved tests to <2 x > types.

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
911

sorry, I was wrong. I can reproduce with <2 x > types.

Replaced zeroinitializer to poison in second operand for shufflevector.

dtemirbulatov marked an inline comment as done.Mar 27 2023, 10:11 AM
dtemirbulatov marked an inline comment as done.
david-arm accepted this revision.Apr 4 2023, 7:12 AM

LGTM! Thanks @dtemirbulatov.

This revision is now accepted and ready to land.Apr 4 2023, 7:12 AM