Page MenuHomePhabricator

[SelectionDAG] Support scalable-vector splats in more cases

Authored by frasercrmck on Jan 12 2021, 8:12 AM.



This patch adds support for scalable-vector splats in DAGCombiner's
isConstantOrConstantVector and ISD::matchUnaryPredicate functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.

It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result

Diff Detail

Event Timeline

frasercrmck created this revision.Jan 12 2021, 8:12 AM
frasercrmck requested review of this revision.Jan 12 2021, 8:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2021, 8:12 AM

@c-rhodes or @david-arm I've updated the AArch64 tests naively but I thought I'd tag you in case you don't intend to be shifting out all bits.

frasercrmck added inline comments.Jan 12 2021, 8:52 AM

In general, is isFixedLengthVector || isScalableVector a suitable replacement for isVector, or should we be more explicit/future-proof about the possibility of a third case?

craig.topper added inline comments.Jan 21 2021, 11:31 AM

Is there only one element in Shifts and Factors? If I'm reading matchUnaryPredicate right it will only see the single operand of the SPLAT_VECTOR so only call BuildSDIVPattern once?


I'm a bit surprised this didn't affect AVX512 on X86.


Is this test trying to test an oversized shift amount or just a bad choice of constant? I'm leaning towards bad constant choice the lsl tests don't have this issue?

rebase on main


Yes, that's how it's working. Is that strong enough a thing to rely upon, do you think?


Should we double-check that AVX512 has tested this code path?


Yeah I'd imagine the latter too. I tried to tag some of the AArch64 team to see if they could help resolve this issue

david-arm added inline comments.Jan 22 2021, 9:10 AM

Oh yeah, sorry about this. We did discuss it and the problem is with our tests I think. We've created tests that are shifting by more than the element width. I think this change is fine and we'll create some better tests after your patch lands.

craig.topper added inline comments.Jan 22 2021, 9:23 AM

Can we put an assert on the size and everwhere else we access element 0 to make it clear we aren't dropping anything?

  • add in assertions
frasercrmck marked 2 inline comments as done.Jan 22 2021, 9:48 AM
frasercrmck added inline comments.

Yep, done

craig.topper added inline comments.Jan 22 2021, 4:35 PM

Looks like both operands to getSetCC are both constant vectors so this gets constant folded on non-scalable vectors. Then the getSelect will constant fold based on the condition being a constant. So maybe that's why AVX512 doesn't care.

This revision is now accepted and ready to land.Jan 22 2021, 4:45 PM
This revision was automatically updated to reflect the committed changes.
frasercrmck marked an inline comment as done.

This is crashing SVE target lowering with "LLVM ERROR: Cannot select: t23: nxv4i32 = mulhu t9, t34"
Let me know if there is a way of fixing it?

Take below t.ll
Run: llc -O2 -mtriple=aarch64-linux-gnu -mattr=+sve < t.ll

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

define void @test(<vscale x 4 x i32>* %in, <vscale x 4 x i32>* %out, <vscale x 4 x i1> %p) {
  %a = load <vscale x 4 x i32>, <vscale x 4 x i32>* %in, align 16
  %div = udiv <vscale x 4 x i32> %a, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> undef, i32 3, i32 0), <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer)
  call void<vscale x 4 x i32> %div, <vscale x 4 x i32>* %out, i32 4, <vscale x 4 x i1> %p)

declare void<vscale x 4 x i32>, <vscale x 4 x i32>*, i32 immarg, <vscale x 4 x i1>)

This is crashing SVE target lowering with "LLVM ERROR: Cannot select: t23: nxv4i32 = mulhu t9, t34"
Let me know if there is a way of fixing it?

Hi @huihuiz, thanks for raising this issue with me. You've probably seen D96849 by now.

Since I'm not familiar with AArch64 I thought the quickest and safest way to fix the issue would be to prevent these optimizations from taking place. The AArch64 team can work out if there's a way of supporting these nodes.

Thanks @frasercrmck for helping! Appreciate it!