This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorStores
ClosedPublic

Authored by david-arm on Jul 30 2020, 5:37 AM.

Details

Summary

In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.

Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
for the add instruction in the same way as the fixed width case does.

Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently uses a sequence of element-by-element scalar
stores.

I've added new tests in

CodeGen/AArch64/sve-intrinsics-stores.ll
CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorStores.

Diff Detail

Event Timeline

david-arm created this revision.Jul 30 2020, 5:37 AM
Herald added a project: Restricted Project. · View Herald Transcript
david-arm requested review of this revision.Jul 30 2020, 5:37 AM
david-arm edited the summary of this revision. (Show Details)

The existing algorithm is too complex to adapt to take scalable vectors into account

Really? At first glance, I don't see any obvious reason the original algorithm wouldn't just work, assuming you treat all the offsets/sizes as scaled values.

HI @efriedma. When I first tried fixing this issue I did originally try adapting the original loop, but the patch gradually grew to be bigger and bigger in order to support cases that I believe could never happen. So the original loop conceptually allows for the possibility of fixed width vector and scalar stores being mixed together in any order. By adding scalable vector stores into the mix we then have to support the possibility of scalable vector stores, fixed width vector stores and scalar stores happening in any combination, any order. That even means changing the existing scalar and fixed width stores to support polynomial offsets, since we could have mixtures of fixed and scaled offsets. In addition, I had to change all the parameters such as StWidth, Offset, etc. to TypeSize, then add new "-=" operators to the TypeSize class. So I began to think this seemed like a much more invasive change than was necessary because in reality we can only ever use legal scalable vector stores to store out wider scalable vector values. However, if you think there is real value in adapting the original loop to allow scalable vectors I am happy to take another look?

Hi @efriedma, I'm now trying to see if I can adapt the original loop by changing all unsigned sizes and offsets to TypeSize, adding -= and += operators to TypeSize, and adding various asserts in the scalar stores and fixed width stores to ensure we haven't got any scaled offsets.

david-arm updated this revision to Diff 282580.Aug 3 2020, 4:37 AM
david-arm edited the summary of this revision. (Show Details)

It looks like this turned out pretty clean. I like the new += and -= operators.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5167

TmpSt isn't a great name.

Also, can you add a test for storing something like <vscale x 6 x float>?

david-arm updated this revision to Diff 282808.Aug 4 2020, 12:05 AM
david-arm marked an inline comment as done.Aug 4 2020, 12:08 AM

Hi @efriedma, yeah when I first started this patch I did try rewriting the original loop, but for some reason my first attempt didn't seem so clean. However, the second attempt does look a lot better!

  • Renamed TmpSt -> PartStore
  • Added tuple tests for floats.

I actually wanted <vscale x 6 x float>, not <vscale x 12 x float>. The difference being that it exercises the path where the two stores have different types. (I guess it might be a little tricky to construct one without tripping over other issues; I think you should be able to use a splat?)

david-arm added a comment.EditedAug 5 2020, 12:50 AM

Hi @efriedma, I tried adding a test as you suggested, but hit other errors. For this test:

define void @store_nxv6f32(<vscale x 6 x float>* %out) {

%ins = insertelement <vscale x 6 x float> undef, float 1.0, i32 0
%splat = shufflevector <vscale x 6 x float> %ins, <vscale x 6 x float> undef, <vscale x 6 x i32> zeroinitializer
store <vscale x 6 x float> %splat, <vscale x 6 x float>* %out
ret void

}

I hit this error:

WidenVectorResult #0: t8: nxv6f32 = splat_vector ConstantFP:f32<1.000000e+00>

Do not know how to widen the result of this operator!
UNREACHABLE executed at /home/davshe01/upstream/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2735!

Would like me to try fixing this first in a different patch or part of this patch?

I thought I fixed that in D84706; try rebasing on trunk?

Hi @efriedma I tried rebasing and I do get a little further, but then fail at isel:

LLVM ERROR: Cannot select: t22: ch = store<(store unknown-size, align 32)> t0, t27, t19, undef:i64

t27: nxv2f32 = extract_subvector t28, Constant:i64<0>
  t28: nxv4f32 = AArch64ISD::DUP ConstantFP:f32<1.000000e+00>
    t4: f32 = ConstantFP<1.000000e+00>
  t14: i64 = Constant<0>
t19: i64 = add t2, t18
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t1: i64 = Register %0
  t18: i64 = vscale Constant:i64<16>
    t17: i64 = Constant<16>
t9: i64 = undef

In function: store_nxv6f32

I can try to fix this as part of this change if you want? Alternatively, this could be done in a parent patch that adds support for storing <vscale x 2 x float>if you think this is worthwhile? The alignment also looks wrong for the store.

For the store itself, maybe worth doing as a parent patch? Probably a small change.

The alignment is a bug in this patch, I think. The usage of getOriginalAlign() assumes that the MachinePointerInfo has an offset. If it doesn't have one, we need to reduce the alignment.

david-arm updated this revision to Diff 283871.Aug 7 2020, 4:59 AM
david-arm edited the summary of this revision. (Show Details)

Hi @efriedma, in this patch I've:

  1. Added another parent patch that fixes extract_subvector issues, which are needed for the <vscale x 6 x float> tests (without them the tests hit isel lowering errors)
  2. Added alignment info for the case where the MachinePointerInfo is null.
efriedma added inline comments.Aug 7 2020, 12:24 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5168

getPrefTypeAlign() isn't related to anything relevant. We need to compute the common alignment based on the original alignment, the offset of the original MachinePointerInfo, and the current offset.

efriedma added inline comments.Aug 7 2020, 12:28 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5168

Something like commonAlignment(ST->getAlign(), Idx * ValEltWIdth), I guess? Maybe some better way to compute the offset so far from the loop.

david-arm added inline comments.Aug 10 2020, 1:33 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5168

OK, I can try that, but I'm not sure what to do with the offset? The loop currently (in theory) permits arbitrary offsets containing any combination of scalable vectors, fixed vectors and scalar types. I could re-introduce the Offset variable that I tried to eliminate from the previous code and keep track of that around the loop? For the case of scaled (or indeed polynomial) offsets I'm not sure what I would pass to commonAlignment though as I can't simply pass in Offset.getKnownMinSize() in this case. I could do something like this perhaps:

commonAlignment(ST->getOriginalAlign(), Offset.isScalable() ? <something> : Offset.getFixedSize());

where in place of <something> I could introduce a new target query function that asks the target what the alignment should be? Or do you think it's better to leave the alignment unchanged like in the original code, i.e. always pass in ST->getOriginalAlign()?

efriedma added inline comments.Aug 10 2020, 12:59 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5168

commonAlignment just computes the GCD (greatest common divisor) of the operands. Passing in Offset.getKnownMinSize() does the right thing: the actual offset has to be a multiple of the Offset.getKnownMinSize(), so the resulting alignment should be conservatively correct.

david-arm updated this revision to Diff 284745.Aug 11 2020, 8:30 AM
efriedma added inline comments.Aug 11 2020, 1:58 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
5164

getAlign(), not getOriginalAlign(), I think? Try writing a testcase where the store is at some offset inside a global variable, or something like that, and the difference should be clear.

Also, ideally, we'd continue to use plain getOriginalAlign() for the non-scalable case. This dance with the alignment is only necessary because we can't specify a scalable offset in the MachinePointerInfo. Maybe IncrementPointer should be involved in this somehow?

david-arm updated this revision to Diff 285048.Aug 12 2020, 5:24 AM
  • Added optional ScaledOffset parameter to IncrementPointer to keep a running total of the increments.
  • Returned alignment for non-scalable vectors back to using ST->getOriginalAlign(). For scalable vectors we use the common alignment of ST->getAlign() and the ScaledOffset.

Hi @efriedma, hopefully I've addressed your comments! I've tried to make use of IncrementPointer as you suggested to track the scaled offsets, then only do something different for the alignment for the scalable case.

This revision is now accepted and ready to land.Aug 12 2020, 2:44 PM