This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR
ClosedPublic

Authored by bsmith on May 19 2021, 6:01 AM.

Details

Summary

Inserting into a smaller-than-legal scalable vector would result in an
internal compiler error. For example, inserting a <vscale x 4 x i8> into
a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a
crash.

This crash was happening because there was no code to promote (legalise)
the result of an INSERT_SUBVECTOR node.

This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises
the ISD node. This is currently done by going through memory. This is
necessary because of the requirement that the SubVec parameter of the
INSERT_SUBVECTOR node must be smaller than the Vec parameter, which
means that INSERT_SUBVECTOR cannot always have a legal result/operand
types.

Depends on: D102765

Diff Detail

Event Timeline

joechrisellis created this revision.May 19 2021, 6:01 AM
joechrisellis requested review of this revision.May 19 2021, 6:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2021, 6:01 AM

Some suggestions.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704

nit: Extraneous blank.

4717

For me, the term 'spilling' is usually associated with running out of registers and needing to create register space by spilling them to the stack.

I think a comment here instead should express the intent of the code, something like "To insert SubVec into Vec, store the wider vector to memory, overwrite the lower half with the narrower vector, and reload". The other comments can probably be removed.

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15

I think you can use -asm-verbose=0 in the run line to eliminate the CFI escapes.

joechrisellis marked an inline comment as done.May 20 2021, 2:18 AM
joechrisellis added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704

This is consistent with PromoteIntRes_EXTRACT_SUBVECTOR some 60-ish lines above.

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15

I tried this and llvm/utils/update_llc_test_checks.py doesn't spit anything out. I have hit this issue before. IIRC, the regular expression that is used by the the script to delimit the functions in the assembly codes doesn't function as expected if the CFI escapes are missing. Might submit a patch for this later if I can recall what the issue was.

FWIW:

$ grep -Rl 'Assertions have been' llvm/test/**/AArch64/**/* | xargs grep -l 'asm-verbose=0'
llvm/test/CodeGen/AArch64/bf16-vector-shuffle.ll

... there's only one file with autogen'd assertions that does use -asm-verbose=0.

Address review comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704

This is only a nit, but the majority of functions in this file don't have it and I think PromoteIntRes_EXTRACT_SUBVECTOR is in error. It's sufficiently nearby and related code that I'd be tempted to remove it from that one too to maintain local consistenc (but not anywhere else, there are other examples in this file).

Matt added a subscriber: Matt.May 20 2021, 2:46 AM
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
108

As I understand it, this should overwrite elements with indices {2,3} of %vec, but this seems to overwrite elements {1,2}. So I am not convinced this is correct.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4704

Functional issues aside, I'm with @peterwaller-arm on this one.

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
15

attributes #0 = { nounwind "target-features"="+sve" } will see the CFI entries removed.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4717

Whoops - not 'lower half', because of course you can insert at a given index, nt just the lower order bits. So the comment needs adjusting.

joechrisellis edited the summary of this revision. (Show Details)May 21 2021, 9:14 AM

Fold in D102765 + address nits.

joechrisellis planned changes to this revision.May 21 2021, 9:43 AM

Scale index by vscale.

Remove CFI entries from test.

joechrisellis marked an inline comment as done.Jun 2 2021, 3:08 AM
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
9

Can you pass <vscale x ...> by value rather than by pointer? I realise the loads are required in the fixed case, but that might shrink the code a little.

25

There is a bit of extraneous stuff going on in these tests, if you choose a couple of optimization passes are you able to shrink them a bit? I'm looking at the store of x29 and extra addpl.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4731

Thinking: If IdxAPInt is zero, you could set ScaledIdx = Idx. Alternatively, just update Idx if non-zero. This would get rid of some rdvl ..., #0.

joechrisellis marked 2 inline comments as done.

Address review comments:

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4731

Good shout -- that's removed a few instructions from the test cases.

llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
25

-O3 doesn't change the lowering here, so I am not sure if these test cases can be reduced by using different flags.

Update insertion indices. The old insertion indices will be reported as invalid
when D104468 is merged.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4743

My thinking here is that the clamping logic of getVectorElementPointer isn't appropriate for what is needed here. It's necessary to ensure the upper element of the inserted fixed-width vector fits into the scalable vector. getVectorElementPointer is effectively only forcing that the first element is in bounds, but it's the last element which matters.

joechrisellis edited the summary of this revision. (Show Details)Jun 24 2021, 9:52 AM
bsmith commandeered this revision.Jun 30 2021, 5:48 AM
bsmith edited reviewers, added: joechrisellis; removed: bsmith.
bsmith updated this revision to Diff 355520.Jun 30 2021, 6:09 AM
bsmith removed a reviewer: joechrisellis.
  • Use new getVectorSubVecPointer TLI function to ensure correct clamping
  • Fixup tests due to changes
peterwaller-arm accepted this revision.Jun 30 2021, 7:25 AM
This revision is now accepted and ready to land.Jun 30 2021, 7:25 AM
This revision was landed with ongoing or failed builds.Jul 1 2021, 9:06 AM
This revision was automatically updated to reflect the committed changes.

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

bsmith added a comment.Jul 5 2021, 3:28 AM

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

I'm not sure I fully understand how you are thinking this would look, bare in mind that we also need to handle inserting scalable into scalable here.

If you had something like:

`%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 4)`

and you promoted all of the scalable types to their equivalent legal types you'd end up with:

`%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i32> %vec, <vscale x 2 x i64> %subvec, i64 4)`

This ends up no longer being a valid use of vector.insert since the element types differ.

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

I'm not sure I fully understand how you are thinking this would look, bare in mind that we also need to handle inserting scalable into scalable here.

If you had something like:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 4)

and you promoted all of the scalable types to their equivalent legal types you'd end up with:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i32> %vec, <vscale x 2 x i64> %subvec, i64 4)

This ends up no longer being a valid use of vector.insert since the element types differ.

Right. My suggestion is that you promote from:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 1)

to:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv2i32(<vscale x 4 x i32> %vec, <vscale x 2 x i32> %subvec, i64 1)

This isn't legal, but we need code to handle it anyway.

bsmith added a comment.Jul 6 2021, 5:32 AM

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

I'm not sure I fully understand how you are thinking this would look, bare in mind that we also need to handle inserting scalable into scalable here.

If you had something like:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 4)

and you promoted all of the scalable types to their equivalent legal types you'd end up with:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i32> %vec, <vscale x 2 x i64> %subvec, i64 4)

This ends up no longer being a valid use of vector.insert since the element types differ.

Right. My suggestion is that you promote from:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 1)

to:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv2i32(<vscale x 4 x i32> %vec, <vscale x 2 x i32> %subvec, i64 1)

This isn't legal, but we need code to handle it anyway.

Are you suggesting to still go through memory but to do it during operand legalization instead?

Sorry about the late reply here, but I'm not sure why PromoteIntRes_INSERT_SUBVECTOR needs to go through the stack. Can't you just ANY_EXTEND the operand and the result?

At that point, you might end up with a node that needs to be legalized by PromoteIntOp_INSERT_SUBVECTOR, but better to take legalization one step at a time.

I'm not sure I fully understand how you are thinking this would look, bare in mind that we also need to handle inserting scalable into scalable here.

If you had something like:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 4)

and you promoted all of the scalable types to their equivalent legal types you'd end up with:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i32> %vec, <vscale x 2 x i64> %subvec, i64 4)

This ends up no longer being a valid use of vector.insert since the element types differ.

Right. My suggestion is that you promote from:

%ins = call <vscale x 4 x i16> @llvm.experimental.vector.insert.nxv4i16.nxv2i16(<vscale x 4 x i16> %vec, <vscale x 2 x i16> %subvec, i64 1)

to:

%ins = call <vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.nxv2i32(<vscale x 4 x i32> %vec, <vscale x 2 x i32> %subvec, i64 1)

This isn't legal, but we need code to handle it anyway.

Are you suggesting to still go through memory but to do it during operand legalization instead?

Yes, sort of...

In some cases, we might not end up going through memory; we currently have some custom lowering support for some special cases, and might add more cases in the future.

bsmith added a comment.Jul 8 2021, 5:33 AM

Are you suggesting to still go through memory but to do it during operand legalization instead?

Yes, sort of...

In some cases, we might not end up going through memory; we currently have some custom lowering support for some special cases, and might add more cases in the future.

This should be sorted out in https://reviews.llvm.org/D105624