This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Support extracting fixed-length vectors from illegal scalable vectors
ClosedPublic

Authored by david-arm on Jan 17 2022, 8:46 AM.

Details

Summary

For some indices we can simply extract the fixed-length subvector from the
low half of the scalable vector, for example when the index is less than the
minimum number of elements in the low half. For all other cases we can
expand the operation through the stack by storing out the vector and
reloading the fixed-length part we need.

Fixes https://github.com/llvm/llvm-project/issues/55412

Tests added here:

CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll

Diff Detail

Event Timeline

david-arm created this revision.Jan 17 2022, 8:46 AM
david-arm requested review of this revision.Jan 17 2022, 8:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2022, 8:46 AM
craig.topper added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3015–3016

Drop else after return

3018

Drop else after return

Any thoughts on lowering to the stack, vs. generating "select" operations? If I'm not mistaken, at least for SVE, we should always be extract elements from exactly one half of the split. Not sure how much it matters in practice, though.

Do we need to special-case i1 vectors?

Matt added a subscriber: Matt.Jan 25 2022, 3:14 PM
david-arm updated this revision to Diff 449662.Aug 3 2022, 7:14 AM
  • Rebased and added a few more tests.
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2022, 7:14 AM

Any thoughts on lowering to the stack, vs. generating "select" operations? If I'm not mistaken, at least for SVE, we should always be extract elements from exactly one half of the split. Not sure how much it matters in practice, though.

Do we need to special-case i1 vectors?

I had a look into using selects, but it doesn't work because when selecting between lo and hi parts you end up constructing a non-constant index for the hi part, i.e. IdxVal - LoNumElts (which is not a constant for scalable vectors). This leads to asserts because EXTRACT_SUBVECTOR is defined to only take constant indices.

I don't think we have to do anything special for i1 vectors because if the i1 type gets promoted (i.e. NEON), then we will end up down a different path where we extract individual elements from an illegal scalable vector, which is dealt with in SplitVecOp_EXTRACT_VECTOR_ELT. This is why in @extract_v4i1_nxv32i1_16 we end up spilling the input four times - once for each EXTRACT_VECTOR_ELT. If the i1 type is legal then presumably there will also be legal loads for them too.

david-arm added inline comments.Aug 3 2022, 7:21 AM
llvm/test/CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll
46

Hmm, I just realised one or two test names are wrong, i.e. this function ends with _8 (for index 8), yet the actual IR has an extract from index 16.

281

I also realised this is a bogus comment because v2i32 is legal for NEON. Probably best to just remove the comment entirely.

david-arm edited the summary of this revision. (Show Details)Aug 3 2022, 7:22 AM

I had a look into using selects, but it doesn't work because when selecting between lo and hi parts you end up constructing a non-constant index for the hi part, i.e. IdxVal - LoNumElts (which is not a constant for scalable vectors). This leads to asserts because EXTRACT_SUBVECTOR is defined to only take constant indices.

You could handle the special case where the offset only points into the high vector if vscale is exactly 1.

Actually, more generally, if vscale is a power of two, there's exactly one value of vscale that would make the EXTRACT_SUBVECTOR point into the high vector: given a pair of index/vscale that points into the high vector, if vscale is smaller, it's UB, and if vscale is larger, it points into the low vector. So we can construct an appropriate constant. I guess we don't promise that vscale is a power of two in general, though.

If we allow for the possibility that vscale isn't a power of two, then yes, you'd need some sort of variable shuffle or load/store in general.

I don't think we have to do anything special for i1 vectors because if the i1 type gets promoted (i.e. NEON), then we will end up down a different path where we extract individual elements from an illegal scalable vector, which is dealt with in SplitVecOp_EXTRACT_VECTOR_ELT. This is why in @extract_v4i1_nxv32i1_16 we end up spilling the input four times - once for each EXTRACT_VECTOR_ELT. If the i1 type is legal then presumably there will also be legal loads for them too.

The issue would be on a target where a type like v4i1 is legal. i1 vectors are tightly packed in memory, so v1i1, v2i1, v4i1, and v8i1 are all one byte, so you can't just load the part you want.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3039

Missing alignment on the load op.

Hi @efriedma,

The issue would be on a target where a type like v4i1 is legal. i1 vectors are tightly packed in memory, so v1i1, v2i1, v4i1, and v8i1 are all one byte, so you can't just load the part you want.

But if both the result (say a v4i1) and the broken-down input (say a nxv32i1, broken down into nxv16i1) are both legal, then they will both be equally packed in the same way so this should be safe I think? Unless I've misunderstood something it sounds like the problem you're worried about is when the result is packed and the input stored on to the stack is unpacked, right? If so, I don't think I can really write a test case for this because I don't know of any targets that both support scalable vectors and have legal fixed-width predicate vectors. Perhaps I can just add an unreachable for cases like that?

Say you're trying to extract a v4i1 from an nxv8i1, with index 4. If you store the nxv8i1 to the stack, then load it as v4i1, you get the elements at index 0, since the first 8 elements of nxv8i1 are all packed into a single byte.

(Note that SVE doesn't have any native instruction that's equivalent to an nxv2i1/nxv4i1/nxv8i1 store. Currently, it doesn't come up because nothing emits such operations.)

I don't think I can really write a test case for this because I don't know of any targets that both support scalable vectors and have legal fixed-width predicate vectors.

True.

david-arm updated this revision to Diff 452606.Aug 15 2022, 1:43 AM
  • Changed code to bail out if extracting a legal fixed-width predicate subvector from a scalable vector.

Hi @efriedma, if it's ok with you I'd like to leave the spilling and reloading from the stack if that's ok? As you say, there may be situations where we can improve this with a select, but the crux of this patch is to fix a bug. If we see performance issues in future we can always revisit this?

I've also added checks for the case you described, extracting a legal fixed-width predicate subvector.

I'm okay with skipping the select optimization for now, sure.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3030

getTypeAction(SubVT) is always going to be "legal", or we wouldn't be here. (We always legalize results before operands.)

david-arm updated this revision to Diff 456891.Aug 31 2022, 1:31 AM
  • Changed the i1 subvector check to report an error if the subvector has a i1 element type.
david-arm marked 4 inline comments as done.Aug 31 2022, 1:32 AM
efriedma added inline comments.Aug 31 2022, 10:58 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3030

Please use report_fatal_error so we get a diagnostic even if assertions aren't enabled.

david-arm updated this revision to Diff 457574.Sep 2 2022, 6:02 AM
  • Changed llvm_unreachable to report_fatal_error.
david-arm marked an inline comment as done.Sep 2 2022, 6:02 AM
david-arm added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3030

Yes, you're absolutely right. Don't know what I was thinking!

This revision is now accepted and ready to land.Sep 2 2022, 10:03 AM
This revision was landed with ongoing or failed builds.Sep 5 2022, 7:05 AM
This revision was automatically updated to reflect the committed changes.
david-arm marked an inline comment as done.