This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][AArch64] Reorder the bitcast of scalable vector
AbandonedPublic

Authored by Allen on May 14 2022, 2:11 AM.

Details

Summary

Perform the following reorder when the scalable vector's inner type is floating
point and the outer type is not scalable vector, eg:
t19: v2f32 = extract_subvector t2, Constant:i64<0>

t12: v2i32 = bitcast t19

-->
t20: nxv2i32 = bitcast t2

t21: v2i32 = extract_subvector t20, Constant:i64<0>

Diff Detail

Event Timeline

Allen created this revision.May 14 2022, 2:11 AM
Allen requested review of this revision.May 14 2022, 2:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2022, 2:12 AM
Allen edited the summary of this revision. (Show Details)May 14 2022, 2:52 AM
Allen updated this revision to Diff 429585.May 15 2022, 6:13 PM
david-arm added inline comments.May 16 2022, 6:01 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14937

From what I can see all the input and output types you're hoping to deal with are legal, right? For example, <vscale x 2 x float> is legal, but <vscale x 2 x i32> is illegal. So it looks like you're trying to take advantage of legalisation behaviour when doing something like this:

%out = <2 x i32> extract_subvector <vscale x 2 x i32> %in, i32 0

which will probably turn into

%out1 = <2 x i64> extract_subvector <vscale x 2 x i64> %in, i32 0
%out2 = truncate <2 x i64> %out1 to <2 x i32>

The first operation will then become a nop.

One problem with this approach is that it looks like you're assuming the index is always 0. What happens when extracting a subvector from index 2, etc? I'm worried the generated code might then look even worse. It just feels like we might want to restrict the allowed cases a bit more here to just those examples where we know there will be an improvement.

llvm/test/CodeGen/AArch64/extract-insert-element-sve.ll
1 ↗(On Diff #429585)

We actually already have similar tests in sve-fixed-length-reshuffle.ll - could you move this test into that file please?

Allen added inline comments.May 16 2022, 7:54 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14937

sure, I hope to deal with all the input and output types are legal. would you please show me which API can be used, here ? thanks.

Allen updated this revision to Diff 429726.May 16 2022, 8:35 AM

update review
1、add restrict for index ==0 and legal for types
2、move test into file llvm/test/CodeGen/AArch64/sve-fixed-length-reshuffle.ll

Allen added inline comments.May 18 2022, 5:53 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14937

hi @david-arm , do you mean to list all the type pair, where we know there will be an improvement?

llvm/test/CodeGen/AArch64/extract-insert-element-sve.ll
1 ↗(On Diff #429585)

done, thanks.

Hi @Allen, sorry I've not had chance to review your patch yet. I'll try to take a proper look tomorrow but in the interim my main concern is that this optimisation specifically relates to NEON sized vectors where we can do a better job of the extracts. I'm not sure we want to do this transform for all fixed length vector types though. Part of me thinks we're missing a different, more specific, transformation somewhere. However, I cannot be sure until I've had chance to investigate a little more.

Allen added a comment.May 19 2022, 5:58 PM

Thank @paulwalker-arm very much, I look forward to your review suggestions.

Hi @Allen, Sorry for the delay but my investigations turned more complex than I had expected. I don't believe this is a DAG level problem and the real issue here is that 64/128bit fixed length extracts from unpacked scalable vectors are not legal and thus common legalisation code kicks in to perform the operation via the stack. The solution is to be able to directly isel these operations. It turns out this is not straight forward. I've uploaded D126201, which is clearly a bit of a hack, to show the areas to investigate.

The main problem is the current definition of extract_subvector does not allow mixed (i.e. fixed and scalable) vector types. Although it can be changed, this causes build failures because many existing patterns need to be updated because their types can no longer be inferred. Instead I propose a specific node, currently called extract_subvector2 which is specifically for the case where you want to extract a fixed length vector from a scalable vector. My reasoning being no existing patterns need to change and it makes it easier to see these sorts of extracts that typically need special handling. I didn't manage to figure out why I need to disable the existing extract_subvector patterns, which for some reason kept matching even though the types do not match what those patterns require.

Let me know how you want to proceed. Feel free to run with D126201 or you can wait on me to create a proper implementation. It just depend on you timeline as I'm not sure how much time I'll be able to invest in this over the next few weeks.

Allen added a comment.May 23 2022, 6:08 PM

hi, @paulwalker-arm, Sorry to remind you! Thank you very much for providing the right idea to this problem. I'll try with your solution

Thanks @Allen. As another data point I just noticed vector_extract_subvec already exists upstream, which might mean you don't need the extract_subvector2 I created as part of D126201.

Allen added a comment.May 24 2022, 3:42 AM

Thanks @Allen. As another data point I just noticed vector_extract_subvec already exists upstream, which might mean you don't need the extract_subvector2 I created as part of D126201.

That's great. Thank you very much for the quick resolution!

Matt added a subscriber: Matt.May 25 2022, 2:06 PM
Allen abandoned this revision.Jun 1 2022, 6:08 PM