This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Integrate looking through shuffles logic into ShuffleInstructionBuilder.
ClosedPublic

Authored by ABataev on Dec 15 2022, 5:59 AM.

Details

Summary

Added BaseShuffleAnalysis as a base class for ShuffleInstructionBuilder
and integrated shuffle logic from shuffles for externally used scalars
into this class. This class is used as the main container that
implements smart shuffle instruction builder logic.
ShuffleInstructionBuilder uses this logic.
ShuffleInstructionBuilder is also used in building of the shuffle for
the externally used scalars instead of lambdas, which are now part of BaseShuffleAnalysis class.

Diff Detail

Event Timeline

ABataev created this revision.Dec 15 2022, 5:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2022, 5:59 AM
ABataev requested review of this revision.Dec 15 2022, 5:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2022, 5:59 AM
vdmitrie added inline comments.Dec 15 2022, 6:25 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6292

Please add that when isStrict is true it returns false if mask size does not match vector size.

6297

You can drop " && VF != Limit".

6318

Could you please give more explanatory description here including arguments and return values?

6329

We do not exit. The comment needs an update.

6338

We do not exit here too.

6341

this needs an explanation why IdentityMask is assigned with Mask

ABataev updated this revision to Diff 483543.Dec 16 2022, 7:49 AM

Address comments

vdmitrie added inline comments.Dec 16 2022, 10:32 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6306

Please describe the purpose of LocalVF argument?

6342

Thanks for the update. What is still confusing me...
The comment says "emit" while this method belongs to an analysis class, i.e. is not supposed to emit anything. For these two examples input value of argument V presumable will be %s2 (what will be the Mask argument value on input by the way? the mask from %s2 instruction?), and on output V presumably points to %0 with the mask having value <0,1,2,3>. Is that correct?
Could you please add this kind of clarification to the comment?
btw the boolean return value is still not described.

6384

What is still confusing.. it is unclear why broadcast mask & Op saved as identity (it's not actually an identity mask).
Probably the variable name does not reflect accurately its purpose...

6449

Could you share some insights why you put this into BaseShuffleAnalysis class method rather than into ShuffleInstructionBuilder? Since it produces new code technically it is not a pure analysis method. And why you want it to be a template (it seems to be instantiated just once with ShuffleIRBuilder)?

ABataev added inline comments.Dec 16 2022, 10:39 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6306

Will do

6342

I'll fix it

6384

Because we can treat it as identity too, because the mask 0,1,2,3,.. can be applied to broadcasts too.

6449

It will be used for ShuffleCostEstimator too to avoid duplication/diffs in analysis/emission and cost estimation logic.

vdmitrie added inline comments.Dec 16 2022, 11:25 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6384

A small example would be very helpful here...

6449

Ah, I see. There is actually a comment to the class saying that, which I overlooked. So you are going to produce another class from it which will serve as cost estimator. Perhaps having "Analysis" as a part of its name isn't quite accurate. It probably makes sense to rename it to something like ShuffleBuilderBase to avoid confusion in future.

ABataev updated this revision to Diff 483615.Dec 16 2022, 11:27 AM

Address comments

ABataev added inline comments.Dec 16 2022, 12:21 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6449

It will be used not only for ShuffleBuilder, but for ShuffleCostEstimator, that's why I named it just ShuffleBaseAnalaysis

ABataev updated this revision to Diff 483641.Dec 16 2022, 12:29 PM

Address comments

vdmitrie accepted this revision.Dec 20 2022, 5:22 PM

Looks good. Thanks.

This revision is now accepted and ready to land.Dec 20 2022, 5:22 PM
This revision was landed with ongoing or failed builds.Dec 21 2022, 6:16 AM
This revision was automatically updated to reflect the committed changes.
asmok-g added a subscriber: asmok-g.EditedDec 28 2022, 1:16 AM

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

alexfh added a subscriber: alexfh.Jan 3 2023, 1:35 PM

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Thanks for the reproducer, will investigate it.

alexfh added a comment.Jan 4 2023, 7:52 AM

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Thanks for the reproducer, will investigate it.

Alexey, could you revert the commit in the meantime?

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Thanks for the reproducer, will investigate it.

Alexey, could you revert the commit in the meantime?

It is not so easy, there are several commits on top of it already. I just started looking at it, hope to provide a fix in 1-2hrs, if it will take more time, I'll revert it.

alexfh added a comment.Jan 4 2023, 7:58 AM

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Thanks for the reproducer, will investigate it.

Alexey, could you revert the commit in the meantime?

It is not so easy, there are several commits on top of it already. I just started looking at it, hope to provide a fix in 1-2hrs, if it will take more time, I'll revert it.

Thanks! Please let me know once you have a resolution or decide to revert this.

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Thanks for the reproducer, will investigate it.

Alexey, could you revert the commit in the meantime?

It is not so easy, there are several commits on top of it already. I just started looking at it, hope to provide a fix in 1-2hrs, if it will take more time, I'll revert it.

Thanks! Please let me know once you have a resolution or decide to revert this.

Ok.

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Thanks for the reproducer, will investigate it.

Alexey, could you revert the commit in the meantime?

It is not so easy, there are several commits on top of it already. I just started looking at it, hope to provide a fix in 1-2hrs, if it will take more time, I'll revert it.

Thanks! Please let me know once you have a resolution or decide to revert this.

Ok, found the bug, has a patch, preparing the test. Will be ready in 1-2hrs (need to test it).

Heads-up: This has caused a change in behavior when using #pragma unroll in some test using SIMD. In other words, using #pragma unroll results in a different output than without it. I'm preparing a reduced test case.

I've got a reduced IR file for that case:

$ old-clang -O2 q.ll && ./a.out
0.0
$ new-clang -O2 q.ll && ./a.out
5.0

Please fix or revert the patch, if there's no trivial fix.

Thanks!

Thanks for the reproducer, will investigate it.

Alexey, could you revert the commit in the meantime?

It is not so easy, there are several commits on top of it already. I just started looking at it, hope to provide a fix in 1-2hrs, if it will take more time, I'll revert it.

Thanks! Please let me know once you have a resolution or decide to revert this.

Must be fixed in a1b18946f9af130a14655721653beb4510fde9be

alexfh added a comment.Jan 4 2023, 1:50 PM

Thanks! Verified that this fixes the issue. I'll run this through the full testing and let you know if it breaks anything else )