This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Fix a bug in min/max reduction, number of condition uses.
ClosedPublic

Authored by ABataev on Apr 1 2021, 11:08 AM.

Details

Summary

The ultimate reduction node may have multiple uses, but if the ultimate
reduction is min/max reduction and based on SelectInstruction, the
condition of this select instruction must have only single use.

Diff Detail

Event Timeline

ABataev created this revision.Apr 1 2021, 11:08 AM
ABataev requested review of this revision.Apr 1 2021, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2021, 11:08 AM
ABataev added inline comments.Apr 1 2021, 11:10 AM
llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll
908

Here is an example of the bug, use undef instead of the result of icmp instruction (%10 in the test)

spatel added a comment.Apr 1 2021, 1:32 PM

The problem/test looks similar to the one in 0a8e7ca402e - can you explain how they differ (what allows these examples to avoid that fix)?

The problem/test looks similar to the one in 0a8e7ca402e - can you explain how they differ (what allows these examples to avoid that fix)?

The reduced root is not select instruction anymore, it is an intrinsic call. And we cannot extract the condition

spatel added a comment.Apr 1 2021, 2:00 PM

The problem/test looks similar to the one in 0a8e7ca402e - can you explain how they differ (what allows these examples to avoid that fix)?

The reduced root is not select instruction anymore, it is an intrinsic call. And we cannot extract the condition

Can we remove the code that was added with D70148 then? (assert that the final cmp has only 1 use or 0 remaining uses?)
Hopefully, all of this becomes unnecessary after we switch to the min/max intrinsics.

The problem/test looks similar to the one in 0a8e7ca402e - can you explain how they differ (what allows these examples to avoid that fix)?

The reduced root is not select instruction anymore, it is an intrinsic call. And we cannot extract the condition

Can we remove the code that was added with D70148 then? (assert that the final cmp has only 1 use or 0 remaining uses?)
Hopefully, all of this becomes unnecessary after we switch to the min/max intrinsics.

Yes, I think so. Will do this tomorrow.

The problem/test looks similar to the one in 0a8e7ca402e - can you explain how they differ (what allows these examples to avoid that fix)?

The reduced root is not select instruction anymore, it is an intrinsic call. And we cannot extract the condition

Can we remove the code that was added with D70148 then? (assert that the final cmp has only 1 use or 0 remaining uses?)
Hopefully, all of this becomes unnecessary after we switch to the min/max intrinsics.

Yes, I think so. Will do this tomorrow.

I have another idea - try to keep cmpinstruction as an external user. Will try to implement it tomorrow.

ABataev updated this revision to Diff 334946.Apr 2 2021, 6:26 AM

Fixes and rebase

The problem/test looks similar to the one in 0a8e7ca402e - can you explain how they differ (what allows these examples to avoid that fix)?

The reduced root is not select instruction anymore, it is an intrinsic call. And we cannot extract the condition

Can we remove the code that was added with D70148 then? (assert that the final cmp has only 1 use or 0 remaining uses?)
Hopefully, all of this becomes unnecessary after we switch to the min/max intrinsics.

Yes, I think so. Will do this tomorrow.

I have another idea - try to keep cmpinstruction as an external user. Will try to implement it tomorrow.

Tried it and currently, it is impossible to implement, as we need to know the reduction result for n-1 reduction values. This can be improved later by the non-power-2 SLP patch

spatel accepted this revision.Apr 2 2021, 7:08 AM

LGTM

This revision is now accepted and ready to land.Apr 2 2021, 7:08 AM