Details

Reviewers

RKSimon
spatel
dtemirbulatov
anton-afanasyev

Commits

rG4284afdf9432: [SLP]Need shrink the load vector after reordering.

Summary

After merging the shuffles, we cannot rely on the previous shuffle
anymore and need to shrink the final shuffle, if it is required.

Reported in D92668

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Jan 1 2021, 8:47 AM

Herald added a subscriber: hiraditya. Jan 1 2021, 8:47 AM

ABataev requested review of this revision. Jan 1 2021, 8:47 AM

Herald added a project: Restricted Project. Jan 1 2021, 8:47 AM

anton-afanasyev added inline comments. Jan 1 2021, 9:26 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Line 4265: This comment is obsolete after this change.

Harbormaster completed remote builds in B83806: Diff 314225. Jan 1 2021, 9:33 AM

Removed outdated comment.

Harbormaster completed remote builds in B83892: Diff 314368. Jan 4 2021, 6:16 AM

LGTM

This revision is now accepted and ready to land. Jan 4 2021, 6:59 AM

I can confirm this fixes the non-reduced test case we're seeing internally, thanks!

Closed by commit rG4284afdf9432: [SLP]Need shrink the load vector after reordering. (authored by ABataev). Jan 7 2021, 4:52 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG4284afdf9432: [SLP]Need shrink the load vector after reordering.

This caused misoptimizations for armv7, where code that previously worked correctly now produces different results. (The code is clean under ubsan, so it shouldn't be relying on anything undefined.)

The issue appears with https://martin.st/temp/interplayvideo-preproc.c, compiled with clang -target armv7-linux-gnueabihf -O2.

The diff in generated code, before/after, looks like this:

        vmov.32 d16[0], lr
        vmov.32 d16[1], r2
 .LBB27_3:                               @ %if.end
                                         @   in Loop: Header=BB27_4 Depth=1
        vmov.32 r2, d16[1]
        add     r3, r3, #1
        vmov.32 r5, d16[0]
        cmp     r3, #8
+       vdup.32 d16, d16[0]
        vmov.16 d17[1], r2
        vmov.16 d18[0], r5
        vdup.16 d21, d17[1]
        vdup.16 d20, d18[0]
        vst1.16 {d20, d21}, [r1], r12
        beq     .LBB27_9
 .LBB27_4:                               @ %for.body
                                         @ =>This Inner Loop Header: Depth=1
        tst     r3, #3
        bne     .LBB27_3

If it loops back to .LBB27_3, the vector element d16[1] no longer has the value it was expected to have.
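The effect of the added instruction can be sketched with a small model. This is only an illustration, assuming d16 is treated as two 32-bit lanes and ignoring everything else in the loop; the type alias, the function name and the lane values are hypothetical and do not come from the test case.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the 64-bit NEON register d16 as two 32-bit lanes.
using D16 = std::array<std::uint32_t, 2>;

// Runs `passes` passes through the .LBB27_3 block and returns the value that
// the final pass reads from d16[1] (the `vmov.32 r2, d16[1]` instruction).
// `with_vdup` selects whether the added `vdup.32 d16, d16[0]` is executed.
std::uint32_t lane1_read_on_last_pass(D16 d16, int passes, bool with_vdup) {
    assert(passes >= 1);
    std::uint32_t r2 = 0;
    for (int i = 0; i < passes; ++i) {
        r2 = d16[1];          // vmov.32 r2, d16[1]
        if (with_vdup) {
            d16[1] = d16[0];  // vdup.32 d16, d16[0]: lane 0 is copied to both lanes
        }
    }
    return r2;
}
```

Starting from lanes {10, 20}, a single pass reads 20 from d16[1] either way. On a second pass the model reads 20 without the vdup.32 but 10 with it, which matches the observation that the value is wrong only after looping back.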

Hi, thanks for the report. The patch is correct; the bug is in the existing code. I will fix it soon.

pifon2a added a reverting change: rGbcbdeafa9cb3: Revert "[SLP]Need shrink the load vector after reordering.". Jan 8 2021, 5:42 AM

ABataev reopened this revision. Jan 8 2021, 6:35 AM

This revision is now accepted and ready to land. Jan 8 2021, 6:35 AM

Bug fixes

Harbormaster completed remote builds in B84467: Diff 315381. Jan 8 2021, 7:38 AM

Bug fix.

Harbormaster completed remote builds in B84674: Diff 315776. Jan 11 2021, 6:56 AM

If you change the existing patch, it won't be clear that this is something that needs review. Can you make a new review with the bug fix?

Thanks!

-eric

ABataev mentioned this in D94972: [SLP]Need to shrink the load vector after reordering. Jan 19 2021, 7:37 AM

Did it, thanks!

This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Need shrink the load vector after reordering.
Closed · Public

Revision Contents

Diff 314225

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder.ll