After merging the shuffles, we cannot rely on the previous shuffle
anymore and need to shrink the final shuffle, if it is required.
Reported in D92668
Differential D93967
[SLP]Need shrink the load vector after reordering. ABataev on Jan 1 2021, 8:47 AM. Authored by
Details After merging the shuffles, we cannot rely on the previous shuffle Reported in D92668
Diff Detail
Event Timeline
Comment Actions This caused misoptimizations for armv7, where code that previously worked correctly now produce different results. (The code is clean under ubsan, so it shouldn't be relying on anything undefined.) The issue appears with https://martin.st/temp/interplayvideo-preproc.c, compiled with clang -target armv7-linux-gnueabihf -O2. The diff in generated code, before/after, looks like this: vmov.32 d16[0], lr vmov.32 d16[1], r2 .LBB27_3: @ %if.end @ in Loop: Header=BB27_4 Depth=1 vmov.32 r2, d16[1] add r3, r3, #1 vmov.32 r5, d16[0] cmp r3, #8 + vdup.32 d16, d16[0] vmov.16 d17[1], r2 vmov.16 d18[0], r5 vdup.16 d21, d17[1] vdup.16 d20, d18[0] vst1.16 {d20, d21}, [r1], r12 beq .LBB27_9 .LBB27_4: @ %for.body @ =>This Inner Loop Header: Depth=1 tst r3, #3 bne .LBB27_3 If it loops back to .LBB27_3, the vector element d16[1] no longer has the value it was expected to have. Comment Actions Hi, thanks for the report, the patch is correct, the bug is in the existing code. Will fix it soon. Comment Actions If you changed the existing patch it won't be clear that this is something that needs review. Can you make a new review with the bug fix? Thanks! -eric |
This comment is obsolete after change.