Page MenuHomePhabricator

[InstCombine] remove identity shuffle simplification for mask with undefs
ClosedPublic

Authored by spatel on Thu, Nov 14, 8:22 AM.

Details

Summary

Given a shuffle that includes undef elements in an otherwise identity mask like:

define <4 x float> @shuffle(<4 x float> %arg) {
  %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
  ret <4 x float> %shuf
}

We were simplifying that to the input operand.

But as discussed in PR43958:
https://bugs.llvm.org/show_bug.cgi?id=43958
...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result.

Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison.

Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug).

Diff Detail

Event Timeline

spatel created this revision.Thu, Nov 14, 8:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptThu, Nov 14, 8:22 AM
spatel edited the summary of this revision. (Show Details)Thu, Nov 14, 8:26 AM

Edit - I misread the diff for:
rL142671
...so I'm not sure when this was added, but it was even longer ago. :)

spatel updated this revision to Diff 229345.Thu, Nov 14, 10:23 AM

Patch updated:
There was another use of recognizeIdentityMask(), but it appears to have been from effectively dead code.
I removed that use with:
rG385572ccfe50

So now we can delete recognizeIdentityMask() as well as the caller code. No additional test diffs from the previous rev.

The optimization can be kept live by adding freeze (shufflevector <4 x float> %arg, ... -> freeze %arg), but existing analyzers are not aware of freeze yet (which will block following optimizations) and the underlying problem is more related with interaction of undef/poison (which disappears if undef is removed), so I'm fine with this patch's direction.

BTW, does the current semantics of shufflevector imply that

%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
->
%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

is incorrect?
If the reason why shufflevector returns undef is simply there was no poison constant, why don't we move toward adding poison constant?

BTW, does the current semantics of shufflevector imply that

%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
->
%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

is incorrect?

AFAIK, no - but that's because we were using 'undef' in the docs here:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
(no mention of poison)

If the reason why shufflevector returns undef is simply there was no poison constant, why don't we move toward adding poison constant?

What steps are needed to make it happen? From the discussion in PR43958, it sounds like adding+using the new constant might take a while? If not, and we can avoid/alter this patch, that would be better for me (no risk of losing optimizations).

AFAIK, no - but that's because we were using 'undef' in the docs here:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
(no mention of poison)

If %arg[0] = poison, it seems the current semantics (returning undef when the shuffle mask is undef) cannot explain the transformation.

%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>

%shuf[0] is undef, but after transformation,

%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

%shuf[0] is poison.

If the reason why shufflevector returns undef is simply there was no poison constant, why don't we move toward adding poison constant?

What steps are needed to make it happen? From the discussion in PR43958, it sounds like adding+using the new constant might take a while? If not, and we can avoid/alter this patch, that would be better for me (no risk of losing optimizations).

Oh, I meant as an ultimate goal. If moving toward poison constant was already discussed in PR43958, then never mind.
After poison constant is added, we can say that having undef as a shuffle mask returns poison, and change existing vector-related constant foldings to return poison constant, and this needs addition of poison constant.

If %arg[0] = poison, it seems the current semantics (returning undef when the shuffle mask is undef) cannot explain the transformation.

%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>

%shuf[0] is undef, but after transformation,

%shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

%shuf[0] is poison.

This is correct based on my understanding of poison (and what we've implemented so far), but we do not explicitly state that poison can occur per-element of a vector. Should we add a LangRef edit here or in a separate patch to address that?

This is correct based on my understanding of poison (and what we've implemented so far), but we do not explicitly state that poison can occur per-element of a vector. Should we add a LangRef edit here or in a separate patch to address that?

Converting poison to undef is okay, but undef to poison is not. undef & 0 == undef, but poison & 0 == poison. If undef can be optimized to poison, this can cause miscompilation. See here: https://rise4fun.com/Alive/DFYf , https://rise4fun.com/Alive/OZU

Yes, it will be great to have a separate patch that states poison can occur per-element of a vector. This can explain vectorization of operations, e.g. add nsw.

This patch looks good to me. @nlopes @regehr @lebedev.ri

liuz added a comment.Sun, Nov 17, 11:01 PM

Converting poison to undef is okay, but undef to poison is not. undef & 0 == undef, but poison & 0 == poison. If undef can be optimized to poison, this can cause miscompilation. See here: https://rise4fun.com/Alive/DFYf , https://rise4fun.com/Alive/OZU

Hi Juneyoung. Is that a typo? undef & 0 should be 0.

Hi Juneyoung. Is that a typo? undef & 0 should be 0.

I'm sorry, yes that was typo; undef & 0 is 0.

Let me give my 2c: eliminating this transformation is a good thing, since it's incorrect (end-to-end miscompilation testcase in the cited bug report).
It can be re-instated if/when we switch from initializing vectors with undef with poison, but I don't know when that will happen. We'll try to push for the poison patches soonish, but it will take time.
Anyway, thanks Sanjay for handling this :)

My biggest concern here is the interaction with SimplifyDemandedVectorElts; previously, the way the interaction worked is that SimplifyDemandedVectorElts would change the shuffle indexes to undef, then later combines could remove those shuffles. That doesn't work anymore because of this rule.

We can probably recover some cases by adding a little extra code to SimplifyDemandedVectorElts, to eliminate shuffles where the demanded elements correspond to an identity shuffle.

spatel updated this revision to Diff 230453.Thu, Nov 21, 6:56 AM

Patch updated:
Enhance SimplifyDemandedVectorElts to recognize a demand-limited identity mask.
As Eli expected, this reduces the number of test diffs (preserves some existing optimizations) - thanks!

spatel marked an inline comment as done.Thu, Nov 21, 7:05 AM
spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
1277–1279

Note: we could commit this code addition as a preliminary step. It causes some seemingly benign test changes to extractelement index types (i32 vs i64). I have not tracked down where that difference originates.

efriedma accepted this revision.Thu, Nov 21, 1:31 PM
efriedma added subscribers: craig.topper, RKSimon.

LGTM

This revision is now accepted and ready to land.Thu, Nov 21, 1:31 PM
This revision was automatically updated to reflect the committed changes.