This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask
ClosedPublic

Authored by aqjune on Jan 22 2021, 2:03 AM.

Details

Summary

This patch fixes pr48832 by correctly generating the mask when a poison value is involved.

Consider this CFG (which is a part of the input):

for.body:                                         ; preds = %for.cond
  br i1 true, label %cond.false, label %land.rhs

land.rhs:                                         ; preds = %for.body
  br i1 poison, label %cond.end, label %cond.false

cond.false:                                       ; preds = %for.body, %land.rhs
  br label %cond.end

cond.end:                                         ; preds = %land.rhs, %cond.false
  %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ]

The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead.
The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation.

SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026).

Diff Detail

Event Timeline

aqjune created this revision.Jan 22 2021, 2:03 AM
aqjune requested review of this revision.Jan 22 2021, 2:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2021, 2:03 AM
bjope added a subscriber: bjope.Jan 22 2021, 2:31 AM

Thanks for fixing this!
I don't know this code at all, but I'm puzzled by this sentence in the summary:

The path for.body -> cond.false -> cond.end should be taken when 'select i1 true, i1 poison, i1 false' holds

So, the path described above is what should indeed be executed, but you say it should be taken when "'select i1 true, i1 poison, i1 false' holds"?
But that select returns poison? I really don't understand this.

Thanks for fixing this!
I don't know this code at all, but I'm puzzled by this sentence in the summary:

The path for.body -> cond.false -> cond.end should be taken when 'select i1 true, i1 poison, i1 false' holds

So, the path described above is what should indeed be executed, but you say it should be taken when "'select i1 true, i1 poison, i1 false' holds"?
But that select returns poison? I really don't understand this.

That was my epic typo, sorry. It should have been select i1 false, i1 poison, i1 false because the buggy case was encoding the path that visits cond.false.

aqjune edited the summary of this revision. (Show Details)Jan 22 2021, 3:25 AM

That was my epic typo, sorry. It should have been select i1 false, i1 poison, i1 false because the buggy case was encoding the path that visits cond.false.

Great, then I'm slightly less puzzled :) Thanks!

aqjune edited the summary of this revision. (Show Details)Jan 24 2021, 4:48 PM

There was another typo in the text, fixed. I hope the text makes sense now.
There's no diff in the tests that is bigger than syntactic replacements, so I believe that at least for optimizations that are covered in tests it doesn't block major transformations.

At least the patch seems to fix the problem I saw.
I'll run more tests with it during the night to see nothing unexpected pops up.

Thanks!

bjope added inline comments.Jan 27 2021, 8:54 AM
llvm/test/Transforms/LoopVectorize/pr48832.ll
8

Maybe safer to use a regexp instead of <4 x i32> or explicitly check that we get store <4 x i32> zeroinitializer. Or you need to force the vector factor.
Otherwise the test case is fragile to changes in cost model etc (it would for example pass for a different vector factor and then it wouldn't be useful as a regression test any longer).

bjope added a comment.Jan 27 2021, 8:56 AM

Btw, as far as I know @uabelho hasn't found any problems with the fix. And except the checks in the test case I think it looks like a nice fix (avoiding miscompiles).

aqjune updated this revision to Diff 320036.Jan 28 2021, 8:33 PM

Thank you for the feedback..! I updated the test to explicitly use zeroinitializer.

aqjune marked an inline comment as done.Jan 28 2021, 8:33 PM
bjope accepted this revision.Feb 10 2021, 12:31 AM

LGTM!

This revision is now accepted and ready to land.Feb 10 2021, 12:31 AM
fhahn added inline comments.Feb 10 2021, 12:17 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8250

nit: which is equivalent to.

8252

nit: perhaps something like The select version does not introduce new UB if EdgeMask is poison. I guess the key here is that if SrcMask is true and EdgeMask is poison, the original program already had UB.

8257

nit: Just use Builder.CreateSelect?

llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll
2462–2465

It's a shame that this version makes the IR a bit less readable :(

llvm/test/Transforms/LoopVectorize/pr48832.ll
2

If the test needs the systemZ triple, it needs to go into the SystemZ subdirectory I think, otherwise it will fail on systems that do not build the SystemZ backend.

But It should not be needed, can the same behavior be reproduced by using -force-vector-width=4?

8

I think we should explicitly check for the branch conditions that are generated.

aqjune updated this revision to Diff 323519.Feb 12 2021, 9:59 PM

Thanks for the feedbacks, I addressed the comments

aqjune marked 5 inline comments as done.Feb 12 2021, 10:01 PM
aqjune added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8257

I added createSelect to VPBuilder because having one seems good to me.

aqjune edited the summary of this revision. (Show Details)Feb 14 2021, 4:04 AM
This revision was landed with ongoing or failed builds.Feb 14 2021, 4:05 AM
This revision was automatically updated to reflect the committed changes.
aqjune marked an inline comment as done.