This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] don't automatically drop poison-generating flags in SimplifyVectorDemandedElts
ClosedPublic

Authored by spatel on Dec 10 2021, 8:00 AM.

Details

Summary

I noticed this while reviewing the test diffs in D115460 (and so the diffs in that patch will be reduced if this one is applied first).

This is effectively a revert of 3436dc29239d ( https://reviews.llvm.org/rG3436dc29239d ) - since that commit, we've made several enhancements, so the reasoning there is no longer valid. Specifically, we added a poison value to IR, and we clarified the behavior of undef/poison elements in a shuffle mask:
https://llvm.org/docs/LangRef.html#shufflevector-instruction

Alive2 seems to agree that the propagation of flags in the test diffs shown here are valid:
https://alive2.llvm.org/ce/z/UuY-jr
https://alive2.llvm.org/ce/z/GXoMD9
https://alive2.llvm.org/ce/z/nVCyVH

Diff Detail

Event Timeline

spatel created this revision.Dec 10 2021, 8:00 AM
spatel requested review of this revision.Dec 10 2021, 8:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2021, 8:00 AM
spatel edited the summary of this revision. (Show Details)Dec 10 2021, 8:01 AM
nikic added a comment.Dec 10 2021, 8:16 AM

It looks like the shufflevector poison change hasn't landed yet? D93818 / D103874

I was thinking about this as well, but you have more history on the discussion than I do.

Let me briefly mention the case which I was unclear about. For most of the demanded lane stuff, leaving the flags is clearly fine as we make only undemanded lanes "more poison". However, is there an interaction with undef and demanded lanes? Do any callers assume that flags hold for any value of undef chosen? It seems like we might have a version of the undef commit problem. (i.e. where two different places assume different values for an undef, and the first doesn't commit that to IR.) Not sure, but I was uncomfortable when trying to think it through.

In a local patch, I'd tried preserving the existing drop flags functionality for undef lanes. That appeared to allow all the test changes, but I didn't post it because I wasn't clear if it was actually needed or not.

spatel added subscribers: hyeongyukim, aqjune, regehr.

It looks like the shufflevector poison change hasn't landed yet? D93818 / D103874

Thanks for the reminder! ( cc @aqjune @hyeongyukim @regehr )

I'm not sure what effect those patches would have here, so maybe we can add test examples to confirm that this doesn't break anything?
All of the combinations of poison/undef that I tried in Alive2 with shuffles similar to the ones in the diffs here are shown as valid.

nikic accepted this revision.Dec 13 2021, 3:23 AM

LGTM. After looking a bit closer, I do think this is safe and independent of the shufflevector patches referenced above. We already convert non-demanded elements to poison, so converting them to poison through nowrap flags should be just fine.

This revision is now accepted and ready to land.Dec 13 2021, 3:23 AM

LGTM. After looking a bit closer, I do think this is safe and independent of the shufflevector patches referenced above. We already convert non-demanded elements to poison, so converting them to poison through nowrap flags should be just fine.

Thanks! That's probably better than any of my attempts to put the reasoning into words.

References to preliminary steps in the path to cleaning this up:
https://llvm.org/PR43958
D70246

Let's give this a try and see if the theory holds up...

This revision was landed with ongoing or failed builds.Dec 13 2021, 7:12 AM
This revision was automatically updated to reflect the committed changes.

It looks like the shufflevector poison change hasn't landed yet? D93818 / D103874

Thanks for the reminder! ( cc @aqjune @hyeongyukim @regehr )

Hi, sorry for my late reply.
The patch is blocked at the moment because the mm*_undefined* intrinsics must be correctly lowered first. (see: https://reviews.llvm.org/D103874#3022980)
Simply fixing them to return a zero vector would be the simplest solution, but not sure whether it will have no performance regression, and no one (including me) has no time to explore...