This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine][RFC] Canonicalize constant mask in masked merge mattern
AbandonedPublic

Authored by lebedev.ri on Apr 14 2018, 8:14 AM.

Details

Summary

Masked merge has a pattern of: ((x ^ y) & M) ^ y.
Once PR6773 fold is done, that pattern will be produced.

But, if the mask is constant, there is no difference between ((x ^ y) & M) ^ y and ((x ^ y) & ~M) ^ x and ((x ^ y) & ~(~M)) ^ y,
I think the constant mask should be canonicalized.

I propose the following:

  • if this is scalar:
    • if inverted mask has less bits set (Population count) than the original mask, DO invert
    • if zero-extended value of inverted mask is smaller than of the original mask, DO invert
    • else DON'T invert.
  • if it is vector: *. apply that ^ scalar predicate to each non-undef element.
      • count number of elements where the predicate is true.
      • for undef elements the predicate is false.
    • if there are more elements where the predicate is true, than elements where predicate is false, DO invert
    • else DON'T invert.

These are strict less/more comparisons, not less or equal/more or equal.
This ensures that the canonicalization is stable, and won't be undone / won't cause cycles.

https://rise4fun.com/Alive/eox

Thoughts?

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri retitled this revision from [InstCombine][RFC] Add tests for mask canonicalization in masked merge to [InstCombine][RFC] Canonicalize constant mask in masked merge mattern.

Tidy up comments and the isNonCanonicalMask().

Drop duplicate tests.
I have just realized that non-const mask is not canonicalized yet too https://godbolt.org/g/NKPHGF, so i'll rework this slightly..

Rebased ontop of less controversial D45664, reducing the diff's size.

nullptr-init Mc too.
Does not appear to matter, but i have already hit the bug when
i forgot to do that to M, so better safe than sorry.

Rebased, slightly cleanup the matcher by using m_CombineAnd().

See my comment in D45733. Sorry that I didn't get a chance to review/discuss this sooner.

I doubt the premise that this pattern should exist in the first place, so I suspect we don't want to add logic to transform it. But if it does exist, then and/and/or is the best canonical form for the IR - it's better for bit-tracking analysis and better for codegen.

lebedev.ri abandoned this revision.Apr 19 2018, 3:54 AM

See my comment in D45733. Sorry that I didn't get a chance to review/discuss this sooner.

I doubt the premise that this pattern should exist in the first place, so I suspect we don't want to add logic to transform it. But if it does exist, then and/and/or is the best canonical form for the IR - it's better for bit-tracking analysis and better for codegen.

After further disscussion in D45733, yes, probably.
One more reason - the pattern could be (x & C1) | (y & C2), where C1 and C2 don't have common bits set, so the result will have ~(C1|C2) bits unset.
And we would not be able to efficiently fold that to ((x^y)&m)^y.
So yeah, let's unfold it for constants.