Page MenuHomePhabricator

[InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask
ClosedPublic

Authored by lebedev.ri on Sep 16 2018, 1:27 AM.

Details

Summary

Two folds are happening here:

  1. https://rise4fun.com/Alive/oaFX
  2. And then foldICmpWithHighBitMask() (D52001): https://rise4fun.com/Alive/wsP4

This change doesn't just add the handling for eq/ne predicates,
it actually builds upon the previous foldICmpWithLowBitMaskedVal() work,
so all the 16 fold variants* are immediately supported.

I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already proven
for the general case of constant with all-ones in low bits. So as long as
the mask produces all-ones in low bits, i'm pretty sure the fold is valid.

But required, i can re-prove, let me know.

  • eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Sep 16 2018, 1:27 AM
lebedev.ri edited the summary of this revision. (Show Details)Sep 16 2018, 1:33 AM
lebedev.ri edited the summary of this revision. (Show Details)Sep 16 2018, 4:11 AM

Is this a generalization of D52001 (and if so, can we remove that code as a special-case of the more general pattern)?

Is this a generalization of D52001 (and if so, can we remove that code as a special-case of the more general pattern)?

No, i don't think so, why?

Like i wrote in the description:

Two folds are happening here:

  1. https://rise4fun.com/Alive/oaFX
  2. And then foldICmpWithHighBitMask() (D52001) fires on newly formed IR: https://rise4fun.com/Alive/wsP4

I.e. this fold actually allows D52001 to happen.

Is this a generalization of D52001 (and if so, can we remove that code as a special-case of the more general pattern)?

No, i don't think so, why?

Like i wrote in the description:

Two folds are happening here:

  1. https://rise4fun.com/Alive/oaFX
  2. And then foldICmpWithHighBitMask() (D52001) fires on newly formed IR: https://rise4fun.com/Alive/wsP4

I.e. this fold actually allows D52001 to happen.

Sorry, I misread what was happening in this patch series. Things look logically correct, but we should clarify for the record - what is the higher-level motivation for these folds? Particularly, for the cases with non-canonical pattern matching - do we have some important application that will benefit or data for how often those patterns occur across a range of benchmarks?

Is this a generalization of D52001 (and if so, can we remove that code as a special-case of the more general pattern)?

No, i don't think so, why?

Like i wrote in the description:

Two folds are happening here:

  1. https://rise4fun.com/Alive/oaFX
  2. And then foldICmpWithHighBitMask() (D52001) fires on newly formed IR: https://rise4fun.com/Alive/wsP4

I.e. this fold actually allows D52001 to happen.

Sorry, I misread what was happening in this patch series.

Great. I was worrying i was missing something obvious here.

Things look logically correct,

but we should clarify for the record - what is the higher-level motivation for these folds?
Particularly, for the cases with non-canonical pattern matching - do we have some important application
that will benefit or data for how often those patterns occur across a range of benchmarks?

The main reason is consistency.
I have seen all 4 of these variants to produce the mask in the wild.
I'm not aware of some 5'th variant. (at least right now?)

Sure, we may just forget these non-canonical patterns exist for the matter of this fold,
like we do in most other cases, but if we fail to canonicalize them, well, we fail.

I certainly understand the point about 'death by thousand cuts', but well,
if two more trivial patters notably matter, i suspect there is some larger problem..
(It would/will be oh so much simpler if each and every patter needn't be to manually found/tested/folded..)

Stats are not as impressive. LLVM stage 2

$ # (when the mask is constant, the actual number is higher since it fires more than once in some files)
$ find -iname *.stats | xargs grep D52146_CONSTANT | wc -l
157
$ # the canonical pattern variants.
$ find -iname *.stats | xargs grep D52146_CANONICAL
./llvm-stage2/tools/lld/lib/ReaderWriter/MachO/MachONormalizedFileToAtoms.stats:        "instcombine.D52146_CANONICAL": 1,
./llvm-stage2/tools/lld/wasm/InputChunks.stats: "instcombine.D52146_CANONICAL": 2,
$ find -iname *.stats | xargs grep D52146_UNCANONICAL
$ # well yeah, zero. given how few D52146_CANONICAL there are, i suspect this is not representable.
spatel accepted this revision.Sep 18 2018, 7:45 AM

Is this a generalization of D52001 (and if so, can we remove that code as a special-case of the more general pattern)?

No, i don't think so, why?

Like i wrote in the description:

Two folds are happening here:

  1. https://rise4fun.com/Alive/oaFX
  2. And then foldICmpWithHighBitMask() (D52001) fires on newly formed IR: https://rise4fun.com/Alive/wsP4

I.e. this fold actually allows D52001 to happen.

Sorry, I misread what was happening in this patch series.

Great. I was worrying i was missing something obvious here.

Things look logically correct,

but we should clarify for the record - what is the higher-level motivation for these folds?
Particularly, for the cases with non-canonical pattern matching - do we have some important application
that will benefit or data for how often those patterns occur across a range of benchmarks?

The main reason is consistency.
I have seen all 4 of these variants to produce the mask in the wild.
I'm not aware of some 5'th variant. (at least right now?)

Sure, we may just forget these non-canonical patterns exist for the matter of this fold,
like we do in most other cases, but if we fail to canonicalize them, well, we fail.

I certainly understand the point about 'death by thousand cuts', but well,
if two more trivial patters notably matter, i suspect there is some larger problem..
(It would/will be oh so much simpler if each and every patter needn't be to manually found/tested/folded..)

Agreed - not sure how much work that automated/better solution requires, but I guess we'll eventually get there. But in the meantime, we should be aware of the compile-time implications of the current instcombine. Therefore, not adding folds without some real-world motivation. The perf of visitICmpInst is probably 1 of the most concerning for those that are watching the compile-time creep up, so that's why I'm raising it here.

Stats are not as impressive. LLVM stage 2

$ # (when the mask is constant, the actual number is higher since it fires more than once in some files)
$ find -iname *.stats | xargs grep D52146_CONSTANT | wc -l
157
$ # the canonical pattern variants.
$ find -iname *.stats | xargs grep D52146_CANONICAL
./llvm-stage2/tools/lld/lib/ReaderWriter/MachO/MachONormalizedFileToAtoms.stats:        "instcombine.D52146_CANONICAL": 1,
./llvm-stage2/tools/lld/wasm/InputChunks.stats: "instcombine.D52146_CANONICAL": 2,
$ find -iname *.stats | xargs grep D52146_UNCANONICAL
$ # well yeah, zero. given how few D52146_CANONICAL there are, i suspect this is not representable.

Thanks for collecting the data. So the non-canonical cases are closer to the edge, but this patch is definitely justified. LGTM.

This revision is now accepted and ready to land.Sep 18 2018, 7:45 AM
This revision was automatically updated to reflect the committed changes.