Canonicalize the "masked merge" operation to the shorter xor-variant:
(a & b) | (~a & c) --> (((b ^ c) & a) ^ c)
Differential D109194
[InstCombine] Canonicalize masked merge; optimize (a & b) | (~a & c) MatzeB on Sep 2 2021, 2:03 PM. Authored by
Details
Diff Detail
Event TimelineComment Actions And for the record: It's easy enough to prove this correct with value tables and alive. Though I can't currently figure out a way to deduce this formula myself so there may be generalizations or underlying principles that I am missing which would produce more instcombine patterns... Comment Actions Seems this is called masked merge and there is prior discussion/code:
Was this combine here missed or is it somehow not good? It does make things slightly faster in my code... Comment Actions So @lebedev.ri if I did my history research correctly, then the pattern in my patch here has also been part of https://reviews.llvm.org/D46814. That diff however was reverted in the meantime (see https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556578.html @echristo) because of undef behavior? My version here adds a freeze instruction (thanks to alive2.llvm.org for catching this or I would have missed that subtlety too!). So maybe it this is ok now? Comment Actions This is an alternative bit select pattern - do we know if many backends match this pattern as well as the more common (a & b) | (~a & c) to their bit select instructions (bsl/pcmov/etc.)? The xor pattern is better on SSE for cases where b and c were both constants, plus its entirely commutable. Comment Actions
I tried llc with the following test: define i32 @variant0(i32, i32, i32) { %4 = and i32 %1, %0 %5 = xor i32 %0, -1 %6 = and i32 %5, %2 %7 = or i32 %6, %4 ret i32 %7 } define i32 @variant1(i32, i32, i32) { %4 = xor i32 %2, %1 %5 = and i32 %4, %0 %6 = xor i32 %5, %2 ret i32 %6 } (generic) X86: variant0 produces "mov; and; not; and; or"; variant1 produces "mov; xor; and; xor" -> For targets without an "and-not" instruction variant1 seems better and this patch will help produce better code there. Most targets with an "and-not" use it either way, except for ARM which fails to recognize the xor variant and PPC64 which fails to recognize the and-not variant... Also a goal for LLVM-IR is normalizing the program and this patch will help with that. Comment Actions It looks like have a reverse fold to this in DAG: DAGCombiner::unfoldMaskedMerge Also, InstCombine has a fold for: (X&~Z)|(Y&Z) -> select(X,Y) but it doesn't handle the XOR variant: https://simd.godbolt.org/z/sWM3KxbrE Comment Actions
Noticed that too. It depends on TargetLower::hasAndNot() to return true. I tried implementing hasAndNot() in the ARM and SystemZ backend which lacked this function. It does eliminate problems with this patch, unfortunately there are ripple effects in other areas and I am not sure how deep I want to dig (patterns like select(x < 0, 0, x) are DAGCombined into (x & ~(x >>s 31)) making the targets fail to select specialized saturation instructions...). Though I could probably refactor things to make it possible to independently unfold masked merge separately from the and-not zero saturation. Comment Actions
Comment Actions I filed https://bugs.llvm.org/show_bug.cgi?id=51876 about the different ARM codegen. I believe we can land this change anyway, since ARM codegen produces the same amount of instructions with any masked-merge pattern. Comment Actions I don't think this makes sense as an InstCombine (middle end) transform. The resulting expression is less analyzable both in that freeze is a complete analysis blocker, and xor is generally less analyzable than and/or. This looks like something we should be doing in the backend instead. Comment Actions
It's easy enough to move this pattern to SelectionDAG. Though on a general level, my philosophy was always that if you have 2 equivalent expressions then it's best to canonicalize towards one of them in InstCombine. Not sure if freeze is a good enough reason to not canonicalize... Comment Actions That would also work well. But it will require a freeze just the same as we go from one use of a to two. Comment Actions And FWIW: My gut feeling is that a canonicalization is worth having an extra freeze around... |