---------------------------------------- define i3 @src(i3 %a, i3 %b, i3 %c) { %0: %or1 = or i3 %b, %c %not1 = xor i3 %or1, 7 %and1 = and i3 %a, %not1 %xor1 = xor i3 %b, %c %or2 = or i3 %xor1, %a %not2 = xor i3 %or2, 7 %or3 = or i3 %and1, %not2 ret i3 %or3 } => define i3 @tgt(i3 %a, i3 %b, i3 %c) { %0: %obc = or i3 %b, %c %xbc = xor i3 %b, %c %o = or i3 %a, %xbc %and = and i3 %obc, %o %r = xor i3 %and, 7 ret i3 %r } Transformation seems to be correct!
Details
Diff Detail
Event Timeline
A side note: a shorter form would be (~a & b & c) | ~(b | c) -> ~((a & b) | (b ^ c)), however it is only mathematically correct and does not verify since it is more undefined than the source:
---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %or1 = or i4 %b, %c %not1 = xor i4 %or1, 15 %and1 = and i4 %a, %not1 %xor1 = xor i4 %b, %c %or2 = or i4 %xor1, %a %not2 = xor i4 %or2, 15 %or3 = or i4 %and1, %not2 ret i4 %or3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %and = and i4 %a, %b %not1 = xor i4 %b, %c %or = or i4 %and, %not1 %not2 = xor i4 %or, 15 ret i4 %not2 } Transformation doesn't verify! ERROR: Target's return value is more undefined Example: i4 %a = #xf (15, -1) i4 %b = undef i4 %c = #xf (15, -1) Source: i4 %or1 = #xf (15, -1) i4 %not1 = #x0 (0) i4 %and1 = #x0 (0) i4 %xor1 = any i4 %or2 = #xf (15, -1) i4 %not2 = #x0 (0) i4 %or3 = #x0 (0) Target: i4 %and = #x6 (6) i4 %not1 = #x8 (8, -8) i4 %or = #xe (14, -2) i4 %not2 = #x1 (1) Source value: #x0 (0) Target value: #x1 (1)
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
2733 | I have realized it is essentially the same LHS as above. And then above is the same as another one above. Will combine these, the only real difference is one use counts, but that can be done inside the outer if statement. As the number of patterns grow I have to keep it under control somehow and factor. |
In the light of D113526 this needs to be rebased and handle inverted pattern:
(a | ~(b & c)) & ~(a & (b ^ c)) --> ~(a | b) | (a ^ b ^ c)
---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %and1 = and i4 %b, %c %not1 = xor i4 %and1, 15 %or1 = or i4 %not1, %a %xor1 = xor i4 %b, %c %and2 = and i4 %xor1, %a %not2 = xor i4 %and2, 15 %and3 = and i4 %or1, %not2 ret i4 %and3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %xor1 = xor i4 %b, %c %xor2 = xor i4 %xor1, %a %or1 = or i4 %a, %b %not1 = xor i4 %or1, 15 %or2 = or i4 %xor2, %not1 ret i4 %or2 } Transformation seems to be correct!
Double-check that I'm getting the pattern right, but I don't think the 'or' side of this should have 6 instructions. There has to be a smarter way (!), but I worked this out on paper with a truth table:
https://alive2.llvm.org/ce/z/BP3ZZm
Changed to shorter form as proposed by @spatel:
(~(a | b) & c) | ~(c | (a ^ b)) -> ~((a | b) & (c | (b ^ a)))
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1782 | Both "A | B" and "(A ^ B) | C" are existing values in this expression. Does it not make sense to capture those instead of cloning them? Note: that also means we're only creating 2 instructions total ('and' + 'not'), so this transform only needs a single one-use constraint. |
Capture one more subexpression.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1782 | Thanks! A good catch. |
The patch was reverted since the flipped case produces more undefined result. I am not sure how did that sneak, but I am reopening it and limiting to only 'or' case.
Ah, that is unfortunate. We may need to freeze the inputs if we're changing the number/order of uses:
https://alive2.llvm.org/ce/z/JXQ2Rn
I think it's worth leaving the tests in place in case we find some way around that limitation. Also, add a code comment about undef, so we remember why the 'and' sibling is not included.
Retained the 'and' tests, but added comment why is it not done in both source and test.
Both "A | B" and "(A ^ B) | C" are existing values in this expression. Does it not make sense to capture those instead of cloning them?
Note: that also means we're only creating 2 instructions total ('and' + 'not'), so this transform only needs a single one-use constraint.