This is a fix for PR26083:
My first attempt to fix this was to skip CSE of compares that were input to idempotent instructions. But, this could leave redundant code and might inhibit inliner from inlining if the IR is bloated. This version enhances the instcombine to look for users where we can elide xor and clone those users if profitable.
Details
Diff Detail
Event Timeline
This is a very basic question. But have you thought about enhancing instcombine to catch this case rather than preventing early-cse from optimizing it? It seems like the case you're aiming to prevent could exist without early-cse having created it.
Good question. I initially started off implementing an enhancement to instcombine. Specifically, whenever we see a candidate that can be simplified using DeMorgan's Law we currently check if the predicate has more than one use because we would want to avoid inverting the predicate for other users and cause a miscompile. I began implementing an enhancement which checked if all the uses of this predicate can be inverted but this requires walking the use-def chain and seemed very inefficient to catch this corner case. I also considered if we can move the Demorgan Laws simplification to instsimplify but that requires mutation of CFG so that is not possible. Another solution that I thought of in parallel to this solution is to clone the compare instruction during instcombine while applying DeMorgan's Law. However, this will be a performance neutral simplification in most cases as we will be trading an icmp for xor. So, I came up with this solution first to see if it fires for any of Spec2k6.
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1246 | Functions should start with a lower case letter. | |
1247 | Sentances start with an upper case letter. | |
1251–1253 | Why? | |
1277–1285 | Seems like we might do an awfully large amount of work. How large does Users typically get when this xform is profitable? | |
1289 | What stops Users.size() from equaling zero? | |
1291 | You are using cast_or_null but indiscriminately dereference the result, this seems wrong. | |
test/Transforms/InstCombine/not2.ll | ||
3 | This is too broad, please CHECK your tests more thoroughly. |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1277–1285 | Sorry for late response. This was added to avoid unnecessarily commutating the operands of AND instruction when we could not elide a xor because it resulted in suboptimal codegen due to another issue that I am addressing in D17942. This is still on my plate and I plan to address the comments here once D17942 is landed. |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1251–1253 | Only supporting for non vector types for now as the codegen for vector types is suboptimal. | |
1277–1285 | I have refactored the code after landing D17942. To reduce the amount of work I am now checking only local users. | |
1291 | Refactored code so that Users will always contain instructions. | |
test/Transforms/InstCombine/not2.ll | ||
4 | There is nothing more to test in these testcases other than checking if xor has been elided similar to the tests in test/Transforms/InstCombine/not.ll |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1344 | Perhaps you could move the isa<BinaryOperator> check into a separate if statement. IMO, it doesn't go with the comment above. | |
1349 | This can be a cast<> rather than a dyn_cast<> because you've already checked we have a BO. Alternatively, you could do the following to address the previous comment: BinaryOperator *BO = dyn_cast<BinaryOperator>(V); if (!BO) return false; This will also decrease the indention. | |
1364 | Is VI loop invariant? If so, please hoist this out of the loop. Also, I believe we know V is either a cmp or BO, so we can use a cast<> rather than a dyn_cast<>? |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1374–1375 | Can't this be simplified to a range-based for loop? |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1374–1375 | Thanks David, I have simplified it to a range-based for loop. |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1350 | This might read a bit better with something like: if (BinaryOperator *BO = dyn_cast<BinaryOperator>(V)) { // Must be an 'add' or a 'sub'. unsigned Opc = BO->getOpcode(); if (Opc != Instruction::Add && Opc != Instruction::Sub) return false; // At least one operand must be a constant. if (!isa<Constant>(BO->getOperand(0)) && !isa<Constant>(BO->getOperand(1))) return false; } Also, is it not the case that constants are always canonicalized to the RHS, which would mean the check on operands 0 is unnecessary? |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1350 | IIRC, instcombine is still early to assume that that constants are always canonicalized to the RHS. |
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp | ||
---|---|---|
1350 | Okay, I just wanted to check. FWIW, the reassocation pass is responsible for the canonicalization. |
Just getting this off my queue. If you decide to pursue this further, please add me back as a reviewer, Balaram.
Functions should start with a lower case letter.