- User Since
- Jun 9 2016, 6:44 PM (188 w, 1 d)
Dec 18 2019
Clean up code based on reviewer feedback.
Dec 17 2019
Add special case handling:
Dec 12 2019
Hang on a bit.
Let me see if I can find more issues in ConstantFoldBinaryInstruction.
Similar issue: current upstream crashes at void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
Yes, this transformation will need to go in InstCombine.
We should probably implement this when constant scalable vector support is ready, so that the effect will be most obvious.
Dec 11 2019
Current upstream crashes at llvm/lib/IR/Value.cpp:404: void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
Dec 10 2019
Dec 9 2019
Dec 6 2019
Please let me know if you have any further concerns.
Dec 5 2019
"The main problem is the fix should be applied to all duplication even if BB is not removed."
-> Makes sense, thank you for pointing this out.
Thank you Sanjay!
Dec 3 2019
Addressed review feedback.
Current upstream crashes with "void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed."
Hey @Carrot, I really appreciate your time on this.
Dec 2 2019
I would like to merge this patch; please let me know if there are any objections.
Updated test checking.
Nov 25 2019
Any objections to this approach?
Nov 22 2019
Add reduced test case
Nov 14 2019
Addressed reviewer feedback -- add more explanation in comments.
Nov 13 2019
Guys, any thoughts on this?
Nov 11 2019
I just realized I forgot to mention how the BB is removed.
Nov 8 2019
Actually, fillWorkLists is not called for the deleted BB, but for the MBBs of BBChain. The BB is deleted.
Here BBChain is left with an unreleased UnscheduledPredecessors counter, so calling fillWorkLists for any MBB within BBChain triggers this assertion.
Adding some more detailed explanation:
This problem triggers the following assertion: Assertion `BlockToChain[&MBB]->UnscheduledPredecessors == 0 && "expect unschedPred to be 0\n"' failed.
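To make the failure mode in these comments concrete, here is a toy model of a stale predecessor counter. It is only a sketch, not the actual block placement code: the Chain struct and the three helper functions are hypothetical names invented for illustration, and the assumption is that each predecessor edge into a chain bumps a counter that must be released before the chain is treated as ready.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a block chain (not LLVM's class).
struct Chain {
  unsigned UnscheduledPredecessors = 0;
};

// Assumption: every predecessor edge into the chain increments the counter.
void addPredecessorEdge(Chain &C) { ++C.UnscheduledPredecessors; }

// Correct bookkeeping: release the edge when its source block is
// scheduled or deleted.
void releasePredecessorEdge(Chain &C) {
  assert(C.UnscheduledPredecessors > 0 && "no edge left to release");
  --C.UnscheduledPredecessors;
}

// Mirrors the invariant checked by the failing assertion: the chain may
// only be processed once its counter has dropped to zero.
bool readyForWorkList(const Chain &C) {
  return C.UnscheduledPredecessors == 0;
}
```

If a block with an edge into the chain is deleted without a matching releasePredecessorEdge call, readyForWorkList stays false for that chain, which corresponds to the "expect unschedPred to be 0" condition being violated.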
Oct 4 2019
Sep 25 2019
Not a good solution
Sep 24 2019
Sep 23 2019
This is just FYI.
Another note: for older-generation X86 targets, e.g., haswell, cmove indeed has latency 2, but comparable uOps Per Cycle is still achieved.
same test input
clang clampNegToZero.ll -O2 -target x86_64 -march=haswell -S -o - | llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell
llvm-mca performance result for general folding:
llvm-mca results for more general folding pattern
make folding more general
make test more general
Sep 20 2019
Resolved review feedback.
Resolved review feedback.
Move test "extra_use_sub" to the positive cases.
Similar to D67799
For X86, AArch64 and ARM targets, the backend produces better ASM with this transformation. Please refer to the examples below:
Sep 19 2019
E.g., vmin generation for ARM target
E.g., vmax generation for ARM target
Jul 12 2019
Jul 11 2019
Forgot to update the test comment, sorry.
Test case foo1_and_signbit_lshr_without_shifting_signbit_not_pwr2() aims to check that we do not create an additional shift instruction when the fold fails.
I think we should only create the signbit shift instruction when the fold is supposed to happen.
Jul 10 2019
Addressed review comments
Jul 5 2019
Generalize InstCombiner::foldAndOrOfICmpsOfAndWithPow2() in D64275