- User Since
- May 5 2018, 9:37 AM (125 w, 3 d)
I find the naming here confusing: Looking at this patch my expectation was that DefaultIntrinsic is used for intrinsics with default attributes, based on the name. Took me a while to understand it's actually the other way around. Shouldn't the Intrinsic <-> DefaultIntrinsic naming be inverted?
FWIW, when removing a redundant fold, it would be helpful to point out which other fold already handles it in the review summary.
Mon, Sep 28
What is the relationship between this code and SimplifyIndvar::eliminateIVComparison()? Is this basically the same thing, just for icmps against something other than the IV itself?
To handle the various undef cases this ended up being a lot bulkier than I hoped - I'd love to hear if anyone has any suggestions on how we could use ConstantExpr/PatternMatch to simplify this as I imagine we could use such functionality in other places as we improve non-uniform vector support.
Sun, Sep 27
Sat, Sep 26
Thanks for putting up the review! I agree that the current testing situation is not ideal, but don't think the approach used here is moving us in the right direction.
Simplify implementation. There's no need to collect all the best ranges explicitly.
Use preference function instead of filter function, making the testing approach more broadly applicable.
Fri, Sep 25
There's a 40% code size increase on CMakeFiles/7zip-benchmark.dir/CPP/7zip/Crypto/Sha1.cpp.o, might be worth double checking that we're not missing some optimizations on funnel shifts.
Assuming it's not part of some other patch, this should also add a LangRef entry for the llvm.loop.mustprogress metadata.
Abandon this as no longer necessary for mustprogress?
Thanks for the clarification wrt callees, this LGTM now.
Thanks for the update, the new version should be correct. However, it can also be expensive, because we may loop through a phi multiple times until the depth limit is exhausted. I would suggest to follow the same implementation approach as computeKnownBits() does in https://github.com/llvm/llvm-project/blob/d2166076b882e38becf3657ea830ffd2b6a5695e/llvm/lib/Analysis/ValueTracking.cpp#L1574-L1606. Namely to perform the recursive call with MaxDepth - 1. (I would also suggest to handle your TODO right away, to have parity with the known bits code.)
This LGTM now, thanks!
Thu, Sep 24
LG with some nits
Wed, Sep 23
I've reverted this change as it breaks the clang build with linker errors. D88173 did not address them either.
Mon, Sep 21
FWIW there were code size changes with this patch, so it wasn't NFC.
Sun, Sep 20
Add additional test case @test_br_cmp_with_offset. Found in Rust code and not folded by IPSCCP (or anything else in LLVM).
Sat, Sep 19
I don't think this fully addresses the problem, in that this relies on the phi getting processed before the unreachable, which is not a given. This would have to happen prior to the main InstCombine run, at the same time we remove instructions in unreachable blocks. It's probably not worth it. I think we should fix this the direct way, by performing a replaceInstUsesWith(*Prev, UndefValue(Prev->getType()) before the eraseInstFromFunction() in the unreachable transform.
Fri, Sep 18
Actually, that would be out-of-bounds for instcombine. Removing a phi operand would mean removing the incoming block too, and we don't do that in instcombine. So I think we just bail out if a value still has uses. SimplifyCFG will eventually get it.
Thu, Sep 17
Wed, Sep 16
Tue, Sep 15
Tested the newest version of this patch and I'm still seeing massive regressions, even larger than before.
Mon, Sep 14
LGTM. I think we have enough to start using them for vector intrinsics.
Now that D87571 also handles the FLT_MAX case, adjust this patch to use either NaN, Inf or FLT_MAX as the neutral element for fminnum (and same negative for fmaxnum). This means we never have to drop FMF.
Sun, Sep 13
Preserve flags when commuting operands, fix typos in comments.
Fix confusion between isSmallest and isLargest+isNegative.
Also fold FLT_MAX/-FLT_MAX if ninf flag is set.
Add SoftPromoteHalfRes as well. As we do expand non-fast reductions on X86, we can run into it there. The added tests were previously asserting.