- User Since
- Apr 14 2013, 11:48 AM (387 w, 6 d)
Tue, Sep 15
This LGTM, thanks!
Fri, Sep 11
Wed, Sep 9
I'd like to make some progress on this soon, given that this still causes every LNT run on the SystemZ build bot to fail ...
Tue, Sep 8
We're finally there, great :-) LGTM.
Makes sense to me. As an aside, this code:
This LGTM now. Thanks!
Fri, Sep 4
LGTM now, thanks!
I think this makes sense to me, but I want to point out that this partially reverts the commit https://reviews.llvm.org/D46854. I guess there may be other parts of this commit that should be reverted?
I may be missing something, but as long as the memoization does a true intersection of the flags per this patch, there should not necessarily be anything broken, or?
Wed, Sep 2
The algorithm used in the unsigned case isn't strict-safe. However, this is exactly the same algorithm that is also used in TargetLowering::expandFP_TO_UINT, where is has been made properly strict-safe in the meantime.
Tue, Sep 1
Thanks for the links, @spatel ! It seems to me that the change in "D46854 - changed code in SelectionDAGBuilder::visit()" is actually not correctly respecting the shared flags semantics. The node returned from getNodeForIRValue, while implementing the translation of this IR, might at the same be be used elsewhere (due to DAG node merging). If that other place does not allow FMF semantics, the flags in the DAG node can still be "undefined", and therefore the new code in "visit" will just replace the flags with a version appropriate only for this IR (which may allow FMF).
This LGTM now, thanks!
Mon, Aug 31
Interesting. I agree that the flags should always be reset here. In fact, I'm questioning why we should have that "isDefined" logic in the first place. It seems to me that any node with "undefined" flags can only be used correctly if any setting of the flags would be semantically correct -- but then we may just as well use the all-zero setting of the flags anyway.
Fri, Aug 28
Wed, Aug 26
Aug 21 2020
OK, this now looks all good to me as well. Thanks!
Aug 20 2020
A few comments inline, otherwise this looks good.
LGTM now. Thanks!
Aug 19 2020
This LGTM now. Thanks!
Aug 13 2020
Aug 12 2020
Yes, this makes sense to me as well. Patch LGTM.
Aug 10 2020
Aug 4 2020
This doesn't look correct. As far as I can see, none of the conversion functions were actually changed to handle strict operations. For one, you'll need strict variants of all the PowerPC-specific conversion operations, use them in all the conversion subroutines, and consistently track their chain nodes.
Aug 3 2020
This LGTM now.
I'm not sure what specific problem you're trying to solve here, but this change in itself looks incorrect.
Jul 30 2020
Sure, of course assuming the test still passes.
Jul 29 2020
LGTM for SystemZ as well.
Jul 24 2020
Jul 23 2020
Jul 22 2020
LGTM with the format fixes.
Jul 21 2020
This LGTM now.
Jul 17 2020
See inline comments about chain handling.
Jul 16 2020
The one issue I still see is that when falling back to library calls on older platforms, signaling and non-signaling compares both are treated the same. But that's really a pre-existing problem because this is not fully supported by libgcc (see the comment in TargetLowering::softenSetCCOperands), so it's not really a problem with this patch.
Jul 14 2020
Jul 12 2020
Jul 10 2020
Committed as 4c5a93bd58bad70e91ac525b0e020bd5119a321a.
Jul 9 2020
Drop AllowNoUniqueAddress parameter; address review comment.
Jul 8 2020
Handle array of empty records correctly.
Yes, this does look like an obvious typo. LGTM!
Jul 7 2020
Rebased against mainline.
This was actually fixed in a different way here: https://reviews.llvm.org/D75014
I'm closing this review now.
Another ping ...
A couple of cosmetic changes inline, otherwise the LGTM. Thanks!
Jun 30 2020
I've now updated the knownbits.ll test case to make it less fragile and still test what it is supposed to test (commit e9c6b63). The problem was use of "undef" (which makes it susceptible to collapse as generic optimizations improve), and the fact that knownBits was operating at the very edge of MaxRecursionDepth, so changes in common code could push it above the limit.
This commit causes test case failures on s390x (unfortunately hidden as the build bot was already red due to an unrelated issue):
For unsigned conversion, the input is treated as signed here, and the result fixed up by adding 2^N; something along the lines of
(signed)N >= 0 ? signed_to_FP((signed)N) : signed_to_FP(N - 2^N) + (FP)2^N
Jun 29 2020
It looks like this broke the s390x build bots, since then I'm seeing a lot of ASan and MSan errors:
Jun 26 2020
Jun 25 2020
Jun 24 2020
Jun 23 2020
Hmm, with clang-cl it seems the driver is trying to use this:
which of course doesn't exist. Not sure what is supposed to be happening here, but it seems that it's falling back on s390x-linux since on s390x, Linux is currently the only supported OS.
Ping? I'd really appreciate feedback about this ABI issue, in particular from other affected target maintainers ...
Line 4 here fails on s390x but not on other Unix flavors, see: http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/33346/steps/ninja%20check%201/logs/FAIL%3A%20Clang%3A%3Adebug-info-codeview-buildinfo.c
Jun 16 2020
Jun 15 2020
This all looks good to me.
Shouldn't the check be done inside canUseBlockOperation? I believe the other user (storeLoadCanUseMVC) really needs the same check.
Jun 10 2020
Jun 5 2020
OK. I agree this can be abandoned now. Thanks!
This looks all good to me now, we should verify that performance results are still good, and then it can go in.
A few cosmetic issues mentioned inline, otherwise this now looks good to me. Thanks!
Jun 4 2020
May 18 2020
It seems that now MachineCSE can remove a few VPERMs also. At least one case was because instead of using an undef source operand, the mask was used (it doesn't help any to replace an undef Ops with Op2 on the last line of getGeneralPermuteNode(), which I first thought). The test case I have reduced (not included) shows that MCSE on trunk fails to remove the second vperm if even though the instructions are near identical. The only difference to the success with this patch is that instead of the reused mask, there is an IMPLICIT_DEF. It looks to me that this is a minor deficiency of MCSE (Two different vregs defined by IMPLICIT_DEF should not stop CSE, I would think).
isZeroOrUndefVector(): The check for the undef vector used to be beneficial for the zero_extend_vector_inreg patch, but the way it looks now (just using a single unpack), it is not needed for anymore (NFC). I think it could be removed or we can keep it here to get the few less vperms...
Two tests - one for each case handled. Not quite sure what happened with vector-constrained-fp-intrinsics.ll - I don't understand why v0 is no longer used, but it seems that this patch begins to change things when the second operand of the VPERM is undefined.
May 12 2020
Hmm, I still think the read/write_64bit_value change is correct, GCC does read/write 64-bit counter values as lo/hi pairs even on big-endian systems.
May 11 2020
May 8 2020
Hmm, talking about MemFold, I'm wondering about this:
// Fused multiply and add/sub need to have the same dst and accumulator reg.
Ah, that still runs into the same name collisions with the MemFold pattern as you pointed out earlier. Sorry for missing that!