- User Since
- Apr 14 2013, 11:48 AM (382 w, 3 d)
Yes, this makes sense to me as well. Patch LGTM.
Mon, Aug 10
Tue, Aug 4
This doesn't look correct. As far as I can see, none of the conversion functions were actually changed to handle strict operations. For one, you'll need strict variants of all the PowerPC-specific conversion operations, use them in all the conversion subroutines, and consistently track their chain nodes.
Mon, Aug 3
This LGTM now.
I'm not sure what specific problem you're trying to solve here, but this change in itself looks incorrect.
Thu, Jul 30
Sure, of course assuming the test still passes.
Wed, Jul 29
LGTM for SystemZ as well.
Fri, Jul 24
Thu, Jul 23
Wed, Jul 22
LGTM with the format fixes.
Tue, Jul 21
This LGTM now.
Fri, Jul 17
See inline comments about chain handling.
Thu, Jul 16
The one issue I still see is that when falling back to library calls on older platforms, signaling and non-signaling compares both are treated the same. But that's really a pre-existing problem because this is not fully supported by libgcc (see the comment in TargetLowering::softenSetCCOperands), so it's not really a problem with this patch.
Tue, Jul 14
Jul 12 2020
Jul 10 2020
Committed as 4c5a93bd58bad70e91ac525b0e020bd5119a321a.
Jul 9 2020
Drop AllowNoUniqueAddress parameter; address review comment.
Jul 8 2020
Handle array of empty records correctly.
Yes, this does look like an obvious typo. LGTM!
Jul 7 2020
Rebased against mainline.
This was actually fixed in a different way here: https://reviews.llvm.org/D75014
I'm closing this review now.
Another ping ...
A couple of cosmetic changes inline, otherwise the LGTM. Thanks!
Jun 30 2020
I've now updated the knownbits.ll test case to make it less fragile and still test what it is supposed to test (commit e9c6b63). The problem was use of "undef" (which makes it susceptible to collapse as generic optimizations improve), and the fact that knownBits was operating at the very edge of MaxRecursionDepth, so changes in common code could push it above the limit.
This commit causes test case failures on s390x (unfortunately hidden as the build bot was already red due to an unrelated issue):
For unsigned conversion, the input is treated as signed here, and the result fixed up by adding 2^N; something along the lines of
(signed)N >= 0 ? signed_to_FP((signed)N) : signed_to_FP(N - 2^N) + (FP)2^N
Jun 29 2020
It looks like this broke the s390x build bots, since then I'm seeing a lot of ASan and MSan errors:
Jun 26 2020
Jun 25 2020
Jun 24 2020
Jun 23 2020
Hmm, with clang-cl it seems the driver is trying to use this:
which of course doesn't exist. Not sure what is supposed to be happening here, but it seems that it's falling back on s390x-linux since on s390x, Linux is currently the only supported OS.
Ping? I'd really appreciate feedback about this ABI issue, in particular from other affected target maintainers ...
Line 4 here fails on s390x but not on other Unix flavors, see: http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/33346/steps/ninja%20check%201/logs/FAIL%3A%20Clang%3A%3Adebug-info-codeview-buildinfo.c
Jun 16 2020
Jun 15 2020
This all looks good to me.
Shouldn't the check be done inside canUseBlockOperation? I believe the other user (storeLoadCanUseMVC) really needs the same check.
Jun 10 2020
Jun 5 2020
OK. I agree this can be abandoned now. Thanks!
This looks all good to me now, we should verify that performance results are still good, and then it can go in.
A few cosmetic issues mentioned inline, otherwise this now looks good to me. Thanks!
Jun 4 2020
May 18 2020
It seems that now MachineCSE can remove a few VPERMs also. At least one case was because instead of using an undef source operand, the mask was used (it doesn't help any to replace an undef Ops with Op2 on the last line of getGeneralPermuteNode(), which I first thought). The test case I have reduced (not included) shows that MCSE on trunk fails to remove the second vperm if even though the instructions are near identical. The only difference to the success with this patch is that instead of the reused mask, there is an IMPLICIT_DEF. It looks to me that this is a minor deficiency of MCSE (Two different vregs defined by IMPLICIT_DEF should not stop CSE, I would think).
isZeroOrUndefVector(): The check for the undef vector used to be beneficial for the zero_extend_vector_inreg patch, but the way it looks now (just using a single unpack), it is not needed for anymore (NFC). I think it could be removed or we can keep it here to get the few less vperms...
Two tests - one for each case handled. Not quite sure what happened with vector-constrained-fp-intrinsics.ll - I don't understand why v0 is no longer used, but it seems that this patch begins to change things when the second operand of the VPERM is undefined.
May 12 2020
Hmm, I still think the read/write_64bit_value change is correct, GCC does read/write 64-bit counter values as lo/hi pairs even on big-endian systems.
May 11 2020
May 8 2020
Hmm, talking about MemFold, I'm wondering about this:
// Fused multiply and add/sub need to have the same dst and accumulator reg.
Ah, that still runs into the same name collisions with the MemFold pattern as you pointed out earlier. Sorry for missing that!
May 6 2020
May 4 2020
This is missing from the SystemZ back end
Apr 30 2020
Apr 29 2020
> Is this better than previously?
Apr 28 2020
Not looking at the code so far, but just answering your questions:
Looks good to me now, just one cosmetic comment inline.
Apr 24 2020
Hi Jonas, I haven't looked into everything in detail, but first one fundmental question: My understanding was that the current GeneralShuffle code would detect the shuffle you generate for a zero-extend as a case of a MERGE. Later, combineMERGE would detect that one input of the MERGE is a zero vector, and replace the merge by an UNPACKL.
Apr 15 2020
Apr 9 2020
Just a minor cosmetic suggestion inline. Otherwise the Z ABI parts now all LGTM.
Apr 3 2020
The Z ABI implementation now looks correct to me, except for one corner case: an LLVM-level argument type of f128 is passed via implicit reference. Now, in most cases this is already handled at the clang level, i.e. when you have a "long double" at C source level, you'll already see a pointer type in the LLVM IR instead. However, there are still a few cases where there is a f128 at the LLVM IR level, e.g. as arguments to some compiler builtin / libgcc routines. Those are only transformed to implicit reference in the LLVM back-end, so I believe for completeness this case should also be handled here.
Mar 31 2020
I'm now getting an assertion failure in the LNT test suite:
Thanks for running the benchmark, I guess I'm OK with the current implementation then.
Thanks for working on this, @thakis !
Mar 30 2020
Ah, good point. Dimitry, can you prepare an updated patch to implement Jonas' suggestion?
Mar 27 2020
Looking simply at the SystemZ test case change, for the icmp/[zs]ext case (fun1/fun2), we actually need three instructions (compare, load zero, conditional move), so the change seems reasonable.
Mar 26 2020
The more I think about it, the more it seems that the original check has always been somewhat questionable.
Had a look at the vararg handling. Reviewed only the ABI-relevant parts.