- User Since
- Apr 14 2013, 11:48 AM (279 w, 10 h)
Fri, Aug 17
I believe we still don't need an intrinsic for constrained fneg. As you say, floating-point negation is implemented as -0-x, using a regular fsub (*not* the constrained intrinsic). The fsub semantics says that it cannot trap, so this would still be fine -- as long as we're sure the -0-x is always implemented via a dedicated negate instruction, and never via a subtraction. But this seems to be true, the idiom -0-x is recognized early during codegen and always treated specially.
Wed, Aug 15
Ah, that's good to know! So if I understand this correctly, accessing even the 32-bit part (F0S) would be considered to clobber F0D. This is not really necessary, but probably doesn't hurt at this point. We could do the same thing as the HAX you mention to get this modeled exactly.
Well, the original rationale for using different subreg indices for float/vector registers is given here:
Fri, Aug 10
Why? The SystemZ back-end doesn't support 32-bit code generation anyway, so there is no native LLVM support on s390 ...
Thu, Aug 9
Ping. Would this be easier to review if I split it up into multiple patches?
I've checked it in now, thanks!
Tue, Aug 7
LGTM as well.
Fri, Aug 3
Patch LGTM. Thanks!
Thu, Aug 2
Wed, Aug 1
The new code for f2 in cond-move-03.ll is in fact better, since it now actually uses the conditional move instruction instead of a branch ...
Tue, Jul 31
I've just posted a patch to reimplement the broken SystemZ shift count logic: https://reviews.llvm.org/D50096
Fri, Jul 27
Updated patch to make use of the new multi-alternative pattern fragment features, and completed set of test cases.
Ah, OK. Thanks for checking!
Thu, Jul 26
Wed, Jul 25
Sorry, missed your last update.
A few more comments, LGTM with those changes.
See inline comments. In general I agree with this approach.
Mon, Jul 23
I believe in this case it would be preferable to include the full range of latency values. We should try to keep the source as concise and straightforward as possible; if this results in some units being defined that aren't used, that doesn't seem to be a drawback to me. Also, for the common code parts, even if no current instruction uses any particular value, there may always be additional instructions in future processors.
Jul 20 2018
Jul 16 2018
The SystemZ changes look good to me. Thanks for taking care of this!
Jul 15 2018
Hmm ... The SystemZ tests seem to be getting strictly worse. Before, we have in f3:
Jul 13 2018
Superseded by https://reviews.llvm.org/D48545
Jul 12 2018
Jul 11 2018
Ah, I missed that, probably because on SystemZ char is unsigned by default ... Sorry about that.
Jul 10 2018
This caused a build bot failure here:
I've now disabled the test on s390 to get the build bot going reliably again.
This is also failing on s390x (randomly). The problem is that we don't actually have a non-executable flag, so the processor *will* execute the contents of "array" as code. Depending on what's there (it is just uninitialized memory), this will crash sooner or later, but in a way that may or may not make the check succeed randomly.
Jul 6 2018
It looks like this causes build bot failures on s390x-linux. Three of the new tests fail:
Jul 3 2018
Jun 25 2018
This version looks good to me, and we're also seeing good performance results.
Jun 20 2018
Jun 19 2018
I've come up with a suggestion to avoid even the duplicated DAG patterns, by having them implicitly generated by TableGen. This uses a new "alternative" mechanism described in a separate RFC:
Jun 8 2018
Jun 6 2018
Otherwise, this LGTM.
Jun 1 2018
This version now correctly implements target support for STRICT floating-point nodes using a MMO to represent the floating-point exception status. (Note that targets can and should additionally use a register use dependency on *all* floating-point instructions to represent floating-point control state, e.g. rounding mode and exception trap flags.)
Working version of the patch using memory operands.
May 30 2018
May 29 2018
(Failed) attempt to handle strict and regular FP operations without duplicating patterns ...
Is there a reason for dropping the "FP" from "StrictFP" in the method name? I think it would be better to keep it (i.e. use "getStrictFPOperationAction"), in particular since there are already other public methods in this area that use "StrictFP", like isStrictFPOpcode and MutateStrictFPToFP.
May 24 2018
May 23 2018
May 22 2018
In addition to what Andrew said, there's another reason not to mutate this early: sooner or later, we'll need to give the target the chance to actually model strict operations differently (e.g. via more precise tracking of the FP status bit dependencies). At this point, the information which operations were strict must be preserved until the MI level anyway.
May 19 2018
Apr 30 2018
So, in effect, never do the IPRA callee-saved optimization for R11, because the caller *could* use it as frame pointer? That's certainly the conservative solution.
But we do not want to force use of a frame pointer, in general this just reduces performance for no reason on Z. Some functions require a frame pointer (those that do dynamic allocas), but most do not.
This looks like a generic problem to me, many platforms have registers that are reserved only in some functions but not others. If function A where a register is reserved calls function B where the register is not reserved, something must ensure that the register is preserved across the call to B. Usually, this is the case because such registers need to be callee-saved. But if function B is optimized via IPRA to not save the register, we have a problem ...
Well, usually this is handled correctly by the isPhysRegModified call in SystemZFrameLowering::determineCalleeSaves. (The only reason why we even need the extra hasCalls check is that the call instruction is currently not at all stages modeled correctly to show that R14 is clobbered.)
I do not believe we need to do anything special with R11. This is only special if it is used as frame pointer, in which case the code before already adds it as caller-saved: