
Mon, May 18

uweigand accepted D79925: [SystemZ] Eliminate the need to create a zero vector by reusing the mask..

LGTM, thanks!

Mon, May 18, 12:26 PM · Restricted Project
uweigand added a comment to D79925: [SystemZ] Eliminate the need to create a zero vector by reusing the mask..

It seems that MachineCSE can now remove a few VPERMs as well. At least one case was because the mask was used instead of an undef source operand (it doesn't help to replace an undef Ops[1] with Op2 on the last line of getGeneralPermuteNode(), as I first thought). The reduced test case I have (not included) shows that MCSE on trunk fails to remove the second vperm even though the instructions are nearly identical. The only difference from the successful case with this patch is that instead of the reused mask, there is an IMPLICIT_DEF. It looks to me like this is a minor deficiency of MCSE (two different vregs defined by IMPLICIT_DEF should not stop CSE, I would think).

isZeroOrUndefVector(): The check for the undef vector used to be beneficial for the zero_extend_vector_inreg patch, but the way it looks now (just using a single unpack), it is no longer needed (NFC). I think it could be removed, or we can keep it here to get a few fewer vperms...

Two tests - one for each case handled. Not quite sure what happened with vector-constrained-fp-intrinsics.ll - I don't understand why v0 is no longer used, but it seems that this patch begins to change things when the second operand of the VPERM is undefined.

Mon, May 18, 8:34 AM · Restricted Project
uweigand added a comment to D78717: [SystemZ] Implement -fstack-clash-protection.

Normally, I would have expected a pseudo MachineInstruction to be built here with an immediate operand for the size requirement...

That sounds very nice. If you go that way I'll happily update the X86 code accordingly. Or if you point me to some example / doc / code on pseudo MachineInstruction, I can implement that too.

@uweigand : Would you rather use a target pseudo instruction for this rather than using a call with metadata?

Mon, May 18, 5:52 AM · Restricted Project

Tue, May 12

uweigand added a comment to D49132: Fix gcov profiling on big-endian machines.

Hmm, I still think the read/write_64bit_value change is correct, GCC does read/write 64-bit counter values as lo/hi pairs even on big-endian systems.
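
As a rough illustration of the lo/hi layout described above, here is a minimal sketch in C. The helper names are hypothetical and are not taken from the gcov runtime; the only point is that the low 32-bit word is emitted first regardless of host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a 64-bit counter into two 32-bit words,
   low word first, independent of the host's byte order. */
static void write_64bit_as_lo_hi(uint64_t value, uint32_t out[2]) {
  out[0] = (uint32_t)(value & 0xFFFFFFFFu); /* low word comes first */
  out[1] = (uint32_t)(value >> 32);         /* high word comes second */
}

/* Hypothetical helper: rebuild the 64-bit counter from a lo/hi pair. */
static uint64_t read_64bit_from_lo_hi(const uint32_t in[2]) {
  return ((uint64_t)in[1] << 32) | in[0];
}

/* Round-trip check: writing and then reading must return the input. */
static void check_round_trip(uint64_t value) {
  uint32_t words[2];
  write_64bit_as_lo_hi(value, words);
  assert(read_64bit_from_lo_hi(words) == value);
}
```

Because the split uses shifts and masks rather than a memory copy of the 64-bit value, the word order is the same on little-endian and big-endian hosts.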

Tue, May 12, 1:02 AM

Mon, May 11

uweigand accepted D76705: [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions.

Can we instead move the W->FP mnemonic conversion back into BinaryVRRa/TernaryVRRe, so that the OpKey for all these vector instructions consistently holds the FP-reg instruction name? The MemFold-pseudos would then have the same mnemonic as OpKey and MemKey.

... Ah, that still runs into the same name collisions with the MemFold pattern as you pointed out earlier. Sorry for missing that!

Yeah, WFADB:ADBR would map to ADB, and not ADB_MemFoldPseudo which is what we need...

So if we use an extra modifier in the OpKey for those cases, then we're basically back to your version of Apr 28, 7:15 PM, except possibly with the explicit mnemonics in the .td file.

Ok, went back to that version plus added the explicit fp_mnemonic fields, which seems nice to me as they eliminate those ugly string substitutions.

Mon, May 11, 8:34 AM · Restricted Project

Fri, May 8

uweigand added a comment to D78486: [SystemZ] Expand vector zero-extend into a shuffle..

So now I'm wondering instead: why does your new code ever trigger in the first place; why isn't this already handled in GeneralShuffle::add?

I think those would be two different cases, where my handling addresses a *new* permute resulting from combining two operands, when trying to combine (unpack) a third source operand with that first (new) permute.

Fri, May 8, 12:52 PM · Restricted Project
uweigand added a comment to D76705: [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions.

Hmm, talking about MemFold, I'm wondering about this:
// Fused multiply and add/sub need to have the same dst and accumulator reg.

Fri, May 8, 12:52 PM · Restricted Project
uweigand added a comment to D76705: [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions.

Ah, that still runs into the same name collisions with the MemFold pattern as you pointed out earlier. Sorry for missing that!

Fri, May 8, 12:20 PM · Restricted Project
uweigand added a comment to D76705: [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions.

Reverted back to the first version of new instruction classes providing the mapping from W...-reg -> FP-mem instructions.

Fri, May 8, 9:37 AM · Restricted Project
uweigand added a comment to D78717: [SystemZ] Implement -fstack-clash-protection.

Added probing of the tail of dyn-alloca, with an extra check for a zero tail, roughly like GCC. It seems it can be assumed that the alloca size will always be a multiple of 8 bytes, right? I think that is necessary, or there might be a final probe partially below SP (if the tail is e.g. 4 bytes). Also, if the stack-probe size is 0, an infinite loop would result (assert?), but I suppose that would always be noticeable.
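
The probing scheme described above can be modeled with a small sketch in C. This is not the SystemZ back-end code: count_probes is a hypothetical function that only counts how many probes a dynamic allocation would need, under the assumptions stated in the comment (allocation size is a multiple of 8, probe size is non-zero).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: count the probes for a dynamic allocation of
   alloc_size bytes when each full probe_size block gets one probe and a
   non-empty tail gets one more. */
static size_t count_probes(size_t alloc_size, size_t probe_size) {
  assert(probe_size > 0);      /* a zero probe size would never terminate */
  assert(alloc_size % 8 == 0); /* assumed: sizes are multiples of 8 bytes */

  size_t probes = 0;
  size_t remaining = alloc_size;
  while (remaining >= probe_size) { /* one probe per full block */
    remaining -= probe_size;
    ++probes;
  }
  if (remaining != 0) /* extra check: skip the probe for a zero tail */
    ++probes;
  return probes;
}
```

With a 4096-byte probe size, an 8200-byte allocation yields two full-block probes plus one tail probe, while an 8192-byte allocation yields two probes and no tail probe.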

Fri, May 8, 9:37 AM · Restricted Project

Wed, May 6

uweigand committed rG947f78ac27f4: [SystemZ] Fix/optimize vec_load_len and related intrinsics (authored by uweigand).
[SystemZ] Fix/optimize vec_load_len and related intrinsics
Wed, May 6, 12:27 PM

Mon, May 4

uweigand added a comment to D78486: [SystemZ] Expand vector zero-extend into a shuffle..

I'm wondering: unless I'm missing something, there's still one specific case where you generate a vperm followed by an unpack (the case where you already had a permute as source). Wouldn't it be preferable to just use a single vperm there as well?

This is the case where the input is a permute that has no other users, which means that it's being replaced. So instead of vperm + vperm, we get a vperm + unpack, which follows the same reasoning of replacing a vperm with a single unpack.

For example, in the case with three source ops, where A and B first are combined with a permute, and now AB and C are to be combined: instead of permuting AB and C, the AB permute is changed so that AB and C can be unpacked.

Mon, May 4, 12:21 PM · Restricted Project
uweigand added a comment to D78717: [SystemZ] Implement -fstack-clash-protection.

gcc seems not to be probing the residual allocation after the loop. However if only two (unrolled) allocations were made, the residual is also probed.

I'm not seeing this, do you have an example?

void large_stack() {
  volatile int stack[2000], i;
  for (i = 0; i < sizeof(stack) / sizeof(int); ++i)
    stack[i] = i;
}

With stack[2000], I see

aghi    %r15,-4096
cg      %r0,4088(%r15)
aghi    %r15,-4072
cg      %r0,4064(%r15)

With stack[8000], I don't see a probe after the loop...

Mon, May 4, 10:09 AM · Restricted Project
uweigand added a comment to D79283: [PowerPC] Add missing handling for half precision.

This is missing from the SystemZ back end

Mon, May 4, 6:22 AM · Restricted Project

Thu, Apr 30

uweigand added a comment to D78486: [SystemZ] Expand vector zero-extend into a shuffle..

It certainly seems to be an improvement on two benchmarks to do just either one unpack or else a vperm (and not multiple unpacks). In fact, i505.mcf actually regressed (2.5%) when doing two unpacks instead of a vperm (with -ffp-contract=fast). So the initial idea of reducing the number of vperms seems to have been proven wrong - it is better to have a single vperm on the critical path rather than multiple unpacks.

Thu, Apr 30, 11:47 AM · Restricted Project

Apr 29 2020

uweigand committed rGe1de2773a534: [SystemZ] Allow specifying plain register numbers in AsmParser (authored by uweigand).
[SystemZ] Allow specifying plain register numbers in AsmParser
Apr 29 2020, 11:50 AM
uweigand committed rG6bfde063f0a7: [SystemZ] Simplify register parsing in AsmParser (authored by uweigand).
[SystemZ] Simplify register parsing in AsmParser
Apr 29 2020, 11:50 AM
uweigand added a comment to D76705: [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions.

> Is this better than previously?

Apr 29 2020, 3:43 AM · Restricted Project

Apr 28 2020

uweigand committed rGc90e09b13c90: [SystemZ] Use reserved keywords in vecintrin.h (authored by uweigand).
[SystemZ] Use reserved keywords in vecintrin.h
Apr 28 2020, 10:12 AM
uweigand committed rG095ccf445565: [SystemZ] Avoid __INTPTR_TYPE__ conversions in vecintrin.h (authored by uweigand).
[SystemZ] Avoid __INTPTR_TYPE__ conversions in vecintrin.h
Apr 28 2020, 10:12 AM
Herald added a project to D78717: [SystemZ] Implement -fstack-clash-protection: Restricted Project.

Not looking at the code so far, but just answering your questions:

Apr 28 2020, 9:06 AM · Restricted Project
uweigand added a comment to D76705: [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions.

Looks good to me now, just one cosmetic comment inline.

Apr 28 2020, 8:34 AM · Restricted Project
uweigand added a comment to D76705: [SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions.

Maybe there should be a test that checks that a live CC value isn't clobbered by introducing an FP-mem instruction, such as ADB...

Apr 28 2020, 8:33 AM · Restricted Project

Apr 24 2020

uweigand added a comment to D78486: [SystemZ] Expand vector zero-extend into a shuffle..

Hi Jonas, I haven't looked into everything in detail, but first one fundamental question: My understanding was that the current GeneralShuffle code would detect the shuffle you generate for a zero-extend as a case of a MERGE. Later, combineMERGE would detect that one input of the MERGE is a zero vector, and replace the merge by an UNPACKL.
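
To see why a merge with a zero vector is equivalent to a zero-extending unpack, here is a minimal byte-level sketch in C. It assumes big-endian element layout and 16-byte vectors; the function names are hypothetical models, not the SystemZ back-end routines or instructions themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Model of a byte-wise "merge high": interleave the first eight bytes
   of a and b as a0,b0,a1,b1,... */
static void merge_high_bytes(const uint8_t a[VEC_BYTES],
                             const uint8_t b[VEC_BYTES],
                             uint8_t out[VEC_BYTES]) {
  for (int i = 0; i < VEC_BYTES / 2; ++i) {
    out[2 * i] = a[i];
    out[2 * i + 1] = b[i];
  }
}

/* Model of a logical (zero-extending) unpack: widen the first eight
   bytes to 16-bit elements stored in big-endian byte order. */
static void unpack_logical_high(const uint8_t src[VEC_BYTES],
                                uint8_t out[VEC_BYTES]) {
  for (int i = 0; i < VEC_BYTES / 2; ++i) {
    uint16_t wide = src[i];               /* zero-extend to 16 bits */
    out[2 * i] = (uint8_t)(wide >> 8);    /* high byte, always zero */
    out[2 * i + 1] = (uint8_t)(wide & 0xFF);
  }
}

/* Returns 1 if merging a zero vector with src matches the unpack. */
static int merge_with_zero_equals_unpack(const uint8_t src[VEC_BYTES]) {
  uint8_t zero[VEC_BYTES] = {0};
  uint8_t merged[VEC_BYTES];
  uint8_t unpacked[VEC_BYTES];
  assert(src != NULL);
  merge_high_bytes(zero, src, merged);
  unpack_logical_high(src, unpacked);
  return memcmp(merged, unpacked, VEC_BYTES) == 0;
}
```

The zero vector supplies the high byte of every widened element, which is exactly what the zero-extension produces, so the two results match byte for byte.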

Apr 24 2020, 8:04 AM · Restricted Project

Apr 15 2020

uweigand accepted D78187: [SystemZ] Bugfix in adjustSubwordCmp().

LGTM, thanks!

Apr 15 2020, 3:47 AM · Restricted Project

Apr 9 2020

uweigand added a comment to D76624: [MSan] Add instrumentation for SystemZ.

Thanks!

Apr 9 2020, 4:50 AM · Restricted Project
uweigand added a comment to D76624: [MSan] Add instrumentation for SystemZ.

Just a minor cosmetic suggestion inline. Otherwise the Z ABI parts now all LGTM.

Apr 9 2020, 3:45 AM · Restricted Project

Apr 3 2020

uweigand added a comment to D76624: [MSan] Add instrumentation for SystemZ.

The Z ABI implementation now looks correct to me, except for one corner case: an LLVM-level argument type of f128 is passed via implicit reference. Now, in most cases this is already handled at the clang level, i.e. when you have a "long double" at C source level, you'll already see a pointer type in the LLVM IR instead. However, there are still a few cases where there is an f128 at the LLVM IR level, e.g. as arguments to some compiler builtin / libgcc routines. Those are only transformed to implicit reference in the LLVM back-end, so I believe for completeness this case should also be handled here.

Apr 3 2020, 11:20 AM · Restricted Project

Mar 31 2020

uweigand committed rGc726c920e040: [SystemZ] Allow %r0 in address context for AsmParser (authored by uweigand).
[SystemZ] Allow %r0 in address context for AsmParser
Mar 31 2020, 10:53 AM
uweigand added a comment to D71938: [SCCP] Use constant ranges for casts..

I'm now getting an assertion failure in the LNT test suite:
http://lab.llvm.org:8011/builders/clang-s390x-linux-lnt/builds/17826/steps/test-suite/logs/stdio

Mar 31 2020, 8:16 AM · Restricted Project
uweigand accepted D76771: [SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C.

LGTM, thanks!

Mar 31 2020, 7:09 AM · Restricted Project
uweigand added inline comments to D76771: [SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C.
Mar 31 2020, 6:04 AM · Restricted Project
uweigand added a comment to D76771: [SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C.

Thanks for running the benchmark, I guess I'm OK with the current implementation then.

Mar 31 2020, 12:31 AM · Restricted Project
uweigand added a comment to D75914: systemz: allow configuring default CLANG_SYSTEMZ_ARCH.

Thanks for working on this, @thakis !

Mar 31 2020, 12:30 AM · Restricted Project, Restricted Project

Mar 30 2020

uweigand added a comment to D75914: systemz: allow configuring default CLANG_SYSTEMZ_ARCH.

Ah, good point. Dimitry, can you prepare an updated patch to implement Jonas' suggestion?

Mar 30 2020, 6:27 AM · Restricted Project, Restricted Project
uweigand committed rG9c9d88d8b1bb: [SystemZ] Allow configuring default CLANG_SYSTEMZ_ARCH (authored by uweigand).
[SystemZ] Allow configuring default CLANG_SYSTEMZ_ARCH
Mar 30 2020, 5:23 AM
uweigand closed D75914: systemz: allow configuring default CLANG_SYSTEMZ_ARCH.
Mar 30 2020, 5:23 AM · Restricted Project, Restricted Project

Mar 27 2020

uweigand added a comment to D76124: [TTI] Remove getOperationCost.

Looking simply at the SystemZ test case change, for the icmp/[zs]ext case (fun1/fun2), we actually need three instructions (compare, load zero, conditional move), so the change seems reasonable.

Mar 27 2020, 7:37 AM · Restricted Project

Mar 26 2020

uweigand added a comment to D76771: [SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C.

The more I think about it, the more it seems that the original check has always been somewhat questionable.

Mar 26 2020, 11:55 AM · Restricted Project
uweigand added a comment to D76624: [MSan] Add instrumentation for SystemZ.

Had a look at the vararg handling. Reviewed only the ABI-relevant parts.

Mar 26 2020, 9:11 AM · Restricted Project
uweigand committed rGdc37287320cc: [asan] Fix read_binary_name_regtest.c test dying with SIGPIPE (authored by iii).
[asan] Fix read_binary_name_regtest.c test dying with SIGPIPE
Mar 26 2020, 5:56 AM
uweigand committed rG2ca7fe379647: [compiler-rt] Use uname syscall in GetKernelAreaSize() (authored by iii).
[compiler-rt] Use uname syscall in GetKernelAreaSize()
Mar 26 2020, 5:56 AM
uweigand closed D76776: [compiler-rt] Use uname syscall in GetKernelAreaSize().
Mar 26 2020, 5:55 AM · Restricted Project
uweigand closed D76576: [asan] Fix read_binary_name_regtest.c test dying with SIGPIPE.
Mar 26 2020, 5:55 AM · Restricted Project
uweigand added a comment to D75914: systemz: allow configuring default CLANG_SYSTEMZ_ARCH.

This doesn't apply cleanly to current mainline. Can you rebase and test again? I'll check it in then.

Mar 26 2020, 5:55 AM · Restricted Project, Restricted Project
uweigand added a comment to D76771: [SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C.

I'm not sure I understand those latest changes. You seem to no longer check at all whether the target opcode actually requires tied operands, you just always tie them?

Mar 26 2020, 4:50 AM · Restricted Project

Mar 25 2020

uweigand added inline comments to D76771: [SystemZ] Improve foldMemoryOperandImpl: MS(G)RKC -> MS(G)C.
Mar 25 2020, 7:32 AM · Restricted Project
uweigand accepted D76055: [SystemZ] Improve foldMemoryOperandImpl()..

LGTM, thanks!

Mar 25 2020, 7:32 AM · Restricted Project
uweigand accepted D75914: systemz: allow configuring default CLANG_SYSTEMZ_ARCH.

LGTM, thanks!

Mar 25 2020, 6:59 AM · Restricted Project, Restricted Project

Mar 24 2020

uweigand added inline comments to D76055: [SystemZ] Improve foldMemoryOperandImpl()..
Mar 24 2020, 9:39 AM · Restricted Project
uweigand added a comment to D76055: [SystemZ] Improve foldMemoryOperandImpl()..

See inline comments. Otherwise this looks good, but I'd rather commit this as two separate patches (the memory-immediate changes in one, and the MS(G)RKC changes in the other).

Mar 24 2020, 9:07 AM · Restricted Project

Mar 23 2020

uweigand added a comment to D76055: [SystemZ] Improve foldMemoryOperandImpl()..

Oops, sorry, missed your update. Will look into that shortly.

Mar 23 2020, 9:48 AM · Restricted Project
uweigand added a comment to D76055: [SystemZ] Improve foldMemoryOperandImpl()..
logical compares (CLFHSI, CLGHSI) -- those should be trivial to add

Patch updated to include these as well, with a check for a uint<16> immediate. This gives +1800 CLGHSI and +200 CLFHSI compared to before.

Mar 23 2020, 9:48 AM · Restricted Project
uweigand accepted D76370: [SystemZ] Perform instruction shortening for fused fp ops..

LGTM, thanks!

Mar 23 2020, 4:20 AM · Restricted Project

Mar 16 2020

uweigand accepted D75978: [SystemZ] Avoid scalarization of S/UINT_TO_FP.

LGTM, thanks!

Mar 16 2020, 3:09 AM · Restricted Project
uweigand added inline comments to D76201: [TargetLowering] Only demand a rotation's modulo amount bits.
Mar 16 2020, 2:10 AM · Restricted Project

Mar 12 2020

uweigand added a comment to D75978: [SystemZ] Avoid scalarization of S/UINT_TO_FP.

Not yet looking at the implementation details, but a couple of comments on the overall approach:

Mar 12 2020, 8:08 AM · Restricted Project
uweigand added a comment to D76055: [SystemZ] Improve foldMemoryOperandImpl()..

Ah, that's a good idea. I agree this makes sense.

Mar 12 2020, 5:53 AM · Restricted Project

Mar 10 2020

uweigand added a comment to D75914: systemz: allow configuring default CLANG_SYSTEMZ_ARCH.

Thanks for working on this! A few comments inline.

Mar 10 2020, 6:58 AM · Restricted Project, Restricted Project

Mar 2 2020

uweigand accepted D75014: [InstrEmitter, SystemZ] Copy Access registers with the correct register class..

It seems safest to build the target instructions rather than just constraining the virtual register class of the register of the COPY.

I'm not sure I understand this: can you explain what problem you see with constraining the register class?

I remember seeing that the register allocator would create a new virtual register and give it the register class based on calling MI->getRegClassConstraint() (or TII->getRegClass() directly). So in theory, it seems that if there is no MCInstrDesc anywhere that demands a particular register class, regalloc might feel free to take the optimal one (GRX32). I am not sure this is needed, but there is no mechanism that I know of that would constrain a *COPY* register regclass, although it may be that a COPY of a physreg into a virtreg is left alone.

Maybe someone could instead confirm that physreg copies never get their virtreg regclasses changed. Maybe that is obvious and I just wasn't aware. Or, if there is no guarantee for this, perhaps a target hook like I suggested (getPhysRegCopyRegClass()) would be a better solution after all, since it would then also be an error if regalloc broke that.

Mar 2 2020, 7:24 AM · Restricted Project
uweigand accepted D75290: [SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp().

I see. I still agree it is preferable to only generate the overflow ops for types where they will be legal.

Mar 2 2020, 5:52 AM · Restricted Project
uweigand accepted D75367: [SystemZ] Bugfix for backchain with packed-stack.

LGTM, thanks!

Mar 2 2020, 5:52 AM · Restricted Project

Feb 28 2020

uweigand accepted D75290: [SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp().

Otherwise, LGTM. Thanks!

Feb 28 2020, 4:59 AM · Restricted Project
uweigand added a comment to D75014: [InstrEmitter, SystemZ] Copy Access registers with the correct register class..

It seems safest to build the target instructions rather than just constraining the virtual register class of the register of the COPY.

Feb 28 2020, 4:59 AM · Restricted Project

Feb 25 2020

uweigand added a comment to D75014: [InstrEmitter, SystemZ] Copy Access registers with the correct register class..

Oh, and one more thing: either way, can you please add the original test case from D74601 so we're sure this problem is (and remains) fixed. Thanks!

I added the two test functions I could find already as @_ZTW1x -> tls-08.ll:@fun0, and @_Z6squareiiiiiii -> tls-09.ll.

Feb 25 2020, 8:53 AM · Restricted Project
uweigand added a comment to D74163: [demangler] PPC and S390: Fix parsing of e-prefixed long double literals.

I'm still wondering about Intel. Can there ever be a literal encoded using 'g' on Intel? If yes, then treating it as "long double" would still be wrong, because 'g' encodes IEEE128 (__float128), while "long double" is the Intel extended (80-bit) format, right?

On the other hand, if 'g' encoded literals can never happen on Intel (or other platforms), maybe it would be better to have the code handling 'g' within a #ifdef section only active on powerpc and s390?

For X86, 'e' is used for 80-bit long double and 'g' is used for 128-bit long double. The following is the code in Clang.

clang/lib/Basic/Targets/X86.h
....
const char *getLongDoubleMangling() const override {
  return LongDoubleFormat == &llvm::APFloat::IEEEquad() ? "g" : "e";
}
...
Feb 25 2020, 7:52 AM · Restricted Project, Restricted Project, Restricted Project
uweigand added a comment to D75014: [InstrEmitter, SystemZ] Copy Access registers with the correct register class..

I'm wondering if this handles all cases ... for CopyFromReg you apparently rely on the logic in EmitCopyFromReg that checks whether the value is used by some MachineNode with constrained regclass. But that logic isn't unconditionally used, e.g. it is skipped for "cloned" SUs ... not sure whether this could cause issues in more complicated scenarios.

Feb 25 2020, 7:38 AM · Restricted Project
uweigand added a comment to D75014: [InstrEmitter, SystemZ] Copy Access registers with the correct register class..

Oh, and one more thing: either way, can you please add the original test case from D74601 so we're sure this problem is (and remains) fixed. Thanks!

Feb 25 2020, 7:38 AM · Restricted Project
uweigand added a comment to D74163: [demangler] PPC and S390: Fix parsing of e-prefixed long double literals.

Hi @uweigand, Thanks for your comments. Please see my explanations below.

@uweigand, Hi, I've addressed your comments. Any further comments?

I'm not very familiar with this code base. However, I am somewhat confused by your proposed change to "parseExprPrimary". In particular, where you now parse 'e' literals as "double" on powerpc/s390x, and 'g' literals as "long double" everywhere. This seems incorrect to me.

In mangled names, floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example, float literal -1.0f is encoded as "fbf800000". For a 64-bit long double literal on powerpc and s390x, the encoded form is type code 'e' followed by 16 hexadecimal digits. For a 128-bit long double literal on powerpc and s390x, the encoded form is type code 'g' followed by 32 hexadecimal digits. So, the proposed change allows the parser to treat type code 'e' as a double (64-bit) and take the following 16 hexadecimal digits as the internal representation of the literal, instead of treating it as a 128-bit long double and looking for 32 hexadecimal digits after it. When the type code is 'g', the parser will be looking for 32 hexadecimal digits. These are changes for parsing literals in the mangled names.
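
The float example above can be reproduced with a short sketch in C. It assumes that float is an IEEE-754 binary32 value; encode_float_literal is a hypothetical helper written for illustration and is not part of the demangler.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: emit type code 'f' followed by eight lowercase
   hexadecimal digits of the float's bit pattern, high-order digits first.
   The buffer holds 1 type code + 8 digits + the terminating NUL. */
static void encode_float_literal(float value, char out[10]) {
  uint32_t bits;
  assert(sizeof bits == sizeof value); /* assumed: 32-bit IEEE-754 float */
  memcpy(&bits, &value, sizeof bits);  /* reinterpret without aliasing UB */
  snprintf(out, 10, "f%08" PRIx32, bits);
}
```

Formatting the integer bit pattern, rather than dumping bytes from memory, gives high-order digits first on both little-endian and big-endian hosts. The 'e' and 'g' cases described above follow the same scheme with 16 and 32 digits respectively.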

Feb 25 2020, 3:31 AM · Restricted Project, Restricted Project, Restricted Project

Feb 23 2020

uweigand accepted D74506: [SystemZ] Support the kernel backchain.

LGTM, thanks!

Feb 23 2020, 9:05 AM · Restricted Project, Restricted Project

Feb 22 2020

uweigand added a comment to D74163: [demangler] PPC and S390: Fix parsing of e-prefixed long double literals.

@uweigand, Hi, I've addressed your comments. Any further comments?

Feb 22 2020, 6:06 AM · Restricted Project, Restricted Project, Restricted Project
uweigand added inline comments to D74506: [SystemZ] Support the kernel backchain.
Feb 22 2020, 5:53 AM · Restricted Project, Restricted Project

Feb 21 2020

uweigand added a comment to D74506: [SystemZ] Support the kernel backchain.

OK, just a few more small comments, otherwise looks really good now. Thanks!

Feb 21 2020, 5:29 AM · Restricted Project, Restricted Project

Feb 20 2020

uweigand added a comment to D74506: [SystemZ] Support the kernel backchain.

Yes, this now looks correct to me. However, we now have duplicated computation between LowerFormalArguments and assignCalleeSavedSpillSlots again, which I don't really like.

Feb 20 2020, 6:22 AM · Restricted Project, Restricted Project

Feb 17 2020

uweigand added a comment to D74506: [SystemZ] Support the kernel backchain.

It seems that compiling a vararg function with gcc -mpacked-stack and -msoft-float places the stored GPRs in the default slots.

Feb 17 2020, 7:37 AM · Restricted Project, Restricted Project

Feb 13 2020

uweigand added a comment to D72685: [PowerPC] Exploit VSX rounding instrs for rint.

I believe the conversion of SNaN to QNaN is expected here. Note that the (current) C standard does not support signaling NaNs at all, and does not really ever mention them. This is planned to be fixed with the upcoming C2x version, which explicitly states that "rint" is supposed to implement the IEEE-754 "roundToIntegerExact" function. And that function, like most general functions defined by IEEE-754, is indeed defined to return a QNaN when the input is a SNaN.

Feb 13 2020, 2:03 AM · Restricted Project

Feb 11 2020

uweigand accepted D74352: [SystemZ] Bugfix in emitSelect().

I guess it would be nicer if we could still handle this case somehow.

Feb 11 2020, 7:32 AM · Restricted Project
uweigand added inline comments to D74352: [SystemZ] Bugfix in emitSelect().
Feb 11 2020, 4:30 AM · Restricted Project
uweigand committed rGaeba7ba9f3da: Add SystemZ release notes (authored by uweigand).
Add SystemZ release notes
Feb 11 2020, 3:54 AM

Feb 10 2020

uweigand added a comment to D72675: [Clang][Driver] Fix -ffast-math/-ffp-contract interaction.

I'm not sure whether this is deliberate (but it seems weird) or just a bug. I can ask the GCC developers ...

Please do. If there's a rationale, we should know.

Feb 10 2020, 8:34 AM · Restricted Project
uweigand accepted D74086: [SystemZ] Add a subtarget cache like some other targets already have..

It seems that the general precedence (when invoking llc) is that "target-features" comes first (in reverse order) and -mattr second. So for example, a function with the attribute "target-features"="-vector" compiled with llc -mattr=+vector gets a (SystemZTargetMachine::getSubtargetImpl) FS of -vector,+vector, meaning that +vector wins.
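
The "last entry wins" behavior in the example above can be sketched in C as follows. This is a simplified, hypothetical model of scanning a comma-separated feature string; it is not LLVM's subtarget feature parser, and feature_enabled is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: scan comma-separated "+name" / "-name" entries from
   left to right; a later entry for the same feature overrides an earlier
   one. Features that are never mentioned count as disabled here. */
static bool feature_enabled(const char *features, const char *name) {
  bool enabled = false;
  size_t name_len = strlen(name);
  const char *p = features;
  assert(name_len > 0);

  while (*p != '\0') {
    const char *end = strchr(p, ',');
    size_t len = end ? (size_t)(end - p) : strlen(p);
    /* Match exactly one sign character followed by the feature name. */
    if (len == name_len + 1 && (p[0] == '+' || p[0] == '-') &&
        strncmp(p + 1, name, name_len) == 0)
      enabled = (p[0] == '+');
    p += len;
    if (*p == ',')
      ++p; /* step over the separator */
  }
  return enabled;
}
```

For the string "-vector,+vector" the second entry overrides the first, so the feature ends up enabled, which matches the outcome described above.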

Feb 10 2020, 8:01 AM · Restricted Project

Feb 7 2020

uweigand added a comment to D74086: [SystemZ] Add a subtarget cache like some other targets already have..

We need the "+soft-float" feature in UsesVectorABI(), so we have the front end add it, so we don't need to check "-use-soft-float"="true", like the other targets, right?

Feb 7 2020, 8:36 AM · Restricted Project
uweigand accepted D73378: [SystemZ] Add implementation for the intrinsic llvm.read_register.

LGTM as well.

Feb 7 2020, 8:00 AM · Restricted Project
uweigand accepted D74146: [SytemZ] Disable vector ABI when using option -march=arch[8|9|10].
Feb 7 2020, 8:00 AM · Restricted Project, Restricted Project
uweigand added a comment to D74163: [demangler] PPC and S390: Fix parsing of e-prefixed long double literals.
For S390, type code 'g' is used for 128-bit long double literals
Feb 7 2020, 6:03 AM · Restricted Project, Restricted Project, Restricted Project
uweigand added a comment to D74146: [SytemZ] Disable vector ABI when using option -march=arch[8|9|10].

LGTM, thanks!

Feb 7 2020, 2:49 AM · Restricted Project, Restricted Project

Feb 4 2020

uweigand accepted D72189: [SystemZ] Support -msoft-float.

LGTM, thanks!

Feb 4 2020, 1:38 AM · Restricted Project, Restricted Project

Feb 3 2020

uweigand added a comment to D72189: [SystemZ] Support -msoft-float.

If soft-float, then we have *no* VectorABI!

Oh! I misunderstood your previous comment "with -msoft-float, GCC also falls back to the 16-byte vector alignment, so we must match that for ABI compatibility" to mean that (source code) vectors should be aligned to 16 bytes in memory.

Feb 3 2020, 6:25 AM · Restricted Project, Restricted Project

Jan 31 2020

uweigand added a comment to D72189: [SystemZ] Support -msoft-float.

Ah, I see. But note that you're now not supporting "use-soft-float" at all (which I think is fine at this step!), so you should update all tests to no longer use "use-soft-float".

Done.
All llc invocations use -mattr=soft-float instead of relying on the function attributes, as must be done.

Jan 31 2020, 5:32 AM · Restricted Project, Restricted Project

Jan 28 2020

uweigand added a comment to D72189: [SystemZ] Support -msoft-float.

174 (On Diff #238345)

ok, removed it from this patch.

I had to change soft-float-02.ll to use -mattr=soft-float instead of a function attribute after removing this.

Jan 28 2020, 11:10 AM · Restricted Project, Restricted Project

Jan 27 2020

uweigand added inline comments to D72189: [SystemZ] Support -msoft-float.
Jan 27 2020, 5:26 AM · Restricted Project, Restricted Project

Jan 24 2020

uweigand accepted D71816: [DAGCombiner] Add combine for (not (strict_fsetcc)) to create a strict_fsetcc with the opposite condition..

LGTM.

Jan 24 2020, 6:41 AM · Restricted Project
uweigand added a comment to D72906: [X86] Improve X86 cmpps/cmppd/cmpss/cmpsd intrinsics with strictfp.

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

What would we lower it to on a target that doesn’t support it natively?

Any supported compare (quiet or signaling as appropriate, just so we get the correct exceptions), and then ignore the result (and use true/false constant result instead)?

Sure. Is that something we want to force all targets to have to implement just to handle this case for X86? Unless we can come up with a generic DAG combine to pick a valid condition alternate so that the lowering code for each target doesn't have to deal with it.

Jan 24 2020, 4:52 AM · Restricted Project

Jan 21 2020

uweigand accepted D72722: [FPEnv] [SystemZ] Platform-specific builtin constrained FP enablement.

LGTM, thanks!

Jan 21 2020, 8:51 AM · Restricted Project

Jan 20 2020

uweigand added a comment to D72675: [Clang][Driver] Fix -ffast-math/-ffp-contract interaction.

I've had a quick look at GCC, and it seems there's a couple of different issues.

Jan 20 2020, 9:49 AM · Restricted Project

Jan 17 2020

uweigand added a comment to D72906: [X86] Improve X86 cmpps/cmppd/cmpss/cmpsd intrinsics with strictfp.

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

What would we lower it to on a target that doesn’t support it natively?

Jan 17 2020, 9:35 AM · Restricted Project
uweigand added a comment to D72906: [X86] Improve X86 cmpps/cmppd/cmpss/cmpsd intrinsics with strictfp.

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Jan 17 2020, 2:24 AM · Restricted Project

Jan 16 2020

uweigand added a comment to D72722: [FPEnv] [SystemZ] Platform-specific builtin constrained FP enablement.

What are the semantics of vfnmadb with respect to when it rounds vs the negation?

Jan 16 2020, 5:00 PM · Restricted Project
uweigand committed rGcebba7ce3952: [SystemZ] Avoid unnecessary conversions in vecintrin.h (authored by uweigand).
[SystemZ] Avoid unnecessary conversions in vecintrin.h
Jan 16 2020, 10:04 AM
uweigand added a comment to D72722: [FPEnv] [SystemZ] Platform-specific builtin constrained FP enablement.

A few comments (see inline) -- otherwise this looks good to me, thanks!

Jan 16 2020, 10:03 AM · Restricted Project