Page MenuHomePhabricator

[PowerPC] Support constrained fp operation for scalar fptosi/fptoui
ClosedPublic

Authored by qiucf on Jun 10 2020, 12:27 AM.

Details

Summary

This patch adds support for constrained conversion operation (fptoui/fptosi) from f32/f64 to i32/i64.

Vector support will be done in following patches. For targets older than ISA 2.06, we need to make strict_fsetcc/strict_fsetccs work well first.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
steven.zhang added inline comments.Jun 10 2020, 1:14 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8148

!Strict is not needed.

8150

Please give the default initializer to avoid uninitialized local variable if llvm_unreachable is off.

llvm/test/CodeGen/PowerPC/fp-strict-conv.ll
172

Does this attribute need ?

qiucf updated this revision to Diff 270025.Jun 10 2020, 8:04 PM
qiucf marked 9 inline comments as done.
qiucf edited the summary of this revision. (Show Details)

Address Steven's comments.

qiucf added inline comments.Jun 10 2020, 8:07 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8148

If Strict is false, we can allow Chain to be null.

llvm/test/CodeGen/PowerPC/fp-strict-conv.ll
172

All function calls done in a function that uses constrained floating point intrinsics must have the strictfp attribute.

Although output won't change if we remove this attr. It's better to keep it according to langref.

qiucf updated this revision to Diff 270027.Jun 10 2020, 8:38 PM
qiucf marked an inline comment as done.

Remove redundant chain logic.

steven.zhang added inline comments.Jun 10 2020, 10:18 PM
llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
16

So, what is it if it is ppcfp128 ?

Some code style comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8253–8255

Don't use Tmp but with some meaningful name.

8254

Op.getValueType() ?

steven.zhang added inline comments.Jun 12 2020, 2:23 AM
llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
4

Please specify option -enable-ppc-quad-precision to enable the quad precision support in powerpc.

qiucf updated this revision to Diff 270660.Jun 14 2020, 10:30 PM
qiucf marked 2 inline comments as done.
qiucf edited the summary of this revision. (Show Details)

Rebase after D81818 and add f128 support

steven.zhang added inline comments.Jun 15 2020, 1:57 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8145

The Strict parameter is not needed as you can check the value of the Chain to know it.

8174

IsStrict, IsSigned

llvm/lib/Target/PowerPC/PPCInstrVSX.td
3278

Please move them into some group has the semantics of truncating. It is not bitconvert.

qiucf updated this revision to Diff 272360.Jun 22 2020, 2:48 AM
qiucf marked 5 inline comments as done.

Address some style-related comments

steven.zhang added inline comments.Jun 22 2020, 3:52 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

Don’t do this for spe target and remove the test for spe. Sorry about the back and forth.

8158

You don't need this assertion now.

8203

So, do we have problem if it is strict opcode in this code path?

8261

move the assertion into concertFPToInt

8428

ConvertIntToFP and ConvertFPToInt should have the same parameters.

8533

Please remove such kind of change as it is not part of your change.

nemanjai added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

I would defer to @jhibbits @chmeee (not sure which of Justin's ID's is active) regarding SPE bits.

8150

This function appears to just take a node and produce a strict version of that node. It seems like there isn't anything target dependent about such an operation so it is very suspicious to me why this is in the PPC back end.
If for some reason it has to be here, please explain why in a comment. If the only target specific part of this is the STRICT_MFVSR node, then at least the rest can be handled by target independent code, can't it?

8153

This seems dangerous to me. You are deciding whether to return a strict node based on whether a valid Chain is provided. I am personally against making decisions based on orthogonal concerns.
If the caller wants a strict node, that should be explicit rather than this strange implicit contract of "If you want a strict node, provide a valid chain."

llvm/lib/Target/PowerPC/PPCISelLowering.h
435

Why? The instruction simply moves bits around. It does not cause any exceptions, it is not subject to rounding, etc. If this is necessary, it needs to be clear from the comment why.

jhibbits added inline comments.Jun 24 2020, 6:48 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

What are the semantic differences between STRICT_FP_TO_UINT and FP_TO_UINT? EFDCTUIZ/EFSCTUIZ and their signed counterparts, which we currently use for the FP_TO_{U,S}INT, saturate if they can't be represented as a 32-bit integer, and round toward zero always (the non-Z variants round via the current rounding mode).

uweigand added inline comments.Jun 24 2020, 7:31 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

FP_TO_UINT assumes the current rounding mode is default, and exception conditions can be ignored. With STRICT_FP_TO_UINT those assumptions no longer apply, so it would appear that those instructions you mention should not be used there.

uweigand added inline comments.Jun 25 2020, 8:43 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

Ah -- my last comment was incorrect, sorry for any confusion.

STRICT_FP_TO_UINT/SINT are in fact an exception to most "STRICT" operations in that they do not use the current rounding mode, but always round towards zero. (Following the C standard as well as the LLVM IR specification.)

So for these operations the only difference between strict and non-strict variants is whether exception conditions can be ignored or not.

jhibbits added inline comments.Jun 25 2020, 12:10 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

Ah, okay. So, if I understand correctly, EF{S,D}CT{S,U}I should be used for fp_to_{s,u}int, and the current 'Z' variants should be used for the strict_fp_to_*int.

uweigand added inline comments.Jun 26 2020, 5:50 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

Hmm, if I'm reading the SPE_PEM correctly, I think the "Z" variants are in fact correct for both strict and non-strict variants: they round towards zero (which both variants do), and they handle exceptions (which the strict variant requires, while the non-strict variant doesn't care).

The non-"Z" variants seem wrong either way since they use the current rounding mode, which is incorrect for both strict and non-strict variants.

jhibbits added inline comments.Jun 26 2020, 2:35 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

Thanks for that explanation @uweigand now I understand. So the change here looks fine to me.

steven.zhang added inline comments.Jun 28 2020, 6:53 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

The reason why I asked to remove the spe here is to split this patch into two, one for PowerPC and another one for spe which need some inputs from spe experts. Does it make sense ?

qiucf marked 9 inline comments as done.Jun 29 2020, 2:33 AM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8150

Thanks for the comments. The method (including some overloads) doesn't help much. So I removed it and wrote a simple helper method to get strict version of ppc-specific opcode.

There's method mutateStrictFPToFP doing similar things but maybe not suitable here. I'll send an NFC to make opcode conversion a hook so that each target can benefit from it.

8203

Do you mean these PPC-specific opcodes are not strict? But the result is either load/store or direct moved. What we do here is to keep operands of value consistent. So changing these opcodes to strict may be unnecessary.

8428

FPToInt is round-then-move, while IntToFP is move-then-round. So when IntToFP we need extra information from original Op besides the moved Src.

llvm/lib/Target/PowerPC/PPCISelLowering.h
435

Because (1) this prevents it being combined somewhere unexpectedly; (2) all strict nodes have extra operand for their chains, so replacing original strict_* node with non-strict one will cause operands mismatch. I added necessary comments. Thanks.

qiucf updated this revision to Diff 274020.Jun 29 2020, 2:36 AM
qiucf marked an inline comment as done.
  • Removed SPE logic from this revision.
  • Add some comments for strict nodes.
  • Removed getFPNode method.
  • Addressed other minor comments.
qiucf marked 2 inline comments as done.Jun 29 2020, 2:41 AM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
582

I split D82747 out from this. It's much clearer and independent from this.

qiucf updated this revision to Diff 278704.Jul 17 2020, 3:13 AM
qiucf marked an inline comment as done.

Reflect changes after strict conversion for SPE and enable-ppc-quad-precision's removal.

steven.zhang accepted this revision.Aug 4 2020, 2:34 AM

LGTM now. Please hold on for several days to see if @nemanjai or @uweigand have comments.

This revision is now accepted and ready to land.Aug 4 2020, 2:34 AM

This doesn't look correct. As far as I can see, none of the conversion functions were actually changed to handle strict operations. For one, you'll need strict variants of all the PowerPC-specific conversion operations, use them in all the conversion subroutines, and consistently track their chain nodes.

The patch only adds a strict variant of the direct move, which seems to me the only operation where actually a strict version is not required ...

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8230

This doesn't look right. The Chain produced by this strict node just vanishes, this cannot be correct.

8247

And you need strict versions of all these conversion operations, I'd assume.

8258

Nothing in the function actually handles strict nodes, that cannot be right.

8307

Why do we need a strict version of a plain move?

8312

Again, nothing in this function actually handles strict nodes ...

qiucf added a comment.Aug 10 2020, 9:49 AM

This doesn't look correct. As far as I can see, none of the conversion functions were actually changed to handle strict operations. For one, you'll need strict variants of all the PowerPC-specific conversion operations, use them in all the conversion subroutines, and consistently track their chain nodes.

The patch only adds a strict variant of the direct move, which seems to me the only operation where actually a strict version is not required ...

Thanks for pointing them out! I have something unclear about chains:

(1) If a constrained operation is expanded into several FP nodes a-b-c, they should all have chain set to former operation (b's chain is a, c's chain is b) even if they have def relationship?

(2) In MachineInstr emitting after ISel, chains are identified just by operand type (countOperand), so some chains are not ignored and assert hit. Is this expected?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8307

Yes, a strict move doesn't look reasonable. But the original strict_fptosi node will be replaced by the result. So if directly return the move, operands will not match (no chain in mfvsr). Is there a better way here?

This doesn't look correct. As far as I can see, none of the conversion functions were actually changed to handle strict operations. For one, you'll need strict variants of all the PowerPC-specific conversion operations, use them in all the conversion subroutines, and consistently track their chain nodes.

The patch only adds a strict variant of the direct move, which seems to me the only operation where actually a strict version is not required ...

Thanks for pointing them out! I have something unclear about chains:

(1) If a constrained operation is expanded into several FP nodes a-b-c, they should all have chain set to former operation (b's chain is a, c's chain is b) even if they have def relationship?

That may depend on the specific semantics on which of those nodes may or may not trap. In some cases, the original sequence may in fact not be valid at all for strict mode. But if it is, then they'll need to be chained up properly. If they have data dependencies, then it usually makes sense for the chain to follow that dependency. In other cases, the may be an option for more flexibility by allowing certain operations to be re-scheduled. In those cases you'd give the same input chain to all operations and collect all output chains via a TokenFactor.

(2) In MachineInstr emitting after ISel, chains are identified just by operand type (countOperand), so some chains are not ignored and assert hit. Is this expected?

I'm not sure I understand what specific case you're refering to. But in any case, a chain should *never* be simply ignored, that would always be a bug.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8307

So this gets expanded to a PPCISD::FCTI... variant (inside convertFPToInt) followed by the MFVSR. Now, with proper chain handling, the input chain of the strict_fptosi is consumed by a strict variant of FCTI..., and the output chain of that STRICT_FCTI... is then the correct output chain for the whole operation. The data output (only) of the STRICT_FCTI... acts then as the input of the MFVSR, and the output of the MFVSR is the correct value output of the whole operation.

So if short, you need to replace

(out-val, out-chain) = strict_fptosi (in-val, in-chain)

by

(tmp-val, out-chain) = STRICT_FCTI... (in-val, in-chain)
out-val = MFVSR (tmp-val)

This probably will require some ReplaceAllUses... instead of just returning a result, as is already done elsewhere with chain output instructions.

qiucf updated this revision to Diff 285277.Aug 13 2020, 1:09 AM

Thanks for the detailed explanation!

  • Remove strict mfvsr
  • Update tests
  • Add strict fc*
  • Add chains to some expanded operation
uweigand added inline comments.Aug 13 2020, 5:00 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8068

This looks still wrong to me. I think you should replace the *chain* with the one from Conv, and return the value (i.e. Mov). Something like:

SDValue Mov = DAG.getNode(PPCISD::MFVSR, dl, Op.getValueType(), Conv);
DAG.ReplaceAllUsesOfValueWith(SDValue(Op, 1), Conv.getValue(1));
return Mov;

(Actually, returning the value is then the same in strict and non-strict cases, so this can be merged.)

Or, another possible approach would be to not use ReplaceAllUses at all but reconstruct the multi-output value via DAG.getMergeValues. Something like;

SDValue Mov = DAG.getNode(PPCISD::MFVSR, dl, Op.getValueType(), Conv);
return DAG.getMergeValues({Mov, Conv.getValue(1)}, dl);
llvm/lib/Target/PowerPC/PPCInstrVSX.td
3772

This also doesn't look quite correct. The XSCVQP... instructions are not (yet?) marked as mayRaiseFPException, instead they're marked as hasSideEffects. This means that the exception flag is probably not going to be automatically transferred over to the MI level.

I think if the instructions are changed to set mayRaiseFPException, that should work correctly. But it would be best to have a test case that validates that the "nofpexcept" marker is transferred depending on the value of the "fpexect." metadata in the strict intrinsic (in LLVM IR).

qiucf updated this revision to Diff 286546.Aug 19 2020, 6:19 AM
qiucf marked 2 inline comments as done.
  • Return a merge_value for round and move.
  • Set fp exception bit for f128 round instructions.
llvm/lib/Target/PowerPC/PPCInstrVSX.td
3772

Thanks for the reminder. The FP exception bits in PPC instruction definition files need to be carefully re-examined with more tests..

uweigand accepted this revision.Aug 19 2020, 8:10 AM

This LGTM now. Thanks!

This revision was automatically updated to reflect the committed changes.