Page MenuHomePhabricator

Add constrained fptrunc and fpext intrinsics
Needs ReviewPublic

Authored by kpn on Dec 19 2018, 12:20 PM.

Details

Summary

This ticket was split off from D43515. It contains just the experimental constrained fptrunc and fpext intrinsics plus related changes like documentation. These two intrinsics are simpler to implement so I don't see a reason they need to wait on the rest of D43515.

Diff Detail

Event Timeline

kpn created this revision.Dec 19 2018, 12:20 PM
kpn added a comment.Jan 2 2019, 9:03 AM

I just realized I missed a couple of changes. Let me work on those and I'll update later.

kpn updated this revision to Diff 182019.Jan 16 2019, 5:44 AM

It was pointed out before I split this ticket out that I wasn't properly passing the correct operand in ExpandNode() when handling the two new STRICT nodes. I've corrected that.

cwabbott added inline comments.
docs/LangRef.rst
14830

Hi,

I've been working on implementing a Vulkan extension which allows the user to specifiy different rounding modes for the AMDGPU backend. I'm not sure how this works in C/C++, but we're required to support floating-point truncation with non-standard rounding modes. Is there a reason the rounding mode isn't an argument here?

kpn marked an inline comment as done.Jan 25 2019, 9:24 AM
kpn added inline comments.
docs/LangRef.rst
14830

Going back and rereading D43515, I don't see an explicit reason given back then. And I can't find anything in the C99 or IEEE 754 standards, or in the LLVM documentation, that would mandate any particular rounding mode. So I'm open to adding a rounding mode argument to the constrained fptrunc.

Andrew? What do you think of making constrained fptrunc go back to taking a rounding mode argument?

Having said that, the constrained FP intrinsics are to avoid optimizations that change program behavior taking traps into account. Is this the behavior you need for Vulcan?

cwabbott added inline comments.Jan 28 2019, 7:30 AM
docs/LangRef.rst
14830

Vulkan doesn't support trapping floating-point exceptions, so we don't have to worry about that. However, my understanding is that we still need to communicate the rounding mode to LLVM, to prevent it from constant folding floating-point operations with the wrong rounding mode, so we still need to use the intrinsics.

There's also the complication that we may need to emit some "internal" floating-point operations which need to have some defined rounding mode different from what the user specified. The backend has total control over the control register that specifies the rounding mode, and in some cases (e.g. fp32 -> fp16 truncation) the rounding mode is actually specified statically by the instruction itself rather than the control register, so my thought was that we could make AMDGPU just always use the rounding specified by the argument to the constrained intrinsic, emitting changes to the control register if necessary. I'm not sure if the CodeGen infrastructure is set up to do that.

cwabbott added inline comments.Jan 28 2019, 9:16 AM
docs/LangRef.rst
14830

Oh, and I forgot another thing: the extension also adds support for letting the user either flush or preserve denormalized values. However, this is per-source-module, and sometimes we need to stitch together multiple source modules which have different rounding needs, emitting an instruction in between them to change the denorm flushing and/or rounding mode. So it seems we really do need to use the constrained intrinsics, to prevent code motion of floating-point operations around that register setting.

kpn added a comment.Jan 29 2019, 7:18 AM

OK, I'm working on it now.

kpn updated this revision to Diff 184127.Jan 29 2019, 11:02 AM

Add back a rounding mode argument to constrained fptrunc as requested by Connor Abbott.

docs/LangRef.rst
14830

We should definitely have a rounding mode argument for fptrunc. I think the reason we missed that the first time around is probably due to the unfortunate naming of this operation (i.e. it isn't actually truncating) and the confusion with ISD::ROUND.

This is looking pretty good. I don't think I know the Selection DAG well enough to offer a proper review of that. I'll see if I can get Craig's attention on it.

docs/index.rst
193

This seems a bit too prominently placed. Most people don't care about these intrinsics. I would recommend sinking this down into the subsytem documentation section. There is no clear organization there, so it's hard to say where it should go. Maybe just above or below the exception handling section (just based on my perception of the generality of each).

lib/IR/Verifier.cpp
4668

The formatting is non-standard and inconsistent in this section.

4683

You could combine these two lines as:

if (auto *VecTy = dyn_cast<VectorType>(OperandTy))

4694

Redefining "Operand" here makes the code confusing. I'd rather see Operand and Result as separate variables and make local variables for their types. I would also make the Assert statements more specific to what they are actually checking. For instance,

if (OperandTy->isVectorTy()) {

Assert(ResultTy->isVectorTy(), ...
Assert(OperandTy->getVectorNumElements() == ResultTy->getVectorNumElements(), ...

}

Is there a reason that these vector checks are specific to fptrunc and fpext? Is it just because they don't have a "same type" restriction in the intrinsic definition?

The floating point type assertions could be simplified as

Assert(OperandTy->isFPorFPVectorTy(),...
Assert(ResultTy->isFPorFPVectorTy(),...

You should also be checking that OperandTy->getScalarSizeInBits() > ResultTy->getScalarSizeInBits for fptrunc and vice versa for fpext.

craig.topper added inline comments.Jan 31 2019, 2:19 PM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
771

What makes us reach this case? I would expect we'd scalarize based on the result type before we got to the operand type.

3237

We can't really widen this can we? Won't that put garbage in the upper elements?

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
3974

Please add an AVX command line so v4f64 will be a legal type.

kpn marked 6 inline comments as done.Feb 5 2019, 9:46 AM
kpn added inline comments.
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
771

Test case CodeGen/AArch64/neon-fpround_f128.ll says this code is needed. That case goes through the non-STRICT version of this function. This STRICT function was copied from the non-STRICT function and called in the appropriate places alongside that function. And a trivial conversion of the test case to use the constrained intrinsics does indeed go through this STRICT function.

I did want to try to keep the functions unified, but sometimes the result was too ugly to live.

3237

Is that what getUNDEF() does? Give llvm license to put garbage in registers? My assumption is that the code is fine because this function is a copy of WidenVecRes_Convert() with the needed changes for the strict node being chained. If if there's a problem here there's also a problem in that function.

lib/IR/Verifier.cpp
4683

The rewrite moots this point.

4694

I've rewritten this code and with your suggestions it does look much nicer.

I think I did put the vector checks are for fptrunc and fpext because they aren't checked earlier.

I've also added the checks for the appropriate changes in ScalarSizeInBits.

kpn updated this revision to Diff 185337.Feb 5 2019, 9:48 AM

Address review comments.

craig.topper added inline comments.Feb 5 2019, 10:19 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3237

That is what the undef means. The existing code isn't required to be exception safe so garbage is fine. The constrained intrinsics have to be exception safe. This is why the implementation of WidenVecRes_StrictFP is different than the non-trapping case in WidenVecRes_BinaryCanTrap. I believe FADD/FSUB/FDIV/FMUL are considered non-trapping on most targets.

kpn marked an inline comment as done.Feb 8 2019, 10:11 AM

I'm seeing a regression in WebAssembly/PR40267.ll that I need to look into. If anyone else is seeing this failure let me know.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3237

It looks like this "WidenNumElts % InVTNumElts" block can just be eliminated. The fallback at the end of the function should handle the case without any undefs.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2821

Would this lowering to EmitStackConvert(...) discard the rounding mode? I'm not familiar with this code, but I *think* it would.

E.g. STRICT_FP_ROUND would lower to a truncating store.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
771

Have you considered scalarizing the operands with a generic function? Something like ScalarizeVecRes_StrictFPOp(...)? I suspect we'll see reuse of this code.

@craig.topper

What makes us reach this case? I would expect we'd scalarize based on the result type before we got to the operand type.

The different operand and result types for these operations are probably why ScalarizeVecRes_StrictFPOp(...) didn't trip. Just guessing though...

kpn marked 2 inline comments as done.Feb 18 2019, 11:31 AM
kpn added inline comments.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2821

Yes it would, assuming EmitStackConvert() works correctly for floating point types. A 'make check' of llvm with the default targets enabled doesn't trigger the call to DAG.getTruncStore(), so I'm not actually sure it works.

But none of the other intrinsics handle rounding modes currently, so I think we should leave this to future work.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
771

You know, I thought about it. But trying to generalize it would hide the parallel with ScalarizeVecRes_FP_ROUND(). Right now the connection between the two is obvious. I'm not sure that we'd gain much at this point in time by generalizing and losing that bit of readability.

If we do end up needing to generalize this code in the future then we can do it then.

kpn updated this revision to Diff 187265.Feb 18 2019, 11:34 AM

Address review comments.

Fix regression on WebAssembly accidentally caused by incorrect svn update.

jsji added a subscriber: jsji.Mon, Feb 25, 10:42 AM
kpn updated this revision to Diff 191120.Mon, Mar 18, 10:04 AM

Rebase. Ping.