Page MenuHomePhabricator

More math intrinsics for conservative math handling
AbandonedPublic

Authored by kpn on Feb 20 2018, 9:44 AM.

Details

Summary

This builds on D27028 and D32319's work on constrained math intrinsics.

Quoting from D32319: "The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed."

There are more patches coming, but I wanted to start with just a handful here.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kpn added inline comments.Jun 20 2018, 6:06 AM
docs/LangRef.rst
15201

I hope this is what you mean. I've changed it to show that the result is a different type. I've followed the naming scheme used elsewhere in this document.

lib/IR/Verifier.cpp
4772

Done.

I don't know if we'll have a rounding mode but not exceptions. I doubt it, but I can't say for certain.

craig.topper added inline comments.Jun 20 2018, 11:52 PM
lib/CodeGen/StrictFP.cpp
76 ↗(On Diff #152072)

I believe you should be getting TM by doing this.

auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)
  return false;

auto &TM = TPC->getTM<TargetMachine>();

There is only one other pass that takes TargetMachine in its constructor. The others use what I've put above. So I believe that is the preferred way.

84 ↗(On Diff #152072)

There is little reason to pass Context around. Value has a getContext as does Type. So you can get the context easily whenever you need it.

99 ↗(On Diff #152072)

This cast is unnecessary. IntrinsicInst is a subclass of Value

101 ↗(On Diff #152072)

This should use TLI->getValueType.

166 ↗(On Diff #152072)

Why do you need a SmallVector here? Why can't you just call getArgOperand?

167 ↗(On Diff #152072)

This cast is unecessary.

171 ↗(On Diff #152072)

What if the intrinsic uses a vector type?

kpn marked 6 inline comments as done.Jul 26 2018, 9:15 AM
kpn added inline comments.
lib/CodeGen/StrictFP.cpp
171 ↗(On Diff #152072)

It would have been caught by the IR verifier. A vector would have been rejected there.

kpn updated this revision to Diff 157504.Jul 26 2018, 9:16 AM
craig.topper added inline comments.Jul 26 2018, 9:56 AM
lib/CodeGen/StrictFP.cpp
171 ↗(On Diff #152072)

I don't see where the IR verifier rejects vectors. I just see that it checks that element counts are equal. And why should it reject vectors? We need to support fptosi/fptoui for vectors.

kpn added inline comments.Jul 26 2018, 9:59 AM
lib/CodeGen/StrictFP.cpp
171 ↗(On Diff #152072)

It doesn't reject them, now. This latest patch added support for vectors. My inexperience with Phabricator lost the comment that said that.

kpn updated this revision to Diff 157703.Jul 27 2018, 10:02 AM

Delete a line accidentally left in.

kpn updated this revision to Diff 161048.Aug 16 2018, 9:48 AM

Rebase. Ping.

kpn updated this revision to Diff 163844.Sep 4 2018, 9:17 AM

Rebase. Ping.

In D43515#1013383, @kpn wrote:

At the request of my employer's legal department:
Copyright © 2018 SAS Institute Inc., Cary, NC, USA. All Rights Reserved.

If it's not just a remark, but is supposed to have some legal/whatever meaning,
i'm not sure some comment in some review is the correct direction.

docs/LangRef.rst
15128

This probably has insufficient amount of ^.
Might want to actually test-build the docs.

Adding new constrained instrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

docs/LangRef.rst
15062

This change should be in a separate patch. There's too much going on in this patch and this is easy to overlook.

kpn added a comment.Sep 10 2018, 10:44 AM

Adding new constrained instrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

Will do.

kpn updated this revision to Diff 165782.Sep 17 2018, 9:50 AM

Split out changes as requested. This diff is just the four new intrinsics. The fptoui pass and the change to frem will be later.

I've also corrected some documentation issues in this iteration.

craig.topper added inline comments.Oct 3 2018, 5:22 PM
docs/LangRef.rst
15209

This reads funny. I think it should maybe be "result of truncating a floating point"

15243

This also reads funny

include/llvm/CodeGen/ISDOpcodes.h
590

These need comments. Does STRICT_FP_ROUND have the TRUNC argument that FP_ROUND has?

kpn added inline comments.Oct 4 2018, 6:42 AM
docs/LangRef.rst
15209

How about if I just copy the text used by the normal fptrunc instruction?

15243

Same as fptrunc. I could just copy the text from the fpext instruction?

include/llvm/CodeGen/ISDOpcodes.h
590

I could rearrange and lump them in with the non-strict versions of each. Then I'd just need one extra line restating that the STRICT_ versions prevent optimizations.

Yes, STRICT_FP_ROUND does have the TRUNC argument that FP_ROUND has, but it is currently always zero. Fixing this require rerouting this one strict node to go through the same codepath as the non-strict node. That would make it different from all the other constrained nodes. Should I go ahead and make that change? I _think_ it is safe if the TRUNC argument really does work like it is documented.

I also just noticed that I need to put back STRICT_FP_TO_UINT in at least one place. It should be everywhere STRICT_FP_TO_SINT is handled _except_ in the default lowering.

craig.topper added inline comments.Oct 4 2018, 10:51 AM
docs/LangRef.rst
15209

Sure

15243

Sure

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7767

This doesn't copy the second argument to FP_ROUJND over does it?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6948–6953

Is this adding a second argument to STRICT_FP_EXTEND as well? I don't think the non-strict FP_EXTEND has two arguments.

kpn added inline comments.Oct 4 2018, 10:53 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7767

No. It should.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6948–6953

Agreed. I'll fix it.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7743

There are a lot of comments on this, so I may have missed something. Take with a grain of salt...

I don't think these are correct. These can trap so can't be speculatively executed. They would need a chain.

kpn added inline comments.Oct 4 2018, 11:58 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7743

I couldn't figure out how to have these be chained but have the non-strict continue to not be chained. Too many things fell over if they didn't match.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7743

I'm not an expert with this code, so cc @andrew.w.kaylor.

This doesn't seem like the right direction though. Maybe these unchained operations should be left to a different patch until a proper solution is found.

kpn marked 5 inline comments as done.Nov 1 2018, 9:08 AM
kpn updated this revision to Diff 172160.Nov 1 2018, 10:17 AM

Address review comments.

Add use of the chain to these four new SDNode types.

test/CodeGen/X86/fp-intrinsics.ll
293

Same here as the vector version below. Do we want the truncating convert? That was surprising to me.

356

Do we want unsigned convert tests too? fptoui?

I see that there are SystemZ tests to cover them, so maybe that's sufficient? Just pointing this out so others can see.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
4026

This surprised me. Should this be the truncating convert? Or should it be vcvtsd2si?

5586

Same question as the scalar versions. Do we want <4 x float> to <4 x i32>/etc casts?

test/Feature/fp-intrinsics.ll
311

I haven't been following along closely, so please forgive if this was already discussed...

Should we have float->i32 casts too? Also double->i64?

313

Should 'fpext.f64' have a float argument instead of a double?

kpn added inline comments.Nov 5 2018, 12:36 PM
test/CodeGen/X86/fp-intrinsics.ll
356

The SystemZ tests target hardware new enough to lower to a single instruction.

The tests for fptoui on x86 use the default lowering, but the default lowering is disallowed since it does speculative execution and traps. The support for fixing that I had in this patch but was asked to split it out into another patch. So there's no constrained fptoui test here using the default lowering in this patch.

test/Feature/fp-intrinsics.ll
313

Probably, yes. Will fix.

kpn added inline comments.Nov 6 2018, 10:29 AM
test/CodeGen/X86/fp-intrinsics.ll
293

The strict intrinsic results in the same instruction as the regular fptosi instruction. Having them be the same means the mutation from strict to non-strict is working correctly.

And, yes, rounding towards zero is correct. That's why there's no rounding metadata.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
4026

Yes, it should be a truncating conversion.

5586

On the odd chance that it may tickle the vector legalizer I'll go ahead and add tests with float.

Same answer as the scalar tests: Rounding towards zero is correct.

test/Feature/fp-intrinsics.ll
311

This is an opt test. I'm not sure we'd benefit from placing those extra tests here.

kpn updated this revision to Diff 172820.Nov 6 2018, 12:02 PM

Rebase.

Minor test changes.

include/llvm/CodeGen/ISDOpcodes.h
590

This ordering does not mesh with what is already in place. The STRICT_XXX opcodes are currently clustered together, where these new opcodes are grouped with their non-strict counterparts.

I'm not opposed to grouping the corresponding non-strict and strict opcodes, but that would probably be better left to a separate patch. I think it makes sense to keep everything uniform until a final decision is made, for clarity's sake.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2850

This doesn't seem correct. Shouldn't Node->getOperand(0) be the argument for FP_ROUND and the Chain for STRICT_FP_ROUND? Same for FP_EXTEND too.

Also, does using a truncating store provide us with the same trapping behavior as an explicit trunc instruction? I don't know off the top of my head, but it may be different. To be fair, a user probably won't care too much about optimizing away a trap, but the purists might.

2909–2916

This seems incorrect, unless I missed something. The first operand to FP_TO_SINT should be the argument, but the first operand to STRICT_FP_TO_SINT should be the Chain. It does not look like expandFP_TO_SINT accounts for that.

kpn added a subscriber: kbarton.May 10 2019, 12:32 PM

A few minor comments about commoning asserts and using dyn_cast instead of cast<>.
Aside from that, I think this looks good. That said, I'm by no means an expert in this area so don't feel I'm qualified to give a final approval to commit.

lib/IR/Verifier.cpp
4759

Could you use the isFPOrFPVectorTy here to check both conditions, and then remove the assert in the else below?

4760

Can you use a dyn_cast here instead?
if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {

do vector stuff

}

4775

Similarly, could you use isIntOrIntVectorTy here and remove the assert in the if/else below?

4799

same comment about commoning the asserts here.

4843

It looks like getArgOperand expects an unsigned. Is there a specific reason you are using an int here?
If it's possible that RoundingIdx is negative, then you should probably add an assert here before passing it to getArgOperand.

kpn marked 3 inline comments as done.May 15 2019, 10:55 AM

The only thing left in this ticket to commit is the fptosi and fptoui changes. I'm working on incorporating review comments from D55897 into this ticket and fixing other bugs that I'm coming across. I'll hopefully update this ticket next week.

lib/IR/Verifier.cpp
4760

Yes, that's much more concise.

4799

The fptrunc and fpext changes were split out into another ticket and updated there. That committed code is more concise in I suspect the way you are intending.

kpn added a subscriber: ajwock.May 21 2019, 6:26 AM
kpn updated this revision to Diff 201965.May 29 2019, 9:41 AM

Address review comments from here and from D55897.

This patch is now down to just handling fptosi and fptoui.

How would you feel about rebooting this as a new patch? There's a lot of irrelevant history here, and I feel like I'm missing some context as I review it.

In general, I see that you're down to just implementing the fptosi and fptoui cases. I'm concerned about what happens in the fptoui case. It's mentioned in a few of the earlier comments that the default expansion of this opcode introduces speculative exceptions, and if that's being handled in the latest implementation I haven't read it closely enough to see what's going on. If it is being handled, I'd expect to see a comment block somewhere explaining what's being done.

kpn added a comment.Jun 18 2019, 11:07 AM

How would you feel about rebooting this as a new patch? There's a lot of irrelevant history here, and I feel like I'm missing some context as I review it.

I can do that.

In general, I see that you're down to just implementing the fptosi and fptoui cases. I'm concerned about what happens in the fptoui case. It's mentioned in a few of the earlier comments that the default expansion of this opcode introduces speculative exceptions, and if that's being handled in the latest implementation I haven't read it closely enough to see what's going on. If it is being handled, I'd expect to see a comment block somewhere explaining what's being done.

I believe the speculative exceptions should be largely fixed by r348251, committed by rksimon. I changed the code to continue to use strict nodes when expanding a strict node. Since the strict nodes are chained, does that not solve the part of the problem not solved by r348251? Is the existing comment not enough, or do I need to copy some of the commit message into code comments?

In D43515#1548845, @kpn wrote:

I believe the speculative exceptions should be largely fixed by r348251, committed by rksimon. I changed the code to continue to use strict nodes when expanding a strict node. Since the strict nodes are chained, does that not solve the part of the problem not solved by r348251? Is the existing comment not enough, or do I need to copy some of the commit message into code comments?

Sorry, I wasn't aware of Simon's change. That definitely simplifies what needs to be done here, and, yes, the existing comment is sufficient.