Page MenuHomePhabricator

More math intrinsics for conservative math handling
Needs ReviewPublic

Authored by kpn on Feb 20 2018, 9:44 AM.

Details

Summary

This builds on D27028 and D32319's work on constrained math intrinsics.

Quoting from D32319: "The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed."

There are more patches coming, but I wanted to start with just a handful here.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
andrew.w.kaylor added inline comments.Mar 6 2018, 9:14 AM
docs/LangRef.rst
14210

That's a reasonable point. The same issue came up with frem. The rounding mode doesn't apply to frem, but I kept the argument just so that it could be handled internally the same way as the other constrained intrinsics and then documented in the language reference that the rounding mode has no effect. I'm not completely convinced that was a good decision on my part.

If you want to look at what would have to change to support constrained intrinsics with no rounding mode argument, I would likely support that. I don't think it would be much extra handling. There are just some things where the class that is used to represent the intrinsics (ConstrainedFPIntrinsic) would need to be aware of the possibility that this argument is omitted.

kpn updated this revision to Diff 144714.May 1 2018, 7:32 AM

This new diff adds changing the default lowering of STRICT_FP_TO_UINT to not allow speculative execution. I've also eliminated the rounding metadata from instructions when it wasn't needed.

kpn updated this revision to Diff 144722.May 1 2018, 7:53 AM

Missed one place in the documentation that needed updating.

At some point we should create a document that describes the entire flow of FP instructions through the instruction selection process. To be honest I don't remember how it all works, and that makes it difficult to review changes like this. It would also be nice to verify that we all have the same understanding of how it works. I don't mean to volunteer you to produce the entire document, but would you mind giving me a rough outline? I'm still concerned about the case that is not chained.

docs/LangRef.rst
14128

You need to do something more here to document the difference between the return type and the argument type. Also in fpext below.

lib/CodeGen/StrictFP.cpp
11 ↗(On Diff #144722)

Can you say more here about what these transformations are? It's clear that you intend this as a generic pass that currently does one thing but might have others added later. That's good, but I'd like to see the possibilities described here as they are implemented.

74 ↗(On Diff #144722)

Per the LLVM Programmer's Manual (http://llvm.org/docs/ProgrammersManual.html#iterating-over-the-instruction-in-a-function) you should be using an inst_iterator here.

136 ↗(On Diff #144722)

Could you add an example here of what the resulting IR will look like? It would make the code a lot easier to follow.

144 ↗(On Diff #144722)

I think you can use "IntDst->getBitWidth()" here. Also, APInt::getSignedMinValue() does this same thing and is a bit more self-documenting.

150 ↗(On Diff #144722)

I believe conversion of a NaN to an integer should raise "INVALID" (which the fcmp will) and then the result is undefined, but the 'true' case does less so I think ULT is preferable.

151 ↗(On Diff #144722)

We are going to need a constrained version of fcmp, and when we have it you should use it here.

When the IRBuilder supports constrained floating point modes, it would be nice to use that here but I guess you can't do that yet, so maybe just a comment saying we should later?

153 ↗(On Diff #144722)

This is an odd name. How about "within.sint.range"? In any case, I think '.' is more common than '_' as a name space holder, probably because the name will automatically get '.<n>' appended if it's a duplicate.

201 ↗(On Diff #144722)

This description is wrong.

lib/CodeGen/TargetPassConfig.cpp
566 ↗(On Diff #144722)

This doesn't seem like the right place to do this. Should it be happening much later, like around CodeGenPrepare?

lib/IR/Verifier.cpp
4546

Since you've broken this out into a switch statement, can you separate the unary and ternary ops and give them each the appropriate assert? I think that would be much more readable than this compound check (which I realize was my creation).

4581

The default should probably always been an error.

4591

How about this?

int RoundingIdx = (HasExceptionMD ? NumOperands - 2 : NumOperands - 1);

On the other hand, are we ever going to have intrinsics that have a rounding mode but not exception behavior?

test/CodeGen/X86/fp-intrinsics.ll
281

Can you check the entire expanded IR pattern here? It might be worth having a separate test that verifies the StrictFP pass in isolation.

346

I believe your decorated intrinsic names are incorrect in all of these cases. The name needs to specify both return type and argument type. Take a look at what opt produces if you give it these names as inputs.

kpn updated this revision to Diff 152072.Jun 20 2018, 6:05 AM
kpn marked 12 inline comments as done.
kpn added inline comments.
docs/LangRef.rst
14128

I hope this is what you mean. I've changed it to show that the result is a different type. I've followed the naming scheme used elsewhere in this document.

lib/CodeGen/StrictFP.cpp
144 ↗(On Diff #144722)

error: no member named getBitWidth' in llvm::Value'.

lib/IR/Verifier.cpp
4591

Done.

I don't know if we'll have a rounding mode but not exceptions. I doubt it, but I can't say for certain.

craig.topper added inline comments.Jun 20 2018, 11:52 PM
lib/CodeGen/StrictFP.cpp
76 ↗(On Diff #152072)

I believe you should be getting TM by doing this.

auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)
  return false;

auto &TM = TPC->getTM<TargetMachine>();

There is only one other pass that takes TargetMachine in its constructor. The others use what I've put above. So I believe that is the preferred way.

84 ↗(On Diff #152072)

There is little reason to pass Context around. Value has a getContext as does Type. So you can get the context easily whenever you need it.

99 ↗(On Diff #152072)

This cast is unnecessary. IntrinsicInst is a subclass of Value

101 ↗(On Diff #152072)

This should use TLI->getValueType.

166 ↗(On Diff #152072)

Why do you need a SmallVector here? Why can't you just call getArgOperand?

167 ↗(On Diff #152072)

This cast is unecessary.

171 ↗(On Diff #152072)

What if the intrinsic uses a vector type?

kpn marked 6 inline comments as done.Jul 26 2018, 9:15 AM
kpn added inline comments.
lib/CodeGen/StrictFP.cpp
171 ↗(On Diff #152072)

It would have been caught by the IR verifier. A vector would have been rejected there.

kpn updated this revision to Diff 157504.Jul 26 2018, 9:16 AM
craig.topper added inline comments.Jul 26 2018, 9:56 AM
lib/CodeGen/StrictFP.cpp
171 ↗(On Diff #152072)

I don't see where the IR verifier rejects vectors. I just see that it checks that element counts are equal. And why should it reject vectors? We need to support fptosi/fptoui for vectors.

kpn added inline comments.Jul 26 2018, 9:59 AM
lib/CodeGen/StrictFP.cpp
171 ↗(On Diff #152072)

It doesn't reject them, now. This latest patch added support for vectors. My inexperience with Phabricator lost the comment that said that.

kpn updated this revision to Diff 157703.Jul 27 2018, 10:02 AM

Delete a line accidentally left in.

kpn updated this revision to Diff 161048.Aug 16 2018, 9:48 AM

Rebase. Ping.

kpn updated this revision to Diff 163844.Sep 4 2018, 9:17 AM

Rebase. Ping.

In D43515#1013383, @kpn wrote:

At the request of my employer's legal department:
Copyright © 2018 SAS Institute Inc., Cary, NC, USA. All Rights Reserved.

If it's not just a remark, but is supposed to have some legal/whatever meaning,
i'm not sure some comment in some review is the correct direction.

docs/LangRef.rst
14055

This probably has insufficient amount of ^.
Might want to actually test-build the docs.

Adding new constrained instrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

docs/LangRef.rst
13989

This change should be in a separate patch. There's too much going on in this patch and this is easy to overlook.

kpn added a comment.Sep 10 2018, 10:44 AM

Adding new constrained instrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

Will do.

kpn updated this revision to Diff 165782.Sep 17 2018, 9:50 AM

Split out changes as requested. This diff is just the four new intrinsics. The fptoui pass and the change to frem will be later.

I've also corrected some documentation issues in this iteration.

craig.topper added inline comments.Oct 3 2018, 5:22 PM
docs/LangRef.rst
14136

This reads funny. I think it should maybe be "result of truncating a floating point"

14170

This also reads funny

include/llvm/CodeGen/ISDOpcodes.h
552

These need comments. Does STRICT_FP_ROUND have the TRUNC argument that FP_ROUND has?

kpn added inline comments.Oct 4 2018, 6:42 AM
docs/LangRef.rst
14136

How about if I just copy the text used by the normal fptrunc instruction?

14170

Same as fptrunc. I could just copy the text from the fpext instruction?

include/llvm/CodeGen/ISDOpcodes.h
552

I could rearrange and lump them in with the non-strict versions of each. Then I'd just need one extra line restating that the STRICT_ versions prevent optimizations.

Yes, STRICT_FP_ROUND does have the TRUNC argument that FP_ROUND has, but it is currently always zero. Fixing this require rerouting this one strict node to go through the same codepath as the non-strict node. That would make it different from all the other constrained nodes. Should I go ahead and make that change? I _think_ it is safe if the TRUNC argument really does work like it is documented.

I also just noticed that I need to put back STRICT_FP_TO_UINT in at least one place. It should be everywhere STRICT_FP_TO_SINT is handled _except_ in the default lowering.

craig.topper added inline comments.Oct 4 2018, 10:51 AM
docs/LangRef.rst
14136

Sure

14170

Sure

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7431

This doesn't copy the second argument to FP_ROUJND over does it?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6430–6435

Is this adding a second argument to STRICT_FP_EXTEND as well? I don't think the non-strict FP_EXTEND has two arguments.

kpn added inline comments.Oct 4 2018, 10:53 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7431

No. It should.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6430–6435

Agreed. I'll fix it.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7407

There are a lot of comments on this, so I may have missed something. Take with a grain of salt...

I don't think these are correct. These can trap so can't be speculatively executed. They would need a chain.

kpn added inline comments.Oct 4 2018, 11:58 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7407

I couldn't figure out how to have these be chained but have the non-strict continue to not be chained. Too many things fell over if they didn't match.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7407

I'm not an expert with this code, so cc @andrew.w.kaylor.

This doesn't seem like the right direction though. Maybe these unchained operations should be left to a different patch until a proper solution is found.

kpn marked 5 inline comments as done.Nov 1 2018, 9:08 AM
kpn updated this revision to Diff 172160.Nov 1 2018, 10:17 AM

Address review comments.

Add use of the chain to these four new SDNode types.

test/CodeGen/X86/fp-intrinsics.ll
293

Same here as the vector version below. Do we want the truncating convert? That was surprising to me.

346

Do we want unsigned convert tests too? fptoui?

I see that there are SystemZ tests to cover them, so maybe that's sufficient? Just pointing this out so others can see.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
2618

This surprised me. Should this be the truncating convert? Or should it be vcvtsd2si?

3378

Same question as the scalar versions. Do we want <4 x float> to <4 x i32>/etc casts?

test/Feature/fp-intrinsics.ll
310

I haven't been following along closely, so please forgive if this was already discussed...

Should we have float->i32 casts too? Also double->i64?

312

Should 'fpext.f64' have a float argument instead of a double?

kpn added inline comments.Nov 5 2018, 12:36 PM
test/CodeGen/X86/fp-intrinsics.ll
346

The SystemZ tests target hardware new enough to lower to a single instruction.

The tests for fptoui on x86 use the default lowering, but the default lowering is disallowed since it does speculative execution and traps. The support for fixing that I had in this patch but was asked to split it out into another patch. So there's no constrained fptoui test here using the default lowering in this patch.

test/Feature/fp-intrinsics.ll
312

Probably, yes. Will fix.

kpn added inline comments.Nov 6 2018, 10:29 AM
test/CodeGen/X86/fp-intrinsics.ll
293

The strict intrinsic results in the same instruction as the regular fptosi instruction. Having them be the same means the mutation from strict to non-strict is working correctly.

And, yes, rounding towards zero is correct. That's why there's no rounding metadata.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
2618

Yes, it should be a truncating conversion.

3378

On the odd chance that it may tickle the vector legalizer I'll go ahead and add tests with float.

Same answer as the scalar tests: Rounding towards zero is correct.

test/Feature/fp-intrinsics.ll
310

This is an opt test. I'm not sure we'd benefit from placing those extra tests here.

kpn updated this revision to Diff 172820.Nov 6 2018, 12:02 PM

Rebase.

Minor test changes.

include/llvm/CodeGen/ISDOpcodes.h
552

This ordering does not mesh with what is already in place. The STRICT_XXX opcodes are currently clustered together, where these new opcodes are grouped with their non-strict counterparts.

I'm not opposed to grouping the corresponding non-strict and strict opcodes, but that would probably be better left to a separate patch. I think it makes sense to keep everything uniform until a final decision is made, for clarity's sake.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2808

This doesn't seem correct. Shouldn't Node->getOperand(0) be the argument for FP_ROUND and the Chain for STRICT_FP_ROUND? Same for FP_EXTEND too.

Also, does using a truncating store provide us with the same trapping behavior as an explicit trunc instruction? I don't know off the top of my head, but it may be different. To be fair, a user probably won't care too much about optimizing away a trap, but the purists might.

2868

This seems incorrect, unless I missed something. The first operand to FP_TO_SINT should be the argument, but the first operand to STRICT_FP_TO_SINT should be the Chain. It does not look like expandFP_TO_SINT accounts for that.