This is an archive of the discontinued LLVM Phabricator instance.

More math intrinsics for conservative math handling
AbandonedPublic

Authored by kpn on Feb 20 2018, 9:44 AM.

Details

Summary

This builds on D27028 and D32319's work on constrained math intrinsics.

Quoting from D32319: "The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed."

There are more patches coming, but I wanted to start with just a handful here.

Diff Detail

Event Timeline

kpn created this revision.Feb 20 2018, 9:44 AM
kpn added a comment.Feb 20 2018, 9:48 AM

At the request of my employer's legal department:
Copyright © 2018 SAS Institute Inc., Cary, NC, USA. All Rights Reserved.

docs/LangRef.rst
13283

I think you need to say more about the semantics. The LLVM IR fptoui and fptosi are specifically documented as rounding to zero. I don't think we want that with the constrained intrinsics so we need to very specifically document how they will be different from the standard instructions.

13330

This intrinsic is a bit troublesome. The llvm.round intrinsic says that it "returns the same values as the libm round functions would, and handles error conditions in the same way." The libm round function, in turn, is documented as rounding to the nearest integer (and away from zero in halfway cases) regardless of the current rounding mode.

So what do we want the constrained form of the intrinsic to do? I think it needs to ignore the rounding mode. I'm not sure about exception behavior. If it doesn't respect exception behavior then we probably don't want to have the constrained form of this intrinsic at all.

13366

This seems to be leaking SelectionDAG implementation details into the IR space. How is this used?

13407

This is a replacement for fpext, right? I think you should say that somewhere.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
968

STRICT_FP_TO_SINT needs to do something more than this. The default lowering of FP_TO_SINT is known to raise spurious FE_INEXACT exceptions because it involves speculative execution.

1151

Since this gets unique handling why isn't it just a separate case from the others?

1156

The style/formatting is wrong here. I think you need curly braces around your else-clause and the "else" itself needs to be on the same line as the curly brace above it.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6199–6200

Why are you not attaching these nodes to the chain?

kpn added inline comments.Feb 21 2018, 12:29 PM
docs/LangRef.rst
13283

What do we want these intrinsics to do that is different from the normal IR fptoui and fptosi?

I don't think any of the constrained intrinsics _today_ are doing anything with the rounding and exception metadata. That makes it hard for me to say much about it in documentation, today.

Well, unless I misunderstood the current code. That's totally possible.

13330

We'll still need the constrained intrinsic to avoid getting reordered.

How about if I rename the intrinsic to be fptrunc instead of round? Then the rounding would be explicit.

13366

It seems to exist for the MVT::ppcf128 type. A quick grep doesn't show any other users.

Test coverage is lacking. I added a test using the intrinsic, but I had to mark it expected fail since I couldn't get it to work.

To avoid the risk of bugs getting introduced later I went ahead and implemented the intrinsic. Would it be better to not have the intrinsic and to instead have a pass that replaces the non-STRICT SDNode with a STRICT version? That would avoid said leaking into the IR space. It would, however, mean that llvm would have an opinion on when STRICT nodes should be used. I'm not sure that's a good thing.

13407

I think I picked names for the intrinsics that matched the SelectionDAG node enum. Perhaps it would be better to match the bitcode language names? In which case this intrinsic would be "fpext" instead of "extend".

Either way that's a good idea for the documentation to at least mention fpext.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
968

I have seen FP_TO_SINT cause traps when it shouldn't. Would having that default lowering use the chain in the STRICT_ case solve that issue?

1151

Good point. And making it a separate case also takes care of the formatting issue in the else block.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6199–6200

Because they need to match what the default lowering is expecting. Otherwise a variety of failures happen.

docs/LangRef.rst
13283

You're right, none of the other intrinsics are doing anything specific with the rounding mode. The purpose of the intrinsics is to prevent the optimizer from doing things that would introduce compile-time rounding. However, the general assumption of the intrinsics is that whatever you set the rounding mode to at runtime is the rounding behavior you will get.

I'd want to consult a front end expert as to exactly what should be happening. A quick glance at the C99 standard tells me that when a floating point number is converted to an integer it is truncated toward zero. I think that means the same thing as the LLVM language reference claim that fptoui and fptosi round the number toward zero.

So I think that we don't want the runtime rounding mode to change the behavior of these intrinsics, and so we should document it as such.

13330

No, fptrunc does something else, right? My concern is that if this is preserving the behavior of the round library function (and I think we need to) then the rounding mode argument isn't relevant, so maybe it should be omitted in this case. The key thing is to be explicit about its behavior in the documentation and then make sure the implementation actually does what we say it will.

13366

I think it's better not to have the intrinsic for now. I don't understand what the SD node is doing well enough to say much more, but it looks to me like the SD node shouldn't be there either. It's a target-specific hack that leaked into the target-independent code if my understanding is correct.

13407

These intrinsics should be driven by what the front end needs. If no front end is generating an equivalent now then we don't want an intrinsic. So, yes, please match the bitcode language names.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
968

No, I don't think the chain will fix this. We need to implement strict lowering that does something different.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6199–6200

There is code that removes the chain when the strict node is mutated to a non-strict node. That should be preventing lowering problems. I believe the chain was necessary to prevent re-ordering prior to final instruction selection.

kpn added inline comments.Feb 26 2018, 12:24 PM
docs/LangRef.rst
13283

Our front end guy here confirms truncation. I've updated my working copy to state that fptoui and fptosi truncate.

13330

Well, if I'm reading SelectionDAGBuilder::visitFPTrunc() correctly then fptrunc gets turned into ISD::FP_ROUND. So, no, renaming this intrinsic to be fptrunc would not be wrong. Contrast this with llvm.round getting turned into ISD::FROUND. I think we need constrained intrinsics for both, so this one in this patch should be renamed after fptrunc.

I haven't touched the ISD::FROUND node yet, and that's what llvm.round gets turned into from the looks of SelectionDAGBuilder::visitIntrinsicCall(). That could be a later patch.

I agree that the rounding mode should be ignored and shouldn't be in the intrinsic's metadata.

WDYT?

13366

Done. It will be removed in the next diff.

13407

Will do.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
968

Is there something we can do at this level to fix this? If there is then I'm all for it, but if there isn't then we should probably still put the intrinsic in.

We'll need it eventually, and currently none of the constrained intrinsics solve the complete optimization problem. So I wonder if this intrinsic is really all that different from the other experimental constrained intrinsics.

If a backend models FP side effects then wouldn't the existing default lowering work correctly?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6199–6200

Mutation happens too late in some cases.

If it happened early enough then there wouldn't be any need for the strict intrinsics to be mentioned in SelectionDAGLegalize::ExpandNode(). Since it doesn't we can't use the chain.

Should that mutation happen earlier?

docs/LangRef.rst
13330

You shouldn't be thinking in terms of the SelectionDAG at this level. ISD::FP_ROUND is very poorly named and does not do the same thing that llvm.round does. ISD::FP_ROUND converts a floating point number to a smaller type, but not necessarily an integer. That's fptrunc.

I suppose you are right that we do need a constrained version of fptrunc. I'm a bit concerned by the things that the LLVM language definition says are undefined. Those are the cases that will be of most interest for the constrained case and we should document the expected behavior, but I think we need to consider why the current spec says the result is undefined.

In any event llvm.round, which is what I thought you were replacing here, is something completely different.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
968

To be honest I'm not entirely sure how to fix this in the current selection DAG model. The issue is that we need to introduce a branch to fix the problem, but by the time we're selecting instructions it's too late to do that. I think it needs to be addressed when we're building the DAG,

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6199–6200

We need the mutation to be put off as long as possible. I think it should be possible to unlink the chain in ExpandNode if necessary. Depending on what's happening there, we might even want the expanded node to make use of the chain.

kpn added inline comments.Mar 6 2018, 6:13 AM
docs/LangRef.rst
13407

Say, now that I think about this, what does rounding have to do with extending a FP value? Shouldn't this intrinsic just plain not have rounding metadata?

andrew.w.kaylor added inline comments.Mar 6 2018, 9:14 AM
docs/LangRef.rst
13407

That's a reasonable point. The same issue came up with frem. The rounding mode doesn't apply to frem, but I kept the argument just so that it could be handled internally the same way as the other constrained intrinsics and then documented in the language reference that the rounding mode has no effect. I'm not completely convinced that was a good decision on my part.

If you want to look at what would have to change to support constrained intrinsics with no rounding mode argument, I would likely support that. I don't think it would be much extra handling. There are just some things where the class that is used to represent the intrinsics (ConstrainedFPIntrinsic) would need to be aware of the possibility that this argument is omitted.

kpn updated this revision to Diff 144714.May 1 2018, 7:32 AM

This new diff adds changing the default lowering of STRICT_FP_TO_UINT to not allow speculative execution. I've also eliminated the rounding metadata from instructions when it wasn't needed.

kpn updated this revision to Diff 144722.May 1 2018, 7:53 AM

Missed one place in the documentation that needed updating.

At some point we should create a document that describes the entire flow of FP instructions through the instruction selection process. To be honest I don't remember how it all works, and that makes it difficult to review changes like this. It would also be nice to verify that we all have the same understanding of how it works. I don't mean to volunteer you to produce the entire document, but would you mind giving me a rough outline? I'm still concerned about the case that is not chained.

docs/LangRef.rst
13325

You need to do something more here to document the difference between the return type and the argument type. Also in fpext below.

lib/CodeGen/StrictFP.cpp
12

Can you say more here about what these transformations are? It's clear that you intend this as a generic pass that currently does one thing but might have others added later. That's good, but I'd like to see the possibilities described here as they are implemented.

75

Per the LLVM Programmer's Manual (http://llvm.org/docs/ProgrammersManual.html#iterating-over-the-instruction-in-a-function) you should be using an inst_iterator here.

137

Could you add an example here of what the resulting IR will look like? It would make the code a lot easier to follow.

145

I think you can use "IntDst->getBitWidth()" here. Also, APInt::getSignedMinValue() does this same thing and is a bit more self-documenting.

151

I believe conversion of a NaN to an integer should raise "INVALID" (which the fcmp will) and then the result is undefined, but the 'true' case does less so I think ULT is preferable.

152

We are going to need a constrained version of fcmp, and when we have it you should use it here.

When the IRBuilder supports constrained floating point modes, it would be nice to use that here but I guess you can't do that yet, so maybe just a comment saying we should later?

154

This is an odd name. How about "within.sint.range"? In any case, I think '.' is more common than '_' as a name space holder, probably because the name will automatically get '.<n>' appended if it's a duplicate.

202

This description is wrong.

lib/CodeGen/TargetPassConfig.cpp
566

This doesn't seem like the right place to do this. Should it be happening much later, like around CodeGenPrepare?

lib/IR/Verifier.cpp
4470

Since you've broken this out into a switch statement, can you separate the unary and ternary ops and give them each the appropriate assert? I think that would be much more readable than this compound check (which I realize was my creation).

4516

The default should probably always been an error.

4526

How about this?

int RoundingIdx = (HasExceptionMD ? NumOperands - 2 : NumOperands - 1);

On the other hand, are we ever going to have intrinsics that have a rounding mode but not exception behavior?

test/CodeGen/X86/fp-intrinsics.ll
281

Can you check the entire expanded IR pattern here? It might be worth having a separate test that verifies the StrictFP pass in isolation.

346

I believe your decorated intrinsic names are incorrect in all of these cases. The name needs to specify both return type and argument type. Take a look at what opt produces if you give it these names as inputs.

kpn updated this revision to Diff 152072.Jun 20 2018, 6:05 AM
kpn marked 12 inline comments as done.
kpn added inline comments.
lib/CodeGen/StrictFP.cpp
145

error: no member named getBitWidth' in llvm::Value'.

kpn added inline comments.Jun 20 2018, 6:06 AM
docs/LangRef.rst
13325

I hope this is what you mean. I've changed it to show that the result is a different type. I've followed the naming scheme used elsewhere in this document.

lib/IR/Verifier.cpp
4526

Done.

I don't know if we'll have a rounding mode but not exceptions. I doubt it, but I can't say for certain.

craig.topper added inline comments.Jun 20 2018, 11:52 PM
lib/CodeGen/StrictFP.cpp
77

I believe you should be getting TM by doing this.

auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)
  return false;

auto &TM = TPC->getTM<TargetMachine>();

There is only one other pass that takes TargetMachine in its constructor. The others use what I've put above. So I believe that is the preferred way.

85

There is little reason to pass Context around. Value has a getContext as does Type. So you can get the context easily whenever you need it.

100

This cast is unnecessary. IntrinsicInst is a subclass of Value

102

This should use TLI->getValueType.

167

Why do you need a SmallVector here? Why can't you just call getArgOperand?

168

This cast is unecessary.

172

What if the intrinsic uses a vector type?

kpn marked 6 inline comments as done.Jul 26 2018, 9:15 AM
kpn added inline comments.
lib/CodeGen/StrictFP.cpp
172

It would have been caught by the IR verifier. A vector would have been rejected there.

kpn updated this revision to Diff 157504.Jul 26 2018, 9:16 AM
craig.topper added inline comments.Jul 26 2018, 9:56 AM
lib/CodeGen/StrictFP.cpp
172

I don't see where the IR verifier rejects vectors. I just see that it checks that element counts are equal. And why should it reject vectors? We need to support fptosi/fptoui for vectors.

kpn added inline comments.Jul 26 2018, 9:59 AM
lib/CodeGen/StrictFP.cpp
172

It doesn't reject them, now. This latest patch added support for vectors. My inexperience with Phabricator lost the comment that said that.

kpn updated this revision to Diff 157703.Jul 27 2018, 10:02 AM

Delete a line accidentally left in.

kpn updated this revision to Diff 161048.Aug 16 2018, 9:48 AM

Rebase. Ping.

kpn updated this revision to Diff 163844.Sep 4 2018, 9:17 AM

Rebase. Ping.

In D43515#1013383, @kpn wrote:

At the request of my employer's legal department:
Copyright © 2018 SAS Institute Inc., Cary, NC, USA. All Rights Reserved.

If it's not just a remark, but is supposed to have some legal/whatever meaning,
i'm not sure some comment in some review is the correct direction.

docs/LangRef.rst
13252

This probably has insufficient amount of ^.
Might want to actually test-build the docs.

Adding new constrained instrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

docs/LangRef.rst
13190

This change should be in a separate patch. There's too much going on in this patch and this is easy to overlook.

kpn added a comment.Sep 10 2018, 10:44 AM

Adding new constrained instrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

Will do.

kpn updated this revision to Diff 165782.Sep 17 2018, 9:50 AM

Split out changes as requested. This diff is just the four new intrinsics. The fptoui pass and the change to frem will be later.

I've also corrected some documentation issues in this iteration.

craig.topper added inline comments.Oct 3 2018, 5:22 PM
docs/LangRef.rst
13333

This reads funny. I think it should maybe be "result of truncating a floating point"

13367

This also reads funny

include/llvm/CodeGen/ISDOpcodes.h
534

These need comments. Does STRICT_FP_ROUND have the TRUNC argument that FP_ROUND has?

kpn added inline comments.Oct 4 2018, 6:42 AM
docs/LangRef.rst
13333

How about if I just copy the text used by the normal fptrunc instruction?

13367

Same as fptrunc. I could just copy the text from the fpext instruction?

include/llvm/CodeGen/ISDOpcodes.h
534

I could rearrange and lump them in with the non-strict versions of each. Then I'd just need one extra line restating that the STRICT_ versions prevent optimizations.

Yes, STRICT_FP_ROUND does have the TRUNC argument that FP_ROUND has, but it is currently always zero. Fixing this require rerouting this one strict node to go through the same codepath as the non-strict node. That would make it different from all the other constrained nodes. Should I go ahead and make that change? I _think_ it is safe if the TRUNC argument really does work like it is documented.

I also just noticed that I need to put back STRICT_FP_TO_UINT in at least one place. It should be everywhere STRICT_FP_TO_SINT is handled _except_ in the default lowering.

craig.topper added inline comments.Oct 4 2018, 10:51 AM
docs/LangRef.rst
13333

Sure

13367

Sure

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7118

This doesn't copy the second argument to FP_ROUJND over does it?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6204

Is this adding a second argument to STRICT_FP_EXTEND as well? I don't think the non-strict FP_EXTEND has two arguments.

kpn added inline comments.Oct 4 2018, 10:53 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7118

No. It should.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6204

Agreed. I'll fix it.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7105

There are a lot of comments on this, so I may have missed something. Take with a grain of salt...

I don't think these are correct. These can trap so can't be speculatively executed. They would need a chain.

kpn added inline comments.Oct 4 2018, 11:58 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7105

I couldn't figure out how to have these be chained but have the non-strict continue to not be chained. Too many things fell over if they didn't match.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7105

I'm not an expert with this code, so cc @andrew.w.kaylor.

This doesn't seem like the right direction though. Maybe these unchained operations should be left to a different patch until a proper solution is found.

kpn marked 5 inline comments as done.Nov 1 2018, 9:08 AM
kpn updated this revision to Diff 172160.Nov 1 2018, 10:17 AM

Address review comments.

Add use of the chain to these four new SDNode types.

test/CodeGen/X86/fp-intrinsics.ll
281

Same here as the vector version below. Do we want the truncating convert? That was surprising to me.

346

Do we want unsigned convert tests too? fptoui?

I see that there are SystemZ tests to cover them, so maybe that's sufficient? Just pointing this out so others can see.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
4372 ↗(On Diff #172160)

This surprised me. Should this be the truncating convert? Or should it be vcvtsd2si?

4701 ↗(On Diff #172160)

Same question as the scalar versions. Do we want <4 x float> to <4 x i32>/etc casts?

test/Feature/fp-intrinsics.ll
309

I haven't been following along closely, so please forgive if this was already discussed...

Should we have float->i32 casts too? Also double->i64?

311

Should 'fpext.f64' have a float argument instead of a double?

kpn added inline comments.Nov 5 2018, 12:36 PM
test/CodeGen/X86/fp-intrinsics.ll
346

The SystemZ tests target hardware new enough to lower to a single instruction.

The tests for fptoui on x86 use the default lowering, but the default lowering is disallowed since it does speculative execution and traps. The support for fixing that I had in this patch but was asked to split it out into another patch. So there's no constrained fptoui test here using the default lowering in this patch.

test/Feature/fp-intrinsics.ll
311

Probably, yes. Will fix.

kpn added inline comments.Nov 6 2018, 10:29 AM
test/CodeGen/X86/fp-intrinsics.ll
281

The strict intrinsic results in the same instruction as the regular fptosi instruction. Having them be the same means the mutation from strict to non-strict is working correctly.

And, yes, rounding towards zero is correct. That's why there's no rounding metadata.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
4372 ↗(On Diff #172160)

Yes, it should be a truncating conversion.

4701 ↗(On Diff #172160)

On the odd chance that it may tickle the vector legalizer I'll go ahead and add tests with float.

Same answer as the scalar tests: Rounding towards zero is correct.

test/Feature/fp-intrinsics.ll
309

This is an opt test. I'm not sure we'd benefit from placing those extra tests here.

kpn updated this revision to Diff 172820.Nov 6 2018, 12:02 PM

Rebase.

Minor test changes.

include/llvm/CodeGen/ISDOpcodes.h
534

This ordering does not mesh with what is already in place. The STRICT_XXX opcodes are currently clustered together, where these new opcodes are grouped with their non-strict counterparts.

I'm not opposed to grouping the corresponding non-strict and strict opcodes, but that would probably be better left to a separate patch. I think it makes sense to keep everything uniform until a final decision is made, for clarity's sake.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3033

This doesn't seem correct. Shouldn't Node->getOperand(0) be the argument for FP_ROUND and the Chain for STRICT_FP_ROUND? Same for FP_EXTEND too.

Also, does using a truncating store provide us with the same trapping behavior as an explicit trunc instruction? I don't know off the top of my head, but it may be different. To be fair, a user probably won't care too much about optimizing away a trap, but the purists might.

3088

This seems incorrect, unless I missed something. The first operand to FP_TO_SINT should be the argument, but the first operand to STRICT_FP_TO_SINT should be the Chain. It does not look like expandFP_TO_SINT accounts for that.

kpn added a subscriber: kbarton.May 10 2019, 12:32 PM

A few minor comments about commoning asserts and using dyn_cast instead of cast<>.
Aside from that, I think this looks good. That said, I'm by no means an expert in this area so don't feel I'm qualified to give a final approval to commit.

lib/IR/Verifier.cpp
4513

Could you use the isFPOrFPVectorTy here to check both conditions, and then remove the assert in the else below?

4514

Can you use a dyn_cast here instead?
if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {

do vector stuff

}

4529

Similarly, could you use isIntOrIntVectorTy here and remove the assert in the if/else below?

4553

same comment about commoning the asserts here.

4597

It looks like getArgOperand expects an unsigned. Is there a specific reason you are using an int here?
If it's possible that RoundingIdx is negative, then you should probably add an assert here before passing it to getArgOperand.

kpn marked 3 inline comments as done.May 15 2019, 10:55 AM

The only thing left in this ticket to commit is the fptosi and fptoui changes. I'm working on incorporating review comments from D55897 into this ticket and fixing other bugs that I'm coming across. I'll hopefully update this ticket next week.

lib/IR/Verifier.cpp
4514

Yes, that's much more concise.

4553

The fptrunc and fpext changes were split out into another ticket and updated there. That committed code is more concise in I suspect the way you are intending.

kpn added a subscriber: ajwock.May 21 2019, 6:26 AM
kpn updated this revision to Diff 201965.May 29 2019, 9:41 AM

Address review comments from here and from D55897.

This patch is now down to just handling fptosi and fptoui.

How would you feel about rebooting this as a new patch? There's a lot of irrelevant history here, and I feel like I'm missing some context as I review it.

In general, I see that you're down to just implementing the fptosi and fptoui cases. I'm concerned about what happens in the fptoui case. It's mentioned in a few of the earlier comments that the default expansion of this opcode introduces speculative exceptions, and if that's being handled in the latest implementation I haven't read it closely enough to see what's going on. If it is being handled, I'd expect to see a comment block somewhere explaining what's being done.

kpn added a comment.Jun 18 2019, 11:07 AM

How would you feel about rebooting this as a new patch? There's a lot of irrelevant history here, and I feel like I'm missing some context as I review it.

I can do that.

In general, I see that you're down to just implementing the fptosi and fptoui cases. I'm concerned about what happens in the fptoui case. It's mentioned in a few of the earlier comments that the default expansion of this opcode introduces speculative exceptions, and if that's being handled in the latest implementation I haven't read it closely enough to see what's going on. If it is being handled, I'd expect to see a comment block somewhere explaining what's being done.

I believe the speculative exceptions should be largely fixed by r348251, committed by rksimon. I changed the code to continue to use strict nodes when expanding a strict node. Since the strict nodes are chained, does that not solve the part of the problem not solved by r348251? Is the existing comment not enough, or do I need to copy some of the commit message into code comments?

In D43515#1548845, @kpn wrote:

I believe the speculative exceptions should be largely fixed by r348251, committed by rksimon. I changed the code to continue to use strict nodes when expanding a strict node. Since the strict nodes are chained, does that not solve the part of the problem not solved by r348251? Is the existing comment not enough, or do I need to copy some of the commit message into code comments?

Sorry, I wasn't aware of Simon's change. That definitely simplifies what needs to be done here, and, yes, the existing comment is sufficient.