This is an archive of the discontinued LLVM Phabricator instance.

[X86][FastISel] Fix with.overflow eflags clobber (PR49587)
ClosedPublic

Authored by nikic on Mar 14 2021, 8:52 AM.

Details

Summary

If the successor block has a phi node, then additional moves may be inserted into predecessors, which may clobber eflags. Don't try to fold the with.overflow result into the branch in that case.

This is done by explicitly checking for any phis in successor blocks, not sure if there's some more principled way to address this. Other fused compare and branch patterns avoid the issue by emitting the comparison when handling the branch, so that no instructions may be inserted in between. In this case, the with.overflow call is emitted separately (and I don't think this is avoidable, as it will generally have at least two users).

Fixes https://bugs.llvm.org/show_bug.cgi?id=49587.

Diff Detail

Event Timeline

nikic created this revision.Mar 14 2021, 8:52 AM
nikic requested review of this revision.Mar 14 2021, 8:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2021, 8:52 AM
nagisa added a subscriber: nagisa.Mar 15 2021, 4:21 AM

My feeling is that this fix is a work-around for what is a regression llvm11->12. Here's a comment I had sitting incomplete since yesterday in bugzilla:

Short summary of what I think is happening here:

FastISel::handlePHINodesInSuccessorBlocks is called from FastISel::selectInstruction before the branch is lowered. This generates a constant 0, clobbering the flags. Later, when lowering the branch itself, we have this code:

https://github.com/llvm/llvm-project/blob/fcdf7f6224610a51dc2ff47f2f1e3377329b64a7/llvm/lib/Target/X86/X86FastISel.cpp#L1758-L1769

which figures it is able to fold no problem.

In versions <=11 the handlePHINodesInSuccessorBlocks would insert the instruction for producing the constant 0 at the beginning of the MachineBB, now it only does so right before looking at the terminator.

Spent some time bisecting, the regression is somewhere in the range of a57def30f539..d79642b3db1d.

I'm hoping to finish the bisection today and look at the reasoning for the diff/commit that introduced the regression in the first place. If its not serious, it might make more sense to simply revert it.

nikic added a comment.Mar 15 2021, 4:28 AM

@nagisa I'm pretty sure that this issue got exposed as a side-effect of D91734, but I don't think that change is to blame -- it just breaks some assumptions in existing code regarding where local values can be (re)materialized.

Yeah that looks like the one.

Alright, in that case, then, there are a couple other alternative approaches that I think would likely address this class of problems in a more general way:

First, changing the emission of constant->register moves to always produce a mov, thus no longer clobbering the flags ever. This way adding arbitrary constant production all over the place wouldn't have potentially far reaching consequences.

Alternatively, mayhaps it is possible to check that we haven't inserted any machine instructions between the code generated for the intrinsic and the branch? That would make the "Make sure nothing is in the way" check significantly more direct, I feel.

(But also, feel free to ignore me, I'm not nowhere near familiar enough with FastISel to be able to say what it ought to look like)

Yeah that looks like the one.

Alright, in that case, then, there are a couple other alternative approaches that I think would likely address this class of problems in a more general way:

First, changing the emission of constant->register moves to always produce a mov, thus no longer clobbering the flags ever. This way adding arbitrary constant production all over the place wouldn't have potentially far reaching consequences.

That would be a possibility, but does mean that we'd be using mov instead of xor for all zero constant materializations, which seems rather undesirable to me. At least if we don't want to add special-case logic just for phis.

Alternatively, mayhaps it is possible to check that we haven't inserted any machine instructions between the code generated for the intrinsic and the branch? That would make the "Make sure nothing is in the way" check significantly more direct, I feel.

This isn't possible because FastISel selects instructions in reverse order. This means that we first select the terminator (where we have to make the decision on whether to fold), then we select the phi movement, and then we select the overflow intrinsic.

TBH this looks like a relatively safe workaround to me - I mentioned it to @probinson who was of the opinion that D91734 brought the issue to light as it tends to move constant materialization further down a basic block making these kind of clashes more likely.

Any more comments?

nagisa added a comment.EditedMar 28 2021, 5:27 AM

None from me. As I said I'm not too qualified to have much say in fastisel, so my other comments are broadly ignorable. I do think that some fix for this needs to land (and be nominated for a backport into LLVM 12) sooner rather than later

nikic added a comment.Mar 28 2021, 6:45 AM

For reference, here's how the alternative approach of not using xor for zero initialization would be look like (if you ignore some test update failures): https://gist.github.com/nikic/04876ffe96a5ee2ffb4be102cbbb8d60

RKSimon accepted this revision.Mar 29 2021, 3:00 AM

LGTM

This revision is now accepted and ready to land.Mar 29 2021, 3:00 AM
This revision was automatically updated to reflect the committed changes.