This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAGISel] Chain any mayRaiseFPException instruction created from a strict FP node
Needs ReviewPublic

Authored by craig.topper on Jun 7 2022, 2:48 PM.

Details

Summary

Tablegen doesn't set the OFPL_Chain flag for all the mayRaiseFPException
instructions in an isel output pattern. In practice, it only sets it
for the root of the output pattern and only if the pattern in the
instruction class in tablegen is empty or contains a strict FP node.

It's unclear to me that tablegen has enough information to set the
OPFL_Chain flag.

The result of this is that we don't always add a chain through
mayRaiseFPException instructions created during isel.

This patch updates the isel machinery to detect when we're emitting
nodes for a pattern that contained a strict FP node and will thread
a chain through all mayRaiseFPException nodes we create.

Some isel patterns particularly on PowerPC emit the same compare
instruction multiple times in their isel pattern and expects them
to be CSEd. To ensure they CSE, this patch avoids updating InputChain
until the end of the match or when we find an OPFL_Chain flag.

Hopefully fixes PR54617.

Diff Detail

Event Timeline

craig.topper created this revision.Jun 7 2022, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 2:48 PM
craig.topper requested review of this revision.Jun 7 2022, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 2:48 PM
efriedma added inline comments.Jun 7 2022, 2:57 PM
llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
625

What's up with the repeated fcmpo instructions? Are the compares repeated in the input, or does your code to ensure CSE not cover this case?

craig.topper added inline comments.Jun 7 2022, 3:12 PM
llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
625

They exist in the input to isel

Originating from type legalization.

Expand float operand: t14: i1,ch = strict_fsetccs t0, t5, ConstantFP:ppcf128<APFloat(4746794007248502784)>, setlt:ch
                                                                                 
Creating new node: t26: i1,ch = strict_fsetccs t0, t2, ConstantFP:f64<2.147484e+09>, setoeq:ch
Creating new node: t27: i1,ch = strict_fsetccs t26:1, t4, ConstantFP:f64<0.000000e+00>, setlt:ch
Creating new node: t28: i1 = and t26, t27                                        
Creating new node: t30: i1,ch = strict_fsetccs t27:1, t2, ConstantFP:f64<2.147484e+09>, setune:ch
Creating new node: t31: i1,ch = strict_fsetccs t30:1, t2, ConstantFP:f64<2.147484e+09>, setlt:ch
Creating new node: t32: i1 = and t30, t31                                        
Creating new node: t33: i1 = or t32, t2

Would it make this code any simpler to say that nodes that can raise FP exceptions always have a chain, even if the original operation isn't strict? I mean, that restricts scheduling a bit, but not sure we care. Maybe it doesn't help, though...

How many PowerPC patterns need the CSE code? Would it make sense to get rid of the CSE code, and just reimplement the relevant patterns using custom lowering, or something like that?

Besides that, I can't see any way to reduce the complexity here. Maybe extract the "Update InputChain if there are any strict fp nodes" into a helper, since it looks like it's copy-pasted a few times.

How many PowerPC patterns need the CSE code? Would it make sense to get rid of the CSE code, and just reimplement the relevant patterns using custom lowering, or something like that?

It's basically anything that uses CRNotPat in PPCInstrInfo.td. This is because crnot is defined as

def crnot : OutPatFrag<(ops node:$in),
                      (CRNOR $in, $in)>;

which replicates the $in. I think this is quite a few patterns to custom select.

I tried replacing CRNOR $in, $in with CRXOR $in, (CRSET) and rely on PPCDAGToDAGISel::PeepholeCROps to turn it into a CRNOR. This works except at -O0.

Another option could be to add a CRNOT CodeGenOnly unary opcode.

@nemanjai what do you think?

From a SystemZ perspective this looks good to me.

llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll
4301

Not sure why this patch causes the two output registers to be computed in reverse order now, but either order should be fine. (And now the order in vectorized code matches the order in scalar code ...)

How many PowerPC patterns need the CSE code? Would it make sense to get rid of the CSE code, and just reimplement the relevant patterns using custom lowering, or something like that?

It's basically anything that uses CRNotPat in PPCInstrInfo.td. This is because crnot is defined as

def crnot : OutPatFrag<(ops node:$in),
                      (CRNOR $in, $in)>;

which replicates the $in. I think this is quite a few patterns to custom select.

I tried replacing CRNOR $in, $in with CRXOR $in, (CRSET) and rely on PPCDAGToDAGISel::PeepholeCROps to turn it into a CRNOR. This works except at -O0.

Another option could be to add a CRNOT CodeGenOnly unary opcode.

@nemanjai what do you think?

I am really sorry about the delay in responding to this. I think a unary pseudo for CRNOT is a perfectly reasonable solution and is in line with other instructions for which we wanted to avoid duplicating inputs that have chains (such as XXPERMDIs, etc.).

I'll put up a patch to do this ASAP.

I am really sorry about the delay in responding to this. I think a unary pseudo for CRNOT is a perfectly reasonable solution and is in line with other instructions for which we wanted to avoid duplicating inputs that have chains (such as XXPERMDIs, etc.).

I'll put up a patch to do this ASAP.

Posted https://reviews.llvm.org/D133577 that handles this.