This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improve X86 cmpps/cmppd/cmpss/cmpsd intrinsics with strictfp
ClosedPublic

Authored by craig.topper on Jan 17 2020, 1:34 AM.

Details

Summary

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.
Using them will assert. To workaround this I'm emitting the old X86 specific intrinsics that were never removed from the backend when we switched to using fcmp in IR. We have no way to mark them as being strict, but that's true of all target specific intrinsics so doesn't seem like we need to solve that here.

I've also added support for selecting between signaling and quiet.

Still need to support SAE which will require using a target specific
intrinsic. Also need to fix masking to not use an AND instruction
after the compare.

Diff Detail

Event Timeline

craig.topper created this revision.Jan 17 2020, 1:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2020, 1:34 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

What would we lower it to on a target that doesn’t support it natively?

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

What would we lower it to on a target that doesn’t support it natively?

Any supported compare (quiet or signaling as appropriate, just so we get the correct exceptions), and then ignore the result (and use true/false constant result instead)?

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

What would we lower it to on a target that doesn’t support it natively?

Any supported compare (quiet or signaling as appropriate, just so we get the correct exceptions), and then ignore the result (and use true/false constant result instead)?

Sure. Is that something we want to force all targets to have to implement just to handle this case for X86? Unless we can come up with a generic DAG combine to pick a valid condition alternate so that the lowering code for each target doesn't have to deal with it.

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

What would we lower it to on a target that doesn’t support it natively?

Any supported compare (quiet or signaling as appropriate, just so we get the correct exceptions), and then ignore the result (and use true/false constant result instead)?

Sure. Is that something we want to force all targets to have to implement just to handle this case for X86? Unless we can come up with a generic DAG combine to pick a valid condition alternate so that the lowering code for each target doesn't have to deal with it.

Hmm, OK. Given that this X86-specific builtin code seems to be the only place in clang where FCMP_TRUE/FCMP_FALSE can ever get emitted anyway, I guess you might as well implement the workaround (use some other compare to get the exception) right there, just for X86.

The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.

Hmm, maybe they should then? The only reason I didn't add them initially was that I wasn't sure they were useful for anything; if they are, it should be straightforward to add them back.

What would we lower it to on a target that doesn’t support it natively?

Any supported compare (quiet or signaling as appropriate, just so we get the correct exceptions), and then ignore the result (and use true/false constant result instead)?

Sure. Is that something we want to force all targets to have to implement just to handle this case for X86? Unless we can come up with a generic DAG combine to pick a valid condition alternate so that the lowering code for each target doesn't have to deal with it.

Hmm, OK. Given that this X86-specific builtin code seems to be the only place in clang where FCMP_TRUE/FCMP_FALSE can ever get emitted anyway, I guess you might as well implement the workaround (use some other compare to get the exception) right there, just for X86.

I think I'd like to do that as a follow up. This patch as is will at least fix the assertion failure and get the signalling behavior correct for the non- TRUE/FALSE intrinsics. These are going to be far more common to use so fixing those seems like the priority.

craig.topper edited the summary of this revision. (Show Details)

Use x86 specific intrinsics for the true/false case.

andrew.w.kaylor accepted this revision.Jan 29 2020, 1:11 PM

lgtm

I have a couple of comments, but nothing that couldn't be addressed in a later patch.

clang/lib/CodeGen/CGBuiltin.cpp
12363

How hard would it be to generate a select with known safe values ahead of the compare in the constrained case?

clang/test/CodeGen/avx-builtins-constrained.c
170

Does this have the strictfp attribute here? I don't think we do anything with that, but it will likely be useful when we start handling strictfp for target-specific intrinsics.

This revision is now accepted and ready to land.Jan 29 2020, 1:11 PM
craig.topper marked 2 inline comments as done.Jan 29 2020, 3:13 PM
craig.topper added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
12363

Ultimately, I think we need target specific masked intrinsics or generic predicated compare intrinsics. Most of our X86 specific FP intrinsics are not exception correct because we aggressively removed masking when we probably shouldn't have.

clang/test/CodeGen/avx-builtins-constrained.c
170

I don't think it does. We didn't create it with createConstrainedFPCall so I don't think anything would have added it.

This revision was automatically updated to reflect the committed changes.