Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[AArch64] Add instruction selection for strict FP
ClosedPublic

Authored by john.brawn on Dec 2 2021, 5:06 AM.

Details

Summary

This consists of marking the various strict opcodes as legal, and adjusting instruction selection patterns so that 'op' is 'any_op'.

FP16 and vector instruction additionally require some extra work in lowering and legalization, so we can't set IsStrictFPEnabled just yet. Also more work needs to be done for full strict fp support (marking instructions that can raise exceptions as such, and modelling FPCR use for controlling rounding).

Diff Detail

Event Timeline

john.brawn created this revision.Dec 2 2021, 5:06 AM
john.brawn requested review of this revision.Dec 2 2021, 5:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2021, 5:06 AM
kpn added a comment.Dec 2 2021, 10:53 AM

That flag tells front ends that strictfp will generate correct instructions. Don't set it if more work is needed for correct instructions. The support should at least have parity with X86 and SystemZ.

That flag tells front ends that strictfp will generate correct instructions. Don't set it if more work is needed for correct instructions. The support should at least have parity with X86 and SystemZ.

I think IsStrictFPEnabled is just the SelectionDAG flag. The frontend has a separate flag HasStrictFP.

john.brawn planned changes to this revision.Dec 3 2021, 5:24 AM

That flag tells front ends that strictfp will generate correct instructions. Don't set it if more work is needed for correct instructions. The support should at least have parity with X86 and SystemZ.

I think IsStrictFPEnabled is just the SelectionDAG flag. The frontend has a separate flag HasStrictFP.

Yes, without HasStrictFP in the clang TargetInfo the constrained fp instrinsics won't be emitted (unless you use -fexperimental-strict-floating-point).

On an unrelated note, I've noticed that for globalisel (used at -O0 in aarch64) to work I also need to adjust a few things.

If you could split things up a bit whilst you are at it, that might make it easier to get through the review bit at a time too.

llvm/test/CodeGen/AArch64/fp-intrinsics.ll
1–2

Using update_llc_test_checks would be good, I think.

john.brawn updated this revision to Diff 392044.Dec 6 2021, 6:29 AM

It turns out that though global isel is enabled at -O0 it also falls back to non-global isel when it sees things it can't handle, and there's several things unrelated to strict fp (or that are but aren't aarch64-specific) that cause this to happen. I've added the a run line to run the test with -global-isel=true anyway, but with the option that means we get this fallback behaviour. I'll post some follow-on patches later to fix some of these global isel things.

john.brawn added inline comments.Dec 6 2021, 6:40 AM
llvm/test/CodeGen/AArch64/fp-intrinsics.ll
1–2

It doesn't do a good job of handling the slight differences between the non-global-isel and global-isel output. It generates check lines that work for one but fail with the other.

This is still a pretty big patch. Is it possible to break it up into some logically separate parts?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1403

Is it better to have a section like this which is all the "Strict-fp handling", or should all this code be next to the existing operations?

We already have ISD::STRICT_FP_TO_SINT above next to ISD::FP_TO_SINT. And if SVE was added I would expect it to be in the SVE section (not that this is super well laid out). Should the ISD::STRICT_FCOS be next to the ISD::FCOS?

llvm/test/CodeGen/AArch64/fp-intrinsics.ll
1–2

That might be because the order of the check prefixes matter when it generates the checks, from most-general to least general. This file is bound to get updated by someone at some point to run the scripts, it's the only way to keep files like this maintainable, we might as well do it now.

The file is very big though, and looks like it would be better as a few different test files.

This is still a pretty big patch. Is it possible to break it up into some logically separate parts?

The best I've been able to do is break out the parts related to fp16 legalization and lowering out into a separate patch, which I'll have ready later today.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1403

After fiddling about with things a bit it does look like it's probably better to have the handling of OP to be next to STRICT_OP, so I'll do that.

Split off fp16 handling to a separate patch, moved things round so STRICT_OP is grouped with OP.

john.brawn edited the summary of this revision. (Show Details)Dec 10 2021, 9:06 AM
john.brawn set the repository for this revision to rG LLVM Github Monorepo.
john.brawn planned changes to this revision.Dec 13 2021, 8:55 AM

I'm looking at the Clang.CodeGen::aarch64-neon-scalar-x-indexed-elem-constrained.c failure, and it looks like there are other constrained fp tests that are currently xfailed but maybe should actually pass with this patch.

Matt added a subscriber: Matt.Jan 4 2022, 9:24 AM
john.brawn updated this revision to Diff 398167.Jan 7 2022, 8:56 AM
john.brawn edited the summary of this revision. (Show Details)

Added some more addOperationAction lines to fix failures in tests. However looking at the tests there's some XFAILed tests that still fail due to more work needing to be done on handling of vector types. I'll do that in a separate patch.

Ping. I've also uploaded D117795 for fixes related to vector instructions.

Sorry for the delay - these patch is quite big and some of the details in constrained intrinsics can be.. subtle. It makes them difficult to review.

The best I've been able to do is break out the parts related to fp16 legalization and lowering out into a separate patch, which I'll have ready later today.

I'm not sure I see why all these parts need to be in a single patch. Smaller patches are much more preferable for review. Is it because the entire test file needs to be made to work at the same time? That file looks like it's testing too many separate intrinsics.

But I've tried to take a look through. Comments inline

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
655

Was LLROUND of f16 previously Legal by default? So this doesn't alter anything by explicitly marking it?

941–963

The formatting is apparently off here. I used to like the old code because you could do a search for ISD::FROUND and see all the setOperationAction(ISD::FROUND, VT, Legal) lines come up. Alas.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4986

Are optimizations like this desirable (or always valid?) for strict nodes? Same for the loads below. Do we have test cases for them?

6288

Do these make a lot of sense, with a strict fma but a non-strict fneg?

john.brawn planned changes to this revision.Jan 28 2022, 4:03 AM

I'll split off the parts that are purely patterns into separate patches.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
655

Yes, TargetLoweringBase::initActions declares a bunch of operations as Expand for all floating-point types except f16 which makes them Legal by default (possibly a bug in TargetLoweringBase::initActions).

941–963

clang-format is a bit strange about how it likes to format these kinds of for loops, in that it has three different schemes (entries in list closely packed; entries in list vertical-aligned using whitespace; each entry on its own line) that it chooses between, and adding or removing elements can cause the whole thing to be dramatically reformatted. It wants here to use the "each entry on its own line" scheme which defeats the point of using a for loop (making the thing more compact so you can see the "these are the operations that are expanded for v1f64" part on one screen) so I've ignored it.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4986

Anything where there's a one-to-one mapping between a strict selectiondag node and a floating-point instruction is fine (or more generally where we have an instruction sequence that can cause the same floating-point exceptions in the same order and respects rounding modes).

These aren't tested, no. Actually it probably makes sense to move these kinds of patterns into separate patches and add tests for them there, as that can be done separately from the basic isel stuff.

6288

fneg is purely a bit-flipping operation that doesn't have strict/non-strict versions and combining it doesn't change exception/rounding behaviour.

Removed patterns that aren't strictly required for isel, I'll create new patches for those.

Remove non-essential indexed mul/mla patterns that I missed in the last version.

The non-essential patterns are now D118485, D118487, D118489. There were also patterns involving load then scvtf/ucvft where it's not clear where they're tested and I haven't been able to trigger with a quick test, so I may just drop those.

john.brawn updated this revision to Diff 405206.Feb 2 2022, 3:56 AM

Moved a couple of setOperationAction lines from D117795 to here.

Sorry, I thought I had already left this comment from the last time I tried to review these.

Is a f128 fminimum / fmaximum something that should be handled? I don't see any tests for fminimum in general.

Moved a couple of setOperationAction lines from D117795 to here.

New patches are free. Feel free to use them :) The developer policy is quite clear on trying to keep patches small: https://llvm.org/docs/DeveloperPolicy.html#incremental-development.
This is fine for now so long as you answer the question inline though, and from what I can tell this patch appears OK, minus the nitpicking. about fminimum and formatting.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
941–963

I find the loop quite hard to read, to be honest. It is difficult to see which operations the loop is acting on without more clear structure. Maybe it's trying to do too much all at once? We should make sure all new code is clang-formatted. (I know there is still some old code that isn't formatted yet).

But I do acknowledge that I sometimes have opinions about these kind of things that are not shared with the rest of coders. Simpler is better in my opinion, and simpler isn't always smaller.

I think that if we add this, it will either be clang-formatted now or it will be formatting by someone in the future. So best to come up with something that looks OK when formatted. Does splitting the loop into strict and nonstrict help?

1490

FCmp are a pretty common operation. Is it not something we would want to support, without expanding?

Don't we have intrinsics that would expect to become a single vector operation?

Is a f128 fminimum / fmaximum something that should be handled? I don't see any tests for fminimum in general.

Yes, it should, I'll do that and add tests.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
941–963

I had a bit of a look at how clang-format is deciding how to format things here, and it turns out that the decision of how to format things in columns just tries a different numbers of columns, puts items in the order it sees them into columns, then discards any layout with over-long lines. So it won't try having one item span multiple columns, or break a line early when it would be over-long. How it decides between column layout and tightly packed I didn't manage to figure out.

Anyway, the upshot is that rearranging the items slightly means clang-format can successfully find a layout, so I'll be doing that.

1490

Yes, it's something we would want to support, hence the FIXME. And with regards to intrinsics, generating code that's correct is more important than using a single instruction.

john.brawn updated this revision to Diff 406514.Feb 7 2022, 9:55 AM

Adjusted loops so that the formatting matches what clang-format prefers. It turns out that non-strict f128 fmaximum/fminimum doesn't work, so I've added a FIXME comment about that and just added tests for non-f128.

dmgreen added inline comments.Feb 8 2022, 11:18 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
941–963

OK that's good. Clang format can sure be funny at times. Thanks for doing that.

1490

In my experience, FIXME's are things people leave in the code because they have no plan of fixing them. Correct code is of course important, but people also have an expectation of a certain level of performance from the intrinsics.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4033

Is there a test for FNMADD (and FNMSUB)? I don't see it in D118487 either.

4225

None of the test below seem to be Neon. Are they tested somewhere else?

john.brawn added inline comments.Feb 9 2022, 9:26 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1490

Yes, I'm not planning on fixing this. My primary goal here is to have code generation that is valid, and secondarily when we have some existing ISel patterns that apply to non-strict and would also be valid for strict adjust them to work with strict. Vector STRICT_FSETCC/STRICT_FSETCCS are more complex than this, hence leaving them as Expand (which is the default for strict vector operations anyway).

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4225

D117795 adds neon tests.

john.brawn updated this revision to Diff 407189.Feb 9 2022, 9:30 AM

Added tests for fnmadd and fnmsub.

dmgreen accepted this revision.Feb 15 2022, 12:47 AM

OK. From what I can tell, this looks correct.

This revision is now accepted and ready to land.Feb 15 2022, 12:47 AM
This revision was landed with ongoing or failed builds.Feb 17 2022, 5:12 AM
This revision was automatically updated to reflect the committed changes.