Page MenuHomePhabricator

[ARM][BFloat] Legalize bf16 type even without fullfp16.

Authored by simon_tatham on Jun 23 2020, 5:36 AM.



This change permits scalar bfloats to be loaded, stored, moved and
used as function call arguments and return values, whenever the bf16
feature is supported by the subtarget.

Previously that was only supported in the presence of the fullfp16
feature, because the code generation strategy depended on instructions
from that extension. This change adds alternative code generation
strategies so that those operations can be done even without fullfp16.

The strategy for loads and stores is to replace VLDRH/VSTRH with
integer LDRH/STRH plus a move between register classes. I've written
isel patterns for those, conditional on not having the fullfp16
feature (so that in the fullfp16 case, the existing patterns will
still be used).

For function arguments and returns, instead of writing isel patterns
to match VMOVhr and VMOVrh, I've avoided generating those SDNodes
in the first place, by factoring out the code that constructs them
into helper functions MoveToHPR and MoveFromHPR which have a
fallback for non-fullfp16 subtargets.

The current output code is not especially pretty: in the new test file
you can see unnecessary store/load pairs implementing no-op bitcasts,
and lots of pointless moves back and forth between FP registers and
GPRs. But it at least works, which is an improvement on the previous

Diff Detail

Event Timeline

simon_tatham created this revision.Jun 23 2020, 5:36 AM

Hi Simon, thanks for working on this. Looks good overall. A few remarks inline.


Shouldn't this and the following 4 lines be contitional to Subtarget->hasBF16() ?


Can you either define a new pattern with this predicate, or delete the FPRegs16Pat pattern and use the same syntax for uniformity? Just a personal preference.

595 ↗(On Diff #272692)

Why do we need this block of changes?

simon_tatham marked 4 inline comments as done.
simon_tatham edited the summary of this revision. (Show Details)
simon_tatham removed a reviewer: labrinea.
simon_tatham removed a subscriber: labrinea.

New version addressing review comments.


How did I miss that?! Thanks, good catch.

595 ↗(On Diff #272692)

I put it in because otherwise the Thumb invocations of llc in my new test file failed with the 'Unsupported addressing mode' assertion in this function.

However, now I look more closely, that's probably because I was using the wrong instruction – I was using LDRH for both Arm and Thumb, whereas I should have used t2LDRHi12 in Thumb. With that change, I don't seem to need this extra addressing mode support any more.

(But I couldn't tell you why it's not needed, because as far as I can tell LDRH itself still does use addressing mode 3, and is still selected in Arm state.)

(Sorry @labrinea – removing you as a reviewer was an unintentional side effect of a bad arc command.)

labrinea accepted this revision.Jun 23 2020, 7:17 AM

LGTM with a nit. Can you also remove FPRegs16Pat from now that is no longer used?

This revision is now accepted and ready to land.Jun 23 2020, 7:17 AM
dmgreen added inline comments.Jun 23 2020, 8:28 AM

I'm not sure if setAllExpand should be guarded by !hasFullFP16? In either case they would be in pretty much the same situation - most operations are illegal with bf16. Does something go wrong always doing this?


I would expect HasNoFPRegs16 + f16 to never come up? If so you could alternatively move that single pattern back into the instruction above.

It isn't unused – there are still some isel patterns using it, which I haven't touched in this commit.

simon_tatham marked an inline comment as done.Jun 23 2020, 8:35 AM
simon_tatham added inline comments.

I'll give that a try too.

Updated for @dmgreen's review suggestions, which both seem to work as far as I can see.

dmgreen accepted this revision.Jun 23 2020, 9:08 AM

Thanks. If that setAllExpand does start causing problems we can always modify it. LGTM

simon_tatham marked an inline comment as done.Jun 23 2020, 9:09 AM
simon_tatham added inline comments.

(Huh, where did my reply to this comment go? I must have forgotten to press 'Save Draft'.)

I did it that way in order to avoid affecting the hasFullFP16 code path, because that way I was confident I wouldn't break it as a side effect :-) But I've just tried out the version with setAllExpand unconditional, and it seems to work just as well.

This revision was automatically updated to reflect the committed changes.