This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] Fix a corner case of checking for MVE-I with -mfpu=none
ClosedPublic

Authored by chill on Jan 13 2020, 10:46 AM.

Details

Summary

-march=armv8.1-m.main+mve.fp+nomve -mfpu=none should disable FP registers and
instructions moving to/from FP registers.

This patch fixes the case when "+mve" (added to the feature list by "+mve.fp"),
is followed by "-mve" (added by "+nomve").

Diff Detail

Event Timeline

chill created this revision.Jan 13 2020, 10:46 AM

I guess this makes probably sense, but just checking why this should have the effect of enabling MVE-I? Is there prior art -mfpu=none has a similar effect? Has this been synchronised with the GCC community?
Do we need more tests, e.g. target parser unit tests, and some more negative tests fp16 isn't set or hard float abi?

Probably if I was designing these flags from scratch, I wouldn't choose the semantics of "-mfpu=none is *very* similar to -mfloat-abi=soft, only that it should not disable MVE-I.", but I guess we're following gcc here?

clang/lib/Driver/ToolChains/Arch/ARM.cpp
473

If we're targeting a chip with mve support, why is llvm::is_contained(Features, "+mve") false? Shouldn't the ARMTargetParser code that computes the full feature list add it?

chill added a comment.Jan 14 2020, 3:23 AM

I guess this makes probably sense, but just checking why this should have the effect of enabling MVE-I? Is there prior art -mfpu=none has a similar effect?

The overall theme is that MVE-I can be present without an FPU. Thus -mfpu=none should not disable MVE-I, whether explicitly requested by +mve or
implicitly by +mve.fp (MVE-FP requires MVE-I).

Has this been synchronised with the GCC community?

GCC does not have -mfpu=none.

Do we need more tests, e.g. target parser unit tests, and some more negative tests fp16 isn't set or hard float abi?

No, this patch does not change TargetParser, thus by itself does not require extra tests in TargetParser unit tests.
Yeah, I guess I could replicate all the -mfloat-abi=soft tests in arm-mfpu.c, but with -mfpu=none. (Perhaps, it should
have been done as a part of https://reviews.llvm.org/D71843, but this one is the same/follow-up anyway).

Probably if I was designing these flags from scratch, I wouldn't choose the semantics of "-mfpu=none is *very* similar to -mfloat-abi=soft, only that it should not disable MVE-I.", but I guess we're following gcc here?

MVE-I without an FPU is an architectural feature, so it only makes sense leaving MVE-I enabled after -mfpu=none.
We can go one or the other way only while handling -mfloat-abi=soft. I don't think GCC contains these changes yes, but we have agreed to propose the same thing
to GCC and Clang/LLVM.

chill marked an inline comment as done.Jan 14 2020, 3:47 AM
chill added inline comments.
clang/lib/Driver/ToolChains/Arch/ARM.cpp
473

The command line may not mention +mve at all, e.g. -march=armv8.1-m.main+mve.fp -mfpu=none. In that case "mve" feature is derived by the backend, but only if mve.fp is present.
I suppose the target parser could derive +mve from +mve.fp (say, in appendArchExtFeatures).

I'll try that and see if it would come up nicer.

In terms of the user-visible behavior, I guess my question is whether it would make sense to add "-mfpu=mve"/"-mfpu=mve.fp", and make "-mfpu=none" mean "no FP registers". I'm not sure why a user would specify "-mfpu=none" if they didn't want to disable the floating-point registers altogether.

chill marked 2 inline comments as done.Jan 15 2020, 5:52 AM
chill added inline comments.
clang/lib/Driver/ToolChains/Arch/ARM.cpp
473

This parent revision https://reviews.llvm.org/D72762 implements the above idea.

chill planned changes to this revision.Jan 15 2020, 5:52 AM
chill updated this revision to Diff 238316.Jan 15 2020, 10:40 AM
chill retitled this revision from [ARM][MVE] MVE-I should not be disabled by -mfpu=none to [ARM][MVE] Fix a corner case of checking for MVE-I with -mfpu=none.
chill edited the summary of this revision. (Show Details)

In terms of the user-visible behavior, I guess my question is whether it would make sense to add "-mfpu=mve"/"-mfpu=mve.fp", and make "-mfpu=none" mean "no FP registers". I'm not sure why a user would specify "-mfpu=none" if they didn't want to disable the floating-point registers altogether.

I don't find this approach particularly natural and unsurprising. When a user says -mfpu=X, "X" ought to be some FPU designation,
serving as a user-friendly name for a set of architectural features ("FPU means FPU" :D )
We not aiming at providing to users specifically the feature "disabling floating-point registers" as such, by using the -mfpu=... option,
-mfloat-abi=soft does that. Indeed, is was a consequence of disabling the floating-point instructions,
as the only ones, that access FP registers, but that's no longer the case, since the MVE-I (and some VMOVs) can access the FP register file,
without having an FP unit on chip (naturally, "FP-registers" is a misnomer, since they hold integer vectors in that case).

-mfpu=none would make sense in combinations like -mcpu=Foo -mfpu=none, where "Foo" is some future core, that has an FPU.
It is not redundant with respect to -mfloat-abi=soft, as it leaves the door open to passing integer vector arguments via the so-called "FP registers".

It is not redundant with respect to -mfloat-abi=soft, as it leaves the door open to passing integer vector arguments via the so-called "FP registers".

So then "-mfloat-abi=hard -mfpu=none" means "pass floating-point values in registers, but don't use any other floating-point operations"? Does the behavior vary depending on whether the target supports floating-point registers due to MVE, vs. some other "FP" feature?

When a user says -mfpu=X, "X" ought to be some FPU designation, serving as a user-friendly name for a set of architectural features ("FPU means FPU" :D )

We already use -mfpu to enable instructions that aren't actually floating-point operations. For example, -mfpu=neon enables integer vector operations.

While we're on the subject, it's always seemed odd to me in the first place that -mfloat-abi=soft prevents all use of the FP registers! By a literal reading of the option name, it surely ought to only disallow use of the FP registers at function call boundaries. I'd expect it to mean "Make my functions ABI-compatible with code built for a CPU with no FP regs at all, but within that constraint, do whatever will get the job done fastest"; using FP registers and hardware FP operations inside a function ought still to be legitimate, provided you transfer the return value back into the right integer register(s) once you've finished computing it.

chill added a comment.Jan 16 2020, 3:20 AM

It is not redundant with respect to -mfloat-abi=soft, as it leaves the door open to passing integer vector arguments via the so-called "FP registers".

So then "-mfloat-abi=hard -mfpu=none" means "pass floating-point values in registers, but don't use any other floating-point operations"?

No, it does not mean that, at least not yet, and it's not immediately obvious if that would be of any advantage. But it does affect integer vectors.

When a user says -mfpu=X, "X" ought to be some FPU designation, serving as a user-friendly name for a set of architectural features ("FPU means FPU" :D )

We already use -mfpu to enable instructions that aren't actually floating-point operations. For example, -mfpu=neon enables integer vector operations.

Fair enough, but it also enables floating-point operations, which a hypothetical -mfpu=mve would not and should not. Indeed, -mfpu=mve.fp is no worse than
-mfpu=neon, but then I would like to no spread mve and mvp.fp across opions.

tl;dr: IMHO, -march=...+mve+mve.fp is more natural than -mfpu=mve/-mfpu=mve.fp or -march=...+mve -mfpu=mve.fp.

It is not redundant with respect to -mfloat-abi=soft, as it leaves the door open to passing integer vector arguments via the so-called "FP registers".

So then "-mfloat-abi=hard -mfpu=none" means "pass floating-point values in registers, but don't use any other floating-point operations"?

No, it does not mean that, at least not yet, and it's not immediately obvious if that would be of any advantage. But it does affect integer vectors.

There are three alternatives here I can think of:

  1. "-mfloat-abi=hard -mfpu=none" is illegal
  2. "-mfloat-abi=hard -mfpu=none" is legal on targets that support MVE, and passes floating-point/vector values in registers
  3. "-mfloat-abi=hard -mfpu=none" is legal on targets that support MVE, and the ABI varies depending on whether "-mfpu=none" is specified.

(3) seems very unintuitive.

Yes, passing values in float registers when there isn't any arithmetic is slightly strange, but there's precedent: -mfloat-abi=hard -mfpu=vfpv4-sp-d16 passes double-precision values in float registers.

When a user says -mfpu=X, "X" ought to be some FPU designation, serving as a user-friendly name for a set of architectural features ("FPU means FPU" :D )

We already use -mfpu to enable instructions that aren't actually floating-point operations. For example, -mfpu=neon enables integer vector operations.

Fair enough, but it also enables floating-point operations, which a hypothetical -mfpu=mve would not and should not. Indeed, -mfpu=mve.fp is no worse than
-mfpu=neon, but then I would like to no spread mve and mvp.fp across opions.

tl;dr: IMHO, -march=...+mve+mve.fp is more natural than -mfpu=mve/-mfpu=mve.fp or -march=...+mve -mfpu=mve.fp.

You don't need to specify -mfpu at all, if you just want whatever the "-march/-mfloat-abi" implies.

Actually, can we just forbid using -mfpu for MVE targets? It doesn't really seem useful if the only legal value is "-mfpu=none".

chill added a comment.Jan 17 2020, 4:24 AM

It is not redundant with respect to -mfloat-abi=soft, as it leaves the door open to passing integer vector arguments via the so-called "FP registers".

So then "-mfloat-abi=hard -mfpu=none" means "pass floating-point values in registers, but don't use any other floating-point operations"?

No, it does not mean that, at least not yet, and it's not immediately obvious if that would be of any advantage. But it does affect integer vectors.

I'm sorry, I lied here a bit. With regard to scalar floating-point arguments/returns, if FP registers are present, they will be used
when -mfloat-abi=hard is given, regardless of the reason they are present, FPU or MVE, e.g.:

  • -march=armv8.1-m.main -mfpu=none -mfloat-abi=hard: floating-point arguments in GPRs
  • -march=armv8.1-m.main+mve -mfpu=none -mfloat-abi=hard: floating-point arguments in FPRs (with the parent patch)

There are three alternatives here I can think of:

  1. "-mfloat-abi=hard -mfpu=none" is illegal

That should not be illegal. With Armv8.1-M, it's valid to have FP registers and not have FPU/FP-operations.
Currently it's not illegal even with Armv7-a, the -mfloat-abi=hard is just silently ignored. Maybe it warrants a warning ...

  1. "-mfloat-abi=hard -mfpu=none" is legal on targets that support MVE, and passes floating-point/vector values in registers

Yes, it should be legal and -mfloat-abi=hard has an effect.

  1. "-mfloat-abi=hard -mfpu=none" is legal on targets that support MVE, and the ABI varies depending on whether "-mfpu=none" is specified.

(3) seems very unintuitive.

What are the options there (baseline -march=...+mve):

OptionsInteger vectorssingle-precision FP argsdouble-precision FP argsFP operationsComments
-mfpu=none -mfloat-abi=hardFPRFPRFPRNomore registers for passing arguments, need to constantly move to/from FP registers for performing operations
-mfpu=none -mfloat-abi=hardFPRGPRGPRNoless scalar argument registers, no extra/overhead moves between GPRs/FPRs (that's NOT how it works now)
-mfpu=fpv5-sp-d16 -mfloat-abi=hardFPRFPRFPRsingle-precisionmore registers for passing arguments, need to constantly move to/from FP register for performing double-precision operations
-mfpu=fpv5-sp-d16 -mfloat-abi=hardFPRFPRGPRsingle-precisionmore registers for passing arguments, no extra moves no extra/overhead moves between GPRs/FPRs for double-precision operations( (that's NOT how it works now)
-mfpu=fpv5-d16 -mfloat-abi=hard:FPRFPRFPRdouble- and single- precision

Certainly, some of the options may prove to make the wrong tradeoffs, but I don't think any of them stands out as obviously ridiculous.

Yes, passing values in float registers when there isn't any arithmetic is slightly strange, but there's precedent: -mfloat-abi=hard -mfpu=vfpv4-sp-d16 passes double-precision values in float registers.

When a user says -mfpu=X, "X" ought to be some FPU designation, serving as a user-friendly name for a set of architectural features ("FPU means FPU" :D )

We already use -mfpu to enable instructions that aren't actually floating-point operations. For example, -mfpu=neon enables integer vector operations.

Fair enough, but it also enables floating-point operations, which a hypothetical -mfpu=mve would not and should not. Indeed, -mfpu=mve.fp is no worse than
-mfpu=neon, but then I would like to no spread mve and mvp.fp across opions.

tl;dr: IMHO, -march=...+mve+mve.fp is more natural than -mfpu=mve/-mfpu=mve.fp or -march=...+mve -mfpu=mve.fp.

You don't need to specify -mfpu at all, if you just want whatever the "-march/-mfloat-abi" implies.

Actually, can we just forbid using -mfpu for MVE targets? It doesn't really seem useful if the only legal value is "-mfpu=none".

No, it's not the only legal value, an MVE-I target does not imply an FPU at all, and an MVE-FP target does not imply double-precision scalar operations.

I don't like fiddling with the ABI based on CPU features; it's common to mix code with different CPU features enabled (particularly for desktop/mobile CPUs; less so on microcontrollers, but it can still happen). If we want to support some other ABI for performance reasons, we should make the user request it explicitly.

No, it's not the only legal value, an MVE-I target does not imply an FPU at all, and an MVE-FP target does not imply double-precision scalar operations.

Let me see if I understand the potential combinations:

"-march=armv8.1-m.main+mve.fp" allows mve, vfp-single, and mve.fp, but not vfp-double
"-march=armv8.1-m.main+mve" allows mve, but not vfp or mve.fp
"-march=armv8.1-m.main+mve.fp -mfpu=none" allows mve, but not vfp or MVE.fp.
"-march=armv8.1-m.main+mve -mfpu=none" allows mve, but not vfp or MVE.fp.
"-march=armv8.1-m.main+mve.fp -mfpu=fp-armv8" allows mve, MVE.fp, vfp-single, and vfp-double.
"-march=armv8.1-m.main+mve -mfpu=fp-armv8" allows mve and vfp-single, and vfp-double, but not mve.fp.
"-march=armv8.1-m.main+mve.fp -mfpu=neon-fp-armv8" allows mve, mve.fp, vfp-single, vfp-double, and neon?

All of the above allow all vfp mov/load/store instructions; the only way to restrict those on a target with MVE is "-mfloat-abi=soft".

Is this right? Do we have documentation that describes this?

chill added a comment.Jan 20 2020, 6:02 AM

I don't like fiddling with the ABI based on CPU features; it's common to mix code with different CPU features enabled (particularly for desktop/mobile CPUs; less so on microcontrollers, but it can still happen).
If we want to support some other ABI for performance reasons, we should make the user request it explicitly.

Absolutely, we are definitely not proposing a new ABI.

No, it's not the only legal value, an MVE-I target does not imply an FPU at all, and an MVE-FP target does not imply double-precision scalar operations.

Let me see if I understand the potential combinations:

"-march=armv8.1-m.main+mve.fp" allows mve, vfp-single, and mve.fp, but not vfp-double
"-march=armv8.1-m.main+mve" allows mve, but not vfp or mve.fp
"-march=armv8.1-m.main+mve.fp -mfpu=none" allows mve, but not vfp or MVE.fp.
"-march=armv8.1-m.main+mve -mfpu=none" allows mve, but not vfp or MVE.fp.
"-march=armv8.1-m.main+mve.fp -mfpu=fp-armv8" allows mve, MVE.fp, vfp-single, and vfp-double.
"-march=armv8.1-m.main+mve -mfpu=fp-armv8" allows mve and vfp-single, and vfp-double, but not mve.fp.

Yes, except enabling the FPU. That's IMHO the ideal behaviour, but FPU is enabled by default for Armv8.x-M.Mainline (perhaps an oversight, the default differs even between 8.0-M and 8.1-M).
IMHO, FPU should not be enabled by -march=armv8.x-M.Main alone, since FP is an optional extension.

"-march=armv8.1-m.main+mve.fp -mfpu=neon-fp-armv8" allows mve, mve.fp, vfp-single, vfp-double, and neon?

Ideally this ought to give an error, since there's no NEON fort armv8-M.

All of the above allow all vfp mov/load/store instructions; the only way to restrict those on a target with MVE is "-mfloat-abi=soft".

Is this right?

Yes.

Do we have documentation that describes this?

We have the Armv8-M ARM, which describes the dependencies between optional architecture extensions (https://developer.arm.com/docs/ddi0553/latest).
It also documents moves to/from "FP"-registers, as requiring "Armv8-M Floating-point Extension only or MVE"

Okay, the combinations make sense, then.

It would be nice to add some documentation on the clang side; we currently have no documentation at all for -march or -mfpu in https://clang.llvm.org/docs/ .

chill added a comment.Jan 27 2020, 3:26 AM

I've a added a few words of documentation here: https://reviews.llvm.org/D73459

efriedma added inline comments.Jan 27 2020, 4:36 PM
clang/lib/Driver/ToolChains/Arch/ARM.cpp
282

How many times can "+mve"/"-mve" appear in the list? If more than once, do you need to use rfind?

"const auto&" doesn't really make sense here; you can just use "auto".

chill updated this revision to Diff 240836.Jan 28 2020, 4:19 AM
chill marked an inline comment as done.
This revision is now accepted and ready to land.Feb 10 2020, 6:41 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2020, 4:04 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript