This is an archive of the discontinued LLVM Phabricator instance.

Make -mgeneral-regs-only more like GCC's
Needs ReviewPublic

Authored by george.burgess.iv on Oct 2 2017, 2:50 PM.

Details

Summary

(Copy/pasting the reviewer list from D26856.)

Addresses https://bugs.llvm.org/show_bug.cgi?id=30792 .

In GCC, -mgeneral-regs-only emits errors upon trying to emit floating-point or
vector operations that originate from C/C++ (but not inline assembly).
Currently, our behavior is to accept those, but we proceed to "try to form
some new horribly unsupported soft-float ABI."

Additionally, the way that we disable vector/FP ops causes us to crash when
inline assembly uses any vector/FP operations, which is bad.

This patch attempts to address these by:

  • making -mgeneral-regs-only behave exactly like -mno-implicit-float to the backend, which lets inline assembly use FP regs/operations as much as it wants, and
  • emitting errors for any floating-point expressions/operations we encounter in the frontend.

The latter is the more interesting bit. We want to allow declarations with
floats/vectors as much as possible, but the moment that we actually
try to use a float/vector, we should diagnose it. In less words:

float f(float a); // OK
int j = f(1); // Not OK on two points: returns a float, takes a float
float f(float a) {  // Not OK: defines a function that takes a float and returns
                    // a float
  return 0; // Not OK: 0 is converted to a float.
}

A trivial implementation of this leaves us with a terrible diagnostic
experience (e.g.

int r() {
  int i = 0, j = 0;
  return 1.0 + i + j;
}

emits many, many diagnostics about implicit float casts, floating adds, etc.
Other kinds of diagnostics, like diagnosing default args, are also very
low-quality), so the majority of this patch is an attempt to handle common
cases more gracefully.

Since the target audience for this is presumably very small, and the cost of not
emitting a diagnostic when we should is *a lot* of debugging, I erred on the
side of simplicity for a lot of this. I think this patch does a reasonably good
job of offering targeted error messages in the majority of cases.

There are a few cases where we'll allow floating-point/vector values to
conceptually be used:

int i = 1.0 + 1; // OK: guaranteed to fold to an int

float foo();
int bar(int i = foo()); // OK: just a decl.

int baz() {
  int a = bar(1); // OK: we never actually call foo().
  int b = bar(); // Error: calling foo().
  return a + b;
}

struct A { float b; };

void qux(struct A *a, struct A *b) {
  // OK: we've been codegening @llvm.memcpys for this seemingly since 2012.
  // For the moment, this bit is C-only, and very constrained (e.g. assignments
  // only, rhs must trivially be an lvalue, ...).
  *a = *b;
}

The vibe I got from the bug is that the soft-float incantations we currently
emit when using -mgeneral-regs-only are basically unused, so I'm unsure
if we want a flag/option that lets users flip back to the current
-mgeneral-regs-only behavior. This patch lacks that feature, but I'm happy
to add it if people believe doing so would be valuable.

One final note: this may seem like a problem better solved in CodeGen.
I avoided doing so because that approach had a few downsides:

  • how we codegen any expression that might have FP becomes observable,
  • checks for this become spread out across many, many places, making it really easy to miss a case/forget to add it in a new place (see the above "few users, bugs are expensive" point), and
  • it seems really difficult to "look upwards" in CodeGen to pattern match these into nicer diagnostics, especially in the case of default arguments, etc.

Diff Detail

Event Timeline

davide added a subscriber: davide.Oct 2 2017, 3:12 PM
srhines added a subscriber: srhines.Oct 2 2017, 3:16 PM

As far as I can see, there are three significant issues with the current -mgeneral-regs-only:

  1. We don't correctly ignore inline asm clobbers for registers which aren't allocatable (https://bugs.llvm.org/show_bug.cgi?id=30792)
  2. We don't diagnose calls which need vector registers according to the C calling convention.
  3. We don't diagnose operations which have to be lowered to libcalls which need vector registers according to the C calling convention (fptosi, @llvm.sin.*, etc.).

All three of these could be addressesed in the AArch64 backend in a straightforward manner.

Diagnosing floating-point operations in Sema in addition to whatever backend fixes we might want is fine, I guess, but I don't really like making "-mgeneral-regs-only" into "-mno-implicit-float" plus some diagnostics; other frontends don't benefit from this checking, and using no-implicit-float is asking for an obscure miscompile if IR generation or an optimization accidentally produces a floating-point value.

I like the idea of fixing those things, too; I'll start poking them soon. :)

Even if we do end up fixing all of that, I still think it would be good to try to diagnose this in the frontend. So, if anyone has comments on this while I'm staring at the aarch64 backend, please let me know.

test/CodeGen/aarch64-mgeneral_regs_only.c
2 ↗(On Diff #117418)

Oops, forgot a RUN: here.

rengolin edited edge metadata.Oct 4 2017, 5:41 AM
  1. We don't correctly ignore inline asm clobbers for registers which aren't allocatable (https://bugs.llvm.org/show_bug.cgi?id=30792)

This looks like a different (but related) issue. That should be fixed in the back-end, regardless of the new error messages.

  1. We don't diagnose calls which need vector registers according to the C calling convention.

The function that checks for it returns true for vectors (lib/Sema/SemaExprCXX.cpp:7487). However, the tests cover floating point, but they don't cover vector calls (arm_neon.h).

  1. We don't diagnose operations which have to be lowered to libcalls which need vector registers according to the C calling convention (fptosi, @llvm.sin.*, etc.).

Yup. That's bad.

All three of these could be addressesed in the AArch64 backend in a straightforward manner.

I worry about declarations and front-end optimisations, which may move the line info to a completely different place (where it's used, for example). But I have no basis for that worry. :)

Diagnosing floating-point operations in Sema in addition to whatever backend fixes we might want is fine, I guess, but I don't really like making "-mgeneral-regs-only" into "-mno-implicit-float" plus some diagnostics; other frontends don't benefit from this checking, and using no-implicit-float is asking for an obscure miscompile if IR generation or an optimization accidentally produces a floating-point value.

I agree with both statements. But it would be good to have the errors in Sema in addition to the back-end fixes, if there are cases where Clang's diagnostics is more precise on line info and carat position.

lib/Sema/SemaExprCXX.cpp
7477 ↗(On Diff #117418)

The TypeCache object seems local.

It doesn't look like it needs to survive outside of this function, as per its usage and the comment:

// We may see recursive types in broken code.

and it just adds another argument passing.

george.burgess.iv marked 2 inline comments as done.

Addressed feedback. Thanks!

After looking around and internalizing a little bit of how backends in LLVM work, the path forward I have in mind is to basically funnel another bit to the backend that indicates whether the user specified -mgeneral-regs-only. If so, that flag will act like "-crypto,-fp-armv8,-neon" when we're setting up the AArch64 backend (e.g. floats/vectors will still be illegal, vector regclasses won't be added, ...). Importantly, this approach wouldn't actually remove the aforementioned feature flags, so I think it'll allow our assembler to handle non-general ops without any extra effort. We can also query this bit to see if we want to diagnose calls where the ABI mandates that vector regs are used. Thoughts welcome :)

However, the tests cover floating point, but they don't cover vector calls (arm_neon.h).

#include <arm_neon.h> gives me ~12,000 errors, presumably because there are so many functions that take vectors/floats defined in it. The hope was that calling bar and foo in aarch64-mgeneral_regs_only.c would cover similar cases when the banned type was a vector. Is there another another case you had in mind?

lib/Sema/SemaExprCXX.cpp
7477 ↗(On Diff #117418)

Good point. I just ended up making this conservative in the face of broken code; should work just as well, and we don't have to deal with sketchy corner cases. :)

void added a subscriber: void.Apr 6 2018, 10:57 AM

Hallo! I was wondering what the status of this patch was. :-)

Hi! It fell off my radar, but I'm happy to put it back on my queue. :)

There's still a few aarch64-specific backend bits I need to fix before this patch should go in.

parched removed a subscriber: parched.Aug 11 2018, 5:48 AM
parched added a subscriber: parched.

Bump, this is still listed as a TODO in the Linux kernel that works around the issue.

phosek added a subscriber: phosek.May 17 2019, 7:55 PM

We (Fuchsia) would like to see this landed as well so we can start using this in our kernel.

void added a comment.May 19 2019, 2:44 AM

We (Fuchsia) would like to see this landed as well so we can start using this in our kernel.

I get the feeling that this patch has been abandoned by the author. Would someone like to resurrect it?

I'm happy to give rebasing it a shot later this week. My recollection of the prior state of this patch was that we wanted some backend work done to double-check that no illegal ops get generated by optimizations and such, since these checks are purely done in the frontend. I don't foresee myself having time in the near future to make that happen, so is that something that we want to continue to block this patch on? If so, then someone else is probably going to need to do that piece. Otherwise, I think people were happy enough with this patch as-is?

We don't necessarily need to block the clang changes on the backend error reporting actually being implemented, I guess, if the architecture we want is settled.

With this patch, do we pass the general-regs-only attribute to the backend? If so, would that be the attribute we'd want to check to emit errors from the backend from any "accidental" floating-point operations?

clang/lib/Sema/SemaExprCXX.cpp
7921

Do you really want to enforce isStruct() here? That's types declared with the keyword "struct".

7938

Do we have to be concerned about base classes here?

efriedma added inline comments.May 21 2019, 12:23 PM
clang/lib/Sema/SemaExprCXX.cpp
8032

We don't always lower struct copies to memcpy(); I'm not sure this is safe.

8084

Just because we can constant-fold an expression, doesn't mean we will, especially at -O0.

void added inline comments.May 22 2019, 11:25 AM
clang/include/clang/Basic/LangOptions.def
145

Everywhere else you use "general regs only" instead of "ops". Should that be done here?

george.burgess.iv marked 10 inline comments as done.

Addressed feedback, modulo the constant foldable comment thread.

Thanks for the feedback!

With this patch, do we pass the general-regs-only attribute to the backend? If so, would that be the attribute we'd want to check to emit errors from the backend from any "accidental" floating-point operations?

Yeah, the current design is for us to pass +general-regs-only as a target 'feature' per function. Given that there's no code to actually handle that at the moment, I've put a FIXME in its place. Please let me know if there's a better way to go about this.

clang/include/clang/Basic/LangOptions.def
145

Yeah, I'm not sure why I named it Ops. Fixed

clang/lib/Sema/SemaExprCXX.cpp
7921

Good catch -- generalized this.

7938

Yup. Added tests for this, too

8032

I see; removed. If this check ends up being important (it doesn't seem to be in local builds), we can revisit. :)

8084

Are there any guarantees that we offer along these lines? The code in particular that this cares about boils down to a bunch of integer literals doing mixed math with FP literals, all of which gets casted to an int. Conceptually, it seems silly to me to emit an addition for something as straightforward as int i = 1 + 2.0;, even at -O0, though I totally agree that you're right, and codegen like this is reasonable at -O0: https://godbolt.org/z/NS0L17

(This also brings up a good point: this visitor probably shouldn't be run on IsConstexpr expressions; fixed that later on)

efriedma added inline comments.May 23 2019, 4:42 PM
clang/lib/Sema/SemaExprCXX.cpp
8084

On trunk, we now have the notion of a ConstantExpr; this represents an expression which the language guarantees must be constant-evaluated. For example, initializers for static variables in C are always constant-evaluated.

(On a related note, now that we have ConstantExpr, the IsConstexpr operand to ActOnFinishFullExpr probably isn't necessary.)

Beyond that, no, we don't really have any guarantees. We may or may not try to constant-evaluate an expression, depending on whether we think it'll save compile-time. For example, we try to fold branch conditions to avoid emitting the guarded block of code, but we don't try to fold the initialization of an arbitrary variable.

I don't think we want to introduce any additional guarantees here, if we can avoid it.

Something I ran into when reviewing https://reviews.llvm.org/D62639 is that on AArch64, for varargs functions, we emit floating-point stores when noimplicitfloat is specified. That seems fine for -mno-implicit-float, but maybe not for -mgeneral-regs-only?

george.burgess.iv marked 4 inline comments as done.
george.burgess.iv added a reviewer: efriedma.

Chatted with Eli offline; updated here to reflect the conclusions of that.

Importantly, this patch readds some of the peepholes we try to not diagnose, since the target users of this quite commonly do things that, after macro expansion, fold into e.g., (int)(3.0 + 1). By wrapping these into ConstantExprs at the cast point, we get our nice guaranteed lowering to 0 FP/vector ops in IR.

Similarly for struct assignment, I couldn't find a way to get an assignment of a struct of multiple fields to turn into a not-memcpy, so it seems safe to me to keep that around. I have tests to this effect, and am happy to add more if people can think of cases these tests may not adequately cover.

Apologies for the latency of my updates.

Something I ran into when reviewing https://reviews.llvm.org/D62639 is that on AArch64, for varargs functions, we emit floating-point stores when noimplicitfloat is specified. That seems fine for -mno-implicit-float, but maybe not for -mgeneral-regs-only?

Interesting. Yeah, -mgeneral-regs-only definitely doesn't want to use FP for varargs calls. As discussed offline, this doesn't appear to be an issue today, and is something we can look into in the future.

clang/lib/Sema/SemaExprCXX.cpp
8032

(readded as noted)

8084

Resolving per offline discussion; anything wrapped in ConstantExpr is no longer checked, and we no longer try to constant evaluate anything else.

On a related note, now that we have ConstantExpr, the IsConstexpr operand to ActOnFinishFullExpr probably isn't necessary

Looks like the lack of IsConstexpr fails to save us from one case:

constexpr float x = 1.;

In this context, all we have to work with from this point is an IsConstexpr IntegerLiteral

Does this have any significant impact on -fsyntax-only performance?

Hopefully @rsmith can take a quick look at the use of ConstantExpr here; I think it's fine, but we don't use ConstantExpr like that elsewhere.

clang/lib/Driver/ToolChains/Arch/AArch64.cpp
190–191

Do we need to get rid of the "-neon" etc.?

193

Can we add the feature now, even if it isn't backed by any actual code in the backend yet?

clang/lib/Sema/SemaExprCXX.cpp
8003

Should this be E->isRValue()? Not that the difference really matters for C, but it affects rvalue references in C++.

george.burgess.iv marked 6 inline comments as done.

Addressed feedback

Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2019, 5:31 PM

Does this have any significant impact on -fsyntax-only performance?

I'm sure there are pathological cases where this hurts perf, but my intuition tells me that we won't get bitten badly by any of them in the real world. It should be a branch per cast + full expr for people who don't use it. For those who do, we're walking 2 types for each cast (plus maybe constexpr evaluations for casts from FP/vec to non-FP/vec values), and walking the types for each FullExpr.

Again, you can likely craft code that makes this expensive (which we can likely fix with strategically-placed caches), but being bitten by that in practice seems somewhat unlikely to me. Happy to try and add caching and/or take numbers with caches if you'd like.

To evaluate the build time impact of this, I did 20 builds of aarch64 Linux with/without this patch. Of the metrics I tracked (memory, system/user/wall time), all reported differences were < their stdev except for user time. User time regression with this patch was 0.37%, with a stdev of 0.12%. Happy to try to gather -fsyntax-only numbers if there's a simple way to do that.

clang/lib/Driver/ToolChains/Arch/AArch64.cpp
190–191

Yeah, good call. Looks like I forgot a "RUN:" in my tests checking for whether or not asm works, so the thing that should've caught that wasn't enabled. :)

clang/lib/Sema/SemaExprCXX.cpp
8003

Yeah, flipped to that and added tests -- thanks

srjek added a subscriber: srjek.Jan 10 2021, 9:38 AM
ychen added a subscriber: ychen.Feb 4 2021, 6:08 PM