Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7.
Details
Diff Detail
Event Timeline
| test/CodeGen/AArch64/arm64-platform-reg.ll | ||
|---|---|---|
| 16 | It would probably be good to test that multiple/all of these options work together as well. | |
x1-x7 are argument registers in the calling convention; what's supposed to happen if there's a call in the code?
I can see two possibilities:
- We emit an error.
- We change the calling convention so it doesn't use the reserved register.
Your patch implements neither of these choices; instead, it will just miscompile.
Changing the calling convention seems like it would be potentially surprising, so I would think that emitting an error is the preferred case here. This feature is being used in areas of the Linux kernel already, and is mostly intended for experts to fine-tune their own calling conventions in some performance-sensitive areas.
I believe this is aiming to implement the -ffixed-reg command line option from gcc.
The documentation is at https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html:
-ffixed-reg
Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role).reg must be the name of a register. The register names accepted are machine-specific and are defined in the REGISTER_NAMES macro in the machine description macro file.
This flag does not have a negative form, because it specifies a three-way choice.
A few thoughts:
- From the documentation and a few related StackOverflow questions I get the impression that in the gcc implementation, the programmer is supposed to understand what he's doing when using this option. Not sure if gcc provides any error messages or in exactly which circumstances it will error. My hand wavy feel is that it'd be nice to at least warn when something seems incorrect - but given this is a command line option that seems to be used by "users who know best", the compiler giving an error may prevent the user from doing something he understands and needs to be done and cannot be achieved in any other way?
- It seems this command line option was discussed years ago too: http://lists.llvm.org/pipermail/llvm-dev/2012-October/054033.html. I think Chris's point about LTO compilation may be especially relevant. Shouldn't this be implemented as a function attribute to make sure this won't break in LTO builds?
- I assume we'll end up supporting this feature for multiple/all targets. A target-independent way to support this would be nice. My guess is that incrementally introducing this with only AArch64 support to start with would be fine, as long as there is an idea of what the path forward is to make this feature available for all targets.
- How hard would it be to generalize this to be able to specify reserving any register, not just the X and (implicitly) W registers, automatically derived from the register info available in lib/Target/AArch64/AArch64RegisterInfo.td?
Shouldn't this be implemented as a function attribute to make sure this won't break in LTO builds?
You can express target features as function attributes: `"target-features"="+reserve-x1" etc. Maybe worth adding a testcase to show that works.
the compiler giving an error may prevent the user from doing something he understands and needs to be done and cannot be achieved in any other way
I'm mostly worried we'll end up in situations where the option appears to work, but then miscompiles with a newer compiler, or different optimization options. For example, the code you explicitly wrote doesn't pass arguments in x7, but argpromotion increases the number of arguments to a function. Or your code doesn't pass arguments in x2, but -Os makes a struct assignment lower to a call to memcpy instead of expanding the copy inline. Or a user tries to reserve x16, we outline some code, and the linker inserts a stub that clobbers x16.
the compiler giving an error may prevent the user from doing something he understands and needs to be done and cannot be achieved in any other way
I'm mostly worried we'll end up in situations where the option appears to work, but then miscompiles with a newer compiler, or different optimization options. For example, the code you explicitly wrote doesn't pass arguments in x7, but argpromotion increases the number of arguments to a function. Or your code doesn't pass arguments in x2, but -Os makes a struct assignment lower to a call to memcpy instead of expanding the copy inline. Or a user tries to reserve x16, we outline some code, and the linker inserts a stub that clobbers x16.
That makes a lot of sense to me. I'm not sure how easy it would be to implement such an error/warning/diagnostic without needing to add some custom logic to a lot of passes. Did you happen to have any thoughts on what a feasible way might be to implement that?
For call argument registers, we only assign arguments to registers in one place, so it's easy to emit an error message if the IR contains a call which would require a reserved register. The tricky part would be avoiding spurious error messages. There are two potential sources of spurious error messages: calls to builtin functions (compiler-rt builtins/memcpy/etc.), and IPO passes which rewrite the calling convention.
For IPO passes, we could disallow rewriting the signature of functions using the C calling convention (so only fastcc functions would get rewritten), and change the calling convention so fastcc functions never use reserved registers. Only one problem with this: fastcc calls where the caller/callee have different target attributes currently misbehave (see also https://bugs.llvm.org/show_bug.cgi?id=37358 .)
For the builtin functions, not sure what the right solution would be; it gets awkward fast. If we're expecting that users will only use these for small amounts of code, we might be able to get away with just avoiding calls to builtins in common cases.
In practice, for the kernel's use-case, I think it's unlikely it would actually trigger either of these issues...? But hard to be sure.
(It's basically impossible to reliably reserve x16 or x17 given the linker constraints.)
Looking into it a bit more, I'm not sure -ffixed-x1 is actually what the kernel wants. They don't actually need to reserve the registers, just make them callee-save. clang has an attribute which can do this in a much more targeted way; see https://clang.llvm.org/docs/AttributeReference.html#preserve-all-clang-preserve-all-clang-preserve-all .
@efriedma Maybe not relevant to the patch here but our kernel devs were looking into preserve_all but it does not seem to work for AArch64.
$ cat test.c 
void attribute((preserve_all)) foo(void *ptr) { }
$ clang -c test.c -> compiles for x86_64
$ clang -c -target aarch64-unknown-linux-gnu
fatal error: error in backend: Unsupported calling convention.
I'm generally opposed to supporting the -ffixed- -fcall-used- and -fcall-saved- options from GCC. I think they're almost never the correct answer to a problem someone has.
In particular, here, it does seem that a calling convention annotation on the functions would be a significantly better way of spelling this.
Ugh, nevermind -- I take that back. *Old* clang didn't complain about it, new clang does. Haven't checked, maybe it was ignoring it before or something. :(
Anyhow, IMO, we should make that work, and not do this.
It does seem like the kernel only needs registers to be callee-saved for a small number of functions. I agree that calling convention annotation is a nicer solution. 
I'll investigate further if we can support CONFIG_ARM64_LSE_ATOMICS with function attributes instead of -ffixed-, etc.
Making registers callee-saved for CONFIG_ARM64_LSE_ATOMICS regresses kernel performance since it results in extra save/restores of argument registers. So for performance it's preferable to use -ffixed-x[1-7].
GCC implementation of -ffixed does not alter the calling convention, i.e. function calls use x0-x7 as argument registers regardless of -ffixed- flags. The flags prevent allocation of those registers. Linux kernel code relies on this behavior, e.g. https://github.com/torvalds/linux/blob/master/arch/arm64/lib/Makefile#L14. atomic_ll_sc.o is built with -ffixed-x[1-7], but its callers are not.
IIUC, miscompilation can happen if caller and callee have different sets of reserved registers. In this case we should emit a warning (not an error) since the user is expected handle such cases, e.g. kernel code https://github.com/torvalds/linux/blob/master/arch/arm64/include/asm/atomic_ll_sc.h
As efriedma pointed out, clang can (already does?) express -ffixed- flags as function attributes `"target-features"="+reserve-x1". So it should be possible to issue appropriate warnings during link time.
Kernel maintainers prefer if clang supported -ffixed flags for theirs use case. https://www.spinics.net/lists/arm-kernel/msg671434.html
Is it reasonable to implement -ffixed-x[0-7] flags for argument registers without changing calling convention and emitting warnings when caller/callee have different set of reserved registers? Afterwards, we could generalize to other aarch64 GPRs. Then we could implement a more general attribute, which I imagine shouldn't be too difficult since IR already has a representation for reserved registers.
IIUC, miscompilation can happen if caller and callee have different sets of reserved registers.
This is not the problem, at all. The problem is simply ensuring that the option does something sane. gcc effectively ignores the option in some cases. For example, consider the following with -ffixed-x1:
struct S { int x[100]; };
struct S x, y;
void f() { x=y; }gcc generates the instruction "add x1, x3, :lo12:y", which violates the request to reserve x1. With the current version of the patch, LLVM will do something similar.
So we have the following options:
- Copy gcc's behavior, ignore that it's broken, and wait for someone to file a bug when a new version of clang breaks the Linux kernel.
- Come up with some way to reliably avoid situations like the testcase. Maybe some combination of error messages in situations where the compiler would generate broken code, generating inline implementations instead of libcalls, and avoiding certain optimizations. This is complicated, and I'm not sure it's really worth the effort when exactly one file in the whole world would use the option.
- Don't support -ffixed-x1, and convince the kernel maintainers not to require it.
Emit a warning during call lowering if user requests to reserve an argument register and a function call is made. 
And added test cases.
- Copy gcc's behavior, ignore that it's broken, and wait for someone to file a bug when a new version of clang breaks the Linux kernel.
Current patch will emit a warning during call lowering if user requests to reserve an argument register and a function call is made. This should hopefully cover calls to builtins (like in your example). But as you pointed out there are probably other places where reserved registers can be spuriously used. I'm not opposed to continuously supporting these flags to build Linux kernel.
| lib/Target/AArch64/AArch64RegisterInfo.cpp | ||
|---|---|---|
| 154 | LLVM library code isn't allowed to write directly to stderr; you have to go through LLVMContext::diagnose. If you're going to put user data into an error message, you have to escape it; we don't want to send terminal escape codes stderr. Fix that, and make it an error, and I guess I'm fine with this, assuming it's enough to handle the kernel usage. It's still ugly, but given code that doesn't make any calls, argument registers aren't special, so it shouldn't miscompile. | |
| lib/Target/AArch64/AArch64RegisterInfo.cpp | ||
|---|---|---|
| 154 | As far as -ffixed-x# flags are concerned, this should be enough to support kernel usage. | |
| lib/Target/AArch64/AArch64.td | ||
|---|---|---|
| 75–78 | Can you support reserving all of the registers instead of a subset? | |
| lib/Target/AArch64/AArch64.td | ||
|---|---|---|
| 75–78 | We probably can't reliably reserve some registers, e.g. x16, x17. And we would need more error handling for special usages of x19, x29 (maybe more). But I'd like to keep this change down to x1-7 since those are the ones that will actually be used. | |
It might be good to get one more signoff before merging. Maybe with someone with more experience than me.
Needs a test showing we correctly print the error message for code with calls (with fastisel, globalisel, and sdag isel, for plain calls and runtime library calls like memcpy).
| lib/Target/AArch64/AArch64.td | ||
|---|---|---|
| 75–78 | I've been told that there are more projects than just the linux kernel using -ffixed-reg command line options. That being said, I'm happy for this functionality to be added incrementally, for now just focussing on what happens to be needed for the linux kernel. As long as it's clear that at some point in the future this feature may need to be extended. | |
Added test cases to make sure that correct errors are emitted with fastisel, globalisel, and sdag isel for plain calls and runtime library calls like memcpy.
| lib/Target/AArch64/AArch64.td | ||
|---|---|---|
| 75–78 | Ack | |
| lib/Target/AArch64/AArch64.td | ||
|---|---|---|
| 75–78 | We'd like to use this option as well for Fuchsia but we need the ability to reserve other registers beside x1-7 so as is this change is insufficient for us. | |
| lib/Target/AArch64/AArch64.td | ||
|---|---|---|
| 75–78 | If you have a need, feel free to post a followup patch. Each register needs to be evaluated separately to make sure reserving it actually works. | |
Sorry, I had added you as author via:
$ git commit --amend --author="Tri Vo <trong@google.com>"
before
$ git svn dcommit
but it seems git-svn changed the authorship back to me. It seems it's a common convention to add:
Patch By: <author>
to the commit message to work around this limitation when committing on others behalf. I'm trying to see now if I can revert, then re-land with this note in the commit.
Since you're already making this change, can we support reserving all 32 registers?