If a return value is explicitly rounded to 64 bits, an additional
zext instruction is emitted, and in some cases it prevents tail call
optimization.
As discussed in D100225, this rounding is not necessary and can be
disabled.
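As a rough illustration of the kind of source code affected, the sketch below returns a 3-byte aggregate through a call in tail position. The names `Small`, `make_small` and `forward_small` are hypothetical and do not come from the patch or its tests. The IR Clang actually emits for them depends on the target and the compiler version, so no IR is claimed here.

```cpp
#include <cstdint>

// Hypothetical 3-byte aggregate: small enough to be returned in a register.
struct Small {
    std::uint8_t a, b, c;
};
static_assert(sizeof(Small) == 3, "example assumes a 24-bit aggregate");

// Hypothetical callee. In a realistic case it would live in another
// translation unit, so the call below could not simply be inlined.
Small make_small() { return Small{1, 2, 3}; }

// The call is in tail position. If the frontend widens the returned value to
// 64 bits, extra conversion instructions sit between the call and the return,
// which is the situation the summary describes.
Small forward_small() { return make_small(); }
```

A 24-bit aggregate is chosen because it is not a legal integer width, which is the sort of oddly sized return type the review discusses.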
Differential D100591
[Clang][AArch64] Disable rounding of return values for AArch64
Closed, Public. Authored by asavonic on Apr 15 2021, 12:05 PM.
Event Timeline
Herald added subscribers: mstorsjo, danielkiss, kristof.beyls. (Apr 15 2021, 12:05 PM)
Hmm. I think the right thing to do here is to recognize generally that we're emitting a mandatory tail call, and so suppress *all* the normal transformations on the return value. The conditions on mandatory tail calls should make that possible, and it seems like it would be necessary for a lot of types. Aggregates especially come to mind: if an aggregate is returned in registers, we're probably going to generate code like

    %0 = alloca %struct.foo
    %1 = call {i64,i64} @function()
    %2 = bitcast %0 to {i64,i64}*
    store %1, %2
    %3 = bitcast %0 to {i64,i64}*
    %4 = load %3
    ret %4

(Actually, probably much worse, with a lot of extract_values and so on.) I assume that is going to completely break TCO, and we really need to generate

    %0 = call {i64,i64} @function()
    ret %0

The *only* way we can do that is to recognize that the call has to be done differently in IRGen.

I assume it can be tricky to detect such a call. The final decision (tail call vs normal call) is made before instruction selection, after all LLVM IR optimization passes. So we can miss tail calls that are not obvious in non-optimized code, or get false positives for calls that a backend decides to emit as normal calls. In any case, this patch can be useful not only for tail calls: the trunc + zext sequence generated to round a return value can be problematic in other cases as well.

Well, I mean in the frontend. I certainly wouldn't expect the backend to recognize the pattern I described and somehow turn it into a tail call!
Sure, I can imagine that it's hard to eliminate the extra zext in the backend. Maybe we should have an undef_extend? You should get backend sign-off before making Swift generate non-target-legal return types.

Ping.

On big-endian targets the rounding up to 64 bits (specified in the AAPCS) is significant; it means that structs get passed in the high bits of x0 rather than low. E.g. https://godbolt.org/z/6v36oexsW. I think this patch would break that.
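The big-endian concern can be made concrete with a small model. This is only a sketch, not Clang or backend code: `model_register_value` is a hypothetical helper that copies an aggregate into a zero-filled 8-byte slot and reads the slot as one 64-bit value in the chosen byte order, which is assumed here to approximate what rounding up to 64 bits does to the register contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: place `size` bytes of an aggregate at the start of a
// zero-filled 8-byte slot, then interpret the slot as a 64-bit register
// value using the requested byte order.
std::uint64_t model_register_value(const std::uint8_t* bytes, std::size_t size,
                                   bool big_endian) {
    assert(size <= 8);
    std::uint8_t slot[8] = {};
    for (std::size_t i = 0; i < size; ++i) slot[i] = bytes[i];

    std::uint64_t reg = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Memory byte i lands i bytes above the least significant byte on
        // little-endian, and i bytes below the most significant byte on
        // big-endian.
        unsigned shift = static_cast<unsigned>(big_endian ? 8 * (7 - i) : 8 * i);
        reg |= static_cast<std::uint64_t>(slot[i]) << shift;
    }
    return reg;
}
```

For the bytes `0x11, 0x22, 0x33`, the model yields `0x332211` in little-endian order and `0x1122330000000000` in big-endian order, so in the big-endian case the aggregate occupies the high bits, as the comment above describes.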
Thanks a lot! I've disabled the change for big-endian AArch64 targets.

Thanks for updating it. A little disappointing that we can't support BE first-class, but much more important that it's not broken and it's not actually that common. So I think this is OK now, the backend should cope fine with the oddly sized types.

This revision is now accepted and ready to land. (Apr 27 2021, 6:00 AM)
Closed by commit rGb451ecd86e13: [Clang][AArch64] Disable rounding of return values for AArch64 (authored by asavonic). (May 4 2021, 10:29 AM)
This revision was automatically updated to reflect the committed changes.
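The outcome of the thread can be summarized as a simple rule. The function below is a hypothetical model of that rule, not the code in clang/lib/CodeGen/TargetInfo.cpp. It assumes the change applies to aggregates of at most 64 bits that are returned in a single register, and the name `modeled_return_bits` is invented for illustration.

```cpp
#include <cassert>

// Hypothetical model of the rule discussed in the review: the width, in bits,
// of the integer type used to return a small aggregate.
// Assumption: only aggregates of 1 to 64 bits are considered here.
unsigned modeled_return_bits(unsigned size_bits, bool little_endian) {
    assert(size_bits > 0 && size_bits <= 64);
    if (little_endian) {
        // Rounding disabled: the aggregate keeps its own width, e.g. 24 bits.
        return size_bits;
    }
    // Rounding kept for big-endian, where it determines which end of the
    // register holds the aggregate.
    return 64;
}
```

Under this model a 3-byte aggregate is returned as a 24-bit integer on little-endian targets and still as a 64-bit integer on big-endian targets.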
Revision Contents
Diff 342794
clang/lib/CodeGen/TargetInfo.cpp
clang/test/CodeGen/aarch64-varargs.c
clang/test/CodeGen/arm64-arguments.c
clang/test/CodeGen/arm64-microsoft-arguments.cpp
clang/test/CodeGen/attr-noundef.cpp
clang/test/CodeGenCXX/microsoft-abi-sret-and-byval.cpp
clang/test/CodeGenCXX/trivial_abi.cpp
I'm not sure if i24 here is a problem or not. Let me know if we need to handle this differently.