Page MenuHomePhabricator

[Clang][AArch64] Disable rounding of return values for AArch64
ClosedPublic

Authored by asavonic on Apr 15 2021, 12:05 PM.

Details

Summary

If a return value is explicitly rounded to 64 bits, an additional
zext instruction is emitted, and in some cases it prevents tail call
optimization.

As discussed in D100225, this rounding is not necessary and can be
disabled.

Diff Detail

Event Timeline

asavonic created this revision.Apr 15 2021, 12:05 PM
asavonic requested review of this revision.Apr 15 2021, 12:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2021, 12:05 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
asavonic added inline comments.Apr 15 2021, 12:07 PM
clang/test/CodeGen/arm64-arguments.c
65–76

I'm not sure if i24 here is a problem or not. Let me know if we need to handle this differently.

Hmm. I think the right thing to do here is to recognize generally that we're emitting a mandatory tail call, and so suppress *all* the normal transformations on the return value. The conditions on mandatory tail calls should make that possible, and it seems like it would be necessary for a lot of types. Aggregates especially come to mind — if an aggregate is returned in registers, we're probably going to generate code like

%0 = alloca %struct.foo
%1 = call {i64,i64} @function()
%2 = bitcast %0 to {i64,i64}*
store %1, %2
%3 = bitcast %0 to {i64,i64}*
%4 = load %3
ret %4

(Actually, probably much worse, with a lot of extract_values and so on.) I assume that is going to completely break TCO, and we really need to generate

%0 = call {i64,i64} @function()
ret %0

The *only* way we can do that is to recognize that the call has to be done differently in IRGen.

I think the right thing to do here is to recognize generally that we're emitting a mandatory tail call, and so suppress *all* the normal transformations on the return value.

I assume it can be tricky to detect such call. The final decision (tail call vs normal call) is made before instruction selection, after all LLVM IR optimization passes. So we can miss tail calls that are not obvious on non-optimized code, or get false-positive results for calls that a backend decides to emit as normal calls.

In any case, this patch can be useful not only for tail calls: trunc + zext sequence generated to round a return value can be problematic for other cases as well.

I think the right thing to do here is to recognize generally that we're emitting a mandatory tail call, and so suppress *all* the normal transformations on the return value.

I assume it can be tricky to detect such call. The final decision (tail call vs normal call) is made before instruction selection, after all LLVM IR optimization passes. So we can miss tail calls that are not obvious on non-optimized code, or get false-positive results for calls that a backend decides to emit as normal calls.

Well, I mean in the frontend. I certainly wouldn't expect the backend to recognize the pattern I described and somehow turn it into a tail call!

In any case, this patch can be useful not only for tail calls: trunc + zext sequence generated to round a return value can be problematic for other cases as well.

Sure, I can imagine that it's hard to eliminate the extra zext in the backend. Maybe we should have an undef_extend?

You should get backend sign-off before making Swift generate non-target-legal return types.

Ping.
Please let me know if the patch is acceptable for AArch64, or something else should be done to avoid overhead from rounding of return values.

On big-endian targets the rounding up to 64-bits (specified in the AAPCS) is significant; it means that structs get passed in the high bits of x0 rather than low. E.g. https://godbolt.org/z/6v36oexsW. I think this patch would break that.

On big-endian targets the rounding up to 64-bits (specified in the AAPCS) is significant; it means that structs get passed in the high bits of x0 rather than low. E.g. https://godbolt.org/z/6v36oexsW. I think this patch would break that.

Thanks a lot! I've disabled the change for big-endian AArch64 targets.

asavonic updated this revision to Diff 340517.Apr 26 2021, 7:21 AM

Keep rounding of return values for big-endian targets.

t.p.northover accepted this revision.Apr 27 2021, 6:00 AM

Thanks for updating it. A little disappointing that we can't support BE first-class, but much more important that it's not broken and it's not actually that common. So I think this is OK now, the backend should cope fine with the oddly sized types.

This revision is now accepted and ready to land.Apr 27 2021, 6:00 AM