Page MenuHomePhabricator

[Clang][AArch64] Coerce integer return values through an undef vector
Needs ReviewPublic

Authored by asavonic on Apr 9 2021, 12:59 PM.

Details

Summary

If target ABI requires coercion to a larger type, higher bits of the
resulting value are supposed to be undefined. However, before this
patch Clang CG used to generate a zext instruction to coerce a value
to a larger type, forcing higher bits to zero.

This is problematic in some cases:

struct st {
  int i;
};
struct st foo(i);
struct st bar(int x) {
  return foo(x);
}

For AArch64 Clang generates the following LLVM IR:

define i64 @bar(i32 %x) {
  %call = call i64 @foo(i32 %0)
  %coerce.val.ii = trunc i64 %call to i32
  ;; ... store to alloca and load back
  %coerce.val.ii2 = zext i32 %1 to i64
  ret i64 %coerce.val.ii2
}

Coercion is done with a trunc and a zext. After optimizations we
get the following:

define i64 @bar(i32 %x) local_unnamed_addr #0 {
entry:
  %call = tail call i64 @foo(i32 %x)
  %coerce.val.ii2 = and i64 %call, 4294967295
  ret i64 %coerce.val.ii2
}

The compiler has to keep semantic of the zext instruction, even
though no extension or truncation is required in this case.
This extra and instruction also prevents tail call optimization.

In order to keep information about undefined higher bits, the patch
replaces zext with a sequence of an insertelement and a bitcast:

define i64 @_Z3bari(i32 %x) local_unnamed_addr #0 {
entry:
  %call = tail call i64 @_Z3fooi(i32 %x) #2
  %coerce.val.ii = trunc i64 %call to i32
  %coerce.val.vec = insertelement <2 x i32> undef, i32 %coerce.val.ii, i8 0
  %coerce.val.vec.ii = bitcast <2 x i32> %coerce.val.vec to i64
  ret i64 %coerce.val.vec.ii
}

InstCombiner can then fold this sequence into a nop, and allow tail
call optimization (see D100227).

Diff Detail

Event Timeline

asavonic created this revision.Apr 9 2021, 12:59 PM
asavonic requested review of this revision.Apr 9 2021, 12:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2021, 12:59 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
asavonic edited the summary of this revision. (Show Details)Apr 11 2021, 8:08 AM
asavonic added reviewers: rjmccall, dmgreen.

Why does the ABI "require" this to be returned as an i64 if some of the bits are undefined?

Why does the ABI "require" this to be returned as an i64 if some of the bits are undefined?

AArch64 ABI requires return values (of composite types) to be rounded up to 64 bits (see AArch64ABIInfo::classifyReturnType). I assume that if a value, say i16, is rounded up to i64, then the upper 48 bits can be arbitrary (undefined). I think this is aligned with the description of CreateCoercedLoad:

/// CreateCoercedLoad - Create a load from \arg SrcPtr interpreted as
/// a pointer to an object of type \arg Ty, known to be aligned to
/// \arg SrcAlign bytes.
///
/// This safely handles the case when the src type is smaller than the
/// destination type; in this situation the values of bits which not
/// present in the src are undefined.

AArch64 only has 64-bit integer registers, so of course the algorithm is specified that way. LLVM could nonetheless choose to return an i32.

So we can just remove this rounding from classifyReturnType?
Thanks a lot John! I will upload this change as a separate review.

You should probably talk it over with AArch64 backend folks, but yes, abstractly it seems reasonable. CC'ing Tim Northover.

After reading the summary/intent of the patch, I thought the same thing as @rjmccall. Simply returning an i32 for the above example and removing the rounding-up seems right to me.