If the target ABI requires coercion to a larger type, the higher bits of
the resulting value are supposed to be undefined. However, before this
patch Clang CodeGen generated a zext instruction to coerce a value to a
larger type, forcing the higher bits to zero.
This is problematic in some cases:

  struct st { int i; };
  struct st foo(int);
  struct st bar(int x) { return foo(x); }

For AArch64, Clang generates the following LLVM IR:

  define i64 @bar(i32 %x) {
    %call = call i64 @foo(i32 %0)
    %coerce.val.ii = trunc i64 %call to i32
    ;; ... store to alloca and load back
    %coerce.val.ii2 = zext i32 %1 to i64
    ret i64 %coerce.val.ii2
  }

The coercion is done with a trunc followed by a zext. After
optimization, we get the following:

  define i64 @bar(i32 %x) local_unnamed_addr #0 {
  entry:
    %call = tail call i64 @foo(i32 %x)
    %coerce.val.ii2 = and i64 %call, 4294967295
    ret i64 %coerce.val.ii2
  }

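The compiler has to preserve the semantics of the zext instruction
(clearing the upper 32 bits), even though no extension or truncation is
actually required in this case. This extra 'and' instruction also
prevents tail call optimization, because the call is no longer
immediately followed by the return.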
To preserve the information that the higher bits are undefined, this
patch replaces the zext with an insertelement followed by a bitcast:

  define i64 @_Z3bari(i32 %x) local_unnamed_addr #0 {
  entry:
    %call = tail call i64 @_Z3fooi(i32 %x) #2
    %coerce.val.ii = trunc i64 %call to i32
    %coerce.val.vec = insertelement <2 x i32> undef, i32 %coerce.val.ii, i8 0
    %coerce.val.vec.ii = bitcast <2 x i32> %coerce.val.vec to i64
    ret i64 %coerce.val.vec.ii
  }

InstCombiner can then fold this sequence into a no-op, allowing tail
call optimization (see D100227).