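If the following simple program is compiled with LTO, the call to foobar() will not be tailcall optimized. This is because the tailcall elimination pass is only run in the initial compilation step. At that point foo() still passes the address of its local array to the opaque bar(), so the call to foobar() cannot be marked as a tail call, and the result of link-time inlining is never visible to the pass.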
------------ 1.c ----------------
extern void foobar(void);
extern void bar(int *);
void foo() {
  int a[10];
  bar(a);
  foobar();
}
--------------------------------
------------ 2.c ----------------
void bar(int *p) {
  *p = 10;
}
--------------------------------
$ clang -flto 1.c 2.c -c -O2
$ llvm-lto 1.o 2.o --exported-symbol=foo -save-merged-module -o 3.o
$ llvm-dis 3.o.merged.bc -o -
...
; Function Attrs: nounwind uwtable
define dso_local void @foo() local_unnamed_addr #0 {
entry:
  call void @foobar() #2
  ret void
}
...
Even without link-time inlining, LTO may be able to perform additional tailcall optimization because the nocapture attribute becomes visible across translation units. For example, if the program above is modified to make bar() noinline, foobar() can still be tailcalled because the parameter of bar() is marked nocapture:
; Function Attrs: noinline norecurse nounwind uwtable writeonly
define internal fastcc void @bar(i32* nocapture %p) unnamed_addr #3 {
entry:
  store i32 10, i32* %p, align 4, !tbaa !4
  ret void
}
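For reference, a minimal sketch of how 2.c might be modified for the noinline case. The report does not show the modified source, so the use of __attribute__((noinline)) is an assumption. The bar_capturing() function and the saved global are hypothetical and are included only for contrast: because the pointer is stored in a global, its parameter could not be inferred nocapture, and a tail call to foobar() after it would be unsafe.

```c
/* 2.c, modified (assumed form): keep bar() out of line so that only the
 * inferred nocapture attribute on p, not inlining, enables the tail call. */
__attribute__((noinline)) void bar(int *p) {
  *p = 10; /* writes through p but never stores p itself */
}

/* Hypothetical contrast case, not part of the original report. */
int *saved;

__attribute__((noinline)) void bar_capturing(int *p) {
  *p = 10;
  saved = p; /* p escapes here, so the caller's array may still be live */
}
```

With bar_capturing() in place of bar(), foobar() could legitimately read the caller's array through saved, so the caller's stack frame would have to stay alive across the call.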
(Before D53519, this case would not have been optimized due to the lifetime markers.)