We run tail-call-elim near the beginning of the pipeline, but that is too early to annotate calls that are created by later passes.
In the motivating case from issue #47852, the missing 'tail' marker on the memset call leads to suboptimal codegen.
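For illustration only, here is a hypothetical C function of the general shape involved. It is not the reproducer from issue #47852. The assumption is that an optimizing compiler such as clang at -O2 may rewrite the zeroing loop into a memset (via loop idiom recognition) after tail-call-elim has already run. In that case the new call is not marked 'tail', so the backend emits call + ret instead of a jump to memset, even though the call is in tail position. Whether the memset is actually formed, and what the output looks like, depends on the compiler version and options; `clang -O2 -S -emit-llvm` is one way to inspect whether the call comes out as `call` or `tail call`.

```c
#include <stddef.h>

/* Hypothetical example (not the code from issue #47852).
 * A simple zeroing loop whose last action is the store loop itself.
 * An optimizer may replace the loop with a memset call in tail position;
 * if that call is created after tail-call-elim has run, it lacks the
 * 'tail' marker and cannot be lowered as a sibling call (jmp memset). */
void clear_buf(unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; ++i)
        p[i] = 0;
}
```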
I experimented with removing the early instance of tail-call-elim instead of just adding another pass, but that appears to be slightly worse for compile time: +0.15% (moving the pass) vs. +0.08% (adding a second pass).
"tailcall" shows adding the pass; "tailcall2" shows moving the pass later in the pipeline and then adding the original early pass back (so 1596886802 is functionally equivalent to 180b0439dc):
https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright