The simple case of:

  #include <stddef.h>

  int *callee(void);
  void *caller(void *a) {
    if (a == NULL)
      return callee();
    return a;
  }

would generate a regular call to callee() instead of a tail call, because we
didn't look through the bitcast of the call's result (int * to void *) when
duplicating the return blocks.
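Here is a minimal C sketch that makes the failure mode concrete. It works over a hypothetical toy IR: the `Value` struct, `naive_returns_call()` and `returns_call()` are illustrative names, not LLVM APIs. The sketch assumes the return-block duplication only forms a tail call when the returned value is the call itself. With `ret (bitcast (call callee))`, a plain identity check fails. Peeking through one bitcast first makes it succeed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, just enough to model "ret (bitcast (call callee))".
 * This is not LLVM's Value class. */
typedef enum { VAL_CALL, VAL_BITCAST } ValueKind;

typedef struct Value {
    ValueKind kind;
    const struct Value *operand; /* source operand for VAL_BITCAST, else NULL */
} Value;

/* Naive check: only succeeds if the returned value IS the call. */
static bool naive_returns_call(const Value *ret_val, const Value *call) {
    return ret_val == call;
}

/* Fixed check: look through a single bitcast before comparing. */
static bool returns_call(const Value *ret_val, const Value *call) {
    if (ret_val->kind == VAL_BITCAST)
        ret_val = ret_val->operand;
    return ret_val == call;
}
```

For `%call = call @callee()` followed by `%cast = bitcast %call`, `naive_returns_call(&cast, &call)` is false, so the tail call is missed. `returns_call(&cast, &call)` is true.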
There is a peekThroughBitcast() utility function in InstCombine. It might be worth moving it to a common location so we don't have to repeat it in other IR passes.
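A lifted, shared helper could look roughly like the C sketch below. Like the previous sketch, it uses a hypothetical toy IR rather than LLVM's `Value`/`BitCastInst` classes, and `peek_through_bitcast()` is an illustrative name. The `one_use_only` flag is an assumption about what a shared version might need: it lets a caller skip the peek when the bitcast has other users.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR node; not LLVM's Value class. */
typedef enum { VAL_CALL, VAL_BITCAST, VAL_OTHER } ValueKind;

typedef struct Value {
    ValueKind kind;
    const struct Value *operand; /* source operand for VAL_BITCAST, else NULL */
    unsigned num_uses;           /* number of users of this value */
} Value;

/* Shared helper: if v is a bitcast, return its source operand; otherwise
 * return v unchanged. With one_use_only set, only peek through a bitcast
 * that has exactly one user. */
static const Value *peek_through_bitcast(const Value *v, bool one_use_only) {
    if (v->kind == VAL_BITCAST && (!one_use_only || v->num_uses == 1))
        return v->operand;
    return v;
}
```

Each pass would then call the one helper instead of open-coding the `kind == VAL_BITCAST` test.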