Hi David, Hal, Quentin (and all),
If we know that the control flow is modelling an if-statement where the only instruction in 'then' basic block (excluding the terminator) is a call to cttz/ctlz, it may be beneficial to speculate the cttz/ctlz call and let SimplifyCFG convert the associated phi instruction in the 'end' basic block into a select.
Example:
;;
entry:
%cmp = icmp eq i64 %Val, 0 br i1 %cmp, label %end.bb, label %then.bb
then.bb:
%c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) br label %EndBB
end.bb:
%cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
;;
The call to @llvm.cttz.i64 could be speculated. This would allow to fold the entire code sequence into:
%cmp = icmp eq i64 %Val, 0 %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 false) %cond = select i1 %cmp, i64 64, i64 %c
The constraints are:
 a) The 'then' basic block is taken only if the input operand to the cttz/ctlz is different than zero;
 b) The phi node propagates the size-of (in bits) of the value %val in input to the cttz/ctlz if %val is zero.
If all these constraints are met, the optimizer can hoist the call to cttz/ctlz from the 'then' basic block into the 'entry' basic block. The phi instruction would then be replaced by a select statement.
The new cttz/ctlz instruction will also have the 'undef on zero' flag set to 'false'. This would allow the instruction combiner to further simplify the code by getting rid of the redundant select.
The IR from the example can be obtained from the following code:
///
unsigned long long foo(unsigned long long A) {
return A ? __builtin_ctzll(A) : 64; }
///
On X86-64 targets with feature TZCNT, this patch would allow the backend to generate optimal assembly code for function 'foo':
tzcntq %rdi, %rax retq
On X86-64 targets with no TZCNT, the call to cttz with the 'undef on zero' flag cleared would be expanded into a longer sequence involving a conditional move instruction:
bsfq %rdi, %rcx movl $64, %eax cmovneq %rc, %rax popq %rbp retq
On X86 (not x86-64) targets with no LZCNT/TZCNT and no CMOV instructions, the backend would futher expand the conditional moves introducing machine basic blocks. Basically it would revert this optimization re-introducing the if-else structure.
My only questions are: is SimplifyCFG the correct place where to put this logic? If not, then where do you think I should put it?
Please, let me know what you think.
Thanks,
Andrea
This could be:
if (match(ThenV, m_Intrinsic<Intrinsic::cttz>(Op0) || match(ThenV, m_Intrinsic<Intrinsic::ctlz>(Op0))