This is an archive of the discontinued LLVM Phabricator instance.

[X86] For lzcnt/tzcnt intrinsics use cttz/ctlz intrinsics with zero_undef flag set to false.
ClosedPublic

Authored by craig.topper on Sep 22 2018, 10:18 AM.

Details

Summary

Previously we used a select and the zero_undef=true intrinsic. At -O2 this pattern gets optimized to zero_undef=false, but at -O0 that optimization doesn't happen, which results in a compare and cmov being wrapped around the tzcnt/lzcnt instruction.

By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction.
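For illustration only, the two shapes described above can be sketched in plain C. This is not the code from the patch. The function names are hypothetical, `__builtin_clz`/`__builtin_ctz` (GCC/Clang builtins, undefined for a zero input) stand in for the zero_undef=true intrinsic, the portable loops stand in for a count that is defined at zero, and a 32-bit `unsigned int` is assumed.

```c
#include <stdint.h>

/* Old shape: a zero-undefined count guarded by an explicit select.
   Without optimization the guard survives as a compare plus cmov. */
static unsigned lzcnt32_select(uint32_t x) {
    return x == 0 ? 32u : (unsigned)__builtin_clz(x);
}

static unsigned tzcnt32_select(uint32_t x) {
    return x == 0 ? 32u : (unsigned)__builtin_ctz(x);
}

/* New shape (semantics only): a count that is already defined at zero,
   so no guard is needed. Portable loops model the zero_undef=false case. */
static unsigned lzcnt32_defined(uint32_t x) {
    unsigned n = 0;
    /* Walk from the top bit down until a set bit is found. */
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

static unsigned tzcnt32_defined(uint32_t x) {
    unsigned n = 0;
    /* Walk from the bottom bit up until a set bit is found. */
    for (uint32_t mask = 1; mask != 0 && (x & mask) == 0; mask <<= 1)
        n++;
    return n;
}
```

Both shapes return the operand width (32) for a zero input, which matches what the lzcnt/tzcnt instructions produce, so the guard in the first pair is redundant on hardware that has those instructions.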

Event Timeline

craig.topper created this revision. Sep 22 2018, 10:18 AM

Are there other targets that would benefit from this and if so should we provide a more generic intrinsic?

The only other header that uses the existing builtins is arm_acle.h. But ARM returns false in isCLZForZeroUndef. So they should be creating the cttz/ctlz intrinsics with false for the second argument from __builtin_clz/ctz. The sanitizer code in CodeGenFunction::EmitCheckedArgForBuiltin also checks isCLZForZeroUndef to determine if it should emit a runtime check to flag 0 as a sanitizer error.

~Craig

RKSimon accepted this revision. Sep 26 2018, 5:03 AM

LGTM, thanks for checking.

Please update the *-intrinsics-fast-isel.ll llvm test cases to match the *-builtins.c changes.

This revision is now accepted and ready to land. Sep 26 2018, 5:03 AM
This revision was automatically updated to reflect the committed changes.