For now llvm intrinsic ctlz/cttz are supported by extension zbb and xtheadbb,
and both extensions support returning well-defined results for zero inputs.
It's possible to set isCLZForZeroUndef flag to false by default on RISCV targets
as ARM and other target does.
Details
- Reviewers
asb craig.topper kito-cheng
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I made the flag isCLZForZeroUndef set to false only when target support extension zbb or xtheadbb, and this will prevent making codegen worse when target does not support abb or xtheadbb.
Could you please post a separate patch that has a test that will show the codegen change (and demonstrate how it is unchanged when zbb or xtheadbb)?
Sure. I post a patch to demonstrate current codegen behavior for clz/ctz intrinsics when has ebb or xtheadbb: https://reviews.llvm.org/D152250.
From the C language perspective with this change, __builtin_clz/ctz is still considered undefined for 0 and code that uses it is ill-formed. isCLZForZeroUndef is only intended to prevent the middle end from optimizing based on the undefinedness and creating surprises. See also https://discourse.llvm.org/t/should-ubsan-detect-0-input-to-builtin-clz-ctz-regardless-of-target/71060
I see builtin_clz/ctz returning an undefined value for 0 input matches gcc's document, but when I test builtin_clz/ctz with 0 input on gcc, it returns a valid value rather than an undefined value. It looks gcc does not follow gcc's document. I'm not sure which one is better that match document to return undefined for 0, or match gcc's behavior to return defined value?
From what I can see in the assembly here https://godbolt.org/z/s4qqz83EK, gcc's undefined behavior sanitizer does consider an input of 0 to be undefined.