Hi.
First, I hope you are fine and the same for your relatives.
I wrote a patch which solves the issue 17128.
The goal of this patch is to replace such snippet:
int cttz(unsigned long x){ unsigned long i = 0; while(i < 64 && (((x >> i) & 0x1) == 0)) i++; return i; }
by calls to llvm cttz intrinsic which can then be translated to the corresponding assembly instruction, if the architecture has one.
In my case, the intrinsic was replaced by bfsq instruction.
To confirm my results, I wrote cttz.ll test to confirm the patch works and I ran the the check-llvm-unit and check-llvm targets.
The second gave me one failure for Bindings/Go/go.test:
# llvm.org/llvm/bindings/go/llvm.test /usr/lib/go-1.11/pkg/tool/linux_amd64/link: running /usr/bin/c++ failed: exit status 1 ld.lld: error: unknown --compress-debug-sections value: zlib-gnu collect2: error: ld returned 1 exit status
I do not think this problem is related to my patch but rather to my configuration.
I also quickly benchmarked my modifications.
First, I measured the compilation time of this program by compiling it 100 times using this command clang -O3 -S -emit-llvm and time as measuring tool:
#include <stdlib.h> int cttz(unsigned long x){ unsigned long i = 0; while(i < 64 && (((x >> i) & 0x1) == 0)) i++; return i; } int main(void){ int bits_field; int first_set; bits_field = rand(); first_set = cttz(bits_field); return first_set; }
The results are the following (in second):
100 compilations | mean for 1 compilation | |
without patch | 20.54 | .21 |
with patch | 16.19 | .17 |
So, the patch reduces compilation time of approximately 23%.
However, I am not really sure of this result as I first though that the modification would make the compilation slower.
Maybe it is quicker due to the loop removing and then not having to optimize it.
Then, I measure the performance of the generated code by running the above code 10000 times using time as measuring tool, the results are as follows (in millisecond):
10000 runs | mean for 1 run | |
without patch | 8730 | .873 |
with patch | 8220 | .822 |
So, the patch reduces code execution time by around 6%.
If you see any way to improve the patch or mistake I made, feel free to share.
Best regards.
Drop the comment about jdoerfert. Its fine for a commit message, but doesn't add anything to the understanding of the code.