Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
60,030 ms | x64 debian > libFuzzer.libFuzzer::large.test |
Event Timeline
I looked at the effect of the 2 proposed patches together, and we should add at least 2 more tests (can be added to this commit or separately):
define i1 @shifted_mask_testb_lower32(i64 %a) { %v0 = and i64 %a, 66846720 ; 0xff << 18 %v1 = icmp ne i64 %v0, 0 ret i1 %v1 } define i1 @shifted_mask_testw_lower32(i64 %a) { %v0 = and i64 %a, 131072 ; 0xffff << 1 %v1 = icmp ne i64 %v0, 0 ret i1 %v1 }
The difference for these is that the shifted mask does not extend into the upper 32-bits of the value. We probably want to convert these to use shifts too, and we'll handle that with both patches applied.
This codegens as testl $66846720, %edi ; setne %al without any of our changes which to me seems to be as good as it gets. Using a shift for the "shifted_mask" cases only makes sense when the constant value becomes so big that it no longer fits the 32bit immediates on X86...
I will add the function anyway as an example for something that should not be optimized, does that make sense?
define i1 @shifted_mask_testw_lower32(i64 %a) { %v0 = and i64 %a, 131072 ; 0xffff << 1
I guess this should have been 131070 :)
%v1 = icmp ne i64 %v0, 0 ret i1 %v1}
This ends up being just another variant of the other function where it is best to just use a testl with immediate, so I think I can skip this.
The difference for these is that the shifted mask does not extend into the upper 32-bits of the value. We probably want to convert these to use shifts too, and we'll handle that with both patches applied.
I think those should keep using testl + immediate.
Oops - yes, I missed dropping the last bit.
The difference for these is that the shifted mask does not extend into the upper 32-bits of the value. We probably want to convert these to use shifts too, and we'll handle that with both patches applied.
I think those should keep using testl + immediate.
I'm not sure if there's a universal answer. As you mentioned, it may depend on throughput vs. saving on instruction size. Either way, the tests can be there to show current/expected codegen.