Transform
pow(C,x)
To
exp2(log2(C)*x)
if C > 0, C != inf, C != NaN (and C is not power of 2, since we have some fold for such case already).
log(C) is folded by the compiler and exp2 is much faster to compute than pow.
Differential D64099
[InstCombine] pow(C,x) -> exp2(log2(C)*x) xbolva00 on Jul 2 2019, 2:06 PM. Authored by
Details Transform To if C > 0, C != inf, C != NaN (and C is not power of 2, since we have some fold for such case already). log(C) is folded by the compiler and exp2 is much faster to compute than pow.
Diff Detail Event TimelineComment Actions I'm a little wary about fold the case where doesNotAccessMemory is false, but I guess it's likely okay. I'm guessing it doesn't matter whether you use exp or exp2 here? I guess the performance is probably similar, but it isn't obvious to me. Should we specifically have test coverage for transforming pow(10, x) to exp?
Comment Actions Since using exp2(log()) was mathematically incorrect, have you run any benchmark that validates the results, such SPEC CPU2006 or CPU2017? If so, what kind of improvement did it register? Comment Actions I tested the first patch and the second revision with change to exp2 locally with simple examples to check for correctness and performance - and yes, I didn’t check previous revision with folding so a mistake was made. I have no access to the SPEC benchmarks. Comment Actions What correctness tests did you run? If necessary, I can run SPEC to validate this patch at least on x86-64. Comment Actions GCC has exp variant of this transformation so I checked their preconditions - unsafe math. My first patch had isFast() check but @efriedma requested to specify it better. I think afn/ninf/nnan is all we need. I wrote some small tests and compared printed results between Clang and GCC. I don’t think we could formally prove this transformation with Alive. So, If you can test this patch with SPEC, please do - Thanks! Comment Actions It was actually harder to find a SPEC benchmark that triggered this case. As a matter of fact, it shows up only in CPU2017's 525.x264_r. But I can confirm that it works successfully. Thank you for this patch.
|
Please, update this comment.