OpenCL loses fast math information by going through libcall wrappers
around intrinsics.
Replace calls to these wrappers with the underlying intrinsics to preserve
the call-site fast math flags, which are otherwise lost when the wrapper is
inlined. It's not safe in general to propagate flags during inlining, so
avoid dealing with that by just special-casing some of the useful calls.
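The difference between plain inlining and the special case can be shown with a toy model. This is a minimal sketch, not LLVM code: `FastMathFlags`, `CallSite`, `wrapperTable`, `inlineWrapper` and `replaceWithIntrinsic` are hypothetical names. The table entries are illustrative only and are not the actual set of special-cased calls. `log` is left out to mirror the deferred log handling.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy stand-in for a fast math flag set (only a subset of the real flags).
struct FastMathFlags {
  bool nnan = false;  // assume no NaNs
  bool ninf = false;  // assume no infinities
  bool nsz = false;   // ignore the sign of zero
  bool afn = false;   // allow approximate functions

  bool any() const { return nnan || ninf || nsz || afn; }
};

// A call reduced to its callee name and its call-site flags.
struct CallSite {
  std::string callee;
  FastMathFlags flags;
};

// Hypothetical table of library wrappers that only forward to an intrinsic.
// Entries are illustrative; "log" is deliberately absent (deferred).
const std::map<std::string, std::string>& wrapperTable() {
  static const std::map<std::string, std::string> table = {
      {"fabs", "llvm.fabs.f32"},
      {"floor", "llvm.floor.f32"},
      {"ceil", "llvm.ceil.f32"},
  };
  return table;
}

// Models ordinary inlining: the wrapper body (a plain intrinsic call) is
// copied into the caller, but the caller's call-site flags are NOT
// propagated, since that is not safe in general.
CallSite inlineWrapper(const CallSite& call) {
  const auto& table = wrapperTable();
  auto it = table.find(call.callee);
  assert(it != table.end() && "only known wrappers are modeled");
  return CallSite{it->second, FastMathFlags{}};
}

// Models the special case: rewrite the libcall to the intrinsic directly,
// keeping the call-site flags. Calls not in the table are left alone.
std::optional<CallSite> replaceWithIntrinsic(const CallSite& call) {
  const auto& table = wrapperTable();
  auto it = table.find(call.callee);
  if (it == table.end()) return std::nullopt;
  return CallSite{it->second, call.flags};
}
```

In this model, a `fabs` call carrying `nnan`/`ninf` loses both flags after `inlineWrapper`, but keeps them after `replaceWithIntrinsic`. A `log` call is not rewritten at all.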
Deferring the log handling, since it exposed some failures with denormal
values when using fast math (apparently the conformance test enforces some
accuracy limit even with fast math?).