The wrappers aren't buying anything and just add complexity.
We do not propagate fast-math flags into the linked library functions,
but they'll naturally be applied if the intrinsic is directly emitted.
We also don't need to treat the native case differently, since we just
directly select the generic intrinsic anyway.
The f64 case requires a backend change, so defer that for now.