This adds the generic FMA utilities for the GPU. We implement these
through the builtins which map to the FMA instructions in the ISA. These
may not strictly comply with other assumptions in the libc, such as
rounding modes. I've included the relevant information on how
the GPU vendors map the behaviour. This should help make it easier to
implement some future generic versions.
Depends on D152486
I don't really see the point of documenting this here. It's a weird place to give platform specifics, and FMA is about as well defined as you can get.