This patch only demonstrates one way to custom-lower the constrained FMA operation.
What this patch does:
- Extends the SDNodeFlags API so that SDNodeFlags are set during the initial DAG build, when the constrained FP metadata is read.
- The AMDGPU backend sets up the register modes based on the retrieved SDNodeFlags.
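As a rough illustration of the first item, the standalone C++ sketch below maps the rounding-mode metadata strings accepted by the constrained FP intrinsics (as listed in the LangRef, e.g. "round.tonearest") to a single enum during DAG construction. `RoundingFlag` and `parseRoundingMetadata` are hypothetical names; this is neither the patch's code nor LLVM's existing helpers.

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for the rounding information carried in SDNodeFlags.
enum class RoundingFlag {
  Dynamic,
  NearestTiesToEven,
  Downward,
  Upward,
  TowardZero,
  NearestTiesToAway
};

// Translate the rounding-mode metadata string of a constrained FP intrinsic
// into the hypothetical flag. Unknown strings yield std::nullopt so the
// caller can reject malformed IR instead of silently picking a mode.
std::optional<RoundingFlag> parseRoundingMetadata(std::string_view MD) {
  if (MD == "round.dynamic")       return RoundingFlag::Dynamic;
  if (MD == "round.tonearest")     return RoundingFlag::NearestTiesToEven;
  if (MD == "round.downward")      return RoundingFlag::Downward;
  if (MD == "round.upward")        return RoundingFlag::Upward;
  if (MD == "round.towardzero")    return RoundingFlag::TowardZero;
  if (MD == "round.tonearestaway") return RoundingFlag::NearestTiesToAway;
  return std::nullopt;
}
```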
I don't really like the fact that these are separate flags, given that they're mutually exclusive.
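One possible alternative, sketched under the assumption that the rounding mode lives in the same bit word as the other node flags: store it as a single multi-bit field rather than one bit per mode. Setting a mode then necessarily replaces the previous one, so contradictory combinations cannot be represented. `NodeFlagsSketch` and its members are hypothetical and do not reflect the real SDNodeFlags layout.

```cpp
#include <cstdint>

// Seven values fit in a 3-bit field; None means "no rounding info attached".
enum class RoundingFlag : std::uint8_t {
  None = 0, Dynamic, NearestTiesToEven, Downward, Upward, TowardZero,
  NearestTiesToAway
};

class NodeFlagsSketch {
  static constexpr unsigned RoundingShift = 8;
  static constexpr std::uint16_t RoundingMask = 0x7u << RoundingShift;
  static constexpr std::uint16_t NoNaNsBit = 0x1u; // an unrelated flag bit
  std::uint16_t Bits = 0;

public:
  // Clear the whole rounding field, then write the new mode into it, so at
  // most one mode can ever be recorded.
  void setRounding(RoundingFlag RM) {
    unsigned Cleared = Bits & ~static_cast<unsigned>(RoundingMask);
    unsigned Field = static_cast<unsigned>(RM) << RoundingShift;
    Bits = static_cast<std::uint16_t>(Cleared | Field);
  }

  RoundingFlag getRounding() const {
    return static_cast<RoundingFlag>((Bits & RoundingMask) >> RoundingShift);
  }

  // Other flags are independent bits and are untouched by setRounding.
  void setNoNaNs(bool B) {
    Bits = static_cast<std::uint16_t>(
        B ? (Bits | NoNaNsBit) : (Bits & ~static_cast<unsigned>(NoNaNsBit)));
  }
  bool hasNoNaNs() const { return (Bits & NoNaNsBit) != 0; }
};
```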
Also, I think we're eventually going to need to distinguish between assumed rounding modes (where the instruction encoding isn't expected to include the rounding mode) and forced rounding modes (where the rounding mode is encoded in the instruction). I don't have a specific vision for how that will need to work, but I know there are instructions that work this way, and we'll need to handle at least the intrinsics that use them. As I recall, someone at AMD also mentioned wanting similar behavior for flush-to-zero.
The currently documented behavior of the constrained FP intrinsics is that the rounding-mode argument tells the optimizer what it may assume about the rounding mode at the intrinsic's location; something else must already have set the rounding mode. If you are lowering to instructions that include a rounding mode, how do you handle the RoundDynamic case?
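To make the RoundDynamic question concrete, here is a hypothetical decision sketch for lowering to an instruction that may or may not encode a rounding mode. It assumes the documented semantics just described (a non-dynamic rounding argument is only an assumption about the mode already in effect); every type and function name is illustrative, not an existing LLVM API.

```cpp
// Hypothetical rounding information taken from the intrinsic.
enum class RoundingFlag {
  Dynamic, NearestTiesToEven, Downward, Upward, TowardZero, NearestTiesToAway
};

// Hypothetical description of what the target instruction can encode.
struct InstrRoundingSupport {
  bool HasStaticRoundingField; // the encoding carries a rounding mode
  bool CanUseCurrentMode;      // the encoding has a "use mode register" option
};

enum class LoweringChoice {
  UseModeRegister,     // rely on whatever mode is already in effect
  EncodeStaticMode,    // encode the known mode in the instruction
  NeedsRuntimeModeRead // mode unknown at compile time; must read it at run time
};

LoweringChoice chooseLowering(RoundingFlag Assumed, InstrRoundingSupport S) {
  // No rounding field: the hardware mode register decides, as documented.
  if (!S.HasStaticRoundingField)
    return LoweringChoice::UseModeRegister;
  // A specific mode was promised, so encoding it matches the register.
  if (Assumed != RoundingFlag::Dynamic)
    return LoweringChoice::EncodeStaticMode;
  // RoundDynamic: no compile-time mode to encode.
  if (S.CanUseCurrentMode)
    return LoweringChoice::UseModeRegister;
  return LoweringChoice::NeedsRuntimeModeRead;
}
```

The last case is the one the question targets: with RoundDynamic and an encoding that only accepts a fixed mode, there is nothing correct to encode without first reading the current mode.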