Page MenuHomePhabricator

AMDGPU : Custom lowering constrained fps.
Needs RevisionPublic

Authored by wdng on Oct 6 2017, 9:38 AM.



This patch only shows a way how to custom lowering the constrained fma operation.

What does this patch do:

  1. Expand SDNodeFlags APIs to set up SDNodeFlags at the initial DAG build phase when reading the constrained fps metadata data.
  2. AMDGPU backend sets up resister modes based on retrieved SDNodeFlags.

Diff Detail


Event Timeline

wdng created this revision.Oct 6 2017, 9:38 AM
wdng updated this revision to Diff 118028.Oct 6 2017, 9:56 AM

Fixed format issue.

andrew.w.kaylor requested changes to this revision.Oct 6 2017, 12:09 PM
andrew.w.kaylor added inline comments.
368 ↗(On Diff #118028)

I don't really like the fact that these are separate flags, given that they're mutually exclusive.

Also, I think we're eventually going to need to be able to distinguish between assumed rounding modes (where the instruction encoding isn't expected to include the rounding mode) and forced rounding modes (where the rounding mode will be encoded in the instruction). I don't have a specific vision for how that will need to work, but I know there are instructions that work this way and we'll need to handle at least intrinsics that use them. As I recall someone at AMD mentioned wanting behavior like that for flush-to-zero also.

The currently documented behavior of the constrained FP intrinsics is that the rounding mode tells the optimizer what it may assume about the rounding mode at the intrinsic location. Something else must have been done to set the rounding mode. If you are lowering to instructions that include a rounding mode, how do you handle the RoundDynamic case?

384 ↗(On Diff #118028)

I don't think all of the rounding modes can default to false. Maybe you need a RoundDefault option. For instance, if SDNodeFlags::isDefined() returns true, but the node doesn't have a constrained rounding mode I'd need to check four different flags and then make an assumption to see that.

510 ↗(On Diff #118028)

I think you need some logic here to set the rounding mode to dynamic if the flags being intersected conflict.

515 ↗(On Diff #118028)

There needs to be a hierarchy here. For instance, if you merge ExceptIgnore with ExceptStrict it should result in ExceptStrict.


This change will fail on all platforms except AMDGPU as you have this patch written. If you need target-specific behavior here, we'll need a target hook of some kind.


All of the intrinsics above from fadd to fma fall through into this code. That's obviously not what you intended.


What about the other rounding modes?


If you're going to make the changes in this patch, you need at least reasonable default behavior for all other platforms.

This revision now requires changes to proceed.Oct 6 2017, 12:09 PM
rampitec added inline comments.Oct 9 2017, 3:17 PM
368 ↗(On Diff #118028)

As these flags only apply to the ISD::STRICT_* opcodes you probably do not need to add fields to a generic SDNodeFlags. STRICT_* opcodes can get an extra operand for rounding mode.


Looks like a good place to check for TLI.getOperationAction on the incoming Opcode and keep it as is if it is custom.

wdng updated this revision to Diff 119184.EditedOct 16 2017, 11:35 AM
wdng edited edge metadata.

Instead of using SDNodeFlags to store metadata information, this patch directly appends an extra operand for rounding mode during the DAG build phase based on Stats's suggestion. This patch currently implements strict fps for fadd, fsub, fmul, fma, and fsqrt. Thanks a lot for @andrew.w.kaylor and @rampitec comments for this!

Known issues:

  1. FDIV case is a special case, a separate patch will be created for it.
  2. Currently, we don't take care of the "round.dynamic" rounding mode for the time being.
  3. Will create a separate patch for f16 data type.
wdng marked 9 inline comments as done.Oct 16 2017, 11:36 AM
wdng retitled this revision from AMDGPU : Expand SDNodeFlags APIs & custom lowering constrained fps. to AMDGPU : Custom lowering constrained fps..
rampitec added inline comments.Oct 16 2017, 2:35 PM

You have removed it but declaration remains.


What about f16?


Use of "if(cond) ... else .. cond ? :" is weird.




You do not handle every one of that.


These commented lines not needed.




You are already doing translation, you can remove all the switches translating EqOpc into the same with chain. Just use it here.


OK, you have chained all the nodes which require to reside within two s_setreg statements. How do you prevent any other regular fp operations without a chain to be scheduled in between of them?

kzhuravl requested changes to this revision.Oct 17 2017, 2:09 PM
kzhuravl added inline comments.

These should go to relevant enums (like Id, Offset, WidthMinusOne, etc.).


Rename WidthBit to "Offset".


Remove. This can go to default?


New line.


Missing ID_SHIFT_.


What is 1? Do not use bare numbers.

This revision now requires changes to proceed.Oct 17 2017, 2:09 PM

It isn't clear to me how your custom lowering interacts, if at all, with existing table-driven selection patterns. One of the goals in the implementation up to this point has been to have the instruction selection fall back on existing pattern matching as much as possible so that we don't need to duplicate all of the cases that are currently handled. Can you explain to me how this applies in the AMDGPU case?


You have upward and downward reversed.


I think you're interpreting the rounding mode argument differently than I have, and therefore differently than the documentation in the LLVM Language Reference ("therefore" because I wrote the documentation). My intention was that the rounding mode argument was provided as information to the optimizer. It tells the optimizer what it can assume about rounding mode at the point of the operation. It was not intended to actually set the rounding mode.

I'm approaching this from the perspective of the STDC pragmas related to the FP environment. My understanding of these is that if FENV_ACCESS on is declared, we must assume dynamic (i.e. unknown) rounding mode in those scopes unless we can prove otherwise, but if the user wants to change the rounding mode a specific function call (such as fesetround) will be used.

I'm not sure what sort of front end you are assuming here, so that may explain the difference in your approach.

There are some x86 instructions that can incorporate a rounding mode operand, and it is my understanding that the AMDGPU architecture has similar needs. However, I believe we will need to extend the constrained FP intrinsics (or possibly introduce new intrinsics to handle cases like that.