This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.
ClosedPublic

Authored by tstellarAMD on Nov 8 2016, 2:36 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

wdng updated this revision to Diff 77265.Nov 8 2016, 2:36 PM
wdng retitled this revision from to AMDGPU : Enable/Disable S_SETREG in FDIV32..
wdng updated this object.
wdng added reviewers: arsenm, tstellarAMD.
wdng set the repository for this revision to rL LLVM.

Remove commented out code and add a test

wdng updated this revision to Diff 77559.Nov 10 2016, 3:07 PM
wdng updated this object.
wdng edited edge metadata.

This patch is not finished yet, which is used only for temporary code reviewing.

wdng updated this revision to Diff 77570.Nov 10 2016, 4:11 PM
wdng edited edge metadata.

Not finished yet, for temporary code reviews only.

wdng updated this revision to Diff 79972.Dec 1 2016, 12:44 PM
wdng updated this object.
wdng added a reviewer: b-sumner.

Add s_setreg instructions for fdiv to fix the precision issues.

wdng retitled this revision from AMDGPU : Enable/Disable S_SETREG in FDIV32. to AMDGPU : Add S_SETREG instructions to fix fdiv precision issues..Dec 1 2016, 12:46 PM
wdng edited edge metadata.
arsenm edited edge metadata.Dec 1 2016, 5:29 PM

This needs tests.

Should the old division path without this be left around?

lib/Target/AMDGPU/AMDGPUISelLowering.h
227–229 ↗(On Diff #77265)

These should be FMA_W_CHAIN, FMUL_W_CHAIN

lib/Target/AMDGPU/SIISelLowering.cpp
2775 ↗(On Diff #79972)

The constant should be a bitmask formed from the enums for the fields you are setting rather than the magic numbers

2780 ↗(On Diff #79972)

These lines go over 80 columns

2823 ↗(On Diff #77265)

You don't need any of the getValue(0)s

lib/Target/AMDGPU/SOPInstructions.td
593 ↗(On Diff #77265)

This shouldn't have isBarrier set

598 ↗(On Diff #77265)

You can move the hasSideEffects here instead of the let block since it's just the one instruction

lib/Target/AMDGPU/VOP3Instructions.td
224–226 ↗(On Diff #77265)

There should be a pattern which uses the complex pattern for the source modifiers

wdng updated this revision to Diff 80032.Dec 2 2016, 12:00 AM
wdng edited edge metadata.
wdng marked 6 inline comments as done.

Address review feedback.

tstellarAMD added inline comments.Dec 2 2016, 7:40 AM
lib/Target/AMDGPU/SIISelLowering.cpp
2826–2831 ↗(On Diff #80032)

The indentation here and in the rest of the block looks wrong.

lib/Target/AMDGPU/SIISelLowering.h
40–41 ↗(On Diff #80032)

Does the LowerFDIV32 function still exist? Is it used anymore?

lib/Target/AMDGPU/VOP3Instructions.td
224–235 ↗(On Diff #80032)

These are dead patterns since we are custom selecting the SDNodes.

test/CodeGen/AMDGPU/fdiv_setreg_chain.ll
6 ↗(On Diff #80032)

We should check the whole s_setreg_imm32 here to make sure we get the correct operands.

13 ↗(On Diff #80032)

Same thing here.

wdng updated this revision to Diff 80093.Dec 2 2016, 10:32 AM
wdng edited edge metadata.
wdng marked 5 inline comments as done.

Address feedback reviews.

lib/Target/AMDGPU/SIISelLowering.cpp
2779–2780 ↗(On Diff #80093)

These magic number still need to be replaced with enum values.

2825 ↗(On Diff #80093)

Another magic number here.

2843 ↗(On Diff #80093)

Extra whitespace change.

lib/Target/AMDGPU/VOP3Instructions.td
222–225 ↗(On Diff #80093)

Extra whitespace changes.

arsenm added inline comments.Dec 2 2016, 10:47 AM
test/CodeGen/AMDGPU/fdiv_setreg_chain.ll
16–20 ↗(On Diff #80093)

This should go in the existing fdiv test (which I would expect is already failing with this change)

16–20 ↗(On Diff #80093)

Also I think this should be run with SI/CI +/- denormals, and VI +/- denormals

wdng marked 2 inline comments as done.Dec 2 2016, 1:12 PM
wdng added inline comments.
test/CodeGen/AMDGPU/fdiv_setreg_chain.ll
16–20 ↗(On Diff #80093)

Is there a flag that we can use to control +/- denormals?

wdng added a reviewer: cfang.Dec 3 2016, 12:24 AM
tstellarAMD commandeered this revision.Dec 5 2016, 1:29 PM
tstellarAMD edited reviewers, added: wdng; removed: tstellarAMD.
  • Only write necessary bits.
  • Merge testcase into fdiv.ll
  • Remove magic numbers
  • Don't use s_setreg when denormals are enabled.
arsenm added inline comments.Dec 5 2016, 2:33 PM
lib/Target/AMDGPU/AMDGPUISelLowering.h
234–236 ↗(On Diff #80320)

I would only put the comment once for the block of the 2 instructions

lib/Target/AMDGPU/SIISelLowering.cpp
2967–2969 ↗(On Diff #80320)

We should probably not use target constant here and teach FoldImmediate to turn the register setreg into the immediate setreg to save the code size on multiple uses of the immediate, like will happen in the unrolled vector case

tstellarAMD marked an inline comment as done.
  • Select to s_setreg_b32 and fold immediates in SIFoldOperands
arsenm added inline comments.Dec 5 2016, 5:39 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
178–182 ↗(On Diff #80356)

Changing the instruction back if it wasn't an immediate looks broken to me. I would expect it wouldn't need to fold register copies at all, and PeepholeOptimizer would take care of it.

  • Only change s_setreg opcode when immediate folding is known to succeed.
arsenm accepted this revision.Dec 6 2016, 2:42 PM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Dec 6 2016, 2:42 PM
This revision was automatically updated to reflect the committed changes.