This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.
ClosedPublic

Authored by tstellarAMD on Nov 8 2016, 2:36 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

wdng updated this revision to Diff 77265.Nov 8 2016, 2:36 PM
wdng retitled this revision from to AMDGPU : Enable/Disable S_SETREG in FDIV32..
wdng updated this object.
wdng added reviewers: arsenm, tstellarAMD.
wdng set the repository for this revision to rL LLVM.

Remove commented out code and add a test

wdng updated this revision to Diff 77559.Nov 10 2016, 3:07 PM
wdng updated this object.
wdng edited edge metadata.

This patch is not finished yet, which is used only for temporary code reviewing.

wdng updated this revision to Diff 77570.Nov 10 2016, 4:11 PM
wdng edited edge metadata.

Not finished yet, for temporary code reviews only.

wdng updated this revision to Diff 79972.Dec 1 2016, 12:44 PM
wdng updated this object.
wdng added a reviewer: b-sumner.

Add s_setreg instructions for fdiv to fix the precision issues.

wdng retitled this revision from AMDGPU : Enable/Disable S_SETREG in FDIV32. to AMDGPU : Add S_SETREG instructions to fix fdiv precision issues..Dec 1 2016, 12:46 PM
wdng edited edge metadata.
arsenm edited edge metadata.Dec 1 2016, 5:29 PM

This needs tests.

Should the old division path without this be left around?

lib/Target/AMDGPU/AMDGPUISelLowering.h
227–229

These should be FMA_W_CHAIN, FMUL_W_CHAIN

lib/Target/AMDGPU/SIISelLowering.cpp
2775

The constant should be a bitmask formed from the enums for the fields you are setting rather than the magic numbers

2780

These lines go over 80 columns

2823

You don't need any of the getValue(0)s

lib/Target/AMDGPU/SOPInstructions.td
593

This shouldn't have isBarrier set

598

You can move the hasSideEffects here instead of the let block since it's just the one instruction

lib/Target/AMDGPU/VOP3Instructions.td
224–226

There should be a pattern which uses the complex pattern for the source modifiers

wdng updated this revision to Diff 80032.Dec 2 2016, 12:00 AM
wdng edited edge metadata.
wdng marked 6 inline comments as done.

Address review feedback.

tstellarAMD added inline comments.Dec 2 2016, 7:40 AM
lib/Target/AMDGPU/SIISelLowering.cpp
2826–2831

The indentation here and in the rest of the block looks wrong.

lib/Target/AMDGPU/SIISelLowering.h
40–41 ↗(On Diff #80032)

Does the LowerFDIV32 function still exist? Is it used anymore?

lib/Target/AMDGPU/VOP3Instructions.td
224–235

These are dead patterns since we are custom selecting the SDNodes.

test/CodeGen/AMDGPU/fdiv_setreg_chain.ll
6 ↗(On Diff #80032)

We should check the whole s_setreg_imm32 here to make sure we get the correct operands.

13 ↗(On Diff #80032)

Same thing here.

wdng updated this revision to Diff 80093.Dec 2 2016, 10:32 AM
wdng edited edge metadata.
wdng marked 5 inline comments as done.

Address feedback reviews.

lib/Target/AMDGPU/SIISelLowering.cpp
2779–2780

These magic number still need to be replaced with enum values.

2825

Another magic number here.

2843

Extra whitespace change.

lib/Target/AMDGPU/VOP3Instructions.td
222–225

Extra whitespace changes.

arsenm added inline comments.Dec 2 2016, 10:47 AM
test/CodeGen/AMDGPU/fdiv_setreg_chain.ll
16–20 ↗(On Diff #80093)

This should go in the existing fdiv test (which I would expect is already failing with this change)

16–20 ↗(On Diff #80093)

Also I think this should be run with SI/CI +/- denormals, and VI +/- denormals

wdng marked 2 inline comments as done.Dec 2 2016, 1:12 PM
wdng added inline comments.
test/CodeGen/AMDGPU/fdiv_setreg_chain.ll
16–20 ↗(On Diff #80093)

Is there a flag that we can use to control +/- denormals?

wdng added a reviewer: cfang.Dec 3 2016, 12:24 AM
tstellarAMD commandeered this revision.Dec 5 2016, 1:29 PM
tstellarAMD edited reviewers, added: wdng; removed: tstellarAMD.
  • Only write necessary bits.
  • Merge testcase into fdiv.ll
  • Remove magic numbers
  • Don't use s_setreg when denormals are enabled.
arsenm added inline comments.Dec 5 2016, 2:33 PM
lib/Target/AMDGPU/AMDGPUISelLowering.h
234–236

I would only put the comment once for the block of the 2 instructions

lib/Target/AMDGPU/SIISelLowering.cpp
2967–2969

We should probably not use target constant here and teach FoldImmediate to turn the register setreg into the immediate setreg to save the code size on multiple uses of the immediate, like will happen in the unrolled vector case

tstellarAMD marked an inline comment as done.
  • Select to s_setreg_b32 and fold immediates in SIFoldOperands
arsenm added inline comments.Dec 5 2016, 5:39 PM
lib/Target/AMDGPU/SIFoldOperands.cpp
178–182 ↗(On Diff #80356)

Changing the instruction back if it wasn't an immediate looks broken to me. I would expect it wouldn't need to fold register copies at all, and PeepholeOptimizer would take care of it.

  • Only change s_setreg opcode when immediate folding is known to succeed.
arsenm accepted this revision.Dec 6 2016, 2:42 PM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Dec 6 2016, 2:42 PM
This revision was automatically updated to reflect the committed changes.