This patch fixes the fdiv precision issues.
This needs tests.
Should the old division path without this be left around?
lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
---|---|---|
227–229 | These should be FMA_W_CHAIN, FMUL_W_CHAIN | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
2775 | The constant should be a bitmask built from the enums for the fields you are setting, rather than from magic numbers. | |
2780 | These lines go over 80 columns | |
2823 | You don't need any of the getValue(0)s | |
lib/Target/AMDGPU/SOPInstructions.td | ||
593 | This shouldn't have isBarrier set | |
598 | You can move the hasSideEffects here instead of the let block since it's just the one instruction | |
lib/Target/AMDGPU/VOP3Instructions.td | ||
224–226 ↗ | (On Diff #77265) | There should be a pattern which uses the complex pattern for the source modifiers |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2826–2831 | The indentation here and in the rest of the block looks wrong. | |
lib/Target/AMDGPU/SIISelLowering.h | ||
40–41 ↗ | (On Diff #80032) | Does the LowerFDIV32 function still exist? Is it used anymore? |
lib/Target/AMDGPU/VOP3Instructions.td | ||
224–235 ↗ | (On Diff #80032) | These are dead patterns since we are custom selecting the SDNodes. |
test/CodeGen/AMDGPU/fdiv_setreg_chain.ll | ||
6 ↗ | (On Diff #80032) | We should check the whole s_setreg_imm32 here to make sure we get the correct operands. |
13 ↗ | (On Diff #80032) | Same thing here. |
test/CodeGen/AMDGPU/fdiv_setreg_chain.ll | ||
---|---|---|
16–20 ↗ | (On Diff #80093) | Is there a flag that we can use to control +/- denormals? |
- Only write necessary bits.
- Merge testcase into fdiv.ll
- Remove magic numbers
- Don't use s_setreg when denormals are enabled.
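As a rough illustration of the "Remove magic numbers" and "Only write necessary bits" items, the setreg immediate can be composed from named fields instead of a literal. This is a minimal sketch, not the code in the patch: the `hwreg` namespace, `encodeHwreg`, and `needsDenormToggle` are hypothetical names, and the field layout (register id in bits [5:0], offset in bits [10:6], width minus one in bits [15:11], FP32 denormal control as a 2-bit field at bit 4 of the MODE register) is an assumption about the s_setreg immediate encoding.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the backend's hardware-register enums.
// Values are assumptions about the s_setreg simm16 layout.
namespace hwreg {
enum : uint32_t {
  ID_MODE = 1,         // assumed id of the MODE register
  ID_SHIFT = 0,        // register id occupies bits [5:0]
  OFFSET_SHIFT = 6,    // bit offset of the field occupies bits [10:6]
  WIDTH_M1_SHIFT = 11, // (field width - 1) occupies bits [15:11]
};
enum : uint32_t {
  FP32_DENORM_OFFSET = 4, // assumed position of the FP32 denormal field
  FP32_DENORM_WIDTH = 2,  // assumed width of that field
};
} // namespace hwreg

// Build the immediate that selects `width` bits starting at `offset`
// in hardware register `id`, so only those bits are written.
constexpr uint32_t encodeHwreg(uint32_t id, uint32_t offset, uint32_t width) {
  return (id << hwreg::ID_SHIFT) | (offset << hwreg::OFFSET_SHIFT) |
         ((width - 1) << hwreg::WIDTH_M1_SHIFT);
}

// Selector for just the FP32 denormal bits of MODE.
constexpr uint32_t Denorm32Reg = encodeHwreg(
    hwreg::ID_MODE, hwreg::FP32_DENORM_OFFSET, hwreg::FP32_DENORM_WIDTH);

// The mode switch around the division sequence is only needed when
// FP32 denormals are not already enabled.
constexpr bool needsDenormToggle(bool fp32DenormalsEnabled) {
  return !fp32DenormalsEnabled;
}

static_assert(Denorm32Reg == 0x901, "1 | (4 << 6) | (1 << 11)");
```

Under these assumptions the named form yields the same value as the literal `0x901`, but a reader can see which register and which bits are being touched, and the last helper mirrors the "Don't use s_setreg when denormals are enabled" change.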
lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
---|---|---|
234–236 | I would put the comment only once, covering the block of two instructions. | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
2967–2969 | We should probably not use a target constant here. Instead, teach FoldImmediate to turn the register setreg into the immediate setreg. That saves code size when the immediate has multiple uses, as will happen in the unrolled vector case. |
lib/Target/AMDGPU/SIFoldOperands.cpp | ||
---|---|---|
178–182 ↗ | (On Diff #80356) | Changing the instruction back if it wasn't an immediate looks broken to me. I would expect that it doesn't need to fold register copies at all, and that PeepholeOptimizer would take care of them. |