If denormals are not flushed, we can use max instead of multiplication
by 1. For double, max is simply faster; for float and half the encoding
is shorter, because the mul uses the constant bus and VOP3.
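
As a rough host-side illustration of why the two forms are interchangeable when denormals are preserved, the sketch below compares `x * 1.0` with `max(x, x)` bit for bit. It is a plain C model, not the AMDGPU instructions themselves: the helper names (`bits_of`, `canon_mul`, `canon_max`, `forms_agree`) are hypothetical, it assumes the host does not flush denormals (the usual default), and it does not model NaN handling.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a float. */
uint32_t bits_of(float x) {
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Model of the old lowering: multiply by 1.0.
   'volatile' keeps the compiler from folding the multiply away. */
float canon_mul(float x) {
    volatile float one = 1.0f;
    return x * one;
}

/* Model of the new lowering: max of the value with itself.
   Written out by hand so the sketch does not depend on libm. */
float canon_max(float x) {
    float a = x, b = x;
    return a > b ? a : b;
}

/* Returns 1 if both forms produce the same bits for x. */
int forms_agree(float x) {
    return bits_of(canon_mul(x)) == bits_of(canon_max(x));
}
```

Under those assumptions, `forms_agree` holds for normal values, denormals such as `1e-40f`, and signed zeros. If denormals were flushed, the multiply would flush a denormal input to zero and this model of max would not, so the two forms could differ there; that is why the summary limits the substitution to the non-flushing case.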
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
| File | Lines | Diff | Comment |
|---|---|---|---|
| lib/Target/AMDGPU/AMDGPUInstructions.td | 45–47 | #111594 | How / why this change? |
| lib/Target/AMDGPU/SIInstructions.td | 1280–1285 | #111594 | I think it would be clearer to have `let Predicates = [NoFP16Denormals]` rather than relying on `AddedComplexity` to prefer one pattern over the other. |
| test/CodeGen/AMDGPU/fcanonicalize-denorms.ll | 8 | #111594 | Can you merge this with fcanonicalize.ll? That one avoids multiple run lines by using the attributes on the different functions. |