Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | This folds amdgcn.sin(1.0) to a very small value that is not exactly 0.0. Should I add an explicit check for all of the quarter-integer values that should fold to exactly -1.0 or 0.0 or +1.0 ? |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | I've never been sure what exactly the policy should be for folding these sorts of intrinsics. I think nobody is happy with relying on the host libm call to begin with, plus the hardware instructions don't give the exact same results. We're already constant folding more precise results for rcp than the hardware instruction gives though (which is probably more useful in general). I'm not sure this is how this will work going forward, but you could also consider strictfp on the call site (it may happen already, but you should add a test where certain folds are skipped) Also you should use numbers::pi rather than relying on M_PI (and I strongly prefer using an explicit 2.0 rather than 2) | |
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll | ||
175 | Also should test snan -> qnan (this also won't happen for !ieee_mode, so this case should depend on strictfp also) | |
178 | Add some cases where strictfp calls are skipped |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 |
I thought:
But perhaps I'm wrong about some or all of this. | |
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll | ||
175 | All of the FP constant folding code is skipped for infinities and nans so there's not much to check. (I didn't really mean to add any of these inf and nan checks here, just carelessly cut n pasted from another test.) |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | I'm pretty sure the nan quieting behavior will change based on the ieee_mode bit. AMDGPU has a wider set of FP environment properties. Plus, these support exceptions anyway | |
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll | ||
175 | That would be a bug, since I would expect them to fold too |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | I don't understand the first comment above where the argument is 1.0. These intrinsics take radians, and not the scaled values the ISA expects,, correct? Similarly, if any bounds checking is done, shouldn't it be on radians? |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | From the lowering it looks like they directly feed the subtarget instruction. The lowering for llvm.sin/llvm.cos inserts the scale or not depending on the target (I'm not sure why you would ever need the raw amdgcn intrinsic form) |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | The device libraries have one function that uses both in the raw form, and doesn't consider the two subtarget behaviors |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | LLPC sometimes generates the amdgcn.sin/cos intrinsics. The problem with introducing the multiply when llvm.sin/cos is lowered is that it doesn't get folded away if the argument was already something multiplied or divided by a constant, which is pretty common. |
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll | ||
---|---|---|
175 | A missed optimization perhaps, but it's always been like that for all math intrinsics, so I don't plan to fix it as part of this patch. |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | It should in the DAG lowering. We're not propagating fast math flags now in the sin lowering, maybe that's your problem? |
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll | ||
---|---|---|
139 | Looks like this ends in BCD instead of BCC on Windows (with clang-cl as host compiler at least): http://45.33.8.238/win/16679/step_11.txt |
llvm/lib/Analysis/ConstantFolding.cpp | ||
---|---|---|
1918 | Good point, thanks, see D80813. Perhaps there is no good reason for LLPC to use the amdgcn intrinsics after all. | |
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll | ||
139 | Thanks, should be fixed in rGc27214c23446e423ec2e7eb8650a65cc5f0a16aa. |
This folds amdgcn.sin(1.0) to a very small value that is not exactly 0.0. Should I add an explicit check for all of the quarter-integer values that should fold to exactly -1.0 or 0.0 or +1.0 ?