Lowering floating point division for 32-bit using IEEE 754
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
4 | This should come before the r600 line | |
81–82 | -DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence. I would reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup) |
Hi Matt, You commented that "You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function". Looks like fast math is not a llc flag.
Could you explain in more details about using the fast math flags? e.g. What's the fast math flag? How the compiler backend retrieves the fast math flag? what code should we generate for fast-math? Thanks a lot!
The llc flag is -enable-unsafe-fp-math. There is also the per-function attribute "unsafe-fp-math"="true". We should probably use that, but I discovered a while ago that if one function has it used, the others must have "unsafe-fp-math"="false" because there is a bug where the setting is reset between functions
The new introduced flag " -amdgpu-fast-fdiv" is used to control use old and new added IEEE754 div implementation. For the -enable-unsafe-fp-math, could you please let me know which part of code will be tested? Thanks!
The test should show which division implementation you get when unsafe math is enabled. There also need to be tests for the fast math flags on the individual instructions
You still need to add tests that only use the fast math flags
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2080–2081 | Variables should be capitalized and camel case |
Not quite clear about the individual fast math? Could you please describe in details?
There should be tests with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags
Rather than the globally enabled fast math option.
Looks like compiler chooses a different implementation for fpdiv if we have "-enable-unsafe-fp-math" flag enabled when using llc. Generated *.s does show different ISA output, but the breakpoint which I put at SITargetLowering::LowerFDIV is never reached. Is this correct?
Based on Matt's comments: Added tests (Merge Changpeng's LIT tests) with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags, rather than the globally enabled fast math option.
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
4 | Do you mean fast-math flag + "-amdgpu-fast-fdiv" ? |
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
4 | Yes |
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
4 | Actually combining fast math + -amdgpu-fast-fdiv has been covered in these tests. When fast math flags + -enable-unsafe-fp-math are enabled (fast, arcp, etc.), " if (Flags->hasAllowReciprocal()) { ... } " will be executed. Orignal less accurate fpdiv implementation will be executed when only enabling the -amdgpu-fast-fdiv flag. If we enable fast math + -amdgpu-fast-fdiv, less accurate fpdiv will be tested. |
LGTM
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
4 | Is it? This is what I'm unsure about and why I think there should be an additional test. Are the individual math node flags set if these are globally enabled? |
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
50–55 | AFAIK there should be:
Perhaps the test should check that. |
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
50–55 | Based on Matt's comment: "-DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence". So, we reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup). |
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
50–55 | OK. However, v_div_fmas_f32 is still missing. Perhaps this is not important. |
test/CodeGen/AMDGPU/frem.ll | ||
---|---|---|
1–3 | This should also test without -amdgpu-fast-fdiv. -mcpu=SI should also be removed |
test/CodeGen/AMDGPU/frem.ll | ||
---|---|---|
1–3 | Actually I don't suggest we test with -amdgpu-fast-fdiv here at all. This option has already been tested in fdiv.ll, and testing here does not provide any additional value. |
test/CodeGen/AMDGPU/fdiv.ll | ||
---|---|---|
50–55 | Yes. |
Description is inaccurate. Should say something about faster 2.5 ulp fdiv