This is an archive of the discontinued LLVM Phabricator instance.

Lowering floating point division for 32-bit using IEEE 754
ClosedPublic

Authored by wdng on May 23 2016, 9:19 PM.

Details

Summary

Lowering floating point division for 32-bit using IEEE 754

Diff Detail

Repository
rL LLVM

Event Timeline

wdng updated this revision to Diff 58190.May 23 2016, 9:19 PM
wdng retitled this revision from to Lowering floating point division for 32-bit using IEEE 754.
wdng updated this object.
wdng added reviewers: arsenm, tstellarAMD, cfang.
wdng set the repository for this revision to rL LLVM.
arsenm added inline comments.May 24 2016, 12:50 PM
lib/Target/AMDGPU/SIISelLowering.cpp
42 ↗(On Diff #58190)

Description is inaccurate. Should say something about faster 2.5 ulp fdiv

2031 ↗(On Diff #58190)

No space after (

2032 ↗(On Diff #58190)

Break on same line

2034 ↗(On Diff #58190)

Indentation looks wrong

2047 ↗(On Diff #58190)

Don't all caps ON

2076 ↗(On Diff #58190)

You don't need the else

wdng updated this revision to Diff 58310.May 24 2016, 1:50 PM
wdng updated this object.
wdng marked 6 inline comments as done.
wdng removed a subscriber: arsenm.
arsenm edited edge metadata.May 24 2016, 2:42 PM

Needs tests

wdng updated this revision to Diff 58637.May 26 2016, 10:07 AM
wdng updated this object.
wdng edited edge metadata.

You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function

test/CodeGen/AMDGPU/fdiv.ll
4 ↗(On Diff #58637)

This should come before the r600 line

81–82 ↗(On Diff #58637)

-DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence. I would reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup)

wdng added a comment.May 26 2016, 1:53 PM

Hi Matt, You commented that "You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function". Looks like fast math is not a llc flag.
Could you explain in more details about using the fast math flags? e.g. What's the fast math flag? How the compiler backend retrieves the fast math flag? what code should we generate for fast-math? Thanks a lot!

The llc flag is -enable-unsafe-fp-math. There is also the per-function attribute "unsafe-fp-math"="true". We should probably use that, but I discovered a while ago that if one function has it used, the others must have "unsafe-fp-math"="false" because there is a bug where the setting is reset between functions

wdng added a comment.May 26 2016, 2:17 PM

The new introduced flag " -amdgpu-fast-fdiv" is used to control use old and new added IEEE754 div implementation. For the -enable-unsafe-fp-math, could you please let me know which part of code will be tested? Thanks!

The test should show which division implementation you get when unsafe math is enabled. There also need to be tests for the fast math flags on the individual instructions

wdng updated this revision to Diff 58855.May 27 2016, 3:54 PM
wdng updated this object.
wdng marked 2 inline comments as done.

Modify & add LIT tests based on Matt's comments

You still need to add tests that only use the fast math flags

lib/Target/AMDGPU/SIISelLowering.cpp
2080–2081 ↗(On Diff #58855)

Variables should be capitalized and camel case

wdng added a comment.May 31 2016, 10:25 AM

Any comments about my latest changes? Thanks!

wdng updated this revision to Diff 59089.May 31 2016, 10:42 AM
wdng marked an inline comment as done.

Capitalize variables to follow the camel case.

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

wdng added a comment.May 31 2016, 11:45 AM
In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

Not quite clear about the individual fast math? Could you please describe in details?

In D20557#444720, @wdng wrote:
In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

Not quite clear about the individual fast math? Could you please describe in details?

There should be tests with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags

Rather than the globally enabled fast math option.

wdng added a comment.May 31 2016, 1:28 PM
In D20557#444720, @wdng wrote:
In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

Not quite clear about the individual fast math? Could you please describe in details?

There should be tests with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags

Rather than the globally enabled fast math option.

Looks like compiler chooses a different implementation for fpdiv if we have "-enable-unsafe-fp-math" flag enabled when using llc. Generated *.s does show different ISA output, but the breakpoint which I put at SITargetLowering::LowerFDIV is never reached. Is this correct?

wdng updated this revision to Diff 59137.May 31 2016, 3:11 PM

Based on Matt's comments: Added tests (Merge Changpeng's LIT tests) with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags, rather than the globally enabled fast math option.

arsenm added inline comments.Jun 2 2016, 12:04 PM
test/CodeGen/AMDGPU/fdiv.ll
4 ↗(On Diff #59137)

fast math + -amdgpu-fast-fdiv would be another combination to try

58 ↗(On Diff #59137)

There should be one that only uses arcp as well

wdng added inline comments.Jun 2 2016, 12:36 PM
test/CodeGen/AMDGPU/fdiv.ll
4 ↗(On Diff #59137)

Do you mean fast-math flag + "-amdgpu-fast-fdiv" ?

arsenm added inline comments.Jun 2 2016, 5:11 PM
test/CodeGen/AMDGPU/fdiv.ll
4 ↗(On Diff #59137)

Yes

wdng added inline comments.Jun 2 2016, 5:41 PM
test/CodeGen/AMDGPU/fdiv.ll
4 ↗(On Diff #59137)

Actually combining fast math + -amdgpu-fast-fdiv has been covered in these tests. When fast math flags + -enable-unsafe-fp-math are enabled (fast, arcp, etc.), " if (Flags->hasAllowReciprocal()) { ... } " will be executed. Orignal less accurate fpdiv implementation will be executed when only enabling the -amdgpu-fast-fdiv flag. If we enable fast math + -amdgpu-fast-fdiv, less accurate fpdiv will be tested.

wdng updated this revision to Diff 59627.Jun 3 2016, 2:41 PM
wdng marked an inline comment as done.

Added tests for fast-math arcp.

arsenm accepted this revision.Jun 6 2016, 1:42 PM
arsenm edited edge metadata.

LGTM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗(On Diff #59627)

Is it? This is what I'm unsure about and why I think there should be an additional test. Are the individual math node flags set if these are globally enabled?

This revision is now accepted and ready to land.Jun 6 2016, 1:42 PM
artem.tamazov added inline comments.
test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗(On Diff #59627)

AFAIK there should be:

  1. Two v_div_scale prior v_rcp
  2. Two (not one) v_fma prior v_mul
  3. Three (not one) v_fma after v_mul
  4. v_div_fmas prior v_div_fixup.

Perhaps the test should check that.

wdng added inline comments.Jun 7 2016, 8:40 AM
test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗(On Diff #59627)

Based on Matt's comment: "-DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence". So, we reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup).

This revision was automatically updated to reflect the committed changes.
artem.tamazov added inline comments.Jun 8 2016, 6:28 AM
test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗(On Diff #59627)

OK. However, v_div_fmas_f32 is still missing. Perhaps this is not important.

wdng updated this revision to Diff 60081.Jun 8 2016, 12:28 PM
wdng edited edge metadata.

Fixed LIT test failed for frem.ll and rsq.ll

arsenm added inline comments.Jun 8 2016, 12:32 PM
test/CodeGen/AMDGPU/frem.ll
1–3 ↗(On Diff #60081)

This should also test without -amdgpu-fast-fdiv. -mcpu=SI should also be removed

cfang added inline comments.Jun 8 2016, 1:02 PM
test/CodeGen/AMDGPU/frem.ll
1–3 ↗(On Diff #60081)

Actually I don't suggest we test with -amdgpu-fast-fdiv here at all. This option has already been tested in fdiv.ll, and testing here does not provide any additional value.

wdng updated this revision to Diff 60120.Jun 8 2016, 5:14 PM
wdng marked 2 inline comments as done.

Modified LIT test based on Matt's and Changpeng's suggestion.

wdng marked an inline comment as done.Jun 8 2016, 5:17 PM
wdng added inline comments.
test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗(On Diff #60120)

Yes.

cfang edited edge metadata.Jun 9 2016, 11:42 AM

LGTM

Please go ahead to commit!