⚙ D20557 Lowering floating point division for 32-bit using IEEE 754

wdng updated this revision to Diff 58190.May 23 2016, 9:19 PM

wdng retitled this revision from to Lowering floating point division for 32-bit using IEEE 754.

wdng updated this object.

wdng added reviewers: arsenm, • tstellarAMD, cfang.

wdng set the repository for this revision to rL LLVM.

Herald added a subscriber: arsenm. · View Herald TranscriptMay 23 2016, 9:19 PM

arsenm added inline comments.May 24 2016, 12:50 PM

lib/Target/AMDGPU/SIISelLowering.cpp
42	Description is inaccurate. Should say something about faster 2.5 ulp fdiv
2031	No space after (
2032	Break on same line
2034	Indentation looks wrong
2047	Don't all caps ON
2076	You don't need the else

wdng updated this revision to Diff 58310.May 24 2016, 1:50 PM

wdng updated this object.

wdng marked 6 inline comments as done.

wdng removed a subscriber: arsenm.

Needs tests

wdng updated this revision to Diff 58637.May 26 2016, 10:07 AM

wdng updated this object.

wdng edited edge metadata.

You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #58637)	This should come before the r600 line
81–82 ↗	(On Diff #58637)	-DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence. I would reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup)

Hi Matt, You commented that "You should add more tests that use the fast math flags, and also run with unsafe-fp-math enabled on the function". Looks like fast math is not a llc flag.
Could you explain in more details about using the fast math flags? e.g. What's the fast math flag? How the compiler backend retrieves the fast math flag? what code should we generate for fast-math? Thanks a lot!

The llc flag is -enable-unsafe-fp-math. There is also the per-function attribute "unsafe-fp-math"="true". We should probably use that, but I discovered a while ago that if one function has it used, the others must have "unsafe-fp-math"="false" because there is a bug where the setting is reset between functions

The new introduced flag " -amdgpu-fast-fdiv" is used to control use old and new added IEEE754 div implementation. For the -enable-unsafe-fp-math, could you please let me know which part of code will be tested? Thanks!

The test should show which division implementation you get when unsafe math is enabled. There also need to be tests for the fast math flags on the individual instructions

Modify & add LIT tests based on Matt's comments

You still need to add tests that only use the fast math flags

lib/Target/AMDGPU/SIISelLowering.cpp
2081–2082	Variables should be capitalized and camel case

Any comments about my latest changes? Thanks!

Capitalize variables to follow the camel case.

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

In D20557#444692, @arsenm wrote:

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

Not quite clear about the individual fast math? Could you please describe in details?

In D20557#444720, @wdng wrote:

In D20557#444692, @arsenm wrote:

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

Not quite clear about the individual fast math? Could you please describe in details?

There should be tests with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags

Rather than the globally enabled fast math option.

In D20557#444721, @arsenm wrote:

In D20557#444720, @wdng wrote:

In D20557#444692, @arsenm wrote:

In D20557#444512, @wdng wrote:

Any comments about my latest changes? Thanks!

Still need tests with the individual fast math flags

Not quite clear about the individual fast math? Could you please describe in details?

There should be tests with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags

Rather than the globally enabled fast math option.

Looks like compiler chooses a different implementation for fpdiv if we have "-enable-unsafe-fp-math" flag enabled when using llc. Generated *.s does show different ISA output, but the breakpoint which I put at SITargetLowering::LowerFDIV is never reached. Is this correct?

Based on Matt's comments: Added tests (Merge Changpeng's LIT tests) with just the fast math flags as described here: http://llvm.org/docs/LangRef.html#fast-math-flags, rather than the globally enabled fast math option.

arsenm added inline comments.Jun 2 2016, 12:04 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	fast math + -amdgpu-fast-fdiv would be another combination to try
58 ↗	(On Diff #59137)	There should be one that only uses arcp as well

wdng added inline comments.Jun 2 2016, 12:36 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	Do you mean fast-math flag + "-amdgpu-fast-fdiv" ?

arsenm added inline comments.Jun 2 2016, 5:11 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	Yes

wdng added inline comments.Jun 2 2016, 5:41 PM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59137)	Actually combining fast math + -amdgpu-fast-fdiv has been covered in these tests. When fast math flags + -enable-unsafe-fp-math are enabled (fast, arcp, etc.), " if (Flags->hasAllowReciprocal()) { ... } " will be executed. Orignal less accurate fpdiv implementation will be executed when only enabling the -amdgpu-fast-fdiv flag. If we enable fast math + -amdgpu-fast-fdiv, less accurate fpdiv will be tested.

Added tests for fast-math arcp.

LGTM

test/CodeGen/AMDGPU/fdiv.ll
4 ↗	(On Diff #59627)	Is it? This is what I'm unsure about and why I think there should be an additional test. Are the individual math node flags set if these are globally enabled?

This revision is now accepted and ready to land.Jun 6 2016, 1:42 PM

artem.tamazov added a subscriber: artem.tamazov.Jun 7 2016, 4:57 AM

artem.tamazov added inline comments.

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #59627)	AFAIK there should be: Two v_div_scale prior v_rcp Two (not one) v_fma prior v_mul Three (not one) v_fma after v_mul v_div_fmas prior v_div_fixup. Perhaps the test should check that.

wdng added inline comments.Jun 7 2016, 8:40 AM

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #59627)	Based on Matt's comment: "-DAG does not work as expected if you specify the same string multiple times in the same -DAG sequence". So, we reduce the vector tests to a differentiating instruction or two per element (e.g. div_scale + div_fixup).

Closed by commit rL272044: Differential Revision: http://reviews.llvm.org/D20557 (authored by wdng). · Explain WhyJun 7 2016, 12:11 PM

This revision was automatically updated to reflect the committed changes.

artem.tamazov added inline comments.Jun 8 2016, 6:28 AM

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #59627)	OK. However, v_div_fmas_f32 is still missing. Perhaps this is not important.

Fixed LIT test failed for frem.ll and rsq.ll

arsenm added inline comments.Jun 8 2016, 12:32 PM

test/CodeGen/AMDGPU/frem.ll
1–3 ↗	(On Diff #60081)	This should also test without -amdgpu-fast-fdiv. -mcpu=SI should also be removed

cfang added inline comments.Jun 8 2016, 1:02 PM

test/CodeGen/AMDGPU/frem.ll
1–3 ↗	(On Diff #60081)	Actually I don't suggest we test with -amdgpu-fast-fdiv here at all. This option has already been tested in fdiv.ll, and testing here does not provide any additional value.

Modified LIT test based on Matt's and Changpeng's suggestion.

wdng marked an inline comment as done.Jun 8 2016, 5:17 PM

wdng added inline comments.

test/CodeGen/AMDGPU/fdiv.ll
50–55 ↗	(On Diff #60120)	Yes.

LGTM

Please go ahead to commit!

This is an archive of the discontinued LLVM Phabricator instance.

Lowering floating point division for 32-bit using IEEE 754
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 58190

lib/Target/AMDGPU/SIISelLowering.cpp

This is an archive of the discontinued LLVM Phabricator instance.

Lowering floating point division for 32-bit using IEEE 754ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 58190

lib/Target/AMDGPU/SIISelLowering.cpp

Lowering floating point division for 32-bit using IEEE 754
ClosedPublic