Page MenuHomePhabricator

[DAGCombiner] Combine frem into fdiv+ftrunc+fma
Needs ReviewPublic

Authored by qiucf on Aug 18 2021, 3:10 AM.


Group Reviewers
Restricted Project

This helps in C library function fmod. remainder needs similar optimization, but since that's a libcall instead of a IR/DAG op, we can do it in future patches.

Diff Detail

Event Timeline

qiucf created this revision.Aug 18 2021, 3:10 AM
qiucf requested review of this revision.Aug 18 2021, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2021, 3:10 AM
RKSimon added inline comments.Aug 20 2021, 7:32 AM

comment? // fold (frem x, y) -> (fma (fneg), y, x) ...


Should we check that ftrunc is available?

qiucf updated this revision to Diff 368025.Aug 22 2021, 7:39 PM
qiucf marked 2 inline comments as done.
foad added inline comments.Aug 23 2021, 7:03 AM

Is this always preferable, even on targets where FREM is legal?

qiucf updated this revision to Diff 368549.Aug 24 2021, 9:27 PM
qiucf marked an inline comment as done.
qiucf removed a reviewer: Restricted Project.
foad added inline comments.Aug 25 2021, 2:25 AM

I see you have added an isLegal check for FREM. But I don't understand why you are doing this as a combine in the first place, instead of changing the way FREM is legalized to do this as a lowering instead of calling a libcall.

qiucf added inline comments.Sep 13 2021, 11:59 PM

I'm not sure it's good idea to consider fast-math flags in legalizing.. Here frem just looks like how fdiv is transformed into series of operations in combiner.

I like this as a combine because we can chain it with other combines. For example, the fdiv becomes a reciprocal op in the current tests.

I'd like to see tests for other targets though. I did a quick test, and this may trigger for any of AArch64, AMDGPU, or x86 (use -mattr=avx to make sure ftrunc is supported).

There's an open question about exactly which FMF are needed to enable this transform. Should we require 'reassoc' or others? Is 'reassoc' enough by itself?
The current tests show 'fast' only, so we should reduce at least one of those to whatever we decide is the minimum requirement.

There's also a possible missing constraint when optimizing for minimum size - if we know this node can be lowered to a libcall, is that smaller than the expansion?

We might be able to answer some of these questions and share code with the transforms under DAGCombiner::visitFPOW().