This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] OOptimize floating point materialization
ClosedPublic

Authored by zatrazz on Jan 22 2019, 3:11 AM.

Details

Summary

This patch changes isFPImmLegal to return if the value can be enconded
as the immediate operand of a logical instruction besides checking if
for immediate field for fmov.

This optimizes some floating point materization, inclusive values
used on isinf lowering.

Diff Detail

Repository
rL LLVM

Event Timeline

zatrazz created this revision.Jan 22 2019, 3:11 AM

Sounds good. Please add simple tests for +inf and -inf on both double and float. Perhaps also float to double and vice versa from the IR constant (like the changed test shows), too. Thanks!

zatrazz updated this revision to Diff 183109.Jan 23 2019, 8:56 AM

Updated patch with additional tests. I added tests for __builtin_isinf expanded builtin for float, double, and long double (the latter still requires loading the constant).

rengolin accepted this revision.Jan 23 2019, 9:05 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jan 23 2019, 9:05 AM

Is there some reason we're checking for specific floating-point constants here, as opposed to just calling AArch64_AM::isLogicalImmediate or AArch64_AM::isAnyMOVWMovAlias?

lib/Target/AArch64/AArch64ISelLowering.cpp
5430 ↗(On Diff #183109)

This comment is wrong.

zatrazz marked an inline comment as done.Jan 23 2019, 9:53 AM
zatrazz added inline comments.
lib/Target/AArch64/AArch64ISelLowering.cpp
5430 ↗(On Diff #183109)

What about:

We can materialize #0.0 as fmov $Rd, XZR and #INF as orr $Rx, XZR, CTE
plus fmov $Rd for 64-bit and 32-bit cases.

?

Is there some reason we're checking for specific floating-point constants here, as opposed to just calling AArch64_AM::isLogicalImmediate or AArch64_AM::isAnyMOVWMovAlias?

I am not sure, the DAGCombiner code seems to check on some points if the operation is legal and other if constant can be materialized without load (DAGCombiner::convertSelectOfFPConstantsToLoadOffset).

It would be a one-line change to make the AArch64 target treat all FP immediates as legal (this was implemented in https://reviews.llvm.org/rL223941 , but only enabled for the large code model).

In terms of what's actually profitable, it's not entirely obvious. If you look at timings on a Cortex-A57, the cost is actually exactly the same for mov+fmov vs. adrp+ldr; the mov+fmov sequence is always better because of the reduced cache pressure. And actually, the timings are still the same if you consider movw+movk+fmov vs. adrp+ldr (it's one instruction longer, but the movw+movk is fused). For f64, though, it's not obviously worthwhile to emit a five-instruction sequence vs. a two instruction constant-pool load. Some quick testing with gcc shows it emits mov+movk, but not longer sequences.

We could add a separate target hook for DAGCombiner::SimplifySelectCC, maybe, to distinguish between illegal immediates and immediates which are legal-but-expensive. Probably worth revising that code anyway; it currently doesn't consider the possibility of transforming to an integer SELECT_CC, then BITCASTing the result to a float.

We could also add code to use the SIMD MOVI for floating-point immediates in certain cases (such immediates are probably not that common, but you can make some interesting numbers like 32.f and -0.f).

In any case, it doesn't really make sense to special-case infinity, specifically.

zatrazz updated this revision to Diff 183155.Jan 23 2019, 12:37 PM
zatrazz retitled this revision from [AArch64] Optimize Inf materialization to [AArch64] OOptimize floating point materialization.
zatrazz edited the summary of this revision. (Show Details)

Updated patch from previous comments. I added an extra check using AArch64_AM::isLogicalImmediate for constants that can be materialized as the immediate operand of a logical instruction. We still need to handle the cases for some 8-bit immediates on fmov using getFPXXImm and positive 0.0.

efriedma added inline comments.Jan 23 2019, 12:51 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
5446 ↗(On Diff #183155)

"operanf"

5451 ↗(On Diff #183155)

I have no idea what isLogicalImmediate does if the RegSize is 16.

zatrazz marked 2 inline comments as done.Jan 24 2019, 5:50 AM
zatrazz added inline comments.
lib/Target/AArch64/AArch64ISelLowering.cpp
5446 ↗(On Diff #183155)

Fixed locally.

5451 ↗(On Diff #183155)

My understanding it should be safe since fmov transfer for float16 will be done from 32 bit register.

efriedma added inline comments.Jan 24 2019, 8:45 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
5451 ↗(On Diff #183155)

The proposed transform is fine for f16; I'm more concerned that isLogicalImmediate() itself won't return the right result, since the actual underlying operation is a 32-bit orr.

Actually, thinking about it a bit more, we might as well just mark ISD::ConstantFP Legal for f16, since it can always be done in two instructions.

On a related note, not sure the ISel patterns you need for f16 are implemented.

zatrazz updated this revision to Diff 183519.Jan 25 2019, 4:55 AM

Patch updated based on previous comments. The isAnyMOVWMovAlias
can catch more cases where we can materialize the floating-point constant
than isLogicalImmediate (128 .0 for instance). The positive zero still need to
be handled as specific case because the resulting fmov will use the zero
register instead of an immediate. The isAnyMOVWMovAlias path is not use
for fp16 (I will need to check further if is safe for all cases).

As a side note there still some cases where we can materialize the a fp-constant
without loads that current patch does not handle. The negative zero, for instance,
can be materialized as 'movi v0.2s, 0x80, lsl 24'. I will check how we can handle
cases like this.

Do we have test coverage for the various f16 cases somewhere? I'd like to make sure we don't crash or miscompile for f16 inf.

lib/Target/AArch64/AArch64ISelLowering.cpp
5442 ↗(On Diff #183519)

Maybe clarify precisely what sequence you expect will be emitted here?

5448 ↗(On Diff #183519)

"dbgs() << Imm" should work for APInt, I think.

zatrazz updated this revision to Diff 184532.Jan 31 2019, 9:48 AM

Updated patch based on previous comment:

  • Added a comment about isAnyMOVWMovAlias check intention.
  • Added _Float16 isinf check.

Also, unfortunately dbgs() does not have APFloat support yet.

efriedma accepted this revision.Jan 31 2019, 12:26 PM

LGTM with one nit.

lib/Target/AArch64/AArch64ISelLowering.cpp
5441 ↗(On Diff #184532)

MOVZ, MOVN, or ORR.

evandro added inline comments.Jan 31 2019, 1:35 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
5442 ↗(On Diff #184532)

Since fmov h0, w0 is legal, shouldn't MVT::f16 be added here too?

efriedma added inline comments.Jan 31 2019, 1:48 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
5442 ↗(On Diff #184532)

Yes, it's possible, but I'm fine handling it in a followup. We currently don't have an isel pattern to generate that fmov.

evandro added inline comments.Jan 31 2019, 2:06 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
5442 ↗(On Diff #184532)

OK. Please, add a TODO comment here about it before pushing the patch.

Thank you.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2019, 4:26 AM