This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Optimize floating point materialization
ClosedPublic

Authored by zatrazz on Feb 20 2019, 10:05 AM.

Details

Summary

This patch follows some ideas from r352866 to optimize the floating point materialization further. It changes isFPImmLegal to considere up to 2 mov instruction. The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same if you consider movw+movk+fmov vs. adrp+ldr will be fused (although one instruction longer).

Diff Detail

Event Timeline

zatrazz created this revision.Feb 20 2019, 10:05 AM
efriedma added inline comments.Feb 20 2019, 11:43 AM
lib/Target/AArch64/AArch64ISelLowering.cpp
5404

Please don't copy-paste code. And this is a very inaccurate approximation for the logic in AArch64ExpandPseudo::expandMOVImm.

5410

Not sure this comment should be here; should be next to the code that actually makes this decision.

5417

Does it matter whether we're optimizing for size?

zatrazz updated this revision to Diff 188415.Feb 26 2019, 10:52 AM
zatrazz marked 3 inline comments as done.

New revision based on previous comments. I refactored the logic used on isFPImmLegal to evaluate whether to materialize the FP constant or not by adding a new function on common aarch64 code, AArch64_AM::getExpandImmCost. To avoid code replication I refactored the code by moving some definitions from AArch64ExpandPseudoInsts.cpp.

lib/Target/AArch64/AArch64ISelLowering.cpp
5404

Ack, I changed it on next revision to use the same ideas AArch64ExpandPseudo::expandMOVIm. It required to consolidate some common code.

5410

Ack, I moved to the code that actually uses it.

5417

I intend to send another patch to add the optimization for size information on isFPImmLegal, since it requires some refactoring in various backends.

Not sure I like the duplicated logic in getExpandImmCost; it doesn't have good test coverage, and it could fall out of sync in the future. That said, I've been considering refactoring the code in expandMOVImm anyway, to split the actual instruction emission away from the logic that figures out the appropriate sequence. Basically, the idea would be that instead of returning a number from getExpandImmCost, you return an abstraction of the instruction sequence: an array that contains, for each instruction, the appropriate opcode and immediate. isFPImmLegal just uses the number of elements in the array, while expandMOVImm actually emits instructions based on the array. I think this would shrink the code overall because the logic for building instructions is currently duplicated multiple times. (I was considering it more in the context of adding more possible sequences, but it works here as well.)

lib/Target/AArch64/AArch64ISelLowering.cpp
5422

Is this comment redundant?

5425

Where is forCodeSize defined?

lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
1054 ↗(On Diff #188415)

"return 2" isn't quite right; it could be 1.

zatrazz marked 3 inline comments as done.Feb 27 2019, 10:20 AM

Not sure I like the duplicated logic in getExpandImmCost; it doesn't have good test coverage, and it could fall out of sync in the future. That said, I've been considering refactoring the code in expandMOVImm anyway, to split the actual instruction emission away from the logic that figures out the appropriate sequence. Basically, the idea would be that instead of returning a number from getExpandImmCost, you return an abstraction of the instruction sequence: an array that contains, for each instruction, the appropriate opcode and immediate. isFPImmLegal just uses the number of elements in the array, while expandMOVImm actually emits instructions based on the array. I think this would shrink the code overall because the logic for building instructions is currently duplicated multiple times. (I was considering it more in the context of adding more possible sequences, but it works here as well.)

Right, I think I can follow this idea and rewrite my patch if you don't mind.

lib/Target/AArch64/AArch64ISelLowering.cpp
5422

Indeed, I will remove it.

5425

It is an artifact from a wrong rebase, I will fix it (it is meant for a different patch).

lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h
1054 ↗(On Diff #188415)

Indeed, I will fix it.

evandro added inline comments.Feb 27 2019, 10:38 AM
lib/Target/AArch64/AArch64ISelLowering.cpp
5418

Perhaps you could check for AArch64Subtarget::hasFuseLiterals() and emit up to 5 instructions, thus including f64, unless optimizing for size.

Right, I think I can follow this idea and rewrite my patch if you don't mind.

Okay, thanks.

zatrazz updated this revision to Diff 189176.Mar 4 2019, 11:36 AM

Updated patch based on previous comments. It depends on https://reviews.llvm.org/D58915 and https://reviews.llvm.org/D58690

efriedma accepted this revision.Mar 4 2019, 1:30 PM

LGTM (but obviously can't be merged until the dependency is reviewed/merged)

This revision is now accepted and ready to land.Mar 4 2019, 1:30 PM
zatrazz closed this revision.Mar 18 2019, 11:44 AM