This patch follows some ideas from r352866 to optimize the floating point materialization further. It changes isFPImmLegal to considere up to 2 mov instruction. The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same if you consider movw+movk+fmov vs. adrp+ldr will be fused (although one instruction longer).
Details
Diff Detail
Event Timeline
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
5404 | Please don't copy-paste code. And this is a very inaccurate approximation for the logic in AArch64ExpandPseudo::expandMOVImm. | |
5410 | Not sure this comment should be here; should be next to the code that actually makes this decision. | |
5417 | Does it matter whether we're optimizing for size? |
New revision based on previous comments. I refactored the logic used on isFPImmLegal to evaluate whether to materialize the FP constant or not by adding a new function on common aarch64 code, AArch64_AM::getExpandImmCost. To avoid code replication I refactored the code by moving some definitions from AArch64ExpandPseudoInsts.cpp.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
5404 | Ack, I changed it on next revision to use the same ideas AArch64ExpandPseudo::expandMOVIm. It required to consolidate some common code. | |
5410 | Ack, I moved to the code that actually uses it. | |
5417 | I intend to send another patch to add the optimization for size information on isFPImmLegal, since it requires some refactoring in various backends. |
Not sure I like the duplicated logic in getExpandImmCost; it doesn't have good test coverage, and it could fall out of sync in the future. That said, I've been considering refactoring the code in expandMOVImm anyway, to split the actual instruction emission away from the logic that figures out the appropriate sequence. Basically, the idea would be that instead of returning a number from getExpandImmCost, you return an abstraction of the instruction sequence: an array that contains, for each instruction, the appropriate opcode and immediate. isFPImmLegal just uses the number of elements in the array, while expandMOVImm actually emits instructions based on the array. I think this would shrink the code overall because the logic for building instructions is currently duplicated multiple times. (I was considering it more in the context of adding more possible sequences, but it works here as well.)
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
5422 | Is this comment redundant? | |
5425 | Where is forCodeSize defined? | |
lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h | ||
1054 ↗ | (On Diff #188415) | "return 2" isn't quite right; it could be 1. |
Right, I think I can follow this idea and rewrite my patch if you don't mind.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
5422 | Indeed, I will remove it. | |
5425 | It is an artifact from a wrong rebase, I will fix it (it is meant for a different patch). | |
lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h | ||
1054 ↗ | (On Diff #188415) | Indeed, I will fix it. |
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
5418 | Perhaps you could check for AArch64Subtarget::hasFuseLiterals() and emit up to 5 instructions, thus including f64, unless optimizing for size. |
Right, I think I can follow this idea and rewrite my patch if you don't mind.
Okay, thanks.
Updated patch based on previous comments. It depends on https://reviews.llvm.org/D58915 and https://reviews.llvm.org/D58690
Please don't copy-paste code. And this is a very inaccurate approximation for the logic in AArch64ExpandPseudo::expandMOVImm.