Constants are crucial for code size in the ARM Thumb-1 instruction set.
The 16 bit instruction size often does not offer enough space for immediate arguments.
This means that additional instructions are frequently used to load constants into registers.
Since constants are hoisted, this can lead to significant register spillage if they are used multiple times in a single function.
This can be avoided by rematerialization, i.e. recomputing a constant instead of reloading it from the stack.
This patch extends the rematerialization functionality of llvm such that it is able to rematerialize ARM Thumb MOVSi8 instruction despite it setting the CPSR flags.
Details
Diff Detail
Event Timeline
lib/CodeGen/LiveRangeEdit.cpp | ||
---|---|---|
162 | For speed, I think you can iterate over just the defs instead of all the operands, they'll be fewer and it will negate the need to do the Reg and Def checks. | |
177 | Allocatable isn't about liveliness, it's just whether the register can be used by the allocator generally, I'm guessing that this works here because it can't allocate the CPSR. | |
181 | Can you not just query IsDead on the operand instead? |
lib/CodeGen/LiveRangeEdit.cpp | ||
---|---|---|
162 | Agreed, I will fix this. | |
177 | Yes, I'm aware of that. Long story short: I'm quite certain the condition I check here is sufficient but it is probably way too strong. | |
181 | I tried several different options here. |
lib/Target/ARM/ARMInstrThumb.td | ||
---|---|---|
1136 | Can this also be used for the movs register variant? |
lib/CodeGen/LiveRangeEdit.cpp | ||
---|---|---|
177 | The important thing is that the patch is targeted at CPSR, as you point out, which is not allocatable. |
lib/CodeGen/LiveRangeEdit.cpp | ||
---|---|---|
177 | Ok, definitely better to be conservative then. |
lib/Target/ARM/ARMInstrThumb.td | ||
---|---|---|
1136 | The infrastructure that the patch sets up could be used for much more generic instructions, yes! Unfortunately it's pointless though. The problem is that the used registers would have to be preserved by chance from the first invocation to the second. |
Hi Philip,
Thanks for looking into extending the rematerialization support.
I believe the interference checking is too weak to actually work in all cases (see inline comments). More over, I think we cannot say this is still trivially rematerializable given now the callers of that method need to do interference checks. Thus I would suggest to use a new/different hook. In particular, I am not convinced the register coalescer will do the right thing if it rematerializes an instruction with more than one def.
Cheers,
-Quentin
include/llvm/Target/TargetInstrInfo.h | ||
---|---|---|
111 ↗ | (On Diff #106606) | No else after return per LLVM coding standard. |
lib/CodeGen/LiveRangeEdit.cpp | ||
162 | This is wrong. This misses all the implicit-def. >/// Returns a range over all explicit operands that are register definitions. /// Implicit definition are not included! | |
166 | That doesn't seem right either. Indeed where is the check for virtregs? E.g., consider vreg0vreg1 | |
test/CodeGen/Thumb/movi8remat.ll | ||
1 ↗ | (On Diff #106606) | Please use a .mir test. |
Hi Quentin,
Thanks a lot for the comments, that's really valuable feedback.
I will take some time to dig through those issues and get back to you.
Best,
Philip
Hi Quentin,
Let me follow up on your comments.
I renamed the member functions to isPotentiallyTriviallyReMaterializable and isReallyPotentiallyTriviallyReMaterializableGeneric.
This is somewhat grotesque but does bring across the functionality more clearly I think. Obviously I am open to more aesthetic suggestions.
The problem with implicit defs should be removed now and was not present in a previous version of the patch.
I didn't pay attention and was mislead by misleading documentation for MachineInstr::getNumOperands.
I think your concern for checking virtual register defs stems from isPotentiallyTriviallyReMaterializable not being very clear in it's purpose.
I hope the change to the return enum value improves this. Do you agree that this is sufficient or am I overlooking something crucial here?
Best,
Philip
include/llvm/Target/TargetInstrInfo.h | ||
---|---|---|
111 ↗ | (On Diff #106606) | I will fix this. |
lib/CodeGen/LiveRangeEdit.cpp | ||
162 | I was too eager to change this, I think the previous version should have been correct. There appears to be a mistake in the documentation for MachineInstr::getNumOperands, it says "Access to explicit operands of the instruction." | |
166 | I did not weaken the check for additional virtual registers in TargetInstrInfo::isReallyTriviallyReMaterializableGeneric. Arguably the behavior that you imply (allowing additional virtreg defs) would be useful and more obvious but at the core this patch is concerned with the movi8 instruction and the current change is enough to rematerialize that. I will change the enum item YES_BUT_EXTRA_DEFS to YES_BUT_EXTRA_PHYSREG_DEFS to make the behavior of the function more obvious. | |
test/CodeGen/Thumb/movi8remat.ll | ||
1 ↗ | (On Diff #106606) | Ok, I will change it to a .mir test, thanks for providing the options. |
I have rebased onto the master branch at 244c796c894f50fb53f9dbe7627702661dfe69c2.
Quentin, you gave many useful suggestions before, what do you think is the way forward with this?
lib/CodeGen/LiveRangeEdit.cpp | ||
---|---|---|
165 | shouldn't we use auto type and range-based loop now? |
lib/CodeGen/LiveRangeEdit.cpp | ||
---|---|---|
165 | Thanks, I fixed it. |
I think I have addressed all your previous concerns, @qcolombet, is there anything else you'd like to see clarified?
Sam Parker suggested I add you as a reviewer, @MatzeB, do you have any suggestions?
Sorry to be pushy, this patch has been lying around for a while and it would be quite satisfying to get it through the process.
Hi,
Sorry for the delay, I haven’t had time to review the patch but I have one high level comment.
I am not sure the enum approach is desirable as it basically hard code the kind of constraints we can report and will grow very quickly if we want to extend it.
For instance, let say that on top of physreg defs we want to report virtual reg defs, now we would need “yes”, “no”, “yes_phys”, “yes_virt”, “yes_phys_virt” and the list grows with the cross product of everything we may want/need to report.
Anyhow, I’ll look closer at the patch to give more concrete feedbacks.
Cheers,
Q
I am not sure the enum approach is desirable as it basically hard code the kind of constraints we can report and will grow very quickly if we want to extend it.
For instance, let say that on top of physreg defs we want to report virtual reg defs, now we would need “yes”, “no”, “yes_phys”, “yes_virt”, “yes_phys_virt” and the list grows with the cross product of everything we may want/need to report.
Hi Quentin,
Thanks a lot for looking at this again.
I understand your concern, the easiest way of tackling it would probably be some kind of bitfield with flags for the different kind of (potential) extensions of the unrestricted rematerializability property.
However, I would vote for that to be introduced when the problem actually arises. It isn't so obvious to me that many more generalizations of rematerializability will be necessary and the path from enum to bitfield is easy.
What do you think?
Cheers,
Philip
However, I would vote for that to be introduced when the problem actually arises. It isn't so obvious to me that many more generalizations of rematerializability will be necessary and the path from enum to bitfield is easy.
What do you think?
I agree, moving with incremental steps is the right thing to do.
The question is more does this step go in the direction we want? (I am not saying it doesn't, I just haven't looked closer enough to have an opinion :))
Now, I just wanted to point out that there are more opportunities to generalize the rematerialization. The obvious one to me is rematerializing everything (like pulling chain of computations, full instructions (with both definitions and arguments)) and in that sense, only the instruction or chain of instructions carry the right level of information. Right now we often introduce pseudo instruction to work around this limitation and that's what I would like to solve.
Anyway, I'll take a closer look at the patch.
Now, I just wanted to point out that there are more opportunities to generalize the rematerialization. The obvious one to me is rematerializing everything (like pulling chain of computations, full instructions (with both definitions and arguments)) and in that sense, only the instruction or chain of instructions carry the right level of information. Right now we often introduce pseudo instruction to work around this limitation and that's what I would like to solve.
It's true, rematerializing a whole chain of instructions would require a more thorough approach.
I think it's important to be clear about motivations beforehand though: this patch was done to optimize code size for ARM Thumb. Since spilling and reloading takes only two instructions, rematerializing even chains of two instructions is pointless from that perspective. Obviously, there are other metrics to consider, but I think it would be important nonetheless to establish what exactly is to gain.
For speed, I think you can iterate over just the defs instead of all the operands, they'll be fewer and it will negate the need to do the Reg and Def checks.