Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
752 | Just a drive-by comment: I'm wondering if SVE registers should also be listed here? |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
682 | What happens if -fomit-frame-pointer is specified? Is X29 used as a GPR then? | |
757 | sink this closer to first use, L767 | |
776–778 | so for 32b registers, we clear the whole 64b register? | |
787–792 | isn't it more canonical on ARM to move from the dedicated zero register XZR rather than use an immediate? | |
llvm/lib/Target/AArch64/AArch64RegisterInfo.td | ||
1404 | is i32 correct here? | |
llvm/lib/Target/X86/X86RegisterInfo.cpp | ||
659 | Does this allow us to clean up anything else in the body of this method? Consider making this and the tablegen related patch a distinct child patch. |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
776–778 | Perhaps a more descriptive method name like getWidestRegisterAlias or the like? Perhaps we should simply assert if we get a non GPR rather than return 0, which might actually be a Register? Also, TargetRegisterClass has some notion of sub and super register classes. I wonder if have existing machinery to say, given a register class, what's the equivalent/aliases super register class (if that's even what a super register is). |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
776–778 | That's what's happening with GCC. https://godbolt.org/z/W6b7zxYnK
I'm also using it for the vector registers. And 0 can't be a register. (See include/llvm/MC/Register.h.) Might be able to use the TRC. But I see that X86 has llvm::getX86SubSuperRegisterOrZero in X86MCTargetDesc.cpp which has a large table of registers so that you can get the register of the proper size. | |
787–792 | GCC outputs the immediate move. I'm not familiar though with what's more canonical. | |
llvm/lib/Target/AArch64/AArch64RegisterInfo.td | ||
1404 | That's what FPR32 is defined to be: def FPR32 : RegisterClass<"AArch64", [f32, i32], 32,(sequence "S%u", 0, 31)>; | |
llvm/lib/Target/X86/X86RegisterInfo.cpp | ||
659 | It's possible to simplify here, but it would take more work. I'll address that in a separate patch. |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
752 | I'm not familiar with the SVE registers (I assume you mean the Z# and P# ones). Could you give an example program? |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
752 | SVE is slightly tricker here because the set of registers the caller must preserve depends on the signature of the function. This is described here: https://github.com/ARM-software/abi-aa/blob/8a7b266879c60ca1c76e94ebb279b2dac60ed6a5/aapcs64/aapcs64.rst#613scalable-vector-registers The callee-preserved registers are z8-z23 and p4-p15 if the function is using the VARIANT_PCS, the code for that condition in the asm printer is here: if (MF->getFunction().getCallingConv() == CallingConv::AArch64_VectorCall || MF->getFunction().getCallingConv() == CallingConv::AArch64_SVE_VectorCall || STI->getRegisterInfo()->hasSVEArgsOrReturn(MF)) { Hope that helps a little. |
llvm/lib/Target/AArch64/AArch64RegisterInfo.td | ||
---|---|---|
1398 | Should this feature/attribute work with other calling conventions? If so, then it's probably best not to hard-code these values here, but rather to get them from the chosen calling convention for that particular function. The supported calling conventions are defined in AArch64CallingConvention.td. For example, you could iterate all registers in GPR64/FPR128/ZPR/PPR register classes and zero their values if they are not marked as callee saved. You can query this information from the call by looking at it's callee-saved regmask (see for example CSR_AArch64_AAPCS_RegMask to see how those are defined defined). |
llvm/lib/Target/AArch64/AArch64RegisterInfo.td | ||
---|---|---|
1398 | Ideally it could be retrieved from the *CallingConvention.td files, but in reality it's difficult because those files have a lot of CCIf...<> constructs in them, making a simple query complex. I don't know about the *_RegMask thing. Could you explain what it is and how it works? |
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
682 | GCC only clears registers R0-R18 with or without -fomit-frame-pointer. (That's using -fzero-call-used-regs=all, so register usage isn't a consideration.) I assume that it's correct, or at least close to it. | |
752 | Okay, so GCC does clear out SVE registers when using -march=armv8-a+sve: mov z0.h, #0 mov z1.h, #0 mov z2.h, #0 mov z3.h, #0 mov z4.h, #0 mov z5.h, #0 mov z6.h, #0 mov z7.h, #0 mov z16.h, #0 mov z17.h, #0 mov z18.h, #0 mov z19.h, #0 mov z20.h, #0 mov z21.h, #0 mov z22.h, #0 mov z23.h, #0 mov z24.h, #0 mov z25.h, #0 mov z26.h, #0 mov z27.h, #0 mov z28.h, #0 mov z29.h, #0 mov z30.h, #0 mov z31.h, #0 pfalse p0.b pfalse p1.b pfalse p2.b pfalse p3.b pfalse p4.b pfalse p5.b pfalse p6.b pfalse p7.b pfalse p8.b pfalse p9.b pfalse p10.b pfalse p11.b pfalse p12.b pfalse p13.b pfalse p14.b pfalse p15.b |
- Support SVE registers.
- Initial feature to gather argument registers from the *CallingConv.td files.
llvm/test/CodeGen/AArch64/zero-call-used-regs.ll | ||
---|---|---|
234 | Thanks for addressing the SVE case. Please can we have target-features=+sve tests as well? |
llvm/test/CodeGen/AArch64/zero-call-used-regs.ll | ||
---|---|---|
234 |
Yes. :-) I'm working on a few other clean-up things so they'll be coming soon. |
@peterwaller-arm @sdesmalen Could you comment on what is considered the canonical way to zero Arm registers? Is mov x1, #0 the way or mov x1, xzr or some other way?
- Used lists of argument registers generated from AArch64CallingConv.td.
- Add more tests for floating point and SVE.
I still think this would be easier to review if the isArgumentRegister tablegen changes were separated out into a distinct parent patch and then the existing x86 implementation updated to use, then this would rebased on top of as a child patch.
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
798 | Reuse HasSVE from L771? | |
llvm/test/CodeGen/AArch64/zero-call-used-regs.ll | ||
3–4 | If you use --check-prefixes=CHECK,<unique> (ie. --check-prefixes=CHECK,DEFAULT and --check-prefixes=CHECK,SVE) then when DEFAULT and SVE match, you can just use CHECK. That should help reduce the number of checks in this test significantly. Otherwise it's hard to tell what's different between the two cases, if anything at all. update_llc_test_checks should work with --check-prefixes IME. | |
260–263 | N00b question about SVE: do we need pfalse for each of the numbered p registers corresponding to the x registers we zeroed? i.e. here we have pfalse for p0-3, yet we zero z0-7. |
llvm/test/CodeGen/AArch64/zero-call-used-regs.ll | ||
---|---|---|
260–263 | No, the set of p registers are independent of the z registers. The calling convention states [0] that the predicate registers p0-p3 may be used for parameter passing (if you have an argument which belongs in a p register), so this looks reasonable. |
I'm referring to the changes to llvm/utils/TableGen/CallingConvEmitter.cpp and llvm/utils/TableGen/RegisterInfoEmitter.cpp. I would like to see those as a parent patch. I'm not sure what you're referring to, and it sounds like a child patch, not a parent patch.
Looks like this commit breaks msvc build: https://lab.llvm.org/buildbot/#/builders/222/builds/532
Hi @void ,
the zero-call-used-regs.ll test gets failed on llvm-clang-x86_64-expensive-checks-ubuntu builder with the following errors:
... *** Bad machine code: Illegal physical register for instruction *** - function: all_arg - basic block: %bb.0 entry (0x555be568bb88) - instruction: $q0 = MOVID 0 - operand 0: $q0 $q0 is not a FPR64 register. ...
see more details here https://lab.llvm.org/buildbot/#/builders/104/builds/7797/steps/6/logs/FAIL__LLVM__zero-call-used-regs_ll
Looks like you need to limit this test with REQURES: aarch64-registered-target or something similar.
The first failed build: https://lab.llvm.org/buildbot/#/builders/104/builds/7797
I think this is actually the verifier highlighting a codegen issue with the patch. Looks like it just got fixed though.
got it. Yes, looks like it fixed. The test got passed during the last build: https://lab.llvm.org/buildbot/#/builders/104/builds/7812
Thank you.
I'm sorry for the failure. I thought I had reverted the offending change, but didn't push it. :-/
Hi @void,
The msvc build is still broken. https://lab.llvm.org/buildbot/#/builders/222/builds/532
What happens if -fomit-frame-pointer is specified? Is X29 used as a GPR then?