Page MenuHomePhabricator

[AArch64] Add support for -fzero-call-used-regs
ClosedPublic

Authored by void on May 3 2022, 2:55 AM.

Details

Summary

Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.

Diff Detail

Event Timeline

void created this revision.May 3 2022, 2:55 AM
void requested review of this revision.May 3 2022, 2:56 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 3 2022, 2:56 AM
kristof.beyls added inline comments.May 3 2022, 2:59 AM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
752

Just a drive-by comment: I'm wondering if SVE registers should also be listed here?

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
682

What happens if -fomit-frame-pointer is specified? Is X29 used as a GPR then?

757

sink this closer to first use, L767

776–778

so for 32b registers, we clear the whole 64b register?

787–792

isn't it more canonical on ARM to move from the dedicated zero register XZR rather than use an immediate?

llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1404

is i32 correct here?

llvm/lib/Target/X86/X86RegisterInfo.cpp
659

Does this allow us to clean up anything else in the body of this method?

Consider making this and the tablegen related patch a distinct child patch.

nickdesaulniers added inline comments.May 3 2022, 1:14 PM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
776–778

Perhaps a more descriptive method name like getWidestRegisterAlias or the like? Perhaps we should simply assert if we get a non GPR rather than return 0, which might actually be a Register?

Also, TargetRegisterClass has some notion of sub and super register classes. I wonder if have existing machinery to say, given a register class, what's the equivalent/aliases super register class (if that's even what a super register is).

void added inline comments.May 3 2022, 3:17 PM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
776–778

That's what's happening with GCC. https://godbolt.org/z/W6b7zxYnK

Perhaps we should simply assert if we get a non GPR rather than return 0, which might actually be a Register?

I'm also using it for the vector registers. And 0 can't be a register. (See include/llvm/MC/Register.h.)

Might be able to use the TRC. But I see that X86 has llvm::getX86SubSuperRegisterOrZero in X86MCTargetDesc.cpp which has a large table of registers so that you can get the register of the proper size.

787–792

GCC outputs the immediate move. I'm not familiar though with what's more canonical.

llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1404

That's what FPR32 is defined to be:

def FPR32 : RegisterClass<"AArch64", [f32, i32], 32,(sequence "S%u", 0, 31)>;
llvm/lib/Target/X86/X86RegisterInfo.cpp
659

It's possible to simplify here, but it would take more work. I'll address that in a separate patch.

void added inline comments.May 3 2022, 3:23 PM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
752

I'm not familiar with the SVE registers (I assume you mean the Z# and P# ones). Could you give an example program?

peterwaller-arm added inline comments.
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
752

SVE is slightly tricker here because the set of registers the caller must preserve depends on the signature of the function.

This is described here: https://github.com/ARM-software/abi-aa/blob/8a7b266879c60ca1c76e94ebb279b2dac60ed6a5/aapcs64/aapcs64.rst#613scalable-vector-registers

The callee-preserved registers are z8-z23 and p4-p15 if the function is using the VARIANT_PCS, the code for that condition in the asm printer is here:

https://github.com/llvm/llvm-project/blob/78fd413cf736953ac623cabf3d5f84c8219e31f8/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp#L864-L875

if (MF->getFunction().getCallingConv() == CallingConv::AArch64_VectorCall ||
     MF->getFunction().getCallingConv() ==
         CallingConv::AArch64_SVE_VectorCall ||
     STI->getRegisterInfo()->hasSVEArgsOrReturn(MF)) {

Hope that helps a little.

sdesmalen added inline comments.
llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1398

Should this feature/attribute work with other calling conventions? If so, then it's probably best not to hard-code these values here, but rather to get them from the chosen calling convention for that particular function. The supported calling conventions are defined in AArch64CallingConvention.td.

For example, you could iterate all registers in GPR64/FPR128/ZPR/PPR register classes and zero their values if they are not marked as callee saved. You can query this information from the call by looking at it's callee-saved regmask (see for example CSR_AArch64_AAPCS_RegMask to see how those are defined defined).

void added inline comments.May 4 2022, 3:42 PM
llvm/lib/Target/AArch64/AArch64RegisterInfo.td
1398

Ideally it could be retrieved from the *CallingConvention.td files, but in reality it's difficult because those files have a lot of CCIf...<> constructs in them, making a simple query complex.

I don't know about the *_RegMask thing. Could you explain what it is and how it works?

void added inline comments.May 7 2022, 3:37 AM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
682

GCC only clears registers R0-R18 with or without -fomit-frame-pointer. (That's using -fzero-call-used-regs=all, so register usage isn't a consideration.) I assume that it's correct, or at least close to it.

752

Okay, so GCC does clear out SVE registers when using -march=armv8-a+sve:

mov     z0.h, #0
mov     z1.h, #0
mov     z2.h, #0
mov     z3.h, #0
mov     z4.h, #0
mov     z5.h, #0
mov     z6.h, #0
mov     z7.h, #0
mov     z16.h, #0
mov     z17.h, #0
mov     z18.h, #0
mov     z19.h, #0
mov     z20.h, #0
mov     z21.h, #0
mov     z22.h, #0
mov     z23.h, #0
mov     z24.h, #0
mov     z25.h, #0
mov     z26.h, #0
mov     z27.h, #0
mov     z28.h, #0
mov     z29.h, #0
mov     z30.h, #0
mov     z31.h, #0
pfalse  p0.b
pfalse  p1.b
pfalse  p2.b
pfalse  p3.b
pfalse  p4.b
pfalse  p5.b
pfalse  p6.b
pfalse  p7.b
pfalse  p8.b
pfalse  p9.b
pfalse  p10.b
pfalse  p11.b
pfalse  p12.b
pfalse  p13.b
pfalse  p14.b
pfalse  p15.b
void updated this revision to Diff 427909.May 8 2022, 12:58 AM
  • Support SVE registers.
  • Initial feature to gather argument registers from the *CallingConv.td files.
peterwaller-arm added inline comments.May 9 2022, 1:29 AM
llvm/test/CodeGen/AArch64/zero-call-used-regs.ll
234

Thanks for addressing the SVE case. Please can we have target-features=+sve tests as well?

void added inline comments.May 9 2022, 12:28 PM
llvm/test/CodeGen/AArch64/zero-call-used-regs.ll
234

Thanks for addressing the SVE case. Please can we have target-features=+sve tests as well?

Yes. :-) I'm working on a few other clean-up things so they'll be coming soon.

void added a comment.May 9 2022, 12:33 PM

@peterwaller-arm @sdesmalen Could you comment on what is considered the canonical way to zero Arm registers? Is mov x1, #0 the way or mov x1, xzr or some other way?

void updated this revision to Diff 428218.May 9 2022, 3:23 PM
  • Used lists of argument registers generated from AArch64CallingConv.td.
  • Add more tests for floating point and SVE.

I still think this would be easier to review if the isArgumentRegister tablegen changes were separated out into a distinct parent patch and then the existing x86 implementation updated to use, then this would rebased on top of as a child patch.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
798

Reuse HasSVE from L771?

llvm/test/CodeGen/AArch64/zero-call-used-regs.ll
3–4

If you use --check-prefixes=CHECK,<unique> (ie. --check-prefixes=CHECK,DEFAULT and --check-prefixes=CHECK,SVE) then when DEFAULT and SVE match, you can just use CHECK.

That should help reduce the number of checks in this test significantly. Otherwise it's hard to tell what's different between the two cases, if anything at all.

update_llc_test_checks should work with --check-prefixes IME.

260–263

N00b question about SVE: do we need pfalse for each of the numbered p registers corresponding to the x registers we zeroed? i.e. here we have pfalse for p0-3, yet we zero z0-7.

llvm/test/CodeGen/AArch64/zero-call-used-regs.ll
260–263

No, the set of p registers are independent of the z registers.

The calling convention states [0] that the predicate registers p0-p3 may be used for parameter passing (if you have an argument which belongs in a p register), so this looks reasonable.

[0] https://github.com/ARM-software/abi-aa/blob/8a7b266879c60ca1c76e94ebb279b2dac60ed6a5/aapcs64/aapcs64.rst#scalable-predicate-registers

void updated this revision to Diff 428764.May 11 2022, 1:45 PM
void marked 4 inline comments as done.

Fix think-o use of HasSVE. Use --check-prefixes in the testcase.

void added a comment.May 11 2022, 1:45 PM

I'll split off the TableGen changes into a separate patch. It will supersede those changes here, so it shouldn't delay other reviews here. :-)

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
798

Doh! Changed.

llvm/test/CodeGen/AArch64/zero-call-used-regs.ll
3–4

What hath God wrought?!

Done.

I'll split off the TableGen changes into a separate patch. It will supersede those changes here, so it shouldn't delay other reviews here. :-)

I'm referring to the changes to llvm/utils/TableGen/CallingConvEmitter.cpp and llvm/utils/TableGen/RegisterInfoEmitter.cpp. I would like to see those as a parent patch. I'm not sure what you're referring to, and it sounds like a child patch, not a parent patch.

void added a comment.May 11 2022, 2:14 PM

I'll split off the TableGen changes into a separate patch. It will supersede those changes here, so it shouldn't delay other reviews here. :-)

I'm referring to the changes to llvm/utils/TableGen/CallingConvEmitter.cpp and llvm/utils/TableGen/RegisterInfoEmitter.cpp. I would like to see those as a parent patch. I'm not sure what you're referring to, and it sounds like a child patch, not a parent patch.

That's what I meant. And I need those TableGen changes for this patch.

void added a comment.May 19 2022, 12:13 PM

Friendly ping.

nickdesaulniers accepted this revision.May 19 2022, 12:19 PM

Don't forget to rebase this on top of https://reviews.llvm.org/D125421.

This revision is now accepted and ready to land.May 19 2022, 12:19 PM
MaskRay accepted this revision.May 19 2022, 12:47 PM
void updated this revision to Diff 430858.May 19 2022, 4:56 PM

Last update.

This revision was landed with ongoing or failed builds.May 19 2022, 4:58 PM
This revision was automatically updated to reflect the committed changes.

Hi @void ,

the zero-call-used-regs.ll test gets failed on llvm-clang-x86_64-expensive-checks-ubuntu builder with the following errors:

...
*** Bad machine code: Illegal physical register for instruction ***
- function:    all_arg
- basic block: %bb.0 entry (0x555be568bb88)
- instruction: $q0 = MOVID 0
- operand 0:   $q0
$q0 is not a FPR64 register.
...

see more details here https://lab.llvm.org/buildbot/#/builders/104/builds/7797/steps/6/logs/FAIL__LLVM__zero-call-used-regs_ll

Looks like you need to limit this test with REQURES: aarch64-registered-target or something similar.

The first failed build: https://lab.llvm.org/buildbot/#/builders/104/builds/7797

fhahn added a subscriber: fhahn.May 20 2022, 2:11 PM

Hi @void ,

the zero-call-used-regs.ll test gets failed on llvm-clang-x86_64-expensive-checks-ubuntu builder with the following errors:

...
*** Bad machine code: Illegal physical register for instruction ***
- function:    all_arg
- basic block: %bb.0 entry (0x555be568bb88)
- instruction: $q0 = MOVID 0
- operand 0:   $q0
$q0 is not a FPR64 register.
...

I think this is actually the verifier highlighting a codegen issue with the patch. Looks like it just got fixed though.

got it. Yes, looks like it fixed. The test got passed during the last build: https://lab.llvm.org/buildbot/#/builders/104/builds/7812
Thank you.

void added a comment.May 20 2022, 2:22 PM

got it. Yes, looks like it fixed. The test got passed during the last build: https://lab.llvm.org/buildbot/#/builders/104/builds/7812
Thank you.

I'm sorry for the failure. I thought I had reverted the offending change, but didn't push it. :-/

Amir added a comment.May 21 2022, 2:39 PM

Hi @void,

The msvc build is still broken. https://lab.llvm.org/buildbot/#/builders/222/builds/532

got it. Yes, looks like it fixed. The test got passed during the last build: https://lab.llvm.org/buildbot/#/builders/104/builds/7812
Thank you.

I'm sorry for the failure. I thought I had reverted the offending change, but didn't push it. :-/

Allen added a subscriber: Allen.May 21 2022, 10:10 PM